RE: Index Skip Scan

Started by Floris Van Nee, almost 6 years ago, 84 messages

florisvannee@Optiver.com

almost 6 years ago

Hi all,

I reviewed the latest version of the patch. Overall some good improvements I think. Please find my feedback below.

- I think I mentioned this before - it's not that big of a deal, but it just looks weird and inconsistent to me:
create table t2 as (select a, b, c, 10 d from generate_series(1,5) a, generate_series(1,100) b, generate_series(1,10000) c);
create index on t2 (a,b,c desc);

postgres=# explain select distinct on (a,b) a,b,c from t2 where a=2 and b>=5 and b<=5 order by a,b,c desc;
                                   QUERY PLAN
---------------------------------------------------------------------------------
 Index Only Scan using t2_a_b_c_idx on t2  (cost=0.43..216.25 rows=500 width=12)
   Skip scan: true
   Index Cond: ((a = 2) AND (b >= 5) AND (b <= 5))
(3 rows)

postgres=# explain select distinct on (a,b) a,b,c from t2 where a=2 and b=5 order by a,b,c desc;
                                       QUERY PLAN
-----------------------------------------------------------------------------------------
 Unique  (cost=0.43..8361.56 rows=500 width=12)
   ->  Index Only Scan using t2_a_b_c_idx on t2  (cost=0.43..8361.56 rows=9807 width=12)
         Index Cond: ((a = 2) AND (b = 5))
(3 rows)

When doing a distinct on (params) and having equality conditions for all params, it falls back to the regular index scan even though there's no reason not to use the skip scan here. It's much faster to write b between 5 and 5 now rather than writing b=5. I understand this was a limitation of the unique-keys patch at the moment which could be addressed in the future. I think for the sake of consistency it would make sense if this eventually gets addressed.

- nodeIndexScan.c, line 126
This sets xs_want_itup to true in all cases (even for non skip-scans). I don't think this is acceptable given the side-effects this has (page will never be unpinned in between returned tuples in _bt_drop_lock_and_maybe_pin)

- nbtsearch.c, _bt_skip, line 1440
_bt_update_skip_scankeys(scan, indexRel);
This function is called twice now: once in the else {} and again immediately after it, outside of the else. I think the second call can be removed.

- nbtsearch.c _bt_skip line 1597
LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
scan->xs_itup = (IndexTuple) PageGetItem(page, itemid);

This is an UNLOCK followed by a read of the unlocked page. That looks incorrect?

- nbtsearch.c _bt_skip line 1440
if (BTScanPosIsValid(so->currPos) &&
_bt_scankey_within_page(scan, so->skipScanKey, so->currPos.buf, dir))

Is it allowed to look at the high key / low key of the page without having a read lock on it?

- nbtsearch.c line 1634
if (_bt_readpage(scan, indexdir, offnum)) ...
else
error()

Is it really guaranteed that a match can be found on the page itself? Isn't it possible that an extra index condition, not part of the scan key, makes none of the keys on the page match?

- nbtsearch.c in general
Most of the code seems to rely quite heavily on the fact that xs_want_itup forces _bt_drop_lock_and_maybe_pin to never release the buffer pin. Have you considered that compacting of a page may still happen even if you hold the pin? [1] I've been trying to come up with cases in which this may break the patch, but I haven't been able to produce such a scenario, so it may be fine. But it would be good to consider again. One thing I was thinking of was a scenario where page splits and/or compacting would happen in between returning tuples. Could this break the _bt_scankey_within_page check such that it thinks the scan key is within the current page, while it actually isn't? This would mainly affect backward and/or cursor scans; forward scans shouldn't be a problem, I think. Apologies for being a bit vague, as I don't have a clear example ready of when it would go wrong. It may well be fine, but it was one of the things on my mind.

[1]: https://postgrespro.com/list/id/1566683972147.11682@Optiver.com

-Floris

Jesper Pedersen

jesper.pedersen@redhat.com

almost 6 years ago

In reply to: Floris Van Nee (#1)

2 attachment(s)

Re: Index Skip Scan

Hi Floris,

On 1/15/20 8:33 AM, Floris Van Nee wrote:

I reviewed the latest version of the patch. Overall some good improvements I think. Please find my feedback below.

Thanks for your review!

- I think I mentioned this before - it's not that big of a deal, but it just looks weird and inconsistent to me:
create table t2 as (select a, b, c, 10 d from generate_series(1,5) a, generate_series(1,100) b, generate_series(1,10000) c);
create index on t2 (a,b,c desc);

postgres=# explain select distinct on (a,b) a,b,c from t2 where a=2 and b>=5 and b<=5 order by a,b,c desc;
                                   QUERY PLAN
---------------------------------------------------------------------------------
 Index Only Scan using t2_a_b_c_idx on t2  (cost=0.43..216.25 rows=500 width=12)
   Skip scan: true
   Index Cond: ((a = 2) AND (b >= 5) AND (b <= 5))
(3 rows)

postgres=# explain select distinct on (a,b) a,b,c from t2 where a=2 and b=5 order by a,b,c desc;
                                       QUERY PLAN
-----------------------------------------------------------------------------------------
 Unique  (cost=0.43..8361.56 rows=500 width=12)
   ->  Index Only Scan using t2_a_b_c_idx on t2  (cost=0.43..8361.56 rows=9807 width=12)
         Index Cond: ((a = 2) AND (b = 5))
(3 rows)

When doing a distinct on (params) and having equality conditions for all params, it falls back to the regular index scan even though there's no reason not to use the skip scan here. It's much faster to write b between 5 and 5 now rather than writing b=5. I understand this was a limitation of the unique-keys patch at the moment which could be addressed in the future. I think for the sake of consistency it would make sense if this eventually gets addressed.

Agreed, that is an improvement that should be made. I would like
David's view on this since it relates to the UniqueKey patch.
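One possible reading of the b=5 case, sketched as a toy model in C: pathkey_is_redundant() in the attached patch treats a pathkey as redundant when its equivalence class contains a constant, and only an equality qual such as b = 5 puts a constant into the class; a range pair such as b >= 5 AND b <= 5 does not. Whether this is the actual cause of the fallback is an assumption, not something the thread confirms. DemoEC, demo_is_redundant and demo_build_keys are hypothetical names, not PostgreSQL APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an EquivalenceClass; not the PostgreSQL struct. */
typedef struct DemoEC
{
    const char *column;     /* column the class is built around */
    bool        has_const;  /* true if an equality qual pinned it to a constant */
} DemoEC;

/*
 * Same shape as pathkey_is_redundant() in the patch: a class containing a
 * constant is always redundant, and so is a class already in the list.
 */
static bool
demo_is_redundant(const DemoEC *ec, const DemoEC *const *kept, size_t nkept)
{
    assert(ec != NULL);

    if (ec->has_const)
        return true;
    for (size_t i = 0; i < nkept; i++)
        if (kept[i] == ec)
            return true;
    return false;
}

/* Keep only the non-redundant classes, in order; returns how many were kept. */
static size_t
demo_build_keys(const DemoEC *const *wanted, size_t nwanted, const DemoEC **kept)
{
    size_t nkept = 0;

    for (size_t i = 0; i < nwanted; i++)
        if (!demo_is_redundant(wanted[i], kept, nkept))
            kept[nkept++] = wanted[i];
    return nkept;
}
```

In this model, a = 2 with a range on b leaves one key (b), while a = 2 AND b = 5 leaves none, so nothing remains for a skip to be driven by.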

- nodeIndexScan.c, line 126
This sets xs_want_itup to true in all cases (even for non skip-scans). I don't think this is acceptable given the side-effects this has (page will never be unpinned in between returned tuples in _bt_drop_lock_and_maybe_pin)

Correct - fixed.

- nbtsearch.c, _bt_skip, line 1440
_bt_update_skip_scankeys(scan, indexRel);
This function is called twice now: once in the else {} and again immediately after it, outside of the else. I think the second call can be removed.

Yes, removed the "else" call site.

- nbtsearch.c _bt_skip line 1597
LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
scan->xs_itup = (IndexTuple) PageGetItem(page, itemid);

This is an UNLOCK followed by a read of the unlocked page. That looks incorrect?

Yes, that needed to be changed.
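A minimal sketch of the ordering problem, assuming a toy page with a single item: one common way to avoid reading an unlocked page is to take a private copy of the data while the lock is still held and only then unlock. The thread does not say how the updated patch resolves this, so the code below is an illustration only; DemoPage and the demo_* functions are hypothetical and are not PostgreSQL buffer-manager calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define ITEM_LEN 16

/* Hypothetical stand-in for a shared buffer page; not a PostgreSQL type. */
typedef struct DemoPage
{
    bool locked;            /* true while the caller holds the content lock */
    char item[ITEM_LEN];    /* the single "index tuple" stored on the page */
} DemoPage;

static void demo_lock(DemoPage *page)   { page->locked = true; }
static void demo_unlock(DemoPage *page) { page->locked = false; }

/* Copy the item into caller-owned memory; refuses to read an unlocked page. */
static bool
demo_copy_item(const DemoPage *page, char *dst)
{
    if (!page->locked)
        return false;
    memcpy(dst, page->item, ITEM_LEN);
    return true;
}

/* Safe ordering: lock, copy, unlock. Later work uses only the private copy. */
static bool
demo_fetch_item(DemoPage *page, char *dst)
{
    bool ok;

    assert(page != NULL && dst != NULL);

    demo_lock(page);
    ok = demo_copy_item(page, dst);
    demo_unlock(page);
    return ok;
}
```

After demo_fetch_item() returns, later changes to the page no longer affect the caller's copy, which is the property the unlock-then-read sequence lacks.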

- nbtsearch.c _bt_skip line 1440
if (BTScanPosIsValid(so->currPos) &&
_bt_scankey_within_page(scan, so->skipScanKey, so->currPos.buf, dir))

Is it allowed to look at the high key / low key of the page without having a read lock on it?

In case of a split the page will still contain a high key and a low key,
so this should be ok.
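To make the question concrete, here is a toy model in C of a bounds check against a page whose bounds change under a split. It assumes integer keys and inclusive low/high bounds; the real nbtree code compares scan keys against index tuples through operator-class comparators, so this only illustrates the concern and does not describe _bt_scankey_within_page. DemoLeaf, demo_key_within_page and demo_split are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical leaf page reduced to its key bounds; not a PostgreSQL type. */
typedef struct DemoLeaf
{
    int low_key;    /* smallest key the page may hold */
    int high_key;   /* largest key the page may hold */
} DemoLeaf;

/* True if the key falls inside the page's current bounds. */
static bool
demo_key_within_page(const DemoLeaf *page, int key)
{
    assert(page != NULL);
    return key >= page->low_key && key <= page->high_key;
}

/*
 * Simulate a split: keys above split_at move to a new right sibling and the
 * left page's high key shrinks accordingly.
 */
static DemoLeaf
demo_split(DemoLeaf *left, int split_at)
{
    DemoLeaf right = { split_at + 1, left->high_key };

    left->high_key = split_at;
    return right;
}
```

Both pages still have valid bounds after the split, but an answer computed before it (key 80 is within the page covering 10..100) is stale afterwards, since key 80 now belongs to the right sibling.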

- nbtsearch.c line 1634
if (_bt_readpage(scan, indexdir, offnum)) ...
else
error()

Is it really guaranteed that a match can be found on the page itself? Isn't it possible that an extra index condition, not part of the scan key, makes none of the keys on the page match?

The logic for this has been changed.

- nbtsearch.c in general
Most of the code seems to rely quite heavily on the fact that xs_want_itup forces _bt_drop_lock_and_maybe_pin to never release the buffer pin. Have you considered that compacting of a page may still happen even if you hold the pin? [1] I've been trying to come up with cases in which this may break the patch, but I haven't been able to produce such a scenario, so it may be fine. But it would be good to consider again. One thing I was thinking of was a scenario where page splits and/or compacting would happen in between returning tuples. Could this break the _bt_scankey_within_page check such that it thinks the scan key is within the current page, while it actually isn't? This would mainly affect backward and/or cursor scans; forward scans shouldn't be a problem, I think. Apologies for being a bit vague, as I don't have a clear example ready of when it would go wrong. It may well be fine, but it was one of the things on my mind.

There is a BT_READ lock in place when finding the correct leaf page, or
searching within the leaf page itself. _bt_vacuum_one_page deletes only
LP_DEAD tuples, but those are already ignored in _bt_readpage. Peter, do
you have any feedback on this?
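A small C sketch of both sides of this exchange, assuming a page modeled as a plain array of items with a dead flag. A reader that skips dead items sees the same keys before and after dead items are physically removed, but the position of each surviving item can shift, which is the kind of effect the review asks about for a scan that remembers an offset. DemoItem, demo_read_live and demo_compact are hypothetical and greatly simplified relative to _bt_readpage and _bt_vacuum_one_page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical index item: a key plus a flag playing the role of LP_DEAD. */
typedef struct DemoItem
{
    int  key;
    bool dead;
} DemoItem;

/* Collect the keys of live items into out[]; returns how many were found. */
static size_t
demo_read_live(const DemoItem *items, size_t nitems, int *out)
{
    size_t n = 0;

    assert(items != NULL && out != NULL);

    for (size_t i = 0; i < nitems; i++)
    {
        if (items[i].dead)
            continue;           /* readers ignore dead items */
        out[n++] = items[i].key;
    }
    return n;
}

/* Remove dead items in place, keeping live ones in order; returns new count. */
static size_t
demo_compact(DemoItem *items, size_t nitems)
{
    size_t n = 0;

    for (size_t i = 0; i < nitems; i++)
        if (!items[i].dead)
            items[n++] = items[i];
    return n;
}
```

With items {1, 2 (dead), 3}, the live keys are 1 and 3 both before and after compaction, yet slot 1 holds key 2 before and key 3 after.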

Please, find the updated patches attached that Dmitry and I made.

Thanks again!

Best regards,
Jesper

Attachments:

v31_0001-Unique-key.patch (text/x-patch; charset=UTF-8)

From 3c540c93307e6cbe792b31b12d4ecb025cd6b327 Mon Sep 17 00:00:00 2001
From: jesperpedersen <jesper.pedersen@redhat.com>
Date: Fri, 15 Nov 2019 09:46:05 -0500
Subject: [PATCH 1/2] Unique key

Design by David Rowley.

Author: Jesper Pedersen
---
 src/backend/nodes/outfuncs.c           |  14 +++
 src/backend/nodes/print.c              |  39 +++++++
 src/backend/optimizer/path/Makefile    |   3 +-
 src/backend/optimizer/path/allpaths.c  |   8 ++
 src/backend/optimizer/path/indxpath.c  |  41 +++++++
 src/backend/optimizer/path/pathkeys.c  |  71 ++++++++++--
 src/backend/optimizer/path/uniquekey.c | 147 +++++++++++++++++++++++++
 src/backend/optimizer/plan/planagg.c   |   1 +
 src/backend/optimizer/plan/planmain.c  |   1 +
 src/backend/optimizer/plan/planner.c   |  17 ++-
 src/backend/optimizer/util/pathnode.c  |  12 ++
 src/include/nodes/nodes.h              |   1 +
 src/include/nodes/pathnodes.h          |  18 +++
 src/include/nodes/print.h              |   1 +
 src/include/optimizer/pathnode.h       |   1 +
 src/include/optimizer/paths.h          |  11 ++
 16 files changed, 373 insertions(+), 13 deletions(-)
 create mode 100644 src/backend/optimizer/path/uniquekey.c

diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index d76fae44b8..16083e7a7e 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -1723,6 +1723,7 @@ _outPathInfo(StringInfo str, const Path *node)
 	WRITE_FLOAT_FIELD(startup_cost, "%.2f");
 	WRITE_FLOAT_FIELD(total_cost, "%.2f");
 	WRITE_NODE_FIELD(pathkeys);
+	WRITE_NODE_FIELD(uniquekeys);
 }
 
 /*
@@ -2205,6 +2206,7 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
 	WRITE_NODE_FIELD(eq_classes);
 	WRITE_BOOL_FIELD(ec_merging_done);
 	WRITE_NODE_FIELD(canon_pathkeys);
+	WRITE_NODE_FIELD(canon_uniquekeys);
 	WRITE_NODE_FIELD(left_join_clauses);
 	WRITE_NODE_FIELD(right_join_clauses);
 	WRITE_NODE_FIELD(full_join_clauses);
@@ -2214,6 +2216,7 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
 	WRITE_NODE_FIELD(placeholder_list);
 	WRITE_NODE_FIELD(fkey_list);
 	WRITE_NODE_FIELD(query_pathkeys);
+	WRITE_NODE_FIELD(query_uniquekeys);
 	WRITE_NODE_FIELD(group_pathkeys);
 	WRITE_NODE_FIELD(window_pathkeys);
 	WRITE_NODE_FIELD(distinct_pathkeys);
@@ -2401,6 +2404,14 @@ _outPathKey(StringInfo str, const PathKey *node)
 	WRITE_BOOL_FIELD(pk_nulls_first);
 }
 
+static void
+_outUniqueKey(StringInfo str, const UniqueKey *node)
+{
+	WRITE_NODE_TYPE("UNIQUEKEY");
+
+	WRITE_NODE_FIELD(eq_clause);
+}
+
 static void
 _outPathTarget(StringInfo str, const PathTarget *node)
 {
@@ -4092,6 +4103,9 @@ outNode(StringInfo str, const void *obj)
 			case T_PathKey:
 				_outPathKey(str, obj);
 				break;
+			case T_UniqueKey:
+				_outUniqueKey(str, obj);
+				break;
 			case T_PathTarget:
 				_outPathTarget(str, obj);
 				break;
diff --git a/src/backend/nodes/print.c b/src/backend/nodes/print.c
index 42476724d8..d286b34544 100644
--- a/src/backend/nodes/print.c
+++ b/src/backend/nodes/print.c
@@ -459,6 +459,45 @@ print_pathkeys(const List *pathkeys, const List *rtable)
 	printf(")\n");
 }
 
+/*
+ * print_uniquekeys -
+ *	  uniquekeys list of UniqueKeys
+ */
+void
+print_uniquekeys(const List *uniquekeys, const List *rtable)
+{
+	ListCell   *l;
+
+	printf("(");
+	foreach(l, uniquekeys)
+	{
+		UniqueKey *unique_key = (UniqueKey *) lfirst(l);
+		EquivalenceClass *eclass = (EquivalenceClass *) unique_key->eq_clause;
+		ListCell   *k;
+		bool		first = true;
+
+		/* chase up */
+		while (eclass->ec_merged)
+			eclass = eclass->ec_merged;
+
+		printf("(");
+		foreach(k, eclass->ec_members)
+		{
+			EquivalenceMember *mem = (EquivalenceMember *) lfirst(k);
+
+			if (first)
+				first = false;
+			else
+				printf(", ");
+			print_expr((Node *) mem->em_expr, rtable);
+		}
+		printf(")");
+		if (lnext(uniquekeys, l))
+			printf(", ");
+	}
+	printf(")\n");
+}
+
 /*
  * print_tl
  *	  print targetlist in a more legible way.
diff --git a/src/backend/optimizer/path/Makefile b/src/backend/optimizer/path/Makefile
index 1e199ff66f..63cc1505d9 100644
--- a/src/backend/optimizer/path/Makefile
+++ b/src/backend/optimizer/path/Makefile
@@ -21,6 +21,7 @@ OBJS = \
 	joinpath.o \
 	joinrels.o \
 	pathkeys.o \
-	tidpath.o
+	tidpath.o \
+	uniquekey.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 8286d9cf34..bbc13e6141 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -3954,6 +3954,14 @@ print_path(PlannerInfo *root, Path *path, int indent)
 		print_pathkeys(path->pathkeys, root->parse->rtable);
 	}
 
+	if (path->uniquekeys)
+	{
+		for (i = 0; i < indent; i++)
+			printf("\t");
+		printf("  uniquekeys: ");
+		print_uniquekeys(path->uniquekeys, root->parse->rtable);
+	}
+
 	if (join)
 	{
 		JoinPath   *jp = (JoinPath *) path;
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 2a50272da6..34e7fafa84 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -189,6 +189,7 @@ static Expr *match_clause_to_ordering_op(IndexOptInfo *index,
 static bool ec_member_matches_indexcol(PlannerInfo *root, RelOptInfo *rel,
 									   EquivalenceClass *ec, EquivalenceMember *em,
 									   void *arg);
+static List *get_uniquekeys_for_index(PlannerInfo *root, List *pathkeys);
 
 
 /*
@@ -874,6 +875,7 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
 	List	   *orderbyclausecols;
 	List	   *index_pathkeys;
 	List	   *useful_pathkeys;
+	List	   *useful_uniquekeys = NIL;
 	bool		found_lower_saop_clause;
 	bool		pathkeys_possibly_useful;
 	bool		index_is_ordered;
@@ -1036,11 +1038,15 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
 	if (index_clauses != NIL || useful_pathkeys != NIL || useful_predicate ||
 		index_only_scan)
 	{
+		if (has_useful_uniquekeys(root))
+			useful_uniquekeys = get_uniquekeys_for_index(root, useful_pathkeys);
+
 		ipath = create_index_path(root, index,
 								  index_clauses,
 								  orderbyclauses,
 								  orderbyclausecols,
 								  useful_pathkeys,
+								  useful_uniquekeys,
 								  index_is_ordered ?
 								  ForwardScanDirection :
 								  NoMovementScanDirection,
@@ -1063,6 +1069,7 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
 									  orderbyclauses,
 									  orderbyclausecols,
 									  useful_pathkeys,
+									  useful_uniquekeys,
 									  index_is_ordered ?
 									  ForwardScanDirection :
 									  NoMovementScanDirection,
@@ -1093,11 +1100,15 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
 													index_pathkeys);
 		if (useful_pathkeys != NIL)
 		{
+			if (has_useful_uniquekeys(root))
+				useful_uniquekeys = get_uniquekeys_for_index(root, useful_pathkeys);
+
 			ipath = create_index_path(root, index,
 									  index_clauses,
 									  NIL,
 									  NIL,
 									  useful_pathkeys,
+									  useful_uniquekeys,
 									  BackwardScanDirection,
 									  index_only_scan,
 									  outer_relids,
@@ -1115,6 +1126,7 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
 										  NIL,
 										  NIL,
 										  useful_pathkeys,
+										  useful_uniquekeys,
 										  BackwardScanDirection,
 										  index_only_scan,
 										  outer_relids,
@@ -3365,6 +3377,35 @@ match_clause_to_ordering_op(IndexOptInfo *index,
 	return clause;
 }
 
+/*
+ * get_uniquekeys_for_index
+ */
+static List *
+get_uniquekeys_for_index(PlannerInfo *root, List *pathkeys)
+{
+	ListCell *lc;
+
+	if (pathkeys)
+	{
+		List *uniquekeys = NIL;
+		foreach(lc, pathkeys)
+		{
+			UniqueKey *unique_key;
+			PathKey *pk = (PathKey *) lfirst(lc);
+			EquivalenceClass *ec = (EquivalenceClass *) pk->pk_eclass;
+
+			unique_key = makeNode(UniqueKey);
+			unique_key->eq_clause = ec;
+			
+			uniquekeys = lappend(uniquekeys, unique_key);
+		}
+
+		if (uniquekeys_contained_in(root->canon_uniquekeys, uniquekeys))
+			return uniquekeys;
+	}
+
+	return NIL;
+}
 
 /****************************************************************************
  *				----  ROUTINES TO DO PARTIAL INDEX PREDICATE TESTS	----
diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index 71b9d42c99..054df9a617 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -29,6 +29,7 @@
 #include "utils/lsyscache.h"
 
 
+static bool pathkey_is_unique(PathKey *new_pathkey, List *pathkeys);
 static bool pathkey_is_redundant(PathKey *new_pathkey, List *pathkeys);
 static bool matches_boolean_partition_clause(RestrictInfo *rinfo,
 											 RelOptInfo *partrel,
@@ -96,6 +97,29 @@ make_canonical_pathkey(PlannerInfo *root,
 	return pk;
 }
 
+/*
+ * pathkey_is_unique
+ *	   Checks if the new pathkey's equivalence class is the same as that of
+ *     any existing member of the pathkey list.
+ */
+static bool
+pathkey_is_unique(PathKey *new_pathkey, List *pathkeys)
+{
+	EquivalenceClass *new_ec = new_pathkey->pk_eclass;
+	ListCell   *lc;
+
+	/* If the same EC is already in the list, then not unique */
+	foreach(lc, pathkeys)
+	{
+		PathKey    *old_pathkey = (PathKey *) lfirst(lc);
+
+		if (new_ec == old_pathkey->pk_eclass)
+			return false;
+	}
+
+	return true;
+}
+
 /*
  * pathkey_is_redundant
  *	   Is a pathkey redundant with one already in the given list?
@@ -135,22 +159,12 @@ static bool
 pathkey_is_redundant(PathKey *new_pathkey, List *pathkeys)
 {
 	EquivalenceClass *new_ec = new_pathkey->pk_eclass;
-	ListCell   *lc;
 
 	/* Check for EC containing a constant --- unconditionally redundant */
 	if (EC_MUST_BE_REDUNDANT(new_ec))
 		return true;
 
-	/* If same EC already used in list, then redundant */
-	foreach(lc, pathkeys)
-	{
-		PathKey    *old_pathkey = (PathKey *) lfirst(lc);
-
-		if (new_ec == old_pathkey->pk_eclass)
-			return true;
-	}
-
-	return false;
+	return !pathkey_is_unique(new_pathkey, pathkeys);
 }
 
 /*
@@ -1098,6 +1112,41 @@ make_pathkeys_for_sortclauses(PlannerInfo *root,
 	return pathkeys;
 }
 
+/*
+ * make_pathkeys_for_uniquekeys
+ *		Generate a pathkeys list to be used for uniquekey clauses
+ */
+List *
+make_pathkeys_for_uniquekeys(PlannerInfo *root,
+							 List *sortclauses,
+							 List *tlist)
+{
+	List	   *pathkeys = NIL;
+	ListCell   *l;
+
+	foreach(l, sortclauses)
+	{
+		SortGroupClause *sortcl = (SortGroupClause *) lfirst(l);
+		Expr	   *sortkey;
+		PathKey    *pathkey;
+
+		sortkey = (Expr *) get_sortgroupclause_expr(sortcl, tlist);
+		Assert(OidIsValid(sortcl->sortop));
+		pathkey = make_pathkey_from_sortop(root,
+										   sortkey,
+										   root->nullable_baserels,
+										   sortcl->sortop,
+										   sortcl->nulls_first,
+										   sortcl->tleSortGroupRef,
+										   true);
+
+		if (pathkey_is_unique(pathkey, pathkeys))
+			pathkeys = lappend(pathkeys, pathkey);
+	}
+
+	return pathkeys;
+}
+
 /****************************************************************************
  *		PATHKEYS AND MERGECLAUSES
  ****************************************************************************/
diff --git a/src/backend/optimizer/path/uniquekey.c b/src/backend/optimizer/path/uniquekey.c
new file mode 100644
index 0000000000..13d4ebb98c
--- /dev/null
+++ b/src/backend/optimizer/path/uniquekey.c
@@ -0,0 +1,147 @@
+/*-------------------------------------------------------------------------
+ *
+ * uniquekey.c
+ *	  Utilities for matching and building unique keys
+ *
+ * Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ *	  src/backend/optimizer/path/uniquekey.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "optimizer/pathnode.h"
+#include "optimizer/paths.h"
+#include "nodes/pg_list.h"
+
+static UniqueKey *make_canonical_uniquekey(PlannerInfo *root, EquivalenceClass *eclass);
+
+/*
+ * Build a list of unique keys
+ */
+List*
+build_uniquekeys(PlannerInfo *root, List *sortclauses)
+{
+	List *result = NIL;
+	List *sortkeys;
+	ListCell *l;
+
+	sortkeys = make_pathkeys_for_uniquekeys(root,
+											sortclauses,
+											root->processed_tlist);
+
+	/* Create a uniquekey and add it to the list */
+	foreach(l, sortkeys)
+	{
+		PathKey    *pathkey = (PathKey *) lfirst(l);
+		EquivalenceClass *ec = pathkey->pk_eclass;
+		UniqueKey *unique_key = make_canonical_uniquekey(root, ec);
+
+		result = lappend(result, unique_key);
+	}
+
+	return result;
+}
+
+/*
+ * uniquekeys_contained_in
+ *	  Are the keys2 included in the keys1 superset
+ */
+bool
+uniquekeys_contained_in(List *keys1, List *keys2)
+{
+	ListCell   *key1,
+			   *key2;
+
+	/*
+	 * Fall out quickly if we are passed two identical lists.  This mostly
+	 * catches the case where both are NIL, but that's common enough to
+	 * warrant the test.
+	 */
+	if (keys1 == keys2)
+		return true;
+
+	foreach(key2, keys2)
+	{
+		bool found = false;
+		UniqueKey  *uniquekey2 = (UniqueKey *) lfirst(key2);
+
+		foreach(key1, keys1)
+		{
+			UniqueKey  *uniquekey1 = (UniqueKey *) lfirst(key1);
+
+			if (uniquekey1->eq_clause == uniquekey2->eq_clause)
+			{
+				found = true;
+				break;
+			}
+		}
+
+		if (!found)
+			return false;
+	}
+
+	return true;
+}
+
+/*
+ * has_useful_uniquekeys
+ *		Detect whether the planner could have any uniquekeys that are
+ *		useful.
+ */
+bool
+has_useful_uniquekeys(PlannerInfo *root)
+{
+	if (root->query_uniquekeys != NIL)
+		return true;	/* there are some */
+	return false;		/* definitely useless */
+}
+
+/*
+ * make_canonical_uniquekey
+ *	  Given the parameters for a UniqueKey, find any pre-existing matching
+ *	  uniquekey in the query's list of "canonical" uniquekeys.  Make a new
+ *	  entry if there's not one already.
+ *
+ * Note that this function must not be used until after we have completed
+ * merging EquivalenceClasses.  (We don't try to enforce that here; instead,
+ * equivclass.c will complain if a merge occurs after root->canon_uniquekeys
+ * has become nonempty.)
+ */
+static UniqueKey *
+make_canonical_uniquekey(PlannerInfo *root,
+						 EquivalenceClass *eclass)
+{
+	UniqueKey  *uk;
+	ListCell   *lc;
+	MemoryContext oldcontext;
+
+	/* The passed eclass might be non-canonical, so chase up to the top */
+	while (eclass->ec_merged)
+		eclass = eclass->ec_merged;
+
+	foreach(lc, root->canon_uniquekeys)
+	{
+		uk = (UniqueKey *) lfirst(lc);
+		if (eclass == uk->eq_clause)
+			return uk;
+	}
+
+	/*
+	 * Be sure canonical uniquekeys are allocated in the main planning context.
+	 * Not an issue in normal planning, but it is for GEQO.
+	 */
+	oldcontext = MemoryContextSwitchTo(root->planner_cxt);
+
+	uk = makeNode(UniqueKey);
+	uk->eq_clause = eclass;
+
+	root->canon_uniquekeys = lappend(root->canon_uniquekeys, uk);
+
+	MemoryContextSwitchTo(oldcontext);
+
+	return uk;
+}
diff --git a/src/backend/optimizer/plan/planagg.c b/src/backend/optimizer/plan/planagg.c
index 8634940efc..dd64775d8f 100644
--- a/src/backend/optimizer/plan/planagg.c
+++ b/src/backend/optimizer/plan/planagg.c
@@ -511,6 +511,7 @@ minmax_qp_callback(PlannerInfo *root, void *extra)
 									  root->parse->targetList);
 
 	root->query_pathkeys = root->sort_pathkeys;
+	root->query_uniquekeys = NIL;
 }
 
 /*
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 62dfc6d44a..3a372af91b 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -70,6 +70,7 @@ query_planner(PlannerInfo *root,
 	root->join_rel_level = NULL;
 	root->join_cur_level = 0;
 	root->canon_pathkeys = NIL;
+	root->canon_uniquekeys = NIL;
 	root->left_join_clauses = NIL;
 	root->right_join_clauses = NIL;
 	root->full_join_clauses = NIL;
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index d6f2153593..984fca0696 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3657,15 +3657,30 @@ standard_qp_callback(PlannerInfo *root, void *extra)
 	 * much easier, since we know that the parser ensured that one is a
 	 * superset of the other.
 	 */
+	root->query_uniquekeys = NIL;
+
 	if (root->group_pathkeys)
+	{
 		root->query_pathkeys = root->group_pathkeys;
+
+		if (!root->parse->hasAggs)
+			root->query_uniquekeys = build_uniquekeys(root, qp_extra->groupClause);
+	}
 	else if (root->window_pathkeys)
 		root->query_pathkeys = root->window_pathkeys;
 	else if (list_length(root->distinct_pathkeys) >
 			 list_length(root->sort_pathkeys))
+	{
 		root->query_pathkeys = root->distinct_pathkeys;
+		root->query_uniquekeys = build_uniquekeys(root, parse->distinctClause);
+	}
 	else if (root->sort_pathkeys)
+	{
 		root->query_pathkeys = root->sort_pathkeys;
+
+		if (root->distinct_pathkeys)
+			root->query_uniquekeys = build_uniquekeys(root, parse->distinctClause);
+	}
 	else
 		root->query_pathkeys = NIL;
 }
@@ -6222,7 +6237,7 @@ plan_cluster_use_sort(Oid tableOid, Oid indexOid)
 
 	/* Estimate the cost of index scan */
 	indexScanPath = create_index_path(root, indexInfo,
-									  NIL, NIL, NIL, NIL,
+									  NIL, NIL, NIL, NIL, NIL,
 									  ForwardScanDirection, false,
 									  NULL, 1.0, false);
 
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index e6d08aede5..a006dbbe9c 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -940,6 +940,7 @@ create_seqscan_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->parallel_safe = rel->consider_parallel;
 	pathnode->parallel_workers = parallel_workers;
 	pathnode->pathkeys = NIL;	/* seqscan has unordered result */
+	pathnode->uniquekeys = NIL;
 
 	cost_seqscan(pathnode, root, rel, pathnode->param_info);
 
@@ -964,6 +965,7 @@ create_samplescan_path(PlannerInfo *root, RelOptInfo *rel, Relids required_outer
 	pathnode->parallel_safe = rel->consider_parallel;
 	pathnode->parallel_workers = 0;
 	pathnode->pathkeys = NIL;	/* samplescan has unordered result */
+	pathnode->uniquekeys = NIL;
 
 	cost_samplescan(pathnode, root, rel, pathnode->param_info);
 
@@ -1000,6 +1002,7 @@ create_index_path(PlannerInfo *root,
 				  List *indexorderbys,
 				  List *indexorderbycols,
 				  List *pathkeys,
+				  List *uniquekeys,
 				  ScanDirection indexscandir,
 				  bool indexonly,
 				  Relids required_outer,
@@ -1018,6 +1021,7 @@ create_index_path(PlannerInfo *root,
 	pathnode->path.parallel_safe = rel->consider_parallel;
 	pathnode->path.parallel_workers = 0;
 	pathnode->path.pathkeys = pathkeys;
+	pathnode->path.uniquekeys = uniquekeys;
 
 	pathnode->indexinfo = index;
 	pathnode->indexclauses = indexclauses;
@@ -1061,6 +1065,7 @@ create_bitmap_heap_path(PlannerInfo *root,
 	pathnode->path.parallel_safe = rel->consider_parallel;
 	pathnode->path.parallel_workers = parallel_degree;
 	pathnode->path.pathkeys = NIL;	/* always unordered */
+	pathnode->path.uniquekeys = NIL;
 
 	pathnode->bitmapqual = bitmapqual;
 
@@ -1922,6 +1927,7 @@ create_functionscan_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->parallel_safe = rel->consider_parallel;
 	pathnode->parallel_workers = 0;
 	pathnode->pathkeys = pathkeys;
+	pathnode->uniquekeys = NIL;
 
 	cost_functionscan(pathnode, root, rel, pathnode->param_info);
 
@@ -1948,6 +1954,7 @@ create_tablefuncscan_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->parallel_safe = rel->consider_parallel;
 	pathnode->parallel_workers = 0;
 	pathnode->pathkeys = NIL;	/* result is always unordered */
+	pathnode->uniquekeys = NIL;
 
 	cost_tablefuncscan(pathnode, root, rel, pathnode->param_info);
 
@@ -1974,6 +1981,7 @@ create_valuesscan_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->parallel_safe = rel->consider_parallel;
 	pathnode->parallel_workers = 0;
 	pathnode->pathkeys = NIL;	/* result is always unordered */
+	pathnode->uniquekeys = NIL;
 
 	cost_valuesscan(pathnode, root, rel, pathnode->param_info);
 
@@ -1999,6 +2007,7 @@ create_ctescan_path(PlannerInfo *root, RelOptInfo *rel, Relids required_outer)
 	pathnode->parallel_safe = rel->consider_parallel;
 	pathnode->parallel_workers = 0;
 	pathnode->pathkeys = NIL;	/* XXX for now, result is always unordered */
+	pathnode->uniquekeys = NIL;
 
 	cost_ctescan(pathnode, root, rel, pathnode->param_info);
 
@@ -2025,6 +2034,7 @@ create_namedtuplestorescan_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->parallel_safe = rel->consider_parallel;
 	pathnode->parallel_workers = 0;
 	pathnode->pathkeys = NIL;	/* result is always unordered */
+	pathnode->uniquekeys = NIL;
 
 	cost_namedtuplestorescan(pathnode, root, rel, pathnode->param_info);
 
@@ -2051,6 +2061,7 @@ create_resultscan_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->parallel_safe = rel->consider_parallel;
 	pathnode->parallel_workers = 0;
 	pathnode->pathkeys = NIL;	/* result is always unordered */
+	pathnode->uniquekeys = NIL;
 
 	cost_resultscan(pathnode, root, rel, pathnode->param_info);
 
@@ -2077,6 +2088,7 @@ create_worktablescan_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->parallel_safe = rel->consider_parallel;
 	pathnode->parallel_workers = 0;
 	pathnode->pathkeys = NIL;	/* result is always unordered */
+	pathnode->uniquekeys = NIL;
 
 	/* Cost is the same as for a regular CTE scan */
 	cost_ctescan(pathnode, root, rel, pathnode->param_info);
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index baced7eec0..a1511b46ea 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -261,6 +261,7 @@ typedef enum NodeTag
 	T_EquivalenceMember,
 	T_PathKey,
 	T_PathTarget,
+	T_UniqueKey,
 	T_RestrictInfo,
 	T_IndexClause,
 	T_PlaceHolderVar,
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 3d3be197e0..4e329f0fb5 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -269,6 +269,8 @@ struct PlannerInfo
 
 	List	   *canon_pathkeys; /* list of "canonical" PathKeys */
 
+	List	   *canon_uniquekeys; /* list of "canonical" UniqueKeys */
+
 	List	   *left_join_clauses;	/* list of RestrictInfos for mergejoinable
 									 * outer join clauses w/nonnullable var on
 									 * left */
@@ -297,6 +299,8 @@ struct PlannerInfo
 
 	List	   *query_pathkeys; /* desired pathkeys for query_planner() */
 
+	List	   *query_uniquekeys; /* unique keys used for the query */
+
 	List	   *group_pathkeys; /* groupClause pathkeys, if any */
 	List	   *window_pathkeys;	/* pathkeys of bottom window, if any */
 	List	   *distinct_pathkeys;	/* distinctClause pathkeys, if any */
@@ -1077,6 +1081,15 @@ typedef struct ParamPathInfo
 	List	   *ppi_clauses;	/* join clauses available from outer rels */
 } ParamPathInfo;
 
+/*
+ * UniqueKey
+ */
+typedef struct UniqueKey
+{
+	NodeTag		type;
+
+	EquivalenceClass *eq_clause;	/* equivalence class */
+} UniqueKey;
 
 /*
  * Type "Path" is used as-is for sequential-scan paths, as well as some other
@@ -1106,6 +1119,9 @@ typedef struct ParamPathInfo
  *
  * "pathkeys" is a List of PathKey nodes (see above), describing the sort
  * ordering of the path's output rows.
+ *
+ * "uniquekeys", if not NIL, is a list of UniqueKey nodes (see above),
+ * describing the XXX.
  */
 typedef struct Path
 {
@@ -1129,6 +1145,8 @@ typedef struct Path
 
 	List	   *pathkeys;		/* sort ordering of path's output */
 	/* pathkeys is a List of PathKey nodes; see above */
+
+	List	   *uniquekeys;	/* the unique keys, or NIL if none */
 } Path;
 
 /* Macro for extracting a path's parameterization relids; beware double eval */
diff --git a/src/include/nodes/print.h b/src/include/nodes/print.h
index 6126b491bf..006248bfb5 100644
--- a/src/include/nodes/print.h
+++ b/src/include/nodes/print.h
@@ -28,6 +28,7 @@ extern char *pretty_format_node_dump(const char *dump);
 extern void print_rt(const List *rtable);
 extern void print_expr(const Node *expr, const List *rtable);
 extern void print_pathkeys(const List *pathkeys, const List *rtable);
+extern void print_uniquekeys(const List *uniquekeys, const List *rtable);
 extern void print_tl(const List *tlist, const List *rtable);
 extern void print_slot(TupleTableSlot *slot);
 
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index e450fe112a..f75ff6f323 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -44,6 +44,7 @@ extern IndexPath *create_index_path(PlannerInfo *root,
 									List *indexorderbys,
 									List *indexorderbycols,
 									List *pathkeys,
+									List *uniquekeys,
 									ScanDirection indexscandir,
 									bool indexonly,
 									Relids required_outer,
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 9ab73bd20c..5b6be383b3 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -214,6 +214,9 @@ extern List *build_join_pathkeys(PlannerInfo *root,
 extern List *make_pathkeys_for_sortclauses(PlannerInfo *root,
 										   List *sortclauses,
 										   List *tlist);
+extern List *make_pathkeys_for_uniquekeys(PlannerInfo *root,
+										  List *sortclauses,
+										  List *tlist);
 extern void initialize_mergeclause_eclasses(PlannerInfo *root,
 											RestrictInfo *restrictinfo);
 extern void update_mergeclause_eclasses(PlannerInfo *root,
@@ -240,4 +243,12 @@ extern PathKey *make_canonical_pathkey(PlannerInfo *root,
 extern void add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
 									List *live_childrels);
 
+/*
+ * uniquekey.c
+ *	  Utilities for matching and building unique keys
+ */
+extern List *build_uniquekeys(PlannerInfo *root, List *sortclauses);
+extern bool uniquekeys_contained_in(List *keys1, List *keys2);
+extern bool has_useful_uniquekeys(PlannerInfo *root);
+
 #endif							/* PATHS_H */
-- 
2.21.1

Attachment: v31-0002-Index-skip-scan.patch (text/x-patch; charset=UTF-8)

From 249539861efbf9fc15502abb27fce2396f1956e0 Mon Sep 17 00:00:00 2001
From: jesperpedersen <jesper.pedersen@redhat.com>
Date: Mon, 20 Jan 2020 08:01:55 -0500
Subject: [PATCH 2/2] Index skip scan

Implementation of Index Skip Scan (see Loose Index Scan in the wiki [1])
on top of IndexOnlyScan and IndexScan. To make it suitable both when
there is a small number of distinct values and when there is a
significant number of them, the following approach is taken: instead of
searching from the root for every value, we first search on the current
page, and only if the value is not found there do we continue searching
from the root.

Original patch and design were proposed by Thomas Munro [2], revived and
improved by Dmitry Dolgov and Jesper Pedersen.

[1] https://wiki.postgresql.org/wiki/Loose_indexscan
[2] https://www.postgresql.org/message-id/flat/CADLWmXXbTSBxP-MzJuPAYSsL_2f0iPm5VWPbCvDbVvfX93FKkw%40mail.gmail.com

Author: Jesper Pedersen, Dmitry Dolgov
Reviewed-by: Thomas Munro, David Rowley, Floris Van Nee, Kyotaro Horiguchi, Tomas Vondra, Peter Geoghegan
---
 contrib/bloom/blutils.c                       |   1 +
 doc/src/sgml/config.sgml                      |  15 +
 doc/src/sgml/indexam.sgml                     |  63 +++
 doc/src/sgml/indices.sgml                     |  23 +
 src/backend/access/brin/brin.c                |   1 +
 src/backend/access/gin/ginutil.c              |   1 +
 src/backend/access/gist/gist.c                |   1 +
 src/backend/access/hash/hash.c                |   1 +
 src/backend/access/index/indexam.c            |  18 +
 src/backend/access/nbtree/nbtree.c            |  13 +
 src/backend/access/nbtree/nbtsearch.c         | 366 +++++++++++++
 src/backend/access/spgist/spgutils.c          |   1 +
 src/backend/commands/explain.c                |  25 +
 src/backend/executor/nodeIndexonlyscan.c      |  51 +-
 src/backend/executor/nodeIndexscan.c          |  56 +-
 src/backend/nodes/copyfuncs.c                 |   2 +
 src/backend/nodes/outfuncs.c                  |   2 +
 src/backend/nodes/readfuncs.c                 |   2 +
 src/backend/optimizer/path/costsize.c         |   1 +
 src/backend/optimizer/plan/createplan.c       |  20 +-
 src/backend/optimizer/plan/planner.c          |  76 +++
 src/backend/optimizer/util/pathnode.c         |  40 ++
 src/backend/optimizer/util/plancat.c          |   1 +
 src/backend/utils/misc/guc.c                  |   9 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/access/amapi.h                    |   8 +
 src/include/access/genam.h                    |   2 +
 src/include/access/nbtree.h                   |   7 +
 src/include/access/sdir.h                     |   7 +
 src/include/nodes/execnodes.h                 |   6 +
 src/include/nodes/pathnodes.h                 |   5 +
 src/include/nodes/plannodes.h                 |   4 +
 src/include/optimizer/cost.h                  |   1 +
 src/include/optimizer/pathnode.h              |   5 +
 src/test/regress/expected/select_distinct.out | 505 ++++++++++++++++++
 src/test/regress/expected/sysviews.out        |   3 +-
 src/test/regress/sql/select_distinct.sql      | 186 +++++++
 37 files changed, 1518 insertions(+), 11 deletions(-)
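
The "current page first, then from the root" approach described in the commit
message can be sketched outside of nbtree. The following is only a rough
model, not the patch's code: it assumes a sorted int array split into
fixed-size "pages", and the names upper_bound, skip_to_next_key and
count_distinct are hypothetical helpers invented for this illustration. A
binary search over the rest of the array stands in for a descent from the
root.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: a sorted array split into fixed-size "pages". */

/* First index in [lo, hi) whose key is greater than 'key'. */
static size_t
upper_bound(const int *keys, size_t lo, size_t hi, int key)
{
	while (lo < hi)
	{
		size_t		mid = lo + (hi - lo) / 2;

		if (keys[mid] <= key)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}

/*
 * Return the index of the first key greater than keys[pos], or n if there is
 * none.  *root_searches counts how often the current page was not enough.
 */
static size_t
skip_to_next_key(const int *keys, size_t n, size_t pos, size_t page_size,
				 size_t *root_searches)
{
	size_t		page_end;

	assert(pos < n && page_size > 0);

	page_end = (pos / page_size + 1) * page_size;
	if (page_end > n)
		page_end = n;

	/* Cheap path: the last key on this page is already larger. */
	if (keys[page_end - 1] > keys[pos])
		return upper_bound(keys, pos + 1, page_end, keys[pos]);

	/* Slow path: stands in for a new descent from the root. */
	(*root_searches)++;
	return upper_bound(keys, page_end, n, keys[pos]);
}

/* Count distinct keys by repeatedly skipping, as a DISTINCT scan would. */
static size_t
count_distinct(const int *keys, size_t n, size_t page_size,
			   size_t *root_searches)
{
	size_t		count = 0;
	size_t		pos = 0;

	*root_searches = 0;
	while (pos < n)
	{
		count++;
		pos = skip_to_next_key(keys, n, pos, page_size, root_searches);
	}
	return count;
}
```

Calling count_distinct on a small sorted array with different page sizes
shows the trade-off: with larger pages, more skips are resolved on the
current page and fewer fall back to the full search.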

diff --git a/contrib/bloom/blutils.c b/contrib/bloom/blutils.c
index 0104d02f67..a018b7f3d0 100644
--- a/contrib/bloom/blutils.c
+++ b/contrib/bloom/blutils.c
@@ -133,6 +133,7 @@ blhandler(PG_FUNCTION_ARGS)
 	amroutine->ambulkdelete = blbulkdelete;
 	amroutine->amvacuumcleanup = blvacuumcleanup;
 	amroutine->amcanreturn = NULL;
+	amroutine->amskip = NULL;
 	amroutine->amcostestimate = blcostestimate;
 	amroutine->amoptions = bloptions;
 	amroutine->amproperty = NULL;
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 3ccacd528b..f99e702364 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4517,6 +4517,21 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-enable-indexskipscan" xreflabel="enable_indexskipscan">
+      <term><varname>enable_indexskipscan</varname> (<type>boolean</type>)
+      <indexterm>
+       <primary><varname>enable_indexskipscan</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Enables or disables the query planner's use of index-skip-scan plan
+        types (see <xref linkend="indexes-index-skip-scans"/>). The default is
+        <literal>on</literal>.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-enable-material" xreflabel="enable_material">
       <term><varname>enable_material</varname> (<type>boolean</type>)
       <indexterm>
diff --git a/doc/src/sgml/indexam.sgml b/doc/src/sgml/indexam.sgml
index 37f8d8760a..a726d80878 100644
--- a/doc/src/sgml/indexam.sgml
+++ b/doc/src/sgml/indexam.sgml
@@ -148,6 +148,7 @@ typedef struct IndexAmRoutine
     amendscan_function amendscan;
     ammarkpos_function ammarkpos;       /* can be NULL */
     amrestrpos_function amrestrpos;     /* can be NULL */
+    amskip_function amskip;             /* can be NULL */
 
     /* interface functions to support parallel index scans */
     amestimateparallelscan_function amestimateparallelscan;    /* can be NULL */
@@ -691,6 +692,68 @@ amrestrpos (IndexScanDesc scan);
 
   <para>
 <programlisting>
+bool
+amskip (IndexScanDesc scan,
+        ScanDirection direction,
+        ScanDirection indexdir,
+        bool scanstart,
+        int prefix);
+</programlisting>
+  Skip past all tuples where the first 'prefix' columns have the same value as
+  the last tuple returned in the current scan. The arguments are:
+
+   <variablelist>
+    <varlistentry>
+     <term><parameter>scan</parameter></term>
+     <listitem>
+      <para>
+       Index scan information
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><parameter>direction</parameter></term>
+     <listitem>
+      <para>
+       The direction in which data is advancing.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><parameter>indexdir</parameter></term>
+     <listitem>
+      <para>
+        The index direction, in which data must be read.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><parameter>scanstart</parameter></term>
+     <listitem>
+      <para>
+        Whether this is the start of the scan.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><parameter>prefix</parameter></term>
+     <listitem>
+      <para>
+        Distinct prefix size.
+      </para>
+     </listitem>
+    </varlistentry>
+
+   </variablelist>
+
+  </para>
+
+  <para>
+<programlisting>
 Size
 amestimateparallelscan (void);
 </programlisting>
diff --git a/doc/src/sgml/indices.sgml b/doc/src/sgml/indices.sgml
index c54bf0dbbd..c429d98fc7 100644
--- a/doc/src/sgml/indices.sgml
+++ b/doc/src/sgml/indices.sgml
@@ -1254,6 +1254,29 @@ SELECT target FROM tests WHERE subject = 'some-subject' AND success;
    and later will recognize such cases and allow index-only scans to be
    generated, but older versions will not.
   </para>
+
+  <sect2 id="indexes-index-skip-scans">
+    <title>Index Skip Scans</title>
+
+    <indexterm zone="indexes-index-skip-scans">
+      <primary>index</primary>
+      <secondary>index-skip scans</secondary>
+    </indexterm>
+    <indexterm zone="indexes-index-skip-scans">
+      <primary>index-skip scan</primary>
+    </indexterm>
+
+    <para>
+     When the rows retrieved from an index scan are then deduplicated by
+     eliminating rows matching on a prefix of index keys (e.g. when using
+     <literal>SELECT DISTINCT</literal>), the planner will consider
+     skipping groups of rows with a matching key prefix. When a row with
+     a particular prefix is found, remaining rows with the same key prefix
+     are skipped.  The larger the number of rows with the same key prefix
+     (i.e. the lower the number of distinct key prefixes in the index),
+     the more efficient this is.
+    </para>
+  </sect2>
  </sect1>
 
 
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 2e8f67ef10..4db31bb211 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -113,6 +113,7 @@ brinhandler(PG_FUNCTION_ARGS)
 	amroutine->ambulkdelete = brinbulkdelete;
 	amroutine->amvacuumcleanup = brinvacuumcleanup;
 	amroutine->amcanreturn = NULL;
+	amroutine->amskip = NULL;
 	amroutine->amcostestimate = brincostestimate;
 	amroutine->amoptions = brinoptions;
 	amroutine->amproperty = NULL;
diff --git a/src/backend/access/gin/ginutil.c b/src/backend/access/gin/ginutil.c
index a7e55caf28..8dd1d30d2a 100644
--- a/src/backend/access/gin/ginutil.c
+++ b/src/backend/access/gin/ginutil.c
@@ -65,6 +65,7 @@ ginhandler(PG_FUNCTION_ARGS)
 	amroutine->ambulkdelete = ginbulkdelete;
 	amroutine->amvacuumcleanup = ginvacuumcleanup;
 	amroutine->amcanreturn = NULL;
+	amroutine->amskip = NULL;
 	amroutine->amcostestimate = gincostestimate;
 	amroutine->amoptions = ginoptions;
 	amroutine->amproperty = NULL;
diff --git a/src/backend/access/gist/gist.c b/src/backend/access/gist/gist.c
index aefc302ed2..8c692f7fb4 100644
--- a/src/backend/access/gist/gist.c
+++ b/src/backend/access/gist/gist.c
@@ -86,6 +86,7 @@ gisthandler(PG_FUNCTION_ARGS)
 	amroutine->ambulkdelete = gistbulkdelete;
 	amroutine->amvacuumcleanup = gistvacuumcleanup;
 	amroutine->amcanreturn = gistcanreturn;
+	amroutine->amskip = NULL;
 	amroutine->amcostestimate = gistcostestimate;
 	amroutine->amoptions = gistoptions;
 	amroutine->amproperty = gistproperty;
diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c
index 4871b7ff4d..e5fa4c7864 100644
--- a/src/backend/access/hash/hash.c
+++ b/src/backend/access/hash/hash.c
@@ -83,6 +83,7 @@ hashhandler(PG_FUNCTION_ARGS)
 	amroutine->ambulkdelete = hashbulkdelete;
 	amroutine->amvacuumcleanup = hashvacuumcleanup;
 	amroutine->amcanreturn = NULL;
+	amroutine->amskip = NULL;
 	amroutine->amcostestimate = hashcostestimate;
 	amroutine->amoptions = hashoptions;
 	amroutine->amproperty = NULL;
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 01539b6bd6..1047a35ade 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -33,6 +33,7 @@
  *		index_can_return	- does index support index-only scans?
  *		index_getprocid - get a support procedure OID
  *		index_getprocinfo - get a support procedure's lookup info
+ *		index_skip		- advance past duplicate key values in a scan
  *
  * NOTES
  *		This file contains the index_ routines which used
@@ -730,6 +731,23 @@ index_can_return(Relation indexRelation, int attno)
 	return indexRelation->rd_indam->amcanreturn(indexRelation, attno);
 }
 
+/* ----------------
+ *		index_skip
+ *
+ *		Skip past all tuples where the first 'prefix' columns have the
+ *		same value as the last tuple returned in the current scan.
+ * ----------------
+ */
+bool
+index_skip(IndexScanDesc scan, ScanDirection direction,
+		   ScanDirection indexdir, bool scanstart, int prefix)
+{
+	SCAN_CHECKS;
+
+	return scan->indexRelation->rd_indam->amskip(scan, direction,
+												 indexdir, scanstart, prefix);
+}
+
 /* ----------------
  *		index_getprocid
  *
diff --git a/src/backend/access/nbtree/nbtree.c b/src/backend/access/nbtree/nbtree.c
index 5254bc7ef5..8fde56fe60 100644
--- a/src/backend/access/nbtree/nbtree.c
+++ b/src/backend/access/nbtree/nbtree.c
@@ -132,6 +132,7 @@ bthandler(PG_FUNCTION_ARGS)
 	amroutine->ambulkdelete = btbulkdelete;
 	amroutine->amvacuumcleanup = btvacuumcleanup;
 	amroutine->amcanreturn = btcanreturn;
+	amroutine->amskip = btskip;
 	amroutine->amcostestimate = btcostestimate;
 	amroutine->amoptions = btoptions;
 	amroutine->amproperty = btproperty;
@@ -381,6 +382,8 @@ btbeginscan(Relation rel, int nkeys, int norderbys)
 	 */
 	so->currTuples = so->markTuples = NULL;
 
+	so->skipScanKey = NULL;
+
 	scan->xs_itupdesc = RelationGetDescr(rel);
 
 	scan->opaque = so;
@@ -448,6 +451,16 @@ btrescan(IndexScanDesc scan, ScanKey scankey, int nscankeys,
 	_bt_preprocess_array_keys(scan);
 }
 
+/*
+ * btskip() -- skip to the beginning of the next key prefix
+ */
+bool
+btskip(IndexScanDesc scan, ScanDirection direction,
+	   ScanDirection indexdir, bool start, int prefix)
+{
+	return _bt_skip(scan, direction, indexdir, start, prefix);
+}
+
 /*
  *	btendscan() -- close down a scan
  */
diff --git a/src/backend/access/nbtree/nbtsearch.c b/src/backend/access/nbtree/nbtsearch.c
index c573814f01..4189730f3a 100644
--- a/src/backend/access/nbtree/nbtsearch.c
+++ b/src/backend/access/nbtree/nbtsearch.c
@@ -37,6 +37,10 @@ static bool _bt_parallel_readpage(IndexScanDesc scan, BlockNumber blkno,
 static Buffer _bt_walk_left(Relation rel, Buffer buf, Snapshot snapshot);
 static bool _bt_endpoint(IndexScanDesc scan, ScanDirection dir);
 static inline void _bt_initialize_more_data(BTScanOpaque so, ScanDirection dir);
+static inline void _bt_update_skip_scankeys(IndexScanDesc scan,
+											Relation indexRel);
+static inline bool _bt_scankey_within_page(IndexScanDesc scan, BTScanInsert key,
+										Buffer buf, ScanDirection dir);
 
 
 /*
@@ -1375,6 +1379,312 @@ _bt_next(IndexScanDesc scan, ScanDirection dir)
 	return true;
 }
 
+/*
+ *  _bt_skip() -- Skip items that have the same prefix as the most recently
+ * 				  fetched index tuple.
+ *
+ * 		The current position is set so that a subsequent call to _bt_next will
+ * 		fetch the first tuple that differs in the leading 'prefix' keys.
+ *
+ * 		There are four different kinds of skipping (depending on dir and
+ * 		indexdir) that are important to distinguish, especially in the
+ * 		presence of an index condition:
+ *
+ * 		* Advancing forward and reading forward
+ * 			simple scan
+ *
+ * 		* Advancing forward and reading backward
+ * 			scan inside a cursor fetching backward, when skipping is necessary
+ * 			right from the start
+ *
+ * 		* Advancing backward and reading forward
+ * 			scan with order by desc inside a cursor fetching forward, when
+ * 			skipping is necessary right from the start
+ *
+ * 		* Advancing backward and reading backward
+ * 			simple scan with order by desc
+ *
+ *      The current page is searched for the next unique value. If none is found
+ *      we will do a scan from the root in order to find the next page with
+ *      a unique value.
+ */
+bool
+_bt_skip(IndexScanDesc scan, ScanDirection dir,
+		 ScanDirection indexdir, bool scanstart, int prefix)
+{
+	BTScanOpaque so = (BTScanOpaque) scan->opaque;
+	BTStack stack;
+	Buffer buf;
+	OffsetNumber offnum;
+	BTScanPosItem *currItem;
+	Relation 	 indexRel = scan->indexRelation;
+
+	/* We want to return tuples, and we need a starting point */
+	Assert(scan->xs_want_itup);
+	Assert(scan->xs_itup);
+
+	/* If skipScanKey is NULL then we initialize it with _bt_mkscankey */
+	if (so->skipScanKey == NULL)
+	{
+		so->skipScanKey = _bt_mkscankey(indexRel, scan->xs_itup);
+		so->skipScanKey->keysz = prefix;
+		so->skipScanKey->scantid = NULL;
+	}
+	_bt_update_skip_scankeys(scan, indexRel);
+
+	/* Check if the next unique key can be found within the current page */
+	if (BTScanPosIsValid(so->currPos) &&
+		_bt_scankey_within_page(scan, so->skipScanKey, so->currPos.buf, dir))
+	{
+		bool keyFound = false;
+
+		LockBuffer(so->currPos.buf, BT_READ);
+		offnum = _bt_binsrch(scan->indexRelation, so->skipScanKey, so->currPos.buf);
+
+		/* Lock the page for SERIALIZABLE transactions */
+		PredicateLockPage(scan->indexRelation, BufferGetBlockNumber(so->currPos.buf),
+						  scan->xs_snapshot);
+
+		/* We know in which direction to look */
+		_bt_initialize_more_data(so, dir);
+
+		/* Now read the data */
+		keyFound = _bt_readpage(scan, dir, offnum);
+
+		LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+		if (keyFound)
+		{
+			/* set IndexTuple */
+			currItem = &so->currPos.items[so->currPos.itemIndex];
+			scan->xs_heaptid = currItem->heapTid;
+			scan->xs_itup = (IndexTuple) (so->currTuples + currItem->tupleOffset);
+			return true;
+		}
+	}
+
+	if (BTScanPosIsValid(so->currPos))
+	{
+		ReleaseBuffer(so->currPos.buf);
+		so->currPos.buf = InvalidBuffer;
+	}
+
+	/*
+	 * We haven't found scan key within the current page, so let's scan from
+	 * the root. Use _bt_search and _bt_binsrch to get the buffer and offset
+	 * number
+	 */
+	so->skipScanKey->nextkey = ScanDirectionIsForward(dir);
+	stack = _bt_search(scan->indexRelation, so->skipScanKey,
+					   &buf, BT_READ, scan->xs_snapshot);
+	_bt_freestack(stack);
+	so->currPos.buf = buf;
+	offnum = _bt_binsrch(scan->indexRelation, so->skipScanKey, buf);
+
+	/* Lock the page for SERIALIZABLE transactions */
+	PredicateLockPage(scan->indexRelation, BufferGetBlockNumber(buf),
+					  scan->xs_snapshot);
+
+	/* We know in which direction to look */
+	_bt_initialize_more_data(so, dir);
+
+	/*
+	 * Simplest case is when both directions are forward, when we are already
+	 * at the next distinct key at the beginning of the series (so everything
+	 * else would be done in _bt_readpage)
+	 *
+	 * The case when both directions are backwards is also simple, but we need
+	 * to go one step back, since we need a last element from the previous
+	 * series.
+	 */
+	if (ScanDirectionIsBackward(dir) && ScanDirectionIsBackward(indexdir))
+		 offnum = OffsetNumberPrev(offnum);
+
+	/*
+	 * Advance backward but read forward. At this moment we are at the next
+	 * distinct key at the beginning of the series. If the scan has just
+	 * started, we can read forward without doing anything else. Otherwise
+	 * find the previous distinct key and the beginning of its series and read
+	 * forward from there. To do so, go back one step, perform binary search
+	 * to find the first item in the series and let _bt_readpage do everything
+	 * else.
+	 */
+	else if (ScanDirectionIsBackward(dir) && ScanDirectionIsForward(indexdir))
+	{
+		if (!scanstart)
+		{
+			offnum = _bt_binsrch(scan->indexRelation, so->skipScanKey, buf);
+
+			/* One step back to find a previous value */
+			_bt_readpage(scan, dir, offnum);
+
+			LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+			if (_bt_next(scan, dir))
+			{
+				_bt_update_skip_scankeys(scan, indexRel);
+
+				/*
+				 * Now find the last item in the sequence for the current
+				 * value, with the intention of doing OffsetNumberNext. As a
+				 * result we end up on the first element of the sequence.
+				 */
+				if (_bt_scankey_within_page(scan, so->skipScanKey,
+											so->currPos.buf, dir))
+					offnum = _bt_binsrch(scan->indexRelation, so->skipScanKey, buf);
+				else
+				{
+					ReleaseBuffer(so->currPos.buf);
+					so->currPos.buf = InvalidBuffer;
+
+					stack = _bt_search(scan->indexRelation, so->skipScanKey,
+									   &buf, BT_READ, scan->xs_snapshot);
+					_bt_freestack(stack);
+					so->currPos.buf = buf;
+					offnum = _bt_binsrch(scan->indexRelation, so->skipScanKey, buf);
+				}
+			}
+			else
+			{
+				pfree(so->skipScanKey);
+				so->skipScanKey = NULL;
+				return false;
+			}
+		}
+	}
+
+	/*
+	 * Advance forward but read backward. At this moment we are at the next
+	 * distinct key at the beginning of the series. If the scan has just
+	 * started, we can go one step back and read forward without doing
+	 * anything else. Otherwise find the next distinct key and the beginning
+	 * of its series, go one step back and read backward from there.
+	 *
+	 * An interesting situation can happen if one of the distinct keys does
+	 * not pass the corresponding index condition at all. In this case reading
+	 * backward can lead to a previous distinct key being found, creating a
+	 * loop. To avoid that, check the value to be returned, and jump one more
+	 * time if it's the same as at the beginning.
+	 */
+	else if (ScanDirectionIsForward(dir) && ScanDirectionIsBackward(indexdir))
+	{
+		if (scanstart)
+			offnum = OffsetNumberPrev(offnum);
+		else
+		{
+			OffsetNumber nextOffset,
+						startOffset;
+
+			nextOffset = startOffset = ItemPointerGetOffsetNumber(&scan->xs_itup->t_tid);
+
+			while (nextOffset == startOffset)
+			{
+				IndexTuple itup;
+
+				/*
+				 * Find a next index tuple to update scan key. It could be at
+				 * the end, so check for max offset
+				 */
+				OffsetNumber curOffnum = offnum;
+				Page page = BufferGetPage(so->currPos.buf);
+				OffsetNumber maxoff = PageGetMaxOffsetNumber(page);
+				ItemId itemid = PageGetItemId(page, Min(offnum, maxoff));
+
+				scan->xs_itup = (IndexTuple) PageGetItem(page, itemid);
+				so->skipScanKey->nextkey = ScanDirectionIsForward(dir);
+
+				_bt_update_skip_scankeys(scan, indexRel);
+				LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+
+				if (BTScanPosIsValid(so->currPos))
+				{
+					ReleaseBuffer(so->currPos.buf);
+					so->currPos.buf = InvalidBuffer;
+				}
+
+				stack = _bt_search(scan->indexRelation, so->skipScanKey,
+								   &buf, BT_READ, scan->xs_snapshot);
+				_bt_freestack(stack);
+				so->currPos.buf = buf;
+				offnum = _bt_binsrch(scan->indexRelation, so->skipScanKey, buf);
+
+				/*
+				 * Jump to the next key returned the same offset, which means
+				 * we are at the end and need to return
+				 */
+				if (offnum == curOffnum)
+				{
+					LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+
+					BTScanPosUnpinIfPinned(so->currPos);
+					BTScanPosInvalidate(so->currPos);
+
+					pfree(so->skipScanKey);
+					so->skipScanKey = NULL;
+					return false;
+				}
+
+				offnum = OffsetNumberPrev(offnum);
+
+				/* Check if _bt_readpage returns already found item */
+				if (!_bt_readpage(scan, indexdir, offnum))
+				{
+					/*
+					 * There's no actually-matching data on this page.  Try to
+					 * advance to the next page. Return false if there's no
+					 * matching data at all.
+					 */
+					LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+					if (!_bt_steppage(scan, dir))
+					{
+						pfree(so->skipScanKey);
+						so->skipScanKey = NULL;
+						return false;
+					}
+				}
+
+				currItem = &so->currPos.items[so->currPos.lastItem];
+				itup = (IndexTuple) (so->currTuples + currItem->tupleOffset);
+				nextOffset = ItemPointerGetOffsetNumber(&itup->t_tid);
+
+				/*
+				 * If the nextOffset is the same as before, it means we are in
+				 * a loop; return offnum to the original position and jump
+				 * further
+				 */
+				if (nextOffset == startOffset)
+					offnum = OffsetNumberNext(offnum);
+			}
+		}
+	}
+
+	/* Now read the data */
+	if (!_bt_readpage(scan, indexdir, offnum))
+	{
+		/*
+		 * There's no actually-matching data on this page.  Try to advance to
+		 * the next page.  Return false if there's no matching data at all.
+		 */
+		LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+		if (!_bt_steppage(scan, dir))
+		{
+			pfree(so->skipScanKey);
+			so->skipScanKey = NULL;
+			return false;
+		}
+	}
+	else
+	{
+		/* Drop the lock, and maybe the pin, on the current page */
+		LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+	}
+
+	/* And set IndexTuple */
+	currItem = &so->currPos.items[so->currPos.itemIndex];
+	scan->xs_heaptid = currItem->heapTid;
+	scan->xs_itup = (IndexTuple) (so->currTuples + currItem->tupleOffset);
+
+	return true;
+}
+
 /*
  *	_bt_readpage() -- Load data from current index page into so->currPos
  *
@@ -2246,3 +2556,59 @@ _bt_initialize_more_data(BTScanOpaque so, ScanDirection dir)
 	so->numKilled = 0;			/* just paranoia */
 	so->markItemIndex = -1;		/* ditto */
 }
+
+/*
+ * _bt_update_skip_scankeys() -- set up new values for the existing scankeys
+ * 								 based on the current index tuple
+ */
+static inline void
+_bt_update_skip_scankeys(IndexScanDesc scan, Relation indexRel)
+{
+	TupleDesc		itupdesc;
+	int			indnkeyatts,
+				i;
+	BTScanOpaque 	so = (BTScanOpaque) scan->opaque;
+	ScanKey			scankeys = so->skipScanKey->scankeys;
+
+	itupdesc = RelationGetDescr(indexRel);
+	indnkeyatts = IndexRelationGetNumberOfKeyAttributes(indexRel);
+	for (i = 0; i < indnkeyatts; i++)
+	{
+		Datum datum;
+		bool null;
+		int flags;
+
+		datum = index_getattr(scan->xs_itup, i + 1, itupdesc, &null);
+		flags = (null ? SK_ISNULL : 0) |
+				(indexRel->rd_indoption[i] << SK_BT_INDOPTION_SHIFT);
+		scankeys[i].sk_flags = flags;
+		scankeys[i].sk_argument = datum;
+	}
+}
+
+/*
+ * _bt_scankey_within_page() -- check if the provided scankey could be found
+ * 								within a page, specified by the buffer.
+ */
+static inline bool
+_bt_scankey_within_page(IndexScanDesc scan, BTScanInsert key,
+						Buffer buf, ScanDirection dir)
+{
+	OffsetNumber low,
+				high,
+				compare_offset;
+	Page page = BufferGetPage(buf);
+	BTPageOpaque opaque = (BTPageOpaque) PageGetSpecialPointer(page);
+	int 		 compare_value = ScanDirectionIsForward(dir) ? 0 : 1;
+
+	low = P_FIRSTDATAKEY(opaque);
+	high = PageGetMaxOffsetNumber(page);
+
+	if (unlikely(high < low))
+		return false;
+
+	compare_offset = ScanDirectionIsForward(dir) ? high : low;
+
+	return _bt_compare(scan->indexRelation,
+					   key, page, compare_offset) > compare_value;
+}
diff --git a/src/backend/access/spgist/spgutils.c b/src/backend/access/spgist/spgutils.c
index 4924ae1c59..fa09a4685e 100644
--- a/src/backend/access/spgist/spgutils.c
+++ b/src/backend/access/spgist/spgutils.c
@@ -68,6 +68,7 @@ spghandler(PG_FUNCTION_ARGS)
 	amroutine->ambulkdelete = spgbulkdelete;
 	amroutine->amvacuumcleanup = spgvacuumcleanup;
 	amroutine->amcanreturn = spgcanreturn;
+	amroutine->amskip = NULL;
 	amroutine->amcostestimate = spgcostestimate;
 	amroutine->amoptions = spgoptions;
 	amroutine->amproperty = spgproperty;
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index d189b8d573..60cd801247 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -130,6 +130,7 @@ static void ExplainDummyGroup(const char *objtype, const char *labelname,
 static void ExplainXMLTag(const char *tagname, int flags, ExplainState *es);
 static void ExplainJSONLineEnding(ExplainState *es);
 static void ExplainYAMLLineStarting(ExplainState *es);
+static void ExplainIndexSkipScanKeys(int skipPrefixSize, ExplainState *es);
 static void escape_yaml(StringInfo buf, const char *str);
 
 
@@ -1058,6 +1059,22 @@ ExplainPreScanNode(PlanState *planstate, Bitmapset **rels_used)
 	return planstate_tree_walker(planstate, ExplainPreScanNode, rels_used);
 }
 
+/*
+ * ExplainIndexSkipScanKeys -
+ *	  Append information about index skip scan to es->str.
+ *
+ * Can be used to print the skip prefix size.
+ */
+static void
+ExplainIndexSkipScanKeys(int skipPrefixSize, ExplainState *es)
+{
+	if (skipPrefixSize > 0)
+	{
+		if (es->format != EXPLAIN_FORMAT_TEXT)
+			ExplainPropertyInteger("Distinct Prefix", NULL, skipPrefixSize, es);
+	}
+}
+
 /*
  * ExplainNode -
  *	  Appends a description of a plan tree to es->str
@@ -1380,6 +1397,8 @@ ExplainNode(PlanState *planstate, List *ancestors,
 			{
 				IndexScan  *indexscan = (IndexScan *) plan;
 
+				ExplainIndexSkipScanKeys(indexscan->indexskipprefixsize, es);
+
 				ExplainIndexScanDetails(indexscan->indexid,
 										indexscan->indexorderdir,
 										es);
@@ -1390,6 +1409,8 @@ ExplainNode(PlanState *planstate, List *ancestors,
 			{
 				IndexOnlyScan *indexonlyscan = (IndexOnlyScan *) plan;
 
+				ExplainIndexSkipScanKeys(indexonlyscan->indexskipprefixsize, es);
+
 				ExplainIndexScanDetails(indexonlyscan->indexid,
 										indexonlyscan->indexorderdir,
 										es);
@@ -1599,6 +1620,8 @@ ExplainNode(PlanState *planstate, List *ancestors,
 	switch (nodeTag(plan))
 	{
 		case T_IndexScan:
+			if (((IndexScan *) plan)->indexskipprefixsize > 0)
+				ExplainPropertyBool("Skip scan", true, es);
 			show_scan_qual(((IndexScan *) plan)->indexqualorig,
 						   "Index Cond", planstate, ancestors, es);
 			if (((IndexScan *) plan)->indexqualorig)
@@ -1612,6 +1635,8 @@ ExplainNode(PlanState *planstate, List *ancestors,
 										   planstate, es);
 			break;
 		case T_IndexOnlyScan:
+			if (((IndexOnlyScan *) plan)->indexskipprefixsize > 0)
+				ExplainPropertyBool("Skip scan", true, es);
 			show_scan_qual(((IndexOnlyScan *) plan)->indexqual,
 						   "Index Cond", planstate, ancestors, es);
 			if (((IndexOnlyScan *) plan)->indexqual)
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 5617ac29e7..76330f7906 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -65,6 +65,13 @@ IndexOnlyNext(IndexOnlyScanState *node)
 	IndexScanDesc scandesc;
 	TupleTableSlot *slot;
 	ItemPointer tid;
+	IndexOnlyScan *indexonlyscan = (IndexOnlyScan *) node->ss.ps.plan;
+
+	/*
+	 * Tells if the current position was reached via skipping. In this case
+	 * there is no need for index_getnext_tid.
+	 */
+	bool skipped = false;
 
 	/*
 	 * extract necessary information from index scan node
@@ -72,7 +79,7 @@ IndexOnlyNext(IndexOnlyScanState *node)
 	estate = node->ss.ps.state;
 	direction = estate->es_direction;
 	/* flip direction if this is an overall backward scan */
-	if (ScanDirectionIsBackward(((IndexOnlyScan *) node->ss.ps.plan)->indexorderdir))
+	if (ScanDirectionIsBackward(indexonlyscan->indexorderdir))
 	{
 		if (ScanDirectionIsForward(direction))
 			direction = BackwardScanDirection;
@@ -115,14 +122,50 @@ IndexOnlyNext(IndexOnlyScanState *node)
 						 node->ioss_NumOrderByKeys);
 	}
 
+	/*
+	 * Check if we need to skip to the next key prefix, because we've been
+	 * asked to implement DISTINCT.
+	 *
+	 * When fetching a cursor in the direction opposite to a general scan
+	 * direction, the result must be what normal fetching should have
+	 * returned, but in reversed order. In other words, return the last or
+	 * first scanned tuple in a DISTINCT set, depending on a cursor direction.
+	 * Due to that we skip also when the first tuple wasn't emitted yet, but
+	 * the directions are opposite.
+	 */
+	if (node->ioss_SkipPrefixSize > 0 &&
+		(node->ioss_FirstTupleEmitted ||
+		 ScanDirectionsAreOpposite(direction, indexonlyscan->indexorderdir)))
+	{
+		if (!index_skip(scandesc, direction, indexonlyscan->indexorderdir,
+						!node->ioss_FirstTupleEmitted, node->ioss_SkipPrefixSize))
+		{
+			/*
+			 * Reached end of index. At this point currPos is invalidated, and
+			 * we need to reset ioss_FirstTupleEmitted, since otherwise after
+			 * going backwards, reaching the end of index, and going forward
+			 * again we apply skip again. It would be incorrect and lead to an
+			 * extra skipped item.
+			 */
+			node->ioss_FirstTupleEmitted = false;
+			return ExecClearTuple(slot);
+		}
+		else
+		{
+			skipped = true;
+			tid = &scandesc->xs_heaptid;
+		}
+	}
+
 	/*
 	 * OK, now that we have what we need, fetch the next tuple.
 	 */
-	while ((tid = index_getnext_tid(scandesc, direction)) != NULL)
+	while (skipped || (tid = index_getnext_tid(scandesc, direction)) != NULL)
 	{
 		bool		tuple_from_heap = false;
 
 		CHECK_FOR_INTERRUPTS();
+		skipped = false;
 
 		/*
 		 * We can skip the heap fetch if the TID references a heap page on
@@ -250,6 +293,8 @@ IndexOnlyNext(IndexOnlyScanState *node)
 							  ItemPointerGetBlockNumber(tid),
 							  estate->es_snapshot);
 
+		node->ioss_FirstTupleEmitted = true;
+
 		return slot;
 	}
 
@@ -504,6 +549,8 @@ ExecInitIndexOnlyScan(IndexOnlyScan *node, EState *estate, int eflags)
 	indexstate->ss.ps.plan = (Plan *) node;
 	indexstate->ss.ps.state = estate;
 	indexstate->ss.ps.ExecProcNode = ExecIndexOnlyScan;
+	indexstate->ioss_SkipPrefixSize = node->indexskipprefixsize;
+	indexstate->ioss_FirstTupleEmitted = false;
 
 	/*
 	 * Miscellaneous initialization
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index d0a96a38e0..449aaec3ac 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -85,6 +85,13 @@ IndexNext(IndexScanState *node)
 	ScanDirection direction;
 	IndexScanDesc scandesc;
 	TupleTableSlot *slot;
+	IndexScan *indexscan = (IndexScan *) node->ss.ps.plan;
+
+	/*
+	 * Tells whether the current position was reached via skipping. In that
+	 * case there is no need to call index_getnext_slot.
+	 */
+	bool skipped = false;
 
 	/*
 	 * extract necessary information from index scan node
@@ -92,7 +99,7 @@ IndexNext(IndexScanState *node)
 	estate = node->ss.ps.state;
 	direction = estate->es_direction;
 	/* flip direction if this is an overall backward scan */
-	if (ScanDirectionIsBackward(((IndexScan *) node->ss.ps.plan)->indexorderdir))
+	if (ScanDirectionIsBackward(indexscan->indexorderdir))
 	{
 		if (ScanDirectionIsForward(direction))
 			direction = BackwardScanDirection;
@@ -117,6 +124,12 @@ IndexNext(IndexScanState *node)
 
 		node->iss_ScanDesc = scandesc;
 
+		/* Index skip scan assumes xs_want_itup, so set it to true */
+		if (indexscan->indexskipprefixsize > 0)
+			node->iss_ScanDesc->xs_want_itup = true;
+		else
+			node->iss_ScanDesc->xs_want_itup = false;
+
 		/*
 		 * If no run-time keys to calculate or they are ready, go ahead and
 		 * pass the scankeys to the index AM.
@@ -127,12 +140,48 @@ IndexNext(IndexScanState *node)
 						 node->iss_OrderByKeys, node->iss_NumOrderByKeys);
 	}
 
+	/*
+	 * Check if we need to skip to the next key prefix, because we've been
+	 * asked to implement DISTINCT.
+	 *
+	 * When fetching a cursor in the direction opposite to a general scan
+	 * direction, the result must be what normal fetching should have
+	 * returned, but in reversed order. In other words, return the last or
+	 * first scanned tuple in a DISTINCT set, depending on a cursor direction.
+	 * Due to that we skip also when the first tuple wasn't emitted yet, but
+	 * the directions are opposite.
+	 */
+	if (node->iss_SkipPrefixSize > 0 &&
+		(node->iss_FirstTupleEmitted ||
+		 ScanDirectionsAreOpposite(direction, indexscan->indexorderdir)))
+	{
+		if (!index_skip(scandesc, direction, indexscan->indexorderdir,
+					   !node->iss_FirstTupleEmitted, node->iss_SkipPrefixSize))
+		{
+			/*
+			 * Reached end of index. At this point currPos is invalidated, and
+			 * we need to reset iss_FirstTupleEmitted, since otherwise after
+			 * going backwards, reaching the end of index, and going forward
+			 * again we apply skip again. It would be incorrect and lead to an
+			 * extra skipped item.
+			 */
+			node->iss_FirstTupleEmitted = false;
+			return ExecClearTuple(slot);
+		}
+		else
+		{
+			skipped = true;
+			index_fetch_heap(scandesc, slot);
+		}
+	}
+
 	/*
 	 * ok, now that we have what we need, fetch the next tuple.
 	 */
-	while (index_getnext_slot(scandesc, direction, slot))
+	while (skipped || index_getnext_slot(scandesc, direction, slot))
 	{
 		CHECK_FOR_INTERRUPTS();
+		skipped = false;
 
 		/*
 		 * If the index was lossy, we have to recheck the index quals using
@@ -149,6 +198,7 @@ IndexNext(IndexScanState *node)
 			}
 		}
 
+		node->iss_FirstTupleEmitted = true;
 		return slot;
 	}
 
@@ -910,6 +960,8 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
 	indexstate->ss.ps.plan = (Plan *) node;
 	indexstate->ss.ps.state = estate;
 	indexstate->ss.ps.ExecProcNode = ExecIndexScan;
+	indexstate->iss_SkipPrefixSize = node->indexskipprefixsize;
+	indexstate->iss_FirstTupleEmitted = false;
 
 	/*
 	 * Miscellaneous initialization
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 54ad62bb7f..e0cfd710c4 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -493,6 +493,7 @@ _copyIndexScan(const IndexScan *from)
 	COPY_NODE_FIELD(indexorderbyorig);
 	COPY_NODE_FIELD(indexorderbyops);
 	COPY_SCALAR_FIELD(indexorderdir);
+	COPY_SCALAR_FIELD(indexskipprefixsize);
 
 	return newnode;
 }
@@ -518,6 +519,7 @@ _copyIndexOnlyScan(const IndexOnlyScan *from)
 	COPY_NODE_FIELD(indexorderby);
 	COPY_NODE_FIELD(indextlist);
 	COPY_SCALAR_FIELD(indexorderdir);
+	COPY_SCALAR_FIELD(indexskipprefixsize);
 
 	return newnode;
 }
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 16083e7a7e..5f723cda4b 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -562,6 +562,7 @@ _outIndexScan(StringInfo str, const IndexScan *node)
 	WRITE_NODE_FIELD(indexorderbyorig);
 	WRITE_NODE_FIELD(indexorderbyops);
 	WRITE_ENUM_FIELD(indexorderdir, ScanDirection);
+	WRITE_INT_FIELD(indexskipprefixsize);
 }
 
 static void
@@ -576,6 +577,7 @@ _outIndexOnlyScan(StringInfo str, const IndexOnlyScan *node)
 	WRITE_NODE_FIELD(indexorderby);
 	WRITE_NODE_FIELD(indextlist);
 	WRITE_ENUM_FIELD(indexorderdir, ScanDirection);
+	WRITE_INT_FIELD(indexskipprefixsize);
 }
 
 static void
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 551ce6c41c..028d03a56d 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1820,6 +1820,7 @@ _readIndexScan(void)
 	READ_NODE_FIELD(indexorderbyorig);
 	READ_NODE_FIELD(indexorderbyops);
 	READ_ENUM_FIELD(indexorderdir, ScanDirection);
+	READ_INT_FIELD(indexskipprefixsize);
 
 	READ_DONE();
 }
@@ -1839,6 +1840,7 @@ _readIndexOnlyScan(void)
 	READ_NODE_FIELD(indexorderby);
 	READ_NODE_FIELD(indextlist);
 	READ_ENUM_FIELD(indexorderdir, ScanDirection);
+	READ_INT_FIELD(indexskipprefixsize);
 
 	READ_DONE();
 }
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index b5a0033721..710edf160a 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -124,6 +124,7 @@ int			max_parallel_workers_per_gather = 2;
 bool		enable_seqscan = true;
 bool		enable_indexscan = true;
 bool		enable_indexonlyscan = true;
+bool		enable_indexskipscan = true;
 bool		enable_bitmapscan = true;
 bool		enable_tidscan = true;
 bool		enable_sort = true;
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index dff826a828..7b32f2cc7e 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -175,12 +175,14 @@ static IndexScan *make_indexscan(List *qptlist, List *qpqual, Index scanrelid,
 								 Oid indexid, List *indexqual, List *indexqualorig,
 								 List *indexorderby, List *indexorderbyorig,
 								 List *indexorderbyops,
-								 ScanDirection indexscandir);
+								 ScanDirection indexscandir,
+								 int skipPrefixSize);
 static IndexOnlyScan *make_indexonlyscan(List *qptlist, List *qpqual,
 										 Index scanrelid, Oid indexid,
 										 List *indexqual, List *indexorderby,
 										 List *indextlist,
-										 ScanDirection indexscandir);
+										 ScanDirection indexscandir,
+										 int skipPrefixSize);
 static BitmapIndexScan *make_bitmap_indexscan(Index scanrelid, Oid indexid,
 											  List *indexqual,
 											  List *indexqualorig);
@@ -2910,7 +2912,8 @@ create_indexscan_plan(PlannerInfo *root,
 												fixed_indexquals,
 												fixed_indexorderbys,
 												best_path->indexinfo->indextlist,
-												best_path->indexscandir);
+												best_path->indexscandir,
+												best_path->indexskipprefix);
 	else
 		scan_plan = (Scan *) make_indexscan(tlist,
 											qpqual,
@@ -2921,7 +2924,8 @@ create_indexscan_plan(PlannerInfo *root,
 											fixed_indexorderbys,
 											indexorderbys,
 											indexorderbyops,
-											best_path->indexscandir);
+											best_path->indexscandir,
+											best_path->indexskipprefix);
 
 	copy_generic_path_info(&scan_plan->plan, &best_path->path);
 
@@ -5184,7 +5188,8 @@ make_indexscan(List *qptlist,
 			   List *indexorderby,
 			   List *indexorderbyorig,
 			   List *indexorderbyops,
-			   ScanDirection indexscandir)
+			   ScanDirection indexscandir,
+			   int skipPrefixSize)
 {
 	IndexScan  *node = makeNode(IndexScan);
 	Plan	   *plan = &node->scan.plan;
@@ -5201,6 +5206,7 @@ make_indexscan(List *qptlist,
 	node->indexorderbyorig = indexorderbyorig;
 	node->indexorderbyops = indexorderbyops;
 	node->indexorderdir = indexscandir;
+	node->indexskipprefixsize = skipPrefixSize;
 
 	return node;
 }
@@ -5213,7 +5219,8 @@ make_indexonlyscan(List *qptlist,
 				   List *indexqual,
 				   List *indexorderby,
 				   List *indextlist,
-				   ScanDirection indexscandir)
+				   ScanDirection indexscandir,
+				   int skipPrefixSize)
 {
 	IndexOnlyScan *node = makeNode(IndexOnlyScan);
 	Plan	   *plan = &node->scan.plan;
@@ -5228,6 +5235,7 @@ make_indexonlyscan(List *qptlist,
 	node->indexorderby = indexorderby;
 	node->indextlist = indextlist;
 	node->indexorderdir = indexscandir;
+	node->indexskipprefixsize = skipPrefixSize;
 
 	return node;
 }
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 984fca0696..c84388e6f7 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -4834,6 +4834,82 @@ create_distinct_paths(PlannerInfo *root,
 												  path,
 												  list_length(root->distinct_pathkeys),
 												  numDistinctRows));
+
+				/* Consider index skip scan as well */
+				if (enable_indexskipscan &&
+					IsA(path, IndexPath) &&
+					((IndexPath *) path)->indexinfo->amcanskip &&
+					root->distinct_pathkeys != NIL)
+				{
+					ListCell   		*lc;
+					IndexOptInfo 	*index = NULL;
+					bool 			different_columns_order = false,
+									not_empty_qual = false;
+					int 			i = 0;
+					int 			distinctPrefixKeys;
+
+					Assert(path->pathtype == T_IndexOnlyScan ||
+						   path->pathtype == T_IndexScan);
+
+					index = ((IndexPath *) path)->indexinfo;
+					distinctPrefixKeys = list_length(root->query_uniquekeys);
+
+					/*
+					 * Normally distinctPrefixKeys is just the number of
+					 * distinct keys. But suppose the distinct key is a and
+					 * the index contains (b, a) in exactly this order. In
+					 * that case we need to use the position of a in the
+					 * index as distinctPrefixKeys, otherwise the skip would
+					 * happen only on the first column.
+					 */
+					foreach(lc, root->query_uniquekeys)
+					{
+						UniqueKey *uniquekey = (UniqueKey *) lfirst(lc);
+						EquivalenceMember *em =
+							lfirst_node(EquivalenceMember,
+										list_head(uniquekey->eq_clause->ec_members));
+						Var *var = (Var *) em->em_expr;
+
+						Assert(i < index->ncolumns);
+
+						for (i = 0; i < index->ncolumns; i++)
+						{
+							if (index->indexkeys[i] == var->varattno)
+							{
+								distinctPrefixKeys = Max(i + 1, distinctPrefixKeys);
+								break;
+							}
+						}
+					}
+
+					/*
+					 * XXX: In case of an index scan, qual evaluation happens
+					 * after ExecScanFetch, which means skip results could be
+					 * filtered out. Consider the following query:
+					 *
+					 * 		select distinct on (a, b) a, b, c from t where c < 100;
+					 *
+					 * Skip scan returns one tuple for each distinct set of (a,
+					 * b) with an arbitrary value of c, so if the chosen c does
+					 * not match the qual while some other c does, we miss
+					 * that tuple.
+					 */
+					if (path->pathtype == T_IndexScan &&
+						parse->jointree != NULL &&
+						parse->jointree->quals != NULL &&
+						list_length((List *) parse->jointree->quals) != 0)
+							not_empty_qual = true;
+
+					if (!different_columns_order &&	!not_empty_qual)
+					{
+						add_path(distinct_rel, (Path *)
+								 create_skipscan_unique_path(root,
+															 distinct_rel,
+															 path,
+															 distinctPrefixKeys,
+															 numDistinctRows));
+					}
+				}
 			}
 		}
 
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index a006dbbe9c..2fb18fb372 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -2915,6 +2915,46 @@ create_upper_unique_path(PlannerInfo *root,
 	return pathnode;
 }
 
+/*
+ * create_skipscan_unique_path
+ *	  Creates a pathnode the same as an existing IndexPath except based on
+ *	  skipping duplicate values.  This may or may not be cheaper than using
+ *	  create_upper_unique_path.
+ *
+ * The input path must be an IndexPath for an index that supports amskip.
+ */
+IndexPath *
+create_skipscan_unique_path(PlannerInfo *root,
+							RelOptInfo *rel,
+							Path *basepath,
+							int distinctPrefixKeys,
+							double numGroups)
+{
+	IndexPath *pathnode = makeNode(IndexPath);
+
+	Assert(IsA(basepath, IndexPath));
+
+	/* We don't want to modify basepath, so make a copy. */
+	memcpy(pathnode, basepath, sizeof(IndexPath));
+
+	/* The size of the prefix we'll use for skipping. */
+	Assert(pathnode->indexinfo->amcanskip);
+	Assert(distinctPrefixKeys > 0);
+	/*Assert(distinctPrefixKeys <= list_length(pathnode->path.pathkeys));*/
+	pathnode->indexskipprefix = distinctPrefixKeys;
+
+	/*
+	 * The cost to skip to each distinct value should be roughly the same as
+	 * the cost of finding the first key times the number of distinct values
+	 * we expect to find.
+	 */
+	pathnode->path.startup_cost = basepath->startup_cost;
+	pathnode->path.total_cost = basepath->startup_cost * numGroups;
+	pathnode->path.rows = numGroups;
+
+	return pathnode;
+}
+
 /*
  * create_agg_path
  *	  Creates a pathnode that represents performing aggregation/grouping
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index d82fc5ab8b..f65b299f37 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -271,6 +271,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 			info->amoptionalkey = amroutine->amoptionalkey;
 			info->amsearcharray = amroutine->amsearcharray;
 			info->amsearchnulls = amroutine->amsearchnulls;
+			info->amcanskip = (amroutine->amskip != NULL);
 			info->amcanparallel = amroutine->amcanparallel;
 			info->amhasgettuple = (amroutine->amgettuple != NULL);
 			info->amhasgetbitmap = amroutine->amgetbitmap != NULL &&
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index e44f71e991..884cdec916 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -922,6 +922,15 @@ static struct config_bool ConfigureNamesBool[] =
 		true,
 		NULL, NULL, NULL
 	},
+	{
+		{"enable_indexskipscan", PGC_USERSET, QUERY_TUNING_METHOD,
+			gettext_noop("Enables the planner's use of index-skip-scan plans."),
+			NULL
+		},
+		&enable_indexskipscan,
+		true,
+		NULL, NULL, NULL
+	},
 	{
 		{"enable_bitmapscan", PGC_USERSET, QUERY_TUNING_METHOD,
 			gettext_noop("Enables the planner's use of bitmap-scan plans."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index e1048c0047..a002ee2143 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -353,6 +353,7 @@
 #enable_hashjoin = on
 #enable_indexscan = on
 #enable_indexonlyscan = on
+#enable_indexskipscan = on
 #enable_material = on
 #enable_mergejoin = on
 #enable_nestloop = on
diff --git a/src/include/access/amapi.h b/src/include/access/amapi.h
index 3b3e22f73d..3d39cd9d07 100644
--- a/src/include/access/amapi.h
+++ b/src/include/access/amapi.h
@@ -130,6 +130,13 @@ typedef void (*amrescan_function) (IndexScanDesc scan,
 typedef bool (*amgettuple_function) (IndexScanDesc scan,
 									 ScanDirection direction);
 
+/* skip past duplicates in a given prefix */
+typedef bool (*amskip_function) (IndexScanDesc scan,
+								 ScanDirection dir,
+								 ScanDirection indexdir,
+								 bool start,
+								 int prefix);
+
 /* fetch all valid tuples */
 typedef int64 (*amgetbitmap_function) (IndexScanDesc scan,
 									   TIDBitmap *tbm);
@@ -229,6 +236,7 @@ typedef struct IndexAmRoutine
 	amendscan_function amendscan;
 	ammarkpos_function ammarkpos;	/* can be NULL */
 	amrestrpos_function amrestrpos; /* can be NULL */
+	amskip_function amskip;				/* can be NULL */
 
 	/* interface functions to support parallel index scans */
 	amestimateparallelscan_function amestimateparallelscan; /* can be NULL */
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 7e9364a50c..815de4e4dd 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -180,6 +180,8 @@ extern IndexBulkDeleteResult *index_bulk_delete(IndexVacuumInfo *info,
 extern IndexBulkDeleteResult *index_vacuum_cleanup(IndexVacuumInfo *info,
 												   IndexBulkDeleteResult *stats);
 extern bool index_can_return(Relation indexRelation, int attno);
+extern bool index_skip(IndexScanDesc scan, ScanDirection direction,
+					   ScanDirection indexdir, bool start, int prefix);
 extern RegProcedure index_getprocid(Relation irel, AttrNumber attnum,
 									uint16 procnum);
 extern FmgrInfo *index_getprocinfo(Relation irel, AttrNumber attnum,
diff --git a/src/include/access/nbtree.h b/src/include/access/nbtree.h
index 20ace69dab..e098c6a1ab 100644
--- a/src/include/access/nbtree.h
+++ b/src/include/access/nbtree.h
@@ -662,6 +662,9 @@ typedef struct BTScanOpaqueData
 	 */
 	int			markItemIndex;	/* itemIndex, or -1 if not valid */
 
+	/* Work space for _bt_skip */
+	BTScanInsert	skipScanKey;	/* used to control skipping */
+
 	/* keep these last in struct for efficiency */
 	BTScanPosData currPos;		/* current position data */
 	BTScanPosData markPos;		/* marked position, if any */
@@ -793,6 +796,8 @@ extern OffsetNumber _bt_binsrch_insert(Relation rel, BTInsertState insertstate);
 extern int32 _bt_compare(Relation rel, BTScanInsert key, Page page, OffsetNumber offnum);
 extern bool _bt_first(IndexScanDesc scan, ScanDirection dir);
 extern bool _bt_next(IndexScanDesc scan, ScanDirection dir);
+extern bool _bt_skip(IndexScanDesc scan, ScanDirection dir,
+					 ScanDirection indexdir, bool start, int prefix);
 extern Buffer _bt_get_endpoint(Relation rel, uint32 level, bool rightmost,
 							   Snapshot snapshot);
 
@@ -817,6 +822,8 @@ extern void _bt_end_vacuum_callback(int code, Datum arg);
 extern Size BTreeShmemSize(void);
 extern void BTreeShmemInit(void);
 extern bytea *btoptions(Datum reloptions, bool validate);
+extern bool btskip(IndexScanDesc scan, ScanDirection dir,
+				   ScanDirection indexdir, bool start, int prefix);
 extern bool btproperty(Oid index_oid, int attno,
 					   IndexAMProperty prop, const char *propname,
 					   bool *res, bool *isnull);
diff --git a/src/include/access/sdir.h b/src/include/access/sdir.h
index 23feb90986..094a127464 100644
--- a/src/include/access/sdir.h
+++ b/src/include/access/sdir.h
@@ -55,4 +55,11 @@ typedef enum ScanDirection
 #define ScanDirectionIsForward(direction) \
 	((bool) ((direction) == ForwardScanDirection))
 
+/*
+ * ScanDirectionsAreOpposite
+ *		True iff scan directions are backward/forward or forward/backward.
+ */
+#define ScanDirectionsAreOpposite(dirA, dirB) \
+	((bool) ((dirA) != NoMovementScanDirection && (dirA) == -(dirB)))
+
 #endif							/* SDIR_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 1f6f5bbc20..2c6acc160a 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1423,6 +1423,8 @@ typedef struct IndexScanState
 	ExprContext *iss_RuntimeContext;
 	Relation	iss_RelationDesc;
 	struct IndexScanDescData *iss_ScanDesc;
+	int			iss_SkipPrefixSize;
+	bool		iss_FirstTupleEmitted;
 
 	/* These are needed for re-checking ORDER BY expr ordering */
 	pairingheap *iss_ReorderQueue;
@@ -1452,6 +1454,8 @@ typedef struct IndexScanState
  *		TableSlot		   slot for holding tuples fetched from the table
  *		VMBuffer		   buffer in use for visibility map testing, if any
  *		PscanLen		   size of parallel index-only scan descriptor
+ *		SkipPrefixSize	   number of keys for skip-based DISTINCT
+ *		FirstTupleEmitted  has the first tuple been emitted
  * ----------------
  */
 typedef struct IndexOnlyScanState
@@ -1470,6 +1474,8 @@ typedef struct IndexOnlyScanState
 	struct IndexScanDescData *ioss_ScanDesc;
 	TupleTableSlot *ioss_TableSlot;
 	Buffer		ioss_VMBuffer;
+	int			ioss_SkipPrefixSize;
+	bool		ioss_FirstTupleEmitted;
 	Size		ioss_PscanLen;
 } IndexOnlyScanState;
 
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 4e329f0fb5..b0ff9ca3a8 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -839,6 +839,7 @@ struct IndexOptInfo
 	bool		amsearchnulls;	/* can AM search for NULL/NOT NULL entries? */
 	bool		amhasgettuple;	/* does AM have amgettuple interface? */
 	bool		amhasgetbitmap; /* does AM have amgetbitmap interface? */
+	bool		amcanskip;		/* can AM skip duplicate values? */
 	bool		amcanparallel;	/* does AM support parallel scan? */
 	/* Rather than include amapi.h here, we declare amcostestimate like this */
 	void		(*amcostestimate) ();	/* AM's cost estimator */
@@ -1189,6 +1190,9 @@ typedef struct Path
  * we need not recompute them when considering using the same index in a
  * bitmap index/heap scan (see BitmapHeapPath).  The costs of the IndexPath
  * itself represent the costs of an IndexScan or IndexOnlyScan plan type.
+ *
+ * 'indexskipprefix' represents the number of columns to consider for skip
+ * scans.
  *----------
  */
 typedef struct IndexPath
@@ -1201,6 +1205,7 @@ typedef struct IndexPath
 	ScanDirection indexscandir;
 	Cost		indextotalcost;
 	Selectivity indexselectivity;
+	int			indexskipprefix;
 } IndexPath;
 
 /*
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 32c0d87f80..03a00e8e1d 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -409,6 +409,8 @@ typedef struct IndexScan
 	List	   *indexorderbyorig;	/* the same in original form */
 	List	   *indexorderbyops;	/* OIDs of sort ops for ORDER BY exprs */
 	ScanDirection indexorderdir;	/* forward or backward or don't care */
+	int			indexskipprefixsize;	/* the size of the prefix for distinct
+										 * scans */
 } IndexScan;
 
 /* ----------------
@@ -436,6 +438,8 @@ typedef struct IndexOnlyScan
 	List	   *indexorderby;	/* list of index ORDER BY exprs */
 	List	   *indextlist;		/* TargetEntry list describing index's cols */
 	ScanDirection indexorderdir;	/* forward or backward or don't care */
+	int			indexskipprefixsize;	/* the size of the prefix for distinct
+										 * scans */
 } IndexOnlyScan;
 
 /* ----------------
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index cb012ba198..847f34f02b 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -50,6 +50,7 @@ extern PGDLLIMPORT int max_parallel_workers_per_gather;
 extern PGDLLIMPORT bool enable_seqscan;
 extern PGDLLIMPORT bool enable_indexscan;
 extern PGDLLIMPORT bool enable_indexonlyscan;
+extern PGDLLIMPORT bool enable_indexskipscan;
 extern PGDLLIMPORT bool enable_bitmapscan;
 extern PGDLLIMPORT bool enable_tidscan;
 extern PGDLLIMPORT bool enable_sort;
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index f75ff6f323..6c8c9dadbb 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -201,6 +201,11 @@ extern UpperUniquePath *create_upper_unique_path(PlannerInfo *root,
 												 Path *subpath,
 												 int numCols,
 												 double numGroups);
+extern IndexPath *create_skipscan_unique_path(PlannerInfo *root,
+											  RelOptInfo *rel,
+											  Path *subpath,
+											  int numCols,
+											  double numGroups);
 extern AggPath *create_agg_path(PlannerInfo *root,
 								RelOptInfo *rel,
 								Path *subpath,
diff --git a/src/test/regress/expected/select_distinct.out b/src/test/regress/expected/select_distinct.out
index f3696c6d1d..51e12ac925 100644
--- a/src/test/regress/expected/select_distinct.out
+++ b/src/test/regress/expected/select_distinct.out
@@ -244,3 +244,508 @@ SELECT null IS NOT DISTINCT FROM null as "yes";
  t
 (1 row)
 
+-- index only skip scan
+CREATE TABLE distinct_a (a int, b int, c int);
+INSERT INTO distinct_a (
+    SELECT five, tenthous, 10 FROM
+    generate_series(1, 5) five,
+    generate_series(1, 10000) tenthous
+);
+CREATE INDEX ON distinct_a (a, b);
+ANALYZE distinct_a;
+SELECT DISTINCT a FROM distinct_a;
+ a 
+---
+ 1
+ 2
+ 3
+ 4
+ 5
+(5 rows)
+
+SELECT DISTINCT a FROM distinct_a WHERE a = 1;
+ a 
+---
+ 1
+(1 row)
+
+SELECT DISTINCT a FROM distinct_a ORDER BY a DESC;
+ a 
+---
+ 5
+ 4
+ 3
+ 2
+ 1
+(5 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT a FROM distinct_a;
+                       QUERY PLAN                       
+--------------------------------------------------------
+ Index Only Scan using distinct_a_a_b_idx on distinct_a
+   Skip scan: true
+(2 rows)
+
+-- test index skip scan with a condition on a non unique field
+SELECT DISTINCT ON (a) a, b FROM distinct_a WHERE b = 2;
+ a | b 
+---+---
+ 1 | 2
+ 2 | 2
+ 3 | 2
+ 4 | 2
+ 5 | 2
+(5 rows)
+
+-- test index skip scan backwards
+SELECT DISTINCT ON (a) a, b FROM distinct_a ORDER BY a DESC, b DESC;
+ a |   b   
+---+-------
+ 5 | 10000
+ 4 | 10000
+ 3 | 10000
+ 2 | 10000
+ 1 | 10000
+(5 rows)
+
+-- check colums order
+CREATE INDEX distinct_a_b_a on distinct_a (b, a);
+SELECT DISTINCT a FROM distinct_a WHERE b = 2;
+ a 
+---
+ 1
+ 2
+ 3
+ 4
+ 5
+(5 rows)
+
+SELECT DISTINCT on (a, b) a, b FROM distinct_a WHERE b = 2;
+ a | b 
+---+---
+ 1 | 2
+ 2 | 2
+ 3 | 2
+ 4 | 2
+ 5 | 2
+(5 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT a FROM distinct_a WHERE b = 2;
+                     QUERY PLAN                     
+----------------------------------------------------
+ Index Only Scan using distinct_a_b_a on distinct_a
+   Skip scan: true
+   Index Cond: (b = 2)
+(3 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT on (a, b) a, b FROM distinct_a WHERE b = 2;
+                     QUERY PLAN                     
+----------------------------------------------------
+ Index Only Scan using distinct_a_b_a on distinct_a
+   Skip scan: true
+   Index Cond: (b = 2)
+(3 rows)
+
+DROP INDEX distinct_a_b_a;
+-- test opposite scan/index directions inside a cursor
+-- forward/backward
+BEGIN;
+DECLARE c SCROLL CURSOR FOR
+SELECT DISTINCT ON (a) a,b FROM distinct_a ORDER BY a, b;
+FETCH FROM c;
+ a | b 
+---+---
+ 1 | 1
+(1 row)
+
+FETCH BACKWARD FROM c;
+ a | b 
+---+---
+(0 rows)
+
+FETCH 6 FROM c;
+ a | b 
+---+---
+ 1 | 1
+ 2 | 1
+ 3 | 1
+ 4 | 1
+ 5 | 1
+(5 rows)
+
+FETCH BACKWARD 6 FROM c;
+ a | b 
+---+---
+ 5 | 1
+ 4 | 1
+ 3 | 1
+ 2 | 1
+ 1 | 1
+(5 rows)
+
+FETCH 6 FROM c;
+ a | b 
+---+---
+ 1 | 1
+ 2 | 1
+ 3 | 1
+ 4 | 1
+ 5 | 1
+(5 rows)
+
+FETCH BACKWARD 6 FROM c;
+ a | b 
+---+---
+ 5 | 1
+ 4 | 1
+ 3 | 1
+ 2 | 1
+ 1 | 1
+(5 rows)
+
+END;
+-- backward/forward
+BEGIN;
+DECLARE c SCROLL CURSOR FOR
+SELECT DISTINCT ON (a) a,b FROM distinct_a ORDER BY a DESC, b DESC;
+FETCH FROM c;
+ a |   b   
+---+-------
+ 5 | 10000
+(1 row)
+
+FETCH BACKWARD FROM c;
+ a | b 
+---+---
+(0 rows)
+
+FETCH 6 FROM c;
+ a |   b   
+---+-------
+ 5 | 10000
+ 4 | 10000
+ 3 | 10000
+ 2 | 10000
+ 1 | 10000
+(5 rows)
+
+FETCH BACKWARD 6 FROM c;
+ a |   b   
+---+-------
+ 1 | 10000
+ 2 | 10000
+ 3 | 10000
+ 4 | 10000
+ 5 | 10000
+(5 rows)
+
+FETCH 6 FROM c;
+ a |   b   
+---+-------
+ 5 | 10000
+ 4 | 10000
+ 3 | 10000
+ 2 | 10000
+ 1 | 10000
+(5 rows)
+
+FETCH BACKWARD 6 FROM c;
+ a |   b   
+---+-------
+ 1 | 10000
+ 2 | 10000
+ 3 | 10000
+ 4 | 10000
+ 5 | 10000
+(5 rows)
+
+END;
+-- test missing values and skipping from the end
+CREATE TABLE distinct_abc(a int, b int, c int);
+CREATE INDEX ON distinct_abc(a, b, c);
+INSERT INTO distinct_abc
+	VALUES (1, 1, 1),
+		   (1, 1, 2),
+		   (1, 2, 2),
+		   (1, 2, 3),
+		   (2, 2, 1),
+		   (2, 2, 3),
+		   (3, 1, 1),
+		   (3, 1, 2),
+		   (3, 2, 2),
+		   (3, 2, 3);
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a,b,c FROM distinct_abc WHERE c = 2;
+                          QUERY PLAN                          
+--------------------------------------------------------------
+ Index Only Scan using distinct_abc_a_b_c_idx on distinct_abc
+   Skip scan: true
+   Index Cond: (c = 2)
+(3 rows)
+
+BEGIN;
+DECLARE c SCROLL CURSOR FOR
+SELECT DISTINCT ON (a) a,b,c FROM distinct_abc WHERE c = 2;
+FETCH ALL FROM c;
+ a | b | c 
+---+---+---
+ 1 | 1 | 2
+ 3 | 1 | 2
+(2 rows)
+
+FETCH BACKWARD ALL FROM c;
+ a | b | c 
+---+---+---
+ 3 | 1 | 2
+ 1 | 1 | 2
+(2 rows)
+
+END;
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a,b,c FROM distinct_abc WHERE c = 2
+ORDER BY a DESC, b DESC;
+                              QUERY PLAN                               
+-----------------------------------------------------------------------
+ Index Only Scan Backward using distinct_abc_a_b_c_idx on distinct_abc
+   Skip scan: true
+   Index Cond: (c = 2)
+(3 rows)
+
+BEGIN;
+DECLARE c SCROLL CURSOR FOR
+SELECT DISTINCT ON (a) a,b,c FROM distinct_abc WHERE c = 2
+ORDER BY a DESC, b DESC;
+FETCH ALL FROM c;
+ a | b | c 
+---+---+---
+ 3 | 2 | 2
+ 1 | 2 | 2
+(2 rows)
+
+FETCH BACKWARD ALL FROM c;
+ a | b | c 
+---+---+---
+ 1 | 2 | 2
+ 3 | 2 | 2
+(2 rows)
+
+END;
+DROP TABLE distinct_abc;
+-- index skip scan
+SELECT DISTINCT ON (a) a, b, c
+FROM distinct_a ORDER BY a;
+ a | b | c  
+---+---+----
+ 1 | 1 | 10
+ 2 | 1 | 10
+ 3 | 1 | 10
+ 4 | 1 | 10
+ 5 | 1 | 10
+(5 rows)
+
+SELECT DISTINCT ON (a) a, b, c
+FROM distinct_a WHERE a = 1 ORDER BY a;
+ a | b | c  
+---+---+----
+ 1 | 1 | 10
+(1 row)
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a, b, c
+FROM distinct_a ORDER BY a;
+                    QUERY PLAN                     
+---------------------------------------------------
+ Index Scan using distinct_a_a_b_idx on distinct_a
+   Skip scan: true
+(2 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a, b, c
+FROM distinct_a WHERE a = 1 ORDER BY a;
+                     QUERY PLAN                      
+-----------------------------------------------------
+ Unique
+   ->  Bitmap Heap Scan on distinct_a
+         Recheck Cond: (a = 1)
+         ->  Bitmap Index Scan on distinct_a_a_b_idx
+               Index Cond: (a = 1)
+(5 rows)
+
+-- check colums order
+SELECT DISTINCT a FROM distinct_a WHERE b = 2 AND c = 10;
+ a 
+---
+ 1
+ 2
+ 3
+ 4
+ 5
+(5 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT a FROM distinct_a WHERE b = 2 AND c = 10;
+                       QUERY PLAN                        
+---------------------------------------------------------
+ Unique
+   ->  Index Scan using distinct_a_a_b_idx on distinct_a
+         Index Cond: (b = 2)
+         Filter: (c = 10)
+(4 rows)
+
+-- check projection case
+SELECT DISTINCT a, a FROM distinct_a WHERE b = 2;
+ a | a 
+---+---
+ 1 | 1
+ 2 | 2
+ 3 | 3
+ 4 | 4
+ 5 | 5
+(5 rows)
+
+SELECT DISTINCT a, 1 FROM distinct_a WHERE b = 2;
+ a | ?column? 
+---+----------
+ 1 |        1
+ 2 |        1
+ 3 |        1
+ 4 |        1
+ 5 |        1
+(5 rows)
+
+-- test cursor forward/backward movements
+BEGIN;
+DECLARE c SCROLL CURSOR FOR SELECT DISTINCT a FROM distinct_a;
+FETCH FROM c;
+ a 
+---
+ 1
+(1 row)
+
+FETCH BACKWARD FROM c;
+ a 
+---
+(0 rows)
+
+FETCH 6 FROM c;
+ a 
+---
+ 1
+ 2
+ 3
+ 4
+ 5
+(5 rows)
+
+FETCH BACKWARD 6 FROM c;
+ a 
+---
+ 5
+ 4
+ 3
+ 2
+ 1
+(5 rows)
+
+FETCH 6 FROM c;
+ a 
+---
+ 1
+ 2
+ 3
+ 4
+ 5
+(5 rows)
+
+FETCH BACKWARD 6 FROM c;
+ a 
+---
+ 5
+ 4
+ 3
+ 2
+ 1
+(5 rows)
+
+END;
+DROP TABLE distinct_a;
+-- test tuples visibility
+CREATE TABLE distinct_visibility (a int, b int);
+INSERT INTO distinct_visibility (select a, b from generate_series(1,5) a, generate_series(1, 10000) b);
+CREATE INDEX ON distinct_visibility (a, b);
+ANALYZE distinct_visibility;
+SELECT DISTINCT ON (a) a, b FROM distinct_visibility ORDER BY a, b;
+ a | b 
+---+---
+ 1 | 1
+ 2 | 1
+ 3 | 1
+ 4 | 1
+ 5 | 1
+(5 rows)
+
+DELETE FROM distinct_visibility WHERE a = 2 and b = 1;
+SELECT DISTINCT ON (a) a, b FROM distinct_visibility ORDER BY a, b;
+ a | b 
+---+---
+ 1 | 1
+ 2 | 2
+ 3 | 1
+ 4 | 1
+ 5 | 1
+(5 rows)
+
+SELECT DISTINCT ON (a) a, b FROM distinct_visibility ORDER BY a DESC, b DESC;
+ a |   b   
+---+-------
+ 5 | 10000
+ 4 | 10000
+ 3 | 10000
+ 2 | 10000
+ 1 | 10000
+(5 rows)
+
+DELETE FROM distinct_visibility WHERE a = 2 and b = 10000;
+SELECT DISTINCT ON (a) a, b FROM distinct_visibility ORDER BY a DESC, b DESC;
+ a |   b   
+---+-------
+ 5 | 10000
+ 4 | 10000
+ 3 | 10000
+ 2 |  9999
+ 1 | 10000
+(5 rows)
+
+DROP TABLE distinct_visibility;
+-- test page boundaries
+CREATE TABLE distinct_boundaries AS
+    SELECT a, b::int2 b, (b % 2)::int2 c FROM
+        generate_series(1, 5) a,
+        generate_series(1,366) b;
+CREATE INDEX ON distinct_boundaries (a, b, c);
+ANALYZE distinct_boundaries;
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a, b, c from distinct_boundaries
+WHERE b >= 1 and c = 0 ORDER BY a, b;
+                                 QUERY PLAN                                 
+----------------------------------------------------------------------------
+ Index Only Scan using distinct_boundaries_a_b_c_idx on distinct_boundaries
+   Skip scan: true
+   Index Cond: ((b >= 1) AND (c = 0))
+(3 rows)
+
+SELECT DISTINCT ON (a) a, b, c from distinct_boundaries
+WHERE b >= 1 and c = 0 ORDER BY a, b;
+ a | b | c 
+---+---+---
+ 1 | 2 | 0
+ 2 | 2 | 0
+ 3 | 2 | 0
+ 4 | 2 | 0
+ 5 | 2 | 0
+(5 rows)
+
+DROP TABLE distinct_boundaries;
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index a1c90eb905..bd3b373515 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -78,6 +78,7 @@ select name, setting from pg_settings where name like 'enable%';
  enable_hashjoin                | on
  enable_indexonlyscan           | on
  enable_indexscan               | on
+ enable_indexskipscan           | on
  enable_material                | on
  enable_mergejoin               | on
  enable_nestloop                | on
@@ -89,7 +90,7 @@ select name, setting from pg_settings where name like 'enable%';
  enable_seqscan                 | on
  enable_sort                    | on
  enable_tidscan                 | on
-(17 rows)
+(18 rows)
 
 -- Test that the pg_timezone_names and pg_timezone_abbrevs views are
 -- more-or-less working.  We can't test their contents in any great detail
diff --git a/src/test/regress/sql/select_distinct.sql b/src/test/regress/sql/select_distinct.sql
index a605e86449..4c8a50d153 100644
--- a/src/test/regress/sql/select_distinct.sql
+++ b/src/test/regress/sql/select_distinct.sql
@@ -73,3 +73,189 @@ SELECT 1 IS NOT DISTINCT FROM 2 as "no";
 SELECT 2 IS NOT DISTINCT FROM 2 as "yes";
 SELECT 2 IS NOT DISTINCT FROM null as "no";
 SELECT null IS NOT DISTINCT FROM null as "yes";
+
+-- index only skip scan
+CREATE TABLE distinct_a (a int, b int, c int);
+INSERT INTO distinct_a (
+    SELECT five, tenthous, 10 FROM
+    generate_series(1, 5) five,
+    generate_series(1, 10000) tenthous
+);
+CREATE INDEX ON distinct_a (a, b);
+ANALYZE distinct_a;
+
+SELECT DISTINCT a FROM distinct_a;
+SELECT DISTINCT a FROM distinct_a WHERE a = 1;
+SELECT DISTINCT a FROM distinct_a ORDER BY a DESC;
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT a FROM distinct_a;
+
+-- test index skip scan with a condition on a non unique field
+SELECT DISTINCT ON (a) a, b FROM distinct_a WHERE b = 2;
+
+-- test index skip scan backwards
+SELECT DISTINCT ON (a) a, b FROM distinct_a ORDER BY a DESC, b DESC;
+
+-- check colums order
+CREATE INDEX distinct_a_b_a on distinct_a (b, a);
+
+SELECT DISTINCT a FROM distinct_a WHERE b = 2;
+SELECT DISTINCT on (a, b) a, b FROM distinct_a WHERE b = 2;
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT a FROM distinct_a WHERE b = 2;
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT on (a, b) a, b FROM distinct_a WHERE b = 2;
+
+DROP INDEX distinct_a_b_a;
+
+-- test opposite scan/index directions inside a cursor
+-- forward/backward
+BEGIN;
+DECLARE c SCROLL CURSOR FOR
+SELECT DISTINCT ON (a) a,b FROM distinct_a ORDER BY a, b;
+
+FETCH FROM c;
+FETCH BACKWARD FROM c;
+
+FETCH 6 FROM c;
+FETCH BACKWARD 6 FROM c;
+
+FETCH 6 FROM c;
+FETCH BACKWARD 6 FROM c;
+
+END;
+
+-- backward/forward
+BEGIN;
+DECLARE c SCROLL CURSOR FOR
+SELECT DISTINCT ON (a) a,b FROM distinct_a ORDER BY a DESC, b DESC;
+
+FETCH FROM c;
+FETCH BACKWARD FROM c;
+
+FETCH 6 FROM c;
+FETCH BACKWARD 6 FROM c;
+
+FETCH 6 FROM c;
+FETCH BACKWARD 6 FROM c;
+
+END;
+
+-- test missing values and skipping from the end
+CREATE TABLE distinct_abc(a int, b int, c int);
+CREATE INDEX ON distinct_abc(a, b, c);
+INSERT INTO distinct_abc
+	VALUES (1, 1, 1),
+		   (1, 1, 2),
+		   (1, 2, 2),
+		   (1, 2, 3),
+		   (2, 2, 1),
+		   (2, 2, 3),
+		   (3, 1, 1),
+		   (3, 1, 2),
+		   (3, 2, 2),
+		   (3, 2, 3);
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a,b,c FROM distinct_abc WHERE c = 2;
+
+BEGIN;
+DECLARE c SCROLL CURSOR FOR
+SELECT DISTINCT ON (a) a,b,c FROM distinct_abc WHERE c = 2;
+
+FETCH ALL FROM c;
+FETCH BACKWARD ALL FROM c;
+
+END;
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a,b,c FROM distinct_abc WHERE c = 2
+ORDER BY a DESC, b DESC;
+
+BEGIN;
+DECLARE c SCROLL CURSOR FOR
+SELECT DISTINCT ON (a) a,b,c FROM distinct_abc WHERE c = 2
+ORDER BY a DESC, b DESC;
+
+FETCH ALL FROM c;
+FETCH BACKWARD ALL FROM c;
+
+END;
+
+DROP TABLE distinct_abc;
+
+-- index skip scan
+SELECT DISTINCT ON (a) a, b, c
+FROM distinct_a ORDER BY a;
+SELECT DISTINCT ON (a) a, b, c
+FROM distinct_a WHERE a = 1 ORDER BY a;
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a, b, c
+FROM distinct_a ORDER BY a;
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a, b, c
+FROM distinct_a WHERE a = 1 ORDER BY a;
+
+-- check colums order
+SELECT DISTINCT a FROM distinct_a WHERE b = 2 AND c = 10;
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT a FROM distinct_a WHERE b = 2 AND c = 10;
+
+-- check projection case
+SELECT DISTINCT a, a FROM distinct_a WHERE b = 2;
+SELECT DISTINCT a, 1 FROM distinct_a WHERE b = 2;
+
+-- test cursor forward/backward movements
+BEGIN;
+DECLARE c SCROLL CURSOR FOR SELECT DISTINCT a FROM distinct_a;
+
+FETCH FROM c;
+FETCH BACKWARD FROM c;
+
+FETCH 6 FROM c;
+FETCH BACKWARD 6 FROM c;
+
+FETCH 6 FROM c;
+FETCH BACKWARD 6 FROM c;
+
+END;
+
+DROP TABLE distinct_a;
+
+-- test tuples visibility
+CREATE TABLE distinct_visibility (a int, b int);
+INSERT INTO distinct_visibility (select a, b from generate_series(1,5) a, generate_series(1, 10000) b);
+CREATE INDEX ON distinct_visibility (a, b);
+ANALYZE distinct_visibility;
+
+SELECT DISTINCT ON (a) a, b FROM distinct_visibility ORDER BY a, b;
+DELETE FROM distinct_visibility WHERE a = 2 and b = 1;
+SELECT DISTINCT ON (a) a, b FROM distinct_visibility ORDER BY a, b;
+
+SELECT DISTINCT ON (a) a, b FROM distinct_visibility ORDER BY a DESC, b DESC;
+DELETE FROM distinct_visibility WHERE a = 2 and b = 10000;
+SELECT DISTINCT ON (a) a, b FROM distinct_visibility ORDER BY a DESC, b DESC;
+DROP TABLE distinct_visibility;
+
+-- test page boundaries
+CREATE TABLE distinct_boundaries AS
+    SELECT a, b::int2 b, (b % 2)::int2 c FROM
+        generate_series(1, 5) a,
+        generate_series(1,366) b;
+
+CREATE INDEX ON distinct_boundaries (a, b, c);
+ANALYZE distinct_boundaries;
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a, b, c from distinct_boundaries
+WHERE b >= 1 and c = 0 ORDER BY a, b;
+
+SELECT DISTINCT ON (a) a, b, c from distinct_boundaries
+WHERE b >= 1 and c = 0 ORDER BY a, b;
+
+DROP TABLE distinct_boundaries;
-- 
2.21.1

Peter Geoghegan

pg@bowt.ie

almost 6 years ago

In reply to: Jesper Pedersen (#2)

Re: Index Skip Scan

On Mon, Jan 20, 2020 at 11:01 AM Jesper Pedersen
<jesper.pedersen@redhat.com> wrote:

- nbtsearch.c _bt_skip line 1440
if (BTScanPosIsValid(so->currPos) &&
_bt_scankey_within_page(scan, so->skipScanKey, so->currPos.buf, dir))

Is it allowed to look at the high key / low key of the page without having a read lock on it?

In case of a split the page will still contain a high key and a low key,
so this should be ok.

This is definitely not okay.

- nbtsearch.c in general
Most of the code seems to rely quite heavily on the fact that xs_want_itup forces _bt_drop_lock_and_maybe_pin to never release the buffer pin. Have you considered that compacting of a page may still happen even if you hold the pin? [1] I've been trying to come up with cases in which this may break the patch, but I haven't been able to produce such a scenario - so it may be fine.

Try making _bt_findinsertloc() call _bt_vacuum_one_page() whenever the
page is P_HAS_GARBAGE(), regardless of whether or not the page is
about to split. That will still be correct, while having a much better
chance of breaking the patch during stress-testing.

Relying on a buffer pin to prevent the B-Tree structure itself from
changing in any important way seems likely to be broken already. Even
if it isn't, it sounds fragile.

A leaf page doesn't really have anything called a low key. It usually
has a current first "data item"/non-pivot tuple, which is an
inherently unstable thing. Also, it has a very loose relationship with
the high key of the left sibling page, which is the closest thing to
a low key that exists (often they'll have almost the same key values,
but that is not guaranteed at all). While I haven't studied the patch,
the logic within _bt_scankey_within_page() seems fishy to me for that
reason.

There is a BT_READ lock in place when finding the correct leaf page, or
searching within the leaf page itself. _bt_vacuum_one_page deletes only
LP_DEAD tuples, but those are already ignored in _bt_readpage. Peter, do
you have some feedback for this ?

It sounds like the design of the patch relies on doing something other
than stopping a scan "between" pages, in the sense that is outlined in
the commit message of commit 09cb5c0e. If so, then that's a serious
flaw in its design.

--
Peter Geoghegan

Peter Geoghegan

pg@bowt.ie

almost 6 years ago

In reply to: Peter Geoghegan (#3)

Re: Index Skip Scan

On Mon, Jan 20, 2020 at 1:19 PM Peter Geoghegan <pg@bowt.ie> wrote:

On Mon, Jan 20, 2020 at 11:01 AM Jesper Pedersen
<jesper.pedersen@redhat.com> wrote:

- nbtsearch.c _bt_skip line 1440
if (BTScanPosIsValid(so->currPos) &&
_bt_scankey_within_page(scan, so->skipScanKey, so->currPos.buf, dir))

Is it allowed to look at the high key / low key of the page without having a read lock on it?

In case of a split the page will still contain a high key and a low key,
so this should be ok.

This is definitely not okay.

I suggest that you find a way to add assertions to code like
_bt_readpage() that verify that we do in fact have the buffer content
lock. Actually, there is an existing assertion here that covers the
pin, but not the buffer content lock:

static bool
_bt_readpage(IndexScanDesc scan, ScanDirection dir, OffsetNumber offnum)
{
<declare variables>
...

/*
* We must have the buffer pinned and locked, but the usual macro can't be
* used here; this function is what makes it good for currPos.
*/
Assert(BufferIsValid(so->currPos.buf));

You can add another assertion that calls a new utility function in
bufmgr.c. That can use the same logic as this existing assertion in
FlushOneBuffer():

Assert(LWLockHeldByMe(BufferDescriptorGetContentLock(bufHdr)));

We haven't needed assertions like this so far because it's usually
clear whether or not a buffer lock is held (plus the bufmgr.c
assertions help on their own). The fact that it isn't clear whether or
not a buffer lock will be held by caller here suggests a problem. Even
still, having some guard rails in the form of these assertions could
be helpful. Also, it seems like _bt_scankey_within_page() should have
a similar set of assertions.

BTW, there is a paper that describes optimizations like loose index
scan and skip scan together, in fairly general terms: "Efficient
Search of Multidimensional B-Trees". Loose index scans are given the
name "MDAM duplicate elimination" in the paper. See:

http://vldb.org/conf/1995/P710.PDF

Goetz Graefe told me about the paper. It seems like the closest thing
that exists to a taxonomy or conceptual framework for these
techniques.

--
Peter Geoghegan

Dmitry Dolgov

9erthalion6@gmail.com

almost 6 years ago

In reply to: Peter Geoghegan (#4)

Re: Index Skip Scan

On Mon, Jan 20, 2020 at 05:05:33PM -0800, Peter Geoghegan wrote:

I suggest that you find a way to add assertions to code like
_bt_readpage() that verify that we do in fact have the buffer content
lock. Actually, there is an existing assertion here that covers the
pin, but not the buffer content lock:

static bool
_bt_readpage(IndexScanDesc scan, ScanDirection dir, OffsetNumber offnum)
{
<declare variables>
...

/*
* We must have the buffer pinned and locked, but the usual macro can't be
* used here; this function is what makes it good for currPos.
*/
Assert(BufferIsValid(so->currPos.buf));

You can add another assertion that calls a new utility function in
bufmgr.c. That can use the same logic as this existing assertion in
FlushOneBuffer():

Assert(LWLockHeldByMe(BufferDescriptorGetContentLock(bufHdr)));

We haven't needed assertions like this so far because it's usually
clear whether or not a buffer lock is held (plus the bufmgr.c
assertions help on their own). The fact that it isn't clear whether or
not a buffer lock will be held by caller here suggests a problem. Even
still, having some guard rails in the form of these assertions could
be helpful. Also, it seems like _bt_scankey_within_page() should have
a similar set of assertions.

Thanks for the suggestion. Agreed, we will add such guards. It seems that in
general I need to go through the locking in the patch one more time,
since there are some gaps I didn't notice/didn't know about before.

BTW, there is a paper that describes optimizations like loose index
scan and skip scan together, in fairly general terms: "Efficient
Search of Multidimensional B-Trees". Loose index scans are given the
name "MDAM duplicate elimination" in the paper. See:

http://vldb.org/conf/1995/P710.PDF

Goetz Graefe told me about the paper. It seems like the closest thing
that exists to a taxonomy or conceptual framework for these
techniques.

Yes, I've read this paper, as it's indeed the only reference I found
about this topic in literature. But unfortunately it's not much and (at
least from the first read) gives only an overview of the idea.

Dmitry Dolgov

9erthalion6@gmail.com

almost 6 years ago

In reply to: Peter Geoghegan (#3)

Re: Index Skip Scan

On Mon, Jan 20, 2020 at 01:19:30PM -0800, Peter Geoghegan wrote:

Thanks for the comments. I'm trying to clarify your conclusions for
myself, so a couple of questions.

- nbtsearch.c in general
Most of the code seems to rely quite heavily on the fact that xs_want_itup forces _bt_drop_lock_and_maybe_pin to never release the buffer pin. Have you considered that compacting of a page may still happen even if you hold the pin? [1] I've been trying to come up with cases in which this may break the patch, but I haven't been able to produce such a scenario - so it may be fine.

Try making _bt_findinsertloc() call _bt_vacuum_one_page() whenever the
page is P_HAS_GARBAGE(), regardless of whether or not the page is
about to split. That will still be correct, while having a much better
chance of breaking the patch during stress-testing.

Relying on a buffer pin to prevent the B-Tree structure itself from
changing in any important way seems likely to be broken already. Even
if it isn't, it sounds fragile.

Except for checking low/high key (which should be done with a lock), I
believe the current implementation follows the same pattern I see quite
often, namely

* get a lock on a page of interest and test its values (if we can find
next distinct value right on the next one without going down the tree).

* if not, unlock the current page, search within the tree with
_bt_search (which locks a resulting new page) and examine values on a
new page, when necessary do _bt_steppage

Is there an obvious problem with this approach, when it comes to the
page structure modification?
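
To make the two steps above concrete, here is a minimal, self-contained sketch. It is not code from the patch: ToyBuf, toy_lock, toy_search and the other toy_* names are hypothetical stand-ins, the "tree" is just an array of pages, and the lock is a plain flag. The only point it shows is that every read of page contents happens while the (toy) content lock is held, with assertions as guard rails. It does not model concurrent page splits, which is where the concern raised in this thread lies.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_PAGE_CAP 4

/* Toy buffer: sorted keys plus a flag standing in for the content lock. */
typedef struct ToyBuf
{
    int  keys[TOY_PAGE_CAP];
    int  nkeys;
    bool locked;
} ToyBuf;

void toy_lock(ToyBuf *b)   { assert(!b->locked); b->locked = true; }
void toy_unlock(ToyBuf *b) { assert(b->locked);  b->locked = false; }

/* Smallest key on the page greater than prev; the page must be locked. */
bool
toy_next_on_page(const ToyBuf *b, int prev, int *out)
{
    assert(b->locked);          /* guard rail: never read an unlocked page */
    for (int i = 0; i < b->nkeys; i++)
        if (b->keys[i] > prev)
        {
            *out = b->keys[i];
            return true;
        }
    return false;
}

/*
 * Stand-in for a fresh descent from the root: returns the first page that
 * can hold a key greater than prev, still locked, or -1 if there is none.
 */
int
toy_search(ToyBuf *pages, int npages, int prev)
{
    for (int p = 0; p < npages; p++)
    {
        toy_lock(&pages[p]);
        if (pages[p].nkeys > 0 && pages[p].keys[pages[p].nkeys - 1] > prev)
            return p;           /* caller is responsible for unlocking */
        toy_unlock(&pages[p]);
    }
    return -1;
}

bool
toy_next_distinct(ToyBuf *pages, int npages, int *cur, int prev, int *out)
{
    bool found;
    int  target;

    /* Step 1: lock the page of interest and test its values. */
    toy_lock(&pages[*cur]);
    found = toy_next_on_page(&pages[*cur], prev, out);
    toy_unlock(&pages[*cur]);
    if (found)
        return true;

    /* Step 2: otherwise search again; the new page comes back locked. */
    target = toy_search(pages, npages, prev);
    if (target < 0)
        return false;
    found = toy_next_on_page(&pages[target], prev, out);
    toy_unlock(&pages[target]);
    *cur = target;
    return found;
}
```

With two toy pages [1,1,2,2] and [3,3,5], calling toy_next_distinct with prev = 1 is answered from the current page, while prev = 2 falls through to the search step and moves the scan to the second page.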

A leaf page doesn't really have anything called a low key. It usually
has a current first "data item"/non-pivot tuple, which is an
inherently unstable thing.

Would this inherent instability be resolved for this particular case by
having a lock on a page while checking a first data item, or is there
something else I need to take into account?

There is a BT_READ lock in place when finding the correct leaf page, or
searching within the leaf page itself. _bt_vacuum_one_page deletes only
LP_DEAD tuples, but those are already ignored in _bt_readpage. Peter, do
you have some feedback for this ?

It sounds like the design of the patch relies on doing something other
than stopping a scan "between" pages, in the sense that is outlined in
the commit message of commit 09cb5c0e. If so, then that's a serious
flaw in its design.

Could you please elaborate on why it sounds like that? If I understand
correctly, to stop a scan only "between" pages one needs to use only
_bt_readpage/_bt_steppage? Other than that there is no magic with scan
position in the patch, so I'm not sure if I'm missing something here.

Jesper Pedersen

jesper.pedersen@redhat.com

almost 6 years ago

In reply to: Peter Geoghegan (#4)

1 attachment(s)

Re: Index Skip Scan

Hi Peter,

Thanks for your feedback; Dmitry has followed up with some additional
questions.

On 1/20/20 8:05 PM, Peter Geoghegan wrote:

This is definitely not okay.

I suggest that you find a way to add assertions to code like
_bt_readpage() that verify that we do in fact have the buffer content
lock.

If you apply the attached patch on master it will fail the test suite;
did you mean something else ?

Best regards,
Jesper

Attachments:

buffer_locked.txt (text/plain; charset=UTF-8)

diff --git a/src/backend/access/nbtree/nbtsearch.c b/src/backend/access/nbtree/nbtsearch.c
index 4189730f3a..57882f0b8d 100644
--- a/src/backend/access/nbtree/nbtsearch.c
+++ b/src/backend/access/nbtree/nbtsearch.c
@@ -1721,6 +1721,7 @@ _bt_readpage(IndexScanDesc scan, ScanDirection dir, OffsetNumber offnum)
 	 * used here; this function is what makes it good for currPos.
 	 */
 	Assert(BufferIsValid(so->currPos.buf));
+	Assert(IsBufferPinnedAndLocked(so->currPos.buf));
 
 	page = BufferGetPage(so->currPos.buf);
 	opaque = (BTPageOpaque) PageGetSpecialPointer(page);
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index aba3960481..f29f40f9b6 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -3774,6 +3774,23 @@ HoldingBufferPinThatDelaysRecovery(void)
 	return false;
 }
 
+/*
+ * Assert that the buffer is pinned and locked
+ */
+bool
+IsBufferPinnedAndLocked(Buffer buffer)
+{
+	BufferDesc *bufHdr;
+
+	Assert(BufferIsPinned(buffer));
+
+	bufHdr = GetBufferDescriptor(buffer - 1);
+
+	Assert(LWLockHeldByMe(BufferDescriptorGetContentLock(bufHdr)));
+
+	return true;
+}
+
 /*
  * ConditionalLockBufferForCleanup - as above, but don't wait to get the lock
  *
diff --git a/src/include/storage/bufmgr.h b/src/include/storage/bufmgr.h
index 73c7e9ba38..46a6aa6560 100644
--- a/src/include/storage/bufmgr.h
+++ b/src/include/storage/bufmgr.h
@@ -217,6 +217,7 @@ extern void LockBufferForCleanup(Buffer buffer);
 extern bool ConditionalLockBufferForCleanup(Buffer buffer);
 extern bool IsBufferCleanupOK(Buffer buffer);
 extern bool HoldingBufferPinThatDelaysRecovery(void);
+extern bool IsBufferPinnedAndLocked(Buffer buffer);
 
 extern void AbortBufferIO(void);

Floris Van Nee

florisvannee@Optiver.com

almost 6 years ago

In reply to: Dmitry Dolgov (#6)

Could you please elaborate on why it sounds like that? If I understand
correctly, to stop a scan only "between" pages one needs to use only
_bt_readpage/_bt_steppage? Other than that there is no magic with scan
position in the patch, so I'm not sure if I'm missing something here.

Anyone please correct me if I'm wrong, but I think one case where the current patch relies on some data from the page it has locked before is in checking this hi/lo key. I think it's possible for the following sequence to happen. Suppose we have a very simple one leaf-page btree containing four elements: leaf page 1 = [2,4,6,8]
We do a backwards index skip scan on this and have just returned our first tuple (8). The buffer is left pinned but unlocked. Now, someone else comes in and inserts a tuple (value 5) into this page, but suppose the page happens to be full. So a page split occurs. As far as I know, a page split could happen at any random element in the page. One of the situations we could be left with is:
Leaf page 1 = [2,4]
Leaf page 2 = [5,6,8]
However, our scan is still pointing to leaf page 1. For non-skip scans this is not a problem, as we already read all matching elements in our local buffer and we'll return those. But the skip scan currently:
a) checks the lo-key of the page to see if the next prefix can be found on the leaf page 1
b) finds out that this is actually true
c) does a search on the page and returns value=4 (while it should have returned value=6)
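
The sequence above can be replayed with a small, self-contained sketch. This is a toy model, not PostgreSQL code: ToyPage and the toy_* functions are hypothetical, pages are plain sorted arrays, and toy_within_page_low_only is only an assumed simplification of a check that compares the skip key against the low end of the page and nothing else.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy leaf page holding sorted integer keys; not PostgreSQL's page layout. */
typedef struct ToyPage
{
    int    keys[8];
    size_t nkeys;
} ToyPage;

/*
 * Hypothetical stand-in for a check that only looks at the first key on
 * the page: "is skip_key at or above the low end of this page?"
 */
bool
toy_within_page_low_only(const ToyPage *page, int skip_key)
{
    return page->nkeys > 0 && skip_key >= page->keys[0];
}

/* Backward step: largest key strictly below bound, or -1 if there is none. */
int
toy_prev_key(const int *keys, size_t nkeys, int bound)
{
    int result = -1;

    for (size_t i = 0; i < nkeys; i++)
        if (keys[i] < bound)
            result = keys[i];
    return result;
}

/* What the scan sees when it trusts its stale pointer to leaf page 1. */
int
toy_stale_answer(void)
{
    ToyPage page1 = {{2, 4}, 2};    /* leaf page 1 after the split */
    int     last_returned = 8;      /* tuple returned before the split */

    if (toy_within_page_low_only(&page1, last_returned))
        return toy_prev_key(page1.keys, page1.nkeys, last_returned);
    return -1;
}

/* What a search over the whole index would give. */
int
toy_expected_answer(void)
{
    int all_keys[] = {2, 4, 5, 6, 8};   /* leaf pages 1 and 2 together */

    return toy_prev_key(all_keys, 5, 8);
}
```

In this model toy_stale_answer() yields 4 while toy_expected_answer() yields 6, which is the discrepancy described in steps a) to c).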

Peter, is my understanding about the btree internals correct so far?

Now that I look at the patch again, I fear there currently may also be such a dependency in the "Advance forward but read backward"-case. It saves the offset number of a tuple in a variable, then does a _bt_search (releasing the lock and pin on the page). At this point, anything can happen to the tuples on this page - the page may be compacted by vacuum such that the offset number you have in your variable does not match the actual offset number of the tuple on the page anymore. Then, at the check for (nextOffset == startOffset) later, there's a possibility the offsets are different even though they relate to the same tuple.
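
A toy illustration of why a saved offset number is not a stable identifier, under stated assumptions: ToyLeaf, ToyItem and the toy_* functions are hypothetical, the heap TID is reduced to one integer, and toy_compact stands in for any operation that removes dead items and shifts the rest. It is not the patch's code.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct ToyItem
{
    int  key;
    int  heap_tid;      /* simplified: a real TID is (block, offset) */
    bool dead;          /* stand-in for an LP_DEAD item */
} ToyItem;

typedef struct ToyLeaf
{
    ToyItem items[8];
    size_t  nitems;
} ToyLeaf;

/* Remove dead items, shifting the survivors down to lower offsets. */
void
toy_compact(ToyLeaf *leaf)
{
    size_t out = 0;

    for (size_t i = 0; i < leaf->nitems; i++)
        if (!leaf->items[i].dead)
            leaf->items[out++] = leaf->items[i];
    leaf->nitems = out;
}

/* 1-based offset of the item with this key and TID, or 0 if it is absent. */
size_t
toy_offset_of(const ToyLeaf *leaf, int key, int heap_tid)
{
    for (size_t i = 0; i < leaf->nitems; i++)
        if (leaf->items[i].key == key && leaf->items[i].heap_tid == heap_tid)
            return i + 1;
    return 0;
}
```

Starting from items (10,100), (20,200, dead), (30,300), the tuple (30,300) sits at offset 3. After toy_compact it sits at offset 2, so a comparison of the saved and current offsets reports a difference even though both refer to the same tuple; comparing key and TID instead still matches.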

-Floris

Dmitry Dolgov

9erthalion6@gmail.com

almost 6 years ago

In reply to: Floris Van Nee (#8)

Re: Index Skip Scan

On Wed, Jan 22, 2020 at 07:50:30AM +0000, Floris Van Nee wrote:

Anyone please correct me if I'm wrong, but I think one case where the current patch relies on some data from the page it has locked before is in checking this hi/lo key. I think it's possible for the following sequence to happen. Suppose we have a very simple one leaf-page btree containing four elements: leaf page 1 = [2,4,6,8]
We do a backwards index skip scan on this and have just returned our first tuple (8). The buffer is left pinned but unlocked. Now, someone else comes in and inserts a tuple (value 5) into this page, but suppose the page happens to be full. So a page split occurs. As far as I know, a page split could happen at any random element in the page. One of the situations we could be left with is:
Leaf page 1 = [2,4]
Leaf page 2 = [5,6,8]
However, our scan is still pointing to leaf page 1.

In case we just returned a tuple, the next action would be to either
check the next page for another key or search down the tree. Maybe
I'm missing something in your scenario, but the latter will land us on a
required page (we do not point to any leaf here), and before the former
there is a check for high/low key. Is there anything else missing?

Now that I look at the patch again, I fear there currently may also be such a dependency in the "Advance forward but read backward"-case. It saves the offset number of a tuple in a variable, then does a _bt_search (releasing the lock and pin on the page). At this point, anything can happen to the tuples on this page - the page may be compacted by vacuum such that the offset number you have in your variable does not match the actual offset number of the tuple on the page anymore. Then, at the check for (nextOffset == startOffset) later, there's a possibility the offsets are different even though they relate to the same tuple.

Interesting point. The original idea here was to check that we have not
returned to the same position after jumping, so maybe instead of offsets
we can compare the tuple we found.

#10

Floris Van Nee

florisvannee@Optiver.com

almost 6 years ago

In reply to: Dmitry Dolgov (#9)

Re: Index Skip Scan

Hi Dmitry,

On Wed, Jan 22, 2020 at 07:50:30AM +0000, Floris Van Nee wrote:

Anyone please correct me if I'm wrong, but I think one case where the current patch relies on some data from the page it has locked before is in checking this hi/lo key. I think it's possible for the following sequence to happen. Suppose we have a very simple one leaf-page btree containing four elements: leaf page 1 = [2,4,6,8]
We do a backwards index skip scan on this and have just returned our first tuple (8). The buffer is left pinned but unlocked. Now, someone else comes in and inserts a tuple (value 5) into this page, but suppose the page happens to be full. So a page split occurs. As far as I know, a page split could happen at any random element in the page. One of the situations we could be left with is:
Leaf page 1 = [2,4]
Leaf page 2 = [5,6,8]
However, our scan is still pointing to leaf page 1.

In case we just returned a tuple, the next action would be to either
check the next page for another key or search down the tree. Maybe

But it won't look at the 'next page for another key', but rather at the 'same page for another key', right? In the _bt_scankey_within_page shortcut we're taking, there's no stepping to a next page involved. It just locks again the page that it previously locked.

I'm missing something in your scenario, but the latter will land us on a
required page (we do not point to any leaf here), and before the former
there is a check for high/low key. Is there anything else missing?

Let me try to clarify. After we return the first tuple, so->currPos.buf is pointing to page=1 in my example (it's the only page after all). We've returned item=8. Then the split happens and the items get rearranged as in my example. We're still pointing with so->currPos.buf to page=1, but the page now contains [2,4]. The split happened to the right, so there's a page=2 with [5,6,8], however the ongoing index scan is unaware of that.
Now _bt_skip gets called to fetch the next tuple. It starts by checking _bt_scankey_within_page(scan, so->skipScanKey, so->currPos.buf, dir), the result of which will be 'true': we're comparing the skip key to the low key of the page. So it thinks the next key can be found on the current page. It locks the page and does a _bt_binsrch, finding item=4 to be returned.

The problem here is that _bt_scankey_within_page mistakenly returns true, thereby limiting the search to just the page that it's pointing to already.
It may be fine to just fix this function to return the proper value (I guess it'd also need to look at the high key in this example). It could also be fixed by not looking at the lo/hi key of the page, but using the local tuple buffer instead. We already did a _bt_readpage once, so if we have any matching tuples on that specific page, we have them locally in the buffer already. That way we never need to lock the same page twice.
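
A rough sketch of the second option, assuming a scan-local copy of the page's keys is made once while the page is locked. ToyScanPos, toy_read_page and toy_prev_local are hypothetical names for illustration; the real scan position structure and _bt_readpage are considerably more involved, and a real implementation would still need a fresh search from the root when the local copy has no match.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_ITEMS 8

/* Scan-local snapshot of the keys that were on the page when it was read. */
typedef struct ToyScanPos
{
    int    items[TOY_MAX_ITEMS];
    size_t nitems;
} ToyScanPos;

/* Copy the page's keys into scan-local memory (done while the page is locked). */
bool
toy_read_page(ToyScanPos *pos, const int *page_keys, size_t nkeys)
{
    if (nkeys > TOY_MAX_ITEMS)
        return false;
    memcpy(pos->items, page_keys, nkeys * sizeof(int));
    pos->nitems = nkeys;
    return true;
}

/*
 * Backward step over the local copy only: largest key strictly below bound.
 * Returns false when the local copy holds no such key.
 */
bool
toy_prev_local(const ToyScanPos *pos, int bound, int *out)
{
    bool found = false;

    for (size_t i = 0; i < pos->nitems; i++)
        if (pos->items[i] < bound)
        {
            *out = pos->items[i];
            found = true;
        }
    return found;
}
```

If the shared page [2,4,6,8] is read into the local copy and then shrinks to [2,4] because of a split, toy_prev_local(&pos, 8, &next) still finds 6, since it never goes back to the shared page.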

#11

Peter Geoghegan

pg@bowt.ie

almost 6 years ago

In reply to: Jesper Pedersen (#7)

Re: Index Skip Scan

On Tue, Jan 21, 2020 at 9:06 AM Jesper Pedersen
<jesper.pedersen@redhat.com> wrote:

If you apply the attached patch on master it will fail the test suite;
did you mean something else ?

Yeah, this is exactly what I had in mind for the _bt_readpage() assertion.

As I said, it isn't a great sign that this kind of assertion is even
necessary in index access method code (code like bufmgr.c is another
matter). Usually it's just obvious that a buffer lock is held. I can't
really blame this patch for that, though. You could say the same thing
about the existing "buffer pin held" _bt_readpage() assertion. It's
good that it verifies what is actually a fragile assumption, even
though I'd prefer to not make a fragile assumption.

--
Peter Geoghegan

#12

Peter Geoghegan

pg@bowt.ie

almost 6 years ago

In reply to: Floris Van Nee (#8)

Re: Index Skip Scan

On Tue, Jan 21, 2020 at 11:50 PM Floris Van Nee
<florisvannee@optiver.com> wrote:

Anyone please correct me if I'm wrong, but I think one case where the current patch relies on some data from the page it has locked before is in checking this hi/lo key. I think it's possible for the following sequence to happen. Suppose we have a very simple one leaf-page btree containing four elements: leaf page 1 = [2,4,6,8]
We do a backwards index skip scan on this and have just returned our first tuple (8). The buffer is left pinned but unlocked. Now, someone else comes in and inserts a tuple (value 5) into this page, but suppose the page happens to be full. So a page split occurs. As far as I know, a page split could happen at any random element in the page. One of the situations we could be left with is:
Leaf page 1 = [2,4]
Leaf page 2 = [5,6,8]
However, our scan is still pointing to leaf page 1. For non-skip scans this is not a problem, as we already read all matching elements in our local buffer and we'll return those. But the skip scan currently:
a) checks the lo-key of the page to see if the next prefix can be found on the leaf page 1
b) finds out that this is actually true
c) does a search on the page and returns value=4 (while it should have returned value=6)

Peter, is my understanding about the btree internals correct so far?

This is a good summary. This is the kind of scenario I had in mind
when I expressed a general concern about "stopping between pages".
Processing a whole page at a time is a crucial part of how
_bt_readpage() currently deals with concurrent page splits.

Holding a buffer pin on a leaf page is only effective as an interlock
against VACUUM completely removing a tuple, which could matter with
non-MVCC scans.

Now that I look at the patch again, I fear there currently may also be such a dependency in the "Advance forward but read backward"-case. It saves the offset number of a tuple in a variable, then does a _bt_search (releasing the lock and pin on the page). At this point, anything can happen to the tuples on this page - the page may be compacted by vacuum such that the offset number you have in your variable does not match the actual offset number of the tuple on the page anymore. Then, at the check for (nextOffset == startOffset) later, there's a possibility the offsets are different even though they relate to the same tuple.

If skip scan is restricted to heapkeyspace indexes (i.e. those created
on Postgres 12+), then it might be reasonable to save an index tuple,
and relocate it within the same page using a fresh binary search that
uses a scankey derived from the same index tuple -- without unsetting
scantid/the heap TID scankey attribute. I suppose that you'll need to
"find your place again" after releasing the buffer lock on a leaf page
for a time. Also, I think that this will only be safe with MVCC scans,
because otherwise the page could be concurrently deleted by VACUUM.
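
To illustrate the "find your place again" idea, here is a minimal sketch, assuming a toy page modelled as a sorted array of (key, heap TID) pairs. ToyTuple, toy_compare and toy_relocate are hypothetical names for illustration only, not nbtree functions. The point is that a saved offset number can go stale, while a binary search on the full (key, TID) identifies the same tuple wherever it now sits.

```c
#include <assert.h>
#include <stddef.h>

/* Toy index tuple: a key plus a heap TID tiebreaker (hypothetical layout). */
typedef struct ToyTuple
{
	int			key;
	unsigned	tid;
} ToyTuple;

/* Total order over (key, tid), as in an index where heap TID is the last key column. */
static int
toy_compare(const ToyTuple *a, const ToyTuple *b)
{
	if (a->key != b->key)
		return (a->key < b->key) ? -1 : 1;
	if (a->tid != b->tid)
		return (a->tid < b->tid) ? -1 : 1;
	return 0;
}

/*
 * Find the saved tuple again on a page whose offsets may have shifted.
 * Returns its current offset, or -1 if it is no longer on this page.
 */
static int
toy_relocate(const ToyTuple *items, int nitems, const ToyTuple *saved)
{
	int			low = 0;
	int			high = nitems;

	/* Lower-bound binary search for the first item >= saved. */
	while (low < high)
	{
		int			mid = low + (high - low) / 2;

		if (toy_compare(&items[mid], saved) < 0)
			low = mid + 1;
		else
			high = mid;
	}

	if (low < nitems && toy_compare(&items[low], saved) == 0)
		return low;
	return -1;
}
```

If the tuple is not found on the page (the -1 case), a real implementation would presumably have to fall back to a fresh descent from the root; that part is not modelled here.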

--
Peter Geoghegan

#13

Peter Geoghegan

pg@bowt.ie

almost 6 years ago

In reply to: Peter Geoghegan (#12)

Re: Index Skip Scan

On Wed, Jan 22, 2020 at 10:55 AM Peter Geoghegan <pg@bowt.ie> wrote:

This is a good summary. This is the kind of scenario I had in mind
when I expressed a general concern about "stopping between pages".
Processing a whole page at a time is a crucial part of how
_bt_readpage() currently deals with concurrent page splits.

Note in particular that index scans cannot return the same index tuple
twice -- processing a page at a time ensures that that cannot happen.

Can a loose index scan return the same tuple (i.e. a tuple with the
same heap TID) to the executor more than once?
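
As a rough illustration of why processing a whole page at a time protects against returning a tuple twice, here is a minimal sketch, assuming a toy page that is just an array of integers. ToyPage, ToyScanPos, toy_read_page and toy_next are hypothetical names, not the real _bt_readpage() machinery. The scan copies everything it needs in one step and afterwards only consumes its private copy, so later changes to the shared page cannot make it skip or repeat an item.

```c
#include <assert.h>
#include <string.h>

#define TOY_MAX_ITEMS 8

/* Shared page that concurrent inserters may rearrange (toy model). */
typedef struct ToyPage
{
	int			items[TOY_MAX_ITEMS];
	int			nitems;
} ToyPage;

/* Scan-private copy of the items read from one page. */
typedef struct ToyScanPos
{
	int			items[TOY_MAX_ITEMS];
	int			nitems;
	int			next;
} ToyScanPos;

/* Copy every item into the private buffer; the real code does this under a buffer lock. */
static void
toy_read_page(const ToyPage *page, ToyScanPos *pos)
{
	memcpy(pos->items, page->items, sizeof(int) * (size_t) page->nitems);
	pos->nitems = page->nitems;
	pos->next = 0;
}

/* Hand out the next buffered item; returns 0 once the buffer is exhausted. */
static int
toy_next(ToyScanPos *pos, int *item)
{
	if (pos->next >= pos->nitems)
		return 0;
	*item = pos->items[pos->next++];
	return 1;
}
```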

--
Peter Geoghegan

#14

Floris Van Nee

florisvannee@Optiver.com

almost 6 years ago

In reply to: Peter Geoghegan (#13)

Note in particular that index scans cannot return the same index tuple twice --
processing a page at a time ensures that that cannot happen.

Can a loose index scan return the same tuple (i.e. a tuple with the same heap
TID) to the executor more than once?

The loose index scan shouldn't return a tuple twice. It should only be able to skip 'further', so that shouldn't be a problem. Out of curiosity, why can't index scans return the same tuple twice? Is there something in the executor that isn't able to handle this?

-Floris

#15

Dmitry Dolgov

9erthalion6@gmail.com

almost 6 years ago

In reply to: Floris Van Nee (#10)

Re: Index Skip Scan

On Wed, Jan 22, 2020 at 05:24:43PM +0000, Floris Van Nee wrote:

Anyone please correct me if I'm wrong, but I think there is one case where the current patch relies on data from a page it locked earlier when checking this hi/lo key. I think it's possible for the following sequence to happen. Suppose we have a very simple one leaf-page btree containing four elements: leaf page 1 = [2,4,6,8]
We do a backwards index skip scan on this and have just returned our first tuple (8). The buffer is left pinned but unlocked. Now, someone else comes in and inserts a tuple (value 5) into this page, but suppose the page happens to be full. So a page split occurs. As far as I know, a page split could happen at any random element in the page. One of the situations we could be left with is:
Leaf page 1 = [2,4]
Leaf page 2 = [5,6,8]
However, our scan is still pointing to leaf page 1.

In case we just returned a tuple, the next action would be either to
check the next page for another key or to search down the tree. Maybe

But it won't look at the 'next page for another key', but rather at the 'same page or another key', right? In the _bt_scankey_within_page shortcut we're taking, there's no stepping to a next page involved. It just locks the page again that it previously also locked.

Yep, it would look only at the same page. Not sure what you mean by
"another key"; if the current key is not found within the current page
at the first stage, we restart from the root.

I'm missing something in your scenario, but the latter will land us on a
required page (we do not point to any leaf here), and before the former
there is a check for high/low key. Is there anything else missing?

Let me try to clarify. After we return the first tuple, so->currPos.buf is pointing to page=1 in my example (it's the only page after all). We've returned item=8. Then the split happens and the items get rearranged as in my example. We're still pointing with so->currPos.buf to page=1, but the page now contains [2,4]. The split happened to the right, so there's a page=2 with [5,6,8], however the ongoing index scan is unaware of that.
Now _bt_skip gets called to fetch the next tuple. It starts by checking _bt_scankey_within_page(scan, so->skipScanKey, so->currPos.buf, dir), the result of which will be 'true': we're comparing the skip key to the low key of the page. So it thinks the next key can be found on the current page. It locks the page and does a _binsrch, finding item=4 to be returned.

The problem here is that _bt_scankey_within_page mistakenly returns true, thereby limiting the search to just the page that it's pointing to already.
It may be fine to just fix this function to return the proper value (I guess it'd also need to look at the high key in this example). It could also be fixed by not looking at the lo/hi key of the page, but to use the local tuple buffer instead. We already did a _read_page once, so if we have any matching tuples on that specific page, we have them locally in the buffer already. That way we never need to lock the same page twice.

Oh, that's what you mean. Yes, I was somehow tricked by the name of this
function and didn't notice that it checks only one boundary, so in the case
of a backward scan it returns the wrong result. I think in the situation
you've described it would actually not find any item on the current page
and restart from the root, but nevertheless we need to check both
keys in _bt_scankey_within_page.
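
The split scenario above can be sketched with a toy model. This is only an illustration under stated assumptions: ToyPage and the toy_* functions are hypothetical, the first item on the page stands in for a "low key" (real nbtree pages have no physical low key), and the high key is treated as an inclusive upper bound. With a one-sided check, a backward scan positioned at 8 wrongly concludes that the stale page [2,4] still covers its search key and returns 4; adding the high key comparison rejects the page.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Toy leaf page: sorted items plus an inclusive upper bound (hypothetical layout). */
typedef struct ToyPage
{
	int			items[8];
	int			nitems;
	int			high_key;		/* all items are <= high_key; INT_MAX if rightmost */
} ToyPage;

/* One-sided check: only compares the key with the first item on the page. */
static bool
toy_within_page_lower_only(const ToyPage *page, int key)
{
	return page->nitems > 0 && key >= page->items[0];
}

/* Two-sided check: the key must also not exceed the page's high key. */
static bool
toy_within_page_both(const ToyPage *page, int key)
{
	return toy_within_page_lower_only(page, key) && key <= page->high_key;
}

/* Largest item strictly less than key, or -1 if there is none (items are positive here). */
static int
toy_prev_item(const ToyPage *page, int key)
{
	int			result = -1;

	for (int i = 0; i < page->nitems; i++)
	{
		if (page->items[i] < key)
			result = page->items[i];
	}
	return result;
}
```

When the two-sided check fails, the toy scan would have to restart from the root rather than trust the page it still has pinned.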

#16

Dmitry Dolgov

9erthalion6@gmail.com

almost 6 years ago

In reply to: Floris Van Nee (#14)

Re: Index Skip Scan

On Wed, Jan 22, 2020 at 09:08:59PM +0000, Floris Van Nee wrote:

Note in particular that index scans cannot return the same index tuple twice --
processing a page at a time ensures that that cannot happen.

Can a loose index scan return the same tuple (i.e. a tuple with the same heap
TID) to the executor more than once?

The loose index scan shouldn't return a tuple twice. It should only be able to skip 'further', so that shouldn't be a problem.

Yes, it shouldn't happen.

#17

Peter Geoghegan

pg@bowt.ie

almost 6 years ago

In reply to: Floris Van Nee (#14)

Re: Index Skip Scan

On Wed, Jan 22, 2020 at 1:09 PM Floris Van Nee <florisvannee@optiver.com> wrote:

The loose index scan shouldn't return a tuple twice. It should only be able to skip 'further', so that shouldn't be a problem. Out of curiosity, why can't index scans return the same tuple twice? Is there something in the executor that isn't able to handle this?

I have no reason to believe that the executor has a problem with index
scans that return a tuple more than once, aside from the very obvious:
in general, that will often be wrong. It might not be wrong when the
scan happens to be input to a unique node anyway, or something like
that.

I'm not particularly concerned about it. Just wanted to be clear on
our assumptions for loose index scans -- if loose index scans were
allowed to return a tuple more than once, then that would at least
have to be considered in the wider context of the executor
(but apparently they're not, so no need to worry about it). This may
have been mentioned somewhere already. If it is then I must have
missed it.

--
Peter Geoghegan

#18

Peter Geoghegan

pg@bowt.ie

almost 6 years ago

In reply to: Dmitry Dolgov (#15)

Re: Index Skip Scan

On Wed, Jan 22, 2020 at 1:35 PM Dmitry Dolgov <9erthalion6@gmail.com> wrote:

Oh, that's what you mean. Yes, I was somehow tricked by the name of this
function and didn't notice that it checks only one boundary, so in the case
of a backward scan it returns the wrong result. I think in the situation
you've described it would actually not find any item on the current page
and restart from the root, but nevertheless we need to check both
keys in _bt_scankey_within_page.

I suggest reading the nbtree README file's description of backwards
scans. Read the paragraph that begins with 'We support the notion of
an ordered "scan" of an index...'. I also suggest that you read a bit
of the stuff in the large section on page deletion. Certainly read the
paragraph that begins with 'Moving left in a backward scan is
complicated because...'.

It's important to grok why it's okay that we don't "couple" or "crab"
buffer locks as we descend the tree with Lehman & Yao's design -- we
can get away with having *no* interlock against page splits (e.g.,
pin, buffer lock) when we are "between" levels of the tree. This is
safe, since the page that we land on must still be "substantively the
same page", no matter how much time passes. That is, it must at least
cover the leftmost portion of the keyspace covered by the original
version of the page that we saw that we needed to descend to within
the parent page. The worst that can happen is that we have to recover
from a concurrent page split by moving right one or more times.
(Actually, page deletion can change the contents of a page entirely,
but that's not really an exception to the general rule -- page
deletion is careful about recycling pages that an in flight index scan
might land on.)
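
The "move right" recovery can be sketched as follows. This is a minimal toy, assuming leaf pages are entries in an array linked by right-link indexes and that a high key is an inclusive upper bound; ToyLeaf, TOY_NO_PAGE and toy_move_right are hypothetical names and do not mirror the real nbtree code. A descent that lands on a page that has since split simply follows right links until the high key covers the search key.

```c
#include <assert.h>
#include <limits.h>

#define TOY_NO_PAGE (-1)

/* Toy leaf page header: upper bound plus a link to the right sibling. */
typedef struct ToyLeaf
{
	int			high_key;		/* all items on the page are <= high_key */
	int			right_link;		/* array index of right sibling, or TOY_NO_PAGE */
} ToyLeaf;

/*
 * Starting from the page we landed on, step right while the search key
 * lies beyond the current page's high key.  Returns the page that covers
 * the key.
 */
static int
toy_move_right(const ToyLeaf *pages, int start, int key)
{
	int			cur = start;

	while (pages[cur].right_link != TOY_NO_PAGE && key > pages[cur].high_key)
		cur = pages[cur].right_link;
	return cur;
}
```

Nothing comparable exists for moving left, which is why backward scans need the extra re-checking described in the README.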

Lehman & Yao don't have backwards scans (or left links, or page
deletion). Unlike nbtree. This is why the usual Lehman & Yao
guarantees don't quite work with backward scans. We must therefore
compensate as described by the README file (basically, we check and
re-check for races, possibly returning to the original page when we
think that we might have overlooked something and need to make sure).
It's an exception to the general rule, you could say.

--
Peter Geoghegan

#19

Peter Geoghegan

pg@bowt.ie

almost 6 years ago

In reply to: Peter Geoghegan (#12)

Re: Index Skip Scan

On Wed, Jan 22, 2020 at 10:55 AM Peter Geoghegan <pg@bowt.ie> wrote:

On Tue, Jan 21, 2020 at 11:50 PM Floris Van Nee
<florisvannee@optiver.com> wrote:

As far as I know, a page split could happen at any random element in the page. One of the situations we could be left with is:
Leaf page 1 = [2,4]
Leaf page 2 = [5,6,8]
However, our scan is still pointing to leaf page 1. For non-skip scans this is not a problem, as we already read all matching elements in our local buffer and we'll return those. But the skip scan currently:
a) checks the lo-key of the page to see if the next prefix can be found on the leaf page 1
b) finds out that this is actually true
c) does a search on the page and returns value=4 (while it should have returned value=6)

Peter, is my understanding about the btree internals correct so far?

This is a good summary. This is the kind of scenario I had in mind
when I expressed a general concern about "stopping between pages".
Processing a whole page at a time is a crucial part of how
_bt_readpage() currently deals with concurrent page splits.

I want to be clear about what it means that the page doesn't have a
"low key". Let us once again start with a very simple one leaf-page
btree containing four elements: leaf page 1 = [2,4,6,8] -- just like
in Floris' original page split scenario.

Let us also say that page 1 has a left sibling page -- page 0. Page 0
happens to have a high key with the integer value 0. So you could
*kind of* claim that the "low key" of page 1 is the integer value 0
(page 1 values must be > 0) -- *not* the integer value 2 (the
so-called "low key" here is neither > 2, nor >= 2). More formally, an
invariant exists that says that all values on page 1 must be
*strictly* greater than the integer value 0. However, this formal
invariant thing is hard or impossible to rely on when we actually
reach page 1 and want to know about its lower bound -- since there is
no "low key" pivot tuple on page 1 (we can only speak of a "low key"
as an abstract concept, or something that works transitively from the
parent -- there is only a physical high key pivot tuple on page 1
itself).

Suppose further that Page 0 is now empty, apart from its "all values
on page are <= 0" high key (page 0 must have had a few negative
integer values in its tuples at some point, but not anymore). VACUUM
will delete the page, *changing the effective low key* of page 1 in
the process. The lower bound from the shared parent page will move
lower/left as a consequence of the deletion of page 0. nbtree page
deletion makes the "keyspace move right, not left". So the "conceptual
low key" of page 1 just went down from 0 to -5 (say), without there
being any practical way of a skip scan reading page 1 noticing the
change (the left sibling of page 0, page -1, has a high key of <= -5,
say).

Not only is it possible for somebody to insert the value 1 in page 1
-- now they can insert the value -3 or -4!

More concretely, the pivot tuple in the parent that originally pointed
to page 0 is still there -- all that page deletion changed about this
tuple is its downlink, which now points to page 1 instead of page 0.
Confusingly, page deletion removes the pivot tuple of the right
sibling page from the parent -- *not* the pivot tuple of the empty
page that gets deleted (in this case page 0) itself.
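
The effect on the parent can be sketched with a toy model. This is an illustration under simplifying assumptions: ToyPivot, toy_lower_bound and toy_delete_page are hypothetical names, each parent entry is reduced to a strict lower bound plus a downlink, and locking, sibling links and truncated pivots are ignored. Deleting page 0 re-points its pivot at page 1 and drops page 1's old pivot, so the lower bound implied for page 1 falls from 0 to -5.

```c
#include <assert.h>
#include <limits.h>

/* Toy parent entry: the child holds values strictly greater than lower_bound. */
typedef struct ToyPivot
{
	int			lower_bound;
	int			child;			/* downlink: toy page number */
} ToyPivot;

/* Lower bound the parent currently implies for 'child', or INT_MAX if it has no downlink. */
static int
toy_lower_bound(const ToyPivot *parent, int npivots, int child)
{
	for (int i = 0; i < npivots; i++)
	{
		if (parent[i].child == child)
			return parent[i].lower_bound;
	}
	return INT_MAX;
}

/*
 * Delete the empty page 'target': its pivot keeps its key but now points
 * at the right sibling, and the right sibling's own pivot is removed.
 * Returns the new number of pivots (unchanged if target is absent or rightmost).
 */
static int
toy_delete_page(ToyPivot *parent, int npivots, int target)
{
	for (int i = 0; i + 1 < npivots; i++)
	{
		if (parent[i].child == target)
		{
			parent[i].child = parent[i + 1].child;
			for (int j = i + 1; j + 1 < npivots; j++)
				parent[j] = parent[j + 1];
			return npivots - 1;
		}
	}
	return npivots;
}
```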

Note: this example ignores things like negative infinity values in
truncated pivot tuples, and the heap TID tiebreaker column -- in
reality this would look a bit different because of those factors.

See also: amcheck's bt_right_page_check_scankey() function, which has
a huge comment that reasons about a race involving page deletion. In
general, page deletion is by far the biggest source of complexity when
reasoning about the key space.
--
Peter Geoghegan

#20

Dmitry Dolgov

9erthalion6@gmail.com

almost 6 years ago

In reply to: Dmitry Dolgov (#15)

2 attachment(s)

Re: Index Skip Scan

On Wed, Jan 22, 2020 at 10:36:03PM +0100, Dmitry Dolgov wrote:

Let me try to clarify. After we return the first tuple, so->currPos.buf is pointing to page=1 in my example (it's the only page after all). We've returned item=8. Then the split happens and the items get rearranged as in my example. We're still pointing with so->currPos.buf to page=1, but the page now contains [2,4]. The split happened to the right, so there's a page=2 with [5,6,8], however the ongoing index scan is unaware of that.
Now _bt_skip gets called to fetch the next tuple. It starts by checking _bt_scankey_within_page(scan, so->skipScanKey, so->currPos.buf, dir), the result of which will be 'true': we're comparing the skip key to the low key of the page. So it thinks the next key can be found on the current page. It locks the page and does a _binsrch, finding item=4 to be returned.

The problem here is that _bt_scankey_within_page mistakenly returns true, thereby limiting the search to just the page that it's pointing to already.
It may be fine to just fix this function to return the proper value (I guess it'd also need to look at the high key in this example). It could also be fixed by not looking at the lo/hi key of the page, but to use the local tuple buffer instead. We already did a _read_page once, so if we have any matching tuples on that specific page, we have them locally in the buffer already. That way we never need to lock the same page twice.

Oh, that's what you mean. Yes, I was somehow tricked by the name of this
function and didn't notice that it checks only one boundary, so in the case
of a backward scan it returns the wrong result. I think in the situation
you've described it would actually not find any item on the current page
and restart from the root, but nevertheless we need to check both
keys in _bt_scankey_within_page.

Thanks again, everyone, for the comments and clarifications. Here is a
new version in which I have hopefully addressed all the issues mentioned.

As mentioned in the _bt_skip comments, we previously moved left to
check the next page, to avoid significant issues if ndistinct was
underestimated and we need to skip too often. To make this work safely in
the presence of splits, we would need to remember the original page and
move right again until we find a page whose right link points to it. It's
not clear whether it's worth increasing complexity for this sort of "edge
case" with ndistinct estimation while moving left, so at least for now
we ignore this in the implementation and just start from the root
immediately.

The offset-based code for moving forward/reading backward was replaced
with remembering a start index tuple and attempting to find it on the new
page. Also, a missing page lock before _bt_scankey_within_page was added,
and _bt_scankey_within_page now checks both page boundaries.

Attachments:

v31-0001-Unique-key.patch (text/x-diff; charset=us-ascii)

From c363e1f5dc7e2f0288fb04ca80bd073229c458a1 Mon Sep 17 00:00:00 2001
From: jesperpedersen <jesper.pedersen@redhat.com>
Date: Tue, 9 Jul 2019 06:44:57 -0400
Subject: [PATCH v31 1/2] Unique key

Design by David Rowley.

Author: Jesper Pedersen
---
 src/backend/nodes/outfuncs.c           |  14 +++
 src/backend/nodes/print.c              |  39 +++++++
 src/backend/optimizer/path/Makefile    |   3 +-
 src/backend/optimizer/path/allpaths.c  |   8 ++
 src/backend/optimizer/path/indxpath.c  |  41 +++++++
 src/backend/optimizer/path/pathkeys.c  |  71 ++++++++++--
 src/backend/optimizer/path/uniquekey.c | 147 +++++++++++++++++++++++++
 src/backend/optimizer/plan/planagg.c   |   1 +
 src/backend/optimizer/plan/planmain.c  |   1 +
 src/backend/optimizer/plan/planner.c   |  17 ++-
 src/backend/optimizer/util/pathnode.c  |  12 ++
 src/include/nodes/nodes.h              |   1 +
 src/include/nodes/pathnodes.h          |  18 +++
 src/include/nodes/print.h              |   1 +
 src/include/optimizer/pathnode.h       |   1 +
 src/include/optimizer/paths.h          |  11 ++
 16 files changed, 373 insertions(+), 13 deletions(-)
 create mode 100644 src/backend/optimizer/path/uniquekey.c

diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index d76fae44b8..16083e7a7e 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -1723,6 +1723,7 @@ _outPathInfo(StringInfo str, const Path *node)
 	WRITE_FLOAT_FIELD(startup_cost, "%.2f");
 	WRITE_FLOAT_FIELD(total_cost, "%.2f");
 	WRITE_NODE_FIELD(pathkeys);
+	WRITE_NODE_FIELD(uniquekeys);
 }
 
 /*
@@ -2205,6 +2206,7 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
 	WRITE_NODE_FIELD(eq_classes);
 	WRITE_BOOL_FIELD(ec_merging_done);
 	WRITE_NODE_FIELD(canon_pathkeys);
+	WRITE_NODE_FIELD(canon_uniquekeys);
 	WRITE_NODE_FIELD(left_join_clauses);
 	WRITE_NODE_FIELD(right_join_clauses);
 	WRITE_NODE_FIELD(full_join_clauses);
@@ -2214,6 +2216,7 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
 	WRITE_NODE_FIELD(placeholder_list);
 	WRITE_NODE_FIELD(fkey_list);
 	WRITE_NODE_FIELD(query_pathkeys);
+	WRITE_NODE_FIELD(query_uniquekeys);
 	WRITE_NODE_FIELD(group_pathkeys);
 	WRITE_NODE_FIELD(window_pathkeys);
 	WRITE_NODE_FIELD(distinct_pathkeys);
@@ -2401,6 +2404,14 @@ _outPathKey(StringInfo str, const PathKey *node)
 	WRITE_BOOL_FIELD(pk_nulls_first);
 }
 
+static void
+_outUniqueKey(StringInfo str, const UniqueKey *node)
+{
+	WRITE_NODE_TYPE("UNIQUEKEY");
+
+	WRITE_NODE_FIELD(eq_clause);
+}
+
 static void
 _outPathTarget(StringInfo str, const PathTarget *node)
 {
@@ -4092,6 +4103,9 @@ outNode(StringInfo str, const void *obj)
 			case T_PathKey:
 				_outPathKey(str, obj);
 				break;
+			case T_UniqueKey:
+				_outUniqueKey(str, obj);
+				break;
 			case T_PathTarget:
 				_outPathTarget(str, obj);
 				break;
diff --git a/src/backend/nodes/print.c b/src/backend/nodes/print.c
index 42476724d8..d286b34544 100644
--- a/src/backend/nodes/print.c
+++ b/src/backend/nodes/print.c
@@ -459,6 +459,45 @@ print_pathkeys(const List *pathkeys, const List *rtable)
 	printf(")\n");
 }
 
+/*
+ * print_uniquekeys -
+ *	  uniquekeys list of UniqueKeys
+ */
+void
+print_uniquekeys(const List *uniquekeys, const List *rtable)
+{
+	ListCell   *l;
+
+	printf("(");
+	foreach(l, uniquekeys)
+	{
+		UniqueKey *unique_key = (UniqueKey *) lfirst(l);
+		EquivalenceClass *eclass = (EquivalenceClass *) unique_key->eq_clause;
+		ListCell   *k;
+		bool		first = true;
+
+		/* chase up */
+		while (eclass->ec_merged)
+			eclass = eclass->ec_merged;
+
+		printf("(");
+		foreach(k, eclass->ec_members)
+		{
+			EquivalenceMember *mem = (EquivalenceMember *) lfirst(k);
+
+			if (first)
+				first = false;
+			else
+				printf(", ");
+			print_expr((Node *) mem->em_expr, rtable);
+		}
+		printf(")");
+		if (lnext(uniquekeys, l))
+			printf(", ");
+	}
+	printf(")\n");
+}
+
 /*
  * print_tl
  *	  print targetlist in a more legible way.
diff --git a/src/backend/optimizer/path/Makefile b/src/backend/optimizer/path/Makefile
index 1e199ff66f..63cc1505d9 100644
--- a/src/backend/optimizer/path/Makefile
+++ b/src/backend/optimizer/path/Makefile
@@ -21,6 +21,7 @@ OBJS = \
 	joinpath.o \
 	joinrels.o \
 	pathkeys.o \
-	tidpath.o
+	tidpath.o \
+	uniquekey.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 8286d9cf34..bbc13e6141 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -3954,6 +3954,14 @@ print_path(PlannerInfo *root, Path *path, int indent)
 		print_pathkeys(path->pathkeys, root->parse->rtable);
 	}
 
+	if (path->uniquekeys)
+	{
+		for (i = 0; i < indent; i++)
+			printf("\t");
+		printf("  uniquekeys: ");
+		print_uniquekeys(path->uniquekeys, root->parse->rtable);
+	}
+
 	if (join)
 	{
 		JoinPath   *jp = (JoinPath *) path;
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 2a50272da6..bd1ea53e5c 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -189,6 +189,7 @@ static Expr *match_clause_to_ordering_op(IndexOptInfo *index,
 static bool ec_member_matches_indexcol(PlannerInfo *root, RelOptInfo *rel,
 									   EquivalenceClass *ec, EquivalenceMember *em,
 									   void *arg);
+static List *get_uniquekeys_for_index(PlannerInfo *root, List *pathkeys);
 
 
 /*
@@ -874,6 +875,7 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
 	List	   *orderbyclausecols;
 	List	   *index_pathkeys;
 	List	   *useful_pathkeys;
+	List	   *useful_uniquekeys = NIL;
 	bool		found_lower_saop_clause;
 	bool		pathkeys_possibly_useful;
 	bool		index_is_ordered;
@@ -1036,11 +1038,15 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
 	if (index_clauses != NIL || useful_pathkeys != NIL || useful_predicate ||
 		index_only_scan)
 	{
+		if (has_useful_uniquekeys(root))
+			useful_uniquekeys = get_uniquekeys_for_index(root, useful_pathkeys);
+
 		ipath = create_index_path(root, index,
 								  index_clauses,
 								  orderbyclauses,
 								  orderbyclausecols,
 								  useful_pathkeys,
+								  useful_uniquekeys,
 								  index_is_ordered ?
 								  ForwardScanDirection :
 								  NoMovementScanDirection,
@@ -1063,6 +1069,7 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
 									  orderbyclauses,
 									  orderbyclausecols,
 									  useful_pathkeys,
+									  useful_uniquekeys,
 									  index_is_ordered ?
 									  ForwardScanDirection :
 									  NoMovementScanDirection,
@@ -1093,11 +1100,15 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
 													index_pathkeys);
 		if (useful_pathkeys != NIL)
 		{
+			if (has_useful_uniquekeys(root))
+				useful_uniquekeys = get_uniquekeys_for_index(root, useful_pathkeys);
+
 			ipath = create_index_path(root, index,
 									  index_clauses,
 									  NIL,
 									  NIL,
 									  useful_pathkeys,
+									  useful_uniquekeys,
 									  BackwardScanDirection,
 									  index_only_scan,
 									  outer_relids,
@@ -1115,6 +1126,7 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
 										  NIL,
 										  NIL,
 										  useful_pathkeys,
+										  useful_uniquekeys,
 										  BackwardScanDirection,
 										  index_only_scan,
 										  outer_relids,
@@ -3365,6 +3377,35 @@ match_clause_to_ordering_op(IndexOptInfo *index,
 	return clause;
 }
 
+/*
+ * get_uniquekeys_for_index
+ */
+static List *
+get_uniquekeys_for_index(PlannerInfo *root, List *pathkeys)
+{
+	ListCell *lc;
+
+	if (pathkeys)
+	{
+		List *uniquekeys = NIL;
+		foreach(lc, pathkeys)
+		{
+			UniqueKey *unique_key;
+			PathKey *pk = (PathKey *) lfirst(lc);
+			EquivalenceClass *ec = (EquivalenceClass *) pk->pk_eclass;
+
+			unique_key = makeNode(UniqueKey);
+			unique_key->eq_clause = ec;
+
+			uniquekeys = lappend(uniquekeys, unique_key);
+		}
+
+		if (uniquekeys_contained_in(root->canon_uniquekeys, uniquekeys))
+			return uniquekeys;
+	}
+
+	return NIL;
+}
 
 /****************************************************************************
  *				----  ROUTINES TO DO PARTIAL INDEX PREDICATE TESTS	----
diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index 71b9d42c99..054df9a617 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -29,6 +29,7 @@
 #include "utils/lsyscache.h"
 
 
+static bool pathkey_is_unique(PathKey *new_pathkey, List *pathkeys);
 static bool pathkey_is_redundant(PathKey *new_pathkey, List *pathkeys);
 static bool matches_boolean_partition_clause(RestrictInfo *rinfo,
 											 RelOptInfo *partrel,
@@ -96,6 +97,29 @@ make_canonical_pathkey(PlannerInfo *root,
 	return pk;
 }
 
+/*
+ * pathkey_is_unique
+ *	   Checks if the new pathkey's equivalence class is the same as that of
+ *     any existing member of the pathkey list.
+ */
+static bool
+pathkey_is_unique(PathKey *new_pathkey, List *pathkeys)
+{
+	EquivalenceClass *new_ec = new_pathkey->pk_eclass;
+	ListCell   *lc;
+
+	/* If the same EC is already in the list, then not unique */
+	foreach(lc, pathkeys)
+	{
+		PathKey    *old_pathkey = (PathKey *) lfirst(lc);
+
+		if (new_ec == old_pathkey->pk_eclass)
+			return false;
+	}
+
+	return true;
+}
+
 /*
  * pathkey_is_redundant
  *	   Is a pathkey redundant with one already in the given list?
@@ -135,22 +159,12 @@ static bool
 pathkey_is_redundant(PathKey *new_pathkey, List *pathkeys)
 {
 	EquivalenceClass *new_ec = new_pathkey->pk_eclass;
-	ListCell   *lc;
 
 	/* Check for EC containing a constant --- unconditionally redundant */
 	if (EC_MUST_BE_REDUNDANT(new_ec))
 		return true;
 
-	/* If same EC already used in list, then redundant */
-	foreach(lc, pathkeys)
-	{
-		PathKey    *old_pathkey = (PathKey *) lfirst(lc);
-
-		if (new_ec == old_pathkey->pk_eclass)
-			return true;
-	}
-
-	return false;
+	return !pathkey_is_unique(new_pathkey, pathkeys);
 }
 
 /*
@@ -1098,6 +1112,41 @@ make_pathkeys_for_sortclauses(PlannerInfo *root,
 	return pathkeys;
 }
 
+/*
+ * make_pathkeys_for_uniquekeys
+ *		Generate a pathkeys list to be used for uniquekey clauses
+ */
+List *
+make_pathkeys_for_uniquekeys(PlannerInfo *root,
+							 List *sortclauses,
+							 List *tlist)
+{
+	List	   *pathkeys = NIL;
+	ListCell   *l;
+
+	foreach(l, sortclauses)
+	{
+		SortGroupClause *sortcl = (SortGroupClause *) lfirst(l);
+		Expr	   *sortkey;
+		PathKey    *pathkey;
+
+		sortkey = (Expr *) get_sortgroupclause_expr(sortcl, tlist);
+		Assert(OidIsValid(sortcl->sortop));
+		pathkey = make_pathkey_from_sortop(root,
+										   sortkey,
+										   root->nullable_baserels,
+										   sortcl->sortop,
+										   sortcl->nulls_first,
+										   sortcl->tleSortGroupRef,
+										   true);
+
+		if (pathkey_is_unique(pathkey, pathkeys))
+			pathkeys = lappend(pathkeys, pathkey);
+	}
+
+	return pathkeys;
+}
+
 /****************************************************************************
  *		PATHKEYS AND MERGECLAUSES
  ****************************************************************************/
diff --git a/src/backend/optimizer/path/uniquekey.c b/src/backend/optimizer/path/uniquekey.c
new file mode 100644
index 0000000000..13d4ebb98c
--- /dev/null
+++ b/src/backend/optimizer/path/uniquekey.c
@@ -0,0 +1,147 @@
+/*-------------------------------------------------------------------------
+ *
+ * uniquekey.c
+ *	  Utilities for matching and building unique keys
+ *
+ * Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ *	  src/backend/optimizer/path/uniquekey.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "optimizer/pathnode.h"
+#include "optimizer/paths.h"
+#include "nodes/pg_list.h"
+
+static UniqueKey *make_canonical_uniquekey(PlannerInfo *root, EquivalenceClass *eclass);
+
+/*
+ * Build a list of unique keys
+ */
+List*
+build_uniquekeys(PlannerInfo *root, List *sortclauses)
+{
+	List *result = NIL;
+	List *sortkeys;
+	ListCell *l;
+
+	sortkeys = make_pathkeys_for_uniquekeys(root,
+											sortclauses,
+											root->processed_tlist);
+
+	/* Create a uniquekey and add it to the list */
+	foreach(l, sortkeys)
+	{
+		PathKey    *pathkey = (PathKey *) lfirst(l);
+		EquivalenceClass *ec = pathkey->pk_eclass;
+		UniqueKey *unique_key = make_canonical_uniquekey(root, ec);
+
+		result = lappend(result, unique_key);
+	}
+
+	return result;
+}
+
+/*
+ * uniquekeys_contained_in
+ *	  Are the keys2 included in the keys1 superset
+ */
+bool
+uniquekeys_contained_in(List *keys1, List *keys2)
+{
+	ListCell   *key1,
+			   *key2;
+
+	/*
+	 * Fall out quickly if we are passed two identical lists.  This mostly
+	 * catches the case where both are NIL, but that's common enough to
+	 * warrant the test.
+	 */
+	if (keys1 == keys2)
+		return true;
+
+	foreach(key2, keys2)
+	{
+		bool found = false;
+		UniqueKey  *uniquekey2 = (UniqueKey *) lfirst(key2);
+
+		foreach(key1, keys1)
+		{
+			UniqueKey  *uniquekey1 = (UniqueKey *) lfirst(key1);
+
+			if (uniquekey1->eq_clause == uniquekey2->eq_clause)
+			{
+				found = true;
+				break;
+			}
+		}
+
+		if (!found)
+			return false;
+	}
+
+	return true;
+}
+
+/*
+ * has_useful_uniquekeys
+ *		Detect whether the planner could have any uniquekeys that are
+ *		useful.
+ */
+bool
+has_useful_uniquekeys(PlannerInfo *root)
+{
+	if (root->query_uniquekeys != NIL)
+		return true;	/* there are some */
+	return false;		/* definitely useless */
+}
+
+/*
+ * make_canonical_uniquekey
+ *	  Given the parameters for a UniqueKey, find any pre-existing matching
+ *	  uniquekey in the query's list of "canonical" uniquekeys.  Make a new
+ *	  entry if there's not one already.
+ *
+ * Note that this function must not be used until after we have completed
+ * merging EquivalenceClasses.  (We don't try to enforce that here; instead,
+ * equivclass.c will complain if a merge occurs after root->canon_uniquekeys
+ * has become nonempty.)
+ */
+static UniqueKey *
+make_canonical_uniquekey(PlannerInfo *root,
+						 EquivalenceClass *eclass)
+{
+	UniqueKey  *uk;
+	ListCell   *lc;
+	MemoryContext oldcontext;
+
+	/* The passed eclass might be non-canonical, so chase up to the top */
+	while (eclass->ec_merged)
+		eclass = eclass->ec_merged;
+
+	foreach(lc, root->canon_uniquekeys)
+	{
+		uk = (UniqueKey *) lfirst(lc);
+		if (eclass == uk->eq_clause)
+			return uk;
+	}
+
+	/*
+	 * Be sure canonical uniquekeys are allocated in the main planning context.
+	 * Not an issue in normal planning, but it is for GEQO.
+	 */
+	oldcontext = MemoryContextSwitchTo(root->planner_cxt);
+
+	uk = makeNode(UniqueKey);
+	uk->eq_clause = eclass;
+
+	root->canon_uniquekeys = lappend(root->canon_uniquekeys, uk);
+
+	MemoryContextSwitchTo(oldcontext);
+
+	return uk;
+}
diff --git a/src/backend/optimizer/plan/planagg.c b/src/backend/optimizer/plan/planagg.c
index 8634940efc..dd64775d8f 100644
--- a/src/backend/optimizer/plan/planagg.c
+++ b/src/backend/optimizer/plan/planagg.c
@@ -511,6 +511,7 @@ minmax_qp_callback(PlannerInfo *root, void *extra)
 									  root->parse->targetList);
 
 	root->query_pathkeys = root->sort_pathkeys;
+	root->query_uniquekeys = NIL;
 }
 
 /*
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 62dfc6d44a..3a372af91b 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -70,6 +70,7 @@ query_planner(PlannerInfo *root,
 	root->join_rel_level = NULL;
 	root->join_cur_level = 0;
 	root->canon_pathkeys = NIL;
+	root->canon_uniquekeys = NIL;
 	root->left_join_clauses = NIL;
 	root->right_join_clauses = NIL;
 	root->full_join_clauses = NIL;
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index d6f2153593..984fca0696 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3657,15 +3657,30 @@ standard_qp_callback(PlannerInfo *root, void *extra)
 	 * much easier, since we know that the parser ensured that one is a
 	 * superset of the other.
 	 */
+	root->query_uniquekeys = NIL;
+
 	if (root->group_pathkeys)
+	{
 		root->query_pathkeys = root->group_pathkeys;
+
+		if (!root->parse->hasAggs)
+			root->query_uniquekeys = build_uniquekeys(root, qp_extra->groupClause);
+	}
 	else if (root->window_pathkeys)
 		root->query_pathkeys = root->window_pathkeys;
 	else if (list_length(root->distinct_pathkeys) >
 			 list_length(root->sort_pathkeys))
+	{
 		root->query_pathkeys = root->distinct_pathkeys;
+		root->query_uniquekeys = build_uniquekeys(root, parse->distinctClause);
+	}
 	else if (root->sort_pathkeys)
+	{
 		root->query_pathkeys = root->sort_pathkeys;
+
+		if (root->distinct_pathkeys)
+			root->query_uniquekeys = build_uniquekeys(root, parse->distinctClause);
+	}
 	else
 		root->query_pathkeys = NIL;
 }
@@ -6222,7 +6237,7 @@ plan_cluster_use_sort(Oid tableOid, Oid indexOid)
 
 	/* Estimate the cost of index scan */
 	indexScanPath = create_index_path(root, indexInfo,
-									  NIL, NIL, NIL, NIL,
+									  NIL, NIL, NIL, NIL, NIL,
 									  ForwardScanDirection, false,
 									  NULL, 1.0, false);
 
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index e6d08aede5..a006dbbe9c 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -940,6 +940,7 @@ create_seqscan_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->parallel_safe = rel->consider_parallel;
 	pathnode->parallel_workers = parallel_workers;
 	pathnode->pathkeys = NIL;	/* seqscan has unordered result */
+	pathnode->uniquekeys = NIL;
 
 	cost_seqscan(pathnode, root, rel, pathnode->param_info);
 
@@ -964,6 +965,7 @@ create_samplescan_path(PlannerInfo *root, RelOptInfo *rel, Relids required_outer
 	pathnode->parallel_safe = rel->consider_parallel;
 	pathnode->parallel_workers = 0;
 	pathnode->pathkeys = NIL;	/* samplescan has unordered result */
+	pathnode->uniquekeys = NIL;
 
 	cost_samplescan(pathnode, root, rel, pathnode->param_info);
 
@@ -1000,6 +1002,7 @@ create_index_path(PlannerInfo *root,
 				  List *indexorderbys,
 				  List *indexorderbycols,
 				  List *pathkeys,
+				  List *uniquekeys,
 				  ScanDirection indexscandir,
 				  bool indexonly,
 				  Relids required_outer,
@@ -1018,6 +1021,7 @@ create_index_path(PlannerInfo *root,
 	pathnode->path.parallel_safe = rel->consider_parallel;
 	pathnode->path.parallel_workers = 0;
 	pathnode->path.pathkeys = pathkeys;
+	pathnode->path.uniquekeys = uniquekeys;
 
 	pathnode->indexinfo = index;
 	pathnode->indexclauses = indexclauses;
@@ -1061,6 +1065,7 @@ create_bitmap_heap_path(PlannerInfo *root,
 	pathnode->path.parallel_safe = rel->consider_parallel;
 	pathnode->path.parallel_workers = parallel_degree;
 	pathnode->path.pathkeys = NIL;	/* always unordered */
+	pathnode->path.uniquekeys = NIL;
 
 	pathnode->bitmapqual = bitmapqual;
 
@@ -1922,6 +1927,7 @@ create_functionscan_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->parallel_safe = rel->consider_parallel;
 	pathnode->parallel_workers = 0;
 	pathnode->pathkeys = pathkeys;
+	pathnode->uniquekeys = NIL;
 
 	cost_functionscan(pathnode, root, rel, pathnode->param_info);
 
@@ -1948,6 +1954,7 @@ create_tablefuncscan_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->parallel_safe = rel->consider_parallel;
 	pathnode->parallel_workers = 0;
 	pathnode->pathkeys = NIL;	/* result is always unordered */
+	pathnode->uniquekeys = NIL;
 
 	cost_tablefuncscan(pathnode, root, rel, pathnode->param_info);
 
@@ -1974,6 +1981,7 @@ create_valuesscan_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->parallel_safe = rel->consider_parallel;
 	pathnode->parallel_workers = 0;
 	pathnode->pathkeys = NIL;	/* result is always unordered */
+	pathnode->uniquekeys = NIL;
 
 	cost_valuesscan(pathnode, root, rel, pathnode->param_info);
 
@@ -1999,6 +2007,7 @@ create_ctescan_path(PlannerInfo *root, RelOptInfo *rel, Relids required_outer)
 	pathnode->parallel_safe = rel->consider_parallel;
 	pathnode->parallel_workers = 0;
 	pathnode->pathkeys = NIL;	/* XXX for now, result is always unordered */
+	pathnode->uniquekeys = NIL;
 
 	cost_ctescan(pathnode, root, rel, pathnode->param_info);
 
@@ -2025,6 +2034,7 @@ create_namedtuplestorescan_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->parallel_safe = rel->consider_parallel;
 	pathnode->parallel_workers = 0;
 	pathnode->pathkeys = NIL;	/* result is always unordered */
+	pathnode->uniquekeys = NIL;
 
 	cost_namedtuplestorescan(pathnode, root, rel, pathnode->param_info);
 
@@ -2051,6 +2061,7 @@ create_resultscan_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->parallel_safe = rel->consider_parallel;
 	pathnode->parallel_workers = 0;
 	pathnode->pathkeys = NIL;	/* result is always unordered */
+	pathnode->uniquekeys = NIL;
 
 	cost_resultscan(pathnode, root, rel, pathnode->param_info);
 
@@ -2077,6 +2088,7 @@ create_worktablescan_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->parallel_safe = rel->consider_parallel;
 	pathnode->parallel_workers = 0;
 	pathnode->pathkeys = NIL;	/* result is always unordered */
+	pathnode->uniquekeys = NIL;
 
 	/* Cost is the same as for a regular CTE scan */
 	cost_ctescan(pathnode, root, rel, pathnode->param_info);
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index baced7eec0..a1511b46ea 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -261,6 +261,7 @@ typedef enum NodeTag
 	T_EquivalenceMember,
 	T_PathKey,
 	T_PathTarget,
+	T_UniqueKey,
 	T_RestrictInfo,
 	T_IndexClause,
 	T_PlaceHolderVar,
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 3d3be197e0..4e329f0fb5 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -269,6 +269,8 @@ struct PlannerInfo
 
 	List	   *canon_pathkeys; /* list of "canonical" PathKeys */
 
+	List	   *canon_uniquekeys; /* list of "canonical" UniqueKeys */
+
 	List	   *left_join_clauses;	/* list of RestrictInfos for mergejoinable
 									 * outer join clauses w/nonnullable var on
 									 * left */
@@ -297,6 +299,8 @@ struct PlannerInfo
 
 	List	   *query_pathkeys; /* desired pathkeys for query_planner() */
 
+	List	   *query_uniquekeys; /* unique keys used for the query */
+
 	List	   *group_pathkeys; /* groupClause pathkeys, if any */
 	List	   *window_pathkeys;	/* pathkeys of bottom window, if any */
 	List	   *distinct_pathkeys;	/* distinctClause pathkeys, if any */
@@ -1077,6 +1081,15 @@ typedef struct ParamPathInfo
 	List	   *ppi_clauses;	/* join clauses available from outer rels */
 } ParamPathInfo;
 
+/*
+ * UniqueKey
+ */
+typedef struct UniqueKey
+{
+	NodeTag		type;
+
+	EquivalenceClass *eq_clause;	/* equivalence class */
+} UniqueKey;
 
 /*
  * Type "Path" is used as-is for sequential-scan paths, as well as some other
@@ -1106,6 +1119,9 @@ typedef struct ParamPathInfo
  *
  * "pathkeys" is a List of PathKey nodes (see above), describing the sort
  * ordering of the path's output rows.
+ *
+ * "uniquekeys", if not NIL, is a list of UniqueKey nodes (see above),
+ * describing the XXX.
  */
 typedef struct Path
 {
@@ -1129,6 +1145,8 @@ typedef struct Path
 
 	List	   *pathkeys;		/* sort ordering of path's output */
 	/* pathkeys is a List of PathKey nodes; see above */
+
+	List	   *uniquekeys;	/* the unique keys, or NIL if none */
 } Path;
 
 /* Macro for extracting a path's parameterization relids; beware double eval */
diff --git a/src/include/nodes/print.h b/src/include/nodes/print.h
index 6126b491bf..006248bfb5 100644
--- a/src/include/nodes/print.h
+++ b/src/include/nodes/print.h
@@ -28,6 +28,7 @@ extern char *pretty_format_node_dump(const char *dump);
 extern void print_rt(const List *rtable);
 extern void print_expr(const Node *expr, const List *rtable);
 extern void print_pathkeys(const List *pathkeys, const List *rtable);
+extern void print_uniquekeys(const List *uniquekeys, const List *rtable);
 extern void print_tl(const List *tlist, const List *rtable);
 extern void print_slot(TupleTableSlot *slot);
 
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index e450fe112a..f75ff6f323 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -44,6 +44,7 @@ extern IndexPath *create_index_path(PlannerInfo *root,
 									List *indexorderbys,
 									List *indexorderbycols,
 									List *pathkeys,
+									List *uniquekeys,
 									ScanDirection indexscandir,
 									bool indexonly,
 									Relids required_outer,
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 9ab73bd20c..5b6be383b3 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -214,6 +214,9 @@ extern List *build_join_pathkeys(PlannerInfo *root,
 extern List *make_pathkeys_for_sortclauses(PlannerInfo *root,
 										   List *sortclauses,
 										   List *tlist);
+extern List *make_pathkeys_for_uniquekeys(PlannerInfo *root,
+										  List *sortclauses,
+										  List *tlist);
 extern void initialize_mergeclause_eclasses(PlannerInfo *root,
 											RestrictInfo *restrictinfo);
 extern void update_mergeclause_eclasses(PlannerInfo *root,
@@ -240,4 +243,12 @@ extern PathKey *make_canonical_pathkey(PlannerInfo *root,
 extern void add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
 									List *live_childrels);
 
+/*
+ * uniquekey.c
+ *	  Utilities for matching and building unique keys
+ */
+extern List *build_uniquekeys(PlannerInfo *root, List *sortclauses);
+extern bool uniquekeys_contained_in(List *keys1, List *keys2);
+extern bool has_useful_uniquekeys(PlannerInfo *root);
+
 #endif							/* PATHS_H */
-- 
2.21.0
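
A note for readers of 0001: uniquekeys_contained_in() compares eq_clause pointers rather than the contents of the equivalence classes. That is only safe if equal keys are always represented by the same canonical object, which is what make_canonical_uniquekey() is there to guarantee. The standalone C sketch below illustrates the idea with made-up types. SketchClass, SketchKey, SketchRegistry and all sketch_* functions are hypothetical and not part of the patch or of PostgreSQL, and a fixed-size array is assumed in place of a List.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for EquivalenceClass and UniqueKey. */
typedef struct SketchClass { const char *name; } SketchClass;
typedef struct SketchKey { const SketchClass *eclass; } SketchKey;

/* Hypothetical fixed-size registry standing in for root->canon_uniquekeys. */
#define SKETCH_MAX_KEYS 8
typedef struct SketchRegistry
{
	SketchKey	keys[SKETCH_MAX_KEYS];
	size_t		nkeys;
} SketchRegistry;

static void
sketch_registry_init(SketchRegistry *reg)
{
	assert(reg != NULL);
	reg->nkeys = 0;
}

/*
 * Return the single canonical key for eclass, creating it on first use.
 * Returns NULL if the (sketch-only) registry is full.
 */
static const SketchKey *
sketch_canonical_key(SketchRegistry *reg, const SketchClass *eclass)
{
	for (size_t i = 0; i < reg->nkeys; i++)
	{
		if (reg->keys[i].eclass == eclass)
			return &reg->keys[i];
	}

	if (reg->nkeys == SKETCH_MAX_KEYS)
		return NULL;

	reg->keys[reg->nkeys].eclass = eclass;
	return &reg->keys[reg->nkeys++];
}

/*
 * True if every key in sub also appears in super.  Because keys are
 * canonical, comparing the eclass pointers is sufficient.
 */
static bool
sketch_keys_contained_in(const SketchKey *const *super, size_t nsuper,
						 const SketchKey *const *sub, size_t nsub)
{
	for (size_t i = 0; i < nsub; i++)
	{
		bool		found = false;

		for (size_t j = 0; j < nsuper; j++)
		{
			if (super[j]->eclass == sub[i]->eclass)
			{
				found = true;
				break;
			}
		}

		if (!found)
			return false;
	}

	return true;
}
```

As in the patch, an empty "sub" list is trivially contained in any list, and the check is quadratic, which should be fine for the short key lists involved.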

v31-0002-Index-skip-scan.patch (text/x-diff; charset=us-ascii)

From 74d154665162b04ed7df2fd4210ff50846436547 Mon Sep 17 00:00:00 2001
From: jesperpedersen <jesper.pedersen@redhat.com>
Date: Fri, 15 Nov 2019 09:46:53 -0500
Subject: [PATCH v31 2/2] Index skip scan

Implementation of Index Skip Scan (see Loose Index Scan in the wiki [1])
on top of IndexOnlyScan and IndexScan. To make it suitable both when
there is a small number of distinct values and when there is a
significant number of them, the following approach is taken: instead of
searching from the root for every value, we first search on the current
page, and only if the value is not found there do we continue searching
from the root.

Original patch and design were proposed by Thomas Munro [2], revived and
improved by Dmitry Dolgov and Jesper Pedersen.

[1] https://wiki.postgresql.org/wiki/Loose_indexscan
[2] https://www.postgresql.org/message-id/flat/CADLWmXXbTSBxP-MzJuPAYSsL_2f0iPm5VWPbCvDbVvfX93FKkw%40mail.gmail.com

Author: Jesper Pedersen, Dmitry Dolgov
Reviewed-by: Thomas Munro, David Rowley, Floris Van Nee, Kyotaro Horiguchi, Tomas Vondra, Peter Geoghegan
---
 contrib/bloom/blutils.c                       |   1 +
 doc/src/sgml/config.sgml                      |  15 +
 doc/src/sgml/indexam.sgml                     |  63 +++
 doc/src/sgml/indices.sgml                     |  23 +
 src/backend/access/brin/brin.c                |   1 +
 src/backend/access/gin/ginutil.c              |   1 +
 src/backend/access/gist/gist.c                |   1 +
 src/backend/access/hash/hash.c                |   1 +
 src/backend/access/index/indexam.c            |  18 +
 src/backend/access/nbtree/nbtree.c            |  13 +
 src/backend/access/nbtree/nbtsearch.c         | 425 +++++++++++++++
 src/backend/access/spgist/spgutils.c          |   1 +
 src/backend/commands/explain.c                |  25 +
 src/backend/executor/nodeIndexonlyscan.c      |  51 +-
 src/backend/executor/nodeIndexscan.c          |  56 +-
 src/backend/nodes/copyfuncs.c                 |   2 +
 src/backend/nodes/outfuncs.c                  |   2 +
 src/backend/nodes/readfuncs.c                 |   2 +
 src/backend/optimizer/path/costsize.c         |   1 +
 src/backend/optimizer/plan/createplan.c       |  20 +-
 src/backend/optimizer/plan/planner.c          |  76 +++
 src/backend/optimizer/util/pathnode.c         |  40 ++
 src/backend/optimizer/util/plancat.c          |   1 +
 src/backend/utils/misc/guc.c                  |   9 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/access/amapi.h                    |   8 +
 src/include/access/genam.h                    |   2 +
 src/include/access/nbtree.h                   |   7 +
 src/include/access/sdir.h                     |   7 +
 src/include/nodes/execnodes.h                 |   6 +
 src/include/nodes/pathnodes.h                 |   5 +
 src/include/nodes/plannodes.h                 |   4 +
 src/include/optimizer/cost.h                  |   1 +
 src/include/optimizer/pathnode.h              |   5 +
 src/test/regress/expected/select_distinct.out | 505 ++++++++++++++++++
 src/test/regress/expected/sysviews.out        |   3 +-
 src/test/regress/sql/select_distinct.sql      | 186 +++++++
 37 files changed, 1577 insertions(+), 11 deletions(-)
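
The "current page first, then from the root" approach described in the commit message can be sketched outside the B-tree code as follows. This is only a toy model under stated assumptions: the index is a sorted int array, a "page" is a fixed-size slice of it (SKETCH_PAGE_SIZE is made up), and a binary search over the rest of the array stands in for a descent from the root. None of the sketch_* names exist in the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical page size; real B-tree pages hold a variable number of tuples. */
#define SKETCH_PAGE_SIZE 4

/* First index in [lo, hi) whose value is greater than key. */
static size_t
sketch_upper_bound(const int *keys, size_t lo, size_t hi, int key)
{
	while (lo < hi)
	{
		size_t		mid = lo + (hi - lo) / 2;

		if (keys[mid] <= key)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}

/*
 * Given the position of the last returned key, return the position of the
 * next distinct key, or n if there is none.  *root_searches counts how often
 * the expensive path was taken.
 */
static size_t
sketch_skip(const int *keys, size_t n, size_t pos, int *root_searches)
{
	size_t		page_start;
	size_t		page_end;

	assert(pos < n);
	page_start = pos - pos % SKETCH_PAGE_SIZE;
	page_end = page_start + SKETCH_PAGE_SIZE;
	if (page_end > n)
		page_end = n;

	/* Cheap path: the page's last key is larger, so the answer is on this page. */
	if (keys[page_end - 1] > keys[pos])
		return sketch_upper_bound(keys, pos + 1, page_end, keys[pos]);

	/* Expensive path: stands in for a new search from the root. */
	(*root_searches)++;
	return sketch_upper_bound(keys, page_end, n, keys[pos]);
}

/* Write the distinct keys to out (which needs room for n ints); return their count. */
static size_t
sketch_distinct(const int *keys, size_t n, int *out, int *root_searches)
{
	size_t		pos = 0;
	size_t		count = 0;

	*root_searches = 0;
	while (pos < n)
	{
		out[count++] = keys[pos];
		pos = sketch_skip(keys, n, pos, root_searches);
	}
	return count;
}
```

With many rows per distinct value, almost every skip takes the expensive path but each one jumps over many rows. With many distinct values per page, most skips stay on the page, which is the case the commit message wants to keep cheap.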

diff --git a/contrib/bloom/blutils.c b/contrib/bloom/blutils.c
index 0104d02f67..a018b7f3d0 100644
--- a/contrib/bloom/blutils.c
+++ b/contrib/bloom/blutils.c
@@ -133,6 +133,7 @@ blhandler(PG_FUNCTION_ARGS)
 	amroutine->ambulkdelete = blbulkdelete;
 	amroutine->amvacuumcleanup = blvacuumcleanup;
 	amroutine->amcanreturn = NULL;
+	amroutine->amskip = NULL;
 	amroutine->amcostestimate = blcostestimate;
 	amroutine->amoptions = bloptions;
 	amroutine->amproperty = NULL;
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 5d45b6f7cb..f15bc20c20 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4517,6 +4517,21 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-enable-indexskipscan" xreflabel="enable_indexskipscan">
+      <term><varname>enable_indexskipscan</varname> (<type>boolean</type>)
+      <indexterm>
+       <primary><varname>enable_indexskipscan</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Enables or disables the query planner's use of index-skip-scan plan
+        types (see <xref linkend="indexes-index-skip-scans"/>). The default is
+        <literal>on</literal>.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-enable-material" xreflabel="enable_material">
       <term><varname>enable_material</varname> (<type>boolean</type>)
       <indexterm>
diff --git a/doc/src/sgml/indexam.sgml b/doc/src/sgml/indexam.sgml
index 37f8d8760a..a726d80878 100644
--- a/doc/src/sgml/indexam.sgml
+++ b/doc/src/sgml/indexam.sgml
@@ -148,6 +148,7 @@ typedef struct IndexAmRoutine
     amendscan_function amendscan;
     ammarkpos_function ammarkpos;       /* can be NULL */
     amrestrpos_function amrestrpos;     /* can be NULL */
+    amskip_function amskip;             /* can be NULL */
 
     /* interface functions to support parallel index scans */
     amestimateparallelscan_function amestimateparallelscan;    /* can be NULL */
@@ -691,6 +692,68 @@ amrestrpos (IndexScanDesc scan);
 
   <para>
 <programlisting>
+bool
+amskip (IndexScanDesc scan,
+        ScanDirection direction,
+        ScanDirection indexdir,
+        bool scanstart,
+        int prefix);
+</programlisting>
+  Skip past all tuples where the first 'prefix' columns have the same value as
+  the last tuple returned in the current scan. The arguments are:
+
+   <variablelist>
+    <varlistentry>
+     <term><parameter>scan</parameter></term>
+     <listitem>
+      <para>
+       Index scan information
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><parameter>direction</parameter></term>
+     <listitem>
+      <para>
+       The direction in which data is advancing.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><parameter>indexdir</parameter></term>
+     <listitem>
+      <para>
+        The direction in which the index must be read.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><parameter>scanstart</parameter></term>
+     <listitem>
+      <para>
+        Whether this is the start of the scan.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><parameter>prefix</parameter></term>
+     <listitem>
+      <para>
+        The number of leading index columns that make up the distinct prefix.
+      </para>
+     </listitem>
+    </varlistentry>
+
+   </variablelist>
+
+  </para>
+
+  <para>
+<programlisting>
 Size
 amestimateparallelscan (void);
 </programlisting>
diff --git a/doc/src/sgml/indices.sgml b/doc/src/sgml/indices.sgml
index c54bf0dbbd..c429d98fc7 100644
--- a/doc/src/sgml/indices.sgml
+++ b/doc/src/sgml/indices.sgml
@@ -1254,6 +1254,29 @@ SELECT target FROM tests WHERE subject = 'some-subject' AND success;
    and later will recognize such cases and allow index-only scans to be
    generated, but older versions will not.
   </para>
+
+  <sect2 id="indexes-index-skip-scans">
+    <title>Index Skip Scans</title>
+
+    <indexterm zone="indexes-index-skip-scans">
+      <primary>index</primary>
+      <secondary>index-skip scans</secondary>
+    </indexterm>
+    <indexterm zone="indexes-index-skip-scans">
+      <primary>index-skip scan</primary>
+    </indexterm>
+
+    <para>
+     When the rows retrieved from an index scan are then deduplicated by
+     eliminating rows matching on a prefix of index keys (e.g. when using
+     <literal>SELECT DISTINCT</literal>), the planner will consider
+     skipping groups of rows with a matching key prefix. When a row with
+     a particular prefix is found, remaining rows with the same key prefix
+     are skipped.  The larger the number of rows with the same key prefix
+     (i.e. the lower the number of distinct key prefixes in the index),
+     the more efficient this is.
+    </para>
+  </sect2>
  </sect1>
 
 
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 2e8f67ef10..4db31bb211 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -113,6 +113,7 @@ brinhandler(PG_FUNCTION_ARGS)
 	amroutine->ambulkdelete = brinbulkdelete;
 	amroutine->amvacuumcleanup = brinvacuumcleanup;
 	amroutine->amcanreturn = NULL;
+	amroutine->amskip = NULL;
 	amroutine->amcostestimate = brincostestimate;
 	amroutine->amoptions = brinoptions;
 	amroutine->amproperty = NULL;
diff --git a/src/backend/access/gin/ginutil.c b/src/backend/access/gin/ginutil.c
index a7e55caf28..8dd1d30d2a 100644
--- a/src/backend/access/gin/ginutil.c
+++ b/src/backend/access/gin/ginutil.c
@@ -65,6 +65,7 @@ ginhandler(PG_FUNCTION_ARGS)
 	amroutine->ambulkdelete = ginbulkdelete;
 	amroutine->amvacuumcleanup = ginvacuumcleanup;
 	amroutine->amcanreturn = NULL;
+	amroutine->amskip = NULL;
 	amroutine->amcostestimate = gincostestimate;
 	amroutine->amoptions = ginoptions;
 	amroutine->amproperty = NULL;
diff --git a/src/backend/access/gist/gist.c b/src/backend/access/gist/gist.c
index aefc302ed2..8c692f7fb4 100644
--- a/src/backend/access/gist/gist.c
+++ b/src/backend/access/gist/gist.c
@@ -86,6 +86,7 @@ gisthandler(PG_FUNCTION_ARGS)
 	amroutine->ambulkdelete = gistbulkdelete;
 	amroutine->amvacuumcleanup = gistvacuumcleanup;
 	amroutine->amcanreturn = gistcanreturn;
+	amroutine->amskip = NULL;
 	amroutine->amcostestimate = gistcostestimate;
 	amroutine->amoptions = gistoptions;
 	amroutine->amproperty = gistproperty;
diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c
index 4871b7ff4d..e5fa4c7864 100644
--- a/src/backend/access/hash/hash.c
+++ b/src/backend/access/hash/hash.c
@@ -83,6 +83,7 @@ hashhandler(PG_FUNCTION_ARGS)
 	amroutine->ambulkdelete = hashbulkdelete;
 	amroutine->amvacuumcleanup = hashvacuumcleanup;
 	amroutine->amcanreturn = NULL;
+	amroutine->amskip = NULL;
 	amroutine->amcostestimate = hashcostestimate;
 	amroutine->amoptions = hashoptions;
 	amroutine->amproperty = NULL;
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 01539b6bd6..1047a35ade 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -33,6 +33,7 @@
  *		index_can_return	- does index support index-only scans?
  *		index_getprocid - get a support procedure OID
  *		index_getprocinfo - get a support procedure's lookup info
+ *		index_skip		- advance past duplicate key values in a scan
  *
  * NOTES
  *		This file contains the index_ routines which used
@@ -730,6 +731,23 @@ index_can_return(Relation indexRelation, int attno)
 	return indexRelation->rd_indam->amcanreturn(indexRelation, attno);
 }
 
+/* ----------------
+ *		index_skip
+ *
+ *		Skip past all tuples where the first 'prefix' columns have the
+ *		same value as the last tuple returned in the current scan.
+ * ----------------
+ */
+bool
+index_skip(IndexScanDesc scan, ScanDirection direction,
+		   ScanDirection indexdir, bool scanstart, int prefix)
+{
+	SCAN_CHECKS;
+
+	return scan->indexRelation->rd_indam->amskip(scan, direction,
+												 indexdir, scanstart, prefix);
+}
+
 /* ----------------
  *		index_getprocid
  *
diff --git a/src/backend/access/nbtree/nbtree.c b/src/backend/access/nbtree/nbtree.c
index 5254bc7ef5..8fde56fe60 100644
--- a/src/backend/access/nbtree/nbtree.c
+++ b/src/backend/access/nbtree/nbtree.c
@@ -132,6 +132,7 @@ bthandler(PG_FUNCTION_ARGS)
 	amroutine->ambulkdelete = btbulkdelete;
 	amroutine->amvacuumcleanup = btvacuumcleanup;
 	amroutine->amcanreturn = btcanreturn;
+	amroutine->amskip = btskip;
 	amroutine->amcostestimate = btcostestimate;
 	amroutine->amoptions = btoptions;
 	amroutine->amproperty = btproperty;
@@ -381,6 +382,8 @@ btbeginscan(Relation rel, int nkeys, int norderbys)
 	 */
 	so->currTuples = so->markTuples = NULL;
 
+	so->skipScanKey = NULL;
+
 	scan->xs_itupdesc = RelationGetDescr(rel);
 
 	scan->opaque = so;
@@ -448,6 +451,16 @@ btrescan(IndexScanDesc scan, ScanKey scankey, int nscankeys,
 	_bt_preprocess_array_keys(scan);
 }
 
+/*
+ * btskip() -- skip to the beginning of the next key prefix
+ */
+bool
+btskip(IndexScanDesc scan, ScanDirection direction,
+	   ScanDirection indexdir, bool start, int prefix)
+{
+	return _bt_skip(scan, direction, indexdir, start, prefix);
+}
+
 /*
  *	btendscan() -- close down a scan
  */
diff --git a/src/backend/access/nbtree/nbtsearch.c b/src/backend/access/nbtree/nbtsearch.c
index c573814f01..c5f5d228f2 100644
--- a/src/backend/access/nbtree/nbtsearch.c
+++ b/src/backend/access/nbtree/nbtsearch.c
@@ -37,6 +37,10 @@ static bool _bt_parallel_readpage(IndexScanDesc scan, BlockNumber blkno,
 static Buffer _bt_walk_left(Relation rel, Buffer buf, Snapshot snapshot);
 static bool _bt_endpoint(IndexScanDesc scan, ScanDirection dir);
 static inline void _bt_initialize_more_data(BTScanOpaque so, ScanDirection dir);
+static inline void _bt_update_skip_scankeys(IndexScanDesc scan,
+											Relation indexRel);
+static inline bool _bt_scankey_within_page(IndexScanDesc scan, BTScanInsert key,
+										Buffer buf, ScanDirection dir);
 
 
 /*
@@ -1375,6 +1379,376 @@ _bt_next(IndexScanDesc scan, ScanDirection dir)
 	return true;
 }
 
+/*
+ *  _bt_skip() -- Skip items that have the same prefix as the most recently
+ * 				  fetched index tuple.
+ *
+ * 		The current position is set so that a subsequent call to _bt_next will
+ * 		fetch the first tuple that differs in the leading 'prefix' keys.
+ *
+ * 		There are four different kinds of skipping (depending on dir and
+ * 		indexdir) that are important to distinguish, especially in the
+ * 		presence of an index condition:
+ *
+ * 		* Advancing forward and reading forward
+ * 			simple scan
+ *
+ * 		* Advancing forward and reading backward
+ * 			scan inside a cursor fetching backward, when skipping is necessary
+ * 			right from the start
+ *
+ * 		* Advancing backward and reading forward
+ * 			scan with order by desc inside a cursor fetching forward, when
+ * 			skipping is necessary right from the start
+ *
+ * 		* Advancing backward and reading backward
+ * 			simple scan with order by desc
+ *
+ *      The current page is searched for the next unique value. If none is found
+ *      we will do a scan from the root in order to find the next page with
+ *      a unique value.
+ */
+bool
+_bt_skip(IndexScanDesc scan, ScanDirection dir,
+		 ScanDirection indexdir, bool scanstart, int prefix)
+{
+	BTScanOpaque so = (BTScanOpaque) scan->opaque;
+	BTStack stack;
+	Buffer buf;
+	OffsetNumber offnum;
+	BTScanPosItem *currItem;
+	Relation 	 indexRel = scan->indexRelation;
+
+	/* We want to return tuples, and we need a starting point */
+	Assert(scan->xs_want_itup);
+	Assert(scan->xs_itup);
+
+	/* If skipScanKey is NULL then we initialize it with _bt_mkscankey */
+	if (so->skipScanKey == NULL)
+	{
+		so->skipScanKey = _bt_mkscankey(indexRel, scan->xs_itup);
+		so->skipScanKey->keysz = prefix;
+		so->skipScanKey->scantid = NULL;
+	}
+	so->skipScanKey->nextkey = ScanDirectionIsForward(dir);
+	_bt_update_skip_scankeys(scan, indexRel);
+
+	/* Check if the next unique key can be found within the current page.
+	 * Since we do not lock the current page between jumps, it's possible
+	 * that it was split since the last time we saw it. This is fine when
+	 * scanning forward, since pages split to the right and we are still
+	 * on the leftmost page. When scanning backward it's possible to lose
+	 * some pages, so we would need to remember the previous page and then
+	 * follow the right links from the current page until we find the
+	 * original one.
+	 *
+	 * Since the whole point of checking the current page is to keep
+	 * performance acceptable when the statistics are off and there are
+	 * too many distinct values for jumping to pay off, it's not clear that
+	 * the complexity of this solution is justified for backward scans, so
+	 * for now just avoid it.
+	 */
+	if (BufferIsValid(so->currPos.buf) && ScanDirectionIsForward(dir))
+	{
+		LockBuffer(so->currPos.buf, BT_READ);
+
+		if (_bt_scankey_within_page(scan, so->skipScanKey, so->currPos.buf, dir))
+		{
+			bool keyFound = false;
+
+			offnum = _bt_binsrch(scan->indexRelation, so->skipScanKey, so->currPos.buf);
+
+			/* Lock the page for SERIALIZABLE transactions */
+			PredicateLockPage(scan->indexRelation, BufferGetBlockNumber(so->currPos.buf),
+							  scan->xs_snapshot);
+
+			/* We know in which direction to look */
+			_bt_initialize_more_data(so, dir);
+
+			/* Now read the data */
+			keyFound = _bt_readpage(scan, dir, offnum);
+
+			LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+			ReleaseBuffer(so->currPos.buf);
+			so->currPos.buf = InvalidBuffer;
+
+			if (keyFound)
+			{
+				/* set IndexTuple */
+				currItem = &so->currPos.items[so->currPos.itemIndex];
+				scan->xs_heaptid = currItem->heapTid;
+				scan->xs_itup = (IndexTuple) (so->currTuples + currItem->tupleOffset);
+				return true;
+			}
+		}
+		else
+			LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+	}
+
+	if (BufferIsValid(so->currPos.buf))
+	{
+		ReleaseBuffer(so->currPos.buf);
+		so->currPos.buf = InvalidBuffer;
+	}
+
+	/*
+	 * We haven't found scan key within the current page, so let's scan from
+	 * the root. Use _bt_search and _bt_binsrch to get the buffer and offset
+	 * number
+	 */
+	so->skipScanKey->nextkey = ScanDirectionIsForward(dir);
+	stack = _bt_search(scan->indexRelation, so->skipScanKey,
+					   &buf, BT_READ, scan->xs_snapshot);
+	_bt_freestack(stack);
+	so->currPos.buf = buf;
+	offnum = _bt_binsrch(scan->indexRelation, so->skipScanKey, buf);
+
+	/* Lock the page for SERIALIZABLE transactions */
+	PredicateLockPage(scan->indexRelation, BufferGetBlockNumber(buf),
+					  scan->xs_snapshot);
+
+	/* We know in which direction to look */
+	_bt_initialize_more_data(so, dir);
+
+	/*
+	 * Simplest case is when both directions are forward, when we are already
+	 * at the next distinct key at the beginning of the series (so everything
+	 * else would be done in _bt_readpage)
+	 *
+	 * The case when both directions are backwards is also simple, but we need
+	 * to go one step back, since we need a last element from the previous
+	 * series.
+	 */
+	if (ScanDirectionIsBackward(dir) && ScanDirectionIsBackward(indexdir))
+		 offnum = OffsetNumberPrev(offnum);
+
+	/*
+	 * Advance backward but read forward. At this moment we are at the next
+	 * distinct key at the beginning of the series. If the scan has just
+	 * started, we can read forward without doing anything else. Otherwise
+	 * find the previous distinct key and the beginning of its series and read
+	 * forward from there. To do so, go back one step, perform a binary search
+	 * to find the first item in the series and let _bt_readpage do everything
+	 * else.
+	 */
+	else if (ScanDirectionIsBackward(dir) && ScanDirectionIsForward(indexdir))
+	{
+		if (!scanstart)
+		{
+			offnum = _bt_binsrch(scan->indexRelation, so->skipScanKey, buf);
+
+			/* One step back to find a previous value */
+			_bt_readpage(scan, dir, offnum);
+
+			LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+			if (_bt_next(scan, dir))
+			{
+				LockBuffer(so->currPos.buf, BT_READ);
+				_bt_update_skip_scankeys(scan, indexRel);
+
+				/*
+				 * And now find the last item in the sequence for the
+				 * current value, with the intention of doing OffsetNumberNext.
+				 * As a result we end up on the first element of the sequence.
+				 */
+				if (_bt_scankey_within_page(scan, so->skipScanKey, so->currPos.buf, dir))
+					offnum = _bt_binsrch(scan->indexRelation, so->skipScanKey, buf);
+				else
+				{
+					if (BufferIsValid(so->currPos.buf))
+					{
+						LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+						ReleaseBuffer(so->currPos.buf);
+						so->currPos.buf = InvalidBuffer;
+					}
+
+					stack = _bt_search(scan->indexRelation, so->skipScanKey,
+									   &buf, BT_READ, scan->xs_snapshot);
+					_bt_freestack(stack);
+					so->currPos.buf = buf;
+					offnum = _bt_binsrch(scan->indexRelation, so->skipScanKey, buf);
+				}
+			}
+			else
+			{
+				pfree(so->skipScanKey);
+				so->skipScanKey = NULL;
+				return false;
+			}
+		}
+	}
+
+	/*
+	 * Advance forward but read backward. At this moment we are at the next
+	 * distinct key at the beginning of the series. In case if scan just
+	 * started, we can go one step back and read forward without doing
+	 * anything else. Otherwise find the next distinct key and the beginning
+	 * of it's series, go one step back and read backward from there.
+	 *
+	 * An interesting situation can happen if one of the distinct keys does
+	 * not pass a corresponding index condition at all. In this case reading
+	 * backward can lead to a previous distinct key being found, creating a
+	 * loop. To avoid that, check the value to be returned, and jump one more
+	 * time if it's the same as at the beginning.
+	 */
+	else if (ScanDirectionIsForward(dir) && ScanDirectionIsBackward(indexdir))
+	{
+		if (scanstart)
+			offnum = OffsetNumberPrev(offnum);
+		else
+		{
+			OffsetNumber nextOffset,
+						startOffset;
+
+			IndexTuple startItup = CopyIndexTuple(scan->xs_itup);
+			Page page = BufferGetPage(so->currPos.buf);
+
+			/* We are at the end and need to return */
+			if ((offnum > PageGetMaxOffsetNumber(page)) &&
+				(so->currPos.nextPage == P_NONE))
+			{
+				LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+
+				BTScanPosUnpinIfPinned(so->currPos);
+				BTScanPosInvalidate(so->currPos);
+
+				pfree(so->skipScanKey);
+				so->skipScanKey = NULL;
+				return false;
+			}
+
+			nextOffset = startOffset = ItemPointerGetOffsetNumber(&scan->xs_itup->t_tid);
+
+			while (nextOffset == startOffset)
+			{
+				IndexTuple itup;
+
+				/*
+				 * Find the next index tuple to update the scan key. It could
+				 * be at the end, so check against the max offset.
+				 */
+				OffsetNumber maxoff = PageGetMaxOffsetNumber(page);
+				ItemId itemid = PageGetItemId(page, Min(offnum, maxoff));
+
+				CHECK_FOR_INTERRUPTS();
+
+				page = BufferGetPage(so->currPos.buf);
+				scan->xs_itup = (IndexTuple) PageGetItem(page, itemid);
+				so->skipScanKey->nextkey = ScanDirectionIsForward(dir);
+
+				_bt_update_skip_scankeys(scan, indexRel);
+				LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+
+				if (BufferIsValid(so->currPos.buf))
+				{
+					ReleaseBuffer(so->currPos.buf);
+					so->currPos.buf = InvalidBuffer;
+				}
+
+				stack = _bt_search(scan->indexRelation, so->skipScanKey,
+								   &buf, BT_READ, scan->xs_snapshot);
+				_bt_freestack(stack);
+				so->currPos.buf = buf;
+				offnum = _bt_binsrch(scan->indexRelation, so->skipScanKey, buf);
+				offnum = OffsetNumberPrev(offnum);
+
+				/* Check if _bt_readpage returns already found item */
+				if (!_bt_readpage(scan, indexdir, offnum))
+				{
+					/*
+					 * There's no actually-matching data on this page.  Try to
+					 * advance to the next page. Return false if there's no
+					 * matching data at all.
+					 */
+					LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+					if (!_bt_steppage(scan, dir))
+					{
+						pfree(so->skipScanKey);
+						so->skipScanKey = NULL;
+						return false;
+					}
+				}
+
+				currItem = &so->currPos.items[so->currPos.lastItem];
+				itup = (IndexTuple) (so->currTuples + currItem->tupleOffset);
+				nextOffset = ItemPointerGetOffsetNumber(&itup->t_tid);
+
+				/*
+				 * To check if we returned the same tuple, try to find a
+				 * startItup on the current page. For that we need to update
+				 * scankey to match the whole tuple and set nextkey to return
+				 * an exact tuple, not the next one. If nextOffset is the
+				 * same as before, we are in a loop: return offnum to the
+				 * original position and jump further.
+				 */
+				scan->xs_itup = startItup;
+				_bt_update_skip_scankeys(scan, indexRel);
+
+				so->skipScanKey->keysz = IndexRelationGetNumberOfKeyAttributes(indexRel);
+				so->skipScanKey->nextkey = false;
+
+				if (_bt_scankey_within_page(scan, so->skipScanKey,
+											so->currPos.buf, dir))
+				{
+					startOffset = _bt_binsrch(scan->indexRelation,
+											  so->skipScanKey,
+											  so->currPos.buf);
+
+					page = BufferGetPage(so->currPos.buf);
+					maxoff = PageGetMaxOffsetNumber(page);
+
+					if (nextOffset == startOffset)
+						offnum = OffsetNumberNext(offnum);
+
+					if ((offnum > maxoff) && (so->currPos.nextPage == P_NONE))
+					{
+						LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+
+						BTScanPosUnpinIfPinned(so->currPos);
+						BTScanPosInvalidate(so->currPos);
+
+						pfree(so->skipScanKey);
+						so->skipScanKey = NULL;
+						return false;
+					}
+				}
+
+				/* Restore original scankey options */
+				so->skipScanKey->keysz = prefix;
+				so->skipScanKey->nextkey = ScanDirectionIsForward(dir);
+			}
+		}
+	}
+
+	/* Now read the data */
+	if (!_bt_readpage(scan, indexdir, offnum))
+	{
+		/*
+		 * There's no actually-matching data on this page.  Try to advance to
+		 * the next page.  Return false if there's no matching data at all.
+		 */
+		LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+		if (!_bt_steppage(scan, dir))
+		{
+			pfree(so->skipScanKey);
+			so->skipScanKey = NULL;
+			return false;
+		}
+	}
+	else
+	{
+		/* Drop the lock, and maybe the pin, on the current page */
+		LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+	}
+
+	/* And set IndexTuple */
+	currItem = &so->currPos.items[so->currPos.itemIndex];
+	scan->xs_heaptid = currItem->heapTid;
+	scan->xs_itup = (IndexTuple) (so->currTuples + currItem->tupleOffset);
+
+	return true;
+}
+
 /*
  *	_bt_readpage() -- Load data from current index page into so->currPos
  *
@@ -2246,3 +2620,54 @@ _bt_initialize_more_data(BTScanOpaque so, ScanDirection dir)
 	so->numKilled = 0;			/* just paranoia */
 	so->markItemIndex = -1;		/* ditto */
 }
+
+/*
+ * _bt_update_skip_scankeys() -- set up new values for the existing scankeys
+ * 								 based on the current index tuple
+ */
+static inline void
+_bt_update_skip_scankeys(IndexScanDesc scan, Relation indexRel)
+{
+	TupleDesc		itupdesc;
+	int			indnkeyatts,
+				i;
+	BTScanOpaque 	so = (BTScanOpaque) scan->opaque;
+	ScanKey			scankeys = so->skipScanKey->scankeys;
+
+	itupdesc = RelationGetDescr(indexRel);
+	indnkeyatts = IndexRelationGetNumberOfKeyAttributes(indexRel);
+	for (i = 0; i < indnkeyatts; i++)
+	{
+		Datum datum;
+		bool null;
+		int flags;
+
+		datum = index_getattr(scan->xs_itup, i + 1, itupdesc, &null);
+		flags = (null ? SK_ISNULL : 0) |
+				(indexRel->rd_indoption[i] << SK_BT_INDOPTION_SHIFT);
+		scankeys[i].sk_flags = flags;
+		scankeys[i].sk_argument = datum;
+	}
+}
+
+/*
+ * _bt_scankey_within_page() -- check if the provided scankey could be found
+ * 								within a page, specified by the buffer.
+ */
+static inline bool
+_bt_scankey_within_page(IndexScanDesc scan, BTScanInsert key,
+						Buffer buf, ScanDirection dir)
+{
+	OffsetNumber low, high;
+	Page page = BufferGetPage(buf);
+	BTPageOpaque opaque = (BTPageOpaque) PageGetSpecialPointer(page);
+
+	low = P_FIRSTDATAKEY(opaque);
+	high = PageGetMaxOffsetNumber(page);
+
+	if (unlikely(high < low))
+		return false;
+
+	return (_bt_compare(scan->indexRelation, key, page, low) > 0 &&
+			_bt_compare(scan->indexRelation, key, page, high) <= 0);
+}
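The bounds test in _bt_scankey_within_page can be tried outside the backend. What follows is only a toy model, assuming a page is a sorted int array and a three-way integer comparison stands in for _bt_compare; toy_compare and toy_key_within_page are hypothetical names and are not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for _bt_compare: <0, 0 or >0 as key vs. item. */
static int
toy_compare(int key, int item)
{
	return (key > item) - (key < item);
}

/*
 * Toy model of the bounds test: the key may be on the page only if it
 * sorts strictly after the first item and not after the last one.
 * An empty page never contains the key.
 */
static bool
toy_key_within_page(const int *items, size_t nitems, int key)
{
	assert(items != NULL || nitems == 0);

	if (nitems == 0)
		return false;

	return toy_compare(key, items[0]) > 0 &&
		toy_compare(key, items[nitems - 1]) <= 0;
}
```

In this model a key equal to the first item is reported as outside the page. In the backward-advance branch of _bt_skip that outcome sends the caller to _bt_search instead of reusing the current buffer.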
diff --git a/src/backend/access/spgist/spgutils.c b/src/backend/access/spgist/spgutils.c
index 4924ae1c59..fa09a4685e 100644
--- a/src/backend/access/spgist/spgutils.c
+++ b/src/backend/access/spgist/spgutils.c
@@ -68,6 +68,7 @@ spghandler(PG_FUNCTION_ARGS)
 	amroutine->ambulkdelete = spgbulkdelete;
 	amroutine->amvacuumcleanup = spgvacuumcleanup;
 	amroutine->amcanreturn = spgcanreturn;
+	amroutine->amskip = NULL;
 	amroutine->amcostestimate = spgcostestimate;
 	amroutine->amoptions = spgoptions;
 	amroutine->amproperty = spgproperty;
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index d189b8d573..60cd801247 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -130,6 +130,7 @@ static void ExplainDummyGroup(const char *objtype, const char *labelname,
 static void ExplainXMLTag(const char *tagname, int flags, ExplainState *es);
 static void ExplainJSONLineEnding(ExplainState *es);
 static void ExplainYAMLLineStarting(ExplainState *es);
+static void ExplainIndexSkipScanKeys(int skipPrefixSize, ExplainState *es);
 static void escape_yaml(StringInfo buf, const char *str);
 
 
@@ -1058,6 +1059,22 @@ ExplainPreScanNode(PlanState *planstate, Bitmapset **rels_used)
 	return planstate_tree_walker(planstate, ExplainPreScanNode, rels_used);
 }
 
+/*
+ * ExplainIndexSkipScanKeys -
+ *	  Append information about index skip scan to es->str.
+ *
+ * Can be used to print the skip prefix size.
+ */
+static void
+ExplainIndexSkipScanKeys(int skipPrefixSize, ExplainState *es)
+{
+	if (skipPrefixSize > 0)
+	{
+		if (es->format != EXPLAIN_FORMAT_TEXT)
+			ExplainPropertyInteger("Distinct Prefix", NULL, skipPrefixSize, es);
+	}
+}
+
 /*
  * ExplainNode -
  *	  Appends a description of a plan tree to es->str
@@ -1380,6 +1397,8 @@ ExplainNode(PlanState *planstate, List *ancestors,
 			{
 				IndexScan  *indexscan = (IndexScan *) plan;
 
+				ExplainIndexSkipScanKeys(indexscan->indexskipprefixsize, es);
+
 				ExplainIndexScanDetails(indexscan->indexid,
 										indexscan->indexorderdir,
 										es);
@@ -1390,6 +1409,8 @@ ExplainNode(PlanState *planstate, List *ancestors,
 			{
 				IndexOnlyScan *indexonlyscan = (IndexOnlyScan *) plan;
 
+				ExplainIndexSkipScanKeys(indexonlyscan->indexskipprefixsize, es);
+
 				ExplainIndexScanDetails(indexonlyscan->indexid,
 										indexonlyscan->indexorderdir,
 										es);
@@ -1599,6 +1620,8 @@ ExplainNode(PlanState *planstate, List *ancestors,
 	switch (nodeTag(plan))
 	{
 		case T_IndexScan:
+			if (((IndexScan *) plan)->indexskipprefixsize > 0)
+				ExplainPropertyBool("Skip scan", true, es);
 			show_scan_qual(((IndexScan *) plan)->indexqualorig,
 						   "Index Cond", planstate, ancestors, es);
 			if (((IndexScan *) plan)->indexqualorig)
@@ -1612,6 +1635,8 @@ ExplainNode(PlanState *planstate, List *ancestors,
 										   planstate, es);
 			break;
 		case T_IndexOnlyScan:
+			if (((IndexOnlyScan *) plan)->indexskipprefixsize > 0)
+				ExplainPropertyBool("Skip scan", true, es);
 			show_scan_qual(((IndexOnlyScan *) plan)->indexqual,
 						   "Index Cond", planstate, ancestors, es);
 			if (((IndexOnlyScan *) plan)->indexqual)
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 5617ac29e7..76330f7906 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -65,6 +65,13 @@ IndexOnlyNext(IndexOnlyScanState *node)
 	IndexScanDesc scandesc;
 	TupleTableSlot *slot;
 	ItemPointer tid;
+	IndexOnlyScan *indexonlyscan = (IndexOnlyScan *) node->ss.ps.plan;
+
+	/*
+	 * Tells whether the current position was reached via skipping. In this
+	 * case there is no need for index_getnext_tid.
+	 */
+	bool skipped = false;
 
 	/*
 	 * extract necessary information from index scan node
@@ -72,7 +79,7 @@ IndexOnlyNext(IndexOnlyScanState *node)
 	estate = node->ss.ps.state;
 	direction = estate->es_direction;
 	/* flip direction if this is an overall backward scan */
-	if (ScanDirectionIsBackward(((IndexOnlyScan *) node->ss.ps.plan)->indexorderdir))
+	if (ScanDirectionIsBackward(indexonlyscan->indexorderdir))
 	{
 		if (ScanDirectionIsForward(direction))
 			direction = BackwardScanDirection;
@@ -115,14 +122,50 @@ IndexOnlyNext(IndexOnlyScanState *node)
 						 node->ioss_NumOrderByKeys);
 	}
 
+	/*
+	 * Check if we need to skip to the next key prefix, because we've been
+	 * asked to implement DISTINCT.
+	 *
+	 * When fetching a cursor in the direction opposite to the general scan
+	 * direction, the result must be what normal fetching would have
+	 * returned, but in reversed order. In other words, return the last or
+	 * first scanned tuple in a DISTINCT set, depending on the cursor
+	 * direction. Because of that we also skip when the first tuple hasn't
+	 * been emitted yet but the directions are opposite.
+	 */
+	if (node->ioss_SkipPrefixSize > 0 &&
+		(node->ioss_FirstTupleEmitted ||
+		 ScanDirectionsAreOpposite(direction, indexonlyscan->indexorderdir)))
+	{
+		if (!index_skip(scandesc, direction, indexonlyscan->indexorderdir,
+						!node->ioss_FirstTupleEmitted, node->ioss_SkipPrefixSize))
+		{
+			/*
+			 * Reached end of index. At this point currPos is invalidated, and
+			 * we need to reset ioss_FirstTupleEmitted, since otherwise after
+			 * going backwards, reaching the end of index, and going forward
+			 * again we apply skip again. It would be incorrect and lead to an
+			 * extra skipped item.
+			 */
+			node->ioss_FirstTupleEmitted = false;
+			return ExecClearTuple(slot);
+		}
+		else
+		{
+			skipped = true;
+			tid = &scandesc->xs_heaptid;
+		}
+	}
+
 	/*
 	 * OK, now that we have what we need, fetch the next tuple.
 	 */
-	while ((tid = index_getnext_tid(scandesc, direction)) != NULL)
+	while (skipped || (tid = index_getnext_tid(scandesc, direction)) != NULL)
 	{
 		bool		tuple_from_heap = false;
 
 		CHECK_FOR_INTERRUPTS();
+		skipped = false;
 
 		/*
 		 * We can skip the heap fetch if the TID references a heap page on
@@ -250,6 +293,8 @@ IndexOnlyNext(IndexOnlyScanState *node)
 							  ItemPointerGetBlockNumber(tid),
 							  estate->es_snapshot);
 
+		node->ioss_FirstTupleEmitted = true;
+
 		return slot;
 	}
 
@@ -504,6 +549,8 @@ ExecInitIndexOnlyScan(IndexOnlyScan *node, EState *estate, int eflags)
 	indexstate->ss.ps.plan = (Plan *) node;
 	indexstate->ss.ps.state = estate;
 	indexstate->ss.ps.ExecProcNode = ExecIndexOnlyScan;
+	indexstate->ioss_SkipPrefixSize = node->indexskipprefixsize;
+	indexstate->ioss_FirstTupleEmitted = false;
 
 	/*
 	 * Miscellaneous initialization
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index d0a96a38e0..449aaec3ac 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -85,6 +85,13 @@ IndexNext(IndexScanState *node)
 	ScanDirection direction;
 	IndexScanDesc scandesc;
 	TupleTableSlot *slot;
+	IndexScan *indexscan = (IndexScan *) node->ss.ps.plan;
+
+	/*
+	 * Tells whether the current position was reached via skipping. In this
+	 * case there is no need for index_getnext_tid.
+	 */
+	bool skipped = false;
 
 	/*
 	 * extract necessary information from index scan node
@@ -92,7 +99,7 @@ IndexNext(IndexScanState *node)
 	estate = node->ss.ps.state;
 	direction = estate->es_direction;
 	/* flip direction if this is an overall backward scan */
-	if (ScanDirectionIsBackward(((IndexScan *) node->ss.ps.plan)->indexorderdir))
+	if (ScanDirectionIsBackward(indexscan->indexorderdir))
 	{
 		if (ScanDirectionIsForward(direction))
 			direction = BackwardScanDirection;
@@ -117,6 +124,12 @@ IndexNext(IndexScanState *node)
 
 		node->iss_ScanDesc = scandesc;
 
+		/* Index skip scan assumes xs_want_itup, so set it to true */
+		if (indexscan->indexskipprefixsize > 0)
+			node->iss_ScanDesc->xs_want_itup = true;
+		else
+			node->iss_ScanDesc->xs_want_itup = false;
+
 		/*
 		 * If no run-time keys to calculate or they are ready, go ahead and
 		 * pass the scankeys to the index AM.
@@ -127,12 +140,48 @@ IndexNext(IndexScanState *node)
 						 node->iss_OrderByKeys, node->iss_NumOrderByKeys);
 	}
 
+	/*
+	 * Check if we need to skip to the next key prefix, because we've been
+	 * asked to implement DISTINCT.
+	 *
+	 * When fetching a cursor in the direction opposite to the general scan
+	 * direction, the result must be what normal fetching would have
+	 * returned, but in reversed order. In other words, return the last or
+	 * first scanned tuple in a DISTINCT set, depending on the cursor
+	 * direction. Because of that we also skip when the first tuple hasn't
+	 * been emitted yet but the directions are opposite.
+	 */
+	if (node->iss_SkipPrefixSize > 0 &&
+		(node->iss_FirstTupleEmitted ||
+		 ScanDirectionsAreOpposite(direction, indexscan->indexorderdir)))
+	{
+		if (!index_skip(scandesc, direction, indexscan->indexorderdir,
+					   !node->iss_FirstTupleEmitted, node->iss_SkipPrefixSize))
+		{
+			/*
+			 * Reached end of index. At this point currPos is invalidated, and
+			 * we need to reset iss_FirstTupleEmitted, since otherwise after
+			 * going backwards, reaching the end of index, and going forward
+			 * again we apply skip again. It would be incorrect and lead to an
+			 * extra skipped item.
+			 */
+			node->iss_FirstTupleEmitted = false;
+			return ExecClearTuple(slot);
+		}
+		else
+		{
+			skipped = true;
+			index_fetch_heap(scandesc, slot);
+		}
+	}
+
 	/*
 	 * ok, now that we have what we need, fetch the next tuple.
 	 */
-	while (index_getnext_slot(scandesc, direction, slot))
+	while (skipped || index_getnext_slot(scandesc, direction, slot))
 	{
 		CHECK_FOR_INTERRUPTS();
+		skipped = false;
 
 		/*
 		 * If the index was lossy, we have to recheck the index quals using
@@ -149,6 +198,7 @@ IndexNext(IndexScanState *node)
 			}
 		}
 
+		node->iss_FirstTupleEmitted = true;
 		return slot;
 	}
 
@@ -910,6 +960,8 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
 	indexstate->ss.ps.plan = (Plan *) node;
 	indexstate->ss.ps.state = estate;
 	indexstate->ss.ps.ExecProcNode = ExecIndexScan;
+	indexstate->iss_SkipPrefixSize = node->indexskipprefixsize;
+	indexstate->iss_FirstTupleEmitted = false;
 
 	/*
 	 * Miscellaneous initialization
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 54ad62bb7f..e0cfd710c4 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -493,6 +493,7 @@ _copyIndexScan(const IndexScan *from)
 	COPY_NODE_FIELD(indexorderbyorig);
 	COPY_NODE_FIELD(indexorderbyops);
 	COPY_SCALAR_FIELD(indexorderdir);
+	COPY_SCALAR_FIELD(indexskipprefixsize);
 
 	return newnode;
 }
@@ -518,6 +519,7 @@ _copyIndexOnlyScan(const IndexOnlyScan *from)
 	COPY_NODE_FIELD(indexorderby);
 	COPY_NODE_FIELD(indextlist);
 	COPY_SCALAR_FIELD(indexorderdir);
+	COPY_SCALAR_FIELD(indexskipprefixsize);
 
 	return newnode;
 }
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 16083e7a7e..5f723cda4b 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -562,6 +562,7 @@ _outIndexScan(StringInfo str, const IndexScan *node)
 	WRITE_NODE_FIELD(indexorderbyorig);
 	WRITE_NODE_FIELD(indexorderbyops);
 	WRITE_ENUM_FIELD(indexorderdir, ScanDirection);
+	WRITE_INT_FIELD(indexskipprefixsize);
 }
 
 static void
@@ -576,6 +577,7 @@ _outIndexOnlyScan(StringInfo str, const IndexOnlyScan *node)
 	WRITE_NODE_FIELD(indexorderby);
 	WRITE_NODE_FIELD(indextlist);
 	WRITE_ENUM_FIELD(indexorderdir, ScanDirection);
+	WRITE_INT_FIELD(indexskipprefixsize);
 }
 
 static void
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 551ce6c41c..028d03a56d 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1820,6 +1820,7 @@ _readIndexScan(void)
 	READ_NODE_FIELD(indexorderbyorig);
 	READ_NODE_FIELD(indexorderbyops);
 	READ_ENUM_FIELD(indexorderdir, ScanDirection);
+	READ_INT_FIELD(indexskipprefixsize);
 
 	READ_DONE();
 }
@@ -1839,6 +1840,7 @@ _readIndexOnlyScan(void)
 	READ_NODE_FIELD(indexorderby);
 	READ_NODE_FIELD(indextlist);
 	READ_ENUM_FIELD(indexorderdir, ScanDirection);
+	READ_INT_FIELD(indexskipprefixsize);
 
 	READ_DONE();
 }
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index b5a0033721..710edf160a 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -124,6 +124,7 @@ int			max_parallel_workers_per_gather = 2;
 bool		enable_seqscan = true;
 bool		enable_indexscan = true;
 bool		enable_indexonlyscan = true;
+bool		enable_indexskipscan = true;
 bool		enable_bitmapscan = true;
 bool		enable_tidscan = true;
 bool		enable_sort = true;
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index dff826a828..7b32f2cc7e 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -175,12 +175,14 @@ static IndexScan *make_indexscan(List *qptlist, List *qpqual, Index scanrelid,
 								 Oid indexid, List *indexqual, List *indexqualorig,
 								 List *indexorderby, List *indexorderbyorig,
 								 List *indexorderbyops,
-								 ScanDirection indexscandir);
+								 ScanDirection indexscandir,
+								 int skipPrefixSize);
 static IndexOnlyScan *make_indexonlyscan(List *qptlist, List *qpqual,
 										 Index scanrelid, Oid indexid,
 										 List *indexqual, List *indexorderby,
 										 List *indextlist,
-										 ScanDirection indexscandir);
+										 ScanDirection indexscandir,
+										 int skipPrefixSize);
 static BitmapIndexScan *make_bitmap_indexscan(Index scanrelid, Oid indexid,
 											  List *indexqual,
 											  List *indexqualorig);
@@ -2910,7 +2912,8 @@ create_indexscan_plan(PlannerInfo *root,
 												fixed_indexquals,
 												fixed_indexorderbys,
 												best_path->indexinfo->indextlist,
-												best_path->indexscandir);
+												best_path->indexscandir,
+												best_path->indexskipprefix);
 	else
 		scan_plan = (Scan *) make_indexscan(tlist,
 											qpqual,
@@ -2921,7 +2924,8 @@ create_indexscan_plan(PlannerInfo *root,
 											fixed_indexorderbys,
 											indexorderbys,
 											indexorderbyops,
-											best_path->indexscandir);
+											best_path->indexscandir,
+											best_path->indexskipprefix);
 
 	copy_generic_path_info(&scan_plan->plan, &best_path->path);
 
@@ -5184,7 +5188,8 @@ make_indexscan(List *qptlist,
 			   List *indexorderby,
 			   List *indexorderbyorig,
 			   List *indexorderbyops,
-			   ScanDirection indexscandir)
+			   ScanDirection indexscandir,
+			   int skipPrefixSize)
 {
 	IndexScan  *node = makeNode(IndexScan);
 	Plan	   *plan = &node->scan.plan;
@@ -5201,6 +5206,7 @@ make_indexscan(List *qptlist,
 	node->indexorderbyorig = indexorderbyorig;
 	node->indexorderbyops = indexorderbyops;
 	node->indexorderdir = indexscandir;
+	node->indexskipprefixsize = skipPrefixSize;
 
 	return node;
 }
@@ -5213,7 +5219,8 @@ make_indexonlyscan(List *qptlist,
 				   List *indexqual,
 				   List *indexorderby,
 				   List *indextlist,
-				   ScanDirection indexscandir)
+				   ScanDirection indexscandir,
+				   int skipPrefixSize)
 {
 	IndexOnlyScan *node = makeNode(IndexOnlyScan);
 	Plan	   *plan = &node->scan.plan;
@@ -5228,6 +5235,7 @@ make_indexonlyscan(List *qptlist,
 	node->indexorderby = indexorderby;
 	node->indextlist = indextlist;
 	node->indexorderdir = indexscandir;
+	node->indexskipprefixsize = skipPrefixSize;
 
 	return node;
 }
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 984fca0696..c84388e6f7 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -4834,6 +4834,82 @@ create_distinct_paths(PlannerInfo *root,
 												  path,
 												  list_length(root->distinct_pathkeys),
 												  numDistinctRows));
+
+				/* Consider index skip scan as well */
+				if (enable_indexskipscan &&
+					IsA(path, IndexPath) &&
+					((IndexPath *) path)->indexinfo->amcanskip &&
+					root->distinct_pathkeys != NIL)
+				{
+					ListCell   		*lc;
+					IndexOptInfo 	*index = NULL;
+					bool 			different_columns_order = false,
+									not_empty_qual = false;
+					int 			i = 0;
+					int 			distinctPrefixKeys;
+
+					Assert(path->pathtype == T_IndexOnlyScan ||
+						   path->pathtype == T_IndexScan);
+
+					index = ((IndexPath *) path)->indexinfo;
+					distinctPrefixKeys = list_length(root->query_uniquekeys);
+
+					/*
+					 * Normally we can think about distinctPrefixKeys as just
+					 * the number of distinct keys. But suppose we have a
+					 * distinct key a, and the index contains (b, a) in exactly
+					 * this order. In such a situation we need to use the
+					 * position of a in the index as distinctPrefixKeys,
+					 * otherwise the skip will happen only by the first column.
+					 */
+					foreach(lc, root->query_uniquekeys)
+					{
+						UniqueKey *uniquekey = (UniqueKey *) lfirst(lc);
+						EquivalenceMember *em =
+							lfirst_node(EquivalenceMember,
+										list_head(uniquekey->eq_clause->ec_members));
+						Var *var = (Var *) em->em_expr;
+
+						Assert(i < index->ncolumns);
+
+						for (i = 0; i < index->ncolumns; i++)
+						{
+							if (index->indexkeys[i] == var->varattno)
+							{
+								distinctPrefixKeys = Max(i + 1, distinctPrefixKeys);
+								break;
+							}
+						}
+					}
+
+					/*
+					 * XXX: In case of an index scan, qual evaluation happens
+					 * after ExecScanFetch, which means skip results could be
+					 * filtered out. Consider the following query:
+					 *
+					 * 		select distinct on (a, b) a, b, c from t where c < 100;
+					 *
+					 * Skip scan returns one tuple for each distinct set of (a,
+					 * b) with an arbitrary value of c, so if the chosen c does
+					 * not match the qual and there is any c that matches the
+					 * qual, we miss that tuple.
+					 */
+					if (path->pathtype == T_IndexScan &&
+						parse->jointree != NULL &&
+						parse->jointree->quals != NULL &&
+						list_length((List *) parse->jointree->quals) != 0)
+							not_empty_qual = true;
+
+					if (!different_columns_order &&	!not_empty_qual)
+					{
+						add_path(distinct_rel, (Path *)
+								 create_skipscan_unique_path(root,
+															 distinct_rel,
+															 path,
+															 distinctPrefixKeys,
+															 numDistinctRows));
+					}
+				}
 			}
 		}
 
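The column-position rule described in the planner.c comment above can be tried in isolation. This is a minimal sketch, assuming attribute numbers are plain ints and ignoring equivalence classes entirely; toy_distinct_prefix and its parameters are made up for illustration and do not appear in the patch.

```c
#include <assert.h>

/*
 * Toy version of the prefix computation: start from the number of
 * distinct keys and widen the prefix to cover the position of every
 * distinct column found in the index (positions are 1-based).
 */
static int
toy_distinct_prefix(const int *indexkeys, int ncolumns,
					const int *distinct_attnos, int ndistinct)
{
	int			prefix = ndistinct;

	assert(ncolumns >= 0 && ndistinct >= 0);

	for (int k = 0; k < ndistinct; k++)
	{
		for (int i = 0; i < ncolumns; i++)
		{
			if (indexkeys[i] == distinct_attnos[k])
			{
				if (i + 1 > prefix)
					prefix = i + 1;
				break;
			}
		}
	}

	return prefix;
}
```

With an index on (b, a), modelled here as attribute numbers {2, 1}, and a single distinct key a, the function yields a prefix of 2 rather than 1, which is the case the comment is concerned with.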
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index a006dbbe9c..2fb18fb372 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -2915,6 +2915,46 @@ create_upper_unique_path(PlannerInfo *root,
 	return pathnode;
 }
 
+/*
+ * create_skipscan_unique_path
+ *	  Creates a pathnode the same as an existing IndexPath except based on
+ *	  skipping duplicate values.  This may or may not be cheaper than using
+ *	  create_upper_unique_path.
+ *
+ * The input path must be an IndexPath for an index that supports amskip.
+ */
+IndexPath *
+create_skipscan_unique_path(PlannerInfo *root,
+							RelOptInfo *rel,
+							Path *basepath,
+							int distinctPrefixKeys,
+							double numGroups)
+{
+	IndexPath *pathnode = makeNode(IndexPath);
+
+	Assert(IsA(basepath, IndexPath));
+
+	/* We don't want to modify basepath, so make a copy. */
+	memcpy(pathnode, basepath, sizeof(IndexPath));
+
+	/* The size of the prefix we'll use for skipping. */
+	Assert(pathnode->indexinfo->amcanskip);
+	Assert(distinctPrefixKeys > 0);
+	/*Assert(distinctPrefixKeys <= list_length(pathnode->path.pathkeys));*/
+	pathnode->indexskipprefix = distinctPrefixKeys;
+
+	/*
+	 * The cost to skip to each distinct value should be roughly the same as
+	 * the cost of finding the first key times the number of distinct values
+	 * we expect to find.
+	 */
+	pathnode->path.startup_cost = basepath->startup_cost;
+	pathnode->path.total_cost = basepath->startup_cost * numGroups;
+	pathnode->path.rows = numGroups;
+
+	return pathnode;
+}
+
 /*
  * create_agg_path
  *	  Creates a pathnode that represents performing aggregation/grouping
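The cost assignment in create_skipscan_unique_path is simple enough to restate as a standalone function. This is a sketch only, assuming nothing beyond the three assignments visible in the patch; ToyCost and toy_skipscan_cost are hypothetical names, and the real planner may add further adjustments that are not modelled here.

```c
#include <assert.h>

/* Hypothetical container for the three fields the patch fills in. */
typedef struct ToyCost
{
	double		startup;
	double		total;
	double		rows;
} ToyCost;

/*
 * Mirror of the patch's arithmetic: keep the base startup cost, charge
 * one "find the first key" per expected distinct group, and report one
 * row per group.
 */
static ToyCost
toy_skipscan_cost(double base_startup_cost, double num_groups)
{
	ToyCost		c;

	assert(base_startup_cost >= 0.0 && num_groups >= 0.0);

	c.startup = base_startup_cost;
	c.total = base_startup_cost * num_groups;
	c.rows = num_groups;
	return c;
}
```

Written this way it is easy to see that a group estimate below 1 would give a total cost below the startup cost. Whether such an estimate can reach this function is not something the patch text settles.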
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index d82fc5ab8b..f65b299f37 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -271,6 +271,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 			info->amoptionalkey = amroutine->amoptionalkey;
 			info->amsearcharray = amroutine->amsearcharray;
 			info->amsearchnulls = amroutine->amsearchnulls;
+			info->amcanskip = (amroutine->amskip != NULL);
 			info->amcanparallel = amroutine->amcanparallel;
 			info->amhasgettuple = (amroutine->amgettuple != NULL);
 			info->amhasgetbitmap = amroutine->amgetbitmap != NULL &&
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index e5f8a1301f..3258b1d007 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -918,6 +918,15 @@ static struct config_bool ConfigureNamesBool[] =
 		true,
 		NULL, NULL, NULL
 	},
+	{
+		{"enable_indexskipscan", PGC_USERSET, QUERY_TUNING_METHOD,
+			gettext_noop("Enables the planner's use of index-skip-scan plans."),
+			NULL
+		},
+		&enable_indexskipscan,
+		true,
+		NULL, NULL, NULL
+	},
 	{
 		{"enable_bitmapscan", PGC_USERSET, QUERY_TUNING_METHOD,
 			gettext_noop("Enables the planner's use of bitmap-scan plans."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index e1048c0047..a002ee2143 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -353,6 +353,7 @@
 #enable_hashjoin = on
 #enable_indexscan = on
 #enable_indexonlyscan = on
+#enable_indexskipscan = on
 #enable_material = on
 #enable_mergejoin = on
 #enable_nestloop = on
diff --git a/src/include/access/amapi.h b/src/include/access/amapi.h
index 3b3e22f73d..3d39cd9d07 100644
--- a/src/include/access/amapi.h
+++ b/src/include/access/amapi.h
@@ -130,6 +130,13 @@ typedef void (*amrescan_function) (IndexScanDesc scan,
 typedef bool (*amgettuple_function) (IndexScanDesc scan,
 									 ScanDirection direction);
 
+/* skip past duplicates in a given prefix */
+typedef bool (*amskip_function) (IndexScanDesc scan,
+								 ScanDirection dir,
+								 ScanDirection indexdir,
+								 bool start,
+								 int prefix);
+
 /* fetch all valid tuples */
 typedef int64 (*amgetbitmap_function) (IndexScanDesc scan,
 									   TIDBitmap *tbm);
@@ -229,6 +236,7 @@ typedef struct IndexAmRoutine
 	amendscan_function amendscan;
 	ammarkpos_function ammarkpos;	/* can be NULL */
 	amrestrpos_function amrestrpos; /* can be NULL */
+	amskip_function amskip;				/* can be NULL */
 
 	/* interface functions to support parallel index scans */
 	amestimateparallelscan_function amestimateparallelscan; /* can be NULL */
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 7e9364a50c..815de4e4dd 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -180,6 +180,8 @@ extern IndexBulkDeleteResult *index_bulk_delete(IndexVacuumInfo *info,
 extern IndexBulkDeleteResult *index_vacuum_cleanup(IndexVacuumInfo *info,
 												   IndexBulkDeleteResult *stats);
 extern bool index_can_return(Relation indexRelation, int attno);
+extern bool index_skip(IndexScanDesc scan, ScanDirection direction,
+					   ScanDirection indexdir, bool start, int prefix);
 extern RegProcedure index_getprocid(Relation irel, AttrNumber attnum,
 									uint16 procnum);
 extern FmgrInfo *index_getprocinfo(Relation irel, AttrNumber attnum,
diff --git a/src/include/access/nbtree.h b/src/include/access/nbtree.h
index 20ace69dab..e098c6a1ab 100644
--- a/src/include/access/nbtree.h
+++ b/src/include/access/nbtree.h
@@ -662,6 +662,9 @@ typedef struct BTScanOpaqueData
 	 */
 	int			markItemIndex;	/* itemIndex, or -1 if not valid */
 
+	/* Work space for _bt_skip */
+	BTScanInsert	skipScanKey;	/* used to control skipping */
+
 	/* keep these last in struct for efficiency */
 	BTScanPosData currPos;		/* current position data */
 	BTScanPosData markPos;		/* marked position, if any */
@@ -793,6 +796,8 @@ extern OffsetNumber _bt_binsrch_insert(Relation rel, BTInsertState insertstate);
 extern int32 _bt_compare(Relation rel, BTScanInsert key, Page page, OffsetNumber offnum);
 extern bool _bt_first(IndexScanDesc scan, ScanDirection dir);
 extern bool _bt_next(IndexScanDesc scan, ScanDirection dir);
+extern bool _bt_skip(IndexScanDesc scan, ScanDirection dir,
+					 ScanDirection indexdir, bool start, int prefix);
 extern Buffer _bt_get_endpoint(Relation rel, uint32 level, bool rightmost,
 							   Snapshot snapshot);
 
@@ -817,6 +822,8 @@ extern void _bt_end_vacuum_callback(int code, Datum arg);
 extern Size BTreeShmemSize(void);
 extern void BTreeShmemInit(void);
 extern bytea *btoptions(Datum reloptions, bool validate);
+extern bool btskip(IndexScanDesc scan, ScanDirection dir,
+				   ScanDirection indexdir, bool start, int prefix);
 extern bool btproperty(Oid index_oid, int attno,
 					   IndexAMProperty prop, const char *propname,
 					   bool *res, bool *isnull);
diff --git a/src/include/access/sdir.h b/src/include/access/sdir.h
index 23feb90986..094a127464 100644
--- a/src/include/access/sdir.h
+++ b/src/include/access/sdir.h
@@ -55,4 +55,11 @@ typedef enum ScanDirection
 #define ScanDirectionIsForward(direction) \
 	((bool) ((direction) == ForwardScanDirection))
 
+/*
+ * ScanDirectionsAreOpposite
+ *		True iff scan directions are backward/forward or forward/backward.
+ */
+#define ScanDirectionsAreOpposite(dirA, dirB) \
+	((bool) (dirA != NoMovementScanDirection && dirA == -dirB))
+
 #endif							/* SDIR_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index eaea1f3b0c..94cf92d07f 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1423,6 +1423,8 @@ typedef struct IndexScanState
 	ExprContext *iss_RuntimeContext;
 	Relation	iss_RelationDesc;
 	struct IndexScanDescData *iss_ScanDesc;
+	int         iss_SkipPrefixSize;
+	bool		iss_FirstTupleEmitted;
 
 	/* These are needed for re-checking ORDER BY expr ordering */
 	pairingheap *iss_ReorderQueue;
@@ -1452,6 +1454,8 @@ typedef struct IndexScanState
  *		TableSlot		   slot for holding tuples fetched from the table
  *		VMBuffer		   buffer in use for visibility map testing, if any
  *		PscanLen		   size of parallel index-only scan descriptor
+ *		SkipPrefixSize	   number of keys for skip-based DISTINCT
+ *		FirstTupleEmitted  has the first tuple been emitted
  * ----------------
  */
 typedef struct IndexOnlyScanState
@@ -1470,6 +1474,8 @@ typedef struct IndexOnlyScanState
 	struct IndexScanDescData *ioss_ScanDesc;
 	TupleTableSlot *ioss_TableSlot;
 	Buffer		ioss_VMBuffer;
+	int         ioss_SkipPrefixSize;
+	bool		ioss_FirstTupleEmitted;
 	Size		ioss_PscanLen;
 } IndexOnlyScanState;
 
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 4e329f0fb5..b0ff9ca3a8 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -839,6 +839,7 @@ struct IndexOptInfo
 	bool		amsearchnulls;	/* can AM search for NULL/NOT NULL entries? */
 	bool		amhasgettuple;	/* does AM have amgettuple interface? */
 	bool		amhasgetbitmap; /* does AM have amgetbitmap interface? */
+	bool		amcanskip;		/* can AM skip duplicate values? */
 	bool		amcanparallel;	/* does AM support parallel scan? */
 	/* Rather than include amapi.h here, we declare amcostestimate like this */
 	void		(*amcostestimate) ();	/* AM's cost estimator */
@@ -1189,6 +1190,9 @@ typedef struct Path
  * we need not recompute them when considering using the same index in a
  * bitmap index/heap scan (see BitmapHeapPath).  The costs of the IndexPath
  * itself represent the costs of an IndexScan or IndexOnlyScan plan type.
+ *
+ * 'indexskipprefix' represents the number of columns to consider for skip
+ * scans.
  *----------
  */
 typedef struct IndexPath
@@ -1201,6 +1205,7 @@ typedef struct IndexPath
 	ScanDirection indexscandir;
 	Cost		indextotalcost;
 	Selectivity indexselectivity;
+	int			indexskipprefix;
 } IndexPath;
 
 /*
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 32c0d87f80..03a00e8e1d 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -409,6 +409,8 @@ typedef struct IndexScan
 	List	   *indexorderbyorig;	/* the same in original form */
 	List	   *indexorderbyops;	/* OIDs of sort ops for ORDER BY exprs */
 	ScanDirection indexorderdir;	/* forward or backward or don't care */
+	int			indexskipprefixsize;	/* the size of the prefix for distinct
+										 * scans */
 } IndexScan;
 
 /* ----------------
@@ -436,6 +438,8 @@ typedef struct IndexOnlyScan
 	List	   *indexorderby;	/* list of index ORDER BY exprs */
 	List	   *indextlist;		/* TargetEntry list describing index's cols */
 	ScanDirection indexorderdir;	/* forward or backward or don't care */
+	int			indexskipprefixsize;	/* the size of the prefix for distinct
+										 * scans */
 } IndexOnlyScan;
 
 /* ----------------
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index cb012ba198..847f34f02b 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -50,6 +50,7 @@ extern PGDLLIMPORT int max_parallel_workers_per_gather;
 extern PGDLLIMPORT bool enable_seqscan;
 extern PGDLLIMPORT bool enable_indexscan;
 extern PGDLLIMPORT bool enable_indexonlyscan;
+extern PGDLLIMPORT bool enable_indexskipscan;
 extern PGDLLIMPORT bool enable_bitmapscan;
 extern PGDLLIMPORT bool enable_tidscan;
 extern PGDLLIMPORT bool enable_sort;
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index f75ff6f323..6c8c9dadbb 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -201,6 +201,11 @@ extern UpperUniquePath *create_upper_unique_path(PlannerInfo *root,
 												 Path *subpath,
 												 int numCols,
 												 double numGroups);
+extern IndexPath *create_skipscan_unique_path(PlannerInfo *root,
+											  RelOptInfo *rel,
+											  Path *subpath,
+											  int numCols,
+											  double numGroups);
 extern AggPath *create_agg_path(PlannerInfo *root,
 								RelOptInfo *rel,
 								Path *subpath,
diff --git a/src/test/regress/expected/select_distinct.out b/src/test/regress/expected/select_distinct.out
index f3696c6d1d..51e12ac925 100644
--- a/src/test/regress/expected/select_distinct.out
+++ b/src/test/regress/expected/select_distinct.out
@@ -244,3 +244,508 @@ SELECT null IS NOT DISTINCT FROM null as "yes";
  t
 (1 row)
 
+-- index only skip scan
+CREATE TABLE distinct_a (a int, b int, c int);
+INSERT INTO distinct_a (
+    SELECT five, tenthous, 10 FROM
+    generate_series(1, 5) five,
+    generate_series(1, 10000) tenthous
+);
+CREATE INDEX ON distinct_a (a, b);
+ANALYZE distinct_a;
+SELECT DISTINCT a FROM distinct_a;
+ a 
+---
+ 1
+ 2
+ 3
+ 4
+ 5
+(5 rows)
+
+SELECT DISTINCT a FROM distinct_a WHERE a = 1;
+ a 
+---
+ 1
+(1 row)
+
+SELECT DISTINCT a FROM distinct_a ORDER BY a DESC;
+ a 
+---
+ 5
+ 4
+ 3
+ 2
+ 1
+(5 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT a FROM distinct_a;
+                       QUERY PLAN                       
+--------------------------------------------------------
+ Index Only Scan using distinct_a_a_b_idx on distinct_a
+   Skip scan: true
+(2 rows)
+
+-- test index skip scan with a condition on a non unique field
+SELECT DISTINCT ON (a) a, b FROM distinct_a WHERE b = 2;
+ a | b 
+---+---
+ 1 | 2
+ 2 | 2
+ 3 | 2
+ 4 | 2
+ 5 | 2
+(5 rows)
+
+-- test index skip scan backwards
+SELECT DISTINCT ON (a) a, b FROM distinct_a ORDER BY a DESC, b DESC;
+ a |   b   
+---+-------
+ 5 | 10000
+ 4 | 10000
+ 3 | 10000
+ 2 | 10000
+ 1 | 10000
+(5 rows)
+
+-- check colums order
+CREATE INDEX distinct_a_b_a on distinct_a (b, a);
+SELECT DISTINCT a FROM distinct_a WHERE b = 2;
+ a 
+---
+ 1
+ 2
+ 3
+ 4
+ 5
+(5 rows)
+
+SELECT DISTINCT on (a, b) a, b FROM distinct_a WHERE b = 2;
+ a | b 
+---+---
+ 1 | 2
+ 2 | 2
+ 3 | 2
+ 4 | 2
+ 5 | 2
+(5 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT a FROM distinct_a WHERE b = 2;
+                     QUERY PLAN                     
+----------------------------------------------------
+ Index Only Scan using distinct_a_b_a on distinct_a
+   Skip scan: true
+   Index Cond: (b = 2)
+(3 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT on (a, b) a, b FROM distinct_a WHERE b = 2;
+                     QUERY PLAN                     
+----------------------------------------------------
+ Index Only Scan using distinct_a_b_a on distinct_a
+   Skip scan: true
+   Index Cond: (b = 2)
+(3 rows)
+
+DROP INDEX distinct_a_b_a;
+-- test opposite scan/index directions inside a cursor
+-- forward/backward
+BEGIN;
+DECLARE c SCROLL CURSOR FOR
+SELECT DISTINCT ON (a) a,b FROM distinct_a ORDER BY a, b;
+FETCH FROM c;
+ a | b 
+---+---
+ 1 | 1
+(1 row)
+
+FETCH BACKWARD FROM c;
+ a | b 
+---+---
+(0 rows)
+
+FETCH 6 FROM c;
+ a | b 
+---+---
+ 1 | 1
+ 2 | 1
+ 3 | 1
+ 4 | 1
+ 5 | 1
+(5 rows)
+
+FETCH BACKWARD 6 FROM c;
+ a | b 
+---+---
+ 5 | 1
+ 4 | 1
+ 3 | 1
+ 2 | 1
+ 1 | 1
+(5 rows)
+
+FETCH 6 FROM c;
+ a | b 
+---+---
+ 1 | 1
+ 2 | 1
+ 3 | 1
+ 4 | 1
+ 5 | 1
+(5 rows)
+
+FETCH BACKWARD 6 FROM c;
+ a | b 
+---+---
+ 5 | 1
+ 4 | 1
+ 3 | 1
+ 2 | 1
+ 1 | 1
+(5 rows)
+
+END;
+-- backward/forward
+BEGIN;
+DECLARE c SCROLL CURSOR FOR
+SELECT DISTINCT ON (a) a,b FROM distinct_a ORDER BY a DESC, b DESC;
+FETCH FROM c;
+ a |   b   
+---+-------
+ 5 | 10000
+(1 row)
+
+FETCH BACKWARD FROM c;
+ a | b 
+---+---
+(0 rows)
+
+FETCH 6 FROM c;
+ a |   b   
+---+-------
+ 5 | 10000
+ 4 | 10000
+ 3 | 10000
+ 2 | 10000
+ 1 | 10000
+(5 rows)
+
+FETCH BACKWARD 6 FROM c;
+ a |   b   
+---+-------
+ 1 | 10000
+ 2 | 10000
+ 3 | 10000
+ 4 | 10000
+ 5 | 10000
+(5 rows)
+
+FETCH 6 FROM c;
+ a |   b   
+---+-------
+ 5 | 10000
+ 4 | 10000
+ 3 | 10000
+ 2 | 10000
+ 1 | 10000
+(5 rows)
+
+FETCH BACKWARD 6 FROM c;
+ a |   b   
+---+-------
+ 1 | 10000
+ 2 | 10000
+ 3 | 10000
+ 4 | 10000
+ 5 | 10000
+(5 rows)
+
+END;
+-- test missing values and skipping from the end
+CREATE TABLE distinct_abc(a int, b int, c int);
+CREATE INDEX ON distinct_abc(a, b, c);
+INSERT INTO distinct_abc
+	VALUES (1, 1, 1),
+		   (1, 1, 2),
+		   (1, 2, 2),
+		   (1, 2, 3),
+		   (2, 2, 1),
+		   (2, 2, 3),
+		   (3, 1, 1),
+		   (3, 1, 2),
+		   (3, 2, 2),
+		   (3, 2, 3);
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a,b,c FROM distinct_abc WHERE c = 2;
+                          QUERY PLAN                          
+--------------------------------------------------------------
+ Index Only Scan using distinct_abc_a_b_c_idx on distinct_abc
+   Skip scan: true
+   Index Cond: (c = 2)
+(3 rows)
+
+BEGIN;
+DECLARE c SCROLL CURSOR FOR
+SELECT DISTINCT ON (a) a,b,c FROM distinct_abc WHERE c = 2;
+FETCH ALL FROM c;
+ a | b | c 
+---+---+---
+ 1 | 1 | 2
+ 3 | 1 | 2
+(2 rows)
+
+FETCH BACKWARD ALL FROM c;
+ a | b | c 
+---+---+---
+ 3 | 1 | 2
+ 1 | 1 | 2
+(2 rows)
+
+END;
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a,b,c FROM distinct_abc WHERE c = 2
+ORDER BY a DESC, b DESC;
+                              QUERY PLAN                               
+-----------------------------------------------------------------------
+ Index Only Scan Backward using distinct_abc_a_b_c_idx on distinct_abc
+   Skip scan: true
+   Index Cond: (c = 2)
+(3 rows)
+
+BEGIN;
+DECLARE c SCROLL CURSOR FOR
+SELECT DISTINCT ON (a) a,b,c FROM distinct_abc WHERE c = 2
+ORDER BY a DESC, b DESC;
+FETCH ALL FROM c;
+ a | b | c 
+---+---+---
+ 3 | 2 | 2
+ 1 | 2 | 2
+(2 rows)
+
+FETCH BACKWARD ALL FROM c;
+ a | b | c 
+---+---+---
+ 1 | 2 | 2
+ 3 | 2 | 2
+(2 rows)
+
+END;
+DROP TABLE distinct_abc;
+-- index skip scan
+SELECT DISTINCT ON (a) a, b, c
+FROM distinct_a ORDER BY a;
+ a | b | c  
+---+---+----
+ 1 | 1 | 10
+ 2 | 1 | 10
+ 3 | 1 | 10
+ 4 | 1 | 10
+ 5 | 1 | 10
+(5 rows)
+
+SELECT DISTINCT ON (a) a, b, c
+FROM distinct_a WHERE a = 1 ORDER BY a;
+ a | b | c  
+---+---+----
+ 1 | 1 | 10
+(1 row)
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a, b, c
+FROM distinct_a ORDER BY a;
+                    QUERY PLAN                     
+---------------------------------------------------
+ Index Scan using distinct_a_a_b_idx on distinct_a
+   Skip scan: true
+(2 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a, b, c
+FROM distinct_a WHERE a = 1 ORDER BY a;
+                     QUERY PLAN                      
+-----------------------------------------------------
+ Unique
+   ->  Bitmap Heap Scan on distinct_a
+         Recheck Cond: (a = 1)
+         ->  Bitmap Index Scan on distinct_a_a_b_idx
+               Index Cond: (a = 1)
+(5 rows)
+
+-- check colums order
+SELECT DISTINCT a FROM distinct_a WHERE b = 2 AND c = 10;
+ a 
+---
+ 1
+ 2
+ 3
+ 4
+ 5
+(5 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT a FROM distinct_a WHERE b = 2 AND c = 10;
+                       QUERY PLAN                        
+---------------------------------------------------------
+ Unique
+   ->  Index Scan using distinct_a_a_b_idx on distinct_a
+         Index Cond: (b = 2)
+         Filter: (c = 10)
+(4 rows)
+
+-- check projection case
+SELECT DISTINCT a, a FROM distinct_a WHERE b = 2;
+ a | a 
+---+---
+ 1 | 1
+ 2 | 2
+ 3 | 3
+ 4 | 4
+ 5 | 5
+(5 rows)
+
+SELECT DISTINCT a, 1 FROM distinct_a WHERE b = 2;
+ a | ?column? 
+---+----------
+ 1 |        1
+ 2 |        1
+ 3 |        1
+ 4 |        1
+ 5 |        1
+(5 rows)
+
+-- test cursor forward/backward movements
+BEGIN;
+DECLARE c SCROLL CURSOR FOR SELECT DISTINCT a FROM distinct_a;
+FETCH FROM c;
+ a 
+---
+ 1
+(1 row)
+
+FETCH BACKWARD FROM c;
+ a 
+---
+(0 rows)
+
+FETCH 6 FROM c;
+ a 
+---
+ 1
+ 2
+ 3
+ 4
+ 5
+(5 rows)
+
+FETCH BACKWARD 6 FROM c;
+ a 
+---
+ 5
+ 4
+ 3
+ 2
+ 1
+(5 rows)
+
+FETCH 6 FROM c;
+ a 
+---
+ 1
+ 2
+ 3
+ 4
+ 5
+(5 rows)
+
+FETCH BACKWARD 6 FROM c;
+ a 
+---
+ 5
+ 4
+ 3
+ 2
+ 1
+(5 rows)
+
+END;
+DROP TABLE distinct_a;
+-- test tuples visibility
+CREATE TABLE distinct_visibility (a int, b int);
+INSERT INTO distinct_visibility (select a, b from generate_series(1,5) a, generate_series(1, 10000) b);
+CREATE INDEX ON distinct_visibility (a, b);
+ANALYZE distinct_visibility;
+SELECT DISTINCT ON (a) a, b FROM distinct_visibility ORDER BY a, b;
+ a | b 
+---+---
+ 1 | 1
+ 2 | 1
+ 3 | 1
+ 4 | 1
+ 5 | 1
+(5 rows)
+
+DELETE FROM distinct_visibility WHERE a = 2 and b = 1;
+SELECT DISTINCT ON (a) a, b FROM distinct_visibility ORDER BY a, b;
+ a | b 
+---+---
+ 1 | 1
+ 2 | 2
+ 3 | 1
+ 4 | 1
+ 5 | 1
+(5 rows)
+
+SELECT DISTINCT ON (a) a, b FROM distinct_visibility ORDER BY a DESC, b DESC;
+ a |   b   
+---+-------
+ 5 | 10000
+ 4 | 10000
+ 3 | 10000
+ 2 | 10000
+ 1 | 10000
+(5 rows)
+
+DELETE FROM distinct_visibility WHERE a = 2 and b = 10000;
+SELECT DISTINCT ON (a) a, b FROM distinct_visibility ORDER BY a DESC, b DESC;
+ a |   b   
+---+-------
+ 5 | 10000
+ 4 | 10000
+ 3 | 10000
+ 2 |  9999
+ 1 | 10000
+(5 rows)
+
+DROP TABLE distinct_visibility;
+-- test page boundaries
+CREATE TABLE distinct_boundaries AS
+    SELECT a, b::int2 b, (b % 2)::int2 c FROM
+        generate_series(1, 5) a,
+        generate_series(1,366) b;
+CREATE INDEX ON distinct_boundaries (a, b, c);
+ANALYZE distinct_boundaries;
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a, b, c from distinct_boundaries
+WHERE b >= 1 and c = 0 ORDER BY a, b;
+                                 QUERY PLAN                                 
+----------------------------------------------------------------------------
+ Index Only Scan using distinct_boundaries_a_b_c_idx on distinct_boundaries
+   Skip scan: true
+   Index Cond: ((b >= 1) AND (c = 0))
+(3 rows)
+
+SELECT DISTINCT ON (a) a, b, c from distinct_boundaries
+WHERE b >= 1 and c = 0 ORDER BY a, b;
+ a | b | c 
+---+---+---
+ 1 | 2 | 0
+ 2 | 2 | 0
+ 3 | 2 | 0
+ 4 | 2 | 0
+ 5 | 2 | 0
+(5 rows)
+
+DROP TABLE distinct_boundaries;
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index a1c90eb905..bd3b373515 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -78,6 +78,7 @@ select name, setting from pg_settings where name like 'enable%';
  enable_hashjoin                | on
  enable_indexonlyscan           | on
  enable_indexscan               | on
+ enable_indexskipscan           | on
  enable_material                | on
  enable_mergejoin               | on
  enable_nestloop                | on
@@ -89,7 +90,7 @@ select name, setting from pg_settings where name like 'enable%';
  enable_seqscan                 | on
  enable_sort                    | on
  enable_tidscan                 | on
-(17 rows)
+(18 rows)
 
 -- Test that the pg_timezone_names and pg_timezone_abbrevs views are
 -- more-or-less working.  We can't test their contents in any great detail
diff --git a/src/test/regress/sql/select_distinct.sql b/src/test/regress/sql/select_distinct.sql
index a605e86449..4c8a50d153 100644
--- a/src/test/regress/sql/select_distinct.sql
+++ b/src/test/regress/sql/select_distinct.sql
@@ -73,3 +73,189 @@ SELECT 1 IS NOT DISTINCT FROM 2 as "no";
 SELECT 2 IS NOT DISTINCT FROM 2 as "yes";
 SELECT 2 IS NOT DISTINCT FROM null as "no";
 SELECT null IS NOT DISTINCT FROM null as "yes";
+
+-- index only skip scan
+CREATE TABLE distinct_a (a int, b int, c int);
+INSERT INTO distinct_a (
+    SELECT five, tenthous, 10 FROM
+    generate_series(1, 5) five,
+    generate_series(1, 10000) tenthous
+);
+CREATE INDEX ON distinct_a (a, b);
+ANALYZE distinct_a;
+
+SELECT DISTINCT a FROM distinct_a;
+SELECT DISTINCT a FROM distinct_a WHERE a = 1;
+SELECT DISTINCT a FROM distinct_a ORDER BY a DESC;
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT a FROM distinct_a;
+
+-- test index skip scan with a condition on a non unique field
+SELECT DISTINCT ON (a) a, b FROM distinct_a WHERE b = 2;
+
+-- test index skip scan backwards
+SELECT DISTINCT ON (a) a, b FROM distinct_a ORDER BY a DESC, b DESC;
+
+-- check colums order
+CREATE INDEX distinct_a_b_a on distinct_a (b, a);
+
+SELECT DISTINCT a FROM distinct_a WHERE b = 2;
+SELECT DISTINCT on (a, b) a, b FROM distinct_a WHERE b = 2;
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT a FROM distinct_a WHERE b = 2;
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT on (a, b) a, b FROM distinct_a WHERE b = 2;
+
+DROP INDEX distinct_a_b_a;
+
+-- test opposite scan/index directions inside a cursor
+-- forward/backward
+BEGIN;
+DECLARE c SCROLL CURSOR FOR
+SELECT DISTINCT ON (a) a,b FROM distinct_a ORDER BY a, b;
+
+FETCH FROM c;
+FETCH BACKWARD FROM c;
+
+FETCH 6 FROM c;
+FETCH BACKWARD 6 FROM c;
+
+FETCH 6 FROM c;
+FETCH BACKWARD 6 FROM c;
+
+END;
+
+-- backward/forward
+BEGIN;
+DECLARE c SCROLL CURSOR FOR
+SELECT DISTINCT ON (a) a,b FROM distinct_a ORDER BY a DESC, b DESC;
+
+FETCH FROM c;
+FETCH BACKWARD FROM c;
+
+FETCH 6 FROM c;
+FETCH BACKWARD 6 FROM c;
+
+FETCH 6 FROM c;
+FETCH BACKWARD 6 FROM c;
+
+END;
+
+-- test missing values and skipping from the end
+CREATE TABLE distinct_abc(a int, b int, c int);
+CREATE INDEX ON distinct_abc(a, b, c);
+INSERT INTO distinct_abc
+	VALUES (1, 1, 1),
+		   (1, 1, 2),
+		   (1, 2, 2),
+		   (1, 2, 3),
+		   (2, 2, 1),
+		   (2, 2, 3),
+		   (3, 1, 1),
+		   (3, 1, 2),
+		   (3, 2, 2),
+		   (3, 2, 3);
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a,b,c FROM distinct_abc WHERE c = 2;
+
+BEGIN;
+DECLARE c SCROLL CURSOR FOR
+SELECT DISTINCT ON (a) a,b,c FROM distinct_abc WHERE c = 2;
+
+FETCH ALL FROM c;
+FETCH BACKWARD ALL FROM c;
+
+END;
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a,b,c FROM distinct_abc WHERE c = 2
+ORDER BY a DESC, b DESC;
+
+BEGIN;
+DECLARE c SCROLL CURSOR FOR
+SELECT DISTINCT ON (a) a,b,c FROM distinct_abc WHERE c = 2
+ORDER BY a DESC, b DESC;
+
+FETCH ALL FROM c;
+FETCH BACKWARD ALL FROM c;
+
+END;
+
+DROP TABLE distinct_abc;
+
+-- index skip scan
+SELECT DISTINCT ON (a) a, b, c
+FROM distinct_a ORDER BY a;
+SELECT DISTINCT ON (a) a, b, c
+FROM distinct_a WHERE a = 1 ORDER BY a;
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a, b, c
+FROM distinct_a ORDER BY a;
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a, b, c
+FROM distinct_a WHERE a = 1 ORDER BY a;
+
+-- check colums order
+SELECT DISTINCT a FROM distinct_a WHERE b = 2 AND c = 10;
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT a FROM distinct_a WHERE b = 2 AND c = 10;
+
+-- check projection case
+SELECT DISTINCT a, a FROM distinct_a WHERE b = 2;
+SELECT DISTINCT a, 1 FROM distinct_a WHERE b = 2;
+
+-- test cursor forward/backward movements
+BEGIN;
+DECLARE c SCROLL CURSOR FOR SELECT DISTINCT a FROM distinct_a;
+
+FETCH FROM c;
+FETCH BACKWARD FROM c;
+
+FETCH 6 FROM c;
+FETCH BACKWARD 6 FROM c;
+
+FETCH 6 FROM c;
+FETCH BACKWARD 6 FROM c;
+
+END;
+
+DROP TABLE distinct_a;
+
+-- test tuples visibility
+CREATE TABLE distinct_visibility (a int, b int);
+INSERT INTO distinct_visibility (select a, b from generate_series(1,5) a, generate_series(1, 10000) b);
+CREATE INDEX ON distinct_visibility (a, b);
+ANALYZE distinct_visibility;
+
+SELECT DISTINCT ON (a) a, b FROM distinct_visibility ORDER BY a, b;
+DELETE FROM distinct_visibility WHERE a = 2 and b = 1;
+SELECT DISTINCT ON (a) a, b FROM distinct_visibility ORDER BY a, b;
+
+SELECT DISTINCT ON (a) a, b FROM distinct_visibility ORDER BY a DESC, b DESC;
+DELETE FROM distinct_visibility WHERE a = 2 and b = 10000;
+SELECT DISTINCT ON (a) a, b FROM distinct_visibility ORDER BY a DESC, b DESC;
+DROP TABLE distinct_visibility;
+
+-- test page boundaries
+CREATE TABLE distinct_boundaries AS
+    SELECT a, b::int2 b, (b % 2)::int2 c FROM
+        generate_series(1, 5) a,
+        generate_series(1,366) b;
+
+CREATE INDEX ON distinct_boundaries (a, b, c);
+ANALYZE distinct_boundaries;
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a, b, c from distinct_boundaries
+WHERE b >= 1 and c = 0 ORDER BY a, b;
+
+SELECT DISTINCT ON (a) a, b, c from distinct_boundaries
+WHERE b >= 1 and c = 0 ORDER BY a, b;
+
+DROP TABLE distinct_boundaries;
-- 
2.21.0

#21

Floris Van Nee

florisvannee@Optiver.com

almost 6 years ago

In reply to: Dmitry Dolgov (#20)

Hi Dmitry,

Thanks for the new patch! I tested it and managed to find a case that causes some issues. Here's how to reproduce:

drop table if exists t;
create table t as select a,b,b%2 as c,10 as d from generate_series(1,5) a, generate_series(1,1000) b;
create index on t (a,b,c,d);

-- correct
postgres=# begin; declare c scroll cursor for select distinct on (a) a,b,c,d from t order by a desc, b desc; fetch forward all from c; fetch backward all from c; commit;
BEGIN
DECLARE CURSOR
 a |  b   | c | d
---+------+---+----
 5 | 1000 | 0 | 10
 4 | 1000 | 0 | 10
 3 | 1000 | 0 | 10
 2 | 1000 | 0 | 10
 1 | 1000 | 0 | 10
(5 rows)

 a |  b   | c | d
---+------+---+----
 1 | 1000 | 0 | 10
 2 | 1000 | 0 | 10
 3 | 1000 | 0 | 10
 4 | 1000 | 0 | 10
 5 | 1000 | 0 | 10
(5 rows)

-- now delete some rows
postgres=# delete from t where a=3;
DELETE 1000

-- and rerun: error is thrown
postgres=# begin; declare c scroll cursor for select distinct on (a) a,b,c,d from t order by a desc, b desc; fetch forward all from c; fetch backward all from c; commit;
BEGIN
DECLARE CURSOR
 a |  b   | c | d
---+------+---+----
 5 | 1000 | 0 | 10
 4 | 1000 | 0 | 10
 2 | 1000 | 0 | 10
 1 | 1000 | 0 | 10
(4 rows)

ERROR: lock buffer_content is not held
ROLLBACK

A slightly different situation arises when executing the cursor with an ORDER BY a, b instead of the ORDER BY a DESC, b DESC:
-- recreate table again and execute the delete as above

postgres=# begin; declare c scroll cursor for select distinct on (a) a,b,c,d from t order by a, b; fetch forward all from c; fetch backward all from c; commit;
BEGIN
DECLARE CURSOR
 a | b | c | d
---+---+---+----
 1 | 1 | 1 | 10
 2 | 1 | 1 | 10
 4 | 1 | 1 | 10
 5 | 1 | 1 | 10
(4 rows)

 a |  b  | c | d
---+-----+---+----
 5 |   1 | 1 | 10
 4 |   1 | 1 | 10
 2 | 827 | 1 | 10
 1 |   1 | 1 | 10
(4 rows)

COMMIT

And lastly, you'll also get incorrect results if you do the delete slightly differently:
-- leave one row where a=3 and b=1000
postgres=# delete from t where a=3 and b<=999;
-- the cursor query above won't show any of the a=3 rows even though they should

-Floris

#22

Dmitry Dolgov

9erthalion6@gmail.com

almost 6 years ago

In reply to: Floris Van Nee (#21)

Re: Index Skip Scan

Oh, interesting, thank you. I believe I know what happened: there is
one unnecessary locking step that only causes problems, plus one direct
access to a page's items without going through _bt_readpage. I will post
a new version soon.

#23

Dmitry Dolgov

9erthalion6@gmail.com

almost 6 years ago

In reply to: Floris Van Nee (#21)

Re: Index Skip Scan

On Mon, Jan 27, 2020 at 02:00:39PM +0000, Floris Van Nee wrote:

Thanks for the new patch! I tested it and managed to find a case that causes
some issues. Here's how to reproduce:

So, after a bit of investigation I found the issue (it was actually there
even in the previous version). In exactly one case, moving forward and reading
backward, which is the scenario you've described above, the current
implementation was not ignoring deleted tuples.

My first idea to fix this was to use _bt_readpage when necessary and add a
couple of _bt_killitems calls when we leave a page while jumping, so that
deleted tuples will be ignored. To demonstrate it visually, let's say we
want to go backward on a cursor over an ORDER BY a DESC, b DESC query,
i.e. return:

(1,100), (2, 100), (3, 100) etc.

To achieve that we jump from (1,1) to (1,100), from (2,1) to (2,100) and so on.
If some values are deleted, we need to read backward. E.g. if (3,100) is
deleted, we need to return (3,99).

   +---------------+---------------+---------------+---------------+
   |               |               |               |               |
   | 1,1 ... 1,100 | 2,1 ... 2,100 | 3,1 ... 3,100 | 4,1 ... 4,100 |
   |               |               |               |               |
   +---------------+---------------+---------------+---------------+
   |             ^ |             ^ |             ^ |             ^
   |             | |             | |             | |             |
   +-------------+ +-------------+ +-------------+ +-------------+

If a whole series of values has been deleted, we end up back at the
previous value and need to detect that situation. E.g. if all the values
from (3,1) to (3,100) were deleted, we will return to (2,100).

   +---------------+---------------+               +---------------+         
   |               |               |               |               |         
   | 1,1 ... 1,100 | 2,1 ... 2,100 |<--------------+ 4,1 ... 4,100 |         
   |               |               |               |               |         
   +---------------+---------------+               +---------------+         
                                                                 ^           
   |             ^ |             ^ |             ^               |           
   |             | |             | |             |               |           
   +-------------+ +-------------+ +-------------+               |           
                                   +-----------------------------+
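
The backward read pictured above can be sketched as a small standalone C model. This is only an illustration under stated assumptions, not the patch's _bt_skip code: the Item struct, its dead flag, and the functions last_in_group and skip_read_backward are hypothetical names, and a sorted array stands in for the index pages.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an index tuple on (a, b). */
typedef struct
{
	int			a;
	int			b;
	bool		dead;			/* assumed to be known from the index alone */
} Item;

/* Jump step: position of the last item sharing the prefix of items[start]. */
static int
last_in_group(const Item *items, int n, int start)
{
	int			i = start;

	while (i + 1 < n && items[i + 1].a == items[start].a)
		i++;
	return i;
}

/*
 * For each distinct a (in ascending order) emit the largest live b.
 * Groups whose tuples are all dead produce no output.
 * Returns the number of (a, b) pairs written to out_a/out_b.
 */
static int
skip_read_backward(const Item *items, int n, int *out_a, int *out_b)
{
	int			count = 0;
	int			start = 0;

	assert(n >= 0);
	while (start < n)
	{
		int			end = last_in_group(items, n, start);
		int			i = end;

		/* Read backward past dead tuples without leaving the group. */
		while (i >= start && items[i].dead)
			i--;

		if (i >= start)
		{
			out_a[count] = items[i].a;
			out_b[count] = items[i].b;
			count++;
		}
		/* Otherwise we ran into the previous value: the group is gone. */

		start = end + 1;
	}
	return count;
}
```

For groups 1 to 4 with two tuples each, where (2,2), (3,1) and (3,2) are dead, the sketch emits (1,2), (2,1) and (4,2). The dead flag is a deliberate simplification: it assumes that deletion can be decided from the index entry alone.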

All of this is implemented inside _bt_skip. Unfortunately, as I see it now, the
idea of skipping dead index tuples without checking the heap tuple is not
reliable, since an index tuple is added to killedItems and can be marked as
dead only when no transaction can see it anymore.

Potentially there are two possible solutions:

* Adjust the code in nodeIndexonlyscan.c to perform a proper visibility check and
detect whether we have returned to a previous value. Obviously this will make the patch more invasive.

* Reduce scope of the patch, and simply do not apply jumping in this case. This
means less functionality but hopefully still brings some value.
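
A minimal sketch of what the second option could look like, assuming that "this case" is the one where the cursor is read in the direction opposite to the index scan direction. The enum values mirror ScanDirection in sdir.h and the macro is the one added by the patch (with its arguments parenthesized here); the function skip_jump_allowed is a hypothetical name and is not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Same values as ScanDirection in src/include/access/sdir.h. */
typedef enum ScanDirection
{
	BackwardScanDirection = -1,
	NoMovementScanDirection = 0,
	ForwardScanDirection = 1
} ScanDirection;

/* Macro from the patch, with arguments parenthesized for safety. */
#define ScanDirectionsAreOpposite(dirA, dirB) \
	((bool) ((dirA) != NoMovementScanDirection && (dirA) == -(dirB)))

/*
 * Hypothetical helper: jump between prefixes only when the read direction
 * and the index scan direction agree.  When they are opposite, the caller
 * would fall back to reading tuple by tuple.
 */
static bool
skip_jump_allowed(ScanDirection dir, ScanDirection indexdir)
{
	return !ScanDirectionsAreOpposite(dir, indexdir);
}
```

Under this assumption a forward fetch on a forward index scan still jumps, while a backward fetch on the same scan does not.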

At this point Jesper and I are inclined to go with the second option. But maybe
I'm missing something; are there any other suggestions?

#24

Floris Van Nee

florisvannee@Optiver.com

almost 6 years ago

In reply to: Dmitry Dolgov (#23)

At this point me and Jesper inclined to go with the second option. But maybe
I'm missing something, are there any other suggestions?

Unfortunately I figured this would need a more invasive fix. I tend to agree that it'd be better to not skip in situations like this. I think it'd make most sense for any plan for these 'prepare/fetch' queries to not use skip, but rather a materialize node, right?

-Floris

#25

Dmitry Dolgov

9erthalion6@gmail.com

almost 6 years ago

In reply to: Floris Van Nee (#24)

Re: Index Skip Scan

On Tue, Feb 04, 2020 at 08:34:09PM +0000, Floris Van Nee wrote:

At this point me and Jesper inclined to go with the second option. But maybe
I'm missing something, are there any other suggestions?

Unfortunately I figured this would need a more invasive fix. I tend to agree that it'd be better to not skip in situations like this. I think it'd make most sense to make any plan for these 'prepare/fetch' queries would not use skip, but rather a materialize node, right?

Yes, sort of: without a skip scan it would be just an index only scan
with a Unique node on top. Actually it's not immediately clear how to achieve
this, since at the moment when the planner decides whether to consider an
index skip scan, there is no information about either the direction or whether
we're dealing with a cursor. Maybe we can somehow signal to the decision
logic that the root was a DeclareCursorStmt, e.g. by introducing a new
field in the query structure (or abusing an existing one: since
DeclareCursorStmt is processed by standard_ProcessUtility, just
for a test I've tried to use utilityStmt of a nested statement, hoping
that it's unused, and it hasn't broken any tests yet).

#26

Kyotaro Horiguchi

horikyota.ntt@gmail.com

almost 6 years ago

In reply to: Dmitry Dolgov (#25)

Re: Index Skip Scan

At Wed, 5 Feb 2020 17:37:30 +0100, Dmitry Dolgov <9erthalion6@gmail.com> wrote in

On Tue, Feb 04, 2020 at 08:34:09PM +0000, Floris Van Nee wrote:

At this point me and Jesper inclined to go with the second option. But maybe
I'm missing something, are there any other suggestions?

Unfortunately I figured this would need a more invasive fix. I tend to agree that it'd be better to not skip in situations like this. I think it'd make most sense to make any plan for these 'prepare/fetch' queries would not use skip, but rather a materialize node, right?

Yes, sort of, without a skip scan it would be just an index only scan
with unique on top. Actually it's not immediately clean how to achieve
this, since at the moment, when planner is deciding to consider index
skip scan, there is no information about neither direction nor whether
we're dealing with a cursor. Maybe we can somehow signal to the decision
logic that the root was a DeclareCursorStmt by e.g. introducing a new
field to the query structure (or abusing an existing one, since
DeclareCursorStmt is being processed by standard_ProcessUtility, just
for a test I've tried to use utilityStmt of a nested statement hoping
that it's unused and it didn't break tests yet).

Umm. I think that's the wrong direction. When a cursor is defined, its
default scrollability is decided based on whether the query allows a backward
scan or not. That is, the definition of backward-scannability is not
just whether it can scan from the end toward the beginning, but
whether it can go back and forth freely. By that definition,
the *current* skip scan does not support backward scans. If we want
to allow descending ORDER BY in a query, we should support scrollable
cursors, too.

We could add an additional parameter "in_cursor" to
ExecSupportsBackwardScan and let skip scan return false if in_cursor is
true, but I'm not sure that's acceptable.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
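
The "in_cursor" idea from message #26 can be sketched in isolation. This is a toy model under stated assumptions, not PostgreSQL code: ToyPlan, ToyNodeKind and the toy_* functions are hypothetical names invented for illustration. The real ExecSupportsBackwardScan takes only a Plan node and has no such parameter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a plan node (not PostgreSQL's Plan). */
typedef enum
{
	TOY_SEQ_SCAN,
	TOY_INDEX_SCAN,
	TOY_INDEX_SKIP_SCAN
} ToyNodeKind;

typedef struct ToyPlan
{
	ToyNodeKind kind;
} ToyPlan;

/*
 * Toy analogue of a backward-scan check with the proposed extra
 * "in_cursor" argument.  "Supports backward scan" here means the node can
 * move back and forth freely, which is what a scrollable cursor needs.
 */
static bool
toy_supports_backward_scan(const ToyPlan *plan, bool in_cursor)
{
	assert(plan != NULL);

	switch (plan->kind)
	{
		case TOY_SEQ_SCAN:
		case TOY_INDEX_SCAN:
			return true;
		case TOY_INDEX_SKIP_SCAN:
			/* Assumed: skipping works in one direction, not back and forth. */
			return !in_cursor;
	}
	return false;
}

/* Default scrollability of a cursor is derived from the check above. */
static bool
toy_cursor_is_scrollable(const ToyPlan *plan)
{
	return toy_supports_backward_scan(plan, true);
}
```

In this model a cursor over a skip scan is simply reported as not scrollable, while the same node outside a cursor is unaffected. Whether that behaviour is acceptable is exactly the open question raised in the message.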

#27

Dmitry Dolgov

9erthalion6@gmail.com

almost 6 years ago

In reply to: Kyotaro Horiguchi (#26)

Re: Index Skip Scan

On Thu, Feb 06, 2020 at 10:24:50AM +0900, Kyotaro Horiguchi wrote:
At Wed, 5 Feb 2020 17:37:30 +0100, Dmitry Dolgov <9erthalion6@gmail.com> wrote in

On Tue, Feb 04, 2020 at 08:34:09PM +0000, Floris Van Nee wrote:

this point Jesper and I are inclined to go with the second option. But maybe
I'm missing something; are there any other suggestions?

Unfortunately I figured this would need a more invasive fix. I tend to agree that it'd be better to not skip in situations like this. I think it'd make most sense for any plan for these 'prepare/fetch' queries to not use skip, but rather a materialize node, right?

Yes, sort of, without a skip scan it would be just an index only scan
with unique on top. Actually it's not immediately clear how to achieve
this, since at the moment, when the planner is deciding whether to consider
an index skip scan, it has no information about either the scan direction or
whether we're dealing with a cursor. Maybe we can somehow signal to the
decision logic that the root was a DeclareCursorStmt, e.g. by introducing a
new field in the query structure (or abusing an existing one: since
DeclareCursorStmt is processed by standard_ProcessUtility, just
as a test I've tried using utilityStmt of the nested statement, hoping
that it's unused, and it hasn't broken any tests yet).

Umm. I think that's the wrong direction. When defining a cursor,
default scrollability is decided based on whether the query allows
backward scan or not. That is, the definition of backward-scan'ability is
not just whether it can scan from the end toward the beginning, but
whether it can go back and forth freely or not. By that definition,
the *current* skip scan does not support backward scan. If we want
to allow descending order-by in a query, we should support scrollable
cursors, too.

We could add an additional parameter "in_cursor" to
ExecSupportsBackwardScan and let skip scan return false if in_cursor is
true, but I'm not sure that's acceptable.

I was also thinking about whether it's possible to use
ExecSupportsBackwardScan here, but skip scan is just a mode of an
index/indexonly scan. Which means that ExecSupportsBackwardScan also needs
to know somehow if this mode is being used, and then, since this
function is called after it's already been decided to use skip scan in the
resulting plan, somehow correct the plan (exclude skipping and try to
find the next best path?) - do I understand your suggestion correctly?

#28

Kyotaro Horiguchi

horikyota.ntt@gmail.com

almost 6 years ago

In reply to: Dmitry Dolgov (#27)

Re: Index Skip Scan

At Thu, 6 Feb 2020 11:57:07 +0100, Dmitry Dolgov <9erthalion6@gmail.com> wrote in

On Thu, Feb 06, 2020 at 10:24:50AM +0900, Kyotaro Horiguchi wrote:
At Wed, 5 Feb 2020 17:37:30 +0100, Dmitry Dolgov <9erthalion6@gmail.com> wrote in
We could add an additional parameter "in_cursor" to
ExecSupportsBackwardScan and let skip scan return false if in_cursor is
true, but I'm not sure that's acceptable.

I was also thinking about whether it's possible to use
ExecSupportsBackwardScan here, but skip scan is just a mode of an
index/indexonly scan. Which means that ExecSupportsBackwardScan also needs
to know somehow if this mode is being used, and then, since this
function is called after it's already been decided to use skip scan in the
resulting plan, somehow correct the plan (exclude skipping and try to
find the next best path?) - do I understand your suggestion correctly?

I hadn't thought it through that carefully, but a quick check told me that
IndexSupportsBackwardScan returns a fixed flag for the AM. It seems that
things are not that simple.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#29

Kyotaro Horiguchi

horikyota.ntt@gmail.com

almost 6 years ago

In reply to: Kyotaro Horiguchi (#28)

Re: Index Skip Scan

Sorry, I forgot to write the more significant thing.

On 2020/02/06 21:22, Kyotaro Horiguchi wrote:

At Thu, 6 Feb 2020 11:57:07 +0100, Dmitry Dolgov <9erthalion6@gmail.com> wrote in

On Thu, Feb 06, 2020 at 10:24:50AM +0900, Kyotaro Horiguchi wrote:
At Wed, 5 Feb 2020 17:37:30 +0100, Dmitry Dolgov <9erthalion6@gmail.com> wrote in
We could add an additional parameter "in_cursor" to
ExecSupportsBackwardScan and let skip scan return false if in_cursor is
true, but I'm not sure that's acceptable.

I was also thinking about whether it's possible to use
ExecSupportsBackwardScan here, but skip scan is just a mode of an
index/indexonly scan. Which means that ExecSupportsBackwardScan also needs
to know somehow if this mode is being used, and then, since this
function is called after it's already been decided to use skip scan in the
resulting plan, somehow correct the plan (exclude skipping and try to
find the next best path?) - do I understand your suggestion correctly?

No. I was thinking of the opposite. I meant that
IndexSupportsBackwardScan would return false if Index(Only)Scan is
going to do a skip scan. But I found that the function doesn't have
access to either the plan node or the executor node. So I wrote the following.

I hadn't thought it through that carefully, but a quick check told me that
IndexSupportsBackwardScan returns a fixed flag for the AM. It seems that
things are not that simple.

regards.

Kyotaro Horiguchi

#30

Dmitry Dolgov

9erthalion6@gmail.com

almost 6 years ago

In reply to: Kyotaro Horiguchi (#28)

2 attachment(s)

Re: Index Skip Scan

On Thu, Feb 06, 2020 at 09:22:20PM +0900, Kyotaro Horiguchi wrote:
At Thu, 6 Feb 2020 11:57:07 +0100, Dmitry Dolgov <9erthalion6@gmail.com> wrote in

On Thu, Feb 06, 2020 at 10:24:50AM +0900, Kyotaro Horiguchi wrote:
At Wed, 5 Feb 2020 17:37:30 +0100, Dmitry Dolgov <9erthalion6@gmail.com> wrote in
We could add an additional parameter "in_cursor" to
ExecSupportsBackwardScan and let skip scan return false if in_cursor is
true, but I'm not sure that's acceptable.

I was also thinking about whether it's possible to use
ExecSupportsBackwardScan here, but skip scan is just a mode of an
index/indexonly scan. Which means that ExecSupportsBackwardScan also needs
to know somehow if this mode is being used, and then, since this
function is called after it's already been decided to use skip scan in the
resulting plan, somehow correct the plan (exclude skipping and try to
find the next best path?) - do I understand your suggestion correctly?

I hadn't thought it through that carefully, but a quick check told me that
IndexSupportsBackwardScan returns a fixed flag for the AM. It seems that
things are not that simple.

Yes, I've mentioned that already in one of the previous emails :) The
simplest way I see to achieve what we want is to do something like in
the attached modified version, with a new hasDeclaredCursor field. It's
not a final version, just posted for discussion, so feel free to
suggest any improvements or alternatives.
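
As a rough illustration of the planner-side alternative, here is a minimal sketch that assumes a boolean along the lines of the hasDeclaredCursor field mentioned above. It does not reproduce the attached patch: ToyQuery, ToyPath and toy_choose_distinct_path are hypothetical names, and any costs passed in are purely illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical path kinds for a DISTINCT query over an index. */
typedef enum
{
	TOY_PATH_SKIP_SCAN,
	TOY_PATH_UNIQUE_OVER_INDEX_SCAN
} ToyPathKind;

typedef struct ToyPath
{
	ToyPathKind kind;
	double		total_cost;
} ToyPath;

/* Hypothetical stand-in for the query structure carrying the new flag. */
typedef struct ToyQuery
{
	bool		hasDeclaredCursor;
} ToyQuery;

/*
 * Pick the cheapest candidate path, but never consider a skip scan when
 * the query came from a cursor declaration.  Returns NULL if nothing
 * qualifies.
 */
static const ToyPath *
toy_choose_distinct_path(const ToyQuery *query,
						 const ToyPath *paths, size_t npaths)
{
	const ToyPath *best = NULL;

	assert(query != NULL);

	for (size_t i = 0; i < npaths; i++)
	{
		if (paths[i].kind == TOY_PATH_SKIP_SCAN && query->hasDeclaredCursor)
			continue;			/* skipping is excluded for cursors */

		if (best == NULL || paths[i].total_cost < best->total_cost)
			best = &paths[i];
	}
	return best;
}
```

The point of filtering at path-selection time is that the decision is made before a plan exists, so nothing has to be corrected afterwards. With the flag set, the model falls back to the plain index scan with a unique step on top, even when that is more expensive.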

Attachments:

v33-0001-Unique-key.patch (text/x-diff; charset=us-ascii)

From 22e6b4ccd5f79ca069bd5cd90ba3696dd97f76ea Mon Sep 17 00:00:00 2001
From: jesperpedersen <jesper.pedersen@redhat.com>
Date: Tue, 9 Jul 2019 06:44:57 -0400
Subject: [PATCH v33 1/2] Unique key

Design by David Rowley.

Author: Jesper Pedersen
---
 src/backend/nodes/outfuncs.c           |  14 +++
 src/backend/nodes/print.c              |  39 +++++++
 src/backend/optimizer/path/Makefile    |   3 +-
 src/backend/optimizer/path/allpaths.c  |   8 ++
 src/backend/optimizer/path/indxpath.c  |  41 +++++++
 src/backend/optimizer/path/pathkeys.c  |  71 ++++++++++--
 src/backend/optimizer/path/uniquekey.c | 147 +++++++++++++++++++++++++
 src/backend/optimizer/plan/planagg.c   |   1 +
 src/backend/optimizer/plan/planmain.c  |   1 +
 src/backend/optimizer/plan/planner.c   |  17 ++-
 src/backend/optimizer/util/pathnode.c  |  12 ++
 src/include/nodes/nodes.h              |   1 +
 src/include/nodes/pathnodes.h          |  18 +++
 src/include/nodes/print.h              |   1 +
 src/include/optimizer/pathnode.h       |   1 +
 src/include/optimizer/paths.h          |  11 ++
 16 files changed, 373 insertions(+), 13 deletions(-)
 create mode 100644 src/backend/optimizer/path/uniquekey.c

diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index d76fae44b8..16083e7a7e 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -1723,6 +1723,7 @@ _outPathInfo(StringInfo str, const Path *node)
 	WRITE_FLOAT_FIELD(startup_cost, "%.2f");
 	WRITE_FLOAT_FIELD(total_cost, "%.2f");
 	WRITE_NODE_FIELD(pathkeys);
+	WRITE_NODE_FIELD(uniquekeys);
 }
 
 /*
@@ -2205,6 +2206,7 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
 	WRITE_NODE_FIELD(eq_classes);
 	WRITE_BOOL_FIELD(ec_merging_done);
 	WRITE_NODE_FIELD(canon_pathkeys);
+	WRITE_NODE_FIELD(canon_uniquekeys);
 	WRITE_NODE_FIELD(left_join_clauses);
 	WRITE_NODE_FIELD(right_join_clauses);
 	WRITE_NODE_FIELD(full_join_clauses);
@@ -2214,6 +2216,7 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
 	WRITE_NODE_FIELD(placeholder_list);
 	WRITE_NODE_FIELD(fkey_list);
 	WRITE_NODE_FIELD(query_pathkeys);
+	WRITE_NODE_FIELD(query_uniquekeys);
 	WRITE_NODE_FIELD(group_pathkeys);
 	WRITE_NODE_FIELD(window_pathkeys);
 	WRITE_NODE_FIELD(distinct_pathkeys);
@@ -2401,6 +2404,14 @@ _outPathKey(StringInfo str, const PathKey *node)
 	WRITE_BOOL_FIELD(pk_nulls_first);
 }
 
+static void
+_outUniqueKey(StringInfo str, const UniqueKey *node)
+{
+	WRITE_NODE_TYPE("UNIQUEKEY");
+
+	WRITE_NODE_FIELD(eq_clause);
+}
+
 static void
 _outPathTarget(StringInfo str, const PathTarget *node)
 {
@@ -4092,6 +4103,9 @@ outNode(StringInfo str, const void *obj)
 			case T_PathKey:
 				_outPathKey(str, obj);
 				break;
+			case T_UniqueKey:
+				_outUniqueKey(str, obj);
+				break;
 			case T_PathTarget:
 				_outPathTarget(str, obj);
 				break;
diff --git a/src/backend/nodes/print.c b/src/backend/nodes/print.c
index 42476724d8..d286b34544 100644
--- a/src/backend/nodes/print.c
+++ b/src/backend/nodes/print.c
@@ -459,6 +459,45 @@ print_pathkeys(const List *pathkeys, const List *rtable)
 	printf(")\n");
 }
 
+/*
+ * print_uniquekeys -
+ *	  uniquekeys list of UniqueKeys
+ */
+void
+print_uniquekeys(const List *uniquekeys, const List *rtable)
+{
+	ListCell   *l;
+
+	printf("(");
+	foreach(l, uniquekeys)
+	{
+		UniqueKey *unique_key = (UniqueKey *) lfirst(l);
+		EquivalenceClass *eclass = (EquivalenceClass *) unique_key->eq_clause;
+		ListCell   *k;
+		bool		first = true;
+
+		/* chase up */
+		while (eclass->ec_merged)
+			eclass = eclass->ec_merged;
+
+		printf("(");
+		foreach(k, eclass->ec_members)
+		{
+			EquivalenceMember *mem = (EquivalenceMember *) lfirst(k);
+
+			if (first)
+				first = false;
+			else
+				printf(", ");
+			print_expr((Node *) mem->em_expr, rtable);
+		}
+		printf(")");
+		if (lnext(uniquekeys, l))
+			printf(", ");
+	}
+	printf(")\n");
+}
+
 /*
  * print_tl
  *	  print targetlist in a more legible way.
diff --git a/src/backend/optimizer/path/Makefile b/src/backend/optimizer/path/Makefile
index 1e199ff66f..63cc1505d9 100644
--- a/src/backend/optimizer/path/Makefile
+++ b/src/backend/optimizer/path/Makefile
@@ -21,6 +21,7 @@ OBJS = \
 	joinpath.o \
 	joinrels.o \
 	pathkeys.o \
-	tidpath.o
+	tidpath.o \
+	uniquekey.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 8286d9cf34..bbc13e6141 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -3954,6 +3954,14 @@ print_path(PlannerInfo *root, Path *path, int indent)
 		print_pathkeys(path->pathkeys, root->parse->rtable);
 	}
 
+	if (path->uniquekeys)
+	{
+		for (i = 0; i < indent; i++)
+			printf("\t");
+		printf("  uniquekeys: ");
+		print_uniquekeys(path->uniquekeys, root->parse->rtable);
+	}
+
 	if (join)
 	{
 		JoinPath   *jp = (JoinPath *) path;
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 2a50272da6..bd1ea53e5c 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -189,6 +189,7 @@ static Expr *match_clause_to_ordering_op(IndexOptInfo *index,
 static bool ec_member_matches_indexcol(PlannerInfo *root, RelOptInfo *rel,
 									   EquivalenceClass *ec, EquivalenceMember *em,
 									   void *arg);
+static List *get_uniquekeys_for_index(PlannerInfo *root, List *pathkeys);
 
 
 /*
@@ -874,6 +875,7 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
 	List	   *orderbyclausecols;
 	List	   *index_pathkeys;
 	List	   *useful_pathkeys;
+	List	   *useful_uniquekeys = NIL;
 	bool		found_lower_saop_clause;
 	bool		pathkeys_possibly_useful;
 	bool		index_is_ordered;
@@ -1036,11 +1038,15 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
 	if (index_clauses != NIL || useful_pathkeys != NIL || useful_predicate ||
 		index_only_scan)
 	{
+		if (has_useful_uniquekeys(root))
+			useful_uniquekeys = get_uniquekeys_for_index(root, useful_pathkeys);
+
 		ipath = create_index_path(root, index,
 								  index_clauses,
 								  orderbyclauses,
 								  orderbyclausecols,
 								  useful_pathkeys,
+								  useful_uniquekeys,
 								  index_is_ordered ?
 								  ForwardScanDirection :
 								  NoMovementScanDirection,
@@ -1063,6 +1069,7 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
 									  orderbyclauses,
 									  orderbyclausecols,
 									  useful_pathkeys,
+									  useful_uniquekeys,
 									  index_is_ordered ?
 									  ForwardScanDirection :
 									  NoMovementScanDirection,
@@ -1093,11 +1100,15 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
 													index_pathkeys);
 		if (useful_pathkeys != NIL)
 		{
+			if (has_useful_uniquekeys(root))
+				useful_uniquekeys = get_uniquekeys_for_index(root, useful_pathkeys);
+
 			ipath = create_index_path(root, index,
 									  index_clauses,
 									  NIL,
 									  NIL,
 									  useful_pathkeys,
+									  useful_uniquekeys,
 									  BackwardScanDirection,
 									  index_only_scan,
 									  outer_relids,
@@ -1115,6 +1126,7 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
 										  NIL,
 										  NIL,
 										  useful_pathkeys,
+										  useful_uniquekeys,
 										  BackwardScanDirection,
 										  index_only_scan,
 										  outer_relids,
@@ -3365,6 +3377,35 @@ match_clause_to_ordering_op(IndexOptInfo *index,
 	return clause;
 }
 
+/*
+ * get_uniquekeys_for_index
+ */
+static List *
+get_uniquekeys_for_index(PlannerInfo *root, List *pathkeys)
+{
+	ListCell *lc;
+
+	if (pathkeys)
+	{
+		List *uniquekeys = NIL;
+		foreach(lc, pathkeys)
+		{
+			UniqueKey *unique_key;
+			PathKey *pk = (PathKey *) lfirst(lc);
+			EquivalenceClass *ec = (EquivalenceClass *) pk->pk_eclass;
+
+			unique_key = makeNode(UniqueKey);
+			unique_key->eq_clause = ec;
+
+			uniquekeys = lappend(uniquekeys, unique_key);
+		}
+
+		if (uniquekeys_contained_in(root->canon_uniquekeys, uniquekeys))
+			return uniquekeys;
+	}
+
+	return NIL;
+}
 
 /****************************************************************************
  *				----  ROUTINES TO DO PARTIAL INDEX PREDICATE TESTS	----
diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index 71b9d42c99..054df9a617 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -29,6 +29,7 @@
 #include "utils/lsyscache.h"
 
 
+static bool pathkey_is_unique(PathKey *new_pathkey, List *pathkeys);
 static bool pathkey_is_redundant(PathKey *new_pathkey, List *pathkeys);
 static bool matches_boolean_partition_clause(RestrictInfo *rinfo,
 											 RelOptInfo *partrel,
@@ -96,6 +97,29 @@ make_canonical_pathkey(PlannerInfo *root,
 	return pk;
 }
 
+/*
+ * pathkey_is_unique
+ *	   Returns true if the new pathkey's equivalence class differs from
+ *     that of every existing member of the pathkey list.
+ */
+static bool
+pathkey_is_unique(PathKey *new_pathkey, List *pathkeys)
+{
+	EquivalenceClass *new_ec = new_pathkey->pk_eclass;
+	ListCell   *lc;
+
+	/* If the same EC is already in the list, then not unique */
+	foreach(lc, pathkeys)
+	{
+		PathKey    *old_pathkey = (PathKey *) lfirst(lc);
+
+		if (new_ec == old_pathkey->pk_eclass)
+			return false;
+	}
+
+	return true;
+}
+
 /*
  * pathkey_is_redundant
  *	   Is a pathkey redundant with one already in the given list?
@@ -135,22 +159,12 @@ static bool
 pathkey_is_redundant(PathKey *new_pathkey, List *pathkeys)
 {
 	EquivalenceClass *new_ec = new_pathkey->pk_eclass;
-	ListCell   *lc;
 
 	/* Check for EC containing a constant --- unconditionally redundant */
 	if (EC_MUST_BE_REDUNDANT(new_ec))
 		return true;
 
-	/* If same EC already used in list, then redundant */
-	foreach(lc, pathkeys)
-	{
-		PathKey    *old_pathkey = (PathKey *) lfirst(lc);
-
-		if (new_ec == old_pathkey->pk_eclass)
-			return true;
-	}
-
-	return false;
+	return !pathkey_is_unique(new_pathkey, pathkeys);
 }
 
 /*
@@ -1098,6 +1112,41 @@ make_pathkeys_for_sortclauses(PlannerInfo *root,
 	return pathkeys;
 }
 
+/*
+ * make_pathkeys_for_uniquekeys
+ *		Generate a pathkeys list to be used for uniquekey clauses
+ */
+List *
+make_pathkeys_for_uniquekeys(PlannerInfo *root,
+							 List *sortclauses,
+							 List *tlist)
+{
+	List	   *pathkeys = NIL;
+	ListCell   *l;
+
+	foreach(l, sortclauses)
+	{
+		SortGroupClause *sortcl = (SortGroupClause *) lfirst(l);
+		Expr	   *sortkey;
+		PathKey    *pathkey;
+
+		sortkey = (Expr *) get_sortgroupclause_expr(sortcl, tlist);
+		Assert(OidIsValid(sortcl->sortop));
+		pathkey = make_pathkey_from_sortop(root,
+										   sortkey,
+										   root->nullable_baserels,
+										   sortcl->sortop,
+										   sortcl->nulls_first,
+										   sortcl->tleSortGroupRef,
+										   true);
+
+		if (pathkey_is_unique(pathkey, pathkeys))
+			pathkeys = lappend(pathkeys, pathkey);
+	}
+
+	return pathkeys;
+}
+
 /****************************************************************************
  *		PATHKEYS AND MERGECLAUSES
  ****************************************************************************/
diff --git a/src/backend/optimizer/path/uniquekey.c b/src/backend/optimizer/path/uniquekey.c
new file mode 100644
index 0000000000..13d4ebb98c
--- /dev/null
+++ b/src/backend/optimizer/path/uniquekey.c
@@ -0,0 +1,147 @@
+/*-------------------------------------------------------------------------
+ *
+ * uniquekey.c
+ *	  Utilities for matching and building unique keys
+ *
+ * Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ *	  src/backend/optimizer/path/uniquekey.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "optimizer/pathnode.h"
+#include "optimizer/paths.h"
+#include "nodes/pg_list.h"
+
+static UniqueKey *make_canonical_uniquekey(PlannerInfo *root, EquivalenceClass *eclass);
+
+/*
+ * Build a list of unique keys
+ */
+List*
+build_uniquekeys(PlannerInfo *root, List *sortclauses)
+{
+	List *result = NIL;
+	List *sortkeys;
+	ListCell *l;
+
+	sortkeys = make_pathkeys_for_uniquekeys(root,
+											sortclauses,
+											root->processed_tlist);
+
+	/* Create a uniquekey and add it to the list */
+	foreach(l, sortkeys)
+	{
+		PathKey    *pathkey = (PathKey *) lfirst(l);
+		EquivalenceClass *ec = pathkey->pk_eclass;
+		UniqueKey *unique_key = make_canonical_uniquekey(root, ec);
+
+		result = lappend(result, unique_key);
+	}
+
+	return result;
+}
+
+/*
+ * uniquekeys_contained_in
+ *	  Is every key in keys2 also present in keys1 (the superset)?
+ */
+bool
+uniquekeys_contained_in(List *keys1, List *keys2)
+{
+	ListCell   *key1,
+			   *key2;
+
+	/*
+	 * Fall out quickly if we are passed two identical lists.  This mostly
+	 * catches the case where both are NIL, but that's common enough to
+	 * warrant the test.
+	 */
+	if (keys1 == keys2)
+		return true;
+
+	foreach(key2, keys2)
+	{
+		bool found = false;
+		UniqueKey  *uniquekey2 = (UniqueKey *) lfirst(key2);
+
+		foreach(key1, keys1)
+		{
+			UniqueKey  *uniquekey1 = (UniqueKey *) lfirst(key1);
+
+			if (uniquekey1->eq_clause == uniquekey2->eq_clause)
+			{
+				found = true;
+				break;
+			}
+		}
+
+		if (!found)
+			return false;
+	}
+
+	return true;
+}
+
+/*
+ * has_useful_uniquekeys
+ *		Detect whether the planner could have any uniquekeys that are
+ *		useful.
+ */
+bool
+has_useful_uniquekeys(PlannerInfo *root)
+{
+	if (root->query_uniquekeys != NIL)
+		return true;	/* there are some */
+	return false;		/* definitely useless */
+}
+
+/*
+ * make_canonical_uniquekey
+ *	  Given the parameters for a UniqueKey, find any pre-existing matching
+ *	  uniquekey in the query's list of "canonical" uniquekeys.  Make a new
+ *	  entry if there's not one already.
+ *
+ * Note that this function must not be used until after we have completed
+ * merging EquivalenceClasses.  (We don't try to enforce that here; instead,
+ * equivclass.c will complain if a merge occurs after root->canon_uniquekeys
+ * has become nonempty.)
+ */
+static UniqueKey *
+make_canonical_uniquekey(PlannerInfo *root,
+						 EquivalenceClass *eclass)
+{
+	UniqueKey  *uk;
+	ListCell   *lc;
+	MemoryContext oldcontext;
+
+	/* The passed eclass might be non-canonical, so chase up to the top */
+	while (eclass->ec_merged)
+		eclass = eclass->ec_merged;
+
+	foreach(lc, root->canon_uniquekeys)
+	{
+		uk = (UniqueKey *) lfirst(lc);
+		if (eclass == uk->eq_clause)
+			return uk;
+	}
+
+	/*
+	 * Be sure canonical uniquekeys are allocated in the main planning context.
+	 * Not an issue in normal planning, but it is for GEQO.
+	 */
+	oldcontext = MemoryContextSwitchTo(root->planner_cxt);
+
+	uk = makeNode(UniqueKey);
+	uk->eq_clause = eclass;
+
+	root->canon_uniquekeys = lappend(root->canon_uniquekeys, uk);
+
+	MemoryContextSwitchTo(oldcontext);
+
+	return uk;
+}
diff --git a/src/backend/optimizer/plan/planagg.c b/src/backend/optimizer/plan/planagg.c
index 8634940efc..dd64775d8f 100644
--- a/src/backend/optimizer/plan/planagg.c
+++ b/src/backend/optimizer/plan/planagg.c
@@ -511,6 +511,7 @@ minmax_qp_callback(PlannerInfo *root, void *extra)
 									  root->parse->targetList);
 
 	root->query_pathkeys = root->sort_pathkeys;
+	root->query_uniquekeys = NIL;
 }
 
 /*
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 62dfc6d44a..3a372af91b 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -70,6 +70,7 @@ query_planner(PlannerInfo *root,
 	root->join_rel_level = NULL;
 	root->join_cur_level = 0;
 	root->canon_pathkeys = NIL;
+	root->canon_uniquekeys = NIL;
 	root->left_join_clauses = NIL;
 	root->right_join_clauses = NIL;
 	root->full_join_clauses = NIL;
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index d6f2153593..984fca0696 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3657,15 +3657,30 @@ standard_qp_callback(PlannerInfo *root, void *extra)
 	 * much easier, since we know that the parser ensured that one is a
 	 * superset of the other.
 	 */
+	root->query_uniquekeys = NIL;
+
 	if (root->group_pathkeys)
+	{
 		root->query_pathkeys = root->group_pathkeys;
+
+		if (!root->parse->hasAggs)
+			root->query_uniquekeys = build_uniquekeys(root, qp_extra->groupClause);
+	}
 	else if (root->window_pathkeys)
 		root->query_pathkeys = root->window_pathkeys;
 	else if (list_length(root->distinct_pathkeys) >
 			 list_length(root->sort_pathkeys))
+	{
 		root->query_pathkeys = root->distinct_pathkeys;
+		root->query_uniquekeys = build_uniquekeys(root, parse->distinctClause);
+	}
 	else if (root->sort_pathkeys)
+	{
 		root->query_pathkeys = root->sort_pathkeys;
+
+		if (root->distinct_pathkeys)
+			root->query_uniquekeys = build_uniquekeys(root, parse->distinctClause);
+	}
 	else
 		root->query_pathkeys = NIL;
 }
@@ -6222,7 +6237,7 @@ plan_cluster_use_sort(Oid tableOid, Oid indexOid)
 
 	/* Estimate the cost of index scan */
 	indexScanPath = create_index_path(root, indexInfo,
-									  NIL, NIL, NIL, NIL,
+									  NIL, NIL, NIL, NIL, NIL,
 									  ForwardScanDirection, false,
 									  NULL, 1.0, false);
 
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index e6d08aede5..a006dbbe9c 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -940,6 +940,7 @@ create_seqscan_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->parallel_safe = rel->consider_parallel;
 	pathnode->parallel_workers = parallel_workers;
 	pathnode->pathkeys = NIL;	/* seqscan has unordered result */
+	pathnode->uniquekeys = NIL;
 
 	cost_seqscan(pathnode, root, rel, pathnode->param_info);
 
@@ -964,6 +965,7 @@ create_samplescan_path(PlannerInfo *root, RelOptInfo *rel, Relids required_outer
 	pathnode->parallel_safe = rel->consider_parallel;
 	pathnode->parallel_workers = 0;
 	pathnode->pathkeys = NIL;	/* samplescan has unordered result */
+	pathnode->uniquekeys = NIL;
 
 	cost_samplescan(pathnode, root, rel, pathnode->param_info);
 
@@ -1000,6 +1002,7 @@ create_index_path(PlannerInfo *root,
 				  List *indexorderbys,
 				  List *indexorderbycols,
 				  List *pathkeys,
+				  List *uniquekeys,
 				  ScanDirection indexscandir,
 				  bool indexonly,
 				  Relids required_outer,
@@ -1018,6 +1021,7 @@ create_index_path(PlannerInfo *root,
 	pathnode->path.parallel_safe = rel->consider_parallel;
 	pathnode->path.parallel_workers = 0;
 	pathnode->path.pathkeys = pathkeys;
+	pathnode->path.uniquekeys = uniquekeys;
 
 	pathnode->indexinfo = index;
 	pathnode->indexclauses = indexclauses;
@@ -1061,6 +1065,7 @@ create_bitmap_heap_path(PlannerInfo *root,
 	pathnode->path.parallel_safe = rel->consider_parallel;
 	pathnode->path.parallel_workers = parallel_degree;
 	pathnode->path.pathkeys = NIL;	/* always unordered */
+	pathnode->path.uniquekeys = NIL;
 
 	pathnode->bitmapqual = bitmapqual;
 
@@ -1922,6 +1927,7 @@ create_functionscan_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->parallel_safe = rel->consider_parallel;
 	pathnode->parallel_workers = 0;
 	pathnode->pathkeys = pathkeys;
+	pathnode->uniquekeys = NIL;
 
 	cost_functionscan(pathnode, root, rel, pathnode->param_info);
 
@@ -1948,6 +1954,7 @@ create_tablefuncscan_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->parallel_safe = rel->consider_parallel;
 	pathnode->parallel_workers = 0;
 	pathnode->pathkeys = NIL;	/* result is always unordered */
+	pathnode->uniquekeys = NIL;
 
 	cost_tablefuncscan(pathnode, root, rel, pathnode->param_info);
 
@@ -1974,6 +1981,7 @@ create_valuesscan_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->parallel_safe = rel->consider_parallel;
 	pathnode->parallel_workers = 0;
 	pathnode->pathkeys = NIL;	/* result is always unordered */
+	pathnode->uniquekeys = NIL;
 
 	cost_valuesscan(pathnode, root, rel, pathnode->param_info);
 
@@ -1999,6 +2007,7 @@ create_ctescan_path(PlannerInfo *root, RelOptInfo *rel, Relids required_outer)
 	pathnode->parallel_safe = rel->consider_parallel;
 	pathnode->parallel_workers = 0;
 	pathnode->pathkeys = NIL;	/* XXX for now, result is always unordered */
+	pathnode->uniquekeys = NIL;
 
 	cost_ctescan(pathnode, root, rel, pathnode->param_info);
 
@@ -2025,6 +2034,7 @@ create_namedtuplestorescan_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->parallel_safe = rel->consider_parallel;
 	pathnode->parallel_workers = 0;
 	pathnode->pathkeys = NIL;	/* result is always unordered */
+	pathnode->uniquekeys = NIL;
 
 	cost_namedtuplestorescan(pathnode, root, rel, pathnode->param_info);
 
@@ -2051,6 +2061,7 @@ create_resultscan_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->parallel_safe = rel->consider_parallel;
 	pathnode->parallel_workers = 0;
 	pathnode->pathkeys = NIL;	/* result is always unordered */
+	pathnode->uniquekeys = NIL;
 
 	cost_resultscan(pathnode, root, rel, pathnode->param_info);
 
@@ -2077,6 +2088,7 @@ create_worktablescan_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->parallel_safe = rel->consider_parallel;
 	pathnode->parallel_workers = 0;
 	pathnode->pathkeys = NIL;	/* result is always unordered */
+	pathnode->uniquekeys = NIL;
 
 	/* Cost is the same as for a regular CTE scan */
 	cost_ctescan(pathnode, root, rel, pathnode->param_info);
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index baced7eec0..a1511b46ea 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -261,6 +261,7 @@ typedef enum NodeTag
 	T_EquivalenceMember,
 	T_PathKey,
 	T_PathTarget,
+	T_UniqueKey,
 	T_RestrictInfo,
 	T_IndexClause,
 	T_PlaceHolderVar,
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 3d3be197e0..4e329f0fb5 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -269,6 +269,8 @@ struct PlannerInfo
 
 	List	   *canon_pathkeys; /* list of "canonical" PathKeys */
 
+	List	   *canon_uniquekeys; /* list of "canonical" UniqueKeys */
+
 	List	   *left_join_clauses;	/* list of RestrictInfos for mergejoinable
 									 * outer join clauses w/nonnullable var on
 									 * left */
@@ -297,6 +299,8 @@ struct PlannerInfo
 
 	List	   *query_pathkeys; /* desired pathkeys for query_planner() */
 
+	List	   *query_uniquekeys; /* unique keys used for the query */
+
 	List	   *group_pathkeys; /* groupClause pathkeys, if any */
 	List	   *window_pathkeys;	/* pathkeys of bottom window, if any */
 	List	   *distinct_pathkeys;	/* distinctClause pathkeys, if any */
@@ -1077,6 +1081,15 @@ typedef struct ParamPathInfo
 	List	   *ppi_clauses;	/* join clauses available from outer rels */
 } ParamPathInfo;
 
+/*
+ * UniqueKey
+ */
+typedef struct UniqueKey
+{
+	NodeTag		type;
+
+	EquivalenceClass *eq_clause;	/* equivalence class */
+} UniqueKey;
 
 /*
  * Type "Path" is used as-is for sequential-scan paths, as well as some other
@@ -1106,6 +1119,9 @@ typedef struct ParamPathInfo
  *
  * "pathkeys" is a List of PathKey nodes (see above), describing the sort
  * ordering of the path's output rows.
+ *
+ * "uniquekeys", if not NIL, is a list of UniqueKey nodes (see above),
+ * describing the XXX.
  */
 typedef struct Path
 {
@@ -1129,6 +1145,8 @@ typedef struct Path
 
 	List	   *pathkeys;		/* sort ordering of path's output */
 	/* pathkeys is a List of PathKey nodes; see above */
+
+	List	   *uniquekeys;	/* the unique keys, or NIL if none */
 } Path;
 
 /* Macro for extracting a path's parameterization relids; beware double eval */
diff --git a/src/include/nodes/print.h b/src/include/nodes/print.h
index 6126b491bf..006248bfb5 100644
--- a/src/include/nodes/print.h
+++ b/src/include/nodes/print.h
@@ -28,6 +28,7 @@ extern char *pretty_format_node_dump(const char *dump);
 extern void print_rt(const List *rtable);
 extern void print_expr(const Node *expr, const List *rtable);
 extern void print_pathkeys(const List *pathkeys, const List *rtable);
+extern void print_uniquekeys(const List *uniquekeys, const List *rtable);
 extern void print_tl(const List *tlist, const List *rtable);
 extern void print_slot(TupleTableSlot *slot);
 
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index e450fe112a..f75ff6f323 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -44,6 +44,7 @@ extern IndexPath *create_index_path(PlannerInfo *root,
 									List *indexorderbys,
 									List *indexorderbycols,
 									List *pathkeys,
+									List *uniquekeys,
 									ScanDirection indexscandir,
 									bool indexonly,
 									Relids required_outer,
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 9ab73bd20c..5b6be383b3 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -214,6 +214,9 @@ extern List *build_join_pathkeys(PlannerInfo *root,
 extern List *make_pathkeys_for_sortclauses(PlannerInfo *root,
 										   List *sortclauses,
 										   List *tlist);
+extern List *make_pathkeys_for_uniquekeys(PlannerInfo *root,
+										  List *sortclauses,
+										  List *tlist);
 extern void initialize_mergeclause_eclasses(PlannerInfo *root,
 											RestrictInfo *restrictinfo);
 extern void update_mergeclause_eclasses(PlannerInfo *root,
@@ -240,4 +243,12 @@ extern PathKey *make_canonical_pathkey(PlannerInfo *root,
 extern void add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
 									List *live_childrels);
 
+/*
+ * uniquekey.c
+ *	  Utilities for matching and building unique keys
+ */
+extern List *build_uniquekeys(PlannerInfo *root, List *sortclauses);
+extern bool uniquekeys_contained_in(List *keys1, List *keys2);
+extern bool has_useful_uniquekeys(PlannerInfo *root);
+
 #endif							/* PATHS_H */
-- 
2.21.0

v33-0002-Index-skip-scan.patch (text/x-diff; charset=us-ascii)

From c5b589f33e5ab2f8801bbf3e2cb9e7a25777bc82 Mon Sep 17 00:00:00 2001
From: jesperpedersen <jesper.pedersen@redhat.com>
Date: Fri, 15 Nov 2019 09:46:53 -0500
Subject: [PATCH v33 2/2] Index skip scan

Implementation of Index Skip Scan (see Loose Index Scan in the wiki [1])
on top of IndexOnlyScan and IndexScan. To make it suitable both when
there is a small number of distinct values and when there is a
significant number of them, the following approach is taken: instead of
searching from the root for every value, we first search on the current
page, and only if the value is not found there do we continue searching
from the root.
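
The strategy described above can be illustrated with a minimal sketch. This
is not the patch's code: a sorted array split into fixed-size pages stands in
for the B-tree, a binary search over the whole array stands in for a descent
from the root, and PAGE_SIZE, keys, upper_bound and skip_to_next are
hypothetical names chosen for the illustration.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE 4				/* hypothetical number of entries per page */

/* Toy "index": sorted keys, conceptually stored in pages of PAGE_SIZE. */
static const int keys[] = {1, 1, 1, 2, 2, 3, 3, 3, 3, 3, 3, 7, 7, 9};
static const size_t nkeys = sizeof(keys) / sizeof(keys[0]);

/* Return the first position in [lo, hi) whose key is greater than 'key'. */
static size_t
upper_bound(size_t lo, size_t hi, int key)
{
	while (lo < hi)
	{
		size_t		mid = lo + (hi - lo) / 2;

		if (keys[mid] <= key)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}

/*
 * Skip past all entries equal to keys[pos] and return the position of the
 * next distinct key, or nkeys if there is none.  *from_root is set to 1
 * when the current page did not contain the answer and the whole array
 * (standing in for a search from the root) had to be searched.
 */
static size_t
skip_to_next(size_t pos, int *from_root)
{
	size_t		page_start;
	size_t		page_end;

	assert(pos < nkeys);

	page_start = pos - pos % PAGE_SIZE;
	page_end = page_start + PAGE_SIZE;
	if (page_end > nkeys)
		page_end = nkeys;

	/* Cheap check: does the current page hold a larger key at all? */
	if (keys[page_end - 1] > keys[pos])
	{
		*from_root = 0;
		return upper_bound(pos, page_end, keys[pos]);
	}

	/* Not on this page, so fall back to the full search. */
	*from_root = 1;
	return upper_bound(0, nkeys, keys[pos]);
}
```

Calling skip_to_next repeatedly, starting at position 0 and stopping when it
returns nkeys, visits one entry per distinct key. With many distinct values
most calls are answered from the current page; with few, most calls take the
full search but each one jumps over a long run of duplicates.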

Original patch and design were proposed by Thomas Munro [2], revived and
improved by Dmitry Dolgov and Jesper Pedersen.

[1] https://wiki.postgresql.org/wiki/Loose_indexscan
[2] https://www.postgresql.org/message-id/flat/CADLWmXXbTSBxP-MzJuPAYSsL_2f0iPm5VWPbCvDbVvfX93FKkw%40mail.gmail.com

Author: Jesper Pedersen, Dmitry Dolgov
Reviewed-by: Thomas Munro, David Rowley, Floris Van Nee, Kyotaro Horiguchi, Tomas Vondra, Peter Geoghegan
---
 contrib/bloom/blutils.c                       |   1 +
 doc/src/sgml/config.sgml                      |  15 +
 doc/src/sgml/indexam.sgml                     |  63 ++
 doc/src/sgml/indices.sgml                     |  23 +
 src/backend/access/brin/brin.c                |   1 +
 src/backend/access/gin/ginutil.c              |   1 +
 src/backend/access/gist/gist.c                |   1 +
 src/backend/access/hash/hash.c                |   1 +
 src/backend/access/index/indexam.c            |  18 +
 src/backend/access/nbtree/nbtree.c            |  13 +
 src/backend/access/nbtree/nbtsearch.c         | 458 +++++++++++++
 src/backend/access/spgist/spgutils.c          |   1 +
 src/backend/commands/explain.c                |  25 +
 src/backend/commands/portalcmds.c             |   2 +
 src/backend/executor/nodeIndexonlyscan.c      |  51 +-
 src/backend/executor/nodeIndexscan.c          |  56 +-
 src/backend/nodes/copyfuncs.c                 |   2 +
 src/backend/nodes/outfuncs.c                  |   2 +
 src/backend/nodes/readfuncs.c                 |   2 +
 src/backend/optimizer/path/costsize.c         |   1 +
 src/backend/optimizer/plan/createplan.c       |  20 +-
 src/backend/optimizer/plan/planner.c          |  77 +++
 src/backend/optimizer/util/pathnode.c         |  40 ++
 src/backend/optimizer/util/plancat.c          |   1 +
 src/backend/utils/misc/guc.c                  |   9 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/access/amapi.h                    |   8 +
 src/include/access/genam.h                    |   2 +
 src/include/access/nbtree.h                   |   7 +
 src/include/access/sdir.h                     |   7 +
 src/include/nodes/execnodes.h                 |   6 +
 src/include/nodes/parsenodes.h                |   1 +
 src/include/nodes/pathnodes.h                 |   5 +
 src/include/nodes/plannodes.h                 |   4 +
 src/include/optimizer/cost.h                  |   1 +
 src/include/optimizer/pathnode.h              |   5 +
 src/test/regress/expected/select_distinct.out | 601 ++++++++++++++++++
 src/test/regress/expected/sysviews.out        |   3 +-
 src/test/regress/sql/select_distinct.sql      | 248 ++++++++
 39 files changed, 1772 insertions(+), 11 deletions(-)

diff --git a/contrib/bloom/blutils.c b/contrib/bloom/blutils.c
index 0104d02f67..a018b7f3d0 100644
--- a/contrib/bloom/blutils.c
+++ b/contrib/bloom/blutils.c
@@ -133,6 +133,7 @@ blhandler(PG_FUNCTION_ARGS)
 	amroutine->ambulkdelete = blbulkdelete;
 	amroutine->amvacuumcleanup = blvacuumcleanup;
 	amroutine->amcanreturn = NULL;
+	amroutine->amskip = NULL;
 	amroutine->amcostestimate = blcostestimate;
 	amroutine->amoptions = bloptions;
 	amroutine->amproperty = NULL;
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index e07dc01e80..36ba75b077 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4517,6 +4517,21 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-enable-indexskipscan" xreflabel="enable_indexskipscan">
+      <term><varname>enable_indexskipscan</varname> (<type>boolean</type>)
+      <indexterm>
+       <primary><varname>enable_indexskipscan</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Enables or disables the query planner's use of index-skip-scan plan
+        types (see <xref linkend="indexes-index-skip-scans"/>). The default is
+        <literal>on</literal>.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-enable-material" xreflabel="enable_material">
       <term><varname>enable_material</varname> (<type>boolean</type>)
       <indexterm>
diff --git a/doc/src/sgml/indexam.sgml b/doc/src/sgml/indexam.sgml
index 37f8d8760a..a726d80878 100644
--- a/doc/src/sgml/indexam.sgml
+++ b/doc/src/sgml/indexam.sgml
@@ -148,6 +148,7 @@ typedef struct IndexAmRoutine
     amendscan_function amendscan;
     ammarkpos_function ammarkpos;       /* can be NULL */
     amrestrpos_function amrestrpos;     /* can be NULL */
+    amskip_function amskip;             /* can be NULL */
 
     /* interface functions to support parallel index scans */
     amestimateparallelscan_function amestimateparallelscan;    /* can be NULL */
@@ -691,6 +692,68 @@ amrestrpos (IndexScanDesc scan);
 
   <para>
 <programlisting>
+bool
+amskip (IndexScanDesc scan,
+        ScanDirection direction,
+        ScanDirection indexdir,
+        bool scanstart,
+        int prefix);
+</programlisting>
+  Skip past all tuples where the first 'prefix' columns have the same value as
+  the last tuple returned in the current scan. The arguments are:
+
+   <variablelist>
+    <varlistentry>
+     <term><parameter>scan</parameter></term>
+     <listitem>
+      <para>
+       Index scan information
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><parameter>direction</parameter></term>
+     <listitem>
+      <para>
+       The direction in which data is advancing.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><parameter>indexdir</parameter></term>
+     <listitem>
+      <para>
+        The index direction, in which data must be read.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><parameter>scanstart</parameter></term>
+     <listitem>
+      <para>
+        Whether this is the start of the scan.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><parameter>prefix</parameter></term>
+     <listitem>
+      <para>
+        Size of the distinct prefix, as a number of leading index columns.
+      </para>
+     </listitem>
+    </varlistentry>
+
+   </variablelist>
+
+  </para>
+
+  <para>
+<programlisting>
 Size
 amestimateparallelscan (void);
 </programlisting>
diff --git a/doc/src/sgml/indices.sgml b/doc/src/sgml/indices.sgml
index c54bf0dbbd..c429d98fc7 100644
--- a/doc/src/sgml/indices.sgml
+++ b/doc/src/sgml/indices.sgml
@@ -1254,6 +1254,29 @@ SELECT target FROM tests WHERE subject = 'some-subject' AND success;
    and later will recognize such cases and allow index-only scans to be
    generated, but older versions will not.
   </para>
+
+  <sect2 id="indexes-index-skip-scans">
+    <title>Index Skip Scans</title>
+
+    <indexterm zone="indexes-index-skip-scans">
+      <primary>index</primary>
+      <secondary>index-skip scans</secondary>
+    </indexterm>
+    <indexterm zone="indexes-index-skip-scans">
+      <primary>index-skip scan</primary>
+    </indexterm>
+
+    <para>
+     When the rows retrieved from an index scan are then deduplicated by
+     eliminating rows matching on a prefix of index keys (e.g. when using
+     <literal>SELECT DISTINCT</literal>), the planner will consider
+     skipping groups of rows with a matching key prefix. When a row with
+     a particular prefix is found, remaining rows with the same key prefix
+     are skipped.  The larger the number of rows with the same key prefix
+     (i.e. the lower the number of distinct key prefixes in the index),
+     the more efficient this is.
+    </para>
+  </sect2>
  </sect1>
 
 
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 2e8f67ef10..4db31bb211 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -113,6 +113,7 @@ brinhandler(PG_FUNCTION_ARGS)
 	amroutine->ambulkdelete = brinbulkdelete;
 	amroutine->amvacuumcleanup = brinvacuumcleanup;
 	amroutine->amcanreturn = NULL;
+	amroutine->amskip = NULL;
 	amroutine->amcostestimate = brincostestimate;
 	amroutine->amoptions = brinoptions;
 	amroutine->amproperty = NULL;
diff --git a/src/backend/access/gin/ginutil.c b/src/backend/access/gin/ginutil.c
index a7e55caf28..8dd1d30d2a 100644
--- a/src/backend/access/gin/ginutil.c
+++ b/src/backend/access/gin/ginutil.c
@@ -65,6 +65,7 @@ ginhandler(PG_FUNCTION_ARGS)
 	amroutine->ambulkdelete = ginbulkdelete;
 	amroutine->amvacuumcleanup = ginvacuumcleanup;
 	amroutine->amcanreturn = NULL;
+	amroutine->amskip = NULL;
 	amroutine->amcostestimate = gincostestimate;
 	amroutine->amoptions = ginoptions;
 	amroutine->amproperty = NULL;
diff --git a/src/backend/access/gist/gist.c b/src/backend/access/gist/gist.c
index aefc302ed2..8c692f7fb4 100644
--- a/src/backend/access/gist/gist.c
+++ b/src/backend/access/gist/gist.c
@@ -86,6 +86,7 @@ gisthandler(PG_FUNCTION_ARGS)
 	amroutine->ambulkdelete = gistbulkdelete;
 	amroutine->amvacuumcleanup = gistvacuumcleanup;
 	amroutine->amcanreturn = gistcanreturn;
+	amroutine->amskip = NULL;
 	amroutine->amcostestimate = gistcostestimate;
 	amroutine->amoptions = gistoptions;
 	amroutine->amproperty = gistproperty;
diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c
index 4871b7ff4d..e5fa4c7864 100644
--- a/src/backend/access/hash/hash.c
+++ b/src/backend/access/hash/hash.c
@@ -83,6 +83,7 @@ hashhandler(PG_FUNCTION_ARGS)
 	amroutine->ambulkdelete = hashbulkdelete;
 	amroutine->amvacuumcleanup = hashvacuumcleanup;
 	amroutine->amcanreturn = NULL;
+	amroutine->amskip = NULL;
 	amroutine->amcostestimate = hashcostestimate;
 	amroutine->amoptions = hashoptions;
 	amroutine->amproperty = NULL;
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 01539b6bd6..1047a35ade 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -33,6 +33,7 @@
  *		index_can_return	- does index support index-only scans?
  *		index_getprocid - get a support procedure OID
  *		index_getprocinfo - get a support procedure's lookup info
+ *		index_skip		- advance past duplicate key values in a scan
  *
  * NOTES
  *		This file contains the index_ routines which used
@@ -730,6 +731,23 @@ index_can_return(Relation indexRelation, int attno)
 	return indexRelation->rd_indam->amcanreturn(indexRelation, attno);
 }
 
+/* ----------------
+ *		index_skip
+ *
+ *		Skip past all tuples where the first 'prefix' columns have the
+ *		same value as the last tuple returned in the current scan.
+ * ----------------
+ */
+bool
+index_skip(IndexScanDesc scan, ScanDirection direction,
+		   ScanDirection indexdir, bool scanstart, int prefix)
+{
+	SCAN_CHECKS;
+
+	return scan->indexRelation->rd_indam->amskip(scan, direction,
+												 indexdir, scanstart, prefix);
+}
+
 /* ----------------
  *		index_getprocid
  *
diff --git a/src/backend/access/nbtree/nbtree.c b/src/backend/access/nbtree/nbtree.c
index 5254bc7ef5..8fde56fe60 100644
--- a/src/backend/access/nbtree/nbtree.c
+++ b/src/backend/access/nbtree/nbtree.c
@@ -132,6 +132,7 @@ bthandler(PG_FUNCTION_ARGS)
 	amroutine->ambulkdelete = btbulkdelete;
 	amroutine->amvacuumcleanup = btvacuumcleanup;
 	amroutine->amcanreturn = btcanreturn;
+	amroutine->amskip = btskip;
 	amroutine->amcostestimate = btcostestimate;
 	amroutine->amoptions = btoptions;
 	amroutine->amproperty = btproperty;
@@ -381,6 +382,8 @@ btbeginscan(Relation rel, int nkeys, int norderbys)
 	 */
 	so->currTuples = so->markTuples = NULL;
 
+	so->skipScanKey = NULL;
+
 	scan->xs_itupdesc = RelationGetDescr(rel);
 
 	scan->opaque = so;
@@ -448,6 +451,16 @@ btrescan(IndexScanDesc scan, ScanKey scankey, int nscankeys,
 	_bt_preprocess_array_keys(scan);
 }
 
+/*
+ * btskip() -- skip to the beginning of the next key prefix
+ */
+bool
+btskip(IndexScanDesc scan, ScanDirection direction,
+	   ScanDirection indexdir, bool start, int prefix)
+{
+	return _bt_skip(scan, direction, indexdir, start, prefix);
+}
+
 /*
  *	btendscan() -- close down a scan
  */
diff --git a/src/backend/access/nbtree/nbtsearch.c b/src/backend/access/nbtree/nbtsearch.c
index c573814f01..8b406416fd 100644
--- a/src/backend/access/nbtree/nbtsearch.c
+++ b/src/backend/access/nbtree/nbtsearch.c
@@ -37,6 +37,10 @@ static bool _bt_parallel_readpage(IndexScanDesc scan, BlockNumber blkno,
 static Buffer _bt_walk_left(Relation rel, Buffer buf, Snapshot snapshot);
 static bool _bt_endpoint(IndexScanDesc scan, ScanDirection dir);
 static inline void _bt_initialize_more_data(BTScanOpaque so, ScanDirection dir);
+static inline void _bt_update_skip_scankeys(IndexScanDesc scan,
+											Relation indexRel);
+static inline bool _bt_scankey_within_page(IndexScanDesc scan, BTScanInsert key,
+										Buffer buf, ScanDirection dir);
 
 
 /*
@@ -1375,6 +1379,409 @@ _bt_next(IndexScanDesc scan, ScanDirection dir)
 	return true;
 }
 
+/*
+ *  _bt_skip() -- Skip items that have the same prefix as the most recently
+ * 				  fetched index tuple.
+ *
+ * 		The current position is set so that a subsequent call to _bt_next will
+ * 		fetch the first tuple that differs in the leading 'prefix' keys.
+ *
+ * 		There are four different kinds of skipping (depending on dir and
+ * 		indexdir) that are important to distinguish, especially in the presence
+ * 		of an index condition:
+ *
+ * 		* Advancing forward and reading forward
+ * 			simple scan
+ *
+ * 		* Advancing forward and reading backward
+ * 			scan inside a cursor fetching backward, when skipping is necessary
+ * 			right from the start
+ *
+ * 		* Advancing backward and reading forward
+ * 			scan with order by desc inside a cursor fetching forward, when
+ * 			skipping is necessary right from the start
+ *
+ * 		* Advancing backward and reading backward
+ * 			simple scan with order by desc
+ *
+ *      The current page is searched for the next unique value. If none is found
+ *      we will do a scan from the root in order to find the next page with
+ *      a unique value.
+ */
+bool
+_bt_skip(IndexScanDesc scan, ScanDirection dir,
+		 ScanDirection indexdir, bool scanstart, int prefix)
+{
+	BTScanOpaque so = (BTScanOpaque) scan->opaque;
+	BTStack stack;
+	Buffer buf;
+	OffsetNumber offnum;
+	BTScanPosItem *currItem;
+	Relation 	 indexRel = scan->indexRelation;
+
+	/* We want to return tuples, and we need a starting point */
+	Assert(scan->xs_want_itup);
+	Assert(scan->xs_itup);
+
+	if (so->numKilled > 0)
+		_bt_killitems(scan);
+
+	/* If skipScanKey is NULL then we initialize it with _bt_mkscankey */
+	if (so->skipScanKey == NULL)
+	{
+		so->skipScanKey = _bt_mkscankey(indexRel, scan->xs_itup);
+		so->skipScanKey->keysz = prefix;
+		so->skipScanKey->scantid = NULL;
+	}
+	so->skipScanKey->nextkey = ScanDirectionIsForward(dir);
+	_bt_update_skip_scankeys(scan, indexRel);
+
+	/* Check if the next unique key can be found within the current page.
+	 * Since we do not lock the current page between jumps, it's possible
+	 * that it was split since the last time we saw it. This is fine when
+	 * scanning forward, since pages split to the right and we are still
+	 * on the leftmost page. When scanning backward it's possible to lose
+	 * some pages, so we would need to remember the previous page and then
+	 * follow the right links from the current page until we find the
+	 * original one.
+	 *
+	 * The whole point of checking the current page is to protect
+	 * ourselves against a statistics mismatch, and to perform better in
+	 * that case, when there are too many distinct values for jumping.
+	 * It's not clear that the complexity of this solution is justified
+	 * for backward scans, so for now just avoid it.
+	 */
+	if (BufferIsValid(so->currPos.buf) && ScanDirectionIsForward(dir))
+	{
+		LockBuffer(so->currPos.buf, BT_READ);
+
+		if (_bt_scankey_within_page(scan, so->skipScanKey, so->currPos.buf, dir))
+		{
+			bool keyFound = false;
+
+			offnum = _bt_binsrch(scan->indexRelation, so->skipScanKey, so->currPos.buf);
+
+			/* Lock the page for SERIALIZABLE transactions */
+			PredicateLockPage(scan->indexRelation, BufferGetBlockNumber(so->currPos.buf),
+							  scan->xs_snapshot);
+
+			/* We know in which direction to look */
+			_bt_initialize_more_data(so, dir);
+
+			/* Now read the data */
+			keyFound = _bt_readpage(scan, dir, offnum);
+
+			LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+			ReleaseBuffer(so->currPos.buf);
+			so->currPos.buf = InvalidBuffer;
+
+			if (keyFound)
+			{
+				/* set IndexTuple */
+				currItem = &so->currPos.items[so->currPos.itemIndex];
+				scan->xs_heaptid = currItem->heapTid;
+				scan->xs_itup = (IndexTuple) (so->currTuples + currItem->tupleOffset);
+				return true;
+			}
+		}
+		else
+		{
+			if (so->numKilled > 0)
+				_bt_killitems(scan);
+
+			LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+		}
+	}
+
+	if (BufferIsValid(so->currPos.buf))
+	{
+		ReleaseBuffer(so->currPos.buf);
+		so->currPos.buf = InvalidBuffer;
+	}
+
+	/*
+	 * We haven't found scan key within the current page, so let's scan from
+	 * the root. Use _bt_search and _bt_binsrch to get the buffer and offset
+	 * number
+	 */
+	so->skipScanKey->nextkey = ScanDirectionIsForward(dir);
+	stack = _bt_search(scan->indexRelation, so->skipScanKey,
+					   &buf, BT_READ, scan->xs_snapshot);
+	_bt_freestack(stack);
+	so->currPos.buf = buf;
+	offnum = _bt_binsrch(scan->indexRelation, so->skipScanKey, buf);
+
+	/* Lock the page for SERIALIZABLE transactions */
+	PredicateLockPage(scan->indexRelation, BufferGetBlockNumber(buf),
+					  scan->xs_snapshot);
+
+	/* We know in which direction to look */
+	_bt_initialize_more_data(so, dir);
+
+	/*
+	 * Simplest case is when both directions are forward, when we are already
+	 * at the next distinct key at the beginning of the series (so everything
+	 * else would be done in _bt_readpage)
+	 *
+	 * The case when both directions are backwards is also simple, but we need
+	 * to go one step back, since we need a last element from the previous
+	 * series.
+	 */
+	if (ScanDirectionIsBackward(dir) && ScanDirectionIsBackward(indexdir))
+		 offnum = OffsetNumberPrev(offnum);
+
+	/*
+	 * Advance backward but read forward. At this moment we are at the next
+	 * distinct key at the beginning of the series. If the scan has just
+	 * started, we can read forward without doing anything else. Otherwise
+	 * find the previous distinct key and the beginning of its series and read
+	 * forward from there. To do so, go back one step, perform binary search
+	 * to find the first item in the series and let _bt_readpage do everything
+	 * else.
+	 */
+	else if (ScanDirectionIsBackward(dir) && ScanDirectionIsForward(indexdir))
+	{
+		if (!scanstart)
+		{
+			offnum = _bt_binsrch(scan->indexRelation, so->skipScanKey, buf);
+
+			/* One step back to find a previous value */
+			_bt_readpage(scan, dir, offnum);
+
+			LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+			if (_bt_next(scan, dir))
+			{
+				LockBuffer(so->currPos.buf, BT_READ);
+				_bt_update_skip_scankeys(scan, indexRel);
+
+				/*
+				 * And now find the last item from the sequence for the
+				 * current value, with the intention to do OffsetNumberNext.
+				 * As a result we end up on the first element of the sequence.
+				 */
+				if (_bt_scankey_within_page(scan, so->skipScanKey, so->currPos.buf, dir))
+					offnum = _bt_binsrch(scan->indexRelation, so->skipScanKey, buf);
+				else
+				{
+					if (BufferIsValid(so->currPos.buf))
+					{
+						if (so->numKilled > 0)
+							_bt_killitems(scan);
+
+						LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+						ReleaseBuffer(so->currPos.buf);
+						so->currPos.buf = InvalidBuffer;
+					}
+
+					stack = _bt_search(scan->indexRelation, so->skipScanKey,
+									   &buf, BT_READ, scan->xs_snapshot);
+					_bt_freestack(stack);
+					so->currPos.buf = buf;
+					offnum = _bt_binsrch(scan->indexRelation, so->skipScanKey, buf);
+				}
+			}
+			else
+			{
+				pfree(so->skipScanKey);
+				so->skipScanKey = NULL;
+				return false;
+			}
+		}
+	}
+
+	/*
+	 * Advance forward but read backward. At this moment we are at the next
+	 * distinct key at the beginning of the series. If the scan has just
+	 * started, we can go one step back and read forward without doing
+	 * anything else. Otherwise find the next distinct key and the beginning
+	 * of its series, go one step back and read backward from there.
+	 *
+	 * An interesting situation can happen if one of the distinct keys does
+	 * not pass the corresponding index condition at all. In this case reading
+	 * backward can lead to a previous distinct key being found, creating a
+	 * loop. To avoid that, check the value to be returned, and jump one more
+	 * time if it's the same as at the beginning.
+	 */
+	else if (ScanDirectionIsForward(dir) && ScanDirectionIsBackward(indexdir))
+	{
+		if (scanstart)
+			offnum = OffsetNumberPrev(offnum);
+		else
+		{
+			OffsetNumber nextOffset,
+						startOffset,
+						jumpOffset;
+
+			IndexTuple startItup = CopyIndexTuple(scan->xs_itup);
+			Page page = BufferGetPage(so->currPos.buf);
+
+			/* We are at the end and need to return */
+			if ((offnum > PageGetMaxOffsetNumber(page)) &&
+				(so->currPos.nextPage == P_NONE))
+			{
+				LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+
+				BTScanPosUnpinIfPinned(so->currPos);
+				BTScanPosInvalidate(so->currPos);
+
+				pfree(so->skipScanKey);
+				so->skipScanKey = NULL;
+				return false;
+			}
+
+			nextOffset = startOffset = ItemPointerGetOffsetNumber(&scan->xs_itup->t_tid);
+
+			while (nextOffset == startOffset)
+			{
+				IndexTuple itup;
+				CHECK_FOR_INTERRUPTS();
+
+				/*
+				 * Find a next index tuple to update scan key. It could be at
+				 * the end, so check for max offset
+				 */
+				if (!_bt_readpage(scan, ForwardScanDirection, offnum))
+				{
+					/*
+					 * There's no actually-matching data on this page.  Try to
+					 * advance to the next page. Return false if there's no
+					 * matching data at all.
+					 */
+					LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+					if (!_bt_steppage(scan, dir))
+					{
+						pfree(so->skipScanKey);
+						so->skipScanKey = NULL;
+						return false;
+					}
+					LockBuffer(so->currPos.buf, BT_READ);
+				}
+
+				currItem = &so->currPos.items[so->currPos.firstItem];
+				itup = (IndexTuple) (so->currTuples + currItem->tupleOffset);
+
+				scan->xs_itup = itup;
+				so->skipScanKey->nextkey = ScanDirectionIsForward(dir);
+
+				_bt_update_skip_scankeys(scan, indexRel);
+				if (BufferIsValid(so->currPos.buf))
+				{
+					if (so->numKilled > 0)
+						_bt_killitems(scan);
+
+					LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+					ReleaseBuffer(so->currPos.buf);
+					so->currPos.buf = InvalidBuffer;
+				}
+
+				stack = _bt_search(scan->indexRelation, so->skipScanKey,
+								   &buf, BT_READ, scan->xs_snapshot);
+				_bt_freestack(stack);
+				so->currPos.buf = buf;
+				jumpOffset = offnum = _bt_binsrch(scan->indexRelation, so->skipScanKey, buf);
+				offnum = OffsetNumberPrev(offnum);
+
+				so->currPos.moreLeft = true;
+				if (!_bt_readpage(scan, indexdir, offnum))
+				{
+					/*
+					 * There's no actually-matching data on this page.  Try to
+					 * advance to the next page. Return false if there's no
+					 * matching data at all.
+					 */
+					LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+					if (!_bt_steppage(scan, indexdir))
+					{
+						pfree(so->skipScanKey);
+						so->skipScanKey = NULL;
+						return false;
+					}
+					LockBuffer(so->currPos.buf, BT_READ);
+				}
+
+				currItem = &so->currPos.items[so->currPos.lastItem];
+				itup = (IndexTuple) (so->currTuples + currItem->tupleOffset);
+				nextOffset = ItemPointerGetOffsetNumber(&itup->t_tid);
+
+				/*
+				 * To check if we returned the same tuple, try to find a
+				 * startItup on the current page. For that we need to update
+				 * scankey to match the whole tuple and set nextkey to return
+				 * an exact tuple, not the next one. If the nextOffset is the
+				 * same as before, it means we are in the loop, return offnum
+				 * to the original position and jump further
+				 */
+				scan->xs_itup = startItup;
+				_bt_update_skip_scankeys(scan, indexRel);
+
+				so->skipScanKey->keysz = IndexRelationGetNumberOfKeyAttributes(indexRel);
+				so->skipScanKey->nextkey = false;
+
+				if (_bt_scankey_within_page(scan, so->skipScanKey,
+											so->currPos.buf, dir))
+				{
+					OffsetNumber maxoff;
+					startOffset = _bt_binsrch(scan->indexRelation,
+											  so->skipScanKey,
+											  so->currPos.buf);
+
+					page = BufferGetPage(so->currPos.buf);
+					maxoff = PageGetMaxOffsetNumber(page);
+
+					if (nextOffset <= startOffset)
+					{
+						offnum = jumpOffset;
+						nextOffset = startOffset;
+					}
+
+					if ((offnum > maxoff) && (so->currPos.nextPage == P_NONE))
+					{
+						LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+
+						BTScanPosUnpinIfPinned(so->currPos);
+						BTScanPosInvalidate(so->currPos);
+
+						pfree(so->skipScanKey);
+						so->skipScanKey = NULL;
+						return false;
+					}
+				}
+
+				/* Restore the original scankey options */
+				so->skipScanKey->keysz = prefix;
+				so->skipScanKey->nextkey = ScanDirectionIsForward(dir);
+			}
+		}
+	}
+
+	/* Now read the data */
+	if (!_bt_readpage(scan, indexdir, offnum))
+	{
+		/*
+		 * There's no actually-matching data on this page.  Try to advance to
+		 * the next page.  Return false if there's no matching data at all.
+		 */
+		LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+		if (!_bt_steppage(scan, dir))
+		{
+			pfree(so->skipScanKey);
+			so->skipScanKey = NULL;
+			return false;
+		}
+	}
+	else
+	{
+		/* Drop the lock, and maybe the pin, on the current page */
+		LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+	}
+
+	/* And set IndexTuple */
+	currItem = &so->currPos.items[so->currPos.itemIndex];
+	scan->xs_heaptid = currItem->heapTid;
+	scan->xs_itup = (IndexTuple) (so->currTuples + currItem->tupleOffset);
+	return true;
+}
+
 /*
  *	_bt_readpage() -- Load data from current index page into so->currPos
  *
@@ -2246,3 +2653,54 @@ _bt_initialize_more_data(BTScanOpaque so, ScanDirection dir)
 	so->numKilled = 0;			/* just paranoia */
 	so->markItemIndex = -1;		/* ditto */
 }
+
+/*
+ * _bt_update_skip_scankeys() -- set up new values for the existing scankeys
+ * 								 based on the current index tuple
+ */
+static inline void
+_bt_update_skip_scankeys(IndexScanDesc scan, Relation indexRel)
+{
+	TupleDesc		itupdesc;
+	int			indnkeyatts,
+				i;
+	BTScanOpaque 	so = (BTScanOpaque) scan->opaque;
+	ScanKey			scankeys = so->skipScanKey->scankeys;
+
+	itupdesc = RelationGetDescr(indexRel);
+	indnkeyatts = IndexRelationGetNumberOfKeyAttributes(indexRel);
+	for (i = 0; i < indnkeyatts; i++)
+	{
+		Datum datum;
+		bool null;
+		int flags;
+
+		datum = index_getattr(scan->xs_itup, i + 1, itupdesc, &null);
+		flags = (null ? SK_ISNULL : 0) |
+				(indexRel->rd_indoption[i] << SK_BT_INDOPTION_SHIFT);
+		scankeys[i].sk_flags = flags;
+		scankeys[i].sk_argument = datum;
+	}
+}
+
+/*
+ * _bt_scankey_within_page() -- check if the provided scankey could be found
+ * 								within a page, specified by the buffer.
+ */
+static inline bool
+_bt_scankey_within_page(IndexScanDesc scan, BTScanInsert key,
+						Buffer buf, ScanDirection dir)
+{
+	OffsetNumber low, high;
+	Page page = BufferGetPage(buf);
+	BTPageOpaque opaque = (BTPageOpaque) PageGetSpecialPointer(page);
+
+	low = P_FIRSTDATAKEY(opaque);
+	high = PageGetMaxOffsetNumber(page);
+
+	if (unlikely(high < low))
+		return false;
+
+	return (_bt_compare(scan->indexRelation, key, page, low) > 0 &&
+			_bt_compare(scan->indexRelation, key, page, high) < 1);
+}
diff --git a/src/backend/access/spgist/spgutils.c b/src/backend/access/spgist/spgutils.c
index 4924ae1c59..fa09a4685e 100644
--- a/src/backend/access/spgist/spgutils.c
+++ b/src/backend/access/spgist/spgutils.c
@@ -68,6 +68,7 @@ spghandler(PG_FUNCTION_ARGS)
 	amroutine->ambulkdelete = spgbulkdelete;
 	amroutine->amvacuumcleanup = spgvacuumcleanup;
 	amroutine->amcanreturn = spgcanreturn;
+	amroutine->amskip = NULL;
 	amroutine->amcostestimate = spgcostestimate;
 	amroutine->amoptions = spgoptions;
 	amroutine->amproperty = spgproperty;
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index c367c750b1..a7dd874531 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -141,6 +141,7 @@ static void ExplainXMLTag(const char *tagname, int flags, ExplainState *es);
 static void ExplainIndentText(ExplainState *es);
 static void ExplainJSONLineEnding(ExplainState *es);
 static void ExplainYAMLLineStarting(ExplainState *es);
+static void ExplainIndexSkipScanKeys(int skipPrefixSize, ExplainState *es);
 static void escape_yaml(StringInfo buf, const char *str);
 
 
@@ -1052,6 +1053,22 @@ ExplainPreScanNode(PlanState *planstate, Bitmapset **rels_used)
 	return planstate_tree_walker(planstate, ExplainPreScanNode, rels_used);
 }
 
+/*
+ * ExplainIndexSkipScanKeys -
+ *	  Append information about index skip scan to es->str.
+ *
+ * Can be used to print the skip prefix size.
+ */
+static void
+ExplainIndexSkipScanKeys(int skipPrefixSize, ExplainState *es)
+{
+	if (skipPrefixSize > 0)
+	{
+		if (es->format != EXPLAIN_FORMAT_TEXT)
+			ExplainPropertyInteger("Distinct Prefix", NULL, skipPrefixSize, es);
+	}
+}
+
 /*
  * ExplainNode -
  *	  Appends a description of a plan tree to es->str
@@ -1386,6 +1403,8 @@ ExplainNode(PlanState *planstate, List *ancestors,
 			{
 				IndexScan  *indexscan = (IndexScan *) plan;
 
+				ExplainIndexSkipScanKeys(indexscan->indexskipprefixsize, es);
+
 				ExplainIndexScanDetails(indexscan->indexid,
 										indexscan->indexorderdir,
 										es);
@@ -1396,6 +1415,8 @@ ExplainNode(PlanState *planstate, List *ancestors,
 			{
 				IndexOnlyScan *indexonlyscan = (IndexOnlyScan *) plan;
 
+				ExplainIndexSkipScanKeys(indexonlyscan->indexskipprefixsize, es);
+
 				ExplainIndexScanDetails(indexonlyscan->indexid,
 										indexonlyscan->indexorderdir,
 										es);
@@ -1655,6 +1676,8 @@ ExplainNode(PlanState *planstate, List *ancestors,
 	switch (nodeTag(plan))
 	{
 		case T_IndexScan:
+			if (((IndexScan *) plan)->indexskipprefixsize > 0)
+				ExplainPropertyBool("Skip scan", true, es);
 			show_scan_qual(((IndexScan *) plan)->indexqualorig,
 						   "Index Cond", planstate, ancestors, es);
 			if (((IndexScan *) plan)->indexqualorig)
@@ -1668,6 +1691,8 @@ ExplainNode(PlanState *planstate, List *ancestors,
 										   planstate, es);
 			break;
 		case T_IndexOnlyScan:
+			if (((IndexOnlyScan *) plan)->indexskipprefixsize > 0)
+				ExplainPropertyBool("Skip scan", true, es);
 			show_scan_qual(((IndexOnlyScan *) plan)->indexqual,
 						   "Index Cond", planstate, ancestors, es);
 			if (((IndexOnlyScan *) plan)->indexqual)
diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index 7e5c805a1e..b9999ca3a9 100644
--- a/src/backend/commands/portalcmds.c
+++ b/src/backend/commands/portalcmds.c
@@ -89,6 +89,8 @@ PerformCursorOpen(ParseState *pstate, DeclareCursorStmt *cstmt, ParamListInfo pa
 	if (query->commandType != CMD_SELECT)
 		elog(ERROR, "non-SELECT statement in DECLARE CURSOR");
 
+	query->hasDeclareCursor = true;
+
 	/* Plan the query, applying the specified options */
 	plan = pg_plan_query(query, cstmt->options, params);
 
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 5617ac29e7..76330f7906 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -65,6 +65,13 @@ IndexOnlyNext(IndexOnlyScanState *node)
 	IndexScanDesc scandesc;
 	TupleTableSlot *slot;
 	ItemPointer tid;
+	IndexOnlyScan *indexonlyscan = (IndexOnlyScan *) node->ss.ps.plan;
+
+	/*
+	 * Tells whether the current position was reached via skipping. In that
+	 * case there is no need to call index_getnext_tid.
+	 */
+	bool skipped = false;
 
 	/*
 	 * extract necessary information from index scan node
@@ -72,7 +79,7 @@ IndexOnlyNext(IndexOnlyScanState *node)
 	estate = node->ss.ps.state;
 	direction = estate->es_direction;
 	/* flip direction if this is an overall backward scan */
-	if (ScanDirectionIsBackward(((IndexOnlyScan *) node->ss.ps.plan)->indexorderdir))
+	if (ScanDirectionIsBackward(indexonlyscan->indexorderdir))
 	{
 		if (ScanDirectionIsForward(direction))
 			direction = BackwardScanDirection;
@@ -115,14 +122,50 @@ IndexOnlyNext(IndexOnlyScanState *node)
 						 node->ioss_NumOrderByKeys);
 	}
 
+	/*
+	 * Check if we need to skip to the next key prefix, because we've been
+	 * asked to implement DISTINCT.
+	 *
+	 * When fetching a cursor in the direction opposite to a general scan
+	 * direction, the result must be what normal fetching should have
+	 * returned, but in reversed order. In other words, return the last or
+	 * first scanned tuple in a DISTINCT set, depending on a cursor direction.
+	 * Due to that we skip also when the first tuple wasn't emitted yet, but
+	 * the directions are opposite.
+	 */
+	if (node->ioss_SkipPrefixSize > 0 &&
+		(node->ioss_FirstTupleEmitted ||
+		 ScanDirectionsAreOpposite(direction, indexonlyscan->indexorderdir)))
+	{
+		if (!index_skip(scandesc, direction, indexonlyscan->indexorderdir,
+						!node->ioss_FirstTupleEmitted, node->ioss_SkipPrefixSize))
+		{
+			/*
+			 * Reached end of index. At this point currPos is invalidated, and
+			 * we need to reset ioss_FirstTupleEmitted, since otherwise after
+			 * going backwards, reaching the end of index, and going forward
+			 * again we apply skip again. It would be incorrect and lead to an
+			 * extra skipped item.
+			 */
+			node->ioss_FirstTupleEmitted = false;
+			return ExecClearTuple(slot);
+		}
+		else
+		{
+			skipped = true;
+			tid = &scandesc->xs_heaptid;
+		}
+	}
+
 	/*
 	 * OK, now that we have what we need, fetch the next tuple.
 	 */
-	while ((tid = index_getnext_tid(scandesc, direction)) != NULL)
+	while (skipped || (tid = index_getnext_tid(scandesc, direction)) != NULL)
 	{
 		bool		tuple_from_heap = false;
 
 		CHECK_FOR_INTERRUPTS();
+		skipped = false;
 
 		/*
 		 * We can skip the heap fetch if the TID references a heap page on
@@ -250,6 +293,8 @@ IndexOnlyNext(IndexOnlyScanState *node)
 							  ItemPointerGetBlockNumber(tid),
 							  estate->es_snapshot);
 
+		node->ioss_FirstTupleEmitted = true;
+
 		return slot;
 	}
 
@@ -504,6 +549,8 @@ ExecInitIndexOnlyScan(IndexOnlyScan *node, EState *estate, int eflags)
 	indexstate->ss.ps.plan = (Plan *) node;
 	indexstate->ss.ps.state = estate;
 	indexstate->ss.ps.ExecProcNode = ExecIndexOnlyScan;
+	indexstate->ioss_SkipPrefixSize = node->indexskipprefixsize;
+	indexstate->ioss_FirstTupleEmitted = false;
 
 	/*
 	 * Miscellaneous initialization
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index d0a96a38e0..449aaec3ac 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -85,6 +85,13 @@ IndexNext(IndexScanState *node)
 	ScanDirection direction;
 	IndexScanDesc scandesc;
 	TupleTableSlot *slot;
+	IndexScan *indexscan = (IndexScan *) node->ss.ps.plan;
+
+	/*
+	 * Tells whether the current position was reached via skipping. In that
+	 * case there is no need to call index_getnext_slot.
+	 */
+	bool skipped = false;
 
 	/*
 	 * extract necessary information from index scan node
@@ -92,7 +99,7 @@ IndexNext(IndexScanState *node)
 	estate = node->ss.ps.state;
 	direction = estate->es_direction;
 	/* flip direction if this is an overall backward scan */
-	if (ScanDirectionIsBackward(((IndexScan *) node->ss.ps.plan)->indexorderdir))
+	if (ScanDirectionIsBackward(indexscan->indexorderdir))
 	{
 		if (ScanDirectionIsForward(direction))
 			direction = BackwardScanDirection;
@@ -117,6 +124,12 @@ IndexNext(IndexScanState *node)
 
 		node->iss_ScanDesc = scandesc;
 
+		/* Index skip scan assumes xs_want_itup, so set it to true */
+		if (indexscan->indexskipprefixsize > 0)
+			node->iss_ScanDesc->xs_want_itup = true;
+		else
+			node->iss_ScanDesc->xs_want_itup = false;
+
 		/*
 		 * If no run-time keys to calculate or they are ready, go ahead and
 		 * pass the scankeys to the index AM.
@@ -127,12 +140,48 @@ IndexNext(IndexScanState *node)
 						 node->iss_OrderByKeys, node->iss_NumOrderByKeys);
 	}
 
+	/*
+	 * Check if we need to skip to the next key prefix, because we've been
+	 * asked to implement DISTINCT.
+	 *
+	 * When fetching a cursor in the direction opposite to a general scan
+	 * direction, the result must be what normal fetching should have
+	 * returned, but in reversed order. In other words, return the last or
+	 * first scanned tuple in a DISTINCT set, depending on a cursor direction.
+	 * Due to that we skip also when the first tuple wasn't emitted yet, but
+	 * the directions are opposite.
+	 */
+	if (node->iss_SkipPrefixSize > 0 &&
+		(node->iss_FirstTupleEmitted ||
+		 ScanDirectionsAreOpposite(direction, indexscan->indexorderdir)))
+	{
+		if (!index_skip(scandesc, direction, indexscan->indexorderdir,
+					   !node->iss_FirstTupleEmitted, node->iss_SkipPrefixSize))
+		{
+			/*
+			 * Reached end of index. At this point currPos is invalidated, and
+			 * we need to reset iss_FirstTupleEmitted, since otherwise after
+			 * going backwards, reaching the end of index, and going forward
+			 * again we apply skip again. It would be incorrect and lead to an
+			 * extra skipped item.
+			 */
+			node->iss_FirstTupleEmitted = false;
+			return ExecClearTuple(slot);
+		}
+		else
+		{
+			skipped = true;
+			index_fetch_heap(scandesc, slot);
+		}
+	}
+
 	/*
 	 * ok, now that we have what we need, fetch the next tuple.
 	 */
-	while (index_getnext_slot(scandesc, direction, slot))
+	while (skipped || index_getnext_slot(scandesc, direction, slot))
 	{
 		CHECK_FOR_INTERRUPTS();
+		skipped = false;
 
 		/*
 		 * If the index was lossy, we have to recheck the index quals using
@@ -149,6 +198,7 @@ IndexNext(IndexScanState *node)
 			}
 		}
 
+		node->iss_FirstTupleEmitted = true;
 		return slot;
 	}
 
@@ -910,6 +960,8 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
 	indexstate->ss.ps.plan = (Plan *) node;
 	indexstate->ss.ps.state = estate;
 	indexstate->ss.ps.ExecProcNode = ExecIndexScan;
+	indexstate->iss_SkipPrefixSize = node->indexskipprefixsize;
+	indexstate->iss_FirstTupleEmitted = false;
 
 	/*
 	 * Miscellaneous initialization
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 54ad62bb7f..e0cfd710c4 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -493,6 +493,7 @@ _copyIndexScan(const IndexScan *from)
 	COPY_NODE_FIELD(indexorderbyorig);
 	COPY_NODE_FIELD(indexorderbyops);
 	COPY_SCALAR_FIELD(indexorderdir);
+	COPY_SCALAR_FIELD(indexskipprefixsize);
 
 	return newnode;
 }
@@ -518,6 +519,7 @@ _copyIndexOnlyScan(const IndexOnlyScan *from)
 	COPY_NODE_FIELD(indexorderby);
 	COPY_NODE_FIELD(indextlist);
 	COPY_SCALAR_FIELD(indexorderdir);
+	COPY_SCALAR_FIELD(indexskipprefixsize);
 
 	return newnode;
 }
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 16083e7a7e..5f723cda4b 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -562,6 +562,7 @@ _outIndexScan(StringInfo str, const IndexScan *node)
 	WRITE_NODE_FIELD(indexorderbyorig);
 	WRITE_NODE_FIELD(indexorderbyops);
 	WRITE_ENUM_FIELD(indexorderdir, ScanDirection);
+	WRITE_INT_FIELD(indexskipprefixsize);
 }
 
 static void
@@ -576,6 +577,7 @@ _outIndexOnlyScan(StringInfo str, const IndexOnlyScan *node)
 	WRITE_NODE_FIELD(indexorderby);
 	WRITE_NODE_FIELD(indextlist);
 	WRITE_ENUM_FIELD(indexorderdir, ScanDirection);
+	WRITE_INT_FIELD(indexskipprefixsize);
 }
 
 static void
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 551ce6c41c..028d03a56d 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1820,6 +1820,7 @@ _readIndexScan(void)
 	READ_NODE_FIELD(indexorderbyorig);
 	READ_NODE_FIELD(indexorderbyops);
 	READ_ENUM_FIELD(indexorderdir, ScanDirection);
+	READ_INT_FIELD(indexskipprefixsize);
 
 	READ_DONE();
 }
@@ -1839,6 +1840,7 @@ _readIndexOnlyScan(void)
 	READ_NODE_FIELD(indexorderby);
 	READ_NODE_FIELD(indextlist);
 	READ_ENUM_FIELD(indexorderdir, ScanDirection);
+	READ_INT_FIELD(indexskipprefixsize);
 
 	READ_DONE();
 }
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index b5a0033721..710edf160a 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -124,6 +124,7 @@ int			max_parallel_workers_per_gather = 2;
 bool		enable_seqscan = true;
 bool		enable_indexscan = true;
 bool		enable_indexonlyscan = true;
+bool		enable_indexskipscan = true;
 bool		enable_bitmapscan = true;
 bool		enable_tidscan = true;
 bool		enable_sort = true;
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index dff826a828..7b32f2cc7e 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -175,12 +175,14 @@ static IndexScan *make_indexscan(List *qptlist, List *qpqual, Index scanrelid,
 								 Oid indexid, List *indexqual, List *indexqualorig,
 								 List *indexorderby, List *indexorderbyorig,
 								 List *indexorderbyops,
-								 ScanDirection indexscandir);
+								 ScanDirection indexscandir,
+								 int skipPrefixSize);
 static IndexOnlyScan *make_indexonlyscan(List *qptlist, List *qpqual,
 										 Index scanrelid, Oid indexid,
 										 List *indexqual, List *indexorderby,
 										 List *indextlist,
-										 ScanDirection indexscandir);
+										 ScanDirection indexscandir,
+										 int skipPrefixSize);
 static BitmapIndexScan *make_bitmap_indexscan(Index scanrelid, Oid indexid,
 											  List *indexqual,
 											  List *indexqualorig);
@@ -2910,7 +2912,8 @@ create_indexscan_plan(PlannerInfo *root,
 												fixed_indexquals,
 												fixed_indexorderbys,
 												best_path->indexinfo->indextlist,
-												best_path->indexscandir);
+												best_path->indexscandir,
+												best_path->indexskipprefix);
 	else
 		scan_plan = (Scan *) make_indexscan(tlist,
 											qpqual,
@@ -2921,7 +2924,8 @@ create_indexscan_plan(PlannerInfo *root,
 											fixed_indexorderbys,
 											indexorderbys,
 											indexorderbyops,
-											best_path->indexscandir);
+											best_path->indexscandir,
+											best_path->indexskipprefix);
 
 	copy_generic_path_info(&scan_plan->plan, &best_path->path);
 
@@ -5184,7 +5188,8 @@ make_indexscan(List *qptlist,
 			   List *indexorderby,
 			   List *indexorderbyorig,
 			   List *indexorderbyops,
-			   ScanDirection indexscandir)
+			   ScanDirection indexscandir,
+			   int skipPrefixSize)
 {
 	IndexScan  *node = makeNode(IndexScan);
 	Plan	   *plan = &node->scan.plan;
@@ -5201,6 +5206,7 @@ make_indexscan(List *qptlist,
 	node->indexorderbyorig = indexorderbyorig;
 	node->indexorderbyops = indexorderbyops;
 	node->indexorderdir = indexscandir;
+	node->indexskipprefixsize = skipPrefixSize;
 
 	return node;
 }
@@ -5213,7 +5219,8 @@ make_indexonlyscan(List *qptlist,
 				   List *indexqual,
 				   List *indexorderby,
 				   List *indextlist,
-				   ScanDirection indexscandir)
+				   ScanDirection indexscandir,
+				   int skipPrefixSize)
 {
 	IndexOnlyScan *node = makeNode(IndexOnlyScan);
 	Plan	   *plan = &node->scan.plan;
@@ -5228,6 +5235,7 @@ make_indexonlyscan(List *qptlist,
 	node->indexorderby = indexorderby;
 	node->indextlist = indextlist;
 	node->indexorderdir = indexscandir;
+	node->indexskipprefixsize = skipPrefixSize;
 
 	return node;
 }
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 984fca0696..777ae3ee0f 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -4834,6 +4834,83 @@ create_distinct_paths(PlannerInfo *root,
 												  path,
 												  list_length(root->distinct_pathkeys),
 												  numDistinctRows));
+
+				/* Consider index skip scan as well */
+				if (enable_indexskipscan &&
+					IsA(path, IndexPath) &&
+					root->parse->hasDeclareCursor == false &&
+					((IndexPath *) path)->indexinfo->amcanskip &&
+					root->distinct_pathkeys != NIL)
+				{
+					ListCell   		*lc;
+					IndexOptInfo 	*index = NULL;
+					bool 			different_columns_order = false,
+									not_empty_qual = false;
+					int 			i = 0;
+					int 			distinctPrefixKeys;
+
+					Assert(path->pathtype == T_IndexOnlyScan ||
+						   path->pathtype == T_IndexScan);
+
+					index = ((IndexPath *) path)->indexinfo;
+					distinctPrefixKeys = list_length(root->query_uniquekeys);
+
+					/*
+					 * Normally distinctPrefixKeys is just the number of
+					 * distinct keys. But suppose the distinct key is a and
+					 * the index is on (b, a), in exactly that order. In that
+					 * case we must use the position of a in the index as
+					 * distinctPrefixKeys; otherwise the skip would consider
+					 * only the first index column.
+					 */
+					foreach(lc, root->query_uniquekeys)
+					{
+						UniqueKey *uniquekey = (UniqueKey *) lfirst(lc);
+						EquivalenceMember *em =
+							lfirst_node(EquivalenceMember,
+										list_head(uniquekey->eq_clause->ec_members));
+						Var *var = (Var *) em->em_expr;
+
+						Assert(i < index->ncolumns);
+
+						for (i = 0; i < index->ncolumns; i++)
+						{
+							if (index->indexkeys[i] == var->varattno)
+							{
+								distinctPrefixKeys = Max(i + 1, distinctPrefixKeys);
+								break;
+							}
+						}
+					}
+
+					/*
+					 * XXX: For a plain index scan, quals are evaluated after
+					 * ExecScanFetch, which means skip results could be
+					 * filtered out. Consider the following query:
+					 *
+					 * 		select distinct on (a, b) a, b, c from t where c < 100;
+					 *
+					 * Skip scan returns one tuple per distinct (a, b) with
+					 * an arbitrary value of c, so if the chosen c does not
+					 * match the qual while some other c for that (a, b)
+					 * does, we miss that tuple.
+					 */
+					if (path->pathtype == T_IndexScan &&
+						parse->jointree != NULL &&
+						parse->jointree->quals != NULL &&
+						list_length((List *) parse->jointree->quals) != 0)
+							not_empty_qual = true;
+
+					if (!different_columns_order &&	!not_empty_qual)
+					{
+						add_path(distinct_rel, (Path *)
+								 create_skipscan_unique_path(root,
+															 distinct_rel,
+															 path,
+															 distinctPrefixKeys,
+															 numDistinctRows));
+					}
+				}
 			}
 		}
 
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index a006dbbe9c..2fb18fb372 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -2915,6 +2915,46 @@ create_upper_unique_path(PlannerInfo *root,
 	return pathnode;
 }
 
+/*
+ * create_skipscan_unique_path
+ *	  Creates a pathnode the same as an existing IndexPath except based on
+ *	  skipping duplicate values.  This may or may not be cheaper than using
+ *	  create_upper_unique_path.
+ *
+ * The input path must be an IndexPath for an index that supports amskip.
+ */
+IndexPath *
+create_skipscan_unique_path(PlannerInfo *root,
+							RelOptInfo *rel,
+							Path *basepath,
+							int distinctPrefixKeys,
+							double numGroups)
+{
+	IndexPath *pathnode = makeNode(IndexPath);
+
+	Assert(IsA(basepath, IndexPath));
+
+	/* We don't want to modify basepath, so make a copy. */
+	memcpy(pathnode, basepath, sizeof(IndexPath));
+
+	/* The size of the prefix we'll use for skipping. */
+	Assert(pathnode->indexinfo->amcanskip);
+	Assert(distinctPrefixKeys > 0);
+	/*Assert(distinctPrefixKeys <= list_length(pathnode->path.pathkeys));*/
+	pathnode->indexskipprefix = distinctPrefixKeys;
+
+	/*
+	 * The cost to skip to each distinct value should be roughly the same as
+	 * the cost of finding the first key times the number of distinct values
+	 * we expect to find.
+	 */
+	pathnode->path.startup_cost = basepath->startup_cost;
+	pathnode->path.total_cost = basepath->startup_cost * numGroups;
+	pathnode->path.rows = numGroups;
+
+	return pathnode;
+}
+
 /*
  * create_agg_path
  *	  Creates a pathnode that represents performing aggregation/grouping
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index d82fc5ab8b..f65b299f37 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -271,6 +271,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 			info->amoptionalkey = amroutine->amoptionalkey;
 			info->amsearcharray = amroutine->amsearcharray;
 			info->amsearchnulls = amroutine->amsearchnulls;
+			info->amcanskip = (amroutine->amskip != NULL);
 			info->amcanparallel = amroutine->amcanparallel;
 			info->amhasgettuple = (amroutine->amgettuple != NULL);
 			info->amhasgetbitmap = amroutine->amgetbitmap != NULL &&
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index cacbe904db..7c71ee4499 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -923,6 +923,15 @@ static struct config_bool ConfigureNamesBool[] =
 		true,
 		NULL, NULL, NULL
 	},
+	{
+		{"enable_indexskipscan", PGC_USERSET, QUERY_TUNING_METHOD,
+			gettext_noop("Enables the planner's use of index-skip-scan plans."),
+			NULL
+		},
+		&enable_indexskipscan,
+		true,
+		NULL, NULL, NULL
+	},
 	{
 		{"enable_bitmapscan", PGC_USERSET, QUERY_TUNING_METHOD,
 			gettext_noop("Enables the planner's use of bitmap-scan plans."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index e1048c0047..a002ee2143 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -353,6 +353,7 @@
 #enable_hashjoin = on
 #enable_indexscan = on
 #enable_indexonlyscan = on
+#enable_indexskipscan = on
 #enable_material = on
 #enable_mergejoin = on
 #enable_nestloop = on
diff --git a/src/include/access/amapi.h b/src/include/access/amapi.h
index 3b3e22f73d..3d39cd9d07 100644
--- a/src/include/access/amapi.h
+++ b/src/include/access/amapi.h
@@ -130,6 +130,13 @@ typedef void (*amrescan_function) (IndexScanDesc scan,
 typedef bool (*amgettuple_function) (IndexScanDesc scan,
 									 ScanDirection direction);
 
+/* skip past duplicates in a given prefix */
+typedef bool (*amskip_function) (IndexScanDesc scan,
+								 ScanDirection dir,
+								 ScanDirection indexdir,
+								 bool start,
+								 int prefix);
+
 /* fetch all valid tuples */
 typedef int64 (*amgetbitmap_function) (IndexScanDesc scan,
 									   TIDBitmap *tbm);
@@ -229,6 +236,7 @@ typedef struct IndexAmRoutine
 	amendscan_function amendscan;
 	ammarkpos_function ammarkpos;	/* can be NULL */
 	amrestrpos_function amrestrpos; /* can be NULL */
+	amskip_function amskip;				/* can be NULL */
 
 	/* interface functions to support parallel index scans */
 	amestimateparallelscan_function amestimateparallelscan; /* can be NULL */
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 7e9364a50c..815de4e4dd 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -180,6 +180,8 @@ extern IndexBulkDeleteResult *index_bulk_delete(IndexVacuumInfo *info,
 extern IndexBulkDeleteResult *index_vacuum_cleanup(IndexVacuumInfo *info,
 												   IndexBulkDeleteResult *stats);
 extern bool index_can_return(Relation indexRelation, int attno);
+extern bool index_skip(IndexScanDesc scan, ScanDirection direction,
+					   ScanDirection indexdir, bool start, int prefix);
 extern RegProcedure index_getprocid(Relation irel, AttrNumber attnum,
 									uint16 procnum);
 extern FmgrInfo *index_getprocinfo(Relation irel, AttrNumber attnum,
diff --git a/src/include/access/nbtree.h b/src/include/access/nbtree.h
index 20ace69dab..e098c6a1ab 100644
--- a/src/include/access/nbtree.h
+++ b/src/include/access/nbtree.h
@@ -662,6 +662,9 @@ typedef struct BTScanOpaqueData
 	 */
 	int			markItemIndex;	/* itemIndex, or -1 if not valid */
 
+	/* Work space for _bt_skip */
+	BTScanInsert	skipScanKey;	/* used to control skipping */
+
 	/* keep these last in struct for efficiency */
 	BTScanPosData currPos;		/* current position data */
 	BTScanPosData markPos;		/* marked position, if any */
@@ -793,6 +796,8 @@ extern OffsetNumber _bt_binsrch_insert(Relation rel, BTInsertState insertstate);
 extern int32 _bt_compare(Relation rel, BTScanInsert key, Page page, OffsetNumber offnum);
 extern bool _bt_first(IndexScanDesc scan, ScanDirection dir);
 extern bool _bt_next(IndexScanDesc scan, ScanDirection dir);
+extern bool _bt_skip(IndexScanDesc scan, ScanDirection dir,
+					 ScanDirection indexdir, bool start, int prefix);
 extern Buffer _bt_get_endpoint(Relation rel, uint32 level, bool rightmost,
 							   Snapshot snapshot);
 
@@ -817,6 +822,8 @@ extern void _bt_end_vacuum_callback(int code, Datum arg);
 extern Size BTreeShmemSize(void);
 extern void BTreeShmemInit(void);
 extern bytea *btoptions(Datum reloptions, bool validate);
+extern bool btskip(IndexScanDesc scan, ScanDirection dir,
+				   ScanDirection indexdir, bool start, int prefix);
 extern bool btproperty(Oid index_oid, int attno,
 					   IndexAMProperty prop, const char *propname,
 					   bool *res, bool *isnull);
diff --git a/src/include/access/sdir.h b/src/include/access/sdir.h
index 23feb90986..094a127464 100644
--- a/src/include/access/sdir.h
+++ b/src/include/access/sdir.h
@@ -55,4 +55,11 @@ typedef enum ScanDirection
 #define ScanDirectionIsForward(direction) \
 	((bool) ((direction) == ForwardScanDirection))
 
+/*
+ * ScanDirectionsAreOpposite
+ *		True iff scan directions are backward/forward or forward/backward.
+ */
+#define ScanDirectionsAreOpposite(dirA, dirB) \
+	((bool) ((dirA) != NoMovementScanDirection && (dirA) == -(dirB)))
+
 #endif							/* SDIR_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 1f6f5bbc20..2c6acc160a 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1423,6 +1423,8 @@ typedef struct IndexScanState
 	ExprContext *iss_RuntimeContext;
 	Relation	iss_RelationDesc;
 	struct IndexScanDescData *iss_ScanDesc;
+	int			iss_SkipPrefixSize;
+	bool		iss_FirstTupleEmitted;
 
 	/* These are needed for re-checking ORDER BY expr ordering */
 	pairingheap *iss_ReorderQueue;
@@ -1452,6 +1454,8 @@ typedef struct IndexScanState
  *		TableSlot		   slot for holding tuples fetched from the table
  *		VMBuffer		   buffer in use for visibility map testing, if any
  *		PscanLen		   size of parallel index-only scan descriptor
+ *		SkipPrefixSize	   number of keys for skip-based DISTINCT
+ *		FirstTupleEmitted  has the first tuple been emitted
  * ----------------
  */
 typedef struct IndexOnlyScanState
@@ -1470,6 +1474,8 @@ typedef struct IndexOnlyScanState
 	struct IndexScanDescData *ioss_ScanDesc;
 	TupleTableSlot *ioss_TableSlot;
 	Buffer		ioss_VMBuffer;
+	int			ioss_SkipPrefixSize;
+	bool		ioss_FirstTupleEmitted;
 	Size		ioss_PscanLen;
 } IndexOnlyScanState;
 
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index da0706add5..f69468d56a 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -131,6 +131,7 @@ typedef struct Query
 	bool		hasModifyingCTE;	/* has INSERT/UPDATE/DELETE in WITH */
 	bool		hasForUpdate;	/* FOR [KEY] UPDATE/SHARE was specified */
 	bool		hasRowSecurity; /* rewriter has applied some RLS policy */
+	bool		hasDeclareCursor;	/* query is the target of DECLARE CURSOR */
 
 	List	   *cteList;		/* WITH list (of CommonTableExpr's) */
 
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 4e329f0fb5..b0ff9ca3a8 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -839,6 +839,7 @@ struct IndexOptInfo
 	bool		amsearchnulls;	/* can AM search for NULL/NOT NULL entries? */
 	bool		amhasgettuple;	/* does AM have amgettuple interface? */
 	bool		amhasgetbitmap; /* does AM have amgetbitmap interface? */
+	bool		amcanskip;		/* can AM skip duplicate values? */
 	bool		amcanparallel;	/* does AM support parallel scan? */
 	/* Rather than include amapi.h here, we declare amcostestimate like this */
 	void		(*amcostestimate) ();	/* AM's cost estimator */
@@ -1189,6 +1190,9 @@ typedef struct Path
  * we need not recompute them when considering using the same index in a
  * bitmap index/heap scan (see BitmapHeapPath).  The costs of the IndexPath
  * itself represent the costs of an IndexScan or IndexOnlyScan plan type.
+ *
+ * 'indexskipprefix' represents the number of columns to consider for skip
+ * scans.
  *----------
  */
 typedef struct IndexPath
@@ -1201,6 +1205,7 @@ typedef struct IndexPath
 	ScanDirection indexscandir;
 	Cost		indextotalcost;
 	Selectivity indexselectivity;
+	int			indexskipprefix;
 } IndexPath;
 
 /*
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 32c0d87f80..03a00e8e1d 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -409,6 +409,8 @@ typedef struct IndexScan
 	List	   *indexorderbyorig;	/* the same in original form */
 	List	   *indexorderbyops;	/* OIDs of sort ops for ORDER BY exprs */
 	ScanDirection indexorderdir;	/* forward or backward or don't care */
+	int			indexskipprefixsize;	/* the size of the prefix for distinct
+										 * scans */
 } IndexScan;
 
 /* ----------------
@@ -436,6 +438,8 @@ typedef struct IndexOnlyScan
 	List	   *indexorderby;	/* list of index ORDER BY exprs */
 	List	   *indextlist;		/* TargetEntry list describing index's cols */
 	ScanDirection indexorderdir;	/* forward or backward or don't care */
+	int			indexskipprefixsize;	/* the size of the prefix for distinct
+										 * scans */
 } IndexOnlyScan;
 
 /* ----------------
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index cb012ba198..847f34f02b 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -50,6 +50,7 @@ extern PGDLLIMPORT int max_parallel_workers_per_gather;
 extern PGDLLIMPORT bool enable_seqscan;
 extern PGDLLIMPORT bool enable_indexscan;
 extern PGDLLIMPORT bool enable_indexonlyscan;
+extern PGDLLIMPORT bool enable_indexskipscan;
 extern PGDLLIMPORT bool enable_bitmapscan;
 extern PGDLLIMPORT bool enable_tidscan;
 extern PGDLLIMPORT bool enable_sort;
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index f75ff6f323..6c8c9dadbb 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -201,6 +201,11 @@ extern UpperUniquePath *create_upper_unique_path(PlannerInfo *root,
 												 Path *subpath,
 												 int numCols,
 												 double numGroups);
+extern IndexPath *create_skipscan_unique_path(PlannerInfo *root,
+											  RelOptInfo *rel,
+											  Path *basepath,
+											  int distinctPrefixKeys,
+											  double numGroups);
 extern AggPath *create_agg_path(PlannerInfo *root,
 								RelOptInfo *rel,
 								Path *subpath,
diff --git a/src/test/regress/expected/select_distinct.out b/src/test/regress/expected/select_distinct.out
index f3696c6d1d..259db10c81 100644
--- a/src/test/regress/expected/select_distinct.out
+++ b/src/test/regress/expected/select_distinct.out
@@ -244,3 +244,604 @@ SELECT null IS NOT DISTINCT FROM null as "yes";
  t
 (1 row)
 
+-- index only skip scan
+CREATE TABLE distinct_a (a int, b int, c int);
+INSERT INTO distinct_a (
+    SELECT five, tenthous, 10 FROM
+    generate_series(1, 5) five,
+    generate_series(1, 10000) tenthous
+);
+CREATE INDEX ON distinct_a (a, b);
+ANALYZE distinct_a;
+SELECT DISTINCT a FROM distinct_a;
+ a 
+---
+ 1
+ 2
+ 3
+ 4
+ 5
+(5 rows)
+
+SELECT DISTINCT a FROM distinct_a WHERE a = 1;
+ a 
+---
+ 1
+(1 row)
+
+SELECT DISTINCT a FROM distinct_a ORDER BY a DESC;
+ a 
+---
+ 5
+ 4
+ 3
+ 2
+ 1
+(5 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT a FROM distinct_a;
+                       QUERY PLAN                       
+--------------------------------------------------------
+ Index Only Scan using distinct_a_a_b_idx on distinct_a
+   Skip scan: true
+(2 rows)
+
+-- test index skip scan with a condition on a non unique field
+SELECT DISTINCT ON (a) a, b FROM distinct_a WHERE b = 2;
+ a | b 
+---+---
+ 1 | 2
+ 2 | 2
+ 3 | 2
+ 4 | 2
+ 5 | 2
+(5 rows)
+
+-- test index skip scan backwards
+SELECT DISTINCT ON (a) a, b FROM distinct_a ORDER BY a DESC, b DESC;
+ a |   b   
+---+-------
+ 5 | 10000
+ 4 | 10000
+ 3 | 10000
+ 2 | 10000
+ 1 | 10000
+(5 rows)
+
+-- check colums order
+CREATE INDEX distinct_a_b_a on distinct_a (b, a);
+SELECT DISTINCT a FROM distinct_a WHERE b = 2;
+ a 
+---
+ 1
+ 2
+ 3
+ 4
+ 5
+(5 rows)
+
+SELECT DISTINCT on (a, b) a, b FROM distinct_a WHERE b = 2;
+ a | b 
+---+---
+ 1 | 2
+ 2 | 2
+ 3 | 2
+ 4 | 2
+ 5 | 2
+(5 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT a FROM distinct_a WHERE b = 2;
+                     QUERY PLAN                     
+----------------------------------------------------
+ Index Only Scan using distinct_a_b_a on distinct_a
+   Skip scan: true
+   Index Cond: (b = 2)
+(3 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT on (a, b) a, b FROM distinct_a WHERE b = 2;
+                     QUERY PLAN                     
+----------------------------------------------------
+ Index Only Scan using distinct_a_b_a on distinct_a
+   Skip scan: true
+   Index Cond: (b = 2)
+(3 rows)
+
+DROP INDEX distinct_a_b_a;
+-- test opposite scan/index directions inside a cursor
+-- forward/backward
+BEGIN;
+DECLARE c SCROLL CURSOR FOR
+SELECT DISTINCT ON (a) a,b FROM distinct_a ORDER BY a, b;
+FETCH FROM c;
+ a | b 
+---+---
+ 1 | 1
+(1 row)
+
+FETCH BACKWARD FROM c;
+ a | b 
+---+---
+(0 rows)
+
+FETCH 6 FROM c;
+ a | b 
+---+---
+ 1 | 1
+ 2 | 1
+ 3 | 1
+ 4 | 1
+ 5 | 1
+(5 rows)
+
+FETCH BACKWARD 6 FROM c;
+ a | b 
+---+---
+ 5 | 1
+ 4 | 1
+ 3 | 1
+ 2 | 1
+ 1 | 1
+(5 rows)
+
+FETCH 6 FROM c;
+ a | b 
+---+---
+ 1 | 1
+ 2 | 1
+ 3 | 1
+ 4 | 1
+ 5 | 1
+(5 rows)
+
+FETCH BACKWARD 6 FROM c;
+ a | b 
+---+---
+ 5 | 1
+ 4 | 1
+ 3 | 1
+ 2 | 1
+ 1 | 1
+(5 rows)
+
+END;
+-- backward/forward
+BEGIN;
+DECLARE c SCROLL CURSOR FOR
+SELECT DISTINCT ON (a) a,b FROM distinct_a ORDER BY a DESC, b DESC;
+FETCH FROM c;
+ a |   b   
+---+-------
+ 5 | 10000
+(1 row)
+
+FETCH BACKWARD FROM c;
+ a | b 
+---+---
+(0 rows)
+
+FETCH 6 FROM c;
+ a |   b   
+---+-------
+ 5 | 10000
+ 4 | 10000
+ 3 | 10000
+ 2 | 10000
+ 1 | 10000
+(5 rows)
+
+FETCH BACKWARD 6 FROM c;
+ a |   b   
+---+-------
+ 1 | 10000
+ 2 | 10000
+ 3 | 10000
+ 4 | 10000
+ 5 | 10000
+(5 rows)
+
+FETCH 6 FROM c;
+ a |   b   
+---+-------
+ 5 | 10000
+ 4 | 10000
+ 3 | 10000
+ 2 | 10000
+ 1 | 10000
+(5 rows)
+
+FETCH BACKWARD 6 FROM c;
+ a |   b   
+---+-------
+ 1 | 10000
+ 2 | 10000
+ 3 | 10000
+ 4 | 10000
+ 5 | 10000
+(5 rows)
+
+END;
+-- test missing values and skipping from the end
+CREATE TABLE distinct_abc(a int, b int, c int);
+CREATE INDEX ON distinct_abc(a, b, c);
+INSERT INTO distinct_abc
+	VALUES (1, 1, 1),
+		   (1, 1, 2),
+		   (1, 2, 2),
+		   (1, 2, 3),
+		   (2, 2, 1),
+		   (2, 2, 3),
+		   (3, 1, 1),
+		   (3, 1, 2),
+		   (3, 2, 2),
+		   (3, 2, 3);
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a,b,c FROM distinct_abc WHERE c = 2;
+                          QUERY PLAN                          
+--------------------------------------------------------------
+ Index Only Scan using distinct_abc_a_b_c_idx on distinct_abc
+   Skip scan: true
+   Index Cond: (c = 2)
+(3 rows)
+
+BEGIN;
+DECLARE c SCROLL CURSOR FOR
+SELECT DISTINCT ON (a) a,b,c FROM distinct_abc WHERE c = 2;
+FETCH ALL FROM c;
+ a | b | c 
+---+---+---
+ 1 | 1 | 2
+ 3 | 1 | 2
+(2 rows)
+
+FETCH BACKWARD ALL FROM c;
+ a | b | c 
+---+---+---
+ 3 | 1 | 2
+ 1 | 1 | 2
+(2 rows)
+
+END;
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a,b,c FROM distinct_abc WHERE c = 2
+ORDER BY a DESC, b DESC;
+                              QUERY PLAN                               
+-----------------------------------------------------------------------
+ Index Only Scan Backward using distinct_abc_a_b_c_idx on distinct_abc
+   Skip scan: true
+   Index Cond: (c = 2)
+(3 rows)
+
+BEGIN;
+DECLARE c SCROLL CURSOR FOR
+SELECT DISTINCT ON (a) a,b,c FROM distinct_abc WHERE c = 2
+ORDER BY a DESC, b DESC;
+FETCH ALL FROM c;
+ a | b | c 
+---+---+---
+ 3 | 2 | 2
+ 1 | 2 | 2
+(2 rows)
+
+FETCH BACKWARD ALL FROM c;
+ a | b | c 
+---+---+---
+ 1 | 2 | 2
+ 3 | 2 | 2
+(2 rows)
+
+END;
+DROP TABLE distinct_abc;
+-- index skip scan
+SELECT DISTINCT ON (a) a, b, c
+FROM distinct_a ORDER BY a;
+ a | b | c  
+---+---+----
+ 1 | 1 | 10
+ 2 | 1 | 10
+ 3 | 1 | 10
+ 4 | 1 | 10
+ 5 | 1 | 10
+(5 rows)
+
+SELECT DISTINCT ON (a) a, b, c
+FROM distinct_a WHERE a = 1 ORDER BY a;
+ a | b | c  
+---+---+----
+ 1 | 1 | 10
+(1 row)
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a, b, c
+FROM distinct_a ORDER BY a;
+                    QUERY PLAN                     
+---------------------------------------------------
+ Index Scan using distinct_a_a_b_idx on distinct_a
+   Skip scan: true
+(2 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a, b, c
+FROM distinct_a WHERE a = 1 ORDER BY a;
+                     QUERY PLAN                      
+-----------------------------------------------------
+ Unique
+   ->  Bitmap Heap Scan on distinct_a
+         Recheck Cond: (a = 1)
+         ->  Bitmap Index Scan on distinct_a_a_b_idx
+               Index Cond: (a = 1)
+(5 rows)
+
+-- check colums order
+SELECT DISTINCT a FROM distinct_a WHERE b = 2 AND c = 10;
+ a 
+---
+ 1
+ 2
+ 3
+ 4
+ 5
+(5 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT a FROM distinct_a WHERE b = 2 AND c = 10;
+                       QUERY PLAN                        
+---------------------------------------------------------
+ Unique
+   ->  Index Scan using distinct_a_a_b_idx on distinct_a
+         Index Cond: (b = 2)
+         Filter: (c = 10)
+(4 rows)
+
+-- check projection case
+SELECT DISTINCT a, a FROM distinct_a WHERE b = 2;
+ a | a 
+---+---
+ 1 | 1
+ 2 | 2
+ 3 | 3
+ 4 | 4
+ 5 | 5
+(5 rows)
+
+SELECT DISTINCT a, 1 FROM distinct_a WHERE b = 2;
+ a | ?column? 
+---+----------
+ 1 |        1
+ 2 |        1
+ 3 |        1
+ 4 |        1
+ 5 |        1
+(5 rows)
+
+-- test cursor forward/backward movements
+BEGIN;
+DECLARE c SCROLL CURSOR FOR SELECT DISTINCT a FROM distinct_a;
+FETCH FROM c;
+ a 
+---
+ 1
+(1 row)
+
+FETCH BACKWARD FROM c;
+ a 
+---
+(0 rows)
+
+FETCH 6 FROM c;
+ a 
+---
+ 1
+ 2
+ 3
+ 4
+ 5
+(5 rows)
+
+FETCH BACKWARD 6 FROM c;
+ a 
+---
+ 5
+ 4
+ 3
+ 2
+ 1
+(5 rows)
+
+FETCH 6 FROM c;
+ a 
+---
+ 1
+ 2
+ 3
+ 4
+ 5
+(5 rows)
+
+FETCH BACKWARD 6 FROM c;
+ a 
+---
+ 5
+ 4
+ 3
+ 2
+ 1
+(5 rows)
+
+END;
+DROP TABLE distinct_a;
+-- test tuples visibility
+CREATE TABLE distinct_visibility (a int, b int);
+INSERT INTO distinct_visibility (select a, b from generate_series(1,5) a, generate_series(1, 10000) b);
+CREATE INDEX ON distinct_visibility (a, b);
+ANALYZE distinct_visibility;
+SELECT DISTINCT ON (a) a, b FROM distinct_visibility ORDER BY a, b;
+ a | b 
+---+---
+ 1 | 1
+ 2 | 1
+ 3 | 1
+ 4 | 1
+ 5 | 1
+(5 rows)
+
+DELETE FROM distinct_visibility WHERE a = 2 and b = 1;
+SELECT DISTINCT ON (a) a, b FROM distinct_visibility ORDER BY a, b;
+ a | b 
+---+---
+ 1 | 1
+ 2 | 2
+ 3 | 1
+ 4 | 1
+ 5 | 1
+(5 rows)
+
+SELECT DISTINCT ON (a) a, b FROM distinct_visibility ORDER BY a DESC, b DESC;
+ a |   b   
+---+-------
+ 5 | 10000
+ 4 | 10000
+ 3 | 10000
+ 2 | 10000
+ 1 | 10000
+(5 rows)
+
+DELETE FROM distinct_visibility WHERE a = 2 and b = 10000;
+SELECT DISTINCT ON (a) a, b FROM distinct_visibility ORDER BY a DESC, b DESC;
+ a |   b   
+---+-------
+ 5 | 10000
+ 4 | 10000
+ 3 | 10000
+ 2 |  9999
+ 1 | 10000
+(5 rows)
+
+DROP TABLE distinct_visibility;
+-- test page boundaries
+CREATE TABLE distinct_boundaries AS
+    SELECT a, b::int2 b, (b % 2)::int2 c FROM
+        generate_series(1, 5) a,
+        generate_series(1,366) b;
+CREATE INDEX ON distinct_boundaries (a, b, c);
+ANALYZE distinct_boundaries;
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a, b, c from distinct_boundaries
+WHERE b >= 1 and c = 0 ORDER BY a, b;
+                                 QUERY PLAN                                 
+----------------------------------------------------------------------------
+ Index Only Scan using distinct_boundaries_a_b_c_idx on distinct_boundaries
+   Skip scan: true
+   Index Cond: ((b >= 1) AND (c = 0))
+(3 rows)
+
+SELECT DISTINCT ON (a) a, b, c from distinct_boundaries
+WHERE b >= 1 and c = 0 ORDER BY a, b;
+ a | b | c 
+---+---+---
+ 1 | 2 | 0
+ 2 | 2 | 0
+ 3 | 2 | 0
+ 4 | 2 | 0
+ 5 | 2 | 0
+(5 rows)
+
+DROP TABLE distinct_boundaries;
+-- test tuple killing
+-- DESC ordering
+CREATE TABLE distinct_killed AS
+    SELECT a, b, b % 2 AS c, 10 AS d
+        FROM generate_series(1, 5) a,
+             generate_series(1,1000) b;
+CREATE INDEX ON distinct_killed (a, b, c, d);
+DELETE FROM distinct_killed where a = 3;
+BEGIN;
+    DECLARE c SCROLL CURSOR FOR
+    SELECT DISTINCT ON (a) a,b,c,d
+    FROM distinct_killed ORDER BY a DESC, b DESC;
+    FETCH FORWARD ALL FROM c;
+ a |  b   | c | d  
+---+------+---+----
+ 5 | 1000 | 0 | 10
+ 4 | 1000 | 0 | 10
+ 2 | 1000 | 0 | 10
+ 1 | 1000 | 0 | 10
+(4 rows)
+
+    FETCH BACKWARD ALL FROM c;
+ a |  b   | c | d  
+---+------+---+----
+ 1 | 1000 | 0 | 10
+ 2 | 1000 | 0 | 10
+ 4 | 1000 | 0 | 10
+ 5 | 1000 | 0 | 10
+(4 rows)
+
+COMMIT;
+DROP TABLE distinct_killed;
+-- regular ordering
+CREATE TABLE distinct_killed AS
+    SELECT a, b, b % 2 AS c, 10 AS d
+        FROM generate_series(1, 5) a,
+             generate_series(1,1000) b;
+CREATE INDEX ON distinct_killed (a, b, c, d);
+DELETE FROM distinct_killed where a = 3;
+BEGIN;
+    DECLARE c SCROLL CURSOR FOR
+    SELECT DISTINCT ON (a) a,b,c,d
+    FROM distinct_killed ORDER BY a, b;
+    FETCH FORWARD ALL FROM c;
+ a | b | c | d  
+---+---+---+----
+ 1 | 1 | 1 | 10
+ 2 | 1 | 1 | 10
+ 4 | 1 | 1 | 10
+ 5 | 1 | 1 | 10
+(4 rows)
+
+    FETCH BACKWARD ALL FROM c;
+ a | b | c | d  
+---+---+---+----
+ 5 | 1 | 1 | 10
+ 4 | 1 | 1 | 10
+ 2 | 1 | 1 | 10
+ 1 | 1 | 1 | 10
+(4 rows)
+
+COMMIT;
+DROP TABLE distinct_killed;
+-- partial delete
+CREATE TABLE distinct_killed AS
+    SELECT a, b, b % 2 AS c, 10 AS d
+        FROM generate_series(1, 5) a,
+             generate_series(1,1000) b;
+CREATE INDEX ON distinct_killed (a, b, c, d);
+DELETE FROM distinct_killed WHERE a = 3 AND b <= 999;
+BEGIN;
+    DECLARE c SCROLL CURSOR FOR
+    SELECT DISTINCT ON (a) a,b,c,d
+    FROM distinct_killed ORDER BY a DESC, b DESC;
+    FETCH FORWARD ALL FROM c;
+ a |  b   | c | d  
+---+------+---+----
+ 5 | 1000 | 0 | 10
+ 4 | 1000 | 0 | 10
+ 3 | 1000 | 0 | 10
+ 2 | 1000 | 0 | 10
+ 1 | 1000 | 0 | 10
+(5 rows)
+
+    FETCH BACKWARD ALL FROM c;
+ a |  b   | c | d  
+---+------+---+----
+ 1 | 1000 | 0 | 10
+ 2 | 1000 | 0 | 10
+ 3 | 1000 | 0 | 10
+ 4 | 1000 | 0 | 10
+ 5 | 1000 | 0 | 10
+(5 rows)
+
+COMMIT;
+DROP TABLE distinct_killed;
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index a1c90eb905..bd3b373515 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -78,6 +78,7 @@ select name, setting from pg_settings where name like 'enable%';
  enable_hashjoin                | on
  enable_indexonlyscan           | on
  enable_indexscan               | on
+ enable_indexskipscan           | on
  enable_material                | on
  enable_mergejoin               | on
  enable_nestloop                | on
@@ -89,7 +90,7 @@ select name, setting from pg_settings where name like 'enable%';
  enable_seqscan                 | on
  enable_sort                    | on
  enable_tidscan                 | on
-(17 rows)
+(18 rows)
 
 -- Test that the pg_timezone_names and pg_timezone_abbrevs views are
 -- more-or-less working.  We can't test their contents in any great detail
diff --git a/src/test/regress/sql/select_distinct.sql b/src/test/regress/sql/select_distinct.sql
index a605e86449..843efeb28f 100644
--- a/src/test/regress/sql/select_distinct.sql
+++ b/src/test/regress/sql/select_distinct.sql
@@ -73,3 +73,251 @@ SELECT 1 IS NOT DISTINCT FROM 2 as "no";
 SELECT 2 IS NOT DISTINCT FROM 2 as "yes";
 SELECT 2 IS NOT DISTINCT FROM null as "no";
 SELECT null IS NOT DISTINCT FROM null as "yes";
+
+-- index only skip scan
+CREATE TABLE distinct_a (a int, b int, c int);
+INSERT INTO distinct_a (
+    SELECT five, tenthous, 10 FROM
+    generate_series(1, 5) five,
+    generate_series(1, 10000) tenthous
+);
+CREATE INDEX ON distinct_a (a, b);
+ANALYZE distinct_a;
+
+SELECT DISTINCT a FROM distinct_a;
+SELECT DISTINCT a FROM distinct_a WHERE a = 1;
+SELECT DISTINCT a FROM distinct_a ORDER BY a DESC;
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT a FROM distinct_a;
+
+-- test index skip scan with a condition on a non unique field
+SELECT DISTINCT ON (a) a, b FROM distinct_a WHERE b = 2;
+
+-- test index skip scan backwards
+SELECT DISTINCT ON (a) a, b FROM distinct_a ORDER BY a DESC, b DESC;
+
+-- check colums order
+CREATE INDEX distinct_a_b_a on distinct_a (b, a);
+
+SELECT DISTINCT a FROM distinct_a WHERE b = 2;
+SELECT DISTINCT on (a, b) a, b FROM distinct_a WHERE b = 2;
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT a FROM distinct_a WHERE b = 2;
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT on (a, b) a, b FROM distinct_a WHERE b = 2;
+
+DROP INDEX distinct_a_b_a;
+
+-- test opposite scan/index directions inside a cursor
+-- forward/backward
+BEGIN;
+DECLARE c SCROLL CURSOR FOR
+SELECT DISTINCT ON (a) a,b FROM distinct_a ORDER BY a, b;
+
+FETCH FROM c;
+FETCH BACKWARD FROM c;
+
+FETCH 6 FROM c;
+FETCH BACKWARD 6 FROM c;
+
+FETCH 6 FROM c;
+FETCH BACKWARD 6 FROM c;
+
+END;
+
+-- backward/forward
+BEGIN;
+DECLARE c SCROLL CURSOR FOR
+SELECT DISTINCT ON (a) a,b FROM distinct_a ORDER BY a DESC, b DESC;
+
+FETCH FROM c;
+FETCH BACKWARD FROM c;
+
+FETCH 6 FROM c;
+FETCH BACKWARD 6 FROM c;
+
+FETCH 6 FROM c;
+FETCH BACKWARD 6 FROM c;
+
+END;
+
+-- test missing values and skipping from the end
+CREATE TABLE distinct_abc(a int, b int, c int);
+CREATE INDEX ON distinct_abc(a, b, c);
+INSERT INTO distinct_abc
+	VALUES (1, 1, 1),
+		   (1, 1, 2),
+		   (1, 2, 2),
+		   (1, 2, 3),
+		   (2, 2, 1),
+		   (2, 2, 3),
+		   (3, 1, 1),
+		   (3, 1, 2),
+		   (3, 2, 2),
+		   (3, 2, 3);
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a,b,c FROM distinct_abc WHERE c = 2;
+
+BEGIN;
+DECLARE c SCROLL CURSOR FOR
+SELECT DISTINCT ON (a) a,b,c FROM distinct_abc WHERE c = 2;
+
+FETCH ALL FROM c;
+FETCH BACKWARD ALL FROM c;
+
+END;
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a,b,c FROM distinct_abc WHERE c = 2
+ORDER BY a DESC, b DESC;
+
+BEGIN;
+DECLARE c SCROLL CURSOR FOR
+SELECT DISTINCT ON (a) a,b,c FROM distinct_abc WHERE c = 2
+ORDER BY a DESC, b DESC;
+
+FETCH ALL FROM c;
+FETCH BACKWARD ALL FROM c;
+
+END;
+
+DROP TABLE distinct_abc;
+
+-- index skip scan
+SELECT DISTINCT ON (a) a, b, c
+FROM distinct_a ORDER BY a;
+SELECT DISTINCT ON (a) a, b, c
+FROM distinct_a WHERE a = 1 ORDER BY a;
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a, b, c
+FROM distinct_a ORDER BY a;
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a, b, c
+FROM distinct_a WHERE a = 1 ORDER BY a;
+
+-- check colums order
+SELECT DISTINCT a FROM distinct_a WHERE b = 2 AND c = 10;
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT a FROM distinct_a WHERE b = 2 AND c = 10;
+
+-- check projection case
+SELECT DISTINCT a, a FROM distinct_a WHERE b = 2;
+SELECT DISTINCT a, 1 FROM distinct_a WHERE b = 2;
+
+-- test cursor forward/backward movements
+BEGIN;
+DECLARE c SCROLL CURSOR FOR SELECT DISTINCT a FROM distinct_a;
+
+FETCH FROM c;
+FETCH BACKWARD FROM c;
+
+FETCH 6 FROM c;
+FETCH BACKWARD 6 FROM c;
+
+FETCH 6 FROM c;
+FETCH BACKWARD 6 FROM c;
+
+END;
+
+DROP TABLE distinct_a;
+
+-- test tuples visibility
+CREATE TABLE distinct_visibility (a int, b int);
+INSERT INTO distinct_visibility (select a, b from generate_series(1,5) a, generate_series(1, 10000) b);
+CREATE INDEX ON distinct_visibility (a, b);
+ANALYZE distinct_visibility;
+
+SELECT DISTINCT ON (a) a, b FROM distinct_visibility ORDER BY a, b;
+DELETE FROM distinct_visibility WHERE a = 2 and b = 1;
+SELECT DISTINCT ON (a) a, b FROM distinct_visibility ORDER BY a, b;
+
+SELECT DISTINCT ON (a) a, b FROM distinct_visibility ORDER BY a DESC, b DESC;
+DELETE FROM distinct_visibility WHERE a = 2 and b = 10000;
+SELECT DISTINCT ON (a) a, b FROM distinct_visibility ORDER BY a DESC, b DESC;
+DROP TABLE distinct_visibility;
+
+-- test page boundaries
+CREATE TABLE distinct_boundaries AS
+    SELECT a, b::int2 b, (b % 2)::int2 c FROM
+        generate_series(1, 5) a,
+        generate_series(1,366) b;
+
+CREATE INDEX ON distinct_boundaries (a, b, c);
+ANALYZE distinct_boundaries;
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a, b, c from distinct_boundaries
+WHERE b >= 1 and c = 0 ORDER BY a, b;
+
+SELECT DISTINCT ON (a) a, b, c from distinct_boundaries
+WHERE b >= 1 and c = 0 ORDER BY a, b;
+
+DROP TABLE distinct_boundaries;
+
+-- test tuple killing
+
+-- DESC ordering
+CREATE TABLE distinct_killed AS
+    SELECT a, b, b % 2 AS c, 10 AS d
+        FROM generate_series(1, 5) a,
+             generate_series(1,1000) b;
+
+CREATE INDEX ON distinct_killed (a, b, c, d);
+
+DELETE FROM distinct_killed where a = 3;
+
+BEGIN;
+    DECLARE c SCROLL CURSOR FOR
+    SELECT DISTINCT ON (a) a,b,c,d
+    FROM distinct_killed ORDER BY a DESC, b DESC;
+    FETCH FORWARD ALL FROM c;
+    FETCH BACKWARD ALL FROM c;
+COMMIT;
+
+DROP TABLE distinct_killed;
+
+-- regular ordering
+CREATE TABLE distinct_killed AS
+    SELECT a, b, b % 2 AS c, 10 AS d
+        FROM generate_series(1, 5) a,
+             generate_series(1,1000) b;
+
+CREATE INDEX ON distinct_killed (a, b, c, d);
+
+DELETE FROM distinct_killed where a = 3;
+
+BEGIN;
+    DECLARE c SCROLL CURSOR FOR
+    SELECT DISTINCT ON (a) a,b,c,d
+    FROM distinct_killed ORDER BY a, b;
+    FETCH FORWARD ALL FROM c;
+    FETCH BACKWARD ALL FROM c;
+COMMIT;
+
+DROP TABLE distinct_killed;
+
+-- partial delete
+CREATE TABLE distinct_killed AS
+    SELECT a, b, b % 2 AS c, 10 AS d
+        FROM generate_series(1, 5) a,
+             generate_series(1,1000) b;
+
+CREATE INDEX ON distinct_killed (a, b, c, d);
+
+DELETE FROM distinct_killed WHERE a = 3 AND b <= 999;
+
+BEGIN;
+    DECLARE c SCROLL CURSOR FOR
+    SELECT DISTINCT ON (a) a,b,c,d
+    FROM distinct_killed ORDER BY a DESC, b DESC;
+    FETCH FORWARD ALL FROM c;
+    FETCH BACKWARD ALL FROM c;
+COMMIT;
+
+DROP TABLE distinct_killed;
-- 
2.21.0

#31

Tomas Vondra

tomas.vondra@2ndquadrant.com

almost 6 years ago

In reply to: Dmitry Dolgov (#30)

Re: Index Skip Scan

On Fri, Feb 07, 2020 at 05:25:43PM +0100, Dmitry Dolgov wrote:

On Thu, Feb 06, 2020 at 09:22:20PM +0900, Kyotaro Horiguchi wrote:
At Thu, 6 Feb 2020 11:57:07 +0100, Dmitry Dolgov <9erthalion6@gmail.com> wrote in

On Thu, Feb 06, 2020 at 10:24:50AM +0900, Kyotaro Horiguchi wrote:
At Wed, 5 Feb 2020 17:37:30 +0100, Dmitry Dolgov <9erthalion6@gmail.com> wrote in
We could add an additional parameter "in_cursor" to
ExecSupportBackwardScan and let skip scan return false if in_cursor is
true, but I'm not sure it's acceptable.

I was also thinking about whether it's possible to use
ExecSupportBackwardScan here, but skip scan is just a mode of an
index/indexonly scan. Which means that ExecSupportBackwardScan also needs
to know somehow if this mode is being used, and then, since this
function is called after it's already decided to use skip scan in the
resulting plan, somehow correct the plan (exclude skipping and try to
find next best path?) - do I understand your suggestion correctly?

I hadn't thought about it that hard, but a quick check told me that
IndexSupportsBackwardScan returns a fixed flag for the AM. It seems that
things are not that simple.

Yes, I've mentioned that already in one of the previous emails :) The
simplest way I see to achieve what we want is to do something like in
attached modified version with a new hasDeclaredCursor field. It's not a
final version though, but posted just for discussion, so feel free to
suggest any improvements or alternatives.

IMO the proper fix for this case (moving forward, reading backwards) is
simply making it work by properly checking deleted tuples etc. Not sure
why that would be so complex (I haven't tried implementing it, though).

I think making this depend on things like declared cursor etc. is going
to be tricky, may easily be more complex than checking deleted tuples,
and the behavior may be quite surprising.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#32

Tomas Vondra

tomas.vondra@2ndquadrant.com

almost 6 years ago

In reply to: Dmitry Dolgov (#30)

Re: Index Skip Scan

Hi,

I've done some testing and benchmarking of the v31 patch, looking for
regressions, costing issues etc. Essentially, I've run a bunch of SELECT
DISTINCT queries on data sets of various sizes, numbers of distinct values
etc. The results are fairly large, so I've uploaded them to GitHub:

https://github.com/tvondra/skip-scan-test

There are four benchmark groups, depending on whether extended stats are
available and whether the columns are independent:

1) skipscan - just indexes, columns are independent

2) skipscan-with-stats - indexes and extended stats, independent columns

3) skipscan-correlated - just indexes, correlated columns

4) skipscan-correlated-with-stats - indexes and extended stats,
correlated columns

The GitHub repository contains an *.ods spreadsheet comparing durations
with the regular query plan (no skip scan) and with skip scan. In general,
there are pretty massive speedups, often by about two orders of magnitude.

There are a couple of regressions, where the plan with skipscan enabled
is ~10x slower. But this seems to happen only in high-cardinality cases
where we misestimate the number of groups. Consider a table with two
independent columns:

CREATE TABLE t (a text, b text);
INSERT INTO t SELECT
md5((10000*random())::int::text),
md5((10000*random())::int::text)
FROM generate_series(1,1000000) s(i);

CREATE INDEX ON t(a,b);

ANALYZE;

which then behaves like this:

test=# select * from (select distinct a,b from t) foo offset 10000000;
Time: 3138.222 ms (00:03.138)
test=# set enable_indexskipscan = off;
Time: 0.312 ms
test=# select * from (select distinct a,b from t) foo offset 10000000;
Time: 199.749 ms

So in this case the skip scan is ~15x slower than the usual plan (index
only scan + unique). The reason why this happens is pretty simple - to
estimate the number of groups we multiply the ndistinct estimates for
the two columns (which both have n_distinct = 10000), but then we cap
the estimate to 10% of the table. But when the columns are independent
with high cardinalities that under-estimates the actual value, making
the cost for skip scan much lower than it should be.
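To put rough numbers on that, here is a minimal sketch of the estimate as
described above. It is a simplified model, not the planner's actual code,
and the function name estimate_groups_capped is made up for illustration.
For the example table it yields 100000 groups, while with two independent
columns of ~10000 values each and 1M rows, nearly every row should be its
own (a,b) group.

```c
#include <assert.h>

/*
 * Simplified model of the multi-column group estimate described above:
 * multiply the per-column ndistinct values, then cap the product at 10%
 * of the table. Hypothetical helper, not PostgreSQL source.
 */
double
estimate_groups_capped(const double *ndistinct, int ncols, double table_rows)
{
    double groups = 1.0;
    int    i;

    assert(ncols > 0 && table_rows >= 0.0);

    /* Assume independence: the estimate is the product of ndistinct. */
    for (i = 0; i < ncols; i++)
        groups *= ndistinct[i];

    /* With more than one column, cap the product at 10% of the rows. */
    if (ncols > 1 && groups > table_rows / 10.0)
        groups = table_rows / 10.0;

    /* There can never be more groups than rows. */
    if (groups > table_rows)
        groups = table_rows;

    return groups;
}
```

Calling it with ndistinct = {10000, 10000} and table_rows = 1000000 gives
the capped value, i.e. roughly a tenth of what the data actually contain.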

I don't think this is an issue the skipscan patch needs to fix, though.
Firstly, the regressed cases are a tiny minority. Secondly, we already
have a way to improve the root cause - creating extended stats with
ndistinct coefficients generally makes the problem go away.

One interesting observation however is that this regression only
happened with text columns but not with int or bigint. My assumption is
that this is due to text comparisons being much more expensive. Not sure
if there is something we could do to deal with this - reduce the number
of comparisons or something?

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#33

Tomas Vondra

tomas.vondra@2ndquadrant.com

almost 6 years ago

In reply to: Dmitry Dolgov (#30)

Re: Index Skip Scan

OK,

A couple more comments based on a quick review of the patch, particularly
the part related to planning:

1) create_skipscan_unique_path has one assert commented out. Either it's
something we want to enforce, or we should remove it.

/*Assert(distinctPrefixKeys <= list_length(pathnode->path.pathkeys));*/

2) I wonder if the current costing model is overly optimistic. We simply
copy the startup cost from the IndexPath, which seems fine. But for
total cost we do this:

pathnode->path.total_cost = basepath->startup_cost * numGroups;

which seems a bit too simplistic. The startup cost is pretty much just
the cost to find the first item in the index, but surely we need to do
more to find the next group - we need to do comparisons to skip some of
the items, etc. If we think that's unnecessary, we need to explain it in
a comment or something.

3) I don't think we should make planning dependent on hasDeclaredCursor.

4) I'm not quite sure how sensible it is to create a new IndexPath in
create_skipscan_unique_path. On the one hand it works, but I don't think
any other path is constructed like this so I wonder if we're missing
something. Perhaps it'd be better to just add a new path node on top of
the IndexPath, and then handle this in create_plan. We already do
something similar for Bitmap Index Scans, where we create a different
executor node from IndexPath depending on the parent node.
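
To illustrate the concern in point 2, here is a small standalone sketch.
The first function mirrors the formula quoted above. The second is purely
hypothetical: the parameters tuples_per_skip and cmp_cost are invented
here to show what charging for the skipped items might look like, and are
not part of the patch or a concrete proposal.

```c
#include <assert.h>

/* Total cost as in the quoted line: one index descent per group. */
double
skipscan_total_cost(double startup_cost, double num_groups)
{
    assert(startup_cost >= 0.0 && num_groups >= 0.0);
    return startup_cost * num_groups;
}

/*
 * Hypothetical variant: additionally charge for the comparisons done
 * while stepping over items to reach the next group. Both extra
 * parameters are assumptions made for this sketch only.
 */
double
skipscan_total_cost_with_skips(double startup_cost, double num_groups,
                               double tuples_per_skip, double cmp_cost)
{
    assert(startup_cost >= 0.0 && num_groups >= 0.0);
    assert(tuples_per_skip >= 0.0 && cmp_cost >= 0.0);
    return (startup_cost + tuples_per_skip * cmp_cost) * num_groups;
}
```

The gap between the two grows with the number of groups and with the
per-comparison cost, which is the part the current formula leaves out.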

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#34

Dmitry Dolgov

9erthalion6@gmail.com

almost 6 years ago

In reply to: Tomas Vondra (#32)

Re: Index Skip Scan

On Sat, Feb 08, 2020 at 03:22:17PM +0100, Tomas Vondra wrote:

I've done some testing and benchmarking of the v31 patch, looking for
regressions, costing issues etc. Essentially, I've run a bunch of SELECT
DISTINCT queries on data sets of various sizes, numbers of distinct values
etc. The results are fairly large, so I've uploaded them to GitHub:

https://github.com/tvondra/skip-scan-test

Thanks a lot for testing!

There are a couple of regressions, where the plan with skipscan enabled
is ~10x slower. But this seems to happen only in high-cardinality cases
where we misestimate the number of groups. Consider a table with two
independent columns

CREATE TABLE t (a text, b text);
INSERT INTO t SELECT
md5((10000*random())::int::text),
md5((10000*random())::int::text)
FROM generate_series(1,1000000) s(i);

CREATE INDEX ON t(a,b);

ANALYZE;

which then behaves like this:

test=# select * from (select distinct a,b from t) foo offset 10000000;
Time: 3138.222 ms (00:03.138)
test=# set enable_indexskipscan = off;
Time: 0.312 ms
test=# select * from (select distinct a,b from t) foo offset 10000000;
Time: 199.749 ms

So in this case the skip scan is ~15x slower than the usual plan (index
only scan + unique). The reason why this happens is pretty simple - to
estimate the number of groups we multiply the ndistinct estimates for
the two columns (which both have n_distinct = 10000), but then we cap
the estimate to 10% of the table. But when the columns are independent
with high cardinalities that under-estimates the actual value, making
the cost for skip scan much lower than it should be.

The current implementation checks whether we can find the next value on
the same page, as a shortcut instead of a tree traversal, to help in this
kind of situation, but I can easily imagine that it's still not enough
in some extreme situations.
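
As a toy model of that shortcut (not the patch's _bt_skip code; the
function name and the plain int array standing in for a leaf page are
assumptions made for illustration), the idea is to look for the next
distinct key within the current page first, and fall back to a descent
from the root only when the page has no larger key:

```c
#include <assert.h>
#include <stddef.h>

/*
 * keys[] stands in for the sorted prefix keys on one leaf page.
 * Returns the offset of the first key strictly greater than current,
 * or -1 if the page holds no such key and the caller would have to
 * descend the tree again. Hypothetical helper for illustration.
 */
int
next_distinct_on_page(const int *keys, int nkeys, int current)
{
    int lo = 0;
    int hi = nkeys;

    assert(nkeys >= 0 && (keys != NULL || nkeys == 0));

    /* Cheap check: if the last key is not larger, nothing here is. */
    if (nkeys == 0 || keys[nkeys - 1] <= current)
        return -1;

    /* Binary search for the first key greater than current. */
    while (lo < hi)
    {
        int mid = lo + (hi - lo) / 2;

        if (keys[mid] <= current)
            lo = mid + 1;
        else
            hi = mid;
    }

    return lo;
}
```

When many distinct keys share a page, most skips are answered by this
in-page search; when each key spans many pages, almost every skip falls
through to the -1 case and costs a full descent.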

I don't think this is an issue the skipscan patch needs to fix, though.
Firstly, the regressed cases are a tiny minority. Secondly, we already
have a way to improve the root cause - creating extended stats with
ndistinct coefficients generally makes the problem go away.

Yes, I agree.

One interesting observation however is that this regression only
happened with text columns but not with int or bigint. My assumption is
that this is due to text comparisons being much more expensive. Not sure
if there is something we could do to deal with this - reduce the number
of comparisons or something?

Hm, interesting. I need to check that we do not do any unnecessary
comparisons.

On Sat, Feb 08, 2020 at 02:11:59PM +0100, Tomas Vondra wrote:

Yes, I've mentioned that already in one of the previous emails :) The
simplest way I see to achieve what we want is to do something like in
attached modified version with a new hasDeclaredCursor field. It's not a
final version though, but posted just for discussion, so feel free to
suggest any improvements or alternatives.

IMO the proper fix for this case (moving forward, reading backwards) is
simply making it work by properly checking deleted tuples etc. Not sure
why that would be so complex (I haven't tried implementing it, though).

It's probably not that complex by itself, but it requires changing how
responsibilities are separated. At the moment the current implementation
leaves jumping over the tree entirely to _bt_skip, and heap visibility
checks only to IndexOnlyNext. To check deleted tuples properly we need to
either verify the visibility of the corresponding heap tuple inside
_bt_skip (as I've mentioned in one of the previous emails, checking if an
index tuple is dead is not enough), or teach the code in IndexOnlyNext to
understand that _bt_skip can lead to returning the same tuple while moving
forward & reading backward. Do you think it still makes sense to go this way?

#35

Tomas Vondra

tomas.vondra@2ndquadrant.com

almost 6 years ago

In reply to: Dmitry Dolgov (#34)

Re: Index Skip Scan

On Sat, Feb 08, 2020 at 04:24:40PM +0100, Dmitry Dolgov wrote:

On Sat, Feb 08, 2020 at 03:22:17PM +0100, Tomas Vondra wrote:

I've done some testing and benchmarking of the v31 patch, looking for
regressions, costing issues etc. Essentially, I've run a bunch of SELECT
DISTINCT queries on data sets of various sizes, numbers of distinct values
etc. The results are fairly large, so I've uploaded them to GitHub:

https://github.com/tvondra/skip-scan-test

Thanks a lot for testing!

There are a couple of regressions, where the plan with skipscan enabled
is ~10x slower. But this seems to happen only in high-cardinality cases
where we misestimate the number of groups. Consider a table with two
independent columns

CREATE TABLE t (a text, b text);
INSERT INTO t SELECT
md5((10000*random())::int::text),
md5((10000*random())::int::text)
FROM generate_series(1,1000000) s(i);

CREATE INDEX ON t(a,b);

ANALYZE;

which then behaves like this:

test=# select * from (select distinct a,b from t) foo offset 10000000;
Time: 3138.222 ms (00:03.138)
test=# set enable_indexskipscan = off;
Time: 0.312 ms
test=# select * from (select distinct a,b from t) foo offset 10000000;
Time: 199.749 ms

So in this case the skip scan is ~15x slower than the usual plan (index
only scan + unique). The reason why this happens is pretty simple - to
estimate the number of groups we multiply the ndistinct estimates for
the two columns (which both have n_distinct = 10000), but then we cap
the estimate to 10% of the table. But when the columns are independent
with high cardinalities that under-estimates the actual value, making
the cost for skip scan much lower than it should be.

The current implementation checks whether we can find the next value on
the same page, as a shortcut instead of a tree traversal, to help in this
kind of situation, but I can easily imagine that it's still not enough
in some extreme situations.

Yeah. I'm not sure there's room for further improvements. The regressed
cases were subject to the 10% cap, and with ndistinct being more than
10% of the table, we probably can find many distinct keys on each index
page - we know that every ~10 rows the values change.

I don't think this is an issue the skipscan patch needs to fix, though.
Firstly, the regressed cases are a tiny minority. Secondly, we already
have a way to improve the root cause - creating extended stats with
ndistinct coefficients generally makes the problem go away.

Yes, I agree.

One interesting observation however is that this regression only
happened with text columns but not with int or bigint. My assumption is
that this is due to text comparisons being much more expensive. Not sure
if there is something we could do to deal with this - reduce the number
of comparisons or something?

Hm, interesting. I need to check that we do not do any unnecessary
comparisons.

On Sat, Feb 08, 2020 at 02:11:59PM +0100, Tomas Vondra wrote:

Yes, I've mentioned that already in one of the previous emails :) The
simplest way I see to achieve what we want is to do something like in
attached modified version with a new hasDeclaredCursor field. It's not a
final version though, but posted just for discussion, so feel free to
suggest any improvements or alternatives.

IMO the proper fix for this case (moving forward, reading backwards) is
simply making it work by properly checking deleted tuples etc. Not sure
why that would be so complex (I haven't tried implementing it, though).

It's probably not that complex by itself, but it requires changing how
responsibilities are separated. At the moment the current implementation
leaves jumping over the tree entirely to _bt_skip, and heap visibility
checks only to IndexOnlyNext. To check deleted tuples properly we need to
either verify the visibility of the corresponding heap tuple inside
_bt_skip (as I've mentioned in one of the previous emails, checking if an
index tuple is dead is not enough), or teach the code in IndexOnlyNext to
understand that _bt_skip can lead to returning the same tuple while moving
forward & reading backward. Do you think it still makes sense to go this way?

Not sure. I have to think about this first.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#36

James Coleman

jtc331@gmail.com

almost 6 years ago

In reply to: Dmitry Dolgov (#34)

Re: Index Skip Scan

On Sat, Feb 8, 2020 at 10:24 AM Dmitry Dolgov <9erthalion6@gmail.com> wrote:

On Sat, Feb 08, 2020 at 03:22:17PM +0100, Tomas Vondra wrote:
So in this case the skip scan is ~15x slower than the usual plan (index
only scan + unique). The reason why this happens is pretty simple - to
estimate the number of groups we multiply the ndistinct estimates for
the two columns (which both have n_distinct = 10000), but then we cap
the estimate to 10% of the table. But when the columns are independent
with high cardinalities that under-estimates the actual value, making
the cost for skip scan much lower than it should be.
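
To make the arithmetic in the quoted paragraph concrete, here is a minimal
sketch in C. It is not the planner's code: the helper name
toy_estimate_groups and the 1,000,000-row table size used in the note
below are assumptions for illustration. Only the two n_distinct values of
10000 and the 10% cap come from the description above.

```c
#include <assert.h>

/*
 * Toy version of the group-count estimate described above (hypothetical
 * helper, not PostgreSQL code): multiply the per-column ndistinct values,
 * then cap the product at 10% of the table's row count.
 */
static double
toy_estimate_groups(double ndistinct_a, double ndistinct_b, double table_rows)
{
	double		product;
	double		cap;

	assert(table_rows >= 0.0);

	product = ndistinct_a * ndistinct_b;
	cap = table_rows / 10.0;	/* the "10% of the table" cap */

	return (product > cap) ? cap : product;
}
```

Under these assumptions, toy_estimate_groups(10000, 10000, 1000000) caps
the product of 100,000,000 at 100,000 groups. If the two columns are
independent, a table of that size would be expected to hold close to
1,000,000 distinct pairs, so the estimate would be roughly ten times too
low, which is the kind of under-estimation the paragraph describes.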

The current implementation checks if we can find the next value on the
same page to do a shortcut instead of tree traversal and improve this
kind of situation, but I can easily imagine that it's still not enough
in some extreme situations.

This is almost certainly rehashing already covered ground, but since I
doubt it's been discussed recently, would you be able to summarize
that choice (not to always get the next tuple by scanning from the top
of the tree again) and the performance/complexity tradeoffs?

Thanks,
James

#37

Dmitry Dolgov

9erthalion6@gmail.com

almost 6 years ago

In reply to: James Coleman (#36)

Re: Index Skip Scan

On Sat, Feb 08, 2020 at 01:31:02PM -0500, James Coleman wrote:
On Sat, Feb 8, 2020 at 10:24 AM Dmitry Dolgov <9erthalion6@gmail.com> wrote:

On Sat, Feb 08, 2020 at 03:22:17PM +0100, Tomas Vondra wrote:
So in this case the skip scan is ~15x slower than the usual plan (index
only scan + unique). The reason why this happens is pretty simple - to
estimate the number of groups we multiply the ndistinct estimates for
the two columns (which both have n_distinct = 10000), but then we cap
the estimate to 10% of the table. But when the columns are independent
with high cardinalities that under-estimates the actual value, making
the cost for skip scan much lower than it should be.

The current implementation checks if we can find the next value on the
same page to do a shortcut instead of tree traversal and improve this
kind of situation, but I can easily imagine that it's still not enough
in some extreme situations.

This is almost certainly rehashing already covered ground, but since I
doubt it's been discussed recently, would you be able to summarize
that choice (not to always get the next tuple by scanning from the top
of the tree again) and the performance/complexity tradeoffs?

Yeah, this part of the discussion already happened some time ago. The
idea [1] is to protect ourselves, at least partially, from incorrect
ndistinct estimations. Simply jumping over an index means that even if
the next key we're searching for is on the same page as the previous one,
we still end up doing a search from the root of the tree, which is of
course less efficient than just checking right on the page before jumping
further.

The performance tradeoff in this case is simple: we make the regular use
case slightly slower, but can perform better in the worst case scenarios.
The complexity tradeoff was never discussed, but I guess everyone assumed
it's relatively straightforward to check the current page and return if
something was found before jumping.
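
The same-page shortcut can be illustrated with a toy model in C. This is
only a sketch under stated assumptions, not the patch's _bt_skip logic: a
sorted int array stands in for the index, fixed-size slices stand in for
leaf pages, and all names (ToySkipStats, toy_search_from_root,
toy_next_distinct) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Counters showing how often each strategy was used (toy model only). */
typedef struct ToySkipStats
{
	int			same_page_hits;
	int			root_descents;
} ToySkipStats;

/*
 * Stand-in for a descent from the root: binary search over the whole
 * array for the first index whose key is greater than "key".  Returns n
 * if there is no such key.
 */
static size_t
toy_search_from_root(const int *keys, size_t n, int key)
{
	size_t		lo = 0;
	size_t		hi = n;

	while (lo < hi)
	{
		size_t		mid = lo + (hi - lo) / 2;

		if (keys[mid] > key)
			hi = mid;
		else
			lo = mid + 1;
	}
	return lo;
}

/*
 * Find the position of the next distinct key after keys[pos].  First look
 * at the rest of the current "page"; only if nothing is found there, fall
 * back to a search from the "root".
 */
static size_t
toy_next_distinct(const int *keys, size_t n, size_t pos, size_t page_size,
				  ToySkipStats *stats)
{
	int			cur;
	size_t		page_end;
	size_t		i;

	assert(pos < n && page_size > 0);

	cur = keys[pos];
	page_end = (pos / page_size + 1) * page_size;
	if (page_end > n)
		page_end = n;

	for (i = pos + 1; i < page_end; i++)
	{
		if (keys[i] > cur)
		{
			stats->same_page_hits++;
			return i;
		}
	}

	stats->root_descents++;
	return toy_search_from_root(keys, n, cur);
}
```

In this model the scan over the rest of the page is the extra work paid in
the regular case, while every same-page hit saves one full descent, which
matters most when there are many distinct keys per page.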

[1]: /messages/by-id/CA+TgmoY7QTHhzLWZupNSyyqFRBfMgYocg3R-6g=DRgT4-KBGqg@mail.gmail.com

#38

Kyotaro Horiguchi

horikyota.ntt@gmail.com

almost 6 years ago

In reply to: Tomas Vondra (#31)

2 attachment(s)

Re: Index Skip Scan

Thank you very much for the benchmarking!

A slightly different topic from the latest branch.

At Sat, 8 Feb 2020 14:11:59 +0100, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote in

Yes, I've mentioned that already in one of the previous emails :) The
simplest way I see to achieve what we want is to do something like in
attached modified version with a new hasDeclaredCursor field. It's not a
final version though, but posted just for discussion, so feel free to
suggest any improvements or alternatives.

IMO the proper fix for this case (moving forward, reading backwards) is
simply making it work by properly checking deleted tuples etc. Not sure
why that would be so much more complex (haven't tried implementing it)?

I don't think it's that complex, but I suspect that it might be a bit
harder starting from the current shape.

The first attachment (renamed to .txt so as not to confuse the cfbot) is
a small patch that checks that _bt_readpage is called under the condition
stated in its comment, that is, the caller must have pinned and
read-locked so->currPos.buf. This patch reveals many instances where the
contract is broken.
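
The idea of asserting the caller's contract can be sketched outside the
server as follows. This is a toy model only: ToyBuffer, its fields, and
both functions are hypothetical stand-ins, and the real check in the
attached patch uses the buffer manager's own state rather than flags like
these.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a shared buffer: a pin count and a content-lock flag. */
typedef struct ToyBuffer
{
	int			pin_count;
	bool		content_locked;
} ToyBuffer;

/* True only if the caller holds both a pin and the content lock. */
static bool
toy_lock_and_pin_held(const ToyBuffer *buf)
{
	return buf->pin_count > 0 && buf->content_locked;
}

/*
 * Toy page reader with a "caller must hold pin and lock" contract.  The
 * assertion turns a silent contract violation into an immediate failure
 * in assert-enabled builds.  Returns true if the page has any items.
 */
static bool
toy_readpage(const ToyBuffer *buf, int nitems)
{
	assert(toy_lock_and_pin_held(buf));

	return nitems > 0;
}
```

Calling toy_readpage on a buffer that is pinned but no longer locked trips
the assertion, which is the kind of breakage the debug patch is meant to
surface.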

The second is a crude fix for the breakages, but the result seems far
from neat. I think we need to rethink this, taking modification of the
support functions into consideration.

I think making this depend on things like declared cursor etc. is going
to be tricky, may easily be more complex than checking deleted tuples,
and the behavior may be quite surprising.

Sure.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachments:

0001-debug-aid.patch.txt (text/plain; charset=us-ascii)

From de129e5a261ed43f002c1684dc9d6575f3880b16 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Thu, 6 Feb 2020 14:31:36 +0900
Subject: [PATCH 1/2] debug aid

---
 src/backend/access/nbtree/nbtsearch.c |  1 +
 src/backend/storage/buffer/bufmgr.c   | 13 +++++++++++++
 src/include/storage/bufmgr.h          |  1 +
 3 files changed, 15 insertions(+)

diff --git a/src/backend/access/nbtree/nbtsearch.c b/src/backend/access/nbtree/nbtsearch.c
index c5f5d228f2..5cd97d8bb5 100644
--- a/src/backend/access/nbtree/nbtsearch.c
+++ b/src/backend/access/nbtree/nbtsearch.c
@@ -1785,6 +1785,7 @@ _bt_readpage(IndexScanDesc scan, ScanDirection dir, OffsetNumber offnum)
 	 * used here; this function is what makes it good for currPos.
 	 */
 	Assert(BufferIsValid(so->currPos.buf));
+	Assert(BufferLockAndPinHeldByMe(so->currPos.buf));
 
 	page = BufferGetPage(so->currPos.buf);
 	opaque = (BTPageOpaque) PageGetSpecialPointer(page);
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index aba3960481..08a75a6846 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -1553,6 +1553,19 @@ ReleaseAndReadBuffer(Buffer buffer,
 	return ReadBuffer(relation, blockNum);
 }
 
+/* tmp function for debugging */
+bool
+BufferLockAndPinHeldByMe(Buffer buffer)
+{
+	BufferDesc  *b = GetBufferDescriptor(buffer - 1);
+
+	if (BufferIsPinned(buffer) &&
+		LWLockHeldByMe(BufferDescriptorGetContentLock(b)))
+		return true;
+
+	return false;
+}
+
 /*
  * PinBuffer -- make buffer unavailable for replacement.
  *
diff --git a/src/include/storage/bufmgr.h b/src/include/storage/bufmgr.h
index 73c7e9ba38..8e5fc639a0 100644
--- a/src/include/storage/bufmgr.h
+++ b/src/include/storage/bufmgr.h
@@ -177,6 +177,7 @@ extern void MarkBufferDirty(Buffer buffer);
 extern void IncrBufferRefCount(Buffer buffer);
 extern Buffer ReleaseAndReadBuffer(Buffer buffer, Relation relation,
 								   BlockNumber blockNum);
+extern bool BufferLockAndPinHeldByMe(Buffer buffer);
 
 extern void InitBufferPool(void);
 extern void InitBufferPoolAccess(void);
-- 
2.18.2

0002-crude-fix.patch.txt (text/plain; charset=us-ascii)

From 912bad2ec8c66ccd01cebf1f69233b004c633243 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Thu, 6 Feb 2020 19:09:09 +0900
Subject: [PATCH 2/2] crude fix

---
 src/backend/access/nbtree/nbtsearch.c | 43 +++++++++++++++++----------
 1 file changed, 27 insertions(+), 16 deletions(-)

diff --git a/src/backend/access/nbtree/nbtsearch.c b/src/backend/access/nbtree/nbtsearch.c
index 5cd97d8bb5..1f18b38ca5 100644
--- a/src/backend/access/nbtree/nbtsearch.c
+++ b/src/backend/access/nbtree/nbtsearch.c
@@ -1619,6 +1619,9 @@ _bt_skip(IndexScanDesc scan, ScanDirection dir,
 
 			nextOffset = startOffset = ItemPointerGetOffsetNumber(&scan->xs_itup->t_tid);
 
+			if (nextOffset != startOffset)
+				LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+
 			while (nextOffset == startOffset)
 			{
 				IndexTuple itup;
@@ -1653,7 +1656,7 @@ _bt_skip(IndexScanDesc scan, ScanDirection dir,
 				offnum = OffsetNumberPrev(offnum);
 
 				/* Check if _bt_readpage returns already found item */
-				if (!_bt_readpage(scan, indexdir, offnum))
+				if (!_bt_readpage(scan, dir, offnum))
 				{
 					/*
 					 * There's no actually-matching data on this page.  Try to
@@ -1668,6 +1671,8 @@ _bt_skip(IndexScanDesc scan, ScanDirection dir,
 						return false;
 					}
 				}
+				else
+					LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
 
 				currItem = &so->currPos.items[so->currPos.lastItem];
 				itup = (IndexTuple) (so->currTuples + currItem->tupleOffset);
@@ -1721,24 +1726,30 @@ _bt_skip(IndexScanDesc scan, ScanDirection dir,
 	}
 
 	/* Now read the data */
-	if (!_bt_readpage(scan, indexdir, offnum))
+	if (!(ScanDirectionIsForward(dir) &&
+		  ScanDirectionIsBackward(indexdir)) ||
+		scanstart)
 	{
-		/*
-		 * There's no actually-matching data on this page.  Try to advance to
-		 * the next page.  Return false if there's no matching data at all.
-		 */
-		LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
-		if (!_bt_steppage(scan, dir))
+		if (!_bt_readpage(scan, dir, offnum))
 		{
-			pfree(so->skipScanKey);
-			so->skipScanKey = NULL;
-			return false;
+			/*
+			 * There's no actually-matching data on this page.  Try to advance
+			 * to the next page.  Return false if there's no matching data at
+			 * all.
+			 */
+			LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+			if (!_bt_steppage(scan, dir))
+			{
+				pfree(so->skipScanKey);
+				so->skipScanKey = NULL;
+				return false;
+			}
+		}
+		else
+		{
+			/* Drop the lock, and maybe the pin, on the current page */
+			LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
 		}
-	}
-	else
-	{
-		/* Drop the lock, and maybe the pin, on the current page */
-		LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
 	}
 
 	/* And set IndexTuple */
-- 
2.18.2

#39

Dmitry Dolgov

9erthalion6@gmail.com

almost 6 years ago

In reply to: Kyotaro Horiguchi (#38)

Re: Index Skip Scan

On Fri, Feb 14, 2020 at 05:23:13PM +0900, Kyotaro Horiguchi wrote:
The first attachment (renamed to .txt so as not to confuse the cfbot) is
a small patch that checks that _bt_readpage is called under the condition
stated in its comment, that is, the caller must have pinned and
read-locked so->currPos.buf. This patch reveals many instances where the
contract is broken.

Thanks! On top of which patch version can one apply it? I'm asking
because I believe I've addressed similar issues in the last version, and
the last proposed diff (after resolving some conflicts) breaks tests for
me, so I'm not sure if I'm missing something.

At the same time, if you and Tomas strongly agree that it actually makes
sense to make the moving forward/reading backward case work with dead
tuples correctly, I'll take a shot and try to teach the code around
_bt_skip to do what is required for that. I can merge your changes there
and we can see what the result would be.

#40

Dmitry Dolgov

9erthalion6@gmail.com

almost 6 years ago

In reply to: Dmitry Dolgov (#39)

2 attachment(s)

Re: Index Skip Scan

On Fri, Feb 14, 2020 at 01:18:20PM +0100, Dmitry Dolgov wrote:

On Fri, Feb 14, 2020 at 05:23:13PM +0900, Kyotaro Horiguchi wrote:
The first attachment (renamed to .txt so as not to confuse the cfbot) is
a small patch that checks that _bt_readpage is called under the condition
stated in its comment, that is, the caller must have pinned and
read-locked so->currPos.buf. This patch reveals many instances where the
contract is broken.

Thanks! On top of which patch version can one apply it? I'm asking
because I believe I've addressed similar issues in the last version, and
the last proposed diff (after resolving some conflicts) breaks tests for
me, so I'm not sure if I'm missing something.

At the same time, if you and Tomas strongly agree that it actually makes
sense to make the moving forward/reading backward case work with dead
tuples correctly, I'll take a shot and try to teach the code around
_bt_skip to do what is required for that. I can merge your changes there
and we can see what the result would be.

Here is something similar to what I had in mind. In this version of the
patch IndexOnlyNext now verifies whether we returned to the same position
as before while reading in the direction opposite to the advancing
direction, due to visibility checks (similar to what is implemented
inside _bt_skip for the situation when some distinct keys are eliminated
due to scankey conditions). It's actually not as invasive as I feared,
but still pretty hacky. I'm not sure if it's ok to compare the resulting
heaptid in this situation, but all the mentioned tests are passing. Also,
this version doesn't incorporate any planner feedback from Tomas yet; my
intention is just to check if this could be the right direction.
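
The "did we come back to the same position" check can be sketched in
isolation like this. It is a toy model, not the code from the attached
patch: ToyTid, ToyScanState, and toy_accept_tuple are hypothetical names,
and the real executor state carries much more than a single remembered
tuple identifier.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy heap tuple identifier: block number plus offset within the block. */
typedef struct ToyTid
{
	unsigned int block;
	unsigned short offset;
} ToyTid;

/* Toy scan state: remembers the last tuple identifier handed back. */
typedef struct ToyScanState
{
	bool		have_last;
	ToyTid		last;
} ToyScanState;

static bool
toy_tid_equal(ToyTid a, ToyTid b)
{
	return a.block == b.block && a.offset == b.offset;
}

/*
 * Decide whether a candidate tuple may be returned.  If the skip landed
 * on the same heap position as the previously returned tuple, reject it
 * so that the caller can skip again instead of returning a duplicate.
 */
static bool
toy_accept_tuple(ToyScanState *state, ToyTid candidate)
{
	assert(state != NULL);

	if (state->have_last && toy_tid_equal(state->last, candidate))
		return false;

	state->have_last = true;
	state->last = candidate;
	return true;
}
```

A caller would loop, skipping to the next key whenever toy_accept_tuple
returns false. Whether comparing tuple identifiers alone is sufficient is
exactly the open question raised in the message above.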

Attachments:

v32-0001-Unique-key.patch (text/x-diff; charset=us-ascii)

From 22e6b4ccd5f79ca069bd5cd90ba3696dd97f76ea Mon Sep 17 00:00:00 2001
From: jesperpedersen <jesper.pedersen@redhat.com>
Date: Tue, 9 Jul 2019 06:44:57 -0400
Subject: [PATCH v32 1/2] Unique key

Design by David Rowley.

Author: Jesper Pedersen
---
 src/backend/nodes/outfuncs.c           |  14 +++
 src/backend/nodes/print.c              |  39 +++++++
 src/backend/optimizer/path/Makefile    |   3 +-
 src/backend/optimizer/path/allpaths.c  |   8 ++
 src/backend/optimizer/path/indxpath.c  |  41 +++++++
 src/backend/optimizer/path/pathkeys.c  |  71 ++++++++++--
 src/backend/optimizer/path/uniquekey.c | 147 +++++++++++++++++++++++++
 src/backend/optimizer/plan/planagg.c   |   1 +
 src/backend/optimizer/plan/planmain.c  |   1 +
 src/backend/optimizer/plan/planner.c   |  17 ++-
 src/backend/optimizer/util/pathnode.c  |  12 ++
 src/include/nodes/nodes.h              |   1 +
 src/include/nodes/pathnodes.h          |  18 +++
 src/include/nodes/print.h              |   1 +
 src/include/optimizer/pathnode.h       |   1 +
 src/include/optimizer/paths.h          |  11 ++
 16 files changed, 373 insertions(+), 13 deletions(-)
 create mode 100644 src/backend/optimizer/path/uniquekey.c

diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index d76fae44b8..16083e7a7e 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -1723,6 +1723,7 @@ _outPathInfo(StringInfo str, const Path *node)
 	WRITE_FLOAT_FIELD(startup_cost, "%.2f");
 	WRITE_FLOAT_FIELD(total_cost, "%.2f");
 	WRITE_NODE_FIELD(pathkeys);
+	WRITE_NODE_FIELD(uniquekeys);
 }
 
 /*
@@ -2205,6 +2206,7 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
 	WRITE_NODE_FIELD(eq_classes);
 	WRITE_BOOL_FIELD(ec_merging_done);
 	WRITE_NODE_FIELD(canon_pathkeys);
+	WRITE_NODE_FIELD(canon_uniquekeys);
 	WRITE_NODE_FIELD(left_join_clauses);
 	WRITE_NODE_FIELD(right_join_clauses);
 	WRITE_NODE_FIELD(full_join_clauses);
@@ -2214,6 +2216,7 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
 	WRITE_NODE_FIELD(placeholder_list);
 	WRITE_NODE_FIELD(fkey_list);
 	WRITE_NODE_FIELD(query_pathkeys);
+	WRITE_NODE_FIELD(query_uniquekeys);
 	WRITE_NODE_FIELD(group_pathkeys);
 	WRITE_NODE_FIELD(window_pathkeys);
 	WRITE_NODE_FIELD(distinct_pathkeys);
@@ -2401,6 +2404,14 @@ _outPathKey(StringInfo str, const PathKey *node)
 	WRITE_BOOL_FIELD(pk_nulls_first);
 }
 
+static void
+_outUniqueKey(StringInfo str, const UniqueKey *node)
+{
+	WRITE_NODE_TYPE("UNIQUEKEY");
+
+	WRITE_NODE_FIELD(eq_clause);
+}
+
 static void
 _outPathTarget(StringInfo str, const PathTarget *node)
 {
@@ -4092,6 +4103,9 @@ outNode(StringInfo str, const void *obj)
 			case T_PathKey:
 				_outPathKey(str, obj);
 				break;
+			case T_UniqueKey:
+				_outUniqueKey(str, obj);
+				break;
 			case T_PathTarget:
 				_outPathTarget(str, obj);
 				break;
diff --git a/src/backend/nodes/print.c b/src/backend/nodes/print.c
index 42476724d8..d286b34544 100644
--- a/src/backend/nodes/print.c
+++ b/src/backend/nodes/print.c
@@ -459,6 +459,45 @@ print_pathkeys(const List *pathkeys, const List *rtable)
 	printf(")\n");
 }
 
+/*
+ * print_uniquekeys -
+ *	  uniquekeys list of UniqueKeys
+ */
+void
+print_uniquekeys(const List *uniquekeys, const List *rtable)
+{
+	ListCell   *l;
+
+	printf("(");
+	foreach(l, uniquekeys)
+	{
+		UniqueKey *unique_key = (UniqueKey *) lfirst(l);
+		EquivalenceClass *eclass = (EquivalenceClass *) unique_key->eq_clause;
+		ListCell   *k;
+		bool		first = true;
+
+		/* chase up */
+		while (eclass->ec_merged)
+			eclass = eclass->ec_merged;
+
+		printf("(");
+		foreach(k, eclass->ec_members)
+		{
+			EquivalenceMember *mem = (EquivalenceMember *) lfirst(k);
+
+			if (first)
+				first = false;
+			else
+				printf(", ");
+			print_expr((Node *) mem->em_expr, rtable);
+		}
+		printf(")");
+		if (lnext(uniquekeys, l))
+			printf(", ");
+	}
+	printf(")\n");
+}
+
 /*
  * print_tl
  *	  print targetlist in a more legible way.
diff --git a/src/backend/optimizer/path/Makefile b/src/backend/optimizer/path/Makefile
index 1e199ff66f..63cc1505d9 100644
--- a/src/backend/optimizer/path/Makefile
+++ b/src/backend/optimizer/path/Makefile
@@ -21,6 +21,7 @@ OBJS = \
 	joinpath.o \
 	joinrels.o \
 	pathkeys.o \
-	tidpath.o
+	tidpath.o \
+	uniquekey.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 8286d9cf34..bbc13e6141 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -3954,6 +3954,14 @@ print_path(PlannerInfo *root, Path *path, int indent)
 		print_pathkeys(path->pathkeys, root->parse->rtable);
 	}
 
+	if (path->uniquekeys)
+	{
+		for (i = 0; i < indent; i++)
+			printf("\t");
+		printf("  uniquekeys: ");
+		print_uniquekeys(path->uniquekeys, root->parse->rtable);
+	}
+
 	if (join)
 	{
 		JoinPath   *jp = (JoinPath *) path;
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 2a50272da6..bd1ea53e5c 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -189,6 +189,7 @@ static Expr *match_clause_to_ordering_op(IndexOptInfo *index,
 static bool ec_member_matches_indexcol(PlannerInfo *root, RelOptInfo *rel,
 									   EquivalenceClass *ec, EquivalenceMember *em,
 									   void *arg);
+static List *get_uniquekeys_for_index(PlannerInfo *root, List *pathkeys);
 
 
 /*
@@ -874,6 +875,7 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
 	List	   *orderbyclausecols;
 	List	   *index_pathkeys;
 	List	   *useful_pathkeys;
+	List	   *useful_uniquekeys = NIL;
 	bool		found_lower_saop_clause;
 	bool		pathkeys_possibly_useful;
 	bool		index_is_ordered;
@@ -1036,11 +1038,15 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
 	if (index_clauses != NIL || useful_pathkeys != NIL || useful_predicate ||
 		index_only_scan)
 	{
+		if (has_useful_uniquekeys(root))
+			useful_uniquekeys = get_uniquekeys_for_index(root, useful_pathkeys);
+
 		ipath = create_index_path(root, index,
 								  index_clauses,
 								  orderbyclauses,
 								  orderbyclausecols,
 								  useful_pathkeys,
+								  useful_uniquekeys,
 								  index_is_ordered ?
 								  ForwardScanDirection :
 								  NoMovementScanDirection,
@@ -1063,6 +1069,7 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
 									  orderbyclauses,
 									  orderbyclausecols,
 									  useful_pathkeys,
+									  useful_uniquekeys,
 									  index_is_ordered ?
 									  ForwardScanDirection :
 									  NoMovementScanDirection,
@@ -1093,11 +1100,15 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
 													index_pathkeys);
 		if (useful_pathkeys != NIL)
 		{
+			if (has_useful_uniquekeys(root))
+				useful_uniquekeys = get_uniquekeys_for_index(root, useful_pathkeys);
+
 			ipath = create_index_path(root, index,
 									  index_clauses,
 									  NIL,
 									  NIL,
 									  useful_pathkeys,
+									  useful_uniquekeys,
 									  BackwardScanDirection,
 									  index_only_scan,
 									  outer_relids,
@@ -1115,6 +1126,7 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
 										  NIL,
 										  NIL,
 										  useful_pathkeys,
+										  useful_uniquekeys,
 										  BackwardScanDirection,
 										  index_only_scan,
 										  outer_relids,
@@ -3365,6 +3377,35 @@ match_clause_to_ordering_op(IndexOptInfo *index,
 	return clause;
 }
 
+/*
+ * get_uniquekeys_for_index
+ */
+static List *
+get_uniquekeys_for_index(PlannerInfo *root, List *pathkeys)
+{
+	ListCell *lc;
+
+	if (pathkeys)
+	{
+		List *uniquekeys = NIL;
+		foreach(lc, pathkeys)
+		{
+			UniqueKey *unique_key;
+			PathKey *pk = (PathKey *) lfirst(lc);
+			EquivalenceClass *ec = (EquivalenceClass *) pk->pk_eclass;
+
+			unique_key = makeNode(UniqueKey);
+			unique_key->eq_clause = ec;
+
+			uniquekeys = lappend(uniquekeys, unique_key);
+		}
+
+		if (uniquekeys_contained_in(root->canon_uniquekeys, uniquekeys))
+			return uniquekeys;
+	}
+
+	return NIL;
+}
 
 /****************************************************************************
  *				----  ROUTINES TO DO PARTIAL INDEX PREDICATE TESTS	----
diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index 71b9d42c99..054df9a617 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -29,6 +29,7 @@
 #include "utils/lsyscache.h"
 
 
+static bool pathkey_is_unique(PathKey *new_pathkey, List *pathkeys);
 static bool pathkey_is_redundant(PathKey *new_pathkey, List *pathkeys);
 static bool matches_boolean_partition_clause(RestrictInfo *rinfo,
 											 RelOptInfo *partrel,
@@ -96,6 +97,29 @@ make_canonical_pathkey(PlannerInfo *root,
 	return pk;
 }
 
+/*
+ * pathkey_is_unique
+ *	   Checks if the new pathkey's equivalence class is the same as that of
+ *     any existing member of the pathkey list.
+ */
+static bool
+pathkey_is_unique(PathKey *new_pathkey, List *pathkeys)
+{
+	EquivalenceClass *new_ec = new_pathkey->pk_eclass;
+	ListCell   *lc;
+
+	/* If the same EC is already in the list, then not unique */
+	foreach(lc, pathkeys)
+	{
+		PathKey    *old_pathkey = (PathKey *) lfirst(lc);
+
+		if (new_ec == old_pathkey->pk_eclass)
+			return false;
+	}
+
+	return true;
+}
+
 /*
  * pathkey_is_redundant
  *	   Is a pathkey redundant with one already in the given list?
@@ -135,22 +159,12 @@ static bool
 pathkey_is_redundant(PathKey *new_pathkey, List *pathkeys)
 {
 	EquivalenceClass *new_ec = new_pathkey->pk_eclass;
-	ListCell   *lc;
 
 	/* Check for EC containing a constant --- unconditionally redundant */
 	if (EC_MUST_BE_REDUNDANT(new_ec))
 		return true;
 
-	/* If same EC already used in list, then redundant */
-	foreach(lc, pathkeys)
-	{
-		PathKey    *old_pathkey = (PathKey *) lfirst(lc);
-
-		if (new_ec == old_pathkey->pk_eclass)
-			return true;
-	}
-
-	return false;
+	return !pathkey_is_unique(new_pathkey, pathkeys);
 }
 
 /*
@@ -1098,6 +1112,41 @@ make_pathkeys_for_sortclauses(PlannerInfo *root,
 	return pathkeys;
 }
 
+/*
+ * make_pathkeys_for_uniquekeys
+ *		Generate a pathkeys list to be used for uniquekey clauses
+ */
+List *
+make_pathkeys_for_uniquekeys(PlannerInfo *root,
+							 List *sortclauses,
+							 List *tlist)
+{
+	List	   *pathkeys = NIL;
+	ListCell   *l;
+
+	foreach(l, sortclauses)
+	{
+		SortGroupClause *sortcl = (SortGroupClause *) lfirst(l);
+		Expr	   *sortkey;
+		PathKey    *pathkey;
+
+		sortkey = (Expr *) get_sortgroupclause_expr(sortcl, tlist);
+		Assert(OidIsValid(sortcl->sortop));
+		pathkey = make_pathkey_from_sortop(root,
+										   sortkey,
+										   root->nullable_baserels,
+										   sortcl->sortop,
+										   sortcl->nulls_first,
+										   sortcl->tleSortGroupRef,
+										   true);
+
+		if (pathkey_is_unique(pathkey, pathkeys))
+			pathkeys = lappend(pathkeys, pathkey);
+	}
+
+	return pathkeys;
+}
+
 /****************************************************************************
  *		PATHKEYS AND MERGECLAUSES
  ****************************************************************************/
diff --git a/src/backend/optimizer/path/uniquekey.c b/src/backend/optimizer/path/uniquekey.c
new file mode 100644
index 0000000000..13d4ebb98c
--- /dev/null
+++ b/src/backend/optimizer/path/uniquekey.c
@@ -0,0 +1,147 @@
+/*-------------------------------------------------------------------------
+ *
+ * uniquekey.c
+ *	  Utilities for matching and building unique keys
+ *
+ * Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ *	  src/backend/optimizer/path/uniquekey.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "optimizer/pathnode.h"
+#include "optimizer/paths.h"
+#include "nodes/pg_list.h"
+
+static UniqueKey *make_canonical_uniquekey(PlannerInfo *root, EquivalenceClass *eclass);
+
+/*
+ * Build a list of unique keys
+ */
+List*
+build_uniquekeys(PlannerInfo *root, List *sortclauses)
+{
+	List *result = NIL;
+	List *sortkeys;
+	ListCell *l;
+
+	sortkeys = make_pathkeys_for_uniquekeys(root,
+											sortclauses,
+											root->processed_tlist);
+
+	/* Create a uniquekey and add it to the list */
+	foreach(l, sortkeys)
+	{
+		PathKey    *pathkey = (PathKey *) lfirst(l);
+		EquivalenceClass *ec = pathkey->pk_eclass;
+		UniqueKey *unique_key = make_canonical_uniquekey(root, ec);
+
+		result = lappend(result, unique_key);
+	}
+
+	return result;
+}
+
+/*
+ * uniquekeys_contained_in
+ *	  Are the keys2 included in the keys1 superset
+ */
+bool
+uniquekeys_contained_in(List *keys1, List *keys2)
+{
+	ListCell   *key1,
+			   *key2;
+
+	/*
+	 * Fall out quickly if we are passed two identical lists.  This mostly
+	 * catches the case where both are NIL, but that's common enough to
+	 * warrant the test.
+	 */
+	if (keys1 == keys2)
+		return true;
+
+	foreach(key2, keys2)
+	{
+		bool found = false;
+		UniqueKey  *uniquekey2 = (UniqueKey *) lfirst(key2);
+
+		foreach(key1, keys1)
+		{
+			UniqueKey  *uniquekey1 = (UniqueKey *) lfirst(key1);
+
+			if (uniquekey1->eq_clause == uniquekey2->eq_clause)
+			{
+				found = true;
+				break;
+			}
+		}
+
+		if (!found)
+			return false;
+	}
+
+	return true;
+}
+
+/*
+ * has_useful_uniquekeys
+ *		Detect whether the planner could have any uniquekeys that are
+ *		useful.
+ */
+bool
+has_useful_uniquekeys(PlannerInfo *root)
+{
+	if (root->query_uniquekeys != NIL)
+		return true;	/* there are some */
+	return false;		/* definitely useless */
+}
+
+/*
+ * make_canonical_uniquekey
+ *	  Given the parameters for a UniqueKey, find any pre-existing matching
+ *	  uniquekey in the query's list of "canonical" uniquekeys.  Make a new
+ *	  entry if there's not one already.
+ *
+ * Note that this function must not be used until after we have completed
+ * merging EquivalenceClasses.  (We don't try to enforce that here; instead,
+ * equivclass.c will complain if a merge occurs after root->canon_uniquekeys
+ * has become nonempty.)
+ */
+static UniqueKey *
+make_canonical_uniquekey(PlannerInfo *root,
+						 EquivalenceClass *eclass)
+{
+	UniqueKey  *uk;
+	ListCell   *lc;
+	MemoryContext oldcontext;
+
+	/* The passed eclass might be non-canonical, so chase up to the top */
+	while (eclass->ec_merged)
+		eclass = eclass->ec_merged;
+
+	foreach(lc, root->canon_uniquekeys)
+	{
+		uk = (UniqueKey *) lfirst(lc);
+		if (eclass == uk->eq_clause)
+			return uk;
+	}
+
+	/*
+	 * Be sure canonical uniquekeys are allocated in the main planning context.
+	 * Not an issue in normal planning, but it is for GEQO.
+	 */
+	oldcontext = MemoryContextSwitchTo(root->planner_cxt);
+
+	uk = makeNode(UniqueKey);
+	uk->eq_clause = eclass;
+
+	root->canon_uniquekeys = lappend(root->canon_uniquekeys, uk);
+
+	MemoryContextSwitchTo(oldcontext);
+
+	return uk;
+}
diff --git a/src/backend/optimizer/plan/planagg.c b/src/backend/optimizer/plan/planagg.c
index 8634940efc..dd64775d8f 100644
--- a/src/backend/optimizer/plan/planagg.c
+++ b/src/backend/optimizer/plan/planagg.c
@@ -511,6 +511,7 @@ minmax_qp_callback(PlannerInfo *root, void *extra)
 									  root->parse->targetList);
 
 	root->query_pathkeys = root->sort_pathkeys;
+	root->query_uniquekeys = NIL;
 }
 
 /*
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 62dfc6d44a..3a372af91b 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -70,6 +70,7 @@ query_planner(PlannerInfo *root,
 	root->join_rel_level = NULL;
 	root->join_cur_level = 0;
 	root->canon_pathkeys = NIL;
+	root->canon_uniquekeys = NIL;
 	root->left_join_clauses = NIL;
 	root->right_join_clauses = NIL;
 	root->full_join_clauses = NIL;
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index d6f2153593..984fca0696 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3657,15 +3657,30 @@ standard_qp_callback(PlannerInfo *root, void *extra)
 	 * much easier, since we know that the parser ensured that one is a
 	 * superset of the other.
 	 */
+	root->query_uniquekeys = NIL;
+
 	if (root->group_pathkeys)
+	{
 		root->query_pathkeys = root->group_pathkeys;
+
+		if (!root->parse->hasAggs)
+			root->query_uniquekeys = build_uniquekeys(root, qp_extra->groupClause);
+	}
 	else if (root->window_pathkeys)
 		root->query_pathkeys = root->window_pathkeys;
 	else if (list_length(root->distinct_pathkeys) >
 			 list_length(root->sort_pathkeys))
+	{
 		root->query_pathkeys = root->distinct_pathkeys;
+		root->query_uniquekeys = build_uniquekeys(root, parse->distinctClause);
+	}
 	else if (root->sort_pathkeys)
+	{
 		root->query_pathkeys = root->sort_pathkeys;
+
+		if (root->distinct_pathkeys)
+			root->query_uniquekeys = build_uniquekeys(root, parse->distinctClause);
+	}
 	else
 		root->query_pathkeys = NIL;
 }
@@ -6222,7 +6237,7 @@ plan_cluster_use_sort(Oid tableOid, Oid indexOid)
 
 	/* Estimate the cost of index scan */
 	indexScanPath = create_index_path(root, indexInfo,
-									  NIL, NIL, NIL, NIL,
+									  NIL, NIL, NIL, NIL, NIL,
 									  ForwardScanDirection, false,
 									  NULL, 1.0, false);
 
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index e6d08aede5..a006dbbe9c 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -940,6 +940,7 @@ create_seqscan_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->parallel_safe = rel->consider_parallel;
 	pathnode->parallel_workers = parallel_workers;
 	pathnode->pathkeys = NIL;	/* seqscan has unordered result */
+	pathnode->uniquekeys = NIL;
 
 	cost_seqscan(pathnode, root, rel, pathnode->param_info);
 
@@ -964,6 +965,7 @@ create_samplescan_path(PlannerInfo *root, RelOptInfo *rel, Relids required_outer
 	pathnode->parallel_safe = rel->consider_parallel;
 	pathnode->parallel_workers = 0;
 	pathnode->pathkeys = NIL;	/* samplescan has unordered result */
+	pathnode->uniquekeys = NIL;
 
 	cost_samplescan(pathnode, root, rel, pathnode->param_info);
 
@@ -1000,6 +1002,7 @@ create_index_path(PlannerInfo *root,
 				  List *indexorderbys,
 				  List *indexorderbycols,
 				  List *pathkeys,
+				  List *uniquekeys,
 				  ScanDirection indexscandir,
 				  bool indexonly,
 				  Relids required_outer,
@@ -1018,6 +1021,7 @@ create_index_path(PlannerInfo *root,
 	pathnode->path.parallel_safe = rel->consider_parallel;
 	pathnode->path.parallel_workers = 0;
 	pathnode->path.pathkeys = pathkeys;
+	pathnode->path.uniquekeys = uniquekeys;
 
 	pathnode->indexinfo = index;
 	pathnode->indexclauses = indexclauses;
@@ -1061,6 +1065,7 @@ create_bitmap_heap_path(PlannerInfo *root,
 	pathnode->path.parallel_safe = rel->consider_parallel;
 	pathnode->path.parallel_workers = parallel_degree;
 	pathnode->path.pathkeys = NIL;	/* always unordered */
+	pathnode->path.uniquekeys = NIL;
 
 	pathnode->bitmapqual = bitmapqual;
 
@@ -1922,6 +1927,7 @@ create_functionscan_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->parallel_safe = rel->consider_parallel;
 	pathnode->parallel_workers = 0;
 	pathnode->pathkeys = pathkeys;
+	pathnode->uniquekeys = NIL;
 
 	cost_functionscan(pathnode, root, rel, pathnode->param_info);
 
@@ -1948,6 +1954,7 @@ create_tablefuncscan_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->parallel_safe = rel->consider_parallel;
 	pathnode->parallel_workers = 0;
 	pathnode->pathkeys = NIL;	/* result is always unordered */
+	pathnode->uniquekeys = NIL;
 
 	cost_tablefuncscan(pathnode, root, rel, pathnode->param_info);
 
@@ -1974,6 +1981,7 @@ create_valuesscan_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->parallel_safe = rel->consider_parallel;
 	pathnode->parallel_workers = 0;
 	pathnode->pathkeys = NIL;	/* result is always unordered */
+	pathnode->uniquekeys = NIL;
 
 	cost_valuesscan(pathnode, root, rel, pathnode->param_info);
 
@@ -1999,6 +2007,7 @@ create_ctescan_path(PlannerInfo *root, RelOptInfo *rel, Relids required_outer)
 	pathnode->parallel_safe = rel->consider_parallel;
 	pathnode->parallel_workers = 0;
 	pathnode->pathkeys = NIL;	/* XXX for now, result is always unordered */
+	pathnode->uniquekeys = NIL;
 
 	cost_ctescan(pathnode, root, rel, pathnode->param_info);
 
@@ -2025,6 +2034,7 @@ create_namedtuplestorescan_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->parallel_safe = rel->consider_parallel;
 	pathnode->parallel_workers = 0;
 	pathnode->pathkeys = NIL;	/* result is always unordered */
+	pathnode->uniquekeys = NIL;
 
 	cost_namedtuplestorescan(pathnode, root, rel, pathnode->param_info);
 
@@ -2051,6 +2061,7 @@ create_resultscan_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->parallel_safe = rel->consider_parallel;
 	pathnode->parallel_workers = 0;
 	pathnode->pathkeys = NIL;	/* result is always unordered */
+	pathnode->uniquekeys = NIL;
 
 	cost_resultscan(pathnode, root, rel, pathnode->param_info);
 
@@ -2077,6 +2088,7 @@ create_worktablescan_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->parallel_safe = rel->consider_parallel;
 	pathnode->parallel_workers = 0;
 	pathnode->pathkeys = NIL;	/* result is always unordered */
+	pathnode->uniquekeys = NIL;
 
 	/* Cost is the same as for a regular CTE scan */
 	cost_ctescan(pathnode, root, rel, pathnode->param_info);
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index baced7eec0..a1511b46ea 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -261,6 +261,7 @@ typedef enum NodeTag
 	T_EquivalenceMember,
 	T_PathKey,
 	T_PathTarget,
+	T_UniqueKey,
 	T_RestrictInfo,
 	T_IndexClause,
 	T_PlaceHolderVar,
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 3d3be197e0..4e329f0fb5 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -269,6 +269,8 @@ struct PlannerInfo
 
 	List	   *canon_pathkeys; /* list of "canonical" PathKeys */
 
+	List	   *canon_uniquekeys; /* list of "canonical" UniqueKeys */
+
 	List	   *left_join_clauses;	/* list of RestrictInfos for mergejoinable
 									 * outer join clauses w/nonnullable var on
 									 * left */
@@ -297,6 +299,8 @@ struct PlannerInfo
 
 	List	   *query_pathkeys; /* desired pathkeys for query_planner() */
 
+	List	   *query_uniquekeys; /* unique keys used for the query */
+
 	List	   *group_pathkeys; /* groupClause pathkeys, if any */
 	List	   *window_pathkeys;	/* pathkeys of bottom window, if any */
 	List	   *distinct_pathkeys;	/* distinctClause pathkeys, if any */
@@ -1077,6 +1081,15 @@ typedef struct ParamPathInfo
 	List	   *ppi_clauses;	/* join clauses available from outer rels */
 } ParamPathInfo;
 
+/*
+ * UniqueKey
+ */
+typedef struct UniqueKey
+{
+	NodeTag		type;
+
+	EquivalenceClass *eq_clause;	/* equivalence class */
+} UniqueKey;
 
 /*
  * Type "Path" is used as-is for sequential-scan paths, as well as some other
@@ -1106,6 +1119,9 @@ typedef struct ParamPathInfo
  *
  * "pathkeys" is a List of PathKey nodes (see above), describing the sort
  * ordering of the path's output rows.
+ *
+ * "uniquekeys", if not NIL, is a list of UniqueKey nodes (see above),
+ * describing the XXX.
  */
 typedef struct Path
 {
@@ -1129,6 +1145,8 @@ typedef struct Path
 
 	List	   *pathkeys;		/* sort ordering of path's output */
 	/* pathkeys is a List of PathKey nodes; see above */
+
+	List	   *uniquekeys;	/* the unique keys, or NIL if none */
 } Path;
 
 /* Macro for extracting a path's parameterization relids; beware double eval */
diff --git a/src/include/nodes/print.h b/src/include/nodes/print.h
index 6126b491bf..006248bfb5 100644
--- a/src/include/nodes/print.h
+++ b/src/include/nodes/print.h
@@ -28,6 +28,7 @@ extern char *pretty_format_node_dump(const char *dump);
 extern void print_rt(const List *rtable);
 extern void print_expr(const Node *expr, const List *rtable);
 extern void print_pathkeys(const List *pathkeys, const List *rtable);
+extern void print_uniquekeys(const List *uniquekeys, const List *rtable);
 extern void print_tl(const List *tlist, const List *rtable);
 extern void print_slot(TupleTableSlot *slot);
 
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index e450fe112a..f75ff6f323 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -44,6 +44,7 @@ extern IndexPath *create_index_path(PlannerInfo *root,
 									List *indexorderbys,
 									List *indexorderbycols,
 									List *pathkeys,
+									List *uniquekeys,
 									ScanDirection indexscandir,
 									bool indexonly,
 									Relids required_outer,
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 9ab73bd20c..5b6be383b3 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -214,6 +214,9 @@ extern List *build_join_pathkeys(PlannerInfo *root,
 extern List *make_pathkeys_for_sortclauses(PlannerInfo *root,
 										   List *sortclauses,
 										   List *tlist);
+extern List *make_pathkeys_for_uniquekeys(PlannerInfo *root,
+										  List *sortclauses,
+										  List *tlist);
 extern void initialize_mergeclause_eclasses(PlannerInfo *root,
 											RestrictInfo *restrictinfo);
 extern void update_mergeclause_eclasses(PlannerInfo *root,
@@ -240,4 +243,12 @@ extern PathKey *make_canonical_pathkey(PlannerInfo *root,
 extern void add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
 									List *live_childrels);
 
+/*
+ * uniquekey.c
+ *	  Utilities for matching and building unique keys
+ */
+extern List *build_uniquekeys(PlannerInfo *root, List *sortclauses);
+extern bool uniquekeys_contained_in(List *keys1, List *keys2);
+extern bool has_useful_uniquekeys(PlannerInfo *root);
+
 #endif							/* PATHS_H */
-- 
2.21.0

v32-0002-Index-skip-scan-visibility-check.patch (text/x-diff; charset=us-ascii)

From 1345f11ca221219a57a98552773963b4c3ceeac0 Mon Sep 17 00:00:00 2001
From: jesperpedersen <jesper.pedersen@redhat.com>
Date: Fri, 15 Nov 2019 09:46:53 -0500
Subject: [PATCH v32 2/2] Index skip scan

Implementation of Index Skip Scan (see Loose Index Scan in the wiki [1])
on top of IndexOnlyScan and IndexScan. To make it suitable both when
there is a small number of distinct values and when there is a
significant number of them, the following approach is taken: instead of
searching from the root for every value, we first search on the current
page, and only if the value is not found there do we continue searching
from the root.

Original patch and design were proposed by Thomas Munro [2], revived and
improved by Dmitry Dolgov and Jesper Pedersen.

[1] https://wiki.postgresql.org/wiki/Loose_indexscan
[2] https://www.postgresql.org/message-id/flat/CADLWmXXbTSBxP-MzJuPAYSsL_2f0iPm5VWPbCvDbVvfX93FKkw%40mail.gmail.com

Author: Jesper Pedersen, Dmitry Dolgov
Reviewed-by: Thomas Munro, David Rowley, Floris Van Nee, Kyotaro Horiguchi, Tomas Vondra, Peter Geoghegan
---
 contrib/bloom/blutils.c                       |   1 +
 doc/src/sgml/config.sgml                      |  15 +
 doc/src/sgml/indexam.sgml                     |  63 ++
 doc/src/sgml/indices.sgml                     |  23 +
 src/backend/access/brin/brin.c                |   1 +
 src/backend/access/gin/ginutil.c              |   1 +
 src/backend/access/gist/gist.c                |   1 +
 src/backend/access/hash/hash.c                |   1 +
 src/backend/access/index/indexam.c            |  18 +
 src/backend/access/nbtree/nbtree.c            |  13 +
 src/backend/access/nbtree/nbtsearch.c         | 466 +++++++++++++-
 src/backend/access/spgist/spgutils.c          |   1 +
 src/backend/commands/explain.c                |  25 +
 src/backend/executor/nodeIndexonlyscan.c      |  97 ++-
 src/backend/executor/nodeIndexscan.c          |  56 +-
 src/backend/nodes/copyfuncs.c                 |   2 +
 src/backend/nodes/outfuncs.c                  |   2 +
 src/backend/nodes/readfuncs.c                 |   2 +
 src/backend/optimizer/path/costsize.c         |   1 +
 src/backend/optimizer/plan/createplan.c       |  20 +-
 src/backend/optimizer/plan/planner.c          |  76 +++
 src/backend/optimizer/util/pathnode.c         |  39 ++
 src/backend/optimizer/util/plancat.c          |   1 +
 src/backend/utils/misc/guc.c                  |   9 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/access/amapi.h                    |   8 +
 src/include/access/genam.h                    |   2 +
 src/include/access/nbtree.h                   |   7 +
 src/include/access/sdir.h                     |   7 +
 src/include/nodes/execnodes.h                 |   6 +
 src/include/nodes/pathnodes.h                 |   5 +
 src/include/nodes/plannodes.h                 |   4 +
 src/include/optimizer/cost.h                  |   1 +
 src/include/optimizer/pathnode.h              |   5 +
 src/test/regress/expected/select_distinct.out | 601 ++++++++++++++++++
 src/test/regress/expected/sysviews.out        |   3 +-
 src/test/regress/sql/select_distinct.sql      | 248 ++++++++
 37 files changed, 1820 insertions(+), 12 deletions(-)
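
The approach described in the commit message above (look for the next distinct key on the current page first, and fall back to a search from the root only when it is not there) can be illustrated with a minimal sketch over a plain sorted array. This is not code from the patch: `first_greater`, `skip_distinct` and `PAGE_SIZE` are hypothetical names, a fixed-size slice of the array stands in for a btree leaf page, and a binary search over the remaining array stands in for a descent from the root.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the number of items on one leaf page. */
#define PAGE_SIZE 4

/*
 * Return the index of the first element in the sorted range keys[lo..hi)
 * that is strictly greater than 'key', or 'hi' if there is none.  This
 * mirrors a "nextkey" style search: land on the first item past the
 * current group of equal keys.
 */
static size_t
first_greater(const int *keys, size_t lo, size_t hi, int key)
{
	while (lo < hi)
	{
		size_t		mid = lo + (hi - lo) / 2;

		if (keys[mid] <= key)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}

/*
 * Collect the distinct values of a sorted array by skipping over groups
 * of equal keys.  'out' must have room for up to n values.  Returns the
 * number of distinct values written.
 */
static size_t
skip_distinct(const int *keys, size_t n, int *out)
{
	size_t		pos = 0;
	size_t		nout = 0;

	assert(n == 0 || (keys != NULL && out != NULL));

	while (pos < n)
	{
		size_t		page_end = (pos / PAGE_SIZE + 1) * PAGE_SIZE;

		if (page_end > n)
			page_end = n;

		out[nout++] = keys[pos];

		if (keys[page_end - 1] > keys[pos])
		{
			/* Cheap path: the next distinct key is on the current "page". */
			pos = first_greater(keys, pos, page_end, keys[pos]);
		}
		else
		{
			/* Not on this page: search the rest, as if from the root. */
			pos = first_greater(keys, page_end, n, keys[pos]);
		}
	}
	return nout;
}
```

With few distinct values the fallback search skips many "pages" at once; with many distinct values the per-page check avoids repeating a full search for every key, which is the trade-off the commit message describes.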

diff --git a/contrib/bloom/blutils.c b/contrib/bloom/blutils.c
index 0104d02f67..a018b7f3d0 100644
--- a/contrib/bloom/blutils.c
+++ b/contrib/bloom/blutils.c
@@ -133,6 +133,7 @@ blhandler(PG_FUNCTION_ARGS)
 	amroutine->ambulkdelete = blbulkdelete;
 	amroutine->amvacuumcleanup = blvacuumcleanup;
 	amroutine->amcanreturn = NULL;
+	amroutine->amskip = NULL;
 	amroutine->amcostestimate = blcostestimate;
 	amroutine->amoptions = bloptions;
 	amroutine->amproperty = NULL;
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index e07dc01e80..36ba75b077 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4517,6 +4517,21 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-enable-indexskipscan" xreflabel="enable_indexskipscan">
+      <term><varname>enable_indexskipscan</varname> (<type>boolean</type>)
+      <indexterm>
+       <primary><varname>enable_indexskipscan</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Enables or disables the query planner's use of index-skip-scan plan
+        types (see <xref linkend="indexes-index-skip-scans"/>). The default is
+        <literal>on</literal>.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-enable-material" xreflabel="enable_material">
       <term><varname>enable_material</varname> (<type>boolean</type>)
       <indexterm>
diff --git a/doc/src/sgml/indexam.sgml b/doc/src/sgml/indexam.sgml
index 37f8d8760a..a726d80878 100644
--- a/doc/src/sgml/indexam.sgml
+++ b/doc/src/sgml/indexam.sgml
@@ -148,6 +148,7 @@ typedef struct IndexAmRoutine
     amendscan_function amendscan;
     ammarkpos_function ammarkpos;       /* can be NULL */
     amrestrpos_function amrestrpos;     /* can be NULL */
+    amskip_function amskip;             /* can be NULL */
 
     /* interface functions to support parallel index scans */
     amestimateparallelscan_function amestimateparallelscan;    /* can be NULL */
@@ -691,6 +692,68 @@ amrestrpos (IndexScanDesc scan);
 
   <para>
 <programlisting>
+bool
+amskip (IndexScanDesc scan,
+        ScanDirection direction,
+        ScanDirection indexdir,
+        bool scanstart,
+        int prefix);
+</programlisting>
+  Skip past all tuples where the first 'prefix' columns have the same value as
+  the last tuple returned in the current scan. The arguments are:
+
+   <variablelist>
+    <varlistentry>
+     <term><parameter>scan</parameter></term>
+     <listitem>
+      <para>
+       Index scan information
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><parameter>direction</parameter></term>
+     <listitem>
+      <para>
+       The direction in which data is advancing.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><parameter>indexdir</parameter></term>
+     <listitem>
+      <para>
+        The index direction, in which data must be read.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><parameter>scanstart</parameter></term>
+     <listitem>
+      <para>
+        Whether this is the start of the scan.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><parameter>prefix</parameter></term>
+     <listitem>
+      <para>
+        Distinct prefix size.
+      </para>
+     </listitem>
+    </varlistentry>
+
+   </variablelist>
+
+  </para>
+
+  <para>
+<programlisting>
 Size
 amestimateparallelscan (void);
 </programlisting>
diff --git a/doc/src/sgml/indices.sgml b/doc/src/sgml/indices.sgml
index c54bf0dbbd..c429d98fc7 100644
--- a/doc/src/sgml/indices.sgml
+++ b/doc/src/sgml/indices.sgml
@@ -1254,6 +1254,29 @@ SELECT target FROM tests WHERE subject = 'some-subject' AND success;
    and later will recognize such cases and allow index-only scans to be
    generated, but older versions will not.
   </para>
+
+  <sect2 id="indexes-index-skip-scans">
+    <title>Index Skip Scans</title>
+
+    <indexterm zone="indexes-index-skip-scans">
+      <primary>index</primary>
+      <secondary>index-skip scans</secondary>
+    </indexterm>
+    <indexterm zone="indexes-index-skip-scans">
+      <primary>index-skip scan</primary>
+    </indexterm>
+
+    <para>
+     When the rows retrieved from an index scan are then deduplicated by
+     eliminating rows matching on a prefix of index keys (e.g. when using
+     <literal>SELECT DISTINCT</literal>), the planner will consider
+     skipping groups of rows with a matching key prefix. When a row with
+     a particular prefix is found, remaining rows with the same key prefix
+     are skipped.  The larger the number of rows with the same key prefix
+     (i.e. the lower the number of distinct key prefixes in the index),
+     the more efficient this is.
+    </para>
+  </sect2>
  </sect1>
 
 
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 2e8f67ef10..4db31bb211 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -113,6 +113,7 @@ brinhandler(PG_FUNCTION_ARGS)
 	amroutine->ambulkdelete = brinbulkdelete;
 	amroutine->amvacuumcleanup = brinvacuumcleanup;
 	amroutine->amcanreturn = NULL;
+	amroutine->amskip = NULL;
 	amroutine->amcostestimate = brincostestimate;
 	amroutine->amoptions = brinoptions;
 	amroutine->amproperty = NULL;
diff --git a/src/backend/access/gin/ginutil.c b/src/backend/access/gin/ginutil.c
index a7e55caf28..8dd1d30d2a 100644
--- a/src/backend/access/gin/ginutil.c
+++ b/src/backend/access/gin/ginutil.c
@@ -65,6 +65,7 @@ ginhandler(PG_FUNCTION_ARGS)
 	amroutine->ambulkdelete = ginbulkdelete;
 	amroutine->amvacuumcleanup = ginvacuumcleanup;
 	amroutine->amcanreturn = NULL;
+	amroutine->amskip = NULL;
 	amroutine->amcostestimate = gincostestimate;
 	amroutine->amoptions = ginoptions;
 	amroutine->amproperty = NULL;
diff --git a/src/backend/access/gist/gist.c b/src/backend/access/gist/gist.c
index aefc302ed2..8c692f7fb4 100644
--- a/src/backend/access/gist/gist.c
+++ b/src/backend/access/gist/gist.c
@@ -86,6 +86,7 @@ gisthandler(PG_FUNCTION_ARGS)
 	amroutine->ambulkdelete = gistbulkdelete;
 	amroutine->amvacuumcleanup = gistvacuumcleanup;
 	amroutine->amcanreturn = gistcanreturn;
+	amroutine->amskip = NULL;
 	amroutine->amcostestimate = gistcostestimate;
 	amroutine->amoptions = gistoptions;
 	amroutine->amproperty = gistproperty;
diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c
index 4871b7ff4d..e5fa4c7864 100644
--- a/src/backend/access/hash/hash.c
+++ b/src/backend/access/hash/hash.c
@@ -83,6 +83,7 @@ hashhandler(PG_FUNCTION_ARGS)
 	amroutine->ambulkdelete = hashbulkdelete;
 	amroutine->amvacuumcleanup = hashvacuumcleanup;
 	amroutine->amcanreturn = NULL;
+	amroutine->amskip = NULL;
 	amroutine->amcostestimate = hashcostestimate;
 	amroutine->amoptions = hashoptions;
 	amroutine->amproperty = NULL;
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 01539b6bd6..1047a35ade 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -33,6 +33,7 @@
  *		index_can_return	- does index support index-only scans?
  *		index_getprocid - get a support procedure OID
  *		index_getprocinfo - get a support procedure's lookup info
+ *		index_skip		- advance past duplicate key values in a scan
  *
  * NOTES
  *		This file contains the index_ routines which used
@@ -730,6 +731,23 @@ index_can_return(Relation indexRelation, int attno)
 	return indexRelation->rd_indam->amcanreturn(indexRelation, attno);
 }
 
+/* ----------------
+ *		index_skip
+ *
+ *		Skip past all tuples where the first 'prefix' columns have the
+ *		same value as the last tuple returned in the current scan.
+ * ----------------
+ */
+bool
+index_skip(IndexScanDesc scan, ScanDirection direction,
+		   ScanDirection indexdir, bool scanstart, int prefix)
+{
+	SCAN_CHECKS;
+
+	return scan->indexRelation->rd_indam->amskip(scan, direction,
+												 indexdir, scanstart, prefix);
+}
+
 /* ----------------
  *		index_getprocid
  *
diff --git a/src/backend/access/nbtree/nbtree.c b/src/backend/access/nbtree/nbtree.c
index 5254bc7ef5..8fde56fe60 100644
--- a/src/backend/access/nbtree/nbtree.c
+++ b/src/backend/access/nbtree/nbtree.c
@@ -132,6 +132,7 @@ bthandler(PG_FUNCTION_ARGS)
 	amroutine->ambulkdelete = btbulkdelete;
 	amroutine->amvacuumcleanup = btvacuumcleanup;
 	amroutine->amcanreturn = btcanreturn;
+	amroutine->amskip = btskip;
 	amroutine->amcostestimate = btcostestimate;
 	amroutine->amoptions = btoptions;
 	amroutine->amproperty = btproperty;
@@ -381,6 +382,8 @@ btbeginscan(Relation rel, int nkeys, int norderbys)
 	 */
 	so->currTuples = so->markTuples = NULL;
 
+	so->skipScanKey = NULL;
+
 	scan->xs_itupdesc = RelationGetDescr(rel);
 
 	scan->opaque = so;
@@ -448,6 +451,16 @@ btrescan(IndexScanDesc scan, ScanKey scankey, int nscankeys,
 	_bt_preprocess_array_keys(scan);
 }
 
+/*
+ * btskip() -- skip to the beginning of the next key prefix
+ */
+bool
+btskip(IndexScanDesc scan, ScanDirection direction,
+	   ScanDirection indexdir, bool start, int prefix)
+{
+	return _bt_skip(scan, direction, indexdir, start, prefix);
+}
+
 /*
  *	btendscan() -- close down a scan
  */
diff --git a/src/backend/access/nbtree/nbtsearch.c b/src/backend/access/nbtree/nbtsearch.c
index c573814f01..53518c6fdb 100644
--- a/src/backend/access/nbtree/nbtsearch.c
+++ b/src/backend/access/nbtree/nbtsearch.c
@@ -37,7 +37,10 @@ static bool _bt_parallel_readpage(IndexScanDesc scan, BlockNumber blkno,
 static Buffer _bt_walk_left(Relation rel, Buffer buf, Snapshot snapshot);
 static bool _bt_endpoint(IndexScanDesc scan, ScanDirection dir);
 static inline void _bt_initialize_more_data(BTScanOpaque so, ScanDirection dir);
-
+static inline void _bt_update_skip_scankeys(IndexScanDesc scan,
+											Relation indexRel);
+static inline bool _bt_scankey_within_page(IndexScanDesc scan, BTScanInsert key,
+										Buffer buf, ScanDirection dir);
 
 /*
  *	_bt_drop_lock_and_maybe_pin()
@@ -1375,6 +1378,416 @@ _bt_next(IndexScanDesc scan, ScanDirection dir)
 	return true;
 }
 
+/*
+ *  _bt_skip() -- Skip items that have the same prefix as the most recently
+ * 				  fetched index tuple.
+ *
+ * 		The current position is set so that a subsequent call to _bt_next will
+ * 		fetch the first tuple that differs in the leading 'prefix' keys.
+ *
+ * 		There are four different kinds of skipping (depending on dir and
+ * 		indexdir) that are important to distinguish, especially in the
+ * 		presence of an index condition:
+ *
+ * 		* Advancing forward and reading forward
+ * 			simple scan
+ *
+ * 		* Advancing forward and reading backward
+ * 			scan inside a cursor fetching backward, when skipping is necessary
+ * 			right from the start
+ *
+ * 		* Advancing backward and reading forward
+ * 			scan with order by desc inside a cursor fetching forward, when
+ * 			skipping is necessary right from the start
+ *
+ * 		* Advancing backward and reading backward
+ * 			simple scan with order by desc
+ *
+ *      The current page is searched for the next unique value. If none is found
+ *      we will do a scan from the root in order to find the next page with
+ *      a unique value.
+ */
+bool
+_bt_skip(IndexScanDesc scan, ScanDirection dir,
+		 ScanDirection indexdir, bool scanstart, int prefix)
+{
+	BTScanOpaque so = (BTScanOpaque) scan->opaque;
+	BTStack stack;
+	Buffer buf;
+	OffsetNumber offnum;
+	BTScanPosItem *currItem;
+	Relation 	 indexRel = scan->indexRelation;
+
+	/* We want to return tuples, and we need a starting point */
+	Assert(scan->xs_want_itup);
+	Assert(scan->xs_itup);
+
+	if (so->numKilled > 0)
+		_bt_killitems(scan);
+
+	/* If skipScanKey is NULL then we initialize it with _bt_mkscankey */
+	if (so->skipScanKey == NULL)
+	{
+		so->skipScanKey = _bt_mkscankey(indexRel, scan->xs_itup);
+		so->skipScanKey->keysz = prefix;
+		so->skipScanKey->scantid = NULL;
+	}
+	so->skipScanKey->nextkey = ScanDirectionIsForward(dir);
+	_bt_update_skip_scankeys(scan, indexRel);
+
+	/* Check if the next unique key can be found within the current page.
+	 * Since we do not lock the current page between jumps, it's possible
+	 * that it was split since the last time we saw it. This is fine when
+	 * scanning forward, since pages split to the right and we are still
+	 * on the leftmost page. When scanning backward it's possible to lose
+	 * some pages, so we would need to remember the previous page and then
+	 * follow the right link from the current page until we find the
+	 * original one.
+	 *
+	 * The whole point of checking the current page is to protect
+	 * ourselves against, and speed up, the case of a statistics mismatch
+	 * where there are too many distinct values for jumping to pay off.
+	 * It's not clear that the complexity of handling a backward scan is
+	 * justified for that, so for now just avoid it.
+	 */
+	if (BufferIsValid(so->currPos.buf) && ScanDirectionIsForward(dir))
+	{
+		LockBuffer(so->currPos.buf, BT_READ);
+
+		if (_bt_scankey_within_page(scan, so->skipScanKey, so->currPos.buf, dir))
+		{
+			bool keyFound = false;
+
+			offnum = _bt_binsrch(scan->indexRelation, so->skipScanKey, so->currPos.buf);
+
+			/* Lock the page for SERIALIZABLE transactions */
+			PredicateLockPage(scan->indexRelation, BufferGetBlockNumber(so->currPos.buf),
+							  scan->xs_snapshot);
+
+			/* We know in which direction to look */
+			_bt_initialize_more_data(so, dir);
+
+			/* Now read the data */
+			keyFound = _bt_readpage(scan, dir, offnum);
+
+			LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+			ReleaseBuffer(so->currPos.buf);
+			so->currPos.buf = InvalidBuffer;
+
+			if (keyFound)
+			{
+				/* set IndexTuple */
+				currItem = &so->currPos.items[so->currPos.itemIndex];
+				scan->xs_heaptid = currItem->heapTid;
+				scan->xs_itup = (IndexTuple) (so->currTuples + currItem->tupleOffset);
+				return true;
+			}
+		}
+		else
+		{
+			LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+		}
+	}
+
+	if (BufferIsValid(so->currPos.buf))
+	{
+		ReleaseBuffer(so->currPos.buf);
+		so->currPos.buf = InvalidBuffer;
+	}
+
+	/*
+	 * We haven't found scan key within the current page, so let's scan from
+	 * the root. Use _bt_search and _bt_binsrch to get the buffer and offset
+	 * number
+	 */
+	so->skipScanKey->nextkey = ScanDirectionIsForward(dir);
+	stack = _bt_search(scan->indexRelation, so->skipScanKey,
+					   &buf, BT_READ, scan->xs_snapshot);
+	_bt_freestack(stack);
+	so->currPos.buf = buf;
+	offnum = _bt_binsrch(scan->indexRelation, so->skipScanKey, buf);
+
+	/* Lock the page for SERIALIZABLE transactions */
+	PredicateLockPage(scan->indexRelation, BufferGetBlockNumber(buf),
+					  scan->xs_snapshot);
+
+	/* We know in which direction to look */
+	_bt_initialize_more_data(so, dir);
+
+	/*
+	 * Simplest case is when both directions are forward, when we are already
+	 * at the next distinct key at the beginning of the series (so everything
+	 * else would be done in _bt_readpage)
+	 *
+	 * The case when both directions are backwards is also simple, but we need
+	 * to go one step back, since we need a last element from the previous
+	 * series.
+	 */
+	if (ScanDirectionIsBackward(dir) && ScanDirectionIsBackward(indexdir))
+		 offnum = OffsetNumberPrev(offnum);
+
+	/*
+	 * Advance backward but read forward. At this moment we are at the next
+	 * distinct key at the beginning of the series. If the scan has just
+	 * started, we can read forward without doing anything else. Otherwise
+	 * find the previous distinct key and the beginning of its series and read
+	 * forward from there. To do so, go back one step, perform binary search
+	 * to find the first item in the series and let _bt_readpage do everything
+	 * else.
+	 */
+	else if (ScanDirectionIsBackward(dir) && ScanDirectionIsForward(indexdir))
+	{
+		if (!scanstart)
+		{
+			/* Reading forward means we expect to see more data on the right */
+			so->currPos.moreRight = true;
+
+			offnum = _bt_binsrch(scan->indexRelation, so->skipScanKey, buf);
+
+			/* One step back to find a previous value */
+			_bt_readpage(scan, dir, offnum);
+
+			LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+			if (_bt_next(scan, dir))
+			{
+				LockBuffer(so->currPos.buf, BT_READ);
+				_bt_update_skip_scankeys(scan, indexRel);
+
+				/*
+				 * And now find the last item from the sequence for the
+				 * current value, with the intention to do OffsetNumberNext.
+				 * As a result we end up on the first element of the sequence.
+				 */
+				if (_bt_scankey_within_page(scan, so->skipScanKey, so->currPos.buf, dir))
+					offnum = _bt_binsrch(scan->indexRelation, so->skipScanKey, buf);
+				else
+				{
+					if (BufferIsValid(so->currPos.buf))
+					{
+						/* Before leaving current page, deal with any killed items */
+						if (so->numKilled > 0)
+							_bt_killitems(scan);
+
+						LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+						ReleaseBuffer(so->currPos.buf);
+						so->currPos.buf = InvalidBuffer;
+					}
+
+					stack = _bt_search(scan->indexRelation, so->skipScanKey,
+									   &buf, BT_READ, scan->xs_snapshot);
+					_bt_freestack(stack);
+					so->currPos.buf = buf;
+					offnum = _bt_binsrch(scan->indexRelation, so->skipScanKey, buf);
+				}
+			}
+			else
+			{
+				pfree(so->skipScanKey);
+				so->skipScanKey = NULL;
+				return false;
+			}
+		}
+	}
+
+	/*
+	 * Advance forward but read backward. At this moment we are at the next
+	 * distinct key at the beginning of the series. If the scan has just
+	 * started, we can go one step back and read forward without doing
+	 * anything else. Otherwise find the next distinct key and the beginning
+	 * of its series, go one step back and read backward from there.
+	 *
+	 * An interesting situation can happen if one of the distinct keys does
+	 * not pass a corresponding index condition at all. In this case reading
+	 * backward can lead to a previous distinct key being found, creating a
+	 * loop. To avoid that, check the value to be returned, and jump one more
+	 * time if it's the same as at the beginning. Note that we do not check
+	 * visibility here, and dead tuples could also lead to the same situation.
+	 * This has to be checked on the caller side.
+	 */
+	else if (ScanDirectionIsForward(dir) && ScanDirectionIsBackward(indexdir))
+	{
+		if (scanstart)
+			offnum = OffsetNumberPrev(offnum);
+		else
+		{
+			OffsetNumber nextOffset,
+						startOffset,
+						jumpOffset;
+
+			IndexTuple startItup = CopyIndexTuple(scan->xs_itup);
+			Page page = BufferGetPage(so->currPos.buf);
+
+			/* We are at the end and need to return */
+			if ((offnum > PageGetMaxOffsetNumber(page)) &&
+				(so->currPos.nextPage == P_NONE))
+			{
+				LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+
+				BTScanPosUnpinIfPinned(so->currPos);
+				BTScanPosInvalidate(so->currPos);
+
+				pfree(so->skipScanKey);
+				so->skipScanKey = NULL;
+				return false;
+			}
+
+			nextOffset = startOffset = ItemPointerGetOffsetNumber(&scan->xs_itup->t_tid);
+
+			/* Reading backwards means we expect to see more data on the left */
+			so->currPos.moreLeft = true;
+
+			while (nextOffset == startOffset)
+			{
+				IndexTuple itup;
+				CHECK_FOR_INTERRUPTS();
+
+				/*
+				 * Find a next index tuple to update scan key. It could be at
+				 * the end, so check for max offset
+				 */
+				if (!_bt_readpage(scan, ForwardScanDirection, offnum))
+				{
+					/*
+					 * There's no actually-matching data on this page.  Try to
+					 * advance to the next page. Return false if there's no
+					 * matching data at all.
+					 */
+					LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+					if (!_bt_steppage(scan, dir))
+					{
+						pfree(so->skipScanKey);
+						so->skipScanKey = NULL;
+						return false;
+					}
+					LockBuffer(so->currPos.buf, BT_READ);
+				}
+
+				currItem = &so->currPos.items[so->currPos.firstItem];
+				itup = (IndexTuple) (so->currTuples + currItem->tupleOffset);
+
+				scan->xs_itup = itup;
+				so->skipScanKey->nextkey = ScanDirectionIsForward(dir);
+
+				_bt_update_skip_scankeys(scan, indexRel);
+				if (BufferIsValid(so->currPos.buf))
+				{
+					/* Before leaving current page, deal with any killed items */
+					if (so->numKilled > 0)
+						_bt_killitems(scan);
+
+					LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+					ReleaseBuffer(so->currPos.buf);
+					so->currPos.buf = InvalidBuffer;
+				}
+
+				stack = _bt_search(scan->indexRelation, so->skipScanKey,
+								   &buf, BT_READ, scan->xs_snapshot);
+				_bt_freestack(stack);
+				so->currPos.buf = buf;
+				jumpOffset = offnum = _bt_binsrch(scan->indexRelation, so->skipScanKey, buf);
+				offnum = OffsetNumberPrev(offnum);
+
+				if (!_bt_readpage(scan, indexdir, offnum))
+				{
+					/*
+					 * There's no actually-matching data on this page.  Try to
+					 * advance to the next page. Return false if there's no
+					 * matching data at all.
+					 */
+					LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+					if (!_bt_steppage(scan, indexdir))
+					{
+						pfree(so->skipScanKey);
+						so->skipScanKey = NULL;
+						return false;
+					}
+					LockBuffer(so->currPos.buf, BT_READ);
+				}
+
+				currItem = &so->currPos.items[so->currPos.lastItem];
+				itup = (IndexTuple) (so->currTuples + currItem->tupleOffset);
+				nextOffset = ItemPointerGetOffsetNumber(&itup->t_tid);
+
+				/*
+				 * To check if we returned the same tuple, try to find
+				 * startItup on the current page. For that we need to update
+				 * the scankey to match the whole tuple and set nextkey to
+				 * return the exact tuple, not the next one. If nextOffset is
+				 * the same as before, we are in a loop: return offnum to the
+				 * original position and jump further.
+				 */
+				scan->xs_itup = startItup;
+				_bt_update_skip_scankeys(scan, indexRel);
+
+				so->skipScanKey->keysz = IndexRelationGetNumberOfKeyAttributes(indexRel);
+				so->skipScanKey->nextkey = false;
+
+				if (_bt_scankey_within_page(scan, so->skipScanKey,
+											so->currPos.buf, dir))
+				{
+					OffsetNumber maxoff;
+					startOffset = _bt_binsrch(scan->indexRelation,
+											  so->skipScanKey,
+											  so->currPos.buf);
+
+					page = BufferGetPage(so->currPos.buf);
+					maxoff = PageGetMaxOffsetNumber(page);
+
+					if (nextOffset <= startOffset)
+					{
+						offnum = jumpOffset;
+						nextOffset = startOffset;
+					}
+
+					if ((offnum > maxoff) && (so->currPos.nextPage == P_NONE))
+					{
+						LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+
+						BTScanPosUnpinIfPinned(so->currPos);
+						BTScanPosInvalidate(so->currPos);
+
+						pfree(so->skipScanKey);
+						so->skipScanKey = NULL;
+						return false;
+					}
+				}
+
+				/* Return original scankey options */
+				so->skipScanKey->keysz = prefix;
+				so->skipScanKey->nextkey = ScanDirectionIsForward(dir);
+			}
+		}
+	}
+
+	/* Now read the data */
+	if (!_bt_readpage(scan, indexdir, offnum))
+	{
+		/*
+		 * There's no actually-matching data on this page.  Try to advance to
+		 * the next page.  Return false if there's no matching data at all.
+		 */
+		LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+		if (!_bt_steppage(scan, dir))
+		{
+			pfree(so->skipScanKey);
+			so->skipScanKey = NULL;
+			return false;
+		}
+	}
+	else
+	{
+		/* Drop the lock, and maybe the pin, on the current page */
+		LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+	}
+
+	/* And set IndexTuple */
+	currItem = &so->currPos.items[so->currPos.itemIndex];
+	scan->xs_heaptid = currItem->heapTid;
+	scan->xs_itup = (IndexTuple) (so->currTuples + currItem->tupleOffset);
+
+	return true;
+}
+
 /*
  *	_bt_readpage() -- Load data from current index page into so->currPos
  *
@@ -2246,3 +2659,54 @@ _bt_initialize_more_data(BTScanOpaque so, ScanDirection dir)
 	so->numKilled = 0;			/* just paranoia */
 	so->markItemIndex = -1;		/* ditto */
 }
+
+/*
+ * _bt_update_skip_scankeys() -- set up new values for the existing scankeys
+ * 								 based on the current index tuple
+ */
+static inline void
+_bt_update_skip_scankeys(IndexScanDesc scan, Relation indexRel)
+{
+	TupleDesc		itupdesc;
+	int			indnkeyatts,
+				i;
+	BTScanOpaque 	so = (BTScanOpaque) scan->opaque;
+	ScanKey			scankeys = so->skipScanKey->scankeys;
+
+	itupdesc = RelationGetDescr(indexRel);
+	indnkeyatts = IndexRelationGetNumberOfKeyAttributes(indexRel);
+	for (i = 0; i < indnkeyatts; i++)
+	{
+		Datum datum;
+		bool null;
+		int flags;
+
+		datum = index_getattr(scan->xs_itup, i + 1, itupdesc, &null);
+		flags = (null ? SK_ISNULL : 0) |
+				(indexRel->rd_indoption[i] << SK_BT_INDOPTION_SHIFT);
+		scankeys[i].sk_flags = flags;
+		scankeys[i].sk_argument = datum;
+	}
+}
+
+/*
+ * _bt_scankey_within_page() -- check if the provided scankey could be found
+ * 								within a page, specified by the buffer.
+ */
+static inline bool
+_bt_scankey_within_page(IndexScanDesc scan, BTScanInsert key,
+						Buffer buf, ScanDirection dir)
+{
+	OffsetNumber low, high;
+	Page page = BufferGetPage(buf);
+	BTPageOpaque opaque = (BTPageOpaque) PageGetSpecialPointer(page);
+
+	low = P_FIRSTDATAKEY(opaque);
+	high = PageGetMaxOffsetNumber(page);
+
+	if (unlikely(high < low))
+		return false;
+
+	return (_bt_compare(scan->indexRelation, key, page, low) > 0 &&
+			_bt_compare(scan->indexRelation, key, page, high) < 1);
+}
diff --git a/src/backend/access/spgist/spgutils.c b/src/backend/access/spgist/spgutils.c
index 4924ae1c59..fa09a4685e 100644
--- a/src/backend/access/spgist/spgutils.c
+++ b/src/backend/access/spgist/spgutils.c
@@ -68,6 +68,7 @@ spghandler(PG_FUNCTION_ARGS)
 	amroutine->ambulkdelete = spgbulkdelete;
 	amroutine->amvacuumcleanup = spgvacuumcleanup;
 	amroutine->amcanreturn = spgcanreturn;
+	amroutine->amskip = NULL;
 	amroutine->amcostestimate = spgcostestimate;
 	amroutine->amoptions = spgoptions;
 	amroutine->amproperty = spgproperty;
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index c367c750b1..a7dd874531 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -141,6 +141,7 @@ static void ExplainXMLTag(const char *tagname, int flags, ExplainState *es);
 static void ExplainIndentText(ExplainState *es);
 static void ExplainJSONLineEnding(ExplainState *es);
 static void ExplainYAMLLineStarting(ExplainState *es);
+static void ExplainIndexSkipScanKeys(int skipPrefixSize, ExplainState *es);
 static void escape_yaml(StringInfo buf, const char *str);
 
 
@@ -1052,6 +1053,22 @@ ExplainPreScanNode(PlanState *planstate, Bitmapset **rels_used)
 	return planstate_tree_walker(planstate, ExplainPreScanNode, rels_used);
 }
 
+/*
+ * ExplainIndexSkipScanKeys -
+ *	  Append information about index skip scan to es->str.
+ *
+ * Can be used to print the skip prefix size.
+ */
+static void
+ExplainIndexSkipScanKeys(int skipPrefixSize, ExplainState *es)
+{
+	if (skipPrefixSize > 0)
+	{
+		if (es->format != EXPLAIN_FORMAT_TEXT)
+			ExplainPropertyInteger("Distinct Prefix", NULL, skipPrefixSize, es);
+	}
+}
+
 /*
  * ExplainNode -
  *	  Appends a description of a plan tree to es->str
@@ -1386,6 +1403,8 @@ ExplainNode(PlanState *planstate, List *ancestors,
 			{
 				IndexScan  *indexscan = (IndexScan *) plan;
 
+				ExplainIndexSkipScanKeys(indexscan->indexskipprefixsize, es);
+
 				ExplainIndexScanDetails(indexscan->indexid,
 										indexscan->indexorderdir,
 										es);
@@ -1396,6 +1415,8 @@ ExplainNode(PlanState *planstate, List *ancestors,
 			{
 				IndexOnlyScan *indexonlyscan = (IndexOnlyScan *) plan;
 
+				ExplainIndexSkipScanKeys(indexonlyscan->indexskipprefixsize, es);
+
 				ExplainIndexScanDetails(indexonlyscan->indexid,
 										indexonlyscan->indexorderdir,
 										es);
@@ -1655,6 +1676,8 @@ ExplainNode(PlanState *planstate, List *ancestors,
 	switch (nodeTag(plan))
 	{
 		case T_IndexScan:
+			if (((IndexScan *) plan)->indexskipprefixsize > 0)
+				ExplainPropertyBool("Skip scan", true, es);
 			show_scan_qual(((IndexScan *) plan)->indexqualorig,
 						   "Index Cond", planstate, ancestors, es);
 			if (((IndexScan *) plan)->indexqualorig)
@@ -1668,6 +1691,8 @@ ExplainNode(PlanState *planstate, List *ancestors,
 										   planstate, es);
 			break;
 		case T_IndexOnlyScan:
+			if (((IndexOnlyScan *) plan)->indexskipprefixsize > 0)
+				ExplainPropertyBool("Skip scan", true, es);
 			show_scan_qual(((IndexOnlyScan *) plan)->indexqual,
 						   "Index Cond", planstate, ancestors, es);
 			if (((IndexOnlyScan *) plan)->indexqual)
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 5617ac29e7..c4e4b087a7 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -41,6 +41,7 @@
 #include "miscadmin.h"
 #include "storage/bufmgr.h"
 #include "storage/predicate.h"
+#include "storage/itemptr.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
 
@@ -62,9 +63,26 @@ IndexOnlyNext(IndexOnlyScanState *node)
 	EState	   *estate;
 	ExprContext *econtext;
 	ScanDirection direction;
+	ScanDirection readDirection;
 	IndexScanDesc scandesc;
 	TupleTableSlot *slot;
 	ItemPointer tid;
+	ItemPointerData startTid;
+	IndexOnlyScan *indexonlyscan = (IndexOnlyScan *) node->ss.ps.plan;
+
+	/*
+	 * Tells if the current position was reached via skipping. In this case
+	 * there is no need to call index_getnext_tid.
+	 */
+	bool skipped = false;
+
+	/*
+	 * Index only scan must be aware that in case of skipping we can return to
+	 * the starting point due to visibility checks. In this situation we need
+	 * to jump further, and the number of skipping attempts tells us how far
+	 * we need to go.
+	 */
+	int skipAttempts = 0;
 
 	/*
 	 * extract necessary information from index scan node
@@ -72,7 +90,7 @@ IndexOnlyNext(IndexOnlyScanState *node)
 	estate = node->ss.ps.state;
 	direction = estate->es_direction;
 	/* flip direction if this is an overall backward scan */
-	if (ScanDirectionIsBackward(((IndexOnlyScan *) node->ss.ps.plan)->indexorderdir))
+	if (ScanDirectionIsBackward(indexonlyscan->indexorderdir))
 	{
 		if (ScanDirectionIsForward(direction))
 			direction = BackwardScanDirection;
@@ -114,16 +132,87 @@ IndexOnlyNext(IndexOnlyScanState *node)
 						 node->ioss_OrderByKeys,
 						 node->ioss_NumOrderByKeys);
 	}
+	else
+	{
+		ItemPointerCopy(&scandesc->xs_heaptid, &startTid);
+	}
+
+	/*
+	 * Check if we need to skip to the next key prefix, because we've been
+	 * asked to implement DISTINCT.
+	 *
+	 * When fetching from a cursor in the direction opposite to the general
+	 * scan direction, the result must be what normal fetching would have
+	 * returned, but in reversed order. In other words, return the last or
+	 * first scanned tuple in a DISTINCT set, depending on the cursor
+	 * direction. Because of that we also skip when the first tuple has not
+	 * been emitted yet but the directions are opposite.
+	 */
+	if (node->ioss_SkipPrefixSize > 0 &&
+		(node->ioss_FirstTupleEmitted ||
+		 ScanDirectionsAreOpposite(direction, indexonlyscan->indexorderdir)))
+	{
+		if (!index_skip(scandesc, direction, indexonlyscan->indexorderdir,
+						!node->ioss_FirstTupleEmitted, node->ioss_SkipPrefixSize))
+		{
+			/*
+			 * Reached end of index. At this point currPos is invalidated, and
+			 * we need to reset ioss_FirstTupleEmitted, since otherwise after
+			 * going backwards, reaching the end of index, and going forward
+			 * again we apply skip again. It would be incorrect and lead to an
+			 * extra skipped item.
+			 */
+			node->ioss_FirstTupleEmitted = false;
+			return ExecClearTuple(slot);
+		}
+		else
+		{
+			skipAttempts = 1;
+			skipped = true;
+			tid = &scandesc->xs_heaptid;
+		}
+	}
+
+	readDirection = skipped ? indexonlyscan->indexorderdir : direction;
 
 	/*
 	 * OK, now that we have what we need, fetch the next tuple.
 	 */
-	while ((tid = index_getnext_tid(scandesc, direction)) != NULL)
+	while (skipped || (tid = index_getnext_tid(scandesc, readDirection)) != NULL)
 	{
 		bool		tuple_from_heap = false;
 
 		CHECK_FOR_INTERRUPTS();
 
+		/*
+		 * While doing index only skip scan with advancing and reading in
+		 * different directions we can return to the same position where we
+		 * started after visibility check. Recognize such situations and skip
+		 * more.
+		 */
+		if ((readDirection != direction) &&
+			ItemPointerIsValid(&startTid) && ItemPointerEquals(&startTid, tid))
+		{
+			int i;
+			skipAttempts += 1;
+
+			for (i = 0; i < skipAttempts; i++)
+			{
+				if (!index_skip(scandesc, direction,
+								indexonlyscan->indexorderdir,
+								!node->ioss_FirstTupleEmitted,
+								node->ioss_SkipPrefixSize))
+				{
+					node->ioss_FirstTupleEmitted = false;
+					return ExecClearTuple(slot);
+				}
+			}
+
+			tid = &scandesc->xs_heaptid;
+		}
+
+		skipped = false;
+
 		/*
 		 * We can skip the heap fetch if the TID references a heap page on
 		 * which all tuples are known visible to everybody.  In any case,
@@ -250,6 +339,8 @@ IndexOnlyNext(IndexOnlyScanState *node)
 							  ItemPointerGetBlockNumber(tid),
 							  estate->es_snapshot);
 
+		node->ioss_FirstTupleEmitted = true;
+
 		return slot;
 	}
 
@@ -504,6 +595,8 @@ ExecInitIndexOnlyScan(IndexOnlyScan *node, EState *estate, int eflags)
 	indexstate->ss.ps.plan = (Plan *) node;
 	indexstate->ss.ps.state = estate;
 	indexstate->ss.ps.ExecProcNode = ExecIndexOnlyScan;
+	indexstate->ioss_SkipPrefixSize = node->indexskipprefixsize;
+	indexstate->ioss_FirstTupleEmitted = false;
 
 	/*
 	 * Miscellaneous initialization
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index d0a96a38e0..449aaec3ac 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -85,6 +85,13 @@ IndexNext(IndexScanState *node)
 	ScanDirection direction;
 	IndexScanDesc scandesc;
 	TupleTableSlot *slot;
+	IndexScan *indexscan = (IndexScan *) node->ss.ps.plan;
+
+	/*
+	 * Tells if the current position was reached via skipping. In this case
+	 * there is no need to call index_getnext_tid.
+	 */
+	bool skipped = false;
 
 	/*
 	 * extract necessary information from index scan node
@@ -92,7 +99,7 @@ IndexNext(IndexScanState *node)
 	estate = node->ss.ps.state;
 	direction = estate->es_direction;
 	/* flip direction if this is an overall backward scan */
-	if (ScanDirectionIsBackward(((IndexScan *) node->ss.ps.plan)->indexorderdir))
+	if (ScanDirectionIsBackward(indexscan->indexorderdir))
 	{
 		if (ScanDirectionIsForward(direction))
 			direction = BackwardScanDirection;
@@ -117,6 +124,12 @@ IndexNext(IndexScanState *node)
 
 		node->iss_ScanDesc = scandesc;
 
+		/* Index skip scan assumes xs_want_itup, so set it to true */
+		if (indexscan->indexskipprefixsize > 0)
+			node->iss_ScanDesc->xs_want_itup = true;
+		else
+			node->iss_ScanDesc->xs_want_itup = false;
+
 		/*
 		 * If no run-time keys to calculate or they are ready, go ahead and
 		 * pass the scankeys to the index AM.
@@ -127,12 +140,48 @@ IndexNext(IndexScanState *node)
 						 node->iss_OrderByKeys, node->iss_NumOrderByKeys);
 	}
 
+	/*
+	 * Check if we need to skip to the next key prefix, because we've been
+	 * asked to implement DISTINCT.
+	 *
+	 * When fetching from a cursor in the direction opposite to the general
+	 * scan direction, the result must be what normal fetching would have
+	 * returned, but in reversed order. In other words, return the last or
+	 * first scanned tuple in a DISTINCT set, depending on the cursor
+	 * direction. Because of that we also skip when the first tuple has not
+	 * been emitted yet but the directions are opposite.
+	 */
+	if (node->iss_SkipPrefixSize > 0 &&
+		(node->iss_FirstTupleEmitted ||
+		 ScanDirectionsAreOpposite(direction, indexscan->indexorderdir)))
+	{
+		if (!index_skip(scandesc, direction, indexscan->indexorderdir,
+					   !node->iss_FirstTupleEmitted, node->iss_SkipPrefixSize))
+		{
+			/*
+			 * Reached end of index. At this point currPos is invalidated, and
+			 * we need to reset iss_FirstTupleEmitted, since otherwise after
+			 * going backwards, reaching the end of index, and going forward
+			 * again we apply skip again. It would be incorrect and lead to an
+			 * extra skipped item.
+			 */
+			node->iss_FirstTupleEmitted = false;
+			return ExecClearTuple(slot);
+		}
+		else
+		{
+			skipped = true;
+			index_fetch_heap(scandesc, slot);
+		}
+	}
+
 	/*
 	 * ok, now that we have what we need, fetch the next tuple.
 	 */
-	while (index_getnext_slot(scandesc, direction, slot))
+	while (skipped || index_getnext_slot(scandesc, direction, slot))
 	{
 		CHECK_FOR_INTERRUPTS();
+		skipped = false;
 
 		/*
 		 * If the index was lossy, we have to recheck the index quals using
@@ -149,6 +198,7 @@ IndexNext(IndexScanState *node)
 			}
 		}
 
+		node->iss_FirstTupleEmitted = true;
 		return slot;
 	}
 
@@ -910,6 +960,8 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
 	indexstate->ss.ps.plan = (Plan *) node;
 	indexstate->ss.ps.state = estate;
 	indexstate->ss.ps.ExecProcNode = ExecIndexScan;
+	indexstate->iss_SkipPrefixSize = node->indexskipprefixsize;
+	indexstate->iss_FirstTupleEmitted = false;
 
 	/*
 	 * Miscellaneous initialization
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 54ad62bb7f..e0cfd710c4 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -493,6 +493,7 @@ _copyIndexScan(const IndexScan *from)
 	COPY_NODE_FIELD(indexorderbyorig);
 	COPY_NODE_FIELD(indexorderbyops);
 	COPY_SCALAR_FIELD(indexorderdir);
+	COPY_SCALAR_FIELD(indexskipprefixsize);
 
 	return newnode;
 }
@@ -518,6 +519,7 @@ _copyIndexOnlyScan(const IndexOnlyScan *from)
 	COPY_NODE_FIELD(indexorderby);
 	COPY_NODE_FIELD(indextlist);
 	COPY_SCALAR_FIELD(indexorderdir);
+	COPY_SCALAR_FIELD(indexskipprefixsize);
 
 	return newnode;
 }
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 16083e7a7e..5f723cda4b 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -562,6 +562,7 @@ _outIndexScan(StringInfo str, const IndexScan *node)
 	WRITE_NODE_FIELD(indexorderbyorig);
 	WRITE_NODE_FIELD(indexorderbyops);
 	WRITE_ENUM_FIELD(indexorderdir, ScanDirection);
+	WRITE_INT_FIELD(indexskipprefixsize);
 }
 
 static void
@@ -576,6 +577,7 @@ _outIndexOnlyScan(StringInfo str, const IndexOnlyScan *node)
 	WRITE_NODE_FIELD(indexorderby);
 	WRITE_NODE_FIELD(indextlist);
 	WRITE_ENUM_FIELD(indexorderdir, ScanDirection);
+	WRITE_INT_FIELD(indexskipprefixsize);
 }
 
 static void
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 551ce6c41c..028d03a56d 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1820,6 +1820,7 @@ _readIndexScan(void)
 	READ_NODE_FIELD(indexorderbyorig);
 	READ_NODE_FIELD(indexorderbyops);
 	READ_ENUM_FIELD(indexorderdir, ScanDirection);
+	READ_INT_FIELD(indexskipprefixsize);
 
 	READ_DONE();
 }
@@ -1839,6 +1840,7 @@ _readIndexOnlyScan(void)
 	READ_NODE_FIELD(indexorderby);
 	READ_NODE_FIELD(indextlist);
 	READ_ENUM_FIELD(indexorderdir, ScanDirection);
+	READ_INT_FIELD(indexskipprefixsize);
 
 	READ_DONE();
 }
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index b5a0033721..710edf160a 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -124,6 +124,7 @@ int			max_parallel_workers_per_gather = 2;
 bool		enable_seqscan = true;
 bool		enable_indexscan = true;
 bool		enable_indexonlyscan = true;
+bool		enable_indexskipscan = true;
 bool		enable_bitmapscan = true;
 bool		enable_tidscan = true;
 bool		enable_sort = true;
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index dff826a828..7b32f2cc7e 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -175,12 +175,14 @@ static IndexScan *make_indexscan(List *qptlist, List *qpqual, Index scanrelid,
 								 Oid indexid, List *indexqual, List *indexqualorig,
 								 List *indexorderby, List *indexorderbyorig,
 								 List *indexorderbyops,
-								 ScanDirection indexscandir);
+								 ScanDirection indexscandir,
+								 int skipprefix);
 static IndexOnlyScan *make_indexonlyscan(List *qptlist, List *qpqual,
 										 Index scanrelid, Oid indexid,
 										 List *indexqual, List *indexorderby,
 										 List *indextlist,
-										 ScanDirection indexscandir);
+										 ScanDirection indexscandir,
+										 int skipprefix);
 static BitmapIndexScan *make_bitmap_indexscan(Index scanrelid, Oid indexid,
 											  List *indexqual,
 											  List *indexqualorig);
@@ -2910,7 +2912,8 @@ create_indexscan_plan(PlannerInfo *root,
 												fixed_indexquals,
 												fixed_indexorderbys,
 												best_path->indexinfo->indextlist,
-												best_path->indexscandir);
+												best_path->indexscandir,
+												best_path->indexskipprefix);
 	else
 		scan_plan = (Scan *) make_indexscan(tlist,
 											qpqual,
@@ -2921,7 +2924,8 @@ create_indexscan_plan(PlannerInfo *root,
 											fixed_indexorderbys,
 											indexorderbys,
 											indexorderbyops,
-											best_path->indexscandir);
+											best_path->indexscandir,
+											best_path->indexskipprefix);
 
 	copy_generic_path_info(&scan_plan->plan, &best_path->path);
 
@@ -5184,7 +5188,8 @@ make_indexscan(List *qptlist,
 			   List *indexorderby,
 			   List *indexorderbyorig,
 			   List *indexorderbyops,
-			   ScanDirection indexscandir)
+			   ScanDirection indexscandir,
+			   int skipPrefixSize)
 {
 	IndexScan  *node = makeNode(IndexScan);
 	Plan	   *plan = &node->scan.plan;
@@ -5201,6 +5206,7 @@ make_indexscan(List *qptlist,
 	node->indexorderbyorig = indexorderbyorig;
 	node->indexorderbyops = indexorderbyops;
 	node->indexorderdir = indexscandir;
+	node->indexskipprefixsize = skipPrefixSize;
 
 	return node;
 }
@@ -5213,7 +5219,8 @@ make_indexonlyscan(List *qptlist,
 				   List *indexqual,
 				   List *indexorderby,
 				   List *indextlist,
-				   ScanDirection indexscandir)
+				   ScanDirection indexscandir,
+				   int skipPrefixSize)
 {
 	IndexOnlyScan *node = makeNode(IndexOnlyScan);
 	Plan	   *plan = &node->scan.plan;
@@ -5228,6 +5235,7 @@ make_indexonlyscan(List *qptlist,
 	node->indexorderby = indexorderby;
 	node->indextlist = indextlist;
 	node->indexorderdir = indexscandir;
+	node->indexskipprefixsize = skipPrefixSize;
 
 	return node;
 }
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 984fca0696..c84388e6f7 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -4834,6 +4834,82 @@ create_distinct_paths(PlannerInfo *root,
 												  path,
 												  list_length(root->distinct_pathkeys),
 												  numDistinctRows));
+
+				/* Consider index skip scan as well */
+				if (enable_indexskipscan &&
+					IsA(path, IndexPath) &&
+					((IndexPath *) path)->indexinfo->amcanskip &&
+					root->distinct_pathkeys != NIL)
+				{
+					ListCell   		*lc;
+					IndexOptInfo 	*index = NULL;
+					bool 			different_columns_order = false,
+									not_empty_qual = false;
+					int 			i = 0;
+					int 			distinctPrefixKeys;
+
+					Assert(path->pathtype == T_IndexOnlyScan ||
+						   path->pathtype == T_IndexScan);
+
+					index = ((IndexPath *) path)->indexinfo;
+					distinctPrefixKeys = list_length(root->query_uniquekeys);
+
+					/*
+					 * Normally we can think of distinctPrefixKeys as just the
+					 * number of distinct keys. But suppose we have a distinct
+					 * key a, and the index contains (b, a) in exactly this
+					 * order. In such a situation we need to use the position
+					 * of a in the index as distinctPrefixKeys, otherwise the
+					 * skip will happen only on the first column.
+					 */
+					foreach(lc, root->query_uniquekeys)
+					{
+						UniqueKey *uniquekey = (UniqueKey *) lfirst(lc);
+						EquivalenceMember *em =
+							lfirst_node(EquivalenceMember,
+										list_head(uniquekey->eq_clause->ec_members));
+						Var *var = (Var *) em->em_expr;
+
+						Assert(i < index->ncolumns);
+
+						for (i = 0; i < index->ncolumns; i++)
+						{
+							if (index->indexkeys[i] == var->varattno)
+							{
+								distinctPrefixKeys = Max(i + 1, distinctPrefixKeys);
+								break;
+							}
+						}
+					}
+
+					/*
+					 * XXX: In case of index scan, quals evaluation happens
+					 * after ExecScanFetch, which means skip results could be
+					 * filtered out. Consider the following query:
+					 *
+					 * 		select distinct on (a, b) a, b, c from t where c < 100;
+					 *
+					 * Skip scan returns one tuple for each distinct set of (a,
+					 * b) with an arbitrary value of c, so if the chosen c does
+					 * not match the qual and there is any c that matches the
+					 * qual, we miss that tuple.
+					 */
+					if (path->pathtype == T_IndexScan &&
+						parse->jointree != NULL &&
+						parse->jointree->quals != NULL &&
+						list_length((List *) parse->jointree->quals) != 0)
+							not_empty_qual = true;
+
+					if (!different_columns_order &&	!not_empty_qual)
+					{
+						add_path(distinct_rel, (Path *)
+								 create_skipscan_unique_path(root,
+															 distinct_rel,
+															 path,
+															 distinctPrefixKeys,
+															 numDistinctRows));
+					}
+				}
 			}
 		}
 
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index a006dbbe9c..d483ec38f2 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -2915,6 +2915,45 @@ create_upper_unique_path(PlannerInfo *root,
 	return pathnode;
 }
 
+/*
+ * create_skipscan_unique_path
+ *	  Creates a pathnode the same as an existing IndexPath except based on
+ *	  skipping duplicate values.  This may or may not be cheaper than using
+ *	  create_upper_unique_path.
+ *
+ * The input path must be an IndexPath for an index that supports amskip.
+ */
+IndexPath *
+create_skipscan_unique_path(PlannerInfo *root,
+							RelOptInfo *rel,
+							Path *basepath,
+							int distinctPrefixKeys,
+							double numGroups)
+{
+	IndexPath *pathnode = makeNode(IndexPath);
+
+	Assert(IsA(basepath, IndexPath));
+
+	/* We don't want to modify basepath, so make a copy. */
+	memcpy(pathnode, basepath, sizeof(IndexPath));
+
+	/* The size of the prefix we'll use for skipping. */
+	Assert(pathnode->indexinfo->amcanskip);
+	Assert(distinctPrefixKeys > 0);
+	pathnode->indexskipprefix = distinctPrefixKeys;
+
+	/*
+	 * The cost to skip to each distinct value should be roughly the same as
+	 * the cost of finding the first key times the number of distinct values
+	 * we expect to find.
+	 */
+	pathnode->path.startup_cost = basepath->startup_cost;
+	pathnode->path.total_cost = basepath->startup_cost * numGroups;
+	pathnode->path.rows = numGroups;
+
+	return pathnode;
+}
+
 /*
  * create_agg_path
  *	  Creates a pathnode that represents performing aggregation/grouping
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index d82fc5ab8b..f65b299f37 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -271,6 +271,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 			info->amoptionalkey = amroutine->amoptionalkey;
 			info->amsearcharray = amroutine->amsearcharray;
 			info->amsearchnulls = amroutine->amsearchnulls;
+			info->amcanskip = (amroutine->amskip != NULL);
 			info->amcanparallel = amroutine->amcanparallel;
 			info->amhasgettuple = (amroutine->amgettuple != NULL);
 			info->amhasgetbitmap = amroutine->amgetbitmap != NULL &&
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index cacbe904db..7c71ee4499 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -923,6 +923,15 @@ static struct config_bool ConfigureNamesBool[] =
 		true,
 		NULL, NULL, NULL
 	},
+	{
+		{"enable_indexskipscan", PGC_USERSET, QUERY_TUNING_METHOD,
+			gettext_noop("Enables the planner's use of index-skip-scan plans."),
+			NULL
+		},
+		&enable_indexskipscan,
+		true,
+		NULL, NULL, NULL
+	},
 	{
 		{"enable_bitmapscan", PGC_USERSET, QUERY_TUNING_METHOD,
 			gettext_noop("Enables the planner's use of bitmap-scan plans."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index e1048c0047..a002ee2143 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -353,6 +353,7 @@
 #enable_hashjoin = on
 #enable_indexscan = on
 #enable_indexonlyscan = on
+#enable_indexskipscan = on
 #enable_material = on
 #enable_mergejoin = on
 #enable_nestloop = on
diff --git a/src/include/access/amapi.h b/src/include/access/amapi.h
index 3b3e22f73d..3d39cd9d07 100644
--- a/src/include/access/amapi.h
+++ b/src/include/access/amapi.h
@@ -130,6 +130,13 @@ typedef void (*amrescan_function) (IndexScanDesc scan,
 typedef bool (*amgettuple_function) (IndexScanDesc scan,
 									 ScanDirection direction);
 
+/* skip past duplicates in a given prefix */
+typedef bool (*amskip_function) (IndexScanDesc scan,
+								 ScanDirection dir,
+								 ScanDirection indexdir,
+								 bool start,
+								 int prefix);
+
 /* fetch all valid tuples */
 typedef int64 (*amgetbitmap_function) (IndexScanDesc scan,
 									   TIDBitmap *tbm);
@@ -229,6 +236,7 @@ typedef struct IndexAmRoutine
 	amendscan_function amendscan;
 	ammarkpos_function ammarkpos;	/* can be NULL */
 	amrestrpos_function amrestrpos; /* can be NULL */
+	amskip_function amskip;				/* can be NULL */
 
 	/* interface functions to support parallel index scans */
 	amestimateparallelscan_function amestimateparallelscan; /* can be NULL */
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 7e9364a50c..815de4e4dd 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -180,6 +180,8 @@ extern IndexBulkDeleteResult *index_bulk_delete(IndexVacuumInfo *info,
 extern IndexBulkDeleteResult *index_vacuum_cleanup(IndexVacuumInfo *info,
 												   IndexBulkDeleteResult *stats);
 extern bool index_can_return(Relation indexRelation, int attno);
+extern bool index_skip(IndexScanDesc scan, ScanDirection direction,
+					   ScanDirection indexdir, bool start, int prefix);
 extern RegProcedure index_getprocid(Relation irel, AttrNumber attnum,
 									uint16 procnum);
 extern FmgrInfo *index_getprocinfo(Relation irel, AttrNumber attnum,
diff --git a/src/include/access/nbtree.h b/src/include/access/nbtree.h
index 20ace69dab..e098c6a1ab 100644
--- a/src/include/access/nbtree.h
+++ b/src/include/access/nbtree.h
@@ -662,6 +662,9 @@ typedef struct BTScanOpaqueData
 	 */
 	int			markItemIndex;	/* itemIndex, or -1 if not valid */
 
+	/* Work space for _bt_skip */
+	BTScanInsert	skipScanKey;	/* used to control skipping */
+
 	/* keep these last in struct for efficiency */
 	BTScanPosData currPos;		/* current position data */
 	BTScanPosData markPos;		/* marked position, if any */
@@ -793,6 +796,8 @@ extern OffsetNumber _bt_binsrch_insert(Relation rel, BTInsertState insertstate);
 extern int32 _bt_compare(Relation rel, BTScanInsert key, Page page, OffsetNumber offnum);
 extern bool _bt_first(IndexScanDesc scan, ScanDirection dir);
 extern bool _bt_next(IndexScanDesc scan, ScanDirection dir);
+extern bool _bt_skip(IndexScanDesc scan, ScanDirection dir,
+					 ScanDirection indexdir, bool start, int prefix);
 extern Buffer _bt_get_endpoint(Relation rel, uint32 level, bool rightmost,
 							   Snapshot snapshot);
 
@@ -817,6 +822,8 @@ extern void _bt_end_vacuum_callback(int code, Datum arg);
 extern Size BTreeShmemSize(void);
 extern void BTreeShmemInit(void);
 extern bytea *btoptions(Datum reloptions, bool validate);
+extern bool btskip(IndexScanDesc scan, ScanDirection dir,
+				   ScanDirection indexdir, bool start, int prefix);
 extern bool btproperty(Oid index_oid, int attno,
 					   IndexAMProperty prop, const char *propname,
 					   bool *res, bool *isnull);
diff --git a/src/include/access/sdir.h b/src/include/access/sdir.h
index 23feb90986..094a127464 100644
--- a/src/include/access/sdir.h
+++ b/src/include/access/sdir.h
@@ -55,4 +55,11 @@ typedef enum ScanDirection
 #define ScanDirectionIsForward(direction) \
 	((bool) ((direction) == ForwardScanDirection))
 
+/*
+ * ScanDirectionsAreOpposite
+ *		True iff scan directions are backward/forward or forward/backward.
+ */
+#define ScanDirectionsAreOpposite(dirA, dirB) \
+	((bool) ((dirA) != NoMovementScanDirection && (dirA) == -(dirB)))
+
 #endif							/* SDIR_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 1f6f5bbc20..2c6acc160a 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1423,6 +1423,8 @@ typedef struct IndexScanState
 	ExprContext *iss_RuntimeContext;
 	Relation	iss_RelationDesc;
 	struct IndexScanDescData *iss_ScanDesc;
+	int			iss_SkipPrefixSize;
+	bool		iss_FirstTupleEmitted;
 
 	/* These are needed for re-checking ORDER BY expr ordering */
 	pairingheap *iss_ReorderQueue;
@@ -1452,6 +1454,8 @@ typedef struct IndexScanState
  *		TableSlot		   slot for holding tuples fetched from the table
  *		VMBuffer		   buffer in use for visibility map testing, if any
  *		PscanLen		   size of parallel index-only scan descriptor
+ *		SkipPrefixSize	   number of keys for skip-based DISTINCT
+ *		FirstTupleEmitted  has the first tuple been emitted
  * ----------------
  */
 typedef struct IndexOnlyScanState
@@ -1470,6 +1474,8 @@ typedef struct IndexOnlyScanState
 	struct IndexScanDescData *ioss_ScanDesc;
 	TupleTableSlot *ioss_TableSlot;
 	Buffer		ioss_VMBuffer;
+	int			ioss_SkipPrefixSize;
+	bool		ioss_FirstTupleEmitted;
 	Size		ioss_PscanLen;
 } IndexOnlyScanState;
 
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 4e329f0fb5..b0ff9ca3a8 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -839,6 +839,7 @@ struct IndexOptInfo
 	bool		amsearchnulls;	/* can AM search for NULL/NOT NULL entries? */
 	bool		amhasgettuple;	/* does AM have amgettuple interface? */
 	bool		amhasgetbitmap; /* does AM have amgetbitmap interface? */
+	bool		amcanskip;		/* can AM skip duplicate values? */
 	bool		amcanparallel;	/* does AM support parallel scan? */
 	/* Rather than include amapi.h here, we declare amcostestimate like this */
 	void		(*amcostestimate) ();	/* AM's cost estimator */
@@ -1189,6 +1190,9 @@ typedef struct Path
  * we need not recompute them when considering using the same index in a
  * bitmap index/heap scan (see BitmapHeapPath).  The costs of the IndexPath
  * itself represent the costs of an IndexScan or IndexOnlyScan plan type.
+ *
+ * 'indexskipprefix' represents the number of columns to consider for skip
+ * scans.
  *----------
  */
 typedef struct IndexPath
@@ -1201,6 +1205,7 @@ typedef struct IndexPath
 	ScanDirection indexscandir;
 	Cost		indextotalcost;
 	Selectivity indexselectivity;
+	int			indexskipprefix;
 } IndexPath;
 
 /*
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 32c0d87f80..03a00e8e1d 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -409,6 +409,8 @@ typedef struct IndexScan
 	List	   *indexorderbyorig;	/* the same in original form */
 	List	   *indexorderbyops;	/* OIDs of sort ops for ORDER BY exprs */
 	ScanDirection indexorderdir;	/* forward or backward or don't care */
+	int			indexskipprefixsize;	/* the size of the prefix for distinct
+										 * scans */
 } IndexScan;
 
 /* ----------------
@@ -436,6 +438,8 @@ typedef struct IndexOnlyScan
 	List	   *indexorderby;	/* list of index ORDER BY exprs */
 	List	   *indextlist;		/* TargetEntry list describing index's cols */
 	ScanDirection indexorderdir;	/* forward or backward or don't care */
+	int			indexskipprefixsize;	/* the size of the prefix for distinct
+										 * scans */
 } IndexOnlyScan;
 
 /* ----------------
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index cb012ba198..847f34f02b 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -50,6 +50,7 @@ extern PGDLLIMPORT int max_parallel_workers_per_gather;
 extern PGDLLIMPORT bool enable_seqscan;
 extern PGDLLIMPORT bool enable_indexscan;
 extern PGDLLIMPORT bool enable_indexonlyscan;
+extern PGDLLIMPORT bool enable_indexskipscan;
 extern PGDLLIMPORT bool enable_bitmapscan;
 extern PGDLLIMPORT bool enable_tidscan;
 extern PGDLLIMPORT bool enable_sort;
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index f75ff6f323..6c8c9dadbb 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -201,6 +201,11 @@ extern UpperUniquePath *create_upper_unique_path(PlannerInfo *root,
 												 Path *subpath,
 												 int numCols,
 												 double numGroups);
+extern IndexPath *create_skipscan_unique_path(PlannerInfo *root,
+											  RelOptInfo *rel,
+											  Path *subpath,
+											  int numCols,
+											  double numGroups);
 extern AggPath *create_agg_path(PlannerInfo *root,
 								RelOptInfo *rel,
 								Path *subpath,
diff --git a/src/test/regress/expected/select_distinct.out b/src/test/regress/expected/select_distinct.out
index f3696c6d1d..259db10c81 100644
--- a/src/test/regress/expected/select_distinct.out
+++ b/src/test/regress/expected/select_distinct.out
@@ -244,3 +244,604 @@ SELECT null IS NOT DISTINCT FROM null as "yes";
  t
 (1 row)
 
+-- index only skip scan
+CREATE TABLE distinct_a (a int, b int, c int);
+INSERT INTO distinct_a (
+    SELECT five, tenthous, 10 FROM
+    generate_series(1, 5) five,
+    generate_series(1, 10000) tenthous
+);
+CREATE INDEX ON distinct_a (a, b);
+ANALYZE distinct_a;
+SELECT DISTINCT a FROM distinct_a;
+ a 
+---
+ 1
+ 2
+ 3
+ 4
+ 5
+(5 rows)
+
+SELECT DISTINCT a FROM distinct_a WHERE a = 1;
+ a 
+---
+ 1
+(1 row)
+
+SELECT DISTINCT a FROM distinct_a ORDER BY a DESC;
+ a 
+---
+ 5
+ 4
+ 3
+ 2
+ 1
+(5 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT a FROM distinct_a;
+                       QUERY PLAN                       
+--------------------------------------------------------
+ Index Only Scan using distinct_a_a_b_idx on distinct_a
+   Skip scan: true
+(2 rows)
+
+-- test index skip scan with a condition on a non unique field
+SELECT DISTINCT ON (a) a, b FROM distinct_a WHERE b = 2;
+ a | b 
+---+---
+ 1 | 2
+ 2 | 2
+ 3 | 2
+ 4 | 2
+ 5 | 2
+(5 rows)
+
+-- test index skip scan backwards
+SELECT DISTINCT ON (a) a, b FROM distinct_a ORDER BY a DESC, b DESC;
+ a |   b   
+---+-------
+ 5 | 10000
+ 4 | 10000
+ 3 | 10000
+ 2 | 10000
+ 1 | 10000
+(5 rows)
+
+-- check colums order
+CREATE INDEX distinct_a_b_a on distinct_a (b, a);
+SELECT DISTINCT a FROM distinct_a WHERE b = 2;
+ a 
+---
+ 1
+ 2
+ 3
+ 4
+ 5
+(5 rows)
+
+SELECT DISTINCT on (a, b) a, b FROM distinct_a WHERE b = 2;
+ a | b 
+---+---
+ 1 | 2
+ 2 | 2
+ 3 | 2
+ 4 | 2
+ 5 | 2
+(5 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT a FROM distinct_a WHERE b = 2;
+                     QUERY PLAN                     
+----------------------------------------------------
+ Index Only Scan using distinct_a_b_a on distinct_a
+   Skip scan: true
+   Index Cond: (b = 2)
+(3 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT on (a, b) a, b FROM distinct_a WHERE b = 2;
+                     QUERY PLAN                     
+----------------------------------------------------
+ Index Only Scan using distinct_a_b_a on distinct_a
+   Skip scan: true
+   Index Cond: (b = 2)
+(3 rows)
+
+DROP INDEX distinct_a_b_a;
+-- test opposite scan/index directions inside a cursor
+-- forward/backward
+BEGIN;
+DECLARE c SCROLL CURSOR FOR
+SELECT DISTINCT ON (a) a,b FROM distinct_a ORDER BY a, b;
+FETCH FROM c;
+ a | b 
+---+---
+ 1 | 1
+(1 row)
+
+FETCH BACKWARD FROM c;
+ a | b 
+---+---
+(0 rows)
+
+FETCH 6 FROM c;
+ a | b 
+---+---
+ 1 | 1
+ 2 | 1
+ 3 | 1
+ 4 | 1
+ 5 | 1
+(5 rows)
+
+FETCH BACKWARD 6 FROM c;
+ a | b 
+---+---
+ 5 | 1
+ 4 | 1
+ 3 | 1
+ 2 | 1
+ 1 | 1
+(5 rows)
+
+FETCH 6 FROM c;
+ a | b 
+---+---
+ 1 | 1
+ 2 | 1
+ 3 | 1
+ 4 | 1
+ 5 | 1
+(5 rows)
+
+FETCH BACKWARD 6 FROM c;
+ a | b 
+---+---
+ 5 | 1
+ 4 | 1
+ 3 | 1
+ 2 | 1
+ 1 | 1
+(5 rows)
+
+END;
+-- backward/forward
+BEGIN;
+DECLARE c SCROLL CURSOR FOR
+SELECT DISTINCT ON (a) a,b FROM distinct_a ORDER BY a DESC, b DESC;
+FETCH FROM c;
+ a |   b   
+---+-------
+ 5 | 10000
+(1 row)
+
+FETCH BACKWARD FROM c;
+ a | b 
+---+---
+(0 rows)
+
+FETCH 6 FROM c;
+ a |   b   
+---+-------
+ 5 | 10000
+ 4 | 10000
+ 3 | 10000
+ 2 | 10000
+ 1 | 10000
+(5 rows)
+
+FETCH BACKWARD 6 FROM c;
+ a |   b   
+---+-------
+ 1 | 10000
+ 2 | 10000
+ 3 | 10000
+ 4 | 10000
+ 5 | 10000
+(5 rows)
+
+FETCH 6 FROM c;
+ a |   b   
+---+-------
+ 5 | 10000
+ 4 | 10000
+ 3 | 10000
+ 2 | 10000
+ 1 | 10000
+(5 rows)
+
+FETCH BACKWARD 6 FROM c;
+ a |   b   
+---+-------
+ 1 | 10000
+ 2 | 10000
+ 3 | 10000
+ 4 | 10000
+ 5 | 10000
+(5 rows)
+
+END;
+-- test missing values and skipping from the end
+CREATE TABLE distinct_abc(a int, b int, c int);
+CREATE INDEX ON distinct_abc(a, b, c);
+INSERT INTO distinct_abc
+	VALUES (1, 1, 1),
+		   (1, 1, 2),
+		   (1, 2, 2),
+		   (1, 2, 3),
+		   (2, 2, 1),
+		   (2, 2, 3),
+		   (3, 1, 1),
+		   (3, 1, 2),
+		   (3, 2, 2),
+		   (3, 2, 3);
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a,b,c FROM distinct_abc WHERE c = 2;
+                          QUERY PLAN                          
+--------------------------------------------------------------
+ Index Only Scan using distinct_abc_a_b_c_idx on distinct_abc
+   Skip scan: true
+   Index Cond: (c = 2)
+(3 rows)
+
+BEGIN;
+DECLARE c SCROLL CURSOR FOR
+SELECT DISTINCT ON (a) a,b,c FROM distinct_abc WHERE c = 2;
+FETCH ALL FROM c;
+ a | b | c 
+---+---+---
+ 1 | 1 | 2
+ 3 | 1 | 2
+(2 rows)
+
+FETCH BACKWARD ALL FROM c;
+ a | b | c 
+---+---+---
+ 3 | 1 | 2
+ 1 | 1 | 2
+(2 rows)
+
+END;
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a,b,c FROM distinct_abc WHERE c = 2
+ORDER BY a DESC, b DESC;
+                              QUERY PLAN                               
+-----------------------------------------------------------------------
+ Index Only Scan Backward using distinct_abc_a_b_c_idx on distinct_abc
+   Skip scan: true
+   Index Cond: (c = 2)
+(3 rows)
+
+BEGIN;
+DECLARE c SCROLL CURSOR FOR
+SELECT DISTINCT ON (a) a,b,c FROM distinct_abc WHERE c = 2
+ORDER BY a DESC, b DESC;
+FETCH ALL FROM c;
+ a | b | c 
+---+---+---
+ 3 | 2 | 2
+ 1 | 2 | 2
+(2 rows)
+
+FETCH BACKWARD ALL FROM c;
+ a | b | c 
+---+---+---
+ 1 | 2 | 2
+ 3 | 2 | 2
+(2 rows)
+
+END;
+DROP TABLE distinct_abc;
+-- index skip scan
+SELECT DISTINCT ON (a) a, b, c
+FROM distinct_a ORDER BY a;
+ a | b | c  
+---+---+----
+ 1 | 1 | 10
+ 2 | 1 | 10
+ 3 | 1 | 10
+ 4 | 1 | 10
+ 5 | 1 | 10
+(5 rows)
+
+SELECT DISTINCT ON (a) a, b, c
+FROM distinct_a WHERE a = 1 ORDER BY a;
+ a | b | c  
+---+---+----
+ 1 | 1 | 10
+(1 row)
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a, b, c
+FROM distinct_a ORDER BY a;
+                    QUERY PLAN                     
+---------------------------------------------------
+ Index Scan using distinct_a_a_b_idx on distinct_a
+   Skip scan: true
+(2 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a, b, c
+FROM distinct_a WHERE a = 1 ORDER BY a;
+                     QUERY PLAN                      
+-----------------------------------------------------
+ Unique
+   ->  Bitmap Heap Scan on distinct_a
+         Recheck Cond: (a = 1)
+         ->  Bitmap Index Scan on distinct_a_a_b_idx
+               Index Cond: (a = 1)
+(5 rows)
+
+-- check colums order
+SELECT DISTINCT a FROM distinct_a WHERE b = 2 AND c = 10;
+ a 
+---
+ 1
+ 2
+ 3
+ 4
+ 5
+(5 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT a FROM distinct_a WHERE b = 2 AND c = 10;
+                       QUERY PLAN                        
+---------------------------------------------------------
+ Unique
+   ->  Index Scan using distinct_a_a_b_idx on distinct_a
+         Index Cond: (b = 2)
+         Filter: (c = 10)
+(4 rows)
+
+-- check projection case
+SELECT DISTINCT a, a FROM distinct_a WHERE b = 2;
+ a | a 
+---+---
+ 1 | 1
+ 2 | 2
+ 3 | 3
+ 4 | 4
+ 5 | 5
+(5 rows)
+
+SELECT DISTINCT a, 1 FROM distinct_a WHERE b = 2;
+ a | ?column? 
+---+----------
+ 1 |        1
+ 2 |        1
+ 3 |        1
+ 4 |        1
+ 5 |        1
+(5 rows)
+
+-- test cursor forward/backward movements
+BEGIN;
+DECLARE c SCROLL CURSOR FOR SELECT DISTINCT a FROM distinct_a;
+FETCH FROM c;
+ a 
+---
+ 1
+(1 row)
+
+FETCH BACKWARD FROM c;
+ a 
+---
+(0 rows)
+
+FETCH 6 FROM c;
+ a 
+---
+ 1
+ 2
+ 3
+ 4
+ 5
+(5 rows)
+
+FETCH BACKWARD 6 FROM c;
+ a 
+---
+ 5
+ 4
+ 3
+ 2
+ 1
+(5 rows)
+
+FETCH 6 FROM c;
+ a 
+---
+ 1
+ 2
+ 3
+ 4
+ 5
+(5 rows)
+
+FETCH BACKWARD 6 FROM c;
+ a 
+---
+ 5
+ 4
+ 3
+ 2
+ 1
+(5 rows)
+
+END;
+DROP TABLE distinct_a;
+-- test tuples visibility
+CREATE TABLE distinct_visibility (a int, b int);
+INSERT INTO distinct_visibility (select a, b from generate_series(1,5) a, generate_series(1, 10000) b);
+CREATE INDEX ON distinct_visibility (a, b);
+ANALYZE distinct_visibility;
+SELECT DISTINCT ON (a) a, b FROM distinct_visibility ORDER BY a, b;
+ a | b 
+---+---
+ 1 | 1
+ 2 | 1
+ 3 | 1
+ 4 | 1
+ 5 | 1
+(5 rows)
+
+DELETE FROM distinct_visibility WHERE a = 2 and b = 1;
+SELECT DISTINCT ON (a) a, b FROM distinct_visibility ORDER BY a, b;
+ a | b 
+---+---
+ 1 | 1
+ 2 | 2
+ 3 | 1
+ 4 | 1
+ 5 | 1
+(5 rows)
+
+SELECT DISTINCT ON (a) a, b FROM distinct_visibility ORDER BY a DESC, b DESC;
+ a |   b   
+---+-------
+ 5 | 10000
+ 4 | 10000
+ 3 | 10000
+ 2 | 10000
+ 1 | 10000
+(5 rows)
+
+DELETE FROM distinct_visibility WHERE a = 2 and b = 10000;
+SELECT DISTINCT ON (a) a, b FROM distinct_visibility ORDER BY a DESC, b DESC;
+ a |   b   
+---+-------
+ 5 | 10000
+ 4 | 10000
+ 3 | 10000
+ 2 |  9999
+ 1 | 10000
+(5 rows)
+
+DROP TABLE distinct_visibility;
+-- test page boundaries
+CREATE TABLE distinct_boundaries AS
+    SELECT a, b::int2 b, (b % 2)::int2 c FROM
+        generate_series(1, 5) a,
+        generate_series(1,366) b;
+CREATE INDEX ON distinct_boundaries (a, b, c);
+ANALYZE distinct_boundaries;
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a, b, c from distinct_boundaries
+WHERE b >= 1 and c = 0 ORDER BY a, b;
+                                 QUERY PLAN                                 
+----------------------------------------------------------------------------
+ Index Only Scan using distinct_boundaries_a_b_c_idx on distinct_boundaries
+   Skip scan: true
+   Index Cond: ((b >= 1) AND (c = 0))
+(3 rows)
+
+SELECT DISTINCT ON (a) a, b, c from distinct_boundaries
+WHERE b >= 1 and c = 0 ORDER BY a, b;
+ a | b | c 
+---+---+---
+ 1 | 2 | 0
+ 2 | 2 | 0
+ 3 | 2 | 0
+ 4 | 2 | 0
+ 5 | 2 | 0
+(5 rows)
+
+DROP TABLE distinct_boundaries;
+-- test tuple killing
+-- DESC ordering
+CREATE TABLE distinct_killed AS
+    SELECT a, b, b % 2 AS c, 10 AS d
+        FROM generate_series(1, 5) a,
+             generate_series(1,1000) b;
+CREATE INDEX ON distinct_killed (a, b, c, d);
+DELETE FROM distinct_killed where a = 3;
+BEGIN;
+    DECLARE c SCROLL CURSOR FOR
+    SELECT DISTINCT ON (a) a,b,c,d
+    FROM distinct_killed ORDER BY a DESC, b DESC;
+    FETCH FORWARD ALL FROM c;
+ a |  b   | c | d  
+---+------+---+----
+ 5 | 1000 | 0 | 10
+ 4 | 1000 | 0 | 10
+ 2 | 1000 | 0 | 10
+ 1 | 1000 | 0 | 10
+(4 rows)
+
+    FETCH BACKWARD ALL FROM c;
+ a |  b   | c | d  
+---+------+---+----
+ 1 | 1000 | 0 | 10
+ 2 | 1000 | 0 | 10
+ 4 | 1000 | 0 | 10
+ 5 | 1000 | 0 | 10
+(4 rows)
+
+COMMIT;
+DROP TABLE distinct_killed;
+-- regular ordering
+CREATE TABLE distinct_killed AS
+    SELECT a, b, b % 2 AS c, 10 AS d
+        FROM generate_series(1, 5) a,
+             generate_series(1,1000) b;
+CREATE INDEX ON distinct_killed (a, b, c, d);
+DELETE FROM distinct_killed where a = 3;
+BEGIN;
+    DECLARE c SCROLL CURSOR FOR
+    SELECT DISTINCT ON (a) a,b,c,d
+    FROM distinct_killed ORDER BY a, b;
+    FETCH FORWARD ALL FROM c;
+ a | b | c | d  
+---+---+---+----
+ 1 | 1 | 1 | 10
+ 2 | 1 | 1 | 10
+ 4 | 1 | 1 | 10
+ 5 | 1 | 1 | 10
+(4 rows)
+
+    FETCH BACKWARD ALL FROM c;
+ a | b | c | d  
+---+---+---+----
+ 5 | 1 | 1 | 10
+ 4 | 1 | 1 | 10
+ 2 | 1 | 1 | 10
+ 1 | 1 | 1 | 10
+(4 rows)
+
+COMMIT;
+DROP TABLE distinct_killed;
+-- partial delete
+CREATE TABLE distinct_killed AS
+    SELECT a, b, b % 2 AS c, 10 AS d
+        FROM generate_series(1, 5) a,
+             generate_series(1,1000) b;
+CREATE INDEX ON distinct_killed (a, b, c, d);
+DELETE FROM distinct_killed WHERE a = 3 AND b <= 999;
+BEGIN;
+    DECLARE c SCROLL CURSOR FOR
+    SELECT DISTINCT ON (a) a,b,c,d
+    FROM distinct_killed ORDER BY a DESC, b DESC;
+    FETCH FORWARD ALL FROM c;
+ a |  b   | c | d  
+---+------+---+----
+ 5 | 1000 | 0 | 10
+ 4 | 1000 | 0 | 10
+ 3 | 1000 | 0 | 10
+ 2 | 1000 | 0 | 10
+ 1 | 1000 | 0 | 10
+(5 rows)
+
+    FETCH BACKWARD ALL FROM c;
+ a |  b   | c | d  
+---+------+---+----
+ 1 | 1000 | 0 | 10
+ 2 | 1000 | 0 | 10
+ 3 | 1000 | 0 | 10
+ 4 | 1000 | 0 | 10
+ 5 | 1000 | 0 | 10
+(5 rows)
+
+COMMIT;
+DROP TABLE distinct_killed;
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index a1c90eb905..bd3b373515 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -78,6 +78,7 @@ select name, setting from pg_settings where name like 'enable%';
  enable_hashjoin                | on
  enable_indexonlyscan           | on
  enable_indexscan               | on
+ enable_indexskipscan           | on
  enable_material                | on
  enable_mergejoin               | on
  enable_nestloop                | on
@@ -89,7 +90,7 @@ select name, setting from pg_settings where name like 'enable%';
  enable_seqscan                 | on
  enable_sort                    | on
  enable_tidscan                 | on
-(17 rows)
+(18 rows)
 
 -- Test that the pg_timezone_names and pg_timezone_abbrevs views are
 -- more-or-less working.  We can't test their contents in any great detail
diff --git a/src/test/regress/sql/select_distinct.sql b/src/test/regress/sql/select_distinct.sql
index a605e86449..843efeb28f 100644
--- a/src/test/regress/sql/select_distinct.sql
+++ b/src/test/regress/sql/select_distinct.sql
@@ -73,3 +73,251 @@ SELECT 1 IS NOT DISTINCT FROM 2 as "no";
 SELECT 2 IS NOT DISTINCT FROM 2 as "yes";
 SELECT 2 IS NOT DISTINCT FROM null as "no";
 SELECT null IS NOT DISTINCT FROM null as "yes";
+
+-- index only skip scan
+CREATE TABLE distinct_a (a int, b int, c int);
+INSERT INTO distinct_a (
+    SELECT five, tenthous, 10 FROM
+    generate_series(1, 5) five,
+    generate_series(1, 10000) tenthous
+);
+CREATE INDEX ON distinct_a (a, b);
+ANALYZE distinct_a;
+
+SELECT DISTINCT a FROM distinct_a;
+SELECT DISTINCT a FROM distinct_a WHERE a = 1;
+SELECT DISTINCT a FROM distinct_a ORDER BY a DESC;
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT a FROM distinct_a;
+
+-- test index skip scan with a condition on a non unique field
+SELECT DISTINCT ON (a) a, b FROM distinct_a WHERE b = 2;
+
+-- test index skip scan backwards
+SELECT DISTINCT ON (a) a, b FROM distinct_a ORDER BY a DESC, b DESC;
+
+-- check colums order
+CREATE INDEX distinct_a_b_a on distinct_a (b, a);
+
+SELECT DISTINCT a FROM distinct_a WHERE b = 2;
+SELECT DISTINCT on (a, b) a, b FROM distinct_a WHERE b = 2;
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT a FROM distinct_a WHERE b = 2;
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT on (a, b) a, b FROM distinct_a WHERE b = 2;
+
+DROP INDEX distinct_a_b_a;
+
+-- test opposite scan/index directions inside a cursor
+-- forward/backward
+BEGIN;
+DECLARE c SCROLL CURSOR FOR
+SELECT DISTINCT ON (a) a,b FROM distinct_a ORDER BY a, b;
+
+FETCH FROM c;
+FETCH BACKWARD FROM c;
+
+FETCH 6 FROM c;
+FETCH BACKWARD 6 FROM c;
+
+FETCH 6 FROM c;
+FETCH BACKWARD 6 FROM c;
+
+END;
+
+-- backward/forward
+BEGIN;
+DECLARE c SCROLL CURSOR FOR
+SELECT DISTINCT ON (a) a,b FROM distinct_a ORDER BY a DESC, b DESC;
+
+FETCH FROM c;
+FETCH BACKWARD FROM c;
+
+FETCH 6 FROM c;
+FETCH BACKWARD 6 FROM c;
+
+FETCH 6 FROM c;
+FETCH BACKWARD 6 FROM c;
+
+END;
+
+-- test missing values and skipping from the end
+CREATE TABLE distinct_abc(a int, b int, c int);
+CREATE INDEX ON distinct_abc(a, b, c);
+INSERT INTO distinct_abc
+	VALUES (1, 1, 1),
+		   (1, 1, 2),
+		   (1, 2, 2),
+		   (1, 2, 3),
+		   (2, 2, 1),
+		   (2, 2, 3),
+		   (3, 1, 1),
+		   (3, 1, 2),
+		   (3, 2, 2),
+		   (3, 2, 3);
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a,b,c FROM distinct_abc WHERE c = 2;
+
+BEGIN;
+DECLARE c SCROLL CURSOR FOR
+SELECT DISTINCT ON (a) a,b,c FROM distinct_abc WHERE c = 2;
+
+FETCH ALL FROM c;
+FETCH BACKWARD ALL FROM c;
+
+END;
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a,b,c FROM distinct_abc WHERE c = 2
+ORDER BY a DESC, b DESC;
+
+BEGIN;
+DECLARE c SCROLL CURSOR FOR
+SELECT DISTINCT ON (a) a,b,c FROM distinct_abc WHERE c = 2
+ORDER BY a DESC, b DESC;
+
+FETCH ALL FROM c;
+FETCH BACKWARD ALL FROM c;
+
+END;
+
+DROP TABLE distinct_abc;
+
+-- index skip scan
+SELECT DISTINCT ON (a) a, b, c
+FROM distinct_a ORDER BY a;
+SELECT DISTINCT ON (a) a, b, c
+FROM distinct_a WHERE a = 1 ORDER BY a;
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a, b, c
+FROM distinct_a ORDER BY a;
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a, b, c
+FROM distinct_a WHERE a = 1 ORDER BY a;
+
+-- check colums order
+SELECT DISTINCT a FROM distinct_a WHERE b = 2 AND c = 10;
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT a FROM distinct_a WHERE b = 2 AND c = 10;
+
+-- check projection case
+SELECT DISTINCT a, a FROM distinct_a WHERE b = 2;
+SELECT DISTINCT a, 1 FROM distinct_a WHERE b = 2;
+
+-- test cursor forward/backward movements
+BEGIN;
+DECLARE c SCROLL CURSOR FOR SELECT DISTINCT a FROM distinct_a;
+
+FETCH FROM c;
+FETCH BACKWARD FROM c;
+
+FETCH 6 FROM c;
+FETCH BACKWARD 6 FROM c;
+
+FETCH 6 FROM c;
+FETCH BACKWARD 6 FROM c;
+
+END;
+
+DROP TABLE distinct_a;
+
+-- test tuples visibility
+CREATE TABLE distinct_visibility (a int, b int);
+INSERT INTO distinct_visibility (select a, b from generate_series(1,5) a, generate_series(1, 10000) b);
+CREATE INDEX ON distinct_visibility (a, b);
+ANALYZE distinct_visibility;
+
+SELECT DISTINCT ON (a) a, b FROM distinct_visibility ORDER BY a, b;
+DELETE FROM distinct_visibility WHERE a = 2 and b = 1;
+SELECT DISTINCT ON (a) a, b FROM distinct_visibility ORDER BY a, b;
+
+SELECT DISTINCT ON (a) a, b FROM distinct_visibility ORDER BY a DESC, b DESC;
+DELETE FROM distinct_visibility WHERE a = 2 and b = 10000;
+SELECT DISTINCT ON (a) a, b FROM distinct_visibility ORDER BY a DESC, b DESC;
+DROP TABLE distinct_visibility;
+
+-- test page boundaries
+CREATE TABLE distinct_boundaries AS
+    SELECT a, b::int2 b, (b % 2)::int2 c FROM
+        generate_series(1, 5) a,
+        generate_series(1,366) b;
+
+CREATE INDEX ON distinct_boundaries (a, b, c);
+ANALYZE distinct_boundaries;
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a, b, c from distinct_boundaries
+WHERE b >= 1 and c = 0 ORDER BY a, b;
+
+SELECT DISTINCT ON (a) a, b, c from distinct_boundaries
+WHERE b >= 1 and c = 0 ORDER BY a, b;
+
+DROP TABLE distinct_boundaries;
+
+-- test tuple killing
+
+-- DESC ordering
+CREATE TABLE distinct_killed AS
+    SELECT a, b, b % 2 AS c, 10 AS d
+        FROM generate_series(1, 5) a,
+             generate_series(1,1000) b;
+
+CREATE INDEX ON distinct_killed (a, b, c, d);
+
+DELETE FROM distinct_killed where a = 3;
+
+BEGIN;
+    DECLARE c SCROLL CURSOR FOR
+    SELECT DISTINCT ON (a) a,b,c,d
+    FROM distinct_killed ORDER BY a DESC, b DESC;
+    FETCH FORWARD ALL FROM c;
+    FETCH BACKWARD ALL FROM c;
+COMMIT;
+
+DROP TABLE distinct_killed;
+
+-- regular ordering
+CREATE TABLE distinct_killed AS
+    SELECT a, b, b % 2 AS c, 10 AS d
+        FROM generate_series(1, 5) a,
+             generate_series(1,1000) b;
+
+CREATE INDEX ON distinct_killed (a, b, c, d);
+
+DELETE FROM distinct_killed where a = 3;
+
+BEGIN;
+    DECLARE c SCROLL CURSOR FOR
+    SELECT DISTINCT ON (a) a,b,c,d
+    FROM distinct_killed ORDER BY a, b;
+    FETCH FORWARD ALL FROM c;
+    FETCH BACKWARD ALL FROM c;
+COMMIT;
+
+DROP TABLE distinct_killed;
+
+-- partial delete
+CREATE TABLE distinct_killed AS
+    SELECT a, b, b % 2 AS c, 10 AS d
+        FROM generate_series(1, 5) a,
+             generate_series(1,1000) b;
+
+CREATE INDEX ON distinct_killed (a, b, c, d);
+
+DELETE FROM distinct_killed WHERE a = 3 AND b <= 999;
+
+BEGIN;
+    DECLARE c SCROLL CURSOR FOR
+    SELECT DISTINCT ON (a) a,b,c,d
+    FROM distinct_killed ORDER BY a DESC, b DESC;
+    FETCH FORWARD ALL FROM c;
+    FETCH BACKWARD ALL FROM c;
+COMMIT;
+
+DROP TABLE distinct_killed;
-- 
2.21.0

#41

David Rowley

dgrowleyml@gmail.com

almost 6 years ago

In reply to: Dmitry Dolgov (#40)

Re: Index Skip Scan

On Tue, 18 Feb 2020 at 05:24, Dmitry Dolgov <9erthalion6@gmail.com> wrote:

> Here is something similar to what I had in mind.

(changing to this email address for future emails)

Hi,

I've been looking over v32 of the patch and have a few comments
regarding the planner changes.

I think the changes in create_distinct_paths() need more work. The
way I think this should work is that create_distinct_paths() gets to
know exactly nothing about what path types support the elimination of
duplicate values. The Path should carry the UniqueKeys so that can be
determined. In create_distinct_paths() you should just be able to make
use of those paths, which should already have been created when
creating index paths for the rel due to PlannerInfo's query_uniquekeys
having been set.

The reason it must be done this way is that when the RelOptInfo that
we're performing the DISTINCT on is a joinrel, then we're not going to
see any IndexPaths in the RelOptInfo's pathlist. We'll have some sort
of Join path instead. I understand you're not yet supporting doing
this optimisation when joins are involved, but it should be coded in
such a way that it'll work when we do. (It's probably also a separate
question as to whether we should have this only work when there are no
joins. I don't think I personally object to it for stage 1, but
perhaps someone else might think differently.)

For storing these new paths with UniqueKeys, I'm not sure exactly if
we can just add_path() such paths into the RelOptInfo's pathlist.
What we don't want to do is accidentally make use of paths which
eliminate duplicate values when we don't want that behaviour. If we
did store these paths in RelOptInfo->pathlist then we'd need to go and
modify a bunch of places to ignore such paths. set_cheapest() would
have to do something special for them too, which makes me think
pathlist is the incorrect place. Parallel query added
partial_pathlist, so perhaps we need unique_pathlist to make this
work.
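A minimal sketch of the separate-list idea, under stated assumptions: `ToyPath`, `ToyRel`, `toy_add_path()` and `toy_set_cheapest()` are hypothetical stand-ins invented for illustration, not PostgreSQL's `Path`, `RelOptInfo`, `add_path()` or `set_cheapest()`. It only shows why keeping duplicate-eliminating paths out of the main list means the "cheapest path" logic needs no special cases.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_PATHS 8

/* Hypothetical stand-ins; the real planner structs are far richer. */
typedef struct ToyPath
{
	double		total_cost;
	int			removes_duplicates; /* nonzero for e.g. a skip scan */
} ToyPath;

typedef struct ToyRel
{
	const ToyPath *pathlist[TOY_MAX_PATHS];			/* ordinary paths */
	size_t		npaths;
	const ToyPath *unique_pathlist[TOY_MAX_PATHS];	/* duplicate-eliminating paths */
	size_t		nunique;
	const ToyPath *cheapest_total_path;
} ToyRel;

/* Route a path to the list matching its semantics; returns 0 if the list is full. */
int
toy_add_path(ToyRel *rel, const ToyPath *path)
{
	assert(rel != NULL && path != NULL);
	if (path->removes_duplicates)
	{
		if (rel->nunique == TOY_MAX_PATHS)
			return 0;
		rel->unique_pathlist[rel->nunique++] = path;
	}
	else
	{
		if (rel->npaths == TOY_MAX_PATHS)
			return 0;
		rel->pathlist[rel->npaths++] = path;
	}
	return 1;
}

/* Pick the cheapest ordinary path; unique_pathlist is never consulted here. */
void
toy_set_cheapest(ToyRel *rel)
{
	assert(rel != NULL);
	rel->cheapest_total_path = NULL;
	for (size_t i = 0; i < rel->npaths; i++)
	{
		if (rel->cheapest_total_path == NULL ||
			rel->pathlist[i]->total_cost < rel->cheapest_total_path->total_cost)
			rel->cheapest_total_path = rel->pathlist[i];
	}
}
```

In this toy model a cheap skip scan can never be picked as the relation's cheapest path by accident; only code that explicitly reads `unique_pathlist` would see it.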

Also, should create_grouping_paths() be getting the same code?
Jesper's UniqueKey patch seems to set query_uniquekeys when there's a
GROUP BY with no aggregates. So it looks like he has intended that
something like:

SELECT x FROM t GROUP BY x;

should work the same way as

SELECT DISTINCT x FROM t;

but the 0002 patch does not make this work. Has that just been overlooked?

There are also some weird-looking assumptions in create_distinct_paths()
that an EquivalenceMember can only be a Var. I think you're only
saved from crashing there because a ProjectionPath will be created
atop the IndexPath to evaluate expressions, in which case you're
not seeing the IndexPath. This results in the optimisation not
working in cases like:

postgres=# create table t (a int); create index on t ((a+1)); explain
select distinct a+1 from t;
CREATE TABLE
CREATE INDEX
                        QUERY PLAN
-----------------------------------------------------------
 HashAggregate  (cost=48.25..50.75 rows=200 width=4)
   Group Key: (a + 1)
   ->  Seq Scan on t  (cost=0.00..41.88 rows=2550 width=4)

Using unique paths as I mentioned above should see that fixed.

David

#42

Tomas Vondra

tomas.vondra@2ndquadrant.com

almost 6 years ago

In reply to: David Rowley (#41)

Re: Index Skip Scan

On Wed, Mar 04, 2020 at 11:32:00AM +1300, David Rowley wrote:

> On Tue, 18 Feb 2020 at 05:24, Dmitry Dolgov <9erthalion6@gmail.com> wrote:
>
> > Here is something similar to what I had in mind.
>
> (changing to this email address for future emails)
>
> Hi,
>
> I've been looking over v32 of the patch and have a few comments
> regarding the planner changes.
>
> I think the changes in create_distinct_paths() need more work. The
> way I think this should work is that create_distinct_paths() gets to
> know exactly nothing about what path types support the elimination of
> duplicate values. The Path should carry the UniqueKeys so that can be
> determined. In create_distinct_paths() you should just be able to make
> use of those paths, which should already have been created when
> creating index paths for the rel due to PlannerInfo's query_uniquekeys
> having been set.

+1 to code this in a generic way, using query_uniquekeys (if possible)

> The reason it must be done this way is that when the RelOptInfo that
> we're performing the DISTINCT on is a joinrel, then we're not going to
> see any IndexPaths in the RelOptInfo's pathlist. We'll have some sort
> of Join path instead. I understand you're not yet supporting doing
> this optimisation when joins are involved, but it should be coded in
> such a way that it'll work when we do. (It's probably also a separate
> question as to whether we should have this only work when there are no
> joins. I don't think I personally object to it for stage 1, but
> perhaps someone else might think differently.)

I don't follow. Can you elaborate more?

AFAICS skip-scan is essentially a capability of an (index) AM. I don't
see how we could ever do that for joinrels? We can do that at the scan
level, below a join, but that's what this patch already supports, I
think. When you say "supporting this optimisation" with joins, do you
mean doing skip-scan for join inputs, or on top of the join?

> For storing these new paths with UniqueKeys, I'm not sure exactly if
> we can just add_path() such paths into the RelOptInfo's pathlist.
> What we don't want to do is accidentally make use of paths which
> eliminate duplicate values when we don't want that behaviour. If we
> did store these paths in RelOptInfo->pathlist then we'd need to go and
> modify a bunch of places to ignore such paths. set_cheapest() would
> have to do something special for them too, which makes me think
> pathlist is the incorrect place. Parallel query added
> partial_pathlist, so perhaps we need unique_pathlist to make this
> work.

Hmmm, good point. Do we actually produce incorrect plans with the
current patch, using skip-scan path when we should not?

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#43

David Rowley

dgrowleyml@gmail.com

almost 6 years ago

In reply to: Tomas Vondra (#42)

Re: Index Skip Scan

On Sat, 7 Mar 2020 at 03:49, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

> On Wed, Mar 04, 2020 at 11:32:00AM +1300, David Rowley wrote:
>
> > The reason it must be done this way is that when the RelOptInfo that
> > we're performing the DISTINCT on is a joinrel, then we're not going to
> > see any IndexPaths in the RelOptInfo's pathlist. We'll have some sort
> > of Join path instead. I understand you're not yet supporting doing
> > this optimisation when joins are involved, but it should be coded in
> > such a way that it'll work when we do. (It's probably also a separate
> > question as to whether we should have this only work when there are no
> > joins. I don't think I personally object to it for stage 1, but
> > perhaps someone else might think differently.)
>
> I don't follow. Can you elaborate more?
>
> AFAICS skip-scan is essentially a capability of an (index) AM. I don't
> see how we could ever do that for joinrels? We can do that at the scan
> level, below a join, but that's what this patch already supports, I
> think. When you say "supporting this optimisation" with joins, do you
> mean doing skip-scan for join inputs, or on top of the join?

The skip index scan Path would still be created at the base rel level,
but the join path on the join relation would have one of the sub-paths
of the join as an index skip scan.

An example query that could make use of this is:

SELECT * FROM some_table WHERE a IN(SELECT
indexed_col_with_few_distinct_values FROM big_table);

In this case, we might want to create a Skip Scan path on big_table
using the index on the "indexed_col_with_few_distinct_values", then
Hash Join to "some_table". That class of query is likely stage 2 or 3
of this work, but we need to lay foundations that'll support it.
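To make the skip scan idea itself concrete, here is a minimal sketch over a sorted array rather than a btree. It is an illustration only: `first_greater()` and `skip_scan_distinct()` are hypothetical names, and the binary search stands in for re-descending the index to the first entry past the current key, which is an assumption about the general technique rather than a description of the patch's `_bt_skip()`.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the index of the first element strictly greater than key in the
 * sorted range keys[lo..n), or n if there is none (binary search).
 */
size_t
first_greater(const int *keys, size_t lo, size_t n, int key)
{
	size_t		hi = n;

	while (lo < hi)
	{
		size_t		mid = lo + (hi - lo) / 2;

		if (keys[mid] <= key)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}

/*
 * Collect the distinct values of a sorted array by jumping past each run of
 * duplicates instead of visiting every element.  Returns the number of
 * distinct values written to out (at most outcap).
 */
size_t
skip_scan_distinct(const int *keys, size_t n, int *out, size_t outcap)
{
	size_t		pos = 0;
	size_t		nout = 0;

	assert(keys != NULL || n == 0);
	assert(out != NULL || outcap == 0);

	while (pos < n && nout < outcap)
	{
		out[nout++] = keys[pos];
		/* skip the rest of this run of equal keys in O(log n) */
		pos = first_greater(keys, pos + 1, n, keys[pos]);
	}
	return nout;
}
```

With few distinct values in a large input, the work is roughly one search per distinct value instead of one comparison per row, which is why a column like "indexed_col_with_few_distinct_values" is the interesting case.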

As for not having IndexScan paths in joinrels: yes, of course, but
that's exactly why create_distinct_paths() cannot work the way the
patch currently codes it. The patch does:

+ /*
+ * XXX: In the case of an index scan, qual evaluation happens
+ * after ExecScanFetch, which means skip results could be
+ * filtered out. Consider the following query:
+ *
+ * select distinct on (a, b) a, b, c from t where c < 100;
+ *
+ * Skip scan returns one tuple for each distinct set of (a,
+ * b) with an arbitrary value of c, so if the chosen c does
+ * not match the qual and there is any c that matches the
+ * qual, we miss that tuple.
+ */
+ if (path->pathtype == T_IndexScan &&

which will never work for join relations since they'll only have paths
for Nested Loop/Merge/Hash type joins. The key here is to determine
which skip scan paths we should create when we're building the normal
index paths, and then see if we can make use of those when planning
joins. Subsequently, we'll see if we can make use of the resulting join
paths during create_distinct_paths(). Doing it this way will allow us
to use skip scans in queries such as:

SELECT DISTINCT t1.z FROM t1 INNER JOIN t2 ON t1.a = t2.unique_col;

We'll first create the skip scan paths on t1, then when creating the
join paths we'll create additional join paths which use the skipscan
path. Because t2.unique_col is unique, each t1 row will have at most 1
join partner in t2, so the join path will have the same unique_keys as
the skipscan path. That'll allow us to use the join path which has the
skip scan on whichever side of the join the t1 relation ends up. All
create_distinct_paths() should be doing is looking for paths that are
already implicitly unique on the distinct clause and consider using
those in a cost-based way. It shouldn't be making such paths itself.

For storing these new paths with UniqueKeys, I'm not sure exactly if
we can just add_path() such paths into the RelOptInfo's pathlist.
What we don't want to do is accidentally make use of paths which
eliminate duplicate values when we don't want that behaviour. If we
did store these paths in RelOptInfo->pathlist then we'd need to go and
modify a bunch of places to ignore such paths. set_cheapest() would
have to do something special for them too, which makes me think
pathlist is the incorrect place. Parallel query added
partial_pathlist, so perhaps we need unique_pathlist to make this
work.

Hmmm, good point. Do we actually produce incorrect plans with the
current patch, using skip-scan path when we should not?

I don't think so. The patch is only creating skip scan paths on the
base rel when we discover it's valid to do so. That's not the way it
should work though. How the patch currently works would be similar to
initially only creating a SeqScan path for a query such as: SELECT *
FROM tab ORDER BY a;, but then, during create_ordered_paths() go and
create some IndexPath to scan the btree index on tab.a because we
suddenly realise that it'll be good to use that for the ORDER BY.
The planner does not work that way. We always create all the paths
that we think will be useful during set_base_rel_pathlists(). We then
make use of only existing paths in the upper planner. See what
build_index_paths() does, in particular:

/* see if we can generate ordering operators for query_pathkeys */
match_pathkeys_to_index(index, root->query_pathkeys,
&orderbyclauses,
&orderbyclausecols);

We'll need something similar to that but for the query_uniquekeys and
ensure we build the skip scan paths when we think they'll be useful
and do so during the call to set_base_rel_pathlists(). Later in stage
2 or 3, we can go build skip scan paths when there are semi/anti joins
that could make use of them. Making that work will just be some
plumbing work in build_index_paths() and making use of those paths
during add_paths_to_joinrel().
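
As an illustration only (this is not code from the patch), the kind of matching step described above could look roughly like the sketch below. It assumes that a skip scan needs the requested unique keys to cover exactly the leading columns of the index; the function name toy_match_uniquekeys_to_index and the use of plain strings for columns are hypothetical simplifications.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Toy model, not PostgreSQL code: return the number of leading index
 * columns a skip scan would skip on, or 0 if the index cannot provide
 * the requested uniqueness.  Columns and keys are plain strings, and
 * neither list is assumed to contain duplicates.
 */
size_t
toy_match_uniquekeys_to_index(const char *const *indexcols, size_t nindexcols,
							  const char *const *uniquekeys, size_t nuniquekeys)
{
	assert(indexcols != NULL || nindexcols == 0);
	assert(uniquekeys != NULL || nuniquekeys == 0);

	if (nuniquekeys == 0 || nuniquekeys > nindexcols)
		return 0;

	/* each of the first nuniquekeys index columns must be a unique key */
	for (size_t i = 0; i < nuniquekeys; i++)
	{
		int			found = 0;

		for (size_t j = 0; j < nuniquekeys; j++)
		{
			if (strcmp(indexcols[i], uniquekeys[j]) == 0)
			{
				found = 1;
				break;
			}
		}
		if (!found)
			return 0;
	}
	return nuniquekeys;
}
```

With an index on (a, b, c), unique keys {b, a} would match a prefix of length 2, while {a, c} would not match because b sits between them in the index.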

David

#44

Dmitry Dolgov

9erthalion6@gmail.com

almost 6 years ago

In reply to: David Rowley (#41)

Re: Index Skip Scan

On Wed, Mar 04, 2020 at 11:32:00AM +1300, David Rowley wrote:

I've been looking over v32 of the patch and have a few comments
regarding the planner changes.

Thanks for the commentaries!

I think the changes in create_distinct_paths() need more work. The
way I think this should work is that create_distinct_paths() gets to
know exactly nothing about what path types support the elimination of
duplicate values. The Path should carry the UniqueKeys so that can be
determined. In create_distinct_paths() you should just be able to make
use of those paths, which should already have been created when
creating index paths for the rel due to PlannerInfo's query_uniquekeys
having been set.

Just for me to clarify. The idea is to "move" information about what
path types support skipping into UniqueKeys (derived from PlannerInfo's
query_uniquekeys), but other checks (e.g. if the index AM supports that)
are still performed in create_distinct_paths?

Also, should create_grouping_paths() be getting the same code?
Jesper's UniqueKey patch seems to set query_uniquekeys when there's a
GROUP BY with no aggregates. So it looks like he has intended that
something like:

SELECT x FROM t GROUP BY x;

should work the same way as

SELECT DISTINCT x FROM t;

but the 0002 patch does not make this work. Has that just been overlooked?

I believe it wasn't overlooked in 0002 patch, but rather added just in
case in 0001. I guess there are no theoretical problems in implementing
it, but since we wanted to keep scope of the patch under control and
concentrate on the existing functionality it probably makes sense to
plan it as one of the next steps?

There's also some weird looking assumptions that an EquivalenceMember
can only be a Var in create_distinct_paths(). I think you're only
saved from crashing there because a ProjectionPath will be created
atop of the IndexPath to evaluate expressions, in which case you're
not seeing the IndexPath. This results in the optimisation not
working in cases like:

postgres=# create table t (a int); create index on t ((a+1)); explain
select distinct a+1 from t;
CREATE TABLE
CREATE INDEX
QUERY PLAN
-----------------------------------------------------------
HashAggregate (cost=48.25..50.75 rows=200 width=4)
Group Key: (a + 1)
-> Seq Scan on t (cost=0.00..41.88 rows=2550 width=4)

Yes, I need to fix it.

Using unique paths as I mentioned above should see that fixed.

I'm a bit confused about this statement, how exactly should unique
paths fix this?

#45

David Rowley

dgrowleyml@gmail.com

almost 6 years ago

In reply to: Dmitry Dolgov (#44)

Re: Index Skip Scan

On Mon, 9 Mar 2020 at 03:21, Dmitry Dolgov <9erthalion6@gmail.com> wrote:

I've been looking over v32 of the patch and have a few comments
regarding the planner changes.

Thanks for the commentaries!

I think the changes in create_distinct_paths() need more work. The
way I think this should work is that create_distinct_paths() gets to
know exactly nothing about what path types support the elimination of
duplicate values. The Path should carry the UniqueKeys so that can be
determined. In create_distinct_paths() you should just be able to make
use of those paths, which should already have been created when
creating index paths for the rel due to PlannerInfo's query_uniquekeys
having been set.

Just for me to clarify. The idea is to "move" information about what
path types support skipping into UniqueKeys (derived from PlannerInfo's
query_uniquekeys), but other checks (e.g. if the index AM supports that)
are still performed in create_distinct_paths?

create_distinct_paths() shouldn't know any details specific to the
pathtype that it's using or considering using. All the details should
just be in Path. e.g. uniquekeys, pathkeys, costs etc. There should be
no IsA(path, ...). Please have a look over the details in my reply to
Tomas. I hope that reply has enough information in it, but please
reply there if I've missed something.

On Wed, Mar 04, 2020 at 11:32:00AM +1300, David Rowley wrote:
There's also some weird looking assumptions that an EquivalenceMember
can only be a Var in create_distinct_paths(). I think you're only
saved from crashing there because a ProjectionPath will be created
atop of the IndexPath to evaluate expressions, in which case you're
not seeing the IndexPath. This results in the optimisation not
working in cases like:

postgres=# create table t (a int); create index on t ((a+1)); explain
select distinct a+1 from t;
CREATE TABLE
CREATE INDEX
QUERY PLAN
-----------------------------------------------------------
HashAggregate (cost=48.25..50.75 rows=200 width=4)
Group Key: (a + 1)
-> Seq Scan on t (cost=0.00..41.88 rows=2550 width=4)

Yes, I need to fix it.

Using unique paths as I mentioned above should see that fixed.

I'm a bit confused about this statement, how exactly should unique
paths fix this?

The path's uniquekeys would mention that it's unique on (a+1). You'd
compare the uniquekeys of the path to the DISTINCT clause and see that
the uniquekeys are a subset of the DISTINCT clause therefore the
DISTINCT is a no-op. If that uniquekey path is cheaper than the
cheapest_total_path + <cost of uniquification method>, then you should
pick the unique path, otherwise use the cheapest_total_path and
uniquify that.
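
A minimal sketch of that comparison, for illustration only: the UkPath struct, the string representation of expressions, and the function names are hypothetical and much simpler than the planner's real Path and UniqueKey nodes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a planner Path: only the fields the check needs. */
typedef struct UkPath
{
	const char *const *uniquekeys;	/* expressions the path is unique on */
	size_t		nuniquekeys;		/* 0 means no uniqueness is known */
	double		total_cost;
} UkPath;

/* True if every unique key of the path appears in the DISTINCT clause. */
bool
uk_subset_of_distinct(const UkPath *path,
					  const char *const *distinct, size_t ndistinct)
{
	if (path->nuniquekeys == 0)
		return false;

	for (size_t i = 0; i < path->nuniquekeys; i++)
	{
		bool		found = false;

		for (size_t j = 0; j < ndistinct; j++)
		{
			if (strcmp(path->uniquekeys[i], distinct[j]) == 0)
			{
				found = true;
				break;
			}
		}
		if (!found)
			return false;
	}
	return true;
}

/*
 * Pick the already-unique path when DISTINCT is a no-op on it and it is
 * cheaper than cheapest_total plus the cost of uniquifying; otherwise
 * return cheapest_total, which the caller would still have to uniquify.
 */
const UkPath *
uk_choose_distinct_input(const UkPath *unique_path,
						 const UkPath *cheapest_total,
						 double uniquify_cost,
						 const char *const *distinct, size_t ndistinct)
{
	assert(unique_path != NULL && cheapest_total != NULL);

	if (uk_subset_of_distinct(unique_path, distinct, ndistinct) &&
		unique_path->total_cost < cheapest_total->total_cost + uniquify_cost)
		return unique_path;
	return cheapest_total;
}
```

Nothing in this check looks at the path type, which is the point being made: a path unique on (a+1) qualifies whether it is an index path or something built on top of one.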

I think the UniqueKeys may need to be changed from using
EquivalenceClasses to use Exprs instead.

#46

Dmitry Dolgov

9erthalion6@gmail.com

almost 6 years ago

In reply to: David Rowley (#45)

Re: Index Skip Scan

On Mon, Mar 09, 2020 at 10:27:26AM +1300, David Rowley wrote:

I think the changes in create_distinct_paths() need more work. The
way I think this should work is that create_distinct_paths() gets to
know exactly nothing about what path types support the elimination of
duplicate values. The Path should carry the UniqueKeys so that can be
determined. In create_distinct_paths() you should just be able to make
use of those paths, which should already have been created when
creating index paths for the rel due to PlannerInfo's query_uniquekeys
having been set.

Just for me to clarify. The idea is to "move" information about what
path types support skipping into UniqueKeys (derived from PlannerInfo's
query_uniquekeys), but other checks (e.g. if the index AM supports that)
are still performed in create_distinct_paths?

create_distinct_paths() shouldn't know any details specific to the
pathtype that it's using or considering using. All the details should
just be in Path. e.g. uniquekeys, pathkeys, costs etc. There should be
no IsA(path, ...). Please have a look over the details in my reply to
Tomas. I hope that reply has enough information in it, but please
reply there if I've missed something.

Yes, I've read this reply, just wanted to ask here, since I had other
questions as well. Speaking of which:

On Wed, Mar 04, 2020 at 11:32:00AM +1300, David Rowley wrote:
There's also some weird looking assumptions that an EquivalenceMember
can only be a Var in create_distinct_paths(). I think you're only
saved from crashing there because a ProjectionPath will be created
atop of the IndexPath to evaluate expressions, in which case you're
not seeing the IndexPath.

I'm probably missing something, so to eliminate any misunderstanding
from my side:

This results in the optimisation not working in cases like:

postgres=# create table t (a int); create index on t ((a+1)); explain
select distinct a+1 from t;
CREATE TABLE
CREATE INDEX
QUERY PLAN
-----------------------------------------------------------
HashAggregate (cost=48.25..50.75 rows=200 width=4)
Group Key: (a + 1)
-> Seq Scan on t (cost=0.00..41.88 rows=2550 width=4)

In this particular example skipping is not applied because, as you've
mentioned, we're dealing with a ProjectionPath (not IndexScan /
IndexOnlyScan). Which means we're not even reaching the code with
EquivalenceMember, so I'm still not sure how they are connected?

Assuming we implement it so that create_distinct_paths() does not know
what kind of path it is dealing with, it can also work for
ProjectionPath or anything else (if UniqueKeys are present). But then
EquivalenceMembers are still used only to figure out the correct
distinctPrefixKeys and do not affect whether or not skipping is applied.
What am I missing?

#47

David Rowley

dgrowleyml@gmail.com

almost 6 years ago

In reply to: Dmitry Dolgov (#46)

Re: Index Skip Scan

On Tue, 10 Mar 2020 at 08:56, Dmitry Dolgov <9erthalion6@gmail.com> wrote:

Assuming we implement it so that create_distinct_paths() does not know
what kind of path it is dealing with, it can also work for
ProjectionPath or anything else (if UniqueKeys are present). But then
EquivalenceMembers are still used only to figure out the correct
distinctPrefixKeys and do not affect whether or not skipping is applied.
What am I missing?

I'm not sure I fully understand the question correctly, but let me
explain further.

In the 0001 patch, standard_qp_callback sets the query_uniquekeys
depending on the DISTINCT / GROUP BY clause. When building index
paths in build_index_paths(), the 0002 patch should be looking at the
root->query_uniquekeys to see if it can build any index paths that
suit those keys. Such paths should be tagged with the uniquekeys they
satisfy, basically, exactly the same as how pathkeys work. Many
create_*_path functions will need to be modified to carry forward
their uniquekeys. For example, create_projection_path(),
create_limit_path() don't do anything which would cause the created
path to violate the unique keys. This way when you get down to
create_distinct_paths(), paths other than IndexPath may have
uniquekeys. You'll be able to check which existing paths satisfy the
unique keys required by the DISTINCT / GROUP BY and select those paths
instead of having to create any HashAggregate / Unique paths.
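
To make the "carry forward" idea concrete, here is a toy sketch. It is not PostgreSQL code: FwdPath, fwd_make_wrapper and fwd_path_is_unique_on are hypothetical names, and a single string stands in for a list of UniqueKeys.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef enum FwdPathType
{
	FWD_SEQSCAN,
	FWD_INDEX_SKIP,
	FWD_PROJECTION,
	FWD_LIMIT
} FwdPathType;

/* Hypothetical, heavily simplified Path node. */
typedef struct FwdPath
{
	FwdPathType type;
	const char *uniquekeys;		/* expression the path is unique on, or NULL */
	const struct FwdPath *subpath;
} FwdPath;

/*
 * Build a projection or limit node on top of subpath.  Neither node can
 * introduce duplicates, so the subpath's uniquekeys are carried forward.
 */
FwdPath
fwd_make_wrapper(FwdPathType type, const FwdPath *subpath)
{
	FwdPath		path;

	assert(subpath != NULL);
	assert(type == FWD_PROJECTION || type == FWD_LIMIT);

	path.type = type;
	path.uniquekeys = subpath->uniquekeys;
	path.subpath = subpath;
	return path;
}

/* The upper planner inspects uniquekeys only, never the path type. */
bool
fwd_path_is_unique_on(const FwdPath *path, const char *required)
{
	return path->uniquekeys != NULL &&
		strcmp(path->uniquekeys, required) == 0;
}
```

In this model a limit over a projection over a skip scan is still reported as unique on the scan's key, while the same wrappers over a sequential scan are not.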

Does that answer the question?

#48

James Coleman

jtc331@gmail.com

almost 6 years ago

In reply to: Dmitry Dolgov (#46)

Re: Index Skip Scan

On Mon, Mar 9, 2020 at 3:56 PM Dmitry Dolgov <9erthalion6@gmail.com> wrote:

Assuming we implement it so that create_distinct_paths() does not know
what kind of path it is dealing with, it can also work for
ProjectionPath or anything else (if UniqueKeys are present). But then
EquivalenceMembers are still used only to figure out the correct
distinctPrefixKeys and do not affect whether or not skipping is applied.
What am I missing?

Part of the puzzle seems to me to be this part of the response:

I think the UniqueKeys may need to be changed from using
EquivalenceClasses to use Exprs instead.

But I can't say I'm being overly helpful by pointing that out, since I
don't have my head in the code enough to understand how you'd
accomplish that :)

James

#49

Dmitry Dolgov

9erthalion6@gmail.com

almost 6 years ago

In reply to: David Rowley (#47)

Re: Index Skip Scan

On Tue, Mar 10, 2020 at 09:29:32AM +1300, David Rowley wrote:

On Tue, 10 Mar 2020 at 08:56, Dmitry Dolgov <9erthalion6@gmail.com> wrote:

Assuming we implement it so that create_distinct_paths() does not know
what kind of path it is dealing with, it can also work for
ProjectionPath or anything else (if UniqueKeys are present). But then
EquivalenceMembers are still used only to figure out the correct
distinctPrefixKeys and do not affect whether or not skipping is applied.
What am I missing?

I'm not sure I fully understand the question correctly, but let me
explain further.

In the 0001 patch, standard_qp_callback sets the query_uniquekeys
depending on the DISTINCT / GROUP BY clause. When building index
paths in build_index_paths(), the 0002 patch should be looking at the
root->query_uniquekeys to see if it can build any index paths that
suit those keys. Such paths should be tagged with the uniquekeys they
satisfy, basically, exactly the same as how pathkeys work. Many
create_*_path functions will need to be modified to carry forward
their uniquekeys. For example, create_projection_path(),
create_limit_path() don't do anything which would cause the created
path to violate the unique keys. This way when you get down to
create_distinct_paths(), paths other than IndexPath may have
uniquekeys. You'll be able to check which existing paths satisfy the
unique keys required by the DISTINCT / GROUP BY and select those paths
instead of having to create any HashAggregate / Unique paths.

Does that answer the question?

Hmm... I'm afraid not, this was already clear. But it looks like I now
see that I've misinterpreted one part.

There's also some weird looking assumptions that an EquivalenceMember
can only be a Var in create_distinct_paths(). I think you're only
saved from crashing there because a ProjectionPath will be created
atop of the IndexPath to evaluate expressions, in which case you're
not seeing the IndexPath. This results in the optimisation not
working in cases like:

I've read it as "an assumption that an EquivalenceMember can only be a
Var" results in "the optimisation not working in cases like this". But
you've meant that ignoring a ProjectionPath with an IndexPath inside
results in this optimisation not working, right? If so, then everything
is clear, and my apologies, maybe I need to finally fix my sleep
schedule :)

#50

David Rowley

dgrowleyml@gmail.com

almost 6 years ago

In reply to: Dmitry Dolgov (#49)

Re: Index Skip Scan

On Wed, 11 Mar 2020 at 01:38, Dmitry Dolgov <9erthalion6@gmail.com> wrote:

On Tue, Mar 10, 2020 at 09:29:32AM +1300, David Rowley wrote:

There's also some weird looking assumptions that an EquivalenceMember
can only be a Var in create_distinct_paths(). I think you're only
saved from crashing there because a ProjectionPath will be created
atop of the IndexPath to evaluate expressions, in which case you're
not seeing the IndexPath. This results in the optimisation not
working in cases like:

I've read it as "an assumption that an EquivalenceMember can only be a
Var" results in "the optimisation not working in cases like this". But
you've meant that ignoring a ProjectionPath with an IndexPath inside
results in this optimisation not working, right? If so, then everything
is clear, and my apologies, maybe I need to finally fix my sleep
schedule :)

Yes, I was complaining that a ProjectionPath breaks the optimisation
and I don't believe there's any reason that it should.

I believe the way to make that work correctly requires paying
attention to the Path's uniquekeys rather than what type of path it
is.

#51

Andy Fan

zhihui.fan1213@gmail.com

almost 6 years ago

In reply to: David Rowley (#45)

Re: Index Skip Scan

I think the UniqueKeys may need to be changed from using
EquivalenceClasses to use Exprs instead.

When I was trying to understand why UniqueKeys needs EquivalenceClasses,
I saw your comments here. I feel that a FuncExpr can't be used as a
UniquePath even if we can create a unique index on f(a) and
f->strict == true. The reason is that even if we know a is not null and
f is strict, it is still possible that f(a) is null, and a unique index
allows more than one null value. So shall we move further and use
varattnos instead of Exprs? If so, we can also use a list of Bitmapsets
to represent multiple unique paths of a single RelOptInfo.

#52

Andy Fan

zhihui.fan1213@gmail.com

almost 6 years ago

In reply to: James Coleman (#48)

Re: Index Skip Scan

On Tue, Mar 10, 2020 at 4:32 AM James Coleman <jtc331@gmail.com> wrote:

On Mon, Mar 9, 2020 at 3:56 PM Dmitry Dolgov <9erthalion6@gmail.com>
wrote:

Assuming we implement it so that create_distinct_paths() does not know
what kind of path it is dealing with, it can also work for
ProjectionPath or anything else (if UniqueKeys are present). But then
EquivalenceMembers are still used only to figure out the correct
distinctPrefixKeys and do not affect whether or not skipping is applied.
What am I missing?

Part of the puzzle seems to me to be this part of the response:

I think the UniqueKeys may need to be changed from using
EquivalenceClasses to use Exprs instead.

But I can't say I'm being overly helpful by pointing that out, since I
don't have my head in the code enough to understand how you'd
accomplish that :)

There was a dedicated thread [1] where David explained his idea in
detail, and you can also check the messages around that message for
context. Hope it helps.

[1]: /messages/by-id/CAApHDvq7i0=O97r4Y1pv68+tprVczKsXRsV28rM9H-rVPOfeNQ@mail.gmail.com

#53

David Rowley

dgrowleyml@gmail.com

almost 6 years ago

In reply to: Andy Fan (#51)

Re: Index Skip Scan

On Wed, 11 Mar 2020 at 16:44, Andy Fan <zhihui.fan1213@gmail.com> wrote:

I think the UniqueKeys may need to be changed from using
EquivalenceClasses to use Exprs instead.

When I was trying to understand why UniqueKeys needs EquivalenceClasses,
I saw your comments here. I feel that a FuncExpr can't be used as a
UniquePath even if we can create a unique index on f(a) and
f->strict == true. The reason is that even if we know a is not null and
f is strict, it is still possible that f(a) is null, and a unique index
allows more than one null value. So shall we move further and use
varattnos instead of Exprs? If so, we can also use a list of Bitmapsets
to represent multiple unique paths of a single RelOptInfo.

We do need some method to determine if NULL values are possible. At
the base relation level that can probably be done by checking NOT NULL
constraints and strict base quals. At higher levels, we can use strict
join quals as proofs.

As for a Bitmapset of varattnos, that would certainly work well at
the base relation level when there are no unique expression indexes,
but it's not so simple with join relations when the varattnos only
mean something when you know which base relation it comes from. I'm
not saying that Lists of Exprs is ideal, but I think trying to
optimise some code that does not yet exist is premature.

There was some other talk in [1] on how we might make checking if a
List contains a given Node. That could be advantageous in a few
places in the query planner, and it might be useful for this too.

[1]: /messages/by-id/CAKJS1f8v-fUG8YpaAGj309ZuALo3aEk7f6cqMHr_AVwz1fKXug@mail.gmail.com

#54

Floris Van Nee

florisvannee@Optiver.com

almost 6 years ago

In reply to: David Rowley (#53)

3 attachment(s)

Hello hackers,

Recently I've put some effort into extending the functionality of this patch. So far, we've been trying to keep the scope of this patch relatively small, limited to DISTINCT clauses only. The advantage of this approach was that it keeps the impact on the indexam api to a minimum. However, given the problems we've been facing in getting the implementation to work correctly in all cases, I started wondering if this implementation was the right direction to go in. My main worry is that the current indexam api for skipping is not suited to other future use cases of skipping, but also that we're already struggling with it now to get it to work correctly in all edge cases.

In the approach taken so far, the amskip function is defined taking two ScanDirection parameters. The function amgettuple is left unchanged. However, I think we need amgettuple to take two ScanDirection parameters as well (or create a separate function amgetskiptuple). This patch proposes that.

Currently, I've just added 'skip' functions to the indexam api for beginscan and gettuple. Maybe it'd be better to just modify the existing functions to take an extra parameter instead. Any thoughts on this?

The result is a patch that can apply skipping in many more cases than previous patches. For example, filtering on columns that are not part of the index, properly handling visibility checks without moving these into the nbtree code, skipping not only on prefix but also on extra conditions that become available (eg. prefix a=1 and we happen to have a WHERE clause with b=200, which we can now use to skip all the way to a=1 AND b=200). There's a fair amount of changes in the nbtree code to support this.
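
For readers unfamiliar with the technique, here is a toy illustration of skipping on a prefix while also using an extra equality condition. It is only a sketch and not the nbtree code from the patch: a sorted in-memory array of (a, b) pairs stands in for the index, binary search stands in for a descent from the root, and all names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

typedef struct SkipTuple
{
	int			a;
	int			b;
} SkipTuple;

/* Order tuples by (a, b), the way an index on (a, b) would. */
static int
skip_cmp(SkipTuple x, SkipTuple y)
{
	if (x.a != y.a)
		return (x.a < y.a) ? -1 : 1;
	if (x.b != y.b)
		return (x.b < y.b) ? -1 : 1;
	return 0;
}

/* First position in [lo, n) whose tuple is >= key, or n if there is none. */
static size_t
skip_seek(const SkipTuple *idx, size_t n, size_t lo, SkipTuple key)
{
	size_t		hi = n;

	while (lo < hi)
	{
		size_t		mid = lo + (hi - lo) / 2;

		if (skip_cmp(idx[mid], key) < 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}

/*
 * For each distinct a, emit the tuple with b == want_b if it exists.
 * Instead of reading every tuple, seek to (a, want_b) and then seek
 * past the rest of the current a.  Returns the number of tuples
 * written to out (at most out_cap).
 */
size_t
toy_skip_scan(const SkipTuple *idx, size_t n, int want_b,
			  SkipTuple *out, size_t out_cap)
{
	size_t		pos = 0;
	size_t		nout = 0;

	assert(idx != NULL || n == 0);
	assert(out != NULL || out_cap == 0);

	while (pos < n)
	{
		int			a = idx[pos].a;
		SkipTuple	key = {a, want_b};
		size_t		hit = skip_seek(idx, n, pos, key);

		if (hit < n && idx[hit].a == a && idx[hit].b == want_b)
		{
			if (nout == out_cap)
				break;
			out[nout++] = idx[hit];
		}

		if (a == INT_MAX)
			break;				/* no larger prefix can exist */

		/* skip: jump to the first tuple of the next distinct a */
		SkipTuple	next = {a + 1, INT_MIN};

		pos = skip_seek(idx, n, hit, next);
	}
	return nout;
}
```

Each distinct value of a costs two seeks here regardless of how many b values it has, which matches the intuition in the examples that follow: many rows per prefix favor skipping, few rows per prefix do not.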

Patch 0001 is Jesper's unique keys patch.
Patch 0002 modifies executor-level code to support skip scans and enables it for DISTINCT queries.
Patch 0003 just provides a very basic planner hack that enables the skip scan for practically all index scans (index, index only and bitmap index). This is not what we would normally want, but this way I could easily test the skip scan code. It's so large because I modify all the test cases expected results that now include an extra line 'Skip scan: All'. The actual code changed is only a few lines in this patch.

The planner part of the code still needs work. The planner code in this patch is similar to the previous patch. David's comments about projection paths haven't been addressed yet. Also, there's no proper way of hooking up the index scan for regular (non-DISTINCT) queries yet. That's why I hacked up patch 0003 just to test stuff.

I'd welcome any help on these patches. If someone with more planner knowledge than me is willing to do part of the planner code, please feel free to do so. I believe supporting this will speed up a large number of queries for all kinds of users. It can be a really powerful feature.

Tomas, would you be willing to repeat the performance tests you did earlier? I believe this version will perform better than the previous patch for the cases where you noticed the 10-20x slow-down. There will obviously still be a performance penalty in cases where the planner picks a skip scan that is not well suited, but I think it'll be smaller.

-Floris

-----
To give a few simple examples:

Initialization:
-- table t1 has 100 unique values for a
-- and 10000 b values for each a
-- very suitable for skip scan
create table t1 as select a,b,b%5 as c, random() as d from generate_series(1, 100) a, generate_series(1,10000) b;
create index on t1 (a,b,c);

-- table t2 has 10000 unique values for a
-- and 100 b values for each a
-- this is not very suitable for skip scan
-- because the next matching value is always either
-- on the current page or on the next page
create table t2 as select a,b,b%5 as c, random() as d from generate_series(1, 10000) a, generate_series(1,100) b;
create index on t2 (a,b,c);

analyze t1;
analyze t2;

-- First 'Execution Time' line is this patched version (0001+0002+0003) (without including 0003, the non-DISTINCT queries would be equal to master)
-- Second 'Execution Time' line is master
-- Third 'Execution Time' is previous skip scan patch version
-- Just ran a couple of times to give an indication
-- on order of magnitude, not a full benchmark.
select distinct on (a) * from t1;
Execution Time: 1.407 ms (current patch)
Execution Time: 480.704 ms (master)
Execution Time: 1.711 ms (previous patch)

select distinct on (a) * from t1 where b > 50;
Execution Time: 1.432 ms
Execution Time: 481.530 ms
Execution Time: 481.206 ms

select distinct on (a) * from t1 where b > 9990;
Execution Time: 1.074 ms
Execution Time: 33.937 ms
Execution Time: 33.115 ms

select distinct on (a) * from t1 where d > 0.5;
Execution Time: 0.811 ms
Execution Time: 446.549 ms
Execution Time: 436.091 ms

select * from t1 where b=50;
Execution Time: 1.111 ms
Execution Time: 33.242 ms
Execution Time: 36.555 ms

select * from t1 where b between 50 and 75 and d > 0.5;
Execution Time: 2.370 ms
Execution Time: 60.744 ms
Execution Time: 62.820 ms

select * from t1 where b in (100, 200);
Execution Time: 2.464 ms
Execution Time: 252.224 ms
Execution Time: 244.872 ms

select * from t1 where b in (select distinct a from t1);
Execution Time: 91.000 ms
Execution Time: 842.969 ms
Execution Time: 386.871 ms

select distinct on (a) * from t2;
Execution Time: 47.155 ms
Execution Time: 714.102 ms
Execution Time: 56.327 ms

select distinct on (a) * from t2 where b > 5;
Execution Time: 60.100 ms
Execution Time: 709.960 ms
Execution Time: 727.949 ms

select distinct on (a) * from t2 where b > 95;
Execution Time: 55.420 ms
Execution Time: 71.007 ms
Execution Time: 69.229 ms

select distinct on (a) * from t2 where d > 0.5;
Execution Time: 49.254 ms
Execution Time: 719.820 ms
Execution Time: 705.991 ms

-- slight performance degradation here compared to regular index scan
-- due to unfavorable data distribution
select * from t2 where b=50;
Execution Time: 47.603 ms
Execution Time: 37.327 ms
Execution Time: 40.448 ms

select * from t2 where b between 50 and 75 and d > 0.5;
Execution Time: 244.546 ms
Execution Time: 228.579 ms
Execution Time: 227.541 ms

select * from t2 where b in (100, 200);
Execution Time: 64.021 ms
Execution Time: 242.905 ms
Execution Time: 258.864 ms

select * from t2 where b in (select distinct a from t2);
Execution Time: 758.350 ms
Execution Time: 1271.230 ms
Execution Time: 761.311 ms

I wrote a few things here about the method as well:
https://github.com/fvannee/postgres/wiki/Index-Skip-Scan
Code can be found there on Github as well in branch 'skip-scan'

Attachments:

0001-Unique-key.patch (application/octet-stream)

From fd48c4a0067c1c96a2b53fd162bbe9456a9608dd Mon Sep 17 00:00:00 2001
From: jesperpedersen <jesper.pedersen@redhat.com>
Date: Tue, 9 Jul 2019 06:44:57 -0400
Subject: [PATCH 1/3] Unique key

Design by David Rowley.

Author: Jesper Pedersen
---
 src/backend/nodes/outfuncs.c           |  14 +++
 src/backend/nodes/print.c              |  39 +++++++
 src/backend/optimizer/path/Makefile    |   3 +-
 src/backend/optimizer/path/allpaths.c  |   8 ++
 src/backend/optimizer/path/indxpath.c  |  41 +++++++
 src/backend/optimizer/path/pathkeys.c  |  71 ++++++++++--
 src/backend/optimizer/path/uniquekey.c | 147 +++++++++++++++++++++++++
 src/backend/optimizer/plan/planagg.c   |   1 +
 src/backend/optimizer/plan/planmain.c  |   1 +
 src/backend/optimizer/plan/planner.c   |  17 ++-
 src/backend/optimizer/util/pathnode.c  |  12 ++
 src/include/nodes/nodes.h              |   1 +
 src/include/nodes/pathnodes.h          |  18 +++
 src/include/nodes/print.h              |   1 +
 src/include/optimizer/pathnode.h       |   1 +
 src/include/optimizer/paths.h          |  11 ++
 16 files changed, 373 insertions(+), 13 deletions(-)
 create mode 100644 src/backend/optimizer/path/uniquekey.c

diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 89d00444ed..82fcabd9ee 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -1724,6 +1724,7 @@ _outPathInfo(StringInfo str, const Path *node)
 	WRITE_FLOAT_FIELD(startup_cost, "%.2f");
 	WRITE_FLOAT_FIELD(total_cost, "%.2f");
 	WRITE_NODE_FIELD(pathkeys);
+	WRITE_NODE_FIELD(uniquekeys);
 }
 
 /*
@@ -2208,6 +2209,7 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
 	WRITE_NODE_FIELD(eq_classes);
 	WRITE_BOOL_FIELD(ec_merging_done);
 	WRITE_NODE_FIELD(canon_pathkeys);
+	WRITE_NODE_FIELD(canon_uniquekeys);
 	WRITE_NODE_FIELD(left_join_clauses);
 	WRITE_NODE_FIELD(right_join_clauses);
 	WRITE_NODE_FIELD(full_join_clauses);
@@ -2217,6 +2219,7 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
 	WRITE_NODE_FIELD(placeholder_list);
 	WRITE_NODE_FIELD(fkey_list);
 	WRITE_NODE_FIELD(query_pathkeys);
+	WRITE_NODE_FIELD(query_uniquekeys);
 	WRITE_NODE_FIELD(group_pathkeys);
 	WRITE_NODE_FIELD(window_pathkeys);
 	WRITE_NODE_FIELD(distinct_pathkeys);
@@ -2404,6 +2407,14 @@ _outPathKey(StringInfo str, const PathKey *node)
 	WRITE_BOOL_FIELD(pk_nulls_first);
 }
 
+static void
+_outUniqueKey(StringInfo str, const UniqueKey *node)
+{
+	WRITE_NODE_TYPE("UNIQUEKEY");
+
+	WRITE_NODE_FIELD(eq_clause);
+}
+
 static void
 _outPathTarget(StringInfo str, const PathTarget *node)
 {
@@ -4097,6 +4108,9 @@ outNode(StringInfo str, const void *obj)
 			case T_PathKey:
 				_outPathKey(str, obj);
 				break;
+			case T_UniqueKey:
+				_outUniqueKey(str, obj);
+				break;
 			case T_PathTarget:
 				_outPathTarget(str, obj);
 				break;
diff --git a/src/backend/nodes/print.c b/src/backend/nodes/print.c
index 42476724d8..d286b34544 100644
--- a/src/backend/nodes/print.c
+++ b/src/backend/nodes/print.c
@@ -459,6 +459,45 @@ print_pathkeys(const List *pathkeys, const List *rtable)
 	printf(")\n");
 }
 
+/*
+ * print_uniquekeys -
+ *	  uniquekeys list of UniqueKeys
+ */
+void
+print_uniquekeys(const List *uniquekeys, const List *rtable)
+{
+	ListCell   *l;
+
+	printf("(");
+	foreach(l, uniquekeys)
+	{
+		UniqueKey *unique_key = (UniqueKey *) lfirst(l);
+		EquivalenceClass *eclass = (EquivalenceClass *) unique_key->eq_clause;
+		ListCell   *k;
+		bool		first = true;
+
+		/* chase up */
+		while (eclass->ec_merged)
+			eclass = eclass->ec_merged;
+
+		printf("(");
+		foreach(k, eclass->ec_members)
+		{
+			EquivalenceMember *mem = (EquivalenceMember *) lfirst(k);
+
+			if (first)
+				first = false;
+			else
+				printf(", ");
+			print_expr((Node *) mem->em_expr, rtable);
+		}
+		printf(")");
+		if (lnext(uniquekeys, l))
+			printf(", ");
+	}
+	printf(")\n");
+}
+
 /*
  * print_tl
  *	  print targetlist in a more legible way.
diff --git a/src/backend/optimizer/path/Makefile b/src/backend/optimizer/path/Makefile
index 1e199ff66f..63cc1505d9 100644
--- a/src/backend/optimizer/path/Makefile
+++ b/src/backend/optimizer/path/Makefile
@@ -21,6 +21,7 @@ OBJS = \
 	joinpath.o \
 	joinrels.o \
 	pathkeys.o \
-	tidpath.o
+	tidpath.o \
+	uniquekey.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 905bbe77d8..e98ab4eada 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -3954,6 +3954,14 @@ print_path(PlannerInfo *root, Path *path, int indent)
 		print_pathkeys(path->pathkeys, root->parse->rtable);
 	}
 
+	if (path->uniquekeys)
+	{
+		for (i = 0; i < indent; i++)
+			printf("\t");
+		printf("  uniquekeys: ");
+		print_uniquekeys(path->uniquekeys, root->parse->rtable);
+	}
+
 	if (join)
 	{
 		JoinPath   *jp = (JoinPath *) path;
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 2a50272da6..bd1ea53e5c 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -189,6 +189,7 @@ static Expr *match_clause_to_ordering_op(IndexOptInfo *index,
 static bool ec_member_matches_indexcol(PlannerInfo *root, RelOptInfo *rel,
 									   EquivalenceClass *ec, EquivalenceMember *em,
 									   void *arg);
+static List *get_uniquekeys_for_index(PlannerInfo *root, List *pathkeys);
 
 
 /*
@@ -874,6 +875,7 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
 	List	   *orderbyclausecols;
 	List	   *index_pathkeys;
 	List	   *useful_pathkeys;
+	List	   *useful_uniquekeys = NIL;
 	bool		found_lower_saop_clause;
 	bool		pathkeys_possibly_useful;
 	bool		index_is_ordered;
@@ -1036,11 +1038,15 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
 	if (index_clauses != NIL || useful_pathkeys != NIL || useful_predicate ||
 		index_only_scan)
 	{
+		if (has_useful_uniquekeys(root))
+			useful_uniquekeys = get_uniquekeys_for_index(root, useful_pathkeys);
+
 		ipath = create_index_path(root, index,
 								  index_clauses,
 								  orderbyclauses,
 								  orderbyclausecols,
 								  useful_pathkeys,
+								  useful_uniquekeys,
 								  index_is_ordered ?
 								  ForwardScanDirection :
 								  NoMovementScanDirection,
@@ -1063,6 +1069,7 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
 									  orderbyclauses,
 									  orderbyclausecols,
 									  useful_pathkeys,
+									  useful_uniquekeys,
 									  index_is_ordered ?
 									  ForwardScanDirection :
 									  NoMovementScanDirection,
@@ -1093,11 +1100,15 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
 													index_pathkeys);
 		if (useful_pathkeys != NIL)
 		{
+			if (has_useful_uniquekeys(root))
+				useful_uniquekeys = get_uniquekeys_for_index(root, useful_pathkeys);
+
 			ipath = create_index_path(root, index,
 									  index_clauses,
 									  NIL,
 									  NIL,
 									  useful_pathkeys,
+									  useful_uniquekeys,
 									  BackwardScanDirection,
 									  index_only_scan,
 									  outer_relids,
@@ -1115,6 +1126,7 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
 										  NIL,
 										  NIL,
 										  useful_pathkeys,
+										  useful_uniquekeys,
 										  BackwardScanDirection,
 										  index_only_scan,
 										  outer_relids,
@@ -3365,6 +3377,35 @@ match_clause_to_ordering_op(IndexOptInfo *index,
 	return clause;
 }
 
+/*
+ * get_uniquekeys_for_index
+ */
+static List *
+get_uniquekeys_for_index(PlannerInfo *root, List *pathkeys)
+{
+	ListCell *lc;
+
+	if (pathkeys)
+	{
+		List *uniquekeys = NIL;
+		foreach(lc, pathkeys)
+		{
+			UniqueKey *unique_key;
+			PathKey *pk = (PathKey *) lfirst(lc);
+			EquivalenceClass *ec = (EquivalenceClass *) pk->pk_eclass;
+
+			unique_key = makeNode(UniqueKey);
+			unique_key->eq_clause = ec;
+
+			uniquekeys = lappend(uniquekeys, unique_key);
+		}
+
+		if (uniquekeys_contained_in(root->canon_uniquekeys, uniquekeys))
+			return uniquekeys;
+	}
+
+	return NIL;
+}
 
 /****************************************************************************
  *				----  ROUTINES TO DO PARTIAL INDEX PREDICATE TESTS	----
diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index 71b9d42c99..054df9a617 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -29,6 +29,7 @@
 #include "utils/lsyscache.h"
 
 
+static bool pathkey_is_unique(PathKey *new_pathkey, List *pathkeys);
 static bool pathkey_is_redundant(PathKey *new_pathkey, List *pathkeys);
 static bool matches_boolean_partition_clause(RestrictInfo *rinfo,
 											 RelOptInfo *partrel,
@@ -96,6 +97,29 @@ make_canonical_pathkey(PlannerInfo *root,
 	return pk;
 }
 
+/*
+ * pathkey_is_unique
+ *	   Checks if the new pathkey's equivalence class is the same as that of
+ *     any existing member of the pathkey list.
+ */
+static bool
+pathkey_is_unique(PathKey *new_pathkey, List *pathkeys)
+{
+	EquivalenceClass *new_ec = new_pathkey->pk_eclass;
+	ListCell   *lc;
+
+	/* If the same EC is already in the list, then not unique */
+	foreach(lc, pathkeys)
+	{
+		PathKey    *old_pathkey = (PathKey *) lfirst(lc);
+
+		if (new_ec == old_pathkey->pk_eclass)
+			return false;
+	}
+
+	return true;
+}
+
 /*
  * pathkey_is_redundant
  *	   Is a pathkey redundant with one already in the given list?
@@ -135,22 +159,12 @@ static bool
 pathkey_is_redundant(PathKey *new_pathkey, List *pathkeys)
 {
 	EquivalenceClass *new_ec = new_pathkey->pk_eclass;
-	ListCell   *lc;
 
 	/* Check for EC containing a constant --- unconditionally redundant */
 	if (EC_MUST_BE_REDUNDANT(new_ec))
 		return true;
 
-	/* If same EC already used in list, then redundant */
-	foreach(lc, pathkeys)
-	{
-		PathKey    *old_pathkey = (PathKey *) lfirst(lc);
-
-		if (new_ec == old_pathkey->pk_eclass)
-			return true;
-	}
-
-	return false;
+	return !pathkey_is_unique(new_pathkey, pathkeys);
 }
 
 /*
@@ -1098,6 +1112,41 @@ make_pathkeys_for_sortclauses(PlannerInfo *root,
 	return pathkeys;
 }
 
+/*
+ * make_pathkeys_for_uniquekeys
+ *		Generate a pathkeys list to be used for uniquekey clauses
+ */
+List *
+make_pathkeys_for_uniquekeys(PlannerInfo *root,
+							 List *sortclauses,
+							 List *tlist)
+{
+	List	   *pathkeys = NIL;
+	ListCell   *l;
+
+	foreach(l, sortclauses)
+	{
+		SortGroupClause *sortcl = (SortGroupClause *) lfirst(l);
+		Expr	   *sortkey;
+		PathKey    *pathkey;
+
+		sortkey = (Expr *) get_sortgroupclause_expr(sortcl, tlist);
+		Assert(OidIsValid(sortcl->sortop));
+		pathkey = make_pathkey_from_sortop(root,
+										   sortkey,
+										   root->nullable_baserels,
+										   sortcl->sortop,
+										   sortcl->nulls_first,
+										   sortcl->tleSortGroupRef,
+										   true);
+
+		if (pathkey_is_unique(pathkey, pathkeys))
+			pathkeys = lappend(pathkeys, pathkey);
+	}
+
+	return pathkeys;
+}
+
 /****************************************************************************
  *		PATHKEYS AND MERGECLAUSES
  ****************************************************************************/
diff --git a/src/backend/optimizer/path/uniquekey.c b/src/backend/optimizer/path/uniquekey.c
new file mode 100644
index 0000000000..13d4ebb98c
--- /dev/null
+++ b/src/backend/optimizer/path/uniquekey.c
@@ -0,0 +1,147 @@
+/*-------------------------------------------------------------------------
+ *
+ * uniquekey.c
+ *	  Utilities for matching and building unique keys
+ *
+ * Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ *	  src/backend/optimizer/path/uniquekey.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "optimizer/pathnode.h"
+#include "optimizer/paths.h"
+#include "nodes/pg_list.h"
+
+static UniqueKey *make_canonical_uniquekey(PlannerInfo *root, EquivalenceClass *eclass);
+
+/*
+ * Build a list of unique keys
+ */
+List*
+build_uniquekeys(PlannerInfo *root, List *sortclauses)
+{
+	List *result = NIL;
+	List *sortkeys;
+	ListCell *l;
+
+	sortkeys = make_pathkeys_for_uniquekeys(root,
+											sortclauses,
+											root->processed_tlist);
+
+	/* Create a uniquekey and add it to the list */
+	foreach(l, sortkeys)
+	{
+		PathKey    *pathkey = (PathKey *) lfirst(l);
+		EquivalenceClass *ec = pathkey->pk_eclass;
+		UniqueKey *unique_key = make_canonical_uniquekey(root, ec);
+
+		result = lappend(result, unique_key);
+	}
+
+	return result;
+}
+
+/*
+ * uniquekeys_contained_in
+ *	  Is every key in keys2 also present in keys1 (i.e. is keys1 a superset)?
+ */
+bool
+uniquekeys_contained_in(List *keys1, List *keys2)
+{
+	ListCell   *key1,
+			   *key2;
+
+	/*
+	 * Fall out quickly if we are passed two identical lists.  This mostly
+	 * catches the case where both are NIL, but that's common enough to
+	 * warrant the test.
+	 */
+	if (keys1 == keys2)
+		return true;
+
+	foreach(key2, keys2)
+	{
+		bool found = false;
+		UniqueKey  *uniquekey2 = (UniqueKey *) lfirst(key2);
+
+		foreach(key1, keys1)
+		{
+			UniqueKey  *uniquekey1 = (UniqueKey *) lfirst(key1);
+
+			if (uniquekey1->eq_clause == uniquekey2->eq_clause)
+			{
+				found = true;
+				break;
+			}
+		}
+
+		if (!found)
+			return false;
+	}
+
+	return true;
+}
+
+/*
+ * has_useful_uniquekeys
+ *		Detect whether the planner could have any uniquekeys that are
+ *		useful.
+ */
+bool
+has_useful_uniquekeys(PlannerInfo *root)
+{
+	if (root->query_uniquekeys != NIL)
+		return true;	/* there are some */
+	return false;		/* definitely useless */
+}
+
+/*
+ * make_canonical_uniquekey
+ *	  Given the parameters for a UniqueKey, find any pre-existing matching
+ *	  uniquekey in the query's list of "canonical" uniquekeys.  Make a new
+ *	  entry if there's not one already.
+ *
+ * Note that this function must not be used until after we have completed
+ * merging EquivalenceClasses.  (We don't try to enforce that here; instead,
+ * equivclass.c will complain if a merge occurs after root->canon_uniquekeys
+ * has become nonempty.)
+ */
+static UniqueKey *
+make_canonical_uniquekey(PlannerInfo *root,
+						 EquivalenceClass *eclass)
+{
+	UniqueKey  *uk;
+	ListCell   *lc;
+	MemoryContext oldcontext;
+
+	/* The passed eclass might be non-canonical, so chase up to the top */
+	while (eclass->ec_merged)
+		eclass = eclass->ec_merged;
+
+	foreach(lc, root->canon_uniquekeys)
+	{
+		uk = (UniqueKey *) lfirst(lc);
+		if (eclass == uk->eq_clause)
+			return uk;
+	}
+
+	/*
+	 * Be sure canonical uniquekeys are allocated in the main planning context.
+	 * Not an issue in normal planning, but it is for GEQO.
+	 */
+	oldcontext = MemoryContextSwitchTo(root->planner_cxt);
+
+	uk = makeNode(UniqueKey);
+	uk->eq_clause = eclass;
+
+	root->canon_uniquekeys = lappend(root->canon_uniquekeys, uk);
+
+	MemoryContextSwitchTo(oldcontext);
+
+	return uk;
+}
diff --git a/src/backend/optimizer/plan/planagg.c b/src/backend/optimizer/plan/planagg.c
index 8634940efc..dd64775d8f 100644
--- a/src/backend/optimizer/plan/planagg.c
+++ b/src/backend/optimizer/plan/planagg.c
@@ -511,6 +511,7 @@ minmax_qp_callback(PlannerInfo *root, void *extra)
 									  root->parse->targetList);
 
 	root->query_pathkeys = root->sort_pathkeys;
+	root->query_uniquekeys = NIL;
 }
 
 /*
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 62dfc6d44a..3a372af91b 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -70,6 +70,7 @@ query_planner(PlannerInfo *root,
 	root->join_rel_level = NULL;
 	root->join_cur_level = 0;
 	root->canon_pathkeys = NIL;
+	root->canon_uniquekeys = NIL;
 	root->left_join_clauses = NIL;
 	root->right_join_clauses = NIL;
 	root->full_join_clauses = NIL;
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 5da0528382..6a7b55abd2 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3654,15 +3654,30 @@ standard_qp_callback(PlannerInfo *root, void *extra)
 	 * much easier, since we know that the parser ensured that one is a
 	 * superset of the other.
 	 */
+	root->query_uniquekeys = NIL;
+
 	if (root->group_pathkeys)
+	{
 		root->query_pathkeys = root->group_pathkeys;
+
+		if (!root->parse->hasAggs)
+			root->query_uniquekeys = build_uniquekeys(root, qp_extra->groupClause);
+	}
 	else if (root->window_pathkeys)
 		root->query_pathkeys = root->window_pathkeys;
 	else if (list_length(root->distinct_pathkeys) >
 			 list_length(root->sort_pathkeys))
+	{
 		root->query_pathkeys = root->distinct_pathkeys;
+		root->query_uniquekeys = build_uniquekeys(root, parse->distinctClause);
+	}
 	else if (root->sort_pathkeys)
+	{
 		root->query_pathkeys = root->sort_pathkeys;
+
+		if (root->distinct_pathkeys)
+			root->query_uniquekeys = build_uniquekeys(root, parse->distinctClause);
+	}
 	else
 		root->query_pathkeys = NIL;
 }
@@ -6215,7 +6230,7 @@ plan_cluster_use_sort(Oid tableOid, Oid indexOid)
 
 	/* Estimate the cost of index scan */
 	indexScanPath = create_index_path(root, indexInfo,
-									  NIL, NIL, NIL, NIL,
+									  NIL, NIL, NIL, NIL, NIL,
 									  ForwardScanDirection, false,
 									  NULL, 1.0, false);
 
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 8ba8122ee2..278436f102 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -940,6 +940,7 @@ create_seqscan_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->parallel_safe = rel->consider_parallel;
 	pathnode->parallel_workers = parallel_workers;
 	pathnode->pathkeys = NIL;	/* seqscan has unordered result */
+	pathnode->uniquekeys = NIL;
 
 	cost_seqscan(pathnode, root, rel, pathnode->param_info);
 
@@ -964,6 +965,7 @@ create_samplescan_path(PlannerInfo *root, RelOptInfo *rel, Relids required_outer
 	pathnode->parallel_safe = rel->consider_parallel;
 	pathnode->parallel_workers = 0;
 	pathnode->pathkeys = NIL;	/* samplescan has unordered result */
+	pathnode->uniquekeys = NIL;
 
 	cost_samplescan(pathnode, root, rel, pathnode->param_info);
 
@@ -1000,6 +1002,7 @@ create_index_path(PlannerInfo *root,
 				  List *indexorderbys,
 				  List *indexorderbycols,
 				  List *pathkeys,
+				  List *uniquekeys,
 				  ScanDirection indexscandir,
 				  bool indexonly,
 				  Relids required_outer,
@@ -1018,6 +1021,7 @@ create_index_path(PlannerInfo *root,
 	pathnode->path.parallel_safe = rel->consider_parallel;
 	pathnode->path.parallel_workers = 0;
 	pathnode->path.pathkeys = pathkeys;
+	pathnode->path.uniquekeys = uniquekeys;
 
 	pathnode->indexinfo = index;
 	pathnode->indexclauses = indexclauses;
@@ -1061,6 +1065,7 @@ create_bitmap_heap_path(PlannerInfo *root,
 	pathnode->path.parallel_safe = rel->consider_parallel;
 	pathnode->path.parallel_workers = parallel_degree;
 	pathnode->path.pathkeys = NIL;	/* always unordered */
+	pathnode->path.uniquekeys = NIL;
 
 	pathnode->bitmapqual = bitmapqual;
 
@@ -1923,6 +1928,7 @@ create_functionscan_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->parallel_safe = rel->consider_parallel;
 	pathnode->parallel_workers = 0;
 	pathnode->pathkeys = pathkeys;
+	pathnode->uniquekeys = NIL;
 
 	cost_functionscan(pathnode, root, rel, pathnode->param_info);
 
@@ -1949,6 +1955,7 @@ create_tablefuncscan_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->parallel_safe = rel->consider_parallel;
 	pathnode->parallel_workers = 0;
 	pathnode->pathkeys = NIL;	/* result is always unordered */
+	pathnode->uniquekeys = NIL;
 
 	cost_tablefuncscan(pathnode, root, rel, pathnode->param_info);
 
@@ -1975,6 +1982,7 @@ create_valuesscan_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->parallel_safe = rel->consider_parallel;
 	pathnode->parallel_workers = 0;
 	pathnode->pathkeys = NIL;	/* result is always unordered */
+	pathnode->uniquekeys = NIL;
 
 	cost_valuesscan(pathnode, root, rel, pathnode->param_info);
 
@@ -2000,6 +2008,7 @@ create_ctescan_path(PlannerInfo *root, RelOptInfo *rel, Relids required_outer)
 	pathnode->parallel_safe = rel->consider_parallel;
 	pathnode->parallel_workers = 0;
 	pathnode->pathkeys = NIL;	/* XXX for now, result is always unordered */
+	pathnode->uniquekeys = NIL;
 
 	cost_ctescan(pathnode, root, rel, pathnode->param_info);
 
@@ -2026,6 +2035,7 @@ create_namedtuplestorescan_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->parallel_safe = rel->consider_parallel;
 	pathnode->parallel_workers = 0;
 	pathnode->pathkeys = NIL;	/* result is always unordered */
+	pathnode->uniquekeys = NIL;
 
 	cost_namedtuplestorescan(pathnode, root, rel, pathnode->param_info);
 
@@ -2052,6 +2062,7 @@ create_resultscan_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->parallel_safe = rel->consider_parallel;
 	pathnode->parallel_workers = 0;
 	pathnode->pathkeys = NIL;	/* result is always unordered */
+	pathnode->uniquekeys = NIL;
 
 	cost_resultscan(pathnode, root, rel, pathnode->param_info);
 
@@ -2078,6 +2089,7 @@ create_worktablescan_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->parallel_safe = rel->consider_parallel;
 	pathnode->parallel_workers = 0;
 	pathnode->pathkeys = NIL;	/* result is always unordered */
+	pathnode->uniquekeys = NIL;
 
 	/* Cost is the same as for a regular CTE scan */
 	cost_ctescan(pathnode, root, rel, pathnode->param_info);
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 8a76afe8cc..679cc4cc9c 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -261,6 +261,7 @@ typedef enum NodeTag
 	T_EquivalenceMember,
 	T_PathKey,
 	T_PathTarget,
+	T_UniqueKey,
 	T_RestrictInfo,
 	T_IndexClause,
 	T_PlaceHolderVar,
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 0ceb809644..d4816c180d 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -269,6 +269,8 @@ struct PlannerInfo
 
 	List	   *canon_pathkeys; /* list of "canonical" PathKeys */
 
+	List	   *canon_uniquekeys; /* list of "canonical" UniqueKeys */
+
 	List	   *left_join_clauses;	/* list of RestrictInfos for mergejoinable
 									 * outer join clauses w/nonnullable var on
 									 * left */
@@ -297,6 +299,8 @@ struct PlannerInfo
 
 	List	   *query_pathkeys; /* desired pathkeys for query_planner() */
 
+	List	   *query_uniquekeys; /* unique keys used for the query */
+
 	List	   *group_pathkeys; /* groupClause pathkeys, if any */
 	List	   *window_pathkeys;	/* pathkeys of bottom window, if any */
 	List	   *distinct_pathkeys;	/* distinctClause pathkeys, if any */
@@ -1077,6 +1081,15 @@ typedef struct ParamPathInfo
 	List	   *ppi_clauses;	/* join clauses available from outer rels */
 } ParamPathInfo;
 
+/*
+ * UniqueKey
+ */
+typedef struct UniqueKey
+{
+	NodeTag		type;
+
+	EquivalenceClass *eq_clause;	/* equivalence class */
+} UniqueKey;
 
 /*
  * Type "Path" is used as-is for sequential-scan paths, as well as some other
@@ -1106,6 +1119,9 @@ typedef struct ParamPathInfo
  *
  * "pathkeys" is a List of PathKey nodes (see above), describing the sort
  * ordering of the path's output rows.
+ *
+ * "uniquekeys", if not NIL, is a list of UniqueKey nodes (see above),
+ * describing the XXX.
  */
 typedef struct Path
 {
@@ -1129,6 +1145,8 @@ typedef struct Path
 
 	List	   *pathkeys;		/* sort ordering of path's output */
 	/* pathkeys is a List of PathKey nodes; see above */
+
+	List	   *uniquekeys;	/* the unique keys, or NIL if none */
 } Path;
 
 /* Macro for extracting a path's parameterization relids; beware double eval */
diff --git a/src/include/nodes/print.h b/src/include/nodes/print.h
index 6126b491bf..006248bfb5 100644
--- a/src/include/nodes/print.h
+++ b/src/include/nodes/print.h
@@ -28,6 +28,7 @@ extern char *pretty_format_node_dump(const char *dump);
 extern void print_rt(const List *rtable);
 extern void print_expr(const Node *expr, const List *rtable);
 extern void print_pathkeys(const List *pathkeys, const List *rtable);
+extern void print_uniquekeys(const List *uniquekeys, const List *rtable);
 extern void print_tl(const List *tlist, const List *rtable);
 extern void print_slot(TupleTableSlot *slot);
 
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index e450fe112a..f75ff6f323 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -44,6 +44,7 @@ extern IndexPath *create_index_path(PlannerInfo *root,
 									List *indexorderbys,
 									List *indexorderbycols,
 									List *pathkeys,
+									List *uniquekeys,
 									ScanDirection indexscandir,
 									bool indexonly,
 									Relids required_outer,
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 9ab73bd20c..5b6be383b3 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -214,6 +214,9 @@ extern List *build_join_pathkeys(PlannerInfo *root,
 extern List *make_pathkeys_for_sortclauses(PlannerInfo *root,
 										   List *sortclauses,
 										   List *tlist);
+extern List *make_pathkeys_for_uniquekeys(PlannerInfo *root,
+										  List *sortclauses,
+										  List *tlist);
 extern void initialize_mergeclause_eclasses(PlannerInfo *root,
 											RestrictInfo *restrictinfo);
 extern void update_mergeclause_eclasses(PlannerInfo *root,
@@ -240,4 +243,12 @@ extern PathKey *make_canonical_pathkey(PlannerInfo *root,
 extern void add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
 									List *live_childrels);
 
+/*
+ * uniquekey.c
+ *	  Utilities for matching and building unique keys
+ */
+extern List *build_uniquekeys(PlannerInfo *root, List *sortclauses);
+extern bool uniquekeys_contained_in(List *keys1, List *keys2);
+extern bool has_useful_uniquekeys(PlannerInfo *root);
+
 #endif							/* PATHS_H */
-- 
2.25.0

Attachment: 0002-Index-skip-scan.patch (application/octet-stream)

From e7423f550a5cfe8e08f70f0e709b46a2343a74ee Mon Sep 17 00:00:00 2001
From: Floris van Nee <floris.vannee@gmail.com>
Date: Fri, 15 Nov 2019 09:46:53 -0500
Subject: [PATCH 2/3] Index skip scan

Implementation of Index Skip Scan (see Loose Index Scan in the wiki [1])
as part of the IndexOnlyScan, IndexScan and BitmapIndexScan for nbtree.
This patch improves performance of two main types of queries significantly:
- SELECT DISTINCT, SELECT DISTINCT ON
- Regular SELECTs with WHERE-clauses on non-leading index attributes
For example, given an nbtree index on three columns (a,b,c), the following queries
may now be significantly faster:
- SELECT DISTINCT ON (a) * FROM t1
- SELECT * FROM t1 WHERE b=2
- SELECT * FROM t1 WHERE b IN (10,40)
- SELECT DISTINCT ON (a,b) * FROM t1 WHERE c BETWEEN 10 AND 100 ORDER BY a,b,c

Original patch and design were proposed by Thomas Munro [2], revived and
improved by Dmitry Dolgov and Jesper Pedersen. Floris van Nee later added
a more general and more performant skip implementation.

[1] https://wiki.postgresql.org/wiki/Loose_indexscan
[2] https://www.postgresql.org/message-id/flat/CADLWmXXbTSBxP-MzJuPAYSsL_2f0iPm5VWPbCvDbVvfX93FKkw%40mail.gmail.com

Author: Floris van Nee, Jesper Pedersen, Dmitry Dolgov
Reviewed-by: Thomas Munro, David Rowley, Kyotaro Horiguchi, Tomas Vondra, Peter Geoghegan
---
 contrib/amcheck/verify_nbtree.c               |    4 +-
 contrib/bloom/blutils.c                       |    3 +
 doc/src/sgml/config.sgml                      |   15 +
 doc/src/sgml/indexam.sgml                     |  121 ++
 doc/src/sgml/indices.sgml                     |   28 +
 src/backend/access/brin/brin.c                |    3 +
 src/backend/access/gin/ginutil.c              |    3 +
 src/backend/access/gist/gist.c                |    3 +
 src/backend/access/hash/hash.c                |    3 +
 src/backend/access/index/indexam.c            |  163 ++
 src/backend/access/nbtree/Makefile            |    1 +
 src/backend/access/nbtree/nbtinsert.c         |    2 +-
 src/backend/access/nbtree/nbtpage.c           |    2 +-
 src/backend/access/nbtree/nbtree.c            |   56 +-
 src/backend/access/nbtree/nbtsearch.c         |  788 ++++------
 src/backend/access/nbtree/nbtskip.c           | 1317 +++++++++++++++++
 src/backend/access/nbtree/nbtsort.c           |    2 +-
 src/backend/access/nbtree/nbtutils.c          |  821 +++++++++-
 src/backend/access/spgist/spgutils.c          |    3 +
 src/backend/commands/explain.c                |   29 +
 src/backend/executor/execScan.c               |   35 +-
 src/backend/executor/nodeBitmapIndexscan.c    |   21 +-
 src/backend/executor/nodeIndexonlyscan.c      |   69 +-
 src/backend/executor/nodeIndexscan.c          |   71 +-
 src/backend/nodes/copyfuncs.c                 |    5 +
 src/backend/nodes/outfuncs.c                  |    6 +
 src/backend/nodes/readfuncs.c                 |    5 +
 src/backend/optimizer/path/costsize.c         |    1 +
 src/backend/optimizer/plan/createplan.c       |   38 +-
 src/backend/optimizer/plan/planner.c          |   64 +
 src/backend/optimizer/util/pathnode.c         |   40 +
 src/backend/optimizer/util/plancat.c          |    3 +
 src/backend/utils/misc/guc.c                  |    9 +
 src/backend/utils/misc/postgresql.conf.sample |    1 +
 src/backend/utils/sort/tuplesort.c            |    4 +-
 src/include/access/amapi.h                    |   19 +
 src/include/access/genam.h                    |   16 +
 src/include/access/nbtree.h                   |  140 +-
 src/include/executor/executor.h               |    4 +
 src/include/nodes/execnodes.h                 |    7 +
 src/include/nodes/pathnodes.h                 |    6 +
 src/include/nodes/plannodes.h                 |    5 +
 src/include/optimizer/cost.h                  |    1 +
 src/include/optimizer/pathnode.h              |    5 +
 src/interfaces/libpq/encnames.c               |    1 +
 src/interfaces/libpq/wchar.c                  |    1 +
 src/test/regress/expected/select_distinct.out |  601 ++++++++
 src/test/regress/expected/sysviews.out        |    3 +-
 src/test/regress/sql/select_distinct.sql      |  248 ++++
 49 files changed, 4254 insertions(+), 542 deletions(-)
 create mode 100644 src/backend/access/nbtree/nbtskip.c
 create mode 120000 src/interfaces/libpq/encnames.c
 create mode 120000 src/interfaces/libpq/wchar.c

diff --git a/contrib/amcheck/verify_nbtree.c b/contrib/amcheck/verify_nbtree.c
index ceaaa27168..553965beba 100644
--- a/contrib/amcheck/verify_nbtree.c
+++ b/contrib/amcheck/verify_nbtree.c
@@ -2504,7 +2504,7 @@ bt_rootdescend(BtreeCheckState *state, IndexTuple itup)
 	Buffer		lbuf;
 	bool		exists;
 
-	key = _bt_mkscankey(state->rel, itup);
+	key = _bt_mkscankey(state->rel, itup, NULL);
 	Assert(key->heapkeyspace && key->scantid != NULL);
 
 	/*
@@ -2936,7 +2936,7 @@ bt_mkscankey_pivotsearch(Relation rel, IndexTuple itup)
 {
 	BTScanInsert skey;
 
-	skey = _bt_mkscankey(rel, itup);
+	skey = _bt_mkscankey(rel, itup, NULL);
 	skey->pivotsearch = true;
 
 	return skey;
diff --git a/contrib/bloom/blutils.c b/contrib/bloom/blutils.c
index 0104d02f67..f7bdfc959a 100644
--- a/contrib/bloom/blutils.c
+++ b/contrib/bloom/blutils.c
@@ -133,6 +133,9 @@ blhandler(PG_FUNCTION_ARGS)
 	amroutine->ambulkdelete = blbulkdelete;
 	amroutine->amvacuumcleanup = blvacuumcleanup;
 	amroutine->amcanreturn = NULL;
+	amroutine->amskip = NULL;
+	amroutine->ambeginskipscan = NULL;
+	amroutine->amgetskiptuple = NULL;
 	amroutine->amcostestimate = blcostestimate;
 	amroutine->amoptions = bloptions;
 	amroutine->amproperty = NULL;
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 9cc5281f01..7624110ea4 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4597,6 +4597,21 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-enable-indexskipscan" xreflabel="enable_indexskipscan">
+      <term><varname>enable_indexskipscan</varname> (<type>boolean</type>)
+      <indexterm>
+       <primary><varname>enable_indexskipscan</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Enables or disables the query planner's use of index-skip-scan plan
+        types (see <xref linkend="indexes-index-skip-scans"/>). The default is
+        <literal>on</literal>.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-enable-material" xreflabel="enable_material">
       <term><varname>enable_material</varname> (<type>boolean</type>)
       <indexterm>
diff --git a/doc/src/sgml/indexam.sgml b/doc/src/sgml/indexam.sgml
index 37f8d8760a..d1c19d0d51 100644
--- a/doc/src/sgml/indexam.sgml
+++ b/doc/src/sgml/indexam.sgml
@@ -148,6 +148,9 @@ typedef struct IndexAmRoutine
     amendscan_function amendscan;
     ammarkpos_function ammarkpos;       /* can be NULL */
     amrestrpos_function amrestrpos;     /* can be NULL */
+    amskip_function amskip;                        /* can be NULL */
+    ambeginscan_skip_function ambeginskipscan;     /* can be NULL */
+    amgettuple_with_skip_function amgetskiptuple;  /* can be NULL */
 
     /* interface functions to support parallel index scans */
     amestimateparallelscan_function amestimateparallelscan;    /* can be NULL */
@@ -691,6 +694,124 @@ amrestrpos (IndexScanDesc scan);
 
   <para>
 <programlisting>
+bool
+amskip (IndexScanDesc scan,
+        ScanDirection prefixDir,
+	ScanDirection postfixDir);
+</programlisting>
+  Skip past all tuples where the first 'prefix' columns have the same value as
+  the last tuple returned in the current scan. The arguments are:
+
+   <variablelist>
+    <varlistentry>
+     <term><parameter>scan</parameter></term>
+     <listitem>
+      <para>
+       Index scan information
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><parameter>prefixDir</parameter></term>
+     <listitem>
+      <para>
+       The direction in which the prefix part of the tuple is advancing.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><parameter>postfixDir</parameter></term>
+     <listitem>
+      <para>
+        The direction in which the postfix (everything after the prefix) of the tuple is advancing.
+      </para>
+     </listitem>
+    </varlistentry>
+   </variablelist>
+
+  </para>
+
+  <para>
+<programlisting>
+IndexScanDesc
+ambeginscan_skip (Relation indexRelation,
+             int nkeys,
+	     int norderbys,
+	     int prefix);
+</programlisting>
+   Prepare for an index scan.  The <literal>nkeys</literal> and <literal>norderbys</literal>
+   parameters indicate the number of quals and ordering operators that will be
+   used in the scan; these may be useful for space allocation purposes.
+   Note that the actual values of the scan keys aren't provided yet.
+   The result must be a palloc'd struct.
+   For implementation reasons the index access method
+   <emphasis>must</emphasis> create this struct by calling
+   <function>RelationGetIndexScan()</function>.  In most cases
+   <function>ambeginscan</function> does little beyond making that call and perhaps
+   acquiring locks;
+   the interesting parts of index-scan startup are in <function>amrescan</function>.
+   If this is a skip scan, prefix must indicate the length of the prefix that can be
+   skipped over. Prefix can be set to -1 to disable skipping, which will yield an
+   identical scan to a regular call to <function>ambeginscan</function>.
+  </para>
+  <para>
+  <programlisting>
+  bool
+  amgettuple_skip (IndexScanDesc scan,
+              ScanDirection prefixDir,
+	      ScanDirection postfixDir);
+  </programlisting>
+     Fetch the next tuple in the given scan, moving in the given
+     directions. <parameter>prefixDir</parameter> is the direction in which the prefix advances;
+     the length of the prefix was specified in the <function>ambeginscan_skip</function>
+     call. <parameter>postfixDir</parameter> is the direction for the remaining columns; the two
+     can differ in DISTINCT scans when fetching backwards from a cursor.
+     Returns true if a tuple was
+     obtained, false if no matching tuples remain.  In the true case the tuple
+     TID is stored into the <literal>scan</literal> structure.  Note that
+     <quote>success</quote> means only that the index contains an entry that matches
+     the scan keys, not that the tuple necessarily still exists in the heap or
+     will pass the caller's snapshot test.  On success, <function>amgettuple</function>
+     must also set <literal>scan-&gt;xs_recheck</literal> to true or false.
+     False means it is certain that the index entry matches the scan keys.
+     true means this is not certain, and the conditions represented by the
+     scan keys must be rechecked against the heap tuple after fetching it.
+     This provision supports <quote>lossy</quote> index operators.
+     Note that rechecking will extend only to the scan conditions; a partial
+     index predicate (if any) is never rechecked by <function>amgettuple</function>
+     callers.
+    </para>
+
+    <para>
+     If the index supports <link linkend="indexes-index-only-scans">index-only
+     scans</link> (i.e., <function>amcanreturn</function> returns true for it),
+     then on success the AM must also check <literal>scan-&gt;xs_want_itup</literal>,
+     and if that is true it must return the originally indexed data for the
+     index entry.  The data can be returned in the form of an
+     <structname>IndexTuple</structname> pointer stored at <literal>scan-&gt;xs_itup</literal>,
+     with tuple descriptor <literal>scan-&gt;xs_itupdesc</literal>; or in the form of
+     a <structname>HeapTuple</structname> pointer stored at <literal>scan-&gt;xs_hitup</literal>,
+     with tuple descriptor <literal>scan-&gt;xs_hitupdesc</literal>.  (The latter
+     format should be used when reconstructing data that might possibly not fit
+     into an <structname>IndexTuple</structname>.)  In either case,
+     management of the data referenced by the pointer is the access method's
+     responsibility.  The data must remain good at least until the next
+     <function>amgettuple</function>, <function>amrescan</function>, or <function>amendscan</function>
+     call for the scan.
+    </para>
+
+    <para>
+     The <function>amgettuple_skip</function> function need only be provided if the access
+     method supports skip scans.  If it doesn't, the
+     <structfield>amgetskiptuple</structfield> field in its <structname>IndexAmRoutine</structname>
+     struct must be set to NULL.
+    </para>
+
+    <para>
+<programlisting>
 Size
 amestimateparallelscan (void);
 </programlisting>
diff --git a/doc/src/sgml/indices.sgml b/doc/src/sgml/indices.sgml
index 86539a781c..b4349039e7 100644
--- a/doc/src/sgml/indices.sgml
+++ b/doc/src/sgml/indices.sgml
@@ -1298,6 +1298,34 @@ SELECT target FROM tests WHERE subject = 'some-subject' AND success;
    and later will recognize such cases and allow index-only scans to be
    generated, but older versions will not.
   </para>
+
+  <sect2 id="indexes-index-skip-scans">
+    <title>Index Skip Scans</title>
+
+    <indexterm zone="indexes-index-skip-scans">
+      <primary>index</primary>
+      <secondary>index-skip scans</secondary>
+    </indexterm>
+    <indexterm zone="indexes-index-skip-scans">
+      <primary>index-skip scan</primary>
+    </indexterm>
+
+    <para>
+     When the rows retrieved from an index scan are then deduplicated by
+     eliminating rows matching on a prefix of index keys (e.g. when using
+     <literal>SELECT DISTINCT</literal>), the planner will consider
+     skipping groups of rows with a matching key prefix. When a row with
+     a particular prefix is found, remaining rows with the same key prefix
+     are skipped.  The larger the number of rows with the same key prefix
+     (i.e. the lower the number of distinct key prefixes in the index),
+     the more efficient this is.
+    </para>
+    <para>
+      Additionally, a skip scan can be considered in regular <literal>SELECT</literal>
+      queries. When filtering on a non-leading attribute of an index, the planner
+      may choose a skip scan.
+    </para>
+  </sect2>
  </sect1>
 
 
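To make the documentation paragraph above concrete, here is a minimal sketch of the idea behind a skip scan for DISTINCT on a key prefix. It is not taken from the patch and does not use any PostgreSQL API: the Entry type and the functions skip_past_prefix and distinct_by_prefix are hypothetical names, and a sorted in-memory array with a binary search stands in for the btree and its re-descent.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an index entry on (a, b); not a PostgreSQL type. */
typedef struct Entry
{
	int			a;				/* leading column: the skip prefix */
	int			b;				/* trailing column */
} Entry;

/*
 * Return the position of the first entry in entries[lo..n) whose prefix is
 * greater than 'prefix'.  The array must be sorted by (a, b).  The binary
 * search plays the role of descending the tree again instead of walking
 * over every duplicate.
 */
static size_t
skip_past_prefix(const Entry *entries, size_t lo, size_t n, int prefix)
{
	size_t		hi = n;

	while (lo < hi)
	{
		size_t		mid = lo + (hi - lo) / 2;

		if (entries[mid].a <= prefix)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}

/*
 * Copy the first entry of each distinct prefix into out[] and return how
 * many were found.  out[] must have room for one entry per distinct prefix.
 */
static size_t
distinct_by_prefix(const Entry *entries, size_t n, Entry *out)
{
	size_t		pos = 0;
	size_t		nout = 0;

	while (pos < n)
	{
		/* Emit the first entry of this prefix group... */
		out[nout++] = entries[pos];
		/* ...then jump straight to the start of the next group. */
		pos = skip_past_prefix(entries, pos + 1, n, entries[pos].a);
	}
	return nout;
}
```

Each group costs one logarithmic search rather than a walk over all of its rows, which is why the approach pays off most when there are few distinct prefixes and many rows per prefix.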
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index c481838389..94440781cf 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -114,6 +114,9 @@ brinhandler(PG_FUNCTION_ARGS)
 	amroutine->ambulkdelete = brinbulkdelete;
 	amroutine->amvacuumcleanup = brinvacuumcleanup;
 	amroutine->amcanreturn = NULL;
+	amroutine->amskip = NULL;
+	amroutine->ambeginskipscan = NULL;
+	amroutine->amgetskiptuple = NULL;
 	amroutine->amcostestimate = brincostestimate;
 	amroutine->amoptions = brinoptions;
 	amroutine->amproperty = NULL;
diff --git a/src/backend/access/gin/ginutil.c b/src/backend/access/gin/ginutil.c
index a7e55caf28..ffbac1d1b8 100644
--- a/src/backend/access/gin/ginutil.c
+++ b/src/backend/access/gin/ginutil.c
@@ -65,6 +65,9 @@ ginhandler(PG_FUNCTION_ARGS)
 	amroutine->ambulkdelete = ginbulkdelete;
 	amroutine->amvacuumcleanup = ginvacuumcleanup;
 	amroutine->amcanreturn = NULL;
+	amroutine->amskip = NULL;
+	amroutine->ambeginskipscan = NULL;
+	amroutine->amgetskiptuple = NULL;
 	amroutine->amcostestimate = gincostestimate;
 	amroutine->amoptions = ginoptions;
 	amroutine->amproperty = NULL;
diff --git a/src/backend/access/gist/gist.c b/src/backend/access/gist/gist.c
index 90c46e86a1..0d3691324c 100644
--- a/src/backend/access/gist/gist.c
+++ b/src/backend/access/gist/gist.c
@@ -86,6 +86,9 @@ gisthandler(PG_FUNCTION_ARGS)
 	amroutine->ambulkdelete = gistbulkdelete;
 	amroutine->amvacuumcleanup = gistvacuumcleanup;
 	amroutine->amcanreturn = gistcanreturn;
+	amroutine->amskip = NULL;
+	amroutine->ambeginskipscan = NULL;
+	amroutine->amgetskiptuple = NULL;
 	amroutine->amcostestimate = gistcostestimate;
 	amroutine->amoptions = gistoptions;
 	amroutine->amproperty = gistproperty;
diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c
index 4871b7ff4d..a95a48d57d 100644
--- a/src/backend/access/hash/hash.c
+++ b/src/backend/access/hash/hash.c
@@ -83,6 +83,9 @@ hashhandler(PG_FUNCTION_ARGS)
 	amroutine->ambulkdelete = hashbulkdelete;
 	amroutine->amvacuumcleanup = hashvacuumcleanup;
 	amroutine->amcanreturn = NULL;
+	amroutine->amskip = NULL;
+	amroutine->ambeginskipscan = NULL;
+	amroutine->amgetskiptuple = NULL;
 	amroutine->amcostestimate = hashcostestimate;
 	amroutine->amoptions = hashoptions;
 	amroutine->amproperty = NULL;
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index a5210d0b34..695d2e1273 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -14,7 +14,9 @@
  *		index_open		- open an index relation by relation OID
  *		index_close		- close an index relation
  *		index_beginscan - start a scan of an index with amgettuple
+ *		index_beginscan_skip - start a skip scan of an index with amgetskiptuple
  *		index_beginscan_bitmap - start a scan of an index with amgetbitmap
+ *		index_beginscan_bitmap_skip - start a skip scan of an index with amgetbitmap
  *		index_rescan	- restart a scan of an index
  *		index_endscan	- end a scan
  *		index_insert	- insert an index tuple into a relation
@@ -25,14 +27,17 @@
  *		index_parallelrescan  - (re)start a parallel scan of an index
  *		index_beginscan_parallel - join parallel index scan
  *		index_getnext_tid	- get the next TID from a scan
+ *		index_getnext_tid_skip	- get the next TID from a skip scan
  *		index_fetch_heap		- get the scan's next heap tuple
  *		index_getnext_slot	- get the next tuple from a scan
+ *		index_getnext_slot_skip	- get the next tuple from a skip scan
  *		index_getbitmap - get all tuples from a scan
  *		index_bulk_delete	- bulk deletion of index tuples
  *		index_vacuum_cleanup	- post-deletion cleanup of an index
  *		index_can_return	- does index support index-only scans?
  *		index_getprocid - get a support procedure OID
  *		index_getprocinfo - get a support procedure's lookup info
+ *		index_skip		- advance past duplicate key values in a scan
  *
  * NOTES
  *		This file contains the index_ routines which used
@@ -216,6 +221,78 @@ index_beginscan(Relation heapRelation,
 	return scan;
 }
 
+static IndexScanDesc
+index_beginscan_internal_skip(Relation indexRelation,
+						 int nkeys, int norderbys, int prefix, Snapshot snapshot,
+						 ParallelIndexScanDesc pscan, bool temp_snap)
+{
+	IndexScanDesc scan;
+
+	RELATION_CHECKS;
+	CHECK_REL_PROCEDURE(ambeginskipscan);
+
+	if (!(indexRelation->rd_indam->ampredlocks))
+		PredicateLockRelation(indexRelation, snapshot);
+
+	/*
+	 * We hold a reference count to the relcache entry throughout the scan.
+	 */
+	RelationIncrementReferenceCount(indexRelation);
+
+	/*
+	 * Tell the AM to open a scan.
+	 */
+	scan = indexRelation->rd_indam->ambeginskipscan(indexRelation, nkeys,
+												norderbys, prefix);
+	/* Initialize information for parallel scan. */
+	scan->parallel_scan = pscan;
+	scan->xs_temp_snap = temp_snap;
+
+	return scan;
+}
+
+IndexScanDesc
+index_beginscan_skip(Relation heapRelation,
+				Relation indexRelation,
+				Snapshot snapshot,
+				int nkeys, int norderbys, int prefix)
+{
+	IndexScanDesc scan;
+
+	scan = index_beginscan_internal_skip(indexRelation, nkeys, norderbys, prefix, snapshot, NULL, false);
+
+	/*
+	 * Save additional parameters into the scandesc.  Everything else was set
+	 * up by RelationGetIndexScan.
+	 */
+	scan->heapRelation = heapRelation;
+	scan->xs_snapshot = snapshot;
+
+	/* prepare to fetch index matches from table */
+	scan->xs_heapfetch = table_index_fetch_begin(heapRelation);
+
+	return scan;
+}
+
+IndexScanDesc
+index_beginscan_bitmap_skip(Relation indexRelation,
+					   Snapshot snapshot,
+					   int nkeys,
+					   int prefix)
+{
+	IndexScanDesc scan;
+
+	scan = index_beginscan_internal_skip(indexRelation, nkeys, 0, prefix, snapshot, NULL, false);
+
+	/*
+	 * Save additional parameters into the scandesc.  Everything else was set
+	 * up by RelationGetIndexScan.
+	 */
+	scan->xs_snapshot = snapshot;
+
+	return scan;
+}
+
 /*
  * index_beginscan_bitmap - start a scan of an index with amgetbitmap
  *
@@ -544,6 +621,45 @@ index_getnext_tid(IndexScanDesc scan, ScanDirection direction)
 	return &scan->xs_heaptid;
 }
 
+ItemPointer
+index_getnext_tid_skip(IndexScanDesc scan, ScanDirection prefixDir, ScanDirection postfixDir)
+{
+	bool		found;
+
+	SCAN_CHECKS;
+	CHECK_SCAN_PROCEDURE(amgetskiptuple);
+
+	Assert(TransactionIdIsValid(RecentGlobalXmin));
+
+	/*
+	 * The AM's amgetskiptuple proc finds the next index entry matching the scan
+	 * keys, and puts the TID into scan->xs_heaptid.  It should also set
+	 * scan->xs_recheck and possibly scan->xs_itup/scan->xs_hitup, though we
+	 * pay no attention to those fields here.
+	 */
+	found = scan->indexRelation->rd_indam->amgetskiptuple(scan, prefixDir, postfixDir);
+
+	/* Reset kill flag immediately for safety */
+	scan->kill_prior_tuple = false;
+	scan->xs_heap_continue = false;
+
+	/* If we're out of index entries, we're done */
+	if (!found)
+	{
+		/* release resources (like buffer pins) from table accesses */
+		if (scan->xs_heapfetch)
+			table_index_fetch_reset(scan->xs_heapfetch);
+
+		return NULL;
+	}
+	Assert(ItemPointerIsValid(&scan->xs_heaptid));
+
+	pgstat_count_index_tuples(scan->indexRelation, 1);
+
+	/* Return the TID of the tuple we found. */
+	return &scan->xs_heaptid;
+}
+
 /* ----------------
  *		index_fetch_heap - get the scan's next heap tuple
  *
@@ -635,6 +751,38 @@ index_getnext_slot(IndexScanDesc scan, ScanDirection direction, TupleTableSlot *
 	return false;
 }
 
+bool
+index_getnext_slot_skip(IndexScanDesc scan, ScanDirection prefixDir, ScanDirection postfixDir, TupleTableSlot *slot)
+{
+	for (;;)
+	{
+		if (!scan->xs_heap_continue)
+		{
+			ItemPointer tid;
+
+			/* Time to fetch the next TID from the index */
+			tid = index_getnext_tid_skip(scan, prefixDir, postfixDir);
+
+			/* If we're out of index entries, we're done */
+			if (tid == NULL)
+				break;
+
+			Assert(ItemPointerEquals(tid, &scan->xs_heaptid));
+		}
+
+		/*
+		 * Fetch the next (or only) visible heap tuple for this index entry.
+		 * If we don't find anything, loop around and grab the next TID from
+		 * the index.
+		 */
+		Assert(ItemPointerIsValid(&scan->xs_heaptid));
+		if (index_fetch_heap(scan, slot))
+			return true;
+	}
+
+	return false;
+}
+
 /* ----------------
  *		index_getbitmap - get all tuples at once from an index scan
  *
@@ -730,6 +878,21 @@ index_can_return(Relation indexRelation, int attno)
 	return indexRelation->rd_indam->amcanreturn(indexRelation, attno);
 }
 
+/* ----------------
+ *		index_skip
+ *
+ *		Skip past all tuples where the first 'prefix' columns have the
+ *		same value as the last tuple returned in the current scan.
+ * ----------------
+ */
+bool
+index_skip(IndexScanDesc scan, ScanDirection prefixDir, ScanDirection postfixDir)
+{
+	SCAN_CHECKS;
+	CHECK_SCAN_PROCEDURE(amskip);
+	return scan->indexRelation->rd_indam->amskip(scan, prefixDir, postfixDir);
+}
+
 /* ----------------
  *		index_getprocid
  *
diff --git a/src/backend/access/nbtree/Makefile b/src/backend/access/nbtree/Makefile
index d69808e78c..da96ac00a6 100644
--- a/src/backend/access/nbtree/Makefile
+++ b/src/backend/access/nbtree/Makefile
@@ -19,6 +19,7 @@ OBJS = \
 	nbtpage.o \
 	nbtree.o \
 	nbtsearch.o \
+	nbtskip.o \
 	nbtsort.o \
 	nbtsplitloc.o \
 	nbtutils.o \
diff --git a/src/backend/access/nbtree/nbtinsert.c b/src/backend/access/nbtree/nbtinsert.c
index 00df0e1b88..749c6e3744 100644
--- a/src/backend/access/nbtree/nbtinsert.c
+++ b/src/backend/access/nbtree/nbtinsert.c
@@ -89,7 +89,7 @@ _bt_doinsert(Relation rel, IndexTuple itup,
 	bool		checkingunique = (checkUnique != UNIQUE_CHECK_NO);
 
 	/* we need an insertion scan key to do our search, so build one */
-	itup_key = _bt_mkscankey(rel, itup);
+	itup_key = _bt_mkscankey(rel, itup, NULL);
 
 	if (checkingunique)
 	{
diff --git a/src/backend/access/nbtree/nbtpage.c b/src/backend/access/nbtree/nbtpage.c
index 39b8f17f4b..4d48b8bd63 100644
--- a/src/backend/access/nbtree/nbtpage.c
+++ b/src/backend/access/nbtree/nbtpage.c
@@ -1638,7 +1638,7 @@ _bt_pagedel(Relation rel, Buffer buf)
 				}
 
 				/* we need an insertion scan key for the search, so build one */
-				itup_key = _bt_mkscankey(rel, targetkey);
+				itup_key = _bt_mkscankey(rel, targetkey, NULL);
 				/* find the leftmost leaf page with matching pivot/high key */
 				itup_key->pivotsearch = true;
 				stack = _bt_search(rel, itup_key, &lbuf, BT_READ, NULL);
diff --git a/src/backend/access/nbtree/nbtree.c b/src/backend/access/nbtree/nbtree.c
index 4bb16297c3..2b9e045ae0 100644
--- a/src/backend/access/nbtree/nbtree.c
+++ b/src/backend/access/nbtree/nbtree.c
@@ -136,14 +136,17 @@ bthandler(PG_FUNCTION_ARGS)
 	amroutine->ambulkdelete = btbulkdelete;
 	amroutine->amvacuumcleanup = btvacuumcleanup;
 	amroutine->amcanreturn = btcanreturn;
+	amroutine->amskip = btskip;
 	amroutine->amcostestimate = btcostestimate;
 	amroutine->amoptions = btoptions;
 	amroutine->amproperty = btproperty;
 	amroutine->ambuildphasename = btbuildphasename;
 	amroutine->amvalidate = btvalidate;
 	amroutine->ambeginscan = btbeginscan;
+	amroutine->ambeginskipscan = btbeginscan_skip;
 	amroutine->amrescan = btrescan;
 	amroutine->amgettuple = btgettuple;
+	amroutine->amgetskiptuple = btgettuple_skip;
 	amroutine->amgetbitmap = btgetbitmap;
 	amroutine->amendscan = btendscan;
 	amroutine->ammarkpos = btmarkpos;
@@ -219,6 +222,15 @@ btinsert(Relation rel, Datum *values, bool *isnull,
  */
 bool
 btgettuple(IndexScanDesc scan, ScanDirection dir)
+{
+	return btgettuple_skip(scan, dir, dir);
+}
+
+/*
+ *	btgettuple_skip() -- Get the next tuple in the (possibly skipping) scan.
+ */
+bool
+btgettuple_skip(IndexScanDesc scan, ScanDirection prefixDir, ScanDirection postfixDir)
 {
 	BTScanOpaque so = (BTScanOpaque) scan->opaque;
 	bool		res;
@@ -237,7 +249,7 @@ btgettuple(IndexScanDesc scan, ScanDirection dir)
 		if (so->numArrayKeys < 0)
 			return false;
 
-		_bt_start_array_keys(scan, dir);
+		_bt_start_array_keys(scan, prefixDir);
 	}
 
 	/* This loop handles advancing to the next array elements, if any */
@@ -249,7 +261,7 @@ btgettuple(IndexScanDesc scan, ScanDirection dir)
 		 * _bt_first() to get the first item in the scan.
 		 */
 		if (!BTScanPosIsValid(so->currPos))
-			res = _bt_first(scan, dir);
+			res = _bt_first(scan, prefixDir, postfixDir);
 		else
 		{
 			/*
@@ -276,14 +288,14 @@ btgettuple(IndexScanDesc scan, ScanDirection dir)
 			/*
 			 * Now continue the scan.
 			 */
-			res = _bt_next(scan, dir);
+			res = _bt_next(scan, prefixDir, postfixDir);
 		}
 
 		/* If we have a tuple, return it ... */
 		if (res)
 			break;
 		/* ... otherwise see if we have more array keys to deal with */
-	} while (so->numArrayKeys && _bt_advance_array_keys(scan, dir));
+	} while (so->numArrayKeys && _bt_advance_array_keys(scan, prefixDir));
 
 	return res;
 }
@@ -314,7 +326,7 @@ btgetbitmap(IndexScanDesc scan, TIDBitmap *tbm)
 	do
 	{
 		/* Fetch the first page & tuple */
-		if (_bt_first(scan, ForwardScanDirection))
+		if (_bt_first(scan, ForwardScanDirection, ForwardScanDirection))
 		{
 			/* Save tuple ID, and continue scanning */
 			heapTid = &scan->xs_heaptid;
@@ -330,7 +342,7 @@ btgetbitmap(IndexScanDesc scan, TIDBitmap *tbm)
 				if (++so->currPos.itemIndex > so->currPos.lastItem)
 				{
 					/* let _bt_next do the heavy lifting */
-					if (!_bt_next(scan, ForwardScanDirection))
+					if (!_bt_next(scan, ForwardScanDirection, ForwardScanDirection))
 						break;
 				}
 
@@ -351,6 +363,16 @@ btgetbitmap(IndexScanDesc scan, TIDBitmap *tbm)
  */
 IndexScanDesc
 btbeginscan(Relation rel, int nkeys, int norderbys)
+{
+	return btbeginscan_skip(rel, nkeys, norderbys, -1);
+}
+
+
+/*
+ *	btbeginscan_skip() -- start a (possibly skipping) scan on a btree index
+ */
+IndexScanDesc
+btbeginscan_skip(Relation rel, int nkeys, int norderbys, int skipPrefix)
 {
 	IndexScanDesc scan;
 	BTScanOpaque so;
@@ -385,10 +407,18 @@ btbeginscan(Relation rel, int nkeys, int norderbys)
 	 */
 	so->currTuples = so->markTuples = NULL;
 
+	so->skipData = NULL;
+
 	scan->xs_itupdesc = RelationGetDescr(rel);
 
 	scan->opaque = so;
 
+	if (skipPrefix > 0)
+	{
+		so->skipData = (BTSkip) palloc0(sizeof(BTSkipData));
+		so->skipData->prefix = skipPrefix;
+	}
+
 	return scan;
 }
 
@@ -452,6 +482,15 @@ btrescan(IndexScanDesc scan, ScanKey scankey, int nscankeys,
 	_bt_preprocess_array_keys(scan);
 }
 
+/*
+ * btskip() -- skip to the beginning of the next key prefix
+ */
+bool
+btskip(IndexScanDesc scan, ScanDirection prefixDir, ScanDirection postfixDir)
+{
+	return _bt_skip(scan, prefixDir, postfixDir);
+}
+
 /*
  *	btendscan() -- close down a scan
  */
@@ -485,6 +524,8 @@ btendscan(IndexScanDesc scan)
 	if (so->currTuples != NULL)
 		pfree(so->currTuples);
 	/* so->markTuples should not be pfree'd, see btrescan */
+	if (_bt_skip_enabled(so))
+		pfree(so->skipData);
 	pfree(so);
 }
 
@@ -568,6 +609,9 @@ btrestrpos(IndexScanDesc scan)
 			if (so->currTuples)
 				memcpy(so->currTuples, so->markTuples,
 					   so->markPos.nextTupleOffset);
+			if (so->skipData)
+				memcpy(&so->skipData->curPos, &so->skipData->markPos,
+					   sizeof(BTSkipPosData));
 		}
 		else
 			BTScanPosInvalidate(so->currPos);
diff --git a/src/backend/access/nbtree/nbtsearch.c b/src/backend/access/nbtree/nbtsearch.c
index 8ff49ce6d6..f0c042c9ba 100644
--- a/src/backend/access/nbtree/nbtsearch.c
+++ b/src/backend/access/nbtree/nbtsearch.c
@@ -17,19 +17,17 @@
 
 #include "access/nbtree.h"
 #include "access/relscan.h"
+#include "catalog/catalog.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "storage/predicate.h"
+#include "utils/guc.h"
 #include "utils/lsyscache.h"
 #include "utils/rel.h"
 
 
-static void _bt_drop_lock_and_maybe_pin(IndexScanDesc scan, BTScanPos sp);
-static OffsetNumber _bt_binsrch(Relation rel, BTScanInsert key, Buffer buf);
 static int	_bt_binsrch_posting(BTScanInsert key, Page page,
 								OffsetNumber offnum);
-static bool _bt_readpage(IndexScanDesc scan, ScanDirection dir,
-						 OffsetNumber offnum);
 static void _bt_saveitem(BTScanOpaque so, int itemIndex,
 						 OffsetNumber offnum, IndexTuple itup);
 static int	_bt_setuppostingitems(BTScanOpaque so, int itemIndex,
@@ -38,14 +36,12 @@ static int	_bt_setuppostingitems(BTScanOpaque so, int itemIndex,
 static inline void _bt_savepostingitem(BTScanOpaque so, int itemIndex,
 									   OffsetNumber offnum,
 									   ItemPointer heapTid, int tupleOffset);
-static bool _bt_steppage(IndexScanDesc scan, ScanDirection dir);
-static bool _bt_readnextpage(IndexScanDesc scan, BlockNumber blkno, ScanDirection dir);
 static bool _bt_parallel_readpage(IndexScanDesc scan, BlockNumber blkno,
 								  ScanDirection dir);
-static Buffer _bt_walk_left(Relation rel, Buffer buf, Snapshot snapshot);
 static bool _bt_endpoint(IndexScanDesc scan, ScanDirection dir);
-static inline void _bt_initialize_more_data(BTScanOpaque so, ScanDirection dir);
-
+static inline bool _bt_checkkeys_extended(IndexScanDesc scan, IndexTuple tuple, int tupnatts,
+										  ScanDirection dir, bool isRegularMode,
+										  bool *continuescan, int *prefixskipindex);
 
 /*
  *	_bt_drop_lock_and_maybe_pin()
@@ -61,7 +57,7 @@ static inline void _bt_initialize_more_data(BTScanOpaque so, ScanDirection dir);
  * will remain in shared memory for as long as it takes to scan the index
  * buffer page.
  */
-static void
+void
 _bt_drop_lock_and_maybe_pin(IndexScanDesc scan, BTScanPos sp)
 {
 	LockBuffer(sp->buf, BUFFER_LOCK_UNLOCK);
@@ -344,7 +340,7 @@ _bt_moveright(Relation rel,
  * the given page.  _bt_binsrch() has no lock or refcount side effects
  * on the buffer.
  */
-static OffsetNumber
+OffsetNumber
 _bt_binsrch(Relation rel,
 			BTScanInsert key,
 			Buffer buf)
@@ -850,25 +846,23 @@ _bt_compare(Relation rel,
  * in locating the scan start position.
  */
 bool
-_bt_first(IndexScanDesc scan, ScanDirection dir)
+_bt_first(IndexScanDesc scan, ScanDirection prefixDir, ScanDirection postfixDir)
 {
 	Relation	rel = scan->indexRelation;
 	BTScanOpaque so = (BTScanOpaque) scan->opaque;
 	Buffer		buf;
 	BTStack		stack;
 	OffsetNumber offnum;
-	StrategyNumber strat;
-	bool		nextkey;
 	bool		goback;
 	BTScanInsertData inskey;
 	ScanKey		startKeys[INDEX_MAX_KEYS];
 	ScanKeyData notnullkeys[INDEX_MAX_KEYS];
 	int			keysCount = 0;
-	int			i;
 	bool		status = true;
 	StrategyNumber strat_total;
 	BTScanPosItem *currItem;
 	BlockNumber blkno;
+	IndexTuple itup;
 
 	Assert(!BTScanPosIsValid(so->currPos));
 
@@ -905,184 +899,13 @@ _bt_first(IndexScanDesc scan, ScanDirection dir)
 		}
 		else if (blkno != InvalidBlockNumber)
 		{
-			if (!_bt_parallel_readpage(scan, blkno, dir))
+			if (!_bt_parallel_readpage(scan, blkno, prefixDir))
 				return false;
 			goto readcomplete;
 		}
 	}
 
-	/*----------
-	 * Examine the scan keys to discover where we need to start the scan.
-	 *
-	 * We want to identify the keys that can be used as starting boundaries;
-	 * these are =, >, or >= keys for a forward scan or =, <, <= keys for
-	 * a backwards scan.  We can use keys for multiple attributes so long as
-	 * the prior attributes had only =, >= (resp. =, <=) keys.  Once we accept
-	 * a > or < boundary or find an attribute with no boundary (which can be
-	 * thought of as the same as "> -infinity"), we can't use keys for any
-	 * attributes to its right, because it would break our simplistic notion
-	 * of what initial positioning strategy to use.
-	 *
-	 * When the scan keys include cross-type operators, _bt_preprocess_keys
-	 * may not be able to eliminate redundant keys; in such cases we will
-	 * arbitrarily pick a usable one for each attribute.  This is correct
-	 * but possibly not optimal behavior.  (For example, with keys like
-	 * "x >= 4 AND x >= 5" we would elect to scan starting at x=4 when
-	 * x=5 would be more efficient.)  Since the situation only arises given
-	 * a poorly-worded query plus an incomplete opfamily, live with it.
-	 *
-	 * When both equality and inequality keys appear for a single attribute
-	 * (again, only possible when cross-type operators appear), we *must*
-	 * select one of the equality keys for the starting point, because
-	 * _bt_checkkeys() will stop the scan as soon as an equality qual fails.
-	 * For example, if we have keys like "x >= 4 AND x = 10" and we elect to
-	 * start at x=4, we will fail and stop before reaching x=10.  If multiple
-	 * equality quals survive preprocessing, however, it doesn't matter which
-	 * one we use --- by definition, they are either redundant or
-	 * contradictory.
-	 *
-	 * Any regular (not SK_SEARCHNULL) key implies a NOT NULL qualifier.
-	 * If the index stores nulls at the end of the index we'll be starting
-	 * from, and we have no boundary key for the column (which means the key
-	 * we deduced NOT NULL from is an inequality key that constrains the other
-	 * end of the index), then we cons up an explicit SK_SEARCHNOTNULL key to
-	 * use as a boundary key.  If we didn't do this, we might find ourselves
-	 * traversing a lot of null entries at the start of the scan.
-	 *
-	 * In this loop, row-comparison keys are treated the same as keys on their
-	 * first (leftmost) columns.  We'll add on lower-order columns of the row
-	 * comparison below, if possible.
-	 *
-	 * The selected scan keys (at most one per index column) are remembered by
-	 * storing their addresses into the local startKeys[] array.
-	 *----------
-	 */
-	strat_total = BTEqualStrategyNumber;
-	if (so->numberOfKeys > 0)
-	{
-		AttrNumber	curattr;
-		ScanKey		chosen;
-		ScanKey		impliesNN;
-		ScanKey		cur;
-
-		/*
-		 * chosen is the so-far-chosen key for the current attribute, if any.
-		 * We don't cast the decision in stone until we reach keys for the
-		 * next attribute.
-		 */
-		curattr = 1;
-		chosen = NULL;
-		/* Also remember any scankey that implies a NOT NULL constraint */
-		impliesNN = NULL;
-
-		/*
-		 * Loop iterates from 0 to numberOfKeys inclusive; we use the last
-		 * pass to handle after-last-key processing.  Actual exit from the
-		 * loop is at one of the "break" statements below.
-		 */
-		for (cur = so->keyData, i = 0;; cur++, i++)
-		{
-			if (i >= so->numberOfKeys || cur->sk_attno != curattr)
-			{
-				/*
-				 * Done looking at keys for curattr.  If we didn't find a
-				 * usable boundary key, see if we can deduce a NOT NULL key.
-				 */
-				if (chosen == NULL && impliesNN != NULL &&
-					((impliesNN->sk_flags & SK_BT_NULLS_FIRST) ?
-					 ScanDirectionIsForward(dir) :
-					 ScanDirectionIsBackward(dir)))
-				{
-					/* Yes, so build the key in notnullkeys[keysCount] */
-					chosen = &notnullkeys[keysCount];
-					ScanKeyEntryInitialize(chosen,
-										   (SK_SEARCHNOTNULL | SK_ISNULL |
-											(impliesNN->sk_flags &
-											 (SK_BT_DESC | SK_BT_NULLS_FIRST))),
-										   curattr,
-										   ((impliesNN->sk_flags & SK_BT_NULLS_FIRST) ?
-											BTGreaterStrategyNumber :
-											BTLessStrategyNumber),
-										   InvalidOid,
-										   InvalidOid,
-										   InvalidOid,
-										   (Datum) 0);
-				}
-
-				/*
-				 * If we still didn't find a usable boundary key, quit; else
-				 * save the boundary key pointer in startKeys.
-				 */
-				if (chosen == NULL)
-					break;
-				startKeys[keysCount++] = chosen;
-
-				/*
-				 * Adjust strat_total, and quit if we have stored a > or <
-				 * key.
-				 */
-				strat = chosen->sk_strategy;
-				if (strat != BTEqualStrategyNumber)
-				{
-					strat_total = strat;
-					if (strat == BTGreaterStrategyNumber ||
-						strat == BTLessStrategyNumber)
-						break;
-				}
-
-				/*
-				 * Done if that was the last attribute, or if next key is not
-				 * in sequence (implying no boundary key is available for the
-				 * next attribute).
-				 */
-				if (i >= so->numberOfKeys ||
-					cur->sk_attno != curattr + 1)
-					break;
-
-				/*
-				 * Reset for next attr.
-				 */
-				curattr = cur->sk_attno;
-				chosen = NULL;
-				impliesNN = NULL;
-			}
-
-			/*
-			 * Can we use this key as a starting boundary for this attr?
-			 *
-			 * If not, does it imply a NOT NULL constraint?  (Because
-			 * SK_SEARCHNULL keys are always assigned BTEqualStrategyNumber,
-			 * *any* inequality key works for that; we need not test.)
-			 */
-			switch (cur->sk_strategy)
-			{
-				case BTLessStrategyNumber:
-				case BTLessEqualStrategyNumber:
-					if (chosen == NULL)
-					{
-						if (ScanDirectionIsBackward(dir))
-							chosen = cur;
-						else
-							impliesNN = cur;
-					}
-					break;
-				case BTEqualStrategyNumber:
-					/* override any non-equality choice */
-					chosen = cur;
-					break;
-				case BTGreaterEqualStrategyNumber:
-				case BTGreaterStrategyNumber:
-					if (chosen == NULL)
-					{
-						if (ScanDirectionIsForward(dir))
-							chosen = cur;
-						else
-							impliesNN = cur;
-					}
-					break;
-			}
-		}
-	}
+	keysCount = _bt_choose_scan_keys(so->keyData, so->numberOfKeys, prefixDir, startKeys, notnullkeys, &strat_total, 0);
 
 	/*
 	 * If we found no usable boundary keys, we have to start from one end of
@@ -1093,260 +916,112 @@ _bt_first(IndexScanDesc scan, ScanDirection dir)
 	{
 		bool		match;
 
-		match = _bt_endpoint(scan, dir);
-
-		if (!match)
+		if (!_bt_skip_enabled(so))
 		{
-			/* No match, so mark (parallel) scan finished */
-			_bt_parallel_done(scan);
-		}
+			match = _bt_endpoint(scan, prefixDir);
 
-		return match;
-	}
+			if (!match)
+			{
+				/* No match, so mark (parallel) scan finished */
+				_bt_parallel_done(scan);
+			}
 
-	/*
-	 * We want to start the scan somewhere within the index.  Set up an
-	 * insertion scankey we can use to search for the boundary point we
-	 * identified above.  The insertion scankey is built using the keys
-	 * identified by startKeys[].  (Remaining insertion scankey fields are
-	 * initialized after initial-positioning strategy is finalized.)
-	 */
-	Assert(keysCount <= INDEX_MAX_KEYS);
-	for (i = 0; i < keysCount; i++)
-	{
-		ScanKey		cur = startKeys[i];
+			return match;
+		}
+		else
+		{
+			Relation	rel = scan->indexRelation;
+			Buffer		buf;
+			Page		page;
+			BTPageOpaque opaque;
+			OffsetNumber start;
+			BTSkipCompareResult cmp = {0};
 
-		Assert(cur->sk_attno == i + 1);
+			_bt_skip_create_scankeys(rel, so);
 
-		if (cur->sk_flags & SK_ROW_HEADER)
-		{
 			/*
-			 * Row comparison header: look to the first row member instead.
-			 *
-			 * The member scankeys are already in insertion format (ie, they
-			 * have sk_func = 3-way-comparison function), but we have to watch
-			 * out for nulls, which _bt_preprocess_keys didn't check. A null
-			 * in the first row member makes the condition unmatchable, just
-			 * like qual_ok = false.
+			 * Scan down to the leftmost or rightmost leaf page and position
+			 * the scan on the leftmost or rightmost item on that page.
+			 * Start the skip scan from there to find the first matching item.
 			 */
-			ScanKey		subkey = (ScanKey) DatumGetPointer(cur->sk_argument);
+			buf = _bt_get_endpoint(rel, 0, ScanDirectionIsBackward(prefixDir), scan->xs_snapshot);
 
-			Assert(subkey->sk_flags & SK_ROW_MEMBER);
-			if (subkey->sk_flags & SK_ISNULL)
+			if (!BufferIsValid(buf))
 			{
-				_bt_parallel_done(scan);
+				/*
+				 * Empty index. Lock the whole relation, as nothing finer to lock
+				 * exists.
+				 */
+				PredicateLockRelation(rel, scan->xs_snapshot);
+				BTScanPosInvalidate(so->currPos);
 				return false;
 			}
-			memcpy(inskey.scankeys + i, subkey, sizeof(ScanKeyData));
 
-			/*
-			 * If the row comparison is the last positioning key we accepted,
-			 * try to add additional keys from the lower-order row members.
-			 * (If we accepted independent conditions on additional index
-			 * columns, we use those instead --- doesn't seem worth trying to
-			 * determine which is more restrictive.)  Note that this is OK
-			 * even if the row comparison is of ">" or "<" type, because the
-			 * condition applied to all but the last row member is effectively
-			 * ">=" or "<=", and so the extra keys don't break the positioning
-			 * scheme.  But, by the same token, if we aren't able to use all
-			 * the row members, then the part of the row comparison that we
-			 * did use has to be treated as just a ">=" or "<=" condition, and
-			 * so we'd better adjust strat_total accordingly.
-			 */
-			if (i == keysCount - 1)
+			PredicateLockPage(rel, BufferGetBlockNumber(buf), scan->xs_snapshot);
+			page = BufferGetPage(buf);
+			opaque = (BTPageOpaque) PageGetSpecialPointer(page);
+			Assert(P_ISLEAF(opaque));
+
+			if (ScanDirectionIsForward(prefixDir))
 			{
-				bool		used_all_subkeys = false;
+				/* There could be dead pages to the left, so not this: */
+				/* Assert(P_LEFTMOST(opaque)); */
 
-				Assert(!(subkey->sk_flags & SK_ROW_END));
-				for (;;)
-				{
-					subkey++;
-					Assert(subkey->sk_flags & SK_ROW_MEMBER);
-					if (subkey->sk_attno != keysCount + 1)
-						break;	/* out-of-sequence, can't use it */
-					if (subkey->sk_strategy != cur->sk_strategy)
-						break;	/* wrong direction, can't use it */
-					if (subkey->sk_flags & SK_ISNULL)
-						break;	/* can't use null keys */
-					Assert(keysCount < INDEX_MAX_KEYS);
-					memcpy(inskey.scankeys + keysCount, subkey,
-						   sizeof(ScanKeyData));
-					keysCount++;
-					if (subkey->sk_flags & SK_ROW_END)
-					{
-						used_all_subkeys = true;
-						break;
-					}
-				}
-				if (!used_all_subkeys)
-				{
-					switch (strat_total)
-					{
-						case BTLessStrategyNumber:
-							strat_total = BTLessEqualStrategyNumber;
-							break;
-						case BTGreaterStrategyNumber:
-							strat_total = BTGreaterEqualStrategyNumber;
-							break;
-					}
-				}
-				break;			/* done with outer loop */
+				start = P_FIRSTDATAKEY(opaque);
 			}
-		}
-		else
-		{
-			/*
-			 * Ordinary comparison key.  Transform the search-style scan key
-			 * to an insertion scan key by replacing the sk_func with the
-			 * appropriate btree comparison function.
-			 *
-			 * If scankey operator is not a cross-type comparison, we can use
-			 * the cached comparison function; otherwise gotta look it up in
-			 * the catalogs.  (That can't lead to infinite recursion, since no
-			 * indexscan initiated by syscache lookup will use cross-data-type
-			 * operators.)
-			 *
-			 * We support the convention that sk_subtype == InvalidOid means
-			 * the opclass input type; this is a hack to simplify life for
-			 * ScanKeyInit().
-			 */
-			if (cur->sk_subtype == rel->rd_opcintype[i] ||
-				cur->sk_subtype == InvalidOid)
+			else if (ScanDirectionIsBackward(prefixDir))
 			{
-				FmgrInfo   *procinfo;
-
-				procinfo = index_getprocinfo(rel, cur->sk_attno, BTORDER_PROC);
-				ScanKeyEntryInitializeWithInfo(inskey.scankeys + i,
-											   cur->sk_flags,
-											   cur->sk_attno,
-											   InvalidStrategy,
-											   cur->sk_subtype,
-											   cur->sk_collation,
-											   procinfo,
-											   cur->sk_argument);
+				Assert(P_RIGHTMOST(opaque));
+
+				start = PageGetMaxOffsetNumber(page);
 			}
 			else
 			{
-				RegProcedure cmp_proc;
-
-				cmp_proc = get_opfamily_proc(rel->rd_opfamily[i],
-											 rel->rd_opcintype[i],
-											 cur->sk_subtype,
-											 BTORDER_PROC);
-				if (!RegProcedureIsValid(cmp_proc))
-					elog(ERROR, "missing support function %d(%u,%u) for attribute %d of index \"%s\"",
-						 BTORDER_PROC, rel->rd_opcintype[i], cur->sk_subtype,
-						 cur->sk_attno, RelationGetRelationName(rel));
-				ScanKeyEntryInitialize(inskey.scankeys + i,
-									   cur->sk_flags,
-									   cur->sk_attno,
-									   InvalidStrategy,
-									   cur->sk_subtype,
-									   cur->sk_collation,
-									   cmp_proc,
-									   cur->sk_argument);
+				elog(ERROR, "invalid scan direction: %d", (int) prefixDir);
 			}
-		}
-	}
 
-	/*----------
-	 * Examine the selected initial-positioning strategy to determine exactly
-	 * where we need to start the scan, and set flag variables to control the
-	 * code below.
-	 *
-	 * If nextkey = false, _bt_search and _bt_binsrch will locate the first
-	 * item >= scan key.  If nextkey = true, they will locate the first
-	 * item > scan key.
-	 *
-	 * If goback = true, we will then step back one item, while if
-	 * goback = false, we will start the scan on the located item.
-	 *----------
-	 */
-	switch (strat_total)
-	{
-		case BTLessStrategyNumber:
-
-			/*
-			 * Find first item >= scankey, then back up one to arrive at last
-			 * item < scankey.  (Note: this positioning strategy is only used
-			 * for a backward scan, so that is always the correct starting
-			 * position.)
-			 */
-			nextkey = false;
-			goback = true;
-			break;
-
-		case BTLessEqualStrategyNumber:
-
-			/*
-			 * Find first item > scankey, then back up one to arrive at last
-			 * item <= scankey.  (Note: this positioning strategy is only used
-			 * for a backward scan, so that is always the correct starting
-			 * position.)
-			 */
-			nextkey = true;
-			goback = true;
-			break;
-
-		case BTEqualStrategyNumber:
-
-			/*
-			 * If a backward scan was specified, need to start with last equal
-			 * item not first one.
+			/* remember which buffer we have pinned */
+			so->currPos.buf = buf;
+			so->currPos.currPage = BufferGetBlockNumber(so->currPos.buf);
+
+			itup = (IndexTuple) PageGetItem(page, PageGetItemId(page, start));
+			/* In some cases we can (or must) skip further inside the prefix.
+			 * We can do so when extra quals become available, e.g.
+			 * WHERE b=2 on an index on (a,b).
+			 * We must do so when not in regular mode (prefixDir!=postfixDir),
+			 * because we are then positioned at the end of the prefix, while we
+			 * should be at the beginning.
 			 */
-			if (ScanDirectionIsBackward(dir))
+			if (_bt_has_extra_quals_after_skip(so->skipData, postfixDir, 0) ||
+					!_bt_skip_is_regular_mode(prefixDir, postfixDir))
 			{
-				/*
-				 * This is the same as the <= strategy.  We will check at the
-				 * end whether the found item is actually =.
-				 */
-				nextkey = true;
-				goback = true;
+				_bt_skip_extra_conditions(scan, &itup, &start, prefixDir, postfixDir, &cmp);
 			}
-			else
+			/* now find the next matching tuple */
+			match = _bt_skip_find_next(scan, itup, start, prefixDir, postfixDir);
+			if (!match)
 			{
-				/*
-				 * This is the same as the >= strategy.  We will check at the
-				 * end whether the found item is actually =.
-				 */
-				nextkey = false;
-				goback = false;
+				if (_bt_skip_is_always_valid(so))
+					_bt_drop_lock_and_maybe_pin(scan, &so->currPos);
+				return false;
 			}
-			break;
 
-		case BTGreaterEqualStrategyNumber:
+			_bt_drop_lock_and_maybe_pin(scan, &so->currPos);
 
-			/*
-			 * Find first item >= scankey.  (This is only used for forward
-			 * scans.)
-			 */
-			nextkey = false;
-			goback = false;
-			break;
-
-		case BTGreaterStrategyNumber:
-
-			/*
-			 * Find first item > scankey.  (This is only used for forward
-			 * scans.)
-			 */
-			nextkey = true;
-			goback = false;
-			break;
+			currItem = &so->currPos.items[so->currPos.itemIndex];
+			scan->xs_heaptid = currItem->heapTid;
+			if (scan->xs_want_itup)
+				scan->xs_itup = (IndexTuple) (so->currTuples + currItem->tupleOffset);
 
-		default:
-			/* can't get here, but keep compiler quiet */
-			elog(ERROR, "unrecognized strat_total: %d", (int) strat_total);
-			return false;
+			return true;
+		}
 	}
 
-	/* Initialize remaining insertion scan key fields */
-	_bt_metaversion(rel, &inskey.heapkeyspace, &inskey.allequalimage);
-	inskey.anynullkeys = false; /* unused */
-	inskey.nextkey = nextkey;
-	inskey.pivotsearch = false;
-	inskey.scantid = NULL;
-	inskey.keysz = keysCount;
+	if (!_bt_create_insertion_scan_key(rel, prefixDir, startKeys, keysCount, &inskey, &strat_total, &goback))
+	{
+		_bt_parallel_done(scan);
+		return false;
+	}
 
 	/*
 	 * Use the manufactured insertion scan key to descend the tree and
@@ -1378,7 +1053,7 @@ _bt_first(IndexScanDesc scan, ScanDirection dir)
 		PredicateLockPage(rel, BufferGetBlockNumber(buf),
 						  scan->xs_snapshot);
 
-	_bt_initialize_more_data(so, dir);
+	_bt_initialize_more_data(so, prefixDir);
 
 	/* position to the precise item on the page */
 	offnum = _bt_binsrch(rel, &inskey, buf);
@@ -1408,23 +1083,79 @@ _bt_first(IndexScanDesc scan, ScanDirection dir)
 	Assert(!BTScanPosIsValid(so->currPos));
 	so->currPos.buf = buf;
 
-	/*
-	 * Now load data from the first page of the scan.
-	 */
-	if (!_bt_readpage(scan, dir, offnum))
+	if (_bt_skip_enabled(so))
 	{
-		/*
-		 * There's no actually-matching data on this page.  Try to advance to
-		 * the next page.  Return false if there's no matching data at all.
+		Page page;
+		BTPageOpaque opaque;
+		OffsetNumber minoff;
+		bool match;
+		BTSkipCompareResult cmp = {0};
+
+		/* first create the skip scan keys */
+		_bt_skip_create_scankeys(rel, so);
+
+		/* remember which page we have pinned */
+		so->currPos.currPage = BufferGetBlockNumber(so->currPos.buf);
+
+		page = BufferGetPage(so->currPos.buf);
+		opaque = (BTPageOpaque) PageGetSpecialPointer(page);
+		minoff = P_FIRSTDATAKEY(opaque);
+		/* _bt_binsrch combined with goback can leave offnum before the first item
+		 * on the page or after the last item on the page. If that is the case we
+		 * need to step either back or forward one page.
 		 */
-		LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
-		if (!_bt_steppage(scan, dir))
+		if (offnum < minoff)
+		{
+			LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+			if (!_bt_step_back_page(scan, &itup, &offnum))
+				return false;
+		}
+		else if (offnum > PageGetMaxOffsetNumber(page))
+		{
+			BlockNumber next = opaque->btpo_next;
+			LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+			if (!_bt_step_forward_page(scan, next, &itup, &offnum))
+				return false;
+		}
+
+		itup = (IndexTuple) PageGetItem(page, PageGetItemId(page, offnum));
+		/* check if we can skip even more because we can use new conditions */
+		if (_bt_has_extra_quals_after_skip(so->skipData, postfixDir, inskey.keysz) ||
+				!_bt_skip_is_regular_mode(prefixDir, postfixDir))
+		{
+			_bt_skip_extra_conditions(scan, &itup, &offnum, prefixDir, postfixDir, &cmp);
+		}
+		/* now find the tuple */
+		match = _bt_skip_find_next(scan, itup, offnum, prefixDir, postfixDir);
+		if (!match)
+		{
+			if (_bt_skip_is_always_valid(so))
+				_bt_drop_lock_and_maybe_pin(scan, &so->currPos);
 			return false;
+		}
+
+		_bt_drop_lock_and_maybe_pin(scan, &so->currPos);
 	}
 	else
 	{
-		/* Drop the lock, and maybe the pin, on the current page */
-		_bt_drop_lock_and_maybe_pin(scan, &so->currPos);
+		/*
+		 * Now load data from the first page of the scan.
+		 */
+		if (!_bt_readpage(scan, prefixDir, &offnum, true))
+		{
+			/*
+			 * There's no actually-matching data on this page.  Try to advance to
+			 * the next page.  Return false if there's no matching data at all.
+			 */
+			LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+			if (!_bt_steppage(scan, prefixDir))
+				return false;
+		}
+		else
+		{
+			/* Drop the lock, and maybe the pin, on the current page */
+			_bt_drop_lock_and_maybe_pin(scan, &so->currPos);
+		}
 	}
 
 readcomplete:
@@ -1452,29 +1183,113 @@ readcomplete:
  *		so->currPos.buf to InvalidBuffer.
  */
 bool
-_bt_next(IndexScanDesc scan, ScanDirection dir)
+_bt_next(IndexScanDesc scan, ScanDirection prefixDir, ScanDirection postfixDir)
 {
 	BTScanOpaque so = (BTScanOpaque) scan->opaque;
 	BTScanPosItem *currItem;
 
-	/*
-	 * Advance to next tuple on current page; or if there's no more, try to
-	 * step to the next page with data.
-	 */
-	if (ScanDirectionIsForward(dir))
+	if (!_bt_skip_enabled(so))
 	{
-		if (++so->currPos.itemIndex > so->currPos.lastItem)
+		/*
+		 * Advance to next tuple on current page; or if there's no more, try to
+		 * step to the next page with data.
+		 */
+		if (ScanDirectionIsForward(prefixDir))
 		{
-			if (!_bt_steppage(scan, dir))
-				return false;
+			if (++so->currPos.itemIndex > so->currPos.lastItem)
+			{
+				if (!_bt_steppage(scan, prefixDir))
+					return false;
+			}
+		}
+		else
+		{
+			if (--so->currPos.itemIndex < so->currPos.firstItem)
+			{
+				if (!_bt_steppage(scan, prefixDir))
+					return false;
+			}
 		}
 	}
 	else
 	{
-		if (--so->currPos.itemIndex < so->currPos.firstItem)
+		bool match;
+		IndexTuple itup = NULL;
+		OffsetNumber offnum = InvalidOffsetNumber;
+
+		if (ScanDirectionIsForward(postfixDir))
 		{
-			if (!_bt_steppage(scan, dir))
-				return false;
+			if (++so->currPos.itemIndex > so->currPos.lastItem)
+			{
+				if (prefixDir != so->skipData->curPos.nextDirection)
+				{
+					/* This happens when a cursor scan changes direction
+					 * partway through, e.g. first fetching forwards and
+					 * then backwards.
+					 * We *always* just go to the next page instead of skipping,
+					 * because that's the only safe option.
+					 */
+					so->skipData->curPos.nextAction = SkipStateNext;
+					so->skipData->curPos.nextDirection = prefixDir;
+				}
+
+				if (so->skipData->curPos.nextAction == SkipStateNext)
+				{
+					/* we should just go forwards one page, no skipping is necessary */
+					if (!_bt_step_forward_page(scan, so->currPos.nextPage, &itup, &offnum))
+						return false;
+				}
+				else if (so->skipData->curPos.nextAction == SkipStateStop)
+				{
+					/* we've reached the end of the index, or we cannot find any more keys */
+					BTScanPosUnpinIfPinned(so->currPos);
+					BTScanPosInvalidate(so->currPos);
+					return false;
+				}
+
+				/* now find the next tuple */
+				match = _bt_skip_find_next(scan, itup, offnum, prefixDir, postfixDir);
+				if (!match)
+				{
+					if (_bt_skip_is_always_valid(so))
+						_bt_drop_lock_and_maybe_pin(scan, &so->currPos);
+					return false;
+				}
+				_bt_drop_lock_and_maybe_pin(scan, &so->currPos);
+			}
+		}
+		else
+		{
+			if (--so->currPos.itemIndex < so->currPos.firstItem)
+			{
+				if (prefixDir != so->skipData->curPos.nextDirection)
+				{
+					so->skipData->curPos.nextAction = SkipStateNext;
+					so->skipData->curPos.nextDirection = prefixDir;
+				}
+
+				if (so->skipData->curPos.nextAction == SkipStateNext)
+				{
+					if (!_bt_step_back_page(scan, &itup, &offnum))
+						return false;
+				}
+				else if (so->skipData->curPos.nextAction == SkipStateStop)
+				{
+					BTScanPosUnpinIfPinned(so->currPos);
+					BTScanPosInvalidate(so->currPos);
+					return false;
+				}
+
+				/* now find the next tuple */
+				match = _bt_skip_find_next(scan, itup, offnum, prefixDir, postfixDir);
+				if (!match)
+				{
+					if (_bt_skip_is_always_valid(so))
+						_bt_drop_lock_and_maybe_pin(scan, &so->currPos);
+					return false;
+				}
+				_bt_drop_lock_and_maybe_pin(scan, &so->currPos);
+			}
 		}
 	}
 
@@ -1506,8 +1321,8 @@ _bt_next(IndexScanDesc scan, ScanDirection dir)
  *
  * Returns true if any matching items found on the page, false if none.
  */
-static bool
-_bt_readpage(IndexScanDesc scan, ScanDirection dir, OffsetNumber offnum)
+bool
+_bt_readpage(IndexScanDesc scan, ScanDirection dir, OffsetNumber *offnum, bool isRegularMode)
 {
 	BTScanOpaque so = (BTScanOpaque) scan->opaque;
 	Page		page;
@@ -1517,6 +1332,7 @@ _bt_readpage(IndexScanDesc scan, ScanDirection dir, OffsetNumber offnum)
 	int			itemIndex;
 	bool		continuescan;
 	int			indnatts;
+	int			prefixskipindex;
 
 	/*
 	 * We must have the buffer pinned and locked, but the usual macro can't be
@@ -1575,11 +1391,11 @@ _bt_readpage(IndexScanDesc scan, ScanDirection dir, OffsetNumber offnum)
 		/* load items[] in ascending order */
 		itemIndex = 0;
 
-		offnum = Max(offnum, minoff);
+		*offnum = Max(*offnum, minoff);
 
-		while (offnum <= maxoff)
+		while (*offnum <= maxoff)
 		{
-			ItemId		iid = PageGetItemId(page, offnum);
+			ItemId		iid = PageGetItemId(page, *offnum);
 			IndexTuple	itup;
 
 			/*
@@ -1588,19 +1404,19 @@ _bt_readpage(IndexScanDesc scan, ScanDirection dir, OffsetNumber offnum)
 			 */
 			if (scan->ignore_killed_tuples && ItemIdIsDead(iid))
 			{
-				offnum = OffsetNumberNext(offnum);
+				*offnum = OffsetNumberNext(*offnum);
 				continue;
 			}
 
 			itup = (IndexTuple) PageGetItem(page, iid);
 
-			if (_bt_checkkeys(scan, itup, indnatts, dir, &continuescan))
+			if (_bt_checkkeys_extended(scan, itup, indnatts, dir, isRegularMode, &continuescan, &prefixskipindex))
 			{
 				/* tuple passes all scan key conditions */
 				if (!BTreeTupleIsPosting(itup))
 				{
 					/* Remember it */
-					_bt_saveitem(so, itemIndex, offnum, itup);
+					_bt_saveitem(so, itemIndex, *offnum, itup);
 					itemIndex++;
 				}
 				else
@@ -1612,26 +1428,30 @@ _bt_readpage(IndexScanDesc scan, ScanDirection dir, OffsetNumber offnum)
 					 * TID
 					 */
 					tupleOffset =
-						_bt_setuppostingitems(so, itemIndex, offnum,
+						_bt_setuppostingitems(so, itemIndex, *offnum,
 											  BTreeTupleGetPostingN(itup, 0),
 											  itup);
 					itemIndex++;
 					/* Remember additional TIDs */
 					for (int i = 1; i < BTreeTupleGetNPosting(itup); i++)
 					{
-						_bt_savepostingitem(so, itemIndex, offnum,
+						_bt_savepostingitem(so, itemIndex, *offnum,
 											BTreeTupleGetPostingN(itup, i),
 											tupleOffset);
 						itemIndex++;
 					}
 				}
 			}
+
+			*offnum = OffsetNumberNext(*offnum);
+
 			/* When !continuescan, there can't be any more matches, so stop */
 			if (!continuescan)
 				break;
-
-			offnum = OffsetNumberNext(offnum);
+			if (!isRegularMode && prefixskipindex != -1)
+				break;
 		}
+		*offnum = OffsetNumberPrev(*offnum);
 
 		/*
 		 * We don't need to visit page to the right when the high key
@@ -1651,7 +1471,7 @@ _bt_readpage(IndexScanDesc scan, ScanDirection dir, OffsetNumber offnum)
 			int			truncatt;
 
 			truncatt = BTreeTupleGetNAtts(itup, scan->indexRelation);
-			_bt_checkkeys(scan, itup, truncatt, dir, &continuescan);
+			_bt_checkkeys(scan, itup, truncatt, dir, &continuescan, NULL);
 		}
 
 		if (!continuescan)
@@ -1667,11 +1487,11 @@ _bt_readpage(IndexScanDesc scan, ScanDirection dir, OffsetNumber offnum)
 		/* load items[] in descending order */
 		itemIndex = MaxTIDsPerBTreePage;
 
-		offnum = Min(offnum, maxoff);
+		*offnum = Min(*offnum, maxoff);
 
-		while (offnum >= minoff)
+		while (*offnum >= minoff)
 		{
-			ItemId		iid = PageGetItemId(page, offnum);
+			ItemId		iid = PageGetItemId(page, *offnum);
 			IndexTuple	itup;
 			bool		tuple_alive;
 			bool		passes_quals;
@@ -1688,10 +1508,10 @@ _bt_readpage(IndexScanDesc scan, ScanDirection dir, OffsetNumber offnum)
 			 */
 			if (scan->ignore_killed_tuples && ItemIdIsDead(iid))
 			{
-				Assert(offnum >= P_FIRSTDATAKEY(opaque));
-				if (offnum > P_FIRSTDATAKEY(opaque))
+				Assert(*offnum >= P_FIRSTDATAKEY(opaque));
+				if (*offnum > P_FIRSTDATAKEY(opaque))
 				{
-					offnum = OffsetNumberPrev(offnum);
+					*offnum = OffsetNumberPrev(*offnum);
 					continue;
 				}
 
@@ -1702,8 +1522,8 @@ _bt_readpage(IndexScanDesc scan, ScanDirection dir, OffsetNumber offnum)
 
 			itup = (IndexTuple) PageGetItem(page, iid);
 
-			passes_quals = _bt_checkkeys(scan, itup, indnatts, dir,
-										 &continuescan);
+			passes_quals = _bt_checkkeys_extended(scan, itup, indnatts, dir,
+												  isRegularMode, &continuescan, &prefixskipindex);
 			if (passes_quals && tuple_alive)
 			{
 				/* tuple passes all scan key conditions */
@@ -1711,7 +1531,7 @@ _bt_readpage(IndexScanDesc scan, ScanDirection dir, OffsetNumber offnum)
 				{
 					/* Remember it */
 					itemIndex--;
-					_bt_saveitem(so, itemIndex, offnum, itup);
+					_bt_saveitem(so, itemIndex, *offnum, itup);
 				}
 				else
 				{
@@ -1729,28 +1549,32 @@ _bt_readpage(IndexScanDesc scan, ScanDirection dir, OffsetNumber offnum)
 					 */
 					itemIndex--;
 					tupleOffset =
-						_bt_setuppostingitems(so, itemIndex, offnum,
+						_bt_setuppostingitems(so, itemIndex, *offnum,
 											  BTreeTupleGetPostingN(itup, 0),
 											  itup);
 					/* Remember additional TIDs */
 					for (int i = 1; i < BTreeTupleGetNPosting(itup); i++)
 					{
 						itemIndex--;
-						_bt_savepostingitem(so, itemIndex, offnum,
+						_bt_savepostingitem(so, itemIndex, *offnum,
 											BTreeTupleGetPostingN(itup, i),
 											tupleOffset);
 					}
 				}
 			}
+
+			*offnum = OffsetNumberPrev(*offnum);
+
 			if (!continuescan)
 			{
 				/* there can't be any more matches, so stop */
 				so->currPos.moreLeft = false;
 				break;
 			}
-
-			offnum = OffsetNumberPrev(offnum);
+			if (!isRegularMode && prefixskipindex != -1)
+				break;
 		}
+		*offnum = OffsetNumberNext(*offnum);
 
 		Assert(itemIndex >= 0);
 		so->currPos.firstItem = itemIndex;
@@ -1858,7 +1682,7 @@ _bt_savepostingitem(BTScanOpaque so, int itemIndex, OffsetNumber offnum,
  * read lock, on that page.  If we do not hold the pin, we set so->currPos.buf
  * to InvalidBuffer.  We return true to indicate success.
  */
-static bool
+bool
 _bt_steppage(IndexScanDesc scan, ScanDirection dir)
 {
 	BTScanOpaque so = (BTScanOpaque) scan->opaque;
@@ -1886,6 +1710,9 @@ _bt_steppage(IndexScanDesc scan, ScanDirection dir)
 		if (so->markTuples)
 			memcpy(so->markTuples, so->currTuples,
 				   so->currPos.nextTupleOffset);
+		if (so->skipData)
+			memcpy(&so->skipData->markPos, &so->skipData->curPos,
+				   sizeof(BTSkipPosData));
 		so->markPos.itemIndex = so->markItemIndex;
 		so->markItemIndex = -1;
 	}
@@ -1965,13 +1792,14 @@ _bt_steppage(IndexScanDesc scan, ScanDirection dir)
  * If there are no more matching records in the given direction, we drop all
  * locks and pins, set so->currPos.buf to InvalidBuffer, and return false.
  */
-static bool
+bool
 _bt_readnextpage(IndexScanDesc scan, BlockNumber blkno, ScanDirection dir)
 {
 	BTScanOpaque so = (BTScanOpaque) scan->opaque;
 	Relation	rel;
 	Page		page;
 	BTPageOpaque opaque;
+	OffsetNumber offnum;
 	bool		status = true;
 
 	rel = scan->indexRelation;
@@ -2003,7 +1831,8 @@ _bt_readnextpage(IndexScanDesc scan, BlockNumber blkno, ScanDirection dir)
 				PredicateLockPage(rel, blkno, scan->xs_snapshot);
 				/* see if there are any matches on this page */
 				/* note that this will clear moreRight if we can stop */
-				if (_bt_readpage(scan, dir, P_FIRSTDATAKEY(opaque)))
+				offnum = P_FIRSTDATAKEY(opaque);
+				if (_bt_readpage(scan, dir, &offnum, true))
 					break;
 			}
 			else if (scan->parallel_scan != NULL)
@@ -2105,7 +1934,8 @@ _bt_readnextpage(IndexScanDesc scan, BlockNumber blkno, ScanDirection dir)
 				PredicateLockPage(rel, BufferGetBlockNumber(so->currPos.buf), scan->xs_snapshot);
 				/* see if there are any matches on this page */
 				/* note that this will clear moreLeft if we can stop */
-				if (_bt_readpage(scan, dir, PageGetMaxOffsetNumber(page)))
+				offnum = PageGetMaxOffsetNumber(page);
+				if (_bt_readpage(scan, dir, &offnum, true))
 					break;
 			}
 			else if (scan->parallel_scan != NULL)
@@ -2173,7 +2003,7 @@ _bt_parallel_readpage(IndexScanDesc scan, BlockNumber blkno, ScanDirection dir)
  * to be half-dead; the caller should check that condition and step left
  * again if it's important.
  */
-static Buffer
+Buffer
 _bt_walk_left(Relation rel, Buffer buf, Snapshot snapshot)
 {
 	Page		page;
@@ -2437,7 +2267,7 @@ _bt_endpoint(IndexScanDesc scan, ScanDirection dir)
 	/*
 	 * Now load data from the first page of the scan.
 	 */
-	if (!_bt_readpage(scan, dir, start))
+	if (!_bt_readpage(scan, dir, &start, true))
 	{
 		/*
 		 * There's no actually-matching data on this page.  Try to advance to
@@ -2466,7 +2296,7 @@ _bt_endpoint(IndexScanDesc scan, ScanDirection dir)
  * _bt_initialize_more_data() -- initialize moreLeft/moreRight appropriately
  * for scan direction
  */
-static inline void
+inline void
 _bt_initialize_more_data(BTScanOpaque so, ScanDirection dir)
 {
 	/* initialize moreLeft/moreRight appropriately for scan direction */
@@ -2483,3 +2313,25 @@ _bt_initialize_more_data(BTScanOpaque so, ScanDirection dir)
 	so->numKilled = 0;			/* just paranoia */
 	so->markItemIndex = -1;		/* ditto */
 }
+
+/* Forward the call to either _bt_checkkeys, which is the simplest
+ * and fastest way of checking keys, or to _bt_checkkeys_skip,
+ * which is slower but also reports whether we should stop reading
+ * the current page and skip. The expensive check is only necessary
+ * when !isRegularMode, i.e. when prefixDir!=postfixDir, which only
+ * happens when scanning backwards from a cursor.
+ */
+static inline bool
+_bt_checkkeys_extended(IndexScanDesc scan, IndexTuple tuple, int tupnatts,
+					   ScanDirection dir, bool isRegularMode,
+					   bool *continuescan, int *prefixskipindex)
+{
+	if (isRegularMode)
+	{
+		return _bt_checkkeys(scan, tuple, tupnatts, dir, continuescan, prefixskipindex);
+	}
+	else
+	{
+		return _bt_checkkeys_skip(scan, tuple, tupnatts, dir, continuescan, prefixskipindex);
+	}
+}
diff --git a/src/backend/access/nbtree/nbtskip.c b/src/backend/access/nbtree/nbtskip.c
new file mode 100644
index 0000000000..7850230b9f
--- /dev/null
+++ b/src/backend/access/nbtree/nbtskip.c
@@ -0,0 +1,1317 @@
+/*-------------------------------------------------------------------------
+ *
+ * nbtskip.c
+ *	  Search code related to skip scan for postgres btrees.
+ *
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/nbtree/nbtskip.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "access/nbtree.h"
+#include "access/relscan.h"
+#include "catalog/catalog.h"
+#include "miscadmin.h"
+#include "utils/guc.h"
+#include "storage/predicate.h"
+#include "utils/lsyscache.h"
+#include "utils/rel.h"
+
+static inline void _bt_update_scankey_with_tuple(BTScanInsert scankeys,
+											Relation indexRel, IndexTuple itup, int numattrs);
+static inline bool _bt_scankey_within_page(IndexScanDesc scan, BTScanInsert key, Buffer buf);
+static inline int32 _bt_compare_until(Relation rel, BTScanInsert key, IndexTuple itup, int prefix);
+static inline void
+_bt_determine_next_action(IndexScanDesc scan, BTSkipCompareResult *cmp, OffsetNumber firstOffnum,
+						  OffsetNumber lastOffnum, ScanDirection postfixDir, BTSkipState *nextAction);
+static inline void
+_bt_determine_next_action_after_skip(BTScanOpaque so, BTSkipCompareResult *cmp, ScanDirection prefixDir,
+									 ScanDirection postfixDir, int skipped, BTSkipState *nextAction);
+static inline void
+_bt_determine_next_action_after_skip_extra(BTScanOpaque so, BTSkipCompareResult *cmp, BTSkipState *nextAction);
+static inline void _bt_copy_scankey(BTScanInsert to, BTScanInsert from, int numattrs);
+static inline IndexTuple _bt_get_tuple_from_offset(BTScanOpaque so, OffsetNumber curTupleOffnum);
+static void _bt_skip_update_scankey_after_read(IndexScanDesc scan, IndexTuple curTuple,
+											   ScanDirection prefixDir, ScanDirection postfixDir);
+static void _bt_skip_update_scankey_for_prefix_skip(IndexScanDesc scan, Relation indexRel,
+										int prefix, IndexTuple itup, ScanDirection prefixDir);
+static bool _bt_try_in_page_skip(IndexScanDesc scan, ScanDirection prefixDir);
+
+/*
+ * returns whether we're at the end of a scan.
+ * the scan position can be invalid even though we still
+ * should continue the scan. this happens for example when
+ * we're scanning with prefixDir!=postfixDir. when looking at the first
+ * prefix, we traverse the items within the prefix from max to min.
+ * if none of them match, we actually run off the start of the index,
+ * meaning none of the tuples within this prefix match. the scan pos becomes
+ * invalid, however, we do need to look further to the next prefix.
+ * therefore, this function still returns true in this particular case.
+ */
+static inline bool
+_bt_skip_is_valid(BTScanOpaque so, ScanDirection prefixDir, ScanDirection postfixDir)
+{
+	return BTScanPosIsValid(so->currPos) ||
+			(!_bt_skip_is_regular_mode(prefixDir, postfixDir) &&
+			 so->skipData->curPos.nextAction != SkipStateStop);
+}
+
+/* Try to find the next tuple to skip to within the local tuple storage.
+ * Local tuple storage is filled during _bt_readpage with all matching
+ * tuples on that page. If we can find the next prefix here, it saves
+ * us a scan from the root.
+ * Note that this optimization only works in regular mode
+ * (_bt_skip_is_regular_mode). Otherwise the local tuple workspace will
+ * only ever contain tuples of one specific prefix (_bt_readpage will
+ * stop at the end of a prefix).
+ */
+static bool
+_bt_try_in_page_skip(IndexScanDesc scan, ScanDirection prefixDir)
+{
+	BTScanOpaque so = (BTScanOpaque) scan->opaque;
+	BTScanPosItem *currItem;
+	BTSkip skip = so->skipData;
+	IndexTuple itup = NULL;
+	bool goback;
+	int low, high, starthigh, startlow;
+	int32		result,
+				cmpval;
+	BTScanInsert key = &so->skipData->curPos.skipScanKey;
+
+	currItem = &so->currPos.items[so->currPos.itemIndex];
+	itup = (IndexTuple) (so->currTuples + currItem->tupleOffset);
+
+	_bt_skip_update_scankey_for_prefix_skip(scan, scan->indexRelation, skip->prefix, itup, prefixDir);
+
+	_bt_set_bsearch_flags(key->scankeys[key->keysz - 1].sk_strategy, prefixDir, &key->nextkey, &goback);
+
+	/* Requesting nextkey semantics while using scantid seems nonsensical */
+	Assert(!key->nextkey || key->scantid == NULL);
+	/* scantid-set callers must use _bt_binsrch_insert() on leaf pages */
+
+	startlow = low = ScanDirectionIsForward(prefixDir) ? so->currPos.itemIndex + 1 : so->currPos.firstItem;
+	starthigh = high = ScanDirectionIsForward(prefixDir) ? so->currPos.lastItem : so->currPos.itemIndex - 1;
+
+	/*
+	 * If no items remain in the local tuple storage in the scan direction
+	 * (the current item is the last one read from this page), there is
+	 * nothing to search. An in-page skip is then impossible, so report
+	 * failure and let the caller fall back to a regular skip via
+	 * _bt_skip_once.
+	 */
+	if (unlikely(high < low))
+		return false;
+
+	/*
+	 * Binary search to find the first key on the page >= scan key, or first
+	 * key > scankey when nextkey is true.
+	 *
+	 * For nextkey=false (cmpval=1), the loop invariant is: all slots before
+	 * 'low' are < scan key, all slots at or after 'high' are >= scan key.
+	 *
+	 * For nextkey=true (cmpval=0), the loop invariant is: all slots before
+	 * 'low' are <= scan key, all slots at or after 'high' are > scan key.
+	 *
+	 * We can fall out when high == low.
+	 */
+	high++;						/* establish the loop invariant for high */
+
+	cmpval = key->nextkey ? 0 : 1;	/* select comparison value */
+
+	while (high > low)
+	{
+		int mid = low + ((high - low) / 2);
+
+		/* We have low <= mid < high, so mid points at a real slot */
+
+		currItem = &so->currPos.items[mid];
+		itup = (IndexTuple) (so->currTuples + currItem->tupleOffset);
+		result = _bt_compare_until(scan->indexRelation, key, itup, skip->prefix);
+
+		if (result >= cmpval)
+			low = mid + 1;
+		else
+			high = mid;
+	}
+
+	if (high > starthigh)
+		return false;
+
+	if (goback)
+	{
+		low--;
+		if (low < startlow)
+			return false;
+	}
+
+	so->currPos.itemIndex = low;
+
+	return true;
+}
+
+/*
+ *  _bt_skip() -- Skip items that have the same prefix as the most recently
+ * 				  fetched index tuple.
+ *
+ * in: pinned, not locked
+ * out: pinned, not locked (unless end of scan, then unpinned)
+ */
+bool
+_bt_skip(IndexScanDesc scan, ScanDirection prefixDir, ScanDirection postfixDir)
+{
+	BTScanOpaque so = (BTScanOpaque) scan->opaque;
+	BTScanPosItem *currItem;
+	IndexTuple itup = NULL;
+	OffsetNumber curTupleOffnum = InvalidOffsetNumber;
+	BTSkipCompareResult cmp;
+	BTSkip skip = so->skipData;
+	OffsetNumber first;
+
+	/* in page skip only works when prefixDir == postfixDir */
+	if (!_bt_skip_is_regular_mode(prefixDir, postfixDir) || !_bt_try_in_page_skip(scan, prefixDir))
+	{
+		currItem = &so->currPos.items[so->currPos.itemIndex];
+		itup = (IndexTuple) (so->currTuples + currItem->tupleOffset);
+
+		so->skipData->curPos.nextSkipIndex = so->skipData->prefix;
+		_bt_skip_once(scan, &itup, &curTupleOffnum, true, prefixDir, postfixDir);
+		_bt_skip_until_match(scan, &itup, &curTupleOffnum, prefixDir, postfixDir);
+		if (!_bt_skip_is_always_valid(so))
+			return false;
+
+		first = curTupleOffnum;
+		_bt_readpage(scan, postfixDir, &curTupleOffnum, _bt_skip_is_regular_mode(prefixDir, postfixDir));
+		if (DEBUG1 >= log_min_messages || DEBUG1 >= client_min_messages)
+		{
+			print_itup(BufferGetBlockNumber(so->currPos.buf), _bt_get_tuple_from_offset(so, first), NULL, scan->indexRelation,
+						"first item on page compared after skip");
+			print_itup(BufferGetBlockNumber(so->currPos.buf), _bt_get_tuple_from_offset(so, curTupleOffnum), NULL, scan->indexRelation,
+						"last item on page compared after skip");
+		}
+		_bt_compare_current_item(scan, _bt_get_tuple_from_offset(so, curTupleOffnum),
+								 IndexRelationGetNumberOfAttributes(scan->indexRelation),
+								 postfixDir, _bt_skip_is_regular_mode(prefixDir, postfixDir), &cmp);
+		_bt_determine_next_action(scan, &cmp, first, curTupleOffnum, postfixDir, &skip->curPos.nextAction);
+		skip->curPos.nextDirection = prefixDir;
+		skip->curPos.nextSkipIndex = cmp.prefixSkipIndex;
+		_bt_skip_update_scankey_after_read(scan, _bt_get_tuple_from_offset(so, curTupleOffnum), prefixDir, postfixDir);
+
+		_bt_drop_lock_and_maybe_pin(scan, &so->currPos);
+	}
+
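+	/*
+	 * Prepare for the call to _bt_next, which moves itemIndex one step in the
+	 * scan direction before returning a tuple: step one position the other
+	 * way here so that it ends up on the tuple we want to be at.
+	 */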
+	if (ScanDirectionIsForward(postfixDir))
+		so->currPos.itemIndex--;
+	else
+		so->currPos.itemIndex++;
+
+	return true;
+}
+
+static IndexTuple
+_bt_get_tuple_from_offset(BTScanOpaque so, OffsetNumber curTupleOffnum)
+{
+	Page page = BufferGetPage(so->currPos.buf);
+	return (IndexTuple) PageGetItem(page, PageGetItemId(page, curTupleOffnum));
+}
+
+static void
+_bt_determine_next_action(IndexScanDesc scan, BTSkipCompareResult *cmp, OffsetNumber firstOffnum, OffsetNumber lastOffnum, ScanDirection postfixDir, BTSkipState *nextAction)
+{
+	BTScanOpaque so = (BTScanOpaque) scan->opaque;
+
+	if (cmp->fullKeySkip)
+		*nextAction = SkipStateStop;
+	else if (ScanDirectionIsForward(postfixDir))
+	{
+		OffsetNumber firstItem = firstOffnum, lastItem = lastOffnum;
+		if (cmp->prefixSkip)
+		{
+			*nextAction = SkipStateSkip;
+		}
+		else
+		{
+			IndexTuple toCmp;
+			if (so->currPos.lastItem >= so->currPos.firstItem)
+				toCmp = _bt_get_tuple_from_offset(so, so->currPos.items[so->currPos.lastItem].indexOffset);
+			else
+				toCmp = _bt_get_tuple_from_offset(so, firstItem);
+			_bt_update_scankey_with_tuple(&so->skipData->currentTupleKey,
+										  scan->indexRelation, toCmp, RelationGetNumberOfAttributes(scan->indexRelation));
+			if (_bt_has_extra_quals_after_skip(so->skipData, postfixDir, so->skipData->prefix) && !cmp->equal &&
+					(cmp->prefixCmpResult != 0 ||
+					 _bt_compare_until(scan->indexRelation, &so->skipData->currentTupleKey,
+									   _bt_get_tuple_from_offset(so, lastItem), so->skipData->prefix) != 0))
+				*nextAction = SkipStateSkipExtra;
+			else
+				*nextAction = SkipStateNext;
+		}
+	}
+	else
+	{
+		OffsetNumber firstItem = lastOffnum, lastItem = firstOffnum;
+		if (cmp->prefixSkip)
+		{
+			*nextAction = SkipStateSkip;
+		}
+		else
+		{
+			IndexTuple toCmp;
+			if (so->currPos.lastItem >= so->currPos.firstItem)
+				toCmp = _bt_get_tuple_from_offset(so, so->currPos.items[so->currPos.firstItem].indexOffset);
+			else
+				toCmp = _bt_get_tuple_from_offset(so, lastItem);
+			_bt_update_scankey_with_tuple(&so->skipData->currentTupleKey,
+										  scan->indexRelation, toCmp, RelationGetNumberOfAttributes(scan->indexRelation));
+			if (_bt_has_extra_quals_after_skip(so->skipData, postfixDir, so->skipData->prefix) && !cmp->equal &&
+					(cmp->prefixCmpResult != 0 ||
+					 _bt_compare_until(scan->indexRelation, &so->skipData->currentTupleKey,
+									   _bt_get_tuple_from_offset(so, firstItem), so->skipData->prefix) != 0))
+				*nextAction = SkipStateSkipExtra;
+			else
+				*nextAction = SkipStateNext;
+		}
+	}
+}
+
+static inline bool
+_bt_should_prefix_skip(BTSkipCompareResult *cmp)
+{
+	return cmp->prefixSkip || cmp->prefixCmpResult != 0;
+}
+
+static inline void
+_bt_determine_next_action_after_skip(BTScanOpaque so, BTSkipCompareResult *cmp, ScanDirection prefixDir,
+									 ScanDirection postfixDir, int skipped, BTSkipState *nextAction)
+{
+	if (!_bt_skip_is_always_valid(so) || cmp->fullKeySkip)
+		*nextAction = SkipStateStop;
+	else if (cmp->equal && _bt_skip_is_regular_mode(prefixDir, postfixDir))
+		*nextAction = SkipStateNext;
+	else if (_bt_should_prefix_skip(cmp) && _bt_skip_is_regular_mode(prefixDir, postfixDir) &&
+			 ((ScanDirectionIsForward(prefixDir) && cmp->skCmpResult < 0) ||
+			  (ScanDirectionIsBackward(prefixDir) && cmp->skCmpResult > 0)))
+		*nextAction = SkipStateSkip;
+	else if (!_bt_skip_is_regular_mode(prefixDir, postfixDir) ||
+			 _bt_has_extra_quals_after_skip(so->skipData, postfixDir, skipped) ||
+			 cmp->prefixCmpResult != 0)
+		*nextAction = SkipStateSkipExtra;
+	else
+		*nextAction = SkipStateNext;
+}
+
+static inline void
+_bt_determine_next_action_after_skip_extra(BTScanOpaque so, BTSkipCompareResult *cmp, BTSkipState *nextAction)
+{
+	if (!_bt_skip_is_always_valid(so) || cmp->fullKeySkip)
+		*nextAction = SkipStateStop;
+	else if (cmp->equal)
+		*nextAction = SkipStateNext;
+	else if (_bt_should_prefix_skip(cmp))
+		*nextAction = SkipStateSkip;
+	else
+		*nextAction = SkipStateNext;
+}
+
+/* Debug-only helper that prints a scankey; to be removed before the final patch. */
+static inline void
+_print_skey(IndexScanDesc scan, BTScanInsert scanKey)
+{
+	Oid			typOutput;
+	bool		varlenatype;
+	char	   *val;
+	int i;
+	Relation rel = scan->indexRelation;
+
+	for (i = 0; i < scanKey->keysz; i++)
+	{
+		ScanKey cur = &scanKey->scankeys[i];
+		if (!IsCatalogRelation(rel))
+		{
+			if (!(cur->sk_flags & SK_ISNULL))
+			{
+				if (cur->sk_subtype != InvalidOid)
+					getTypeOutputInfo(cur->sk_subtype,
+									  &typOutput, &varlenatype);
+				else
+					getTypeOutputInfo(rel->rd_opcintype[i],
+									  &typOutput, &varlenatype);
+				val = OidOutputFunctionCall(typOutput, cur->sk_argument);
+				if (val)
+				{
+					elog(DEBUG1, "%s sk attr %d val: %s (%s, %s)",
+						 RelationGetRelationName(rel), i, val,
+						 (cur->sk_flags & SK_BT_NULLS_FIRST) != 0 ? "NULLS FIRST" : "NULLS LAST",
+						 (cur->sk_flags & SK_BT_DESC) != 0 ? "DESC" : "ASC");
+					pfree(val);
+				}
+			}
+			else
+			{
+				elog(DEBUG1, "%s sk attr %d val: NULL (%s, %s)",
+					 RelationGetRelationName(rel), i,
+					 (cur->sk_flags & SK_BT_NULLS_FIRST) != 0 ? "NULLS FIRST" : "NULLS LAST",
+					 (cur->sk_flags & SK_BT_DESC) != 0 ? "DESC" : "ASC");
+			}
+		}
+	}
+}
+
+bool
+_bt_checkkeys_skip(IndexScanDesc scan, IndexTuple tuple, int tupnatts,
+				   ScanDirection dir, bool *continuescan, int *prefixskipindex)
+{
+	BTScanOpaque 	so = (BTScanOpaque) scan->opaque;
+	BTSkip skip = so->skipData;
+
+	bool match = _bt_checkkeys(scan, tuple, tupnatts, dir, continuescan, prefixskipindex);
+	int prefixCmpResult = _bt_compare_until(scan->indexRelation, &skip->curPos.skipScanKey, tuple, skip->prefix);
+	if (*prefixskipindex == -1 && prefixCmpResult != 0)
+	{
+		*prefixskipindex = skip->prefix;
+		return false;
+	}
+	else
+	{
+		bool newcont;
+		_bt_checkkeys_threeway(scan, tuple, tupnatts, dir, &newcont, prefixskipindex);
+		if (*prefixskipindex == -1 && prefixCmpResult != 0)
+		{
+			*prefixskipindex = skip->prefix;
+			return false;
+		}
+	}
+	return match;
+}
+
+/*
+ * Compare a scankey with the given tuple, looking only at the first 'prefix'
+ * columns (or at key->keysz columns, if that is smaller).
+ *
+ * Returns 0 if the compared columns are equal, a negative value if
+ * key < itup, and a positive value if key > itup.
+ */
+int32
+_bt_compare_until(Relation rel,
+			BTScanInsert key,
+			IndexTuple itup,
+			int prefix)
+{
+	TupleDesc	itupdesc = RelationGetDescr(rel);
+	ScanKey		scankey;
+	int			ncmpkey;
+
+	Assert(key->keysz <= IndexRelationGetNumberOfKeyAttributes(rel));
+
+	ncmpkey = Min(prefix, key->keysz);
+	scankey = key->scankeys;
+	for (int i = 1; i <= ncmpkey; i++)
+	{
+		Datum		datum;
+		bool		isNull;
+		int32		result;
+
+		datum = index_getattr(itup, scankey->sk_attno, itupdesc, &isNull);
+
+		/* see comments about NULLs handling in btbuild */
+		if (scankey->sk_flags & SK_ISNULL)	/* key is NULL */
+		{
+			if (isNull)
+				result = 0;		/* NULL "=" NULL */
+			else if (scankey->sk_flags & SK_BT_NULLS_FIRST)
+				result = -1;	/* NULL "<" NOT_NULL */
+			else
+				result = 1;		/* NULL ">" NOT_NULL */
+		}
+		else if (isNull)		/* key is NOT_NULL and item is NULL */
+		{
+			if (scankey->sk_flags & SK_BT_NULLS_FIRST)
+				result = 1;		/* NOT_NULL ">" NULL */
+			else
+				result = -1;	/* NOT_NULL "<" NULL */
+		}
+		else
+		{
+			/*
+			 * The sk_func needs to be passed the index value as left arg and
+			 * the sk_argument as right arg (they might be of different
+			 * types).  Since it is convenient for callers to think of
+			 * _bt_compare as comparing the scankey to the index item, we have
+			 * to flip the sign of the comparison result.  (Unless it's a DESC
+			 * column, in which case we *don't* flip the sign.)
+			 */
+			result = DatumGetInt32(FunctionCall2Coll(&scankey->sk_func,
+													 scankey->sk_collation,
+													 datum,
+													 scankey->sk_argument));
+
+			if (!(scankey->sk_flags & SK_BT_DESC))
+				INVERT_COMPARE_RESULT(result);
+		}
+
+		/* if the keys are unequal, return the difference */
+		if (result != 0)
+			return result;
+
+		scankey++;
+	}
+	return 0;
+}
+
+
+/*
+ * Create the initial scankeys for skipping and store them in the skipData
+ * structure.
+ */
+void
+_bt_skip_create_scankeys(Relation rel, BTScanOpaque so)
+{
+	int keysCount;
+	BTSkip skip = so->skipData;
+	StrategyNumber stratTotal;
+	ScanKey		keyPointers[INDEX_MAX_KEYS];
+	bool goback;
+	/*
+	 * We need to create both forward and backward keys, because the scan
+	 * direction may change at any moment in scans with a cursor.  We could
+	 * technically delay creation of the second key until first use as an
+	 * optimization, but that is not implemented yet.
+	 */
+	keysCount = _bt_choose_scan_keys(so->keyData, so->numberOfKeys, ForwardScanDirection,
+									 keyPointers, skip->fwdNotNullKeys, &stratTotal, skip->prefix);
+	_bt_create_insertion_scan_key(rel, ForwardScanDirection, keyPointers, keysCount,
+								  &skip->fwdScanKey, &stratTotal, &goback);
+
+	keysCount = _bt_choose_scan_keys(so->keyData, so->numberOfKeys, BackwardScanDirection,
+									 keyPointers, skip->bwdNotNullKeys, &stratTotal, skip->prefix);
+	_bt_create_insertion_scan_key(rel, BackwardScanDirection, keyPointers, keysCount,
+								  &skip->bwdScanKey, &stratTotal, &goback);
+
+	_bt_metaversion(rel, &skip->curPos.skipScanKey.heapkeyspace,
+					&skip->curPos.skipScanKey.allequalimage);
+	skip->curPos.skipScanKey.anynullkeys = false; /* unused */
+	skip->curPos.skipScanKey.nextkey = false;
+	skip->curPos.skipScanKey.pivotsearch = false;
+	skip->curPos.skipScanKey.scantid = NULL;
+	skip->curPos.skipScanKey.keysz = 0;
+
+	/*
+	 * Set up the scankey for the current tuple as well.  We will not
+	 * necessarily use the data from the current tuple right away, but the
+	 * rest of the data structure must be set up correctly for when we use it
+	 * to create the skip->curPos.skipScanKey keys later.
+	 */
+	_bt_mkscankey(rel, NULL, &skip->currentTupleKey);
+}
+
+/*
+ * _bt_scankey_within_page() -- check if the provided scankey could be found
+ * 								within a page, specified by the buffer.
+ */
+static inline bool
+_bt_scankey_within_page(IndexScanDesc scan, BTScanInsert key,
+						Buffer buf)
+{
+	/*
+	 * TODO: an optimization is still possible here: only check either the
+	 * low or the high key, depending on which direction we came from and
+	 * which direction we are planning to scan.
+	 */
+	OffsetNumber low, high;
+	Page page = BufferGetPage(buf);
+	BTPageOpaque opaque = (BTPageOpaque) PageGetSpecialPointer(page);
+	int			ans_lo, ans_hi;
+
+	low = P_FIRSTDATAKEY(opaque);
+	high = PageGetMaxOffsetNumber(page);
+
+	if (unlikely(high < low))
+		return false;
+
+	ans_lo = _bt_compare(scan->indexRelation,
+					   key, page, low);
+	ans_hi = _bt_compare(scan->indexRelation,
+					   key, page, high);
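+	/*
+	 * Only the sign of the _bt_compare result is meaningful, so test against
+	 * zero rather than against exactly -1 or 1.
+	 */
+	if (key->nextkey)
+	{
+		/* first <= sk < last */
+		return ans_lo >= 0 && ans_hi < 0;
+	}
+	else
+	{
+		/* first < sk <= last */
+		return ans_lo > 0 && ans_hi <= 0;
+	}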
+}
+
+/* in: pinned and locked, out: pinned and locked (unless end of scan) */
+static void
+_bt_skip_find(IndexScanDesc scan, IndexTuple *curTuple, OffsetNumber *curTupleOffnum,
+			  BTScanInsert scanKey, ScanDirection dir)
+{
+	BTScanOpaque 	so = (BTScanOpaque) scan->opaque;
+	OffsetNumber offnum;
+	BTStack stack;
+	Buffer buf;
+	bool goback;
+	Page		page;
+	BTPageOpaque opaque;
+	OffsetNumber minoff;
+	Relation rel = scan->indexRelation;
+	bool fromroot = true;
+
+	_bt_set_bsearch_flags(scanKey->scankeys[scanKey->keysz - 1].sk_strategy, dir, &scanKey->nextkey, &goback);
+
+	if ((DEBUG1 >= log_min_messages || DEBUG1 >= client_min_messages) && !IsCatalogRelation(rel))
+	{
+		if (*curTuple != NULL)
+			print_itup(BufferGetBlockNumber(so->currPos.buf), *curTuple, NULL, rel,
+						"before btree search");
+
+		elog(DEBUG1, "%s searching tree with %d keys, nextkey=%d, goback=%d",
+			 RelationGetRelationName(rel), scanKey->keysz, scanKey->nextkey,
+			 goback);
+
+		_print_skey(scan, scanKey);
+	}
+
+	if (*curTupleOffnum == InvalidOffsetNumber)
+	{
+		BTScanPosUnpinIfPinned(so->currPos);
+	}
+	else
+	{
+		if (_bt_scankey_within_page(scan, scanKey, so->currPos.buf))
+		{
+			elog(DEBUG1, "sk found within current page");
+
+			offnum = _bt_binsrch(scan->indexRelation, scanKey, so->currPos.buf);
+			fromroot = false;
+		}
+		else
+		{
+			LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+			ReleaseBuffer(so->currPos.buf);
+			so->currPos.buf = InvalidBuffer;
+		}
+	}
+
+	/*
+	 * The scan key was not found within the current page (or there was no
+	 * current page to check), so search from the root.  Use _bt_search and
+	 * _bt_binsrch to get the buffer and offset number.
+	 */
+	if (fromroot)
+	{
+		stack = _bt_search(scan->indexRelation, scanKey,
+						   &buf, BT_READ, scan->xs_snapshot);
+		_bt_freestack(stack);
+		so->currPos.buf = buf;
+
+		offnum = _bt_binsrch(scan->indexRelation, scanKey, buf);
+
+		/* Lock the page for SERIALIZABLE transactions */
+		PredicateLockPage(scan->indexRelation, BufferGetBlockNumber(so->currPos.buf),
+						  scan->xs_snapshot);
+	}
+
+	page = BufferGetPage(so->currPos.buf);
+	opaque = (BTPageOpaque) PageGetSpecialPointer(page);
+
+	if (goback)
+	{
+		offnum = OffsetNumberPrev(offnum);
+		minoff = P_FIRSTDATAKEY(opaque);
+		if (offnum < minoff)
+		{
+			LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+			if (!_bt_step_back_page(scan, curTuple, curTupleOffnum))
+				return;
+			page = BufferGetPage(so->currPos.buf);
+			opaque = (BTPageOpaque) PageGetSpecialPointer(page);
+			offnum = PageGetMaxOffsetNumber(page);
+		}
+	}
+	else if (offnum > PageGetMaxOffsetNumber(page))
+	{
+		BlockNumber next = opaque->btpo_next;
+		LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+		if (!_bt_step_forward_page(scan, next, curTuple, curTupleOffnum))
+			return;
+		page = BufferGetPage(so->currPos.buf);
+		opaque = (BTPageOpaque) PageGetSpecialPointer(page);
+		offnum = P_FIRSTDATAKEY(opaque);
+	}
+
+	/* We know in which direction to look */
+	_bt_initialize_more_data(so, dir);
+
+	*curTupleOffnum = offnum;
+	*curTuple = (IndexTuple) PageGetItem(page, PageGetItemId(page, offnum));
+	so->currPos.currPage = BufferGetBlockNumber(so->currPos.buf);
+
+	if (DEBUG1 >= log_min_messages || DEBUG1 >= client_min_messages)
+		print_itup(BufferGetBlockNumber(so->currPos.buf), *curTuple, NULL, rel,
+					"after btree search");
+}
+
+static inline bool
+_bt_step_one_page(IndexScanDesc scan, ScanDirection dir, IndexTuple *curTuple,
+				  OffsetNumber *curTupleOffnum)
+{
+	if (ScanDirectionIsForward(dir))
+	{
+		BTScanOpaque so = (BTScanOpaque) scan->opaque;
+		return _bt_step_forward_page(scan, so->currPos.nextPage, curTuple, curTupleOffnum);
+	}
+	else
+	{
+		return _bt_step_back_page(scan, curTuple, curTupleOffnum);
+	}
+}
+
+/*
+ * in: possibly pinned, but unlocked
+ * out: pinned and locked (unless end of scan, then currPos is invalidated)
+ */
+bool
+_bt_step_forward_page(IndexScanDesc scan, BlockNumber next, IndexTuple *curTuple,
+					  OffsetNumber *curTupleOffnum)
+{
+	BTScanOpaque so = (BTScanOpaque) scan->opaque;
+	Relation rel = scan->indexRelation;
+	BlockNumber blkno = next;
+	Page page;
+	BTPageOpaque opaque;
+
+	Assert(BTScanPosIsValid(so->currPos));
+
+	/* Before leaving current page, deal with any killed items */
+	if (so->numKilled > 0)
+		_bt_killitems(scan);
+
+	/*
+	 * Before we modify currPos, make a copy of the page data if there was a
+	 * mark position that needs it.
+	 */
+	if (so->markItemIndex >= 0)
+	{
+		/* bump pin on current buffer for assignment to mark buffer */
+		if (BTScanPosIsPinned(so->currPos))
+			IncrBufferRefCount(so->currPos.buf);
+		memcpy(&so->markPos, &so->currPos,
+			   offsetof(BTScanPosData, items[1]) +
+			   so->currPos.lastItem * sizeof(BTScanPosItem));
+		if (so->markTuples)
+			memcpy(so->markTuples, so->currTuples,
+				   so->currPos.nextTupleOffset);
+		so->markPos.itemIndex = so->markItemIndex;
+		if (so->skipData)
+			memcpy(&so->skipData->markPos, &so->skipData->curPos,
+				   sizeof(BTSkipPosData));
+		so->markItemIndex = -1;
+	}
+
+	/* Remember we left a page with data */
+	so->currPos.moreLeft = true;
+
+	/* release the previous buffer, if pinned */
+	BTScanPosUnpinIfPinned(so->currPos);
+
+	{
+		for (;;)
+		{
+			/*
+			 * if we're at end of scan, give up and mark parallel scan as
+			 * done, so that all the workers can finish their scan
+			 */
+			if (blkno == P_NONE)
+			{
+				_bt_parallel_done(scan);
+				BTScanPosInvalidate(so->currPos);
+				return false;
+			}
+
+			/* check for interrupts while we're not holding any buffer lock */
+			CHECK_FOR_INTERRUPTS();
+			/* step right one page */
+			so->currPos.buf = _bt_getbuf(rel, blkno, BT_READ);
+			page = BufferGetPage(so->currPos.buf);
+			TestForOldSnapshot(scan->xs_snapshot, rel, page);
+			opaque = (BTPageOpaque) PageGetSpecialPointer(page);
+			/* check for deleted page */
+			if (!P_IGNORE(opaque))
+			{
+				PredicateLockPage(rel, blkno, scan->xs_snapshot);
+				*curTupleOffnum = P_FIRSTDATAKEY(opaque);
+				*curTuple = _bt_get_tuple_from_offset(so, *curTupleOffnum);
+				break;
+			}
+
+			blkno = opaque->btpo_next;
+			_bt_relbuf(rel, so->currPos.buf);
+		}
+	}
+
+	return true;
+}
+
+/*
+ * in: possibly pinned, but unlocked
+ * out: pinned and locked (unless end of index, then currPos is invalidated)
+ */
+bool
+_bt_step_back_page(IndexScanDesc scan, IndexTuple *curTuple, OffsetNumber *curTupleOffnum)
+{
+	BTScanOpaque so = (BTScanOpaque) scan->opaque;
+
+	Assert(BTScanPosIsValid(so->currPos));
+
+	/* Before leaving current page, deal with any killed items */
+	if (so->numKilled > 0)
+		_bt_killitems(scan);
+
+	/*
+	 * Before we modify currPos, make a copy of the page data if there was a
+	 * mark position that needs it.
+	 */
+	if (so->markItemIndex >= 0)
+	{
+		/* bump pin on current buffer for assignment to mark buffer */
+		if (BTScanPosIsPinned(so->currPos))
+			IncrBufferRefCount(so->currPos.buf);
+		memcpy(&so->markPos, &so->currPos,
+			   offsetof(BTScanPosData, items[1]) +
+			   so->currPos.lastItem * sizeof(BTScanPosItem));
+		if (so->markTuples)
+			memcpy(so->markTuples, so->currTuples,
+				   so->currPos.nextTupleOffset);
+		if (so->skipData)
+			memcpy(&so->skipData->markPos, &so->skipData->curPos,
+				   sizeof(BTSkipPosData));
+		so->markPos.itemIndex = so->markItemIndex;
+		so->markItemIndex = -1;
+	}
+
+	/* Remember we left a page with data */
+	so->currPos.moreRight = true;
+
+	/* Not parallel, so just use our own notion of the current page */
+
+	{
+		Relation	rel;
+		Page		page;
+		BTPageOpaque opaque;
+
+		rel = scan->indexRelation;
+
+		if (BTScanPosIsPinned(so->currPos))
+			LockBuffer(so->currPos.buf, BT_READ);
+		else
+			so->currPos.buf = _bt_getbuf(rel, so->currPos.currPage, BT_READ);
+
+		for (;;)
+		{
+			/* Step to next physical page */
+			so->currPos.buf = _bt_walk_left(rel, so->currPos.buf,
+											scan->xs_snapshot);
+
+			/* if we're physically at end of index, return failure */
+			if (so->currPos.buf == InvalidBuffer)
+			{
+				BTScanPosInvalidate(so->currPos);
+				return false;
+			}
+
+			/*
+			 * Okay, we managed to move left to a non-deleted page. Done if
+			 * it's not half-dead; the caller checks the tuples on it. Else
+			 * loop back and do it all again.
+			 */
+			page = BufferGetPage(so->currPos.buf);
+			TestForOldSnapshot(scan->xs_snapshot, rel, page);
+			opaque = (BTPageOpaque) PageGetSpecialPointer(page);
+			if (!P_IGNORE(opaque))
+			{
+				PredicateLockPage(rel, BufferGetBlockNumber(so->currPos.buf), scan->xs_snapshot);
+				*curTupleOffnum = PageGetMaxOffsetNumber(page);
+				*curTuple = _bt_get_tuple_from_offset(so, *curTupleOffnum);
+				break;
+			}
+		}
+	}
+
+	return true;
+}
+
+/* holds lock as long as curTupleOffnum != InvalidOffsetNumber */
+bool
+_bt_skip_find_next(IndexScanDesc scan, IndexTuple curTuple, OffsetNumber curTupleOffnum,
+				   ScanDirection prefixDir, ScanDirection postfixDir)
+{
+	BTScanOpaque so = (BTScanOpaque) scan->opaque;
+	BTSkip skip = so->skipData;
+	BTSkipCompareResult cmp;
+
+	while (_bt_skip_is_valid(so, prefixDir, postfixDir))
+	{
+		bool found;
+		_bt_skip_until_match(scan, &curTuple, &curTupleOffnum, prefixDir, postfixDir);
+
+		while (_bt_skip_is_always_valid(so))
+		{
+			OffsetNumber first = curTupleOffnum;
+			found = _bt_readpage(scan, postfixDir, &curTupleOffnum,
+								 _bt_skip_is_regular_mode(prefixDir, postfixDir));
+			if (DEBUG1 >= log_min_messages || DEBUG1 >= client_min_messages)
+			{
+				print_itup(BufferGetBlockNumber(so->currPos.buf),
+						   _bt_get_tuple_from_offset(so, first), NULL, scan->indexRelation,
+							"first item on page compared");
+				print_itup(BufferGetBlockNumber(so->currPos.buf),
+						   _bt_get_tuple_from_offset(so, curTupleOffnum), NULL, scan->indexRelation,
+							"last item on page compared");
+			}
+			_bt_compare_current_item(scan, _bt_get_tuple_from_offset(so, curTupleOffnum),
+									 IndexRelationGetNumberOfAttributes(scan->indexRelation),
+									 postfixDir, _bt_skip_is_regular_mode(prefixDir, postfixDir), &cmp);
+			_bt_determine_next_action(scan, &cmp, first, curTupleOffnum,
+									  postfixDir, &skip->curPos.nextAction);
+			skip->curPos.nextDirection = prefixDir;
+			skip->curPos.nextSkipIndex = cmp.prefixSkipIndex;
+
+			if (found)
+			{
+				_bt_skip_update_scankey_after_read(scan, _bt_get_tuple_from_offset(so, curTupleOffnum),
+												   prefixDir, postfixDir);
+				return true;
+			}
+			else if (skip->curPos.nextAction == SkipStateNext)
+			{
+				if (curTupleOffnum != InvalidOffsetNumber)
+					LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+				if (!_bt_step_one_page(scan, postfixDir, &curTuple, &curTupleOffnum))
+					return false;
+			}
+			else if (skip->curPos.nextAction == SkipStateSkip || skip->curPos.nextAction == SkipStateSkipExtra)
+			{
+				curTuple = _bt_get_tuple_from_offset(so, curTupleOffnum);
+				_bt_skip_update_scankey_after_read(scan, curTuple, prefixDir, postfixDir);
+				LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+				curTupleOffnum = InvalidOffsetNumber;
+				curTuple = NULL;
+				break;
+			}
+			else if (skip->curPos.nextAction == SkipStateStop)
+			{
+				LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+				BTScanPosUnpinIfPinned(so->currPos);
+				BTScanPosInvalidate(so->currPos);
+				return false;
+			}
+			else
+			{
+				Assert(false);
+			}
+		}
+	}
+	return false;
+}
+
+void
+_bt_skip_until_match(IndexScanDesc scan, IndexTuple *curTuple, OffsetNumber *curTupleOffnum,
+					 ScanDirection prefixDir, ScanDirection postfixDir)
+{
+	BTScanOpaque 	so = (BTScanOpaque) scan->opaque;
+	BTSkip skip = so->skipData;
+	while (_bt_skip_is_valid(so, prefixDir, postfixDir) &&
+		   (skip->curPos.nextAction == SkipStateSkip || skip->curPos.nextAction == SkipStateSkipExtra))
+	{
+		_bt_skip_once(scan, curTuple, curTupleOffnum,
+					  skip->curPos.nextAction == SkipStateSkip, prefixDir, postfixDir);
+	}
+}
+
+void
+_bt_compare_current_item(IndexScanDesc scan, IndexTuple tuple, int tupnatts, ScanDirection dir,
+						 bool isRegularMode, BTSkipCompareResult *cmp)
+{
+	BTScanOpaque 	so = (BTScanOpaque) scan->opaque;
+	BTSkip skip = so->skipData;
+
+	if (_bt_skip_is_always_valid(so))
+	{
+		bool continuescan = true;
+
+		cmp->equal = _bt_checkkeys(scan, tuple, tupnatts, dir, &continuescan, &cmp->prefixSkipIndex);
+		cmp->fullKeySkip = !continuescan;
+		/*
+		 * The prefix can be smaller than the scankey because of extra quals
+		 * being added, so we need to compare both.  TODO: this can be
+		 * optimized into one function call.
+		 */
+		cmp->prefixCmpResult = _bt_compare_until(scan->indexRelation, &skip->curPos.skipScanKey, tuple, skip->prefix);
+		cmp->skCmpResult = _bt_compare_until(scan->indexRelation,
+											 &skip->curPos.skipScanKey, tuple, skip->curPos.skipScanKey.keysz);
+		if (cmp->prefixSkipIndex == -1)
+		{
+			cmp->prefixSkipIndex = skip->prefix;
+			cmp->prefixSkip = ScanDirectionIsForward(dir) ? cmp->prefixCmpResult < 0 : cmp->prefixCmpResult > 0;
+		}
+		else
+		{
+			int newskip = -1;
+			_bt_checkkeys_threeway(scan, tuple, tupnatts, dir, &continuescan, &newskip);
+			if (newskip != -1)
+			{
+				cmp->prefixSkip = true;
+				cmp->prefixSkipIndex = newskip;
+			}
+			else
+			{
+				cmp->prefixSkip = ScanDirectionIsForward(dir) ? cmp->prefixCmpResult < 0 : cmp->prefixCmpResult > 0;
+				cmp->prefixSkipIndex = skip->prefix;
+			}
+		}
+
+		if (DEBUG1 >= log_min_messages || DEBUG1 >= client_min_messages)
+		{
+			print_itup(BufferGetBlockNumber(so->currPos.buf), tuple, NULL, scan->indexRelation,
+						"compare item");
+			_print_skey(scan, &skip->curPos.skipScanKey);
+			elog(DEBUG1, "result: eq: %d fkskip: %d pfxskip: %d prefixcmpres: %d prefixskipidx: %d", cmp->equal, cmp->fullKeySkip,
+				 _bt_should_prefix_skip(cmp), cmp->prefixCmpResult, cmp->prefixSkipIndex);
+		}
+	}
+	else
+	{
+		/*
+		 * We cannot stop the scan if !isRegularMode; in that case we still
+		 * need to skip to the next prefix.
+		 */
+		cmp->fullKeySkip = isRegularMode;
+		cmp->equal = false;
+		cmp->prefixCmpResult = -2;
+		cmp->prefixSkip = true;
+		cmp->prefixSkipIndex = skip->prefix;
+		cmp->skCmpResult = -2;
+	}
+}
+
+void
+_bt_skip_once(IndexScanDesc scan, IndexTuple *curTuple, OffsetNumber *curTupleOffnum,
+			  bool forceSkip, ScanDirection prefixDir, ScanDirection postfixDir)
+{
+	BTScanOpaque 	so = (BTScanOpaque) scan->opaque;
+	BTSkip skip = so->skipData;
+	BTSkipCompareResult cmp;
+	bool doskip = forceSkip;
+	int skipIndex = skip->curPos.nextSkipIndex;
+	skip->curPos.nextAction = SkipStateSkipExtra;
+
+	while (doskip)
+	{
+		int toskip = skipIndex;
+		if (*curTuple != NULL)
+		{
+			if (skip->prefix <= skipIndex || !_bt_skip_is_regular_mode(prefixDir, postfixDir))
+			{
+				toskip = skip->prefix;
+			}
+
+			_bt_skip_update_scankey_for_prefix_skip(scan, scan->indexRelation,
+													toskip, *curTuple, prefixDir);
+		}
+
+		_bt_skip_find(scan, curTuple, curTupleOffnum, &skip->curPos.skipScanKey, prefixDir);
+
+		if (_bt_skip_is_always_valid(so))
+		{
+			_bt_skip_update_scankey_for_extra_skip(scan, scan->indexRelation,
+												   prefixDir, prefixDir, true, *curTuple);
+			_bt_compare_current_item(scan, *curTuple,
+									 IndexRelationGetNumberOfAttributes(scan->indexRelation),
+									 prefixDir,
+									 _bt_skip_is_regular_mode(prefixDir, postfixDir), &cmp);
+			skipIndex = cmp.prefixSkipIndex;
+			_bt_determine_next_action_after_skip(so, &cmp, prefixDir,
+												 postfixDir, toskip, &skip->curPos.nextAction);
+		}
+		else
+		{
+			skip->curPos.nextAction = SkipStateStop;
+		}
+		doskip = skip->curPos.nextAction == SkipStateSkip;
+	}
+	if (skip->curPos.nextAction != SkipStateStop && skip->curPos.nextAction != SkipStateNext)
+		_bt_skip_extra_conditions(scan, curTuple, curTupleOffnum, prefixDir, postfixDir, &cmp);
+}
+
+void
+_bt_skip_extra_conditions(IndexScanDesc scan, IndexTuple *curTuple, OffsetNumber *curTupleOffnum,
+						  ScanDirection prefixDir, ScanDirection postfixDir, BTSkipCompareResult *cmp)
+{
+	BTScanOpaque 	so = (BTScanOpaque) scan->opaque;
+	BTSkip skip = so->skipData;
+	bool regularMode = _bt_skip_is_regular_mode(prefixDir, postfixDir);
+	if (_bt_skip_is_always_valid(so))
+	{
+		do
+		{
+			if (*curTuple != NULL)
+				_bt_skip_update_scankey_for_extra_skip(scan, scan->indexRelation,
+													   postfixDir, prefixDir, false, *curTuple);
+			_bt_skip_find(scan, curTuple, curTupleOffnum, &skip->curPos.skipScanKey, postfixDir);
+			_bt_compare_current_item(scan, *curTuple,
+									 IndexRelationGetNumberOfAttributes(scan->indexRelation),
+									 postfixDir, _bt_skip_is_regular_mode(prefixDir, postfixDir), cmp);
+		} while (regularMode && cmp->prefixCmpResult != 0 && !cmp->equal && !cmp->fullKeySkip);
+		skip->curPos.nextSkipIndex = cmp->prefixSkipIndex;
+	}
+	_bt_determine_next_action_after_skip_extra(so, cmp, &skip->curPos.nextAction);
+}
+
+static void
+_bt_skip_update_scankey_after_read(IndexScanDesc scan, IndexTuple curTuple,
+								   ScanDirection prefixDir, ScanDirection postfixDir)
+{
+	BTScanOpaque 	so = (BTScanOpaque) scan->opaque;
+	BTSkip skip = so->skipData;
+	if (skip->curPos.nextAction == SkipStateSkip)
+	{
+		int toskip = skip->curPos.nextSkipIndex;
+		if (skip->prefix <= skip->curPos.nextSkipIndex ||
+				!_bt_skip_is_regular_mode(prefixDir, postfixDir))
+		{
+			toskip = skip->prefix;
+		}
+
+		if (_bt_skip_is_regular_mode(prefixDir, postfixDir))
+			_bt_skip_update_scankey_for_prefix_skip(scan, scan->indexRelation,
+													toskip, curTuple, prefixDir);
+		else
+			_bt_skip_update_scankey_for_prefix_skip(scan, scan->indexRelation,
+													toskip, NULL, prefixDir);
+	}
+	else if (skip->curPos.nextAction == SkipStateSkipExtra)
+	{
+		_bt_skip_update_scankey_for_extra_skip(scan, scan->indexRelation,
+											   postfixDir, prefixDir, false, curTuple);
+	}
+}
+
+static inline int
+_bt_compare_one(ScanKey scankey, Datum datum2, bool isNull2)
+{
+	int32		result;
+	Datum datum1 = scankey->sk_argument;
+	bool isNull1 = (scankey->sk_flags & SK_ISNULL) != 0;
+	/* see comments about NULLs handling in btbuild */
+	if (isNull1)	/* key is NULL */
+	{
+		if (isNull2)
+			result = 0;		/* NULL "=" NULL */
+		else if (scankey->sk_flags & SK_BT_NULLS_FIRST)
+			result = -1;	/* NULL "<" NOT_NULL */
+		else
+			result = 1;		/* NULL ">" NOT_NULL */
+	}
+	else if (isNull2)		/* key is NOT_NULL and item is NULL */
+	{
+		if (scankey->sk_flags & SK_BT_NULLS_FIRST)
+			result = 1;		/* NOT_NULL ">" NULL */
+		else
+			result = -1;	/* NOT_NULL "<" NULL */
+	}
+	else
+	{
+		/*
+		 * The sk_func needs to be passed the index value as left arg and
+		 * the sk_argument as right arg (they might be of different
+		 * types).  Since it is convenient for callers to think of
+		 * _bt_compare as comparing the scankey to the index item, we have
+		 * to flip the sign of the comparison result.  (Unless it's a DESC
+		 * column, in which case we *don't* flip the sign.)
+		 */
+		result = DatumGetInt32(FunctionCall2Coll(&scankey->sk_func,
+												 scankey->sk_collation,
+												 datum2,
+												 datum1));
+
+		if (!(scankey->sk_flags & SK_BT_DESC))
+			INVERT_COMPARE_RESULT(result);
+	}
+	return result;
+}
+
+/*
+ * Set up new values for the existing scankeys, based on the current index
+ * tuple.
+ */
+static inline void
+_bt_update_scankey_with_tuple(BTScanInsert insertKey, Relation indexRel, IndexTuple itup, int numattrs)
+{
+	TupleDesc		itupdesc;
+	int				i;
+	ScanKey			scankeys = insertKey->scankeys;
+
+	insertKey->keysz = numattrs;
+	itupdesc = RelationGetDescr(indexRel);
+	for (i = 0; i < numattrs; i++)
+	{
+		Datum datum;
+		bool null;
+		int flags;
+
+		datum = index_getattr(itup, i + 1, itupdesc, &null);
+		flags = (null ? SK_ISNULL : 0) |
+				(indexRel->rd_indoption[i] << SK_BT_INDOPTION_SHIFT);
+		scankeys[i].sk_flags = flags;
+		scankeys[i].sk_argument = datum;
+	}
+}
+
+/* copy the elements important to a skip from one insertion sk to another */
+static inline void
+_bt_copy_scankey(BTScanInsert to, BTScanInsert from, int numattrs)
+{
+	memcpy(to->scankeys, from->scankeys, sizeof(ScanKeyData) * (unsigned long)numattrs);
+	to->nextkey = from->nextkey;
+	to->keysz = numattrs;
+}
+
+/*
+ * Updates the existing scankey for skipping to the next prefix.
+ * alwaysUsePrefix determines how many attrs the scankey will have:
+ * when true, it will always have skip->prefix number of attributes;
+ * otherwise, the value can be less, which will be determined by the comparison
+ * result with the current tuple.
+ * For example, take SELECT * FROM tbl WHERE b<2 with index (a,b,c), skipping with prefix size=2.
+ * If we encounter the tuple (1,3,1), this does not match the qual b<2. However, we also know that
+ * it is not useful to skip to any next qual with prefix=2 (e.g. (1,4)), because that will definitely not
+ * match either. However, we do want to skip to e.g. (2,0). Therefore, we skip over prefix=1 in this case.
+ *
+ * The provided itup may be NULL. This happens when we don't want to use the current tuple to update
+ * the scankey, but instead want to use the existing curPos.skipScanKey to fill currentTupleKey. This accounts
+ * for some edge cases.
+ */
+static void
+_bt_skip_update_scankey_for_prefix_skip(IndexScanDesc scan, Relation indexRel,
+										int prefix, IndexTuple itup, ScanDirection prefixDir)
+{
+	BTScanOpaque so = (BTScanOpaque) scan->opaque;
+	BTSkip skip = so->skipData;
+	/* we use skip->prefix if alwaysUsePrefix is set or if skip->prefix is smaller than whatever the
+	 * comparison result provided, such that we never skip more than skip->prefix
+	 */
+	int numattrs = prefix;
+
+	if (itup != NULL)
+	{
+		_bt_update_scankey_with_tuple(&skip->currentTupleKey, indexRel, itup, numattrs);
+		_bt_copy_scankey(&skip->curPos.skipScanKey, &skip->currentTupleKey, numattrs);
+	}
+	else
+	{
+		skip->curPos.skipScanKey.keysz = numattrs;
+		_bt_copy_scankey(&skip->currentTupleKey, &skip->curPos.skipScanKey, numattrs);
+	}
+	/* update strategy for last attribute as we will use this to determine the
+	 * rest of the flags (goback) when doing the actual tree search
+	 */
+	skip->currentTupleKey.scankeys[numattrs - 1].sk_strategy =
+			skip->curPos.skipScanKey.scankeys[numattrs - 1].sk_strategy =
+			ScanDirectionIsForward(prefixDir) ? BTGreaterStrategyNumber : BTLessStrategyNumber;
+}
+
+/* update the scankey for skipping the 'extra' conditions, opportunities
+ * that arise when we have just skipped to a new prefix and can try to skip
+ * within the prefix to the right tuple by using extra quals when available
+ *
+ * @todo as an optimization it should be possible to replace calls to this function
+ * and to _bt_skip_update_scankey_for_prefix_skip with some more specific functions that
+ * will need to do less copying of data.
+ */
+void
+_bt_skip_update_scankey_for_extra_skip(IndexScanDesc scan, Relation indexRel, ScanDirection curDir,
+									   ScanDirection prefixDir, bool prioritizeEqual, IndexTuple itup)
+{
+	BTScanOpaque 	so = (BTScanOpaque) scan->opaque;
+	BTSkip skip = so->skipData;
+	BTScanInsert toCopy;
+	int i, left, lastNonTuple = skip->prefix;
+
+	/* first make sure that currentTupleKey is correct at all times */
+	_bt_skip_update_scankey_for_prefix_skip(scan, indexRel, skip->prefix, itup, prefixDir);
+	/* then do the actual work to set up curPos.skipScanKey - distinguish between work that depends on prefixDir
+	 * (those attributes between attribute number 1 and 'prefix' inclusive)
+	 * and work that depends on curDir
+	 * (those attributes between attribute number 'prefix' + 1 and fwdScanKey.keysz inclusive)
+	 */
+	if (ScanDirectionIsForward(prefixDir))
+	{
+		/*
+		 * if prefixDir is Forward, we need to choose between fwdScanKey or
+		 * currentTupleKey. we need to choose the most restrictive one -
+		 * in most cases this means choosing e.g. a>5 over a=2 when scanning forward,
+		 * unless prioritizeEqual is set, which is done for certain special cases
+		 */
+		for (i = 0; i < skip->prefix; i++)
+		{
+			ScanKey scankey = &skip->fwdScanKey.scankeys[i];
+			ScanKey scankeyItem = &skip->currentTupleKey.scankeys[i];
+			if (scankey->sk_attno != 0 && (_bt_compare_one(scankey, scankeyItem->sk_argument, scankeyItem->sk_flags & SK_ISNULL) > 0
+										   || (prioritizeEqual && scankey->sk_strategy == BTEqualStrategyNumber)))
+			{
+				memcpy(skip->curPos.skipScanKey.scankeys + i, scankey, sizeof(ScanKeyData));
+				lastNonTuple = i;
+			}
+			else
+			{
+				if (lastNonTuple < i)
+					break;
+				memcpy(skip->curPos.skipScanKey.scankeys + i, scankeyItem, sizeof(ScanKeyData));
+			}
+			/* for now choose equal here - it could actually be improved a bit @todo by choosing the strategy
+			 * from the scankeys, but it doesn't matter a lot
+			 */
+			skip->curPos.skipScanKey.scankeys[i].sk_strategy = BTEqualStrategyNumber;
+		}
+	}
+	else
+	{
+		/* similar for backward but in opposite direction */
+		for (i = 0; i < skip->prefix; i++)
+		{
+			ScanKey scankey = &skip->bwdScanKey.scankeys[i];
+			ScanKey scankeyItem = &skip->currentTupleKey.scankeys[i];
+			if (scankey->sk_attno != 0 && (_bt_compare_one(scankey, scankeyItem->sk_argument, scankeyItem->sk_flags & SK_ISNULL) < 0
+										   || (prioritizeEqual && scankey->sk_strategy == BTEqualStrategyNumber)))
+			{
+				memcpy(skip->curPos.skipScanKey.scankeys + i, scankey, sizeof(ScanKeyData));
+				lastNonTuple = i;
+			}
+			else
+			{
+				if (lastNonTuple < i)
+					break;
+				memcpy(skip->curPos.skipScanKey.scankeys + i, scankeyItem, sizeof(ScanKeyData));
+			}
+			skip->curPos.skipScanKey.scankeys[i].sk_strategy = BTEqualStrategyNumber;
+		}
+	}
+
+	/*
+	 * the remaining keys are the quals after the prefix
+	 */
+	if (ScanDirectionIsForward(curDir))
+		toCopy = &skip->fwdScanKey;
+	else
+		toCopy = &skip->bwdScanKey;
+
+	if (lastNonTuple >= skip->prefix - 1)
+	{
+		left = toCopy->keysz - skip->prefix;
+		if (left > 0)
+		{
+			memcpy(skip->curPos.skipScanKey.scankeys + skip->prefix, toCopy->scankeys + skip->prefix, sizeof(ScanKeyData) * (unsigned long)left);
+		}
+		skip->curPos.skipScanKey.keysz = toCopy->keysz;
+	}
+	else
+	{
+		skip->curPos.skipScanKey.keysz = lastNonTuple + 1;
+	}
+}
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index 9111e2789c..135953da5f 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -554,7 +554,7 @@ _bt_leafbuild(BTSpool *btspool, BTSpool *btspool2)
 
 	wstate.heap = btspool->heap;
 	wstate.index = btspool->index;
-	wstate.inskey = _bt_mkscankey(wstate.index, NULL);
+	wstate.inskey = _bt_mkscankey(wstate.index, NULL, NULL);
 	/* _bt_mkscankey() won't set allequalimage without metapage */
 	wstate.inskey->allequalimage = _bt_allequalimage(wstate.index, true);
 	wstate.btws_use_wal = RelationNeedsWAL(wstate.index);
diff --git a/src/backend/access/nbtree/nbtutils.c b/src/backend/access/nbtree/nbtutils.c
index 54afa6f417..d5d30ac5b6 100644
--- a/src/backend/access/nbtree/nbtutils.c
+++ b/src/backend/access/nbtree/nbtutils.c
@@ -49,10 +49,10 @@ static bool _bt_compare_scankey_args(IndexScanDesc scan, ScanKey op,
 									 ScanKey leftarg, ScanKey rightarg,
 									 bool *result);
 static bool _bt_fix_scankey_strategy(ScanKey skey, int16 *indoption);
-static void _bt_mark_scankey_required(ScanKey skey);
+static void _bt_mark_scankey_required(ScanKey skey, int forwardReqFlag, int backwardReqFlag);
 static bool _bt_check_rowcompare(ScanKey skey,
 								 IndexTuple tuple, int tupnatts, TupleDesc tupdesc,
-								 ScanDirection dir, bool *continuescan);
+								 ScanDirection dir, bool *continuescan, int *prefixskipindex);
 static int	_bt_keep_natts(Relation rel, IndexTuple lastleft,
 						   IndexTuple firstright, BTScanInsert itup_key);
 
@@ -87,9 +87,8 @@ static int	_bt_keep_natts(Relation rel, IndexTuple lastleft,
  *		field themselves.
  */
 BTScanInsert
-_bt_mkscankey(Relation rel, IndexTuple itup)
+_bt_mkscankey(Relation rel, IndexTuple itup, BTScanInsert key)
 {
-	BTScanInsert key;
 	ScanKey		skey;
 	TupleDesc	itupdesc;
 	int			indnkeyatts;
@@ -109,8 +108,10 @@ _bt_mkscankey(Relation rel, IndexTuple itup)
 	 * Truncated attributes and non-key attributes are omitted from the final
 	 * scan key.
 	 */
-	key = palloc(offsetof(BTScanInsertData, scankeys) +
-				 sizeof(ScanKeyData) * indnkeyatts);
+	if (key == NULL)
+		key = palloc(offsetof(BTScanInsertData, scankeys) +
+					 sizeof(ScanKeyData) * indnkeyatts);
+
 	if (itup)
 		_bt_metaversion(rel, &key->heapkeyspace, &key->allequalimage);
 	else
@@ -155,7 +156,7 @@ _bt_mkscankey(Relation rel, IndexTuple itup)
 		ScanKeyEntryInitializeWithInfo(&skey[i],
 									   flags,
 									   (AttrNumber) (i + 1),
-									   InvalidStrategy,
+									   BTEqualStrategyNumber,
 									   InvalidOid,
 									   rel->rd_indcollation[i],
 									   procinfo,
@@ -745,7 +746,7 @@ _bt_preprocess_keys(IndexScanDesc scan)
 	int			numberOfKeys = scan->numberOfKeys;
 	int16	   *indoption = scan->indexRelation->rd_indoption;
 	int			new_numberOfKeys;
-	int			numberOfEqualCols;
+	int			numberOfEqualCols, numberOfEqualColsSincePrefix;
 	ScanKey		inkeys;
 	ScanKey		outkeys;
 	ScanKey		cur;
@@ -754,6 +755,7 @@ _bt_preprocess_keys(IndexScanDesc scan)
 	int			i,
 				j;
 	AttrNumber	attno;
+	int			prefix = 0;
 
 	/* initialize result variables */
 	so->qual_ok = true;
@@ -762,6 +764,11 @@ _bt_preprocess_keys(IndexScanDesc scan)
 	if (numberOfKeys < 1)
 		return;					/* done if qual-less scan */
 
+	if (_bt_skip_enabled(so))
+	{
+		prefix = so->skipData->prefix;
+	}
+
 	/*
 	 * Read so->arrayKeyData if array keys are present, else scan->keyData
 	 */
@@ -786,7 +793,9 @@ _bt_preprocess_keys(IndexScanDesc scan)
 		so->numberOfKeys = 1;
 		/* We can mark the qual as required if it's for first index col */
 		if (cur->sk_attno == 1)
-			_bt_mark_scankey_required(outkeys);
+			_bt_mark_scankey_required(outkeys, SK_BT_REQFWD, SK_BT_REQBKWD);
+		if (cur->sk_attno <= prefix + 1)
+			_bt_mark_scankey_required(outkeys, SK_BT_REQSKIPFWD, SK_BT_REQSKIPBKWD);
 		return;
 	}
 
@@ -795,6 +804,8 @@ _bt_preprocess_keys(IndexScanDesc scan)
 	 */
 	new_numberOfKeys = 0;
 	numberOfEqualCols = 0;
+	numberOfEqualColsSincePrefix = 0;
+
 
 	/*
 	 * Initialize for processing of keys for attr 1.
@@ -830,6 +841,8 @@ _bt_preprocess_keys(IndexScanDesc scan)
 		if (i == numberOfKeys || cur->sk_attno != attno)
 		{
 			int			priorNumberOfEqualCols = numberOfEqualCols;
+			int			priorNumberOfEqualColsSincePrefix = numberOfEqualColsSincePrefix;
+
 
 			/* check input keys are correctly ordered */
 			if (i < numberOfKeys && cur->sk_attno < attno)
@@ -880,6 +893,8 @@ _bt_preprocess_keys(IndexScanDesc scan)
 				}
 				/* track number of attrs for which we have "=" keys */
 				numberOfEqualCols++;
+				if (attno > prefix)
+					numberOfEqualColsSincePrefix++;
 			}
 
 			/* try to keep only one of <, <= */
@@ -929,7 +944,9 @@ _bt_preprocess_keys(IndexScanDesc scan)
 
 					memcpy(outkey, xform[j], sizeof(ScanKeyData));
 					if (priorNumberOfEqualCols == attno - 1)
-						_bt_mark_scankey_required(outkey);
+						_bt_mark_scankey_required(outkey, SK_BT_REQFWD, SK_BT_REQBKWD);
+					if (attno <= prefix || priorNumberOfEqualColsSincePrefix == attno - prefix - 1)
+						_bt_mark_scankey_required(outkey, SK_BT_REQSKIPFWD, SK_BT_REQSKIPBKWD);
 				}
 			}
 
@@ -954,7 +971,9 @@ _bt_preprocess_keys(IndexScanDesc scan)
 
 			memcpy(outkey, cur, sizeof(ScanKeyData));
 			if (numberOfEqualCols == attno - 1)
-				_bt_mark_scankey_required(outkey);
+				_bt_mark_scankey_required(outkey, SK_BT_REQFWD, SK_BT_REQBKWD);
+			if (attno <= prefix || numberOfEqualColsSincePrefix == attno - prefix - 1)
+				_bt_mark_scankey_required(outkey, SK_BT_REQSKIPFWD, SK_BT_REQSKIPBKWD);
 
 			/*
 			 * We don't support RowCompare using equality; such a qual would
@@ -997,7 +1016,9 @@ _bt_preprocess_keys(IndexScanDesc scan)
 
 				memcpy(outkey, cur, sizeof(ScanKeyData));
 				if (numberOfEqualCols == attno - 1)
-					_bt_mark_scankey_required(outkey);
+					_bt_mark_scankey_required(outkey, SK_BT_REQFWD, SK_BT_REQBKWD);
+				if (attno <= prefix || numberOfEqualColsSincePrefix == attno - prefix - 1)
+					_bt_mark_scankey_required(outkey, SK_BT_REQSKIPFWD, SK_BT_REQSKIPBKWD);
 			}
 		}
 	}
@@ -1295,7 +1316,7 @@ _bt_fix_scankey_strategy(ScanKey skey, int16 *indoption)
  * anyway on a rescan.  Something to keep an eye on though.
  */
 static void
-_bt_mark_scankey_required(ScanKey skey)
+_bt_mark_scankey_required(ScanKey skey, int forwardReqFlag, int backwardReqFlag)
 {
 	int			addflags;
 
@@ -1303,14 +1324,14 @@ _bt_mark_scankey_required(ScanKey skey)
 	{
 		case BTLessStrategyNumber:
 		case BTLessEqualStrategyNumber:
-			addflags = SK_BT_REQFWD;
+			addflags = forwardReqFlag;
 			break;
 		case BTEqualStrategyNumber:
-			addflags = SK_BT_REQFWD | SK_BT_REQBKWD;
+			addflags = forwardReqFlag | backwardReqFlag;
 			break;
 		case BTGreaterEqualStrategyNumber:
 		case BTGreaterStrategyNumber:
-			addflags = SK_BT_REQBKWD;
+			addflags = backwardReqFlag;
 			break;
 		default:
 			elog(ERROR, "unrecognized StrategyNumber: %d",
@@ -1353,17 +1374,22 @@ _bt_mark_scankey_required(ScanKey skey)
  */
 bool
 _bt_checkkeys(IndexScanDesc scan, IndexTuple tuple, int tupnatts,
-			  ScanDirection dir, bool *continuescan)
+			  ScanDirection dir, bool *continuescan, int *prefixSkipIndex)
 {
 	TupleDesc	tupdesc;
 	BTScanOpaque so;
 	int			keysz;
 	int			ikey;
 	ScanKey		key;
+	int pfx;
+
+	if (prefixSkipIndex == NULL)
+		prefixSkipIndex = &pfx;
 
 	Assert(BTreeTupleGetNAtts(tuple, scan->indexRelation) == tupnatts);
 
 	*continuescan = true;		/* default assumption */
+	*prefixSkipIndex = -1;
 
 	tupdesc = RelationGetDescr(scan->indexRelation);
 	so = (BTScanOpaque) scan->opaque;
@@ -1392,7 +1418,7 @@ _bt_checkkeys(IndexScanDesc scan, IndexTuple tuple, int tupnatts,
 		if (key->sk_flags & SK_ROW_HEADER)
 		{
 			if (_bt_check_rowcompare(key, tuple, tupnatts, tupdesc, dir,
-									 continuescan))
+									 continuescan, prefixSkipIndex))
 				continue;
 			return false;
 		}
@@ -1429,6 +1455,13 @@ _bt_checkkeys(IndexScanDesc scan, IndexTuple tuple, int tupnatts,
 					 ScanDirectionIsBackward(dir))
 				*continuescan = false;
 
+			if ((key->sk_flags & SK_BT_REQSKIPFWD) &&
+				ScanDirectionIsForward(dir))
+				*prefixSkipIndex = key->sk_attno - 1;
+			else if ((key->sk_flags & SK_BT_REQSKIPBKWD) &&
+					 ScanDirectionIsBackward(dir))
+				*prefixSkipIndex = key->sk_attno - 1;
+
 			/*
 			 * In any case, this indextuple doesn't match the qual.
 			 */
@@ -1452,6 +1485,10 @@ _bt_checkkeys(IndexScanDesc scan, IndexTuple tuple, int tupnatts,
 				if ((key->sk_flags & (SK_BT_REQFWD | SK_BT_REQBKWD)) &&
 					ScanDirectionIsBackward(dir))
 					*continuescan = false;
+
+				if ((key->sk_flags & (SK_BT_REQSKIPFWD | SK_BT_REQSKIPBKWD)) &&
+					ScanDirectionIsBackward(dir))
+					*prefixSkipIndex = key->sk_attno - 1;
 			}
 			else
 			{
@@ -1468,6 +1505,9 @@ _bt_checkkeys(IndexScanDesc scan, IndexTuple tuple, int tupnatts,
 				if ((key->sk_flags & (SK_BT_REQFWD | SK_BT_REQBKWD)) &&
 					ScanDirectionIsForward(dir))
 					*continuescan = false;
+				if ((key->sk_flags & (SK_BT_REQSKIPFWD | SK_BT_REQSKIPBKWD)) &&
+					ScanDirectionIsForward(dir))
+					*prefixSkipIndex = key->sk_attno - 1;
 			}
 
 			/*
@@ -1498,6 +1538,206 @@ _bt_checkkeys(IndexScanDesc scan, IndexTuple tuple, int tupnatts,
 					 ScanDirectionIsBackward(dir))
 				*continuescan = false;
 
+			if ((key->sk_flags & SK_BT_REQSKIPFWD) &&
+				ScanDirectionIsForward(dir))
+				*prefixSkipIndex = key->sk_attno - 1;
+			else if ((key->sk_flags & SK_BT_REQSKIPBKWD) &&
+					 ScanDirectionIsBackward(dir))
+				*prefixSkipIndex = key->sk_attno - 1;
+
+			/*
+			 * In any case, this indextuple doesn't match the qual.
+			 */
+			return false;
+		}
+	}
+
+	/* If we get here, the tuple passes all index quals. */
+	return true;
+}
+
+bool
+_bt_checkkeys_threeway(IndexScanDesc scan, IndexTuple tuple, int tupnatts,
+			  ScanDirection dir, bool *continuescan, int *prefixSkipIndex)
+{
+	TupleDesc	tupdesc;
+	BTScanOpaque so;
+	int			keysz;
+	int			ikey;
+	ScanKey		key;
+	int pfx;
+	BTScanInsert keys;
+
+	if (prefixSkipIndex == NULL)
+		prefixSkipIndex = &pfx;
+
+	Assert(BTreeTupleGetNAtts(tuple, scan->indexRelation) == tupnatts);
+
+	*continuescan = true;		/* default assumption */
+	*prefixSkipIndex = -1;
+
+	tupdesc = RelationGetDescr(scan->indexRelation);
+	so = (BTScanOpaque) scan->opaque;
+	if (ScanDirectionIsForward(dir))
+		keys = &so->skipData->bwdScanKey;
+	else
+		keys = &so->skipData->fwdScanKey;
+
+	keysz = keys->keysz;
+
+	for (key = keys->scankeys, ikey = 0; ikey < keysz; key++, ikey++)
+	{
+		Datum		datum;
+		bool		isNull;
+		int		cmpresult;
+
+		if (key->sk_attno == 0)
+			continue;
+
+		if (key->sk_attno > tupnatts)
+		{
+			/*
+			 * This attribute is truncated (must be high key).  The value for
+			 * this attribute in the first non-pivot tuple on the page to the
+			 * right could be any possible value.  Assume that truncated
+			 * attribute passes the qual.
+			 */
+			Assert(ScanDirectionIsForward(dir));
+			continue;
+		}
+
+		/* row-comparison keys are not expected here */
+		Assert((key->sk_flags & SK_ROW_HEADER) == 0);
+
+		datum = index_getattr(tuple,
+							  key->sk_attno,
+							  tupdesc,
+							  &isNull);
+
+		if (key->sk_flags & SK_ISNULL)
+		{
+			/* Handle IS NULL/NOT NULL tests */
+			if (key->sk_flags & SK_SEARCHNULL)
+			{
+				if (isNull)
+					continue;	/* tuple satisfies this qual */
+			}
+			else
+			{
+				Assert(key->sk_flags & SK_SEARCHNOTNULL);
+				if (!isNull)
+					continue;	/* tuple satisfies this qual */
+			}
+
+			/*
+			 * Tuple fails this qual.  If it's a required qual for the current
+			 * scan direction, then we can conclude no further tuples will
+			 * pass, either.
+			 */
+			if ((key->sk_flags & SK_BT_REQFWD) &&
+				ScanDirectionIsForward(dir))
+				*continuescan = false;
+			else if ((key->sk_flags & SK_BT_REQBKWD) &&
+					 ScanDirectionIsBackward(dir))
+				*continuescan = false;
+
+			if ((key->sk_flags & SK_BT_REQSKIPFWD) &&
+				ScanDirectionIsForward(dir))
+				*prefixSkipIndex = key->sk_attno - 1;
+			else if ((key->sk_flags & SK_BT_REQSKIPBKWD) &&
+					 ScanDirectionIsBackward(dir))
+				*prefixSkipIndex = key->sk_attno - 1;
+
+			/*
+			 * In any case, this indextuple doesn't match the qual.
+			 */
+			return false;
+		}
+
+		if (isNull)
+		{
+			if (key->sk_flags & SK_BT_NULLS_FIRST)
+			{
+				/*
+				 * Since NULLs are sorted before non-NULLs, we know we have
+				 * reached the lower limit of the range of values for this
+				 * index attr.  On a backward scan, we can stop if this qual
+				 * is one of the "must match" subset.  We can stop regardless
+				 * of whether the qual is > or <, so long as it's required,
+				 * because it's not possible for any future tuples to pass. On
+				 * a forward scan, however, we must keep going, because we may
+				 * have initially positioned to the start of the index.
+				 */
+				if ((key->sk_flags & (SK_BT_REQFWD | SK_BT_REQBKWD)) &&
+					ScanDirectionIsBackward(dir))
+					*continuescan = false;
+
+				if ((key->sk_flags & (SK_BT_REQSKIPFWD | SK_BT_REQSKIPBKWD)) &&
+					ScanDirectionIsBackward(dir))
+					*prefixSkipIndex = key->sk_attno - 1;
+			}
+			else
+			{
+				/*
+				 * Since NULLs are sorted after non-NULLs, we know we have
+				 * reached the upper limit of the range of values for this
+				 * index attr.  On a forward scan, we can stop if this qual is
+				 * one of the "must match" subset.  We can stop regardless of
+				 * whether the qual is > or <, so long as it's required,
+				 * because it's not possible for any future tuples to pass. On
+				 * a backward scan, however, we must keep going, because we
+				 * may have initially positioned to the end of the index.
+				 */
+				if ((key->sk_flags & (SK_BT_REQFWD | SK_BT_REQBKWD)) &&
+					ScanDirectionIsForward(dir))
+					*continuescan = false;
+				if ((key->sk_flags & (SK_BT_REQSKIPFWD | SK_BT_REQSKIPBKWD)) &&
+					ScanDirectionIsForward(dir))
+					*prefixSkipIndex = key->sk_attno - 1;
+			}
+
+			/*
+			 * In any case, this indextuple doesn't match the qual.
+			 */
+			return false;
+		}
+
+
+		/* Perform the test --- three-way comparison not bool operator */
+		cmpresult = DatumGetInt32(FunctionCall2Coll(&key->sk_func,
+													key->sk_collation,
+													datum,
+													key->sk_argument));
+
+		if (key->sk_flags & SK_BT_DESC)
+			INVERT_COMPARE_RESULT(cmpresult);
+
+		if (cmpresult != 0)
+		{
+			/*
+			 * Tuple fails this qual.  If it's a required qual for the current
+			 * scan direction, then we can conclude no further tuples will
+			 * pass, either.
+			 *
+			 * Note: because we stop the scan as soon as any required equality
+			 * qual fails, it is critical that equality quals be used for the
+			 * initial positioning in _bt_first() when they are available. See
+			 * comments in _bt_first().
+			 */
+			if ((key->sk_flags & SK_BT_REQFWD) &&
+				ScanDirectionIsForward(dir) && cmpresult > 0)
+				*continuescan = false;
+			else if ((key->sk_flags & SK_BT_REQBKWD) &&
+					 ScanDirectionIsBackward(dir) && cmpresult < 0)
+				*continuescan = false;
+
+			if ((key->sk_flags & SK_BT_REQSKIPFWD) &&
+				ScanDirectionIsForward(dir) && cmpresult > 0)
+				*prefixSkipIndex = key->sk_attno - 1;
+			else if ((key->sk_flags & SK_BT_REQSKIPBKWD) &&
+					 ScanDirectionIsBackward(dir) && cmpresult < 0)
+				*prefixSkipIndex = key->sk_attno - 1;
+
 			/*
 			 * In any case, this indextuple doesn't match the qual.
 			 */
@@ -1520,7 +1760,7 @@ _bt_checkkeys(IndexScanDesc scan, IndexTuple tuple, int tupnatts,
  */
 static bool
 _bt_check_rowcompare(ScanKey skey, IndexTuple tuple, int tupnatts,
-					 TupleDesc tupdesc, ScanDirection dir, bool *continuescan)
+					 TupleDesc tupdesc, ScanDirection dir, bool *continuescan, int *prefixSkipIndex)
 {
 	ScanKey		subkey = (ScanKey) DatumGetPointer(skey->sk_argument);
 	int32		cmpresult = 0;
@@ -1576,6 +1816,10 @@ _bt_check_rowcompare(ScanKey skey, IndexTuple tuple, int tupnatts,
 				if ((subkey->sk_flags & (SK_BT_REQFWD | SK_BT_REQBKWD)) &&
 					ScanDirectionIsBackward(dir))
 					*continuescan = false;
+
+				if ((subkey->sk_flags & (SK_BT_REQSKIPFWD | SK_BT_REQSKIPBKWD)) &&
+					ScanDirectionIsBackward(dir))
+					*prefixSkipIndex = subkey->sk_attno - 1;
 			}
 			else
 			{
@@ -1592,6 +1836,10 @@ _bt_check_rowcompare(ScanKey skey, IndexTuple tuple, int tupnatts,
 				if ((subkey->sk_flags & (SK_BT_REQFWD | SK_BT_REQBKWD)) &&
 					ScanDirectionIsForward(dir))
 					*continuescan = false;
+
+				if ((subkey->sk_flags & (SK_BT_REQSKIPFWD | SK_BT_REQSKIPBKWD)) &&
+					ScanDirectionIsForward(dir))
+					*prefixSkipIndex = subkey->sk_attno - 1;
 			}
 
 			/*
@@ -1616,6 +1864,13 @@ _bt_check_rowcompare(ScanKey skey, IndexTuple tuple, int tupnatts,
 			else if ((subkey->sk_flags & SK_BT_REQBKWD) &&
 					 ScanDirectionIsBackward(dir))
 				*continuescan = false;
+
+			if ((subkey->sk_flags & SK_BT_REQSKIPFWD) &&
+				ScanDirectionIsForward(dir))
+				*prefixSkipIndex = subkey->sk_attno - 1;
+			else if ((subkey->sk_flags & SK_BT_REQSKIPBKWD) &&
+					 ScanDirectionIsBackward(dir))
+				*prefixSkipIndex = subkey->sk_attno - 1;
 			return false;
 		}
 
@@ -1678,6 +1933,13 @@ _bt_check_rowcompare(ScanKey skey, IndexTuple tuple, int tupnatts,
 		else if ((subkey->sk_flags & SK_BT_REQBKWD) &&
 				 ScanDirectionIsBackward(dir))
 			*continuescan = false;
+
+		if ((subkey->sk_flags & SK_BT_REQSKIPFWD) &&
+			ScanDirectionIsForward(dir))
+			*prefixSkipIndex = subkey->sk_attno - 1;
+		else if ((subkey->sk_flags & SK_BT_REQSKIPBKWD) &&
+				 ScanDirectionIsBackward(dir))
+			*prefixSkipIndex = subkey->sk_attno - 1;
 	}
 
 	return result;
@@ -2767,3 +3029,524 @@ _bt_allequalimage(Relation rel, bool debugmessage)
 
 	return allequalimage;
 }
+
+void _bt_set_bsearch_flags(StrategyNumber stratTotal, ScanDirection dir, bool* nextkey, bool* goback)
+{
+	/*----------
+	 * Examine the selected initial-positioning strategy to determine exactly
+	 * where we need to start the scan, and set flag variables to control the
+	 * code below.
+	 *
+	 * If nextkey = false, _bt_search and _bt_binsrch will locate the first
+	 * item >= scan key.  If nextkey = true, they will locate the first
+	 * item > scan key.
+	 *
+	 * If goback = true, we will then step back one item, while if
+	 * goback = false, we will start the scan on the located item.
+	 *----------
+	 */
+	switch (stratTotal)
+	{
+		case BTLessStrategyNumber:
+
+			/*
+			 * Find first item >= scankey, then back up one to arrive at last
+			 * item < scankey.  (Note: this positioning strategy is only used
+			 * for a backward scan, so that is always the correct starting
+			 * position.)
+			 */
+			*nextkey = false;
+			*goback = true;
+			break;
+
+		case BTLessEqualStrategyNumber:
+
+			/*
+			 * Find first item > scankey, then back up one to arrive at last
+			 * item <= scankey.  (Note: this positioning strategy is only used
+			 * for a backward scan, so that is always the correct starting
+			 * position.)
+			 */
+			*nextkey = true;
+			*goback = true;
+			break;
+
+		case BTEqualStrategyNumber:
+
+			/*
+			 * If a backward scan was specified, need to start with last equal
+			 * item not first one.
+			 */
+			if (ScanDirectionIsBackward(dir))
+			{
+				/*
+				 * This is the same as the <= strategy.  We will check at the
+				 * end whether the found item is actually =.
+				 */
+				*nextkey = true;
+				*goback = true;
+			}
+			else
+			{
+				/*
+				 * This is the same as the >= strategy.  We will check at the
+				 * end whether the found item is actually =.
+				 */
+				*nextkey = false;
+				*goback = false;
+			}
+			break;
+
+		case BTGreaterEqualStrategyNumber:
+
+			/*
+			 * Find first item >= scankey.  (This is only used for forward
+			 * scans.)
+			 */
+			*nextkey = false;
+			*goback = false;
+			break;
+
+		case BTGreaterStrategyNumber:
+
+			/*
+			 * Find first item > scankey.  (This is only used for forward
+			 * scans.)
+			 */
+			*nextkey = true;
+			*goback = false;
+			break;
+
+		default:
+			/* can't get here, but keep compiler quiet */
+			elog(ERROR, "unrecognized strat_total: %d", (int) stratTotal);
+	}
+}
+
+bool _bt_create_insertion_scan_key(Relation rel, ScanDirection dir, ScanKey* startKeys, int keysCount, BTScanInsert inskey, StrategyNumber* stratTotal, bool* goback)
+{
+	int i;
+	bool nextkey;
+
+	/*
+	 * We want to start the scan somewhere within the index.  Set up an
+	 * insertion scankey we can use to search for the boundary point we
+	 * identified above.  The insertion scankey is built using the keys
+	 * identified by startKeys[].  (Remaining insertion scankey fields are
+	 * initialized after initial-positioning strategy is finalized.)
+	 */
+	Assert(keysCount <= INDEX_MAX_KEYS);
+	for (i = 0; i < keysCount; i++)
+	{
+		ScanKey		cur = startKeys[i];
+
+		if (cur == NULL)
+		{
+			inskey->scankeys[i].sk_attno = 0;
+			continue;
+		}
+
+		Assert(cur->sk_attno == i + 1);
+
+		if (cur->sk_flags & SK_ROW_HEADER)
+		{
+			/*
+			 * Row comparison header: look to the first row member instead.
+			 *
+			 * The member scankeys are already in insertion format (ie, they
+			 * have sk_func = 3-way-comparison function), but we have to watch
+			 * out for nulls, which _bt_preprocess_keys didn't check. A null
+			 * in the first row member makes the condition unmatchable, just
+			 * like qual_ok = false.
+			 */
+			ScanKey		subkey = (ScanKey) DatumGetPointer(cur->sk_argument);
+
+			Assert(subkey->sk_flags & SK_ROW_MEMBER);
+			if (subkey->sk_flags & SK_ISNULL)
+			{
+				return false;
+			}
+			memcpy(inskey->scankeys + i, subkey, sizeof(ScanKeyData));
+
+			/*
+			 * If the row comparison is the last positioning key we accepted,
+			 * try to add additional keys from the lower-order row members.
+			 * (If we accepted independent conditions on additional index
+			 * columns, we use those instead --- doesn't seem worth trying to
+			 * determine which is more restrictive.)  Note that this is OK
+			 * even if the row comparison is of ">" or "<" type, because the
+			 * condition applied to all but the last row member is effectively
+			 * ">=" or "<=", and so the extra keys don't break the positioning
+			 * scheme.  But, by the same token, if we aren't able to use all
+			 * the row members, then the part of the row comparison that we
+			 * did use has to be treated as just a ">=" or "<=" condition, and
+			 * so we'd better adjust strat_total accordingly.
+			 */
+			if (i == keysCount - 1)
+			{
+				bool		used_all_subkeys = false;
+
+				Assert(!(subkey->sk_flags & SK_ROW_END));
+				for (;;)
+				{
+					subkey++;
+					Assert(subkey->sk_flags & SK_ROW_MEMBER);
+					if (subkey->sk_attno != keysCount + 1)
+						break;	/* out-of-sequence, can't use it */
+					if (subkey->sk_strategy != cur->sk_strategy)
+						break;	/* wrong direction, can't use it */
+					if (subkey->sk_flags & SK_ISNULL)
+						break;	/* can't use null keys */
+					Assert(keysCount < INDEX_MAX_KEYS);
+					memcpy(inskey->scankeys + keysCount, subkey,
+						   sizeof(ScanKeyData));
+					keysCount++;
+					if (subkey->sk_flags & SK_ROW_END)
+					{
+						used_all_subkeys = true;
+						break;
+					}
+				}
+				if (!used_all_subkeys)
+				{
+					switch (*stratTotal)
+					{
+						case BTLessStrategyNumber:
+							*stratTotal = BTLessEqualStrategyNumber;
+							break;
+						case BTGreaterStrategyNumber:
+							*stratTotal = BTGreaterEqualStrategyNumber;
+							break;
+					}
+				}
+				break;			/* done with outer loop */
+			}
+		}
+		else
+		{
+			/*
+			 * Ordinary comparison key.  Transform the search-style scan key
+			 * to an insertion scan key by replacing the sk_func with the
+			 * appropriate btree comparison function.
+			 *
+			 * If scankey operator is not a cross-type comparison, we can use
+			 * the cached comparison function; otherwise gotta look it up in
+			 * the catalogs.  (That can't lead to infinite recursion, since no
+			 * indexscan initiated by syscache lookup will use cross-data-type
+			 * operators.)
+			 *
+			 * We support the convention that sk_subtype == InvalidOid means
+			 * the opclass input type; this is a hack to simplify life for
+			 * ScanKeyInit().
+			 */
+			if (cur->sk_subtype == rel->rd_opcintype[i] ||
+				cur->sk_subtype == InvalidOid)
+			{
+				FmgrInfo   *procinfo;
+
+				procinfo = index_getprocinfo(rel, cur->sk_attno, BTORDER_PROC);
+				ScanKeyEntryInitializeWithInfo(inskey->scankeys + i,
+											   cur->sk_flags,
+											   cur->sk_attno,
+											   cur->sk_strategy,
+											   cur->sk_subtype,
+											   cur->sk_collation,
+											   procinfo,
+											   cur->sk_argument);
+			}
+			else
+			{
+				RegProcedure cmp_proc;
+
+				cmp_proc = get_opfamily_proc(rel->rd_opfamily[i],
+											 rel->rd_opcintype[i],
+											 cur->sk_subtype,
+											 BTORDER_PROC);
+				if (!RegProcedureIsValid(cmp_proc))
+					elog(ERROR, "missing support function %d(%u,%u) for attribute %d of index \"%s\"",
+						 BTORDER_PROC, rel->rd_opcintype[i], cur->sk_subtype,
+						 cur->sk_attno, RelationGetRelationName(rel));
+				ScanKeyEntryInitialize(inskey->scankeys + i,
+									   cur->sk_flags,
+									   cur->sk_attno,
+									   cur->sk_strategy,
+									   cur->sk_subtype,
+									   cur->sk_collation,
+									   cmp_proc,
+									   cur->sk_argument);
+			}
+		}
+	}
+
+	_bt_set_bsearch_flags(*stratTotal, dir, &nextkey, goback);
+
+	/* Initialize remaining insertion scan key fields */
+	_bt_metaversion(rel, &inskey->heapkeyspace, &inskey->allequalimage);
+	inskey->anynullkeys = false; /* unused */
+	inskey->nextkey = nextkey;
+	inskey->pivotsearch = false;
+	inskey->scantid = NULL;
+	inskey->keysz = keysCount;
+
+	return true;
+}
+
+/*----------
+ * Examine the scan keys to discover where we need to start the scan.
+ *
+ * We want to identify the keys that can be used as starting boundaries;
+ * these are =, >, or >= keys for a forward scan or =, <, <= keys for
+ * a backwards scan.  We can use keys for multiple attributes so long as
+ * the prior attributes had only =, >= (resp. =, <=) keys.  Once we accept
+ * a > or < boundary or find an attribute with no boundary (which can be
+ * thought of as the same as "> -infinity"), we can't use keys for any
+ * attributes to its right, because it would break our simplistic notion
+ * of what initial positioning strategy to use.
+ *
+ * When the scan keys include cross-type operators, _bt_preprocess_keys
+ * may not be able to eliminate redundant keys; in such cases we will
+ * arbitrarily pick a usable one for each attribute.  This is correct
+ * but possibly not optimal behavior.  (For example, with keys like
+ * "x >= 4 AND x >= 5" we would elect to scan starting at x=4 when
+ * x=5 would be more efficient.)  Since the situation only arises given
+ * a poorly-worded query plus an incomplete opfamily, live with it.
+ *
+ * When both equality and inequality keys appear for a single attribute
+ * (again, only possible when cross-type operators appear), we *must*
+ * select one of the equality keys for the starting point, because
+ * _bt_checkkeys() will stop the scan as soon as an equality qual fails.
+ * For example, if we have keys like "x >= 4 AND x = 10" and we elect to
+ * start at x=4, we will fail and stop before reaching x=10.  If multiple
+ * equality quals survive preprocessing, however, it doesn't matter which
+ * one we use --- by definition, they are either redundant or
+ * contradictory.
+ *
+ * Any regular (not SK_SEARCHNULL) key implies a NOT NULL qualifier.
+ * If the index stores nulls at the end of the index we'll be starting
+ * from, and we have no boundary key for the column (which means the key
+ * we deduced NOT NULL from is an inequality key that constrains the other
+ * end of the index), then we cons up an explicit SK_SEARCHNOTNULL key to
+ * use as a boundary key.  If we didn't do this, we might find ourselves
+ * traversing a lot of null entries at the start of the scan.
+ *
+ * In this loop, row-comparison keys are treated the same as keys on their
+ * first (leftmost) columns.  We'll add on lower-order columns of the row
+ * comparison below, if possible.
+ *
+ * The selected scan keys (at most one per index column) are remembered by
+ * storing their addresses into the local startKeys[] array.
+ *----------
+ */
+int _bt_choose_scan_keys(ScanKey scanKeys, int numberOfKeys, ScanDirection dir, ScanKey* startKeys, ScanKeyData* notnullkeys, StrategyNumber* stratTotal, int prefix)
+{
+	StrategyNumber strat;
+	int			keysCount = 0;
+	int			i;
+
+	*stratTotal = BTEqualStrategyNumber;
+	if (numberOfKeys > 0 || prefix > 0)
+	{
+		AttrNumber	curattr;
+		ScanKey		chosen;
+		ScanKey		impliesNN;
+		ScanKey		cur;
+
+		/*
+		 * chosen is the so-far-chosen key for the current attribute, if any.
+		 * We don't cast the decision in stone until we reach keys for the
+		 * next attribute.
+		 */
+		curattr = 1;
+		chosen = NULL;
+		/* Also remember any scankey that implies a NOT NULL constraint */
+		impliesNN = NULL;
+
+		/*
+		 * Loop iterates from 0 to numberOfKeys inclusive; we use the last
+		 * pass to handle after-last-key processing.  Actual exit from the
+		 * loop is at one of the "break" statements below.
+		 */
+		for (cur = scanKeys, i = 0;; cur++, i++)
+		{
+			if (i >= numberOfKeys || cur->sk_attno != curattr)
+			{
+				/*
+				 * Done looking at keys for curattr.  If we didn't find a
+				 * usable boundary key, see if we can deduce a NOT NULL key.
+				 */
+				if (chosen == NULL && impliesNN != NULL &&
+					((impliesNN->sk_flags & SK_BT_NULLS_FIRST) ?
+					 ScanDirectionIsForward(dir) :
+					 ScanDirectionIsBackward(dir)))
+				{
+					/* Yes, so build the key in notnullkeys[keysCount] */
+					chosen = &notnullkeys[keysCount];
+					ScanKeyEntryInitialize(chosen,
+										   (SK_SEARCHNOTNULL | SK_ISNULL |
+											(impliesNN->sk_flags &
+											 (SK_BT_DESC | SK_BT_NULLS_FIRST))),
+										   curattr,
+										   ((impliesNN->sk_flags & SK_BT_NULLS_FIRST) ?
+											BTGreaterStrategyNumber :
+											BTLessStrategyNumber),
+										   InvalidOid,
+										   InvalidOid,
+										   InvalidOid,
+										   (Datum) 0);
+				}
+
+				/*
+				 * If we still didn't find a usable boundary key, quit; else
+				 * save the boundary key pointer in startKeys.
+				 */
+				if (chosen == NULL && curattr > prefix)
+					break;
+				startKeys[keysCount++] = chosen;
+
+				/*
+				 * Adjust *stratTotal, and quit if we have stored a > or <
+				 * key.
+				 */
+				if (chosen != NULL && curattr > prefix)
+				{
+					strat = chosen->sk_strategy;
+					if (strat != BTEqualStrategyNumber)
+					{
+						*stratTotal = strat;
+						if (strat == BTGreaterStrategyNumber ||
+							strat == BTLessStrategyNumber)
+							break;
+					}
+				}
+
+				/*
+				 * Done if that was the last attribute, or if next key is not
+				 * in sequence (implying no boundary key is available for the
+				 * next attribute).
+				 */
+				if (i >= numberOfKeys)
+				{
+					curattr++;
+					while (curattr <= prefix)
+					{
+						startKeys[keysCount++] = NULL;
+						curattr++;
+					}
+					break;
+				}
+				else if (cur->sk_attno != curattr + 1)
+				{
+					curattr++;
+					while (curattr < cur->sk_attno && curattr <= prefix)
+					{
+						startKeys[keysCount++] = NULL;
+						curattr++;
+					}
+					if (curattr > prefix && curattr != cur->sk_attno)
+						break;
+				}
+				else
+				{
+					curattr++;
+				}
+
+				/*
+				 * Reset for next attr.
+				 */
+				chosen = NULL;
+				impliesNN = NULL;
+			}
+
+			/*
+			 * Can we use this key as a starting boundary for this attr?
+			 *
+			 * If not, does it imply a NOT NULL constraint?  (Because
+			 * SK_SEARCHNULL keys are always assigned BTEqualStrategyNumber,
+			 * *any* inequality key works for that; we need not test.)
+			 */
+			switch (cur->sk_strategy)
+			{
+				case BTLessStrategyNumber:
+				case BTLessEqualStrategyNumber:
+					if (chosen == NULL)
+					{
+						if (ScanDirectionIsBackward(dir))
+							chosen = cur;
+						else
+							impliesNN = cur;
+					}
+					break;
+				case BTEqualStrategyNumber:
+					/* override any non-equality choice */
+					chosen = cur;
+					break;
+				case BTGreaterEqualStrategyNumber:
+				case BTGreaterStrategyNumber:
+					if (chosen == NULL)
+					{
+						if (ScanDirectionIsForward(dir))
+							chosen = cur;
+						else
+							impliesNN = cur;
+					}
+					break;
+			}
+		}
+	}
+	return keysCount;
+}
+
+void print_itup(BlockNumber blk, IndexTuple left, IndexTuple right, Relation rel, char *extra)
+{
+	bool		isnull[INDEX_MAX_KEYS];
+	Datum		values[INDEX_MAX_KEYS];
+	char	   *lkey_desc = NULL;
+	char	   *rkey_desc;
+
+	/* Avoid infinite recursion -- don't instrument catalog indexes */
+	if (!IsCatalogRelation(rel))
+	{
+		TupleDesc	itupdesc = RelationGetDescr(rel);
+		int			natts;
+		int			indnkeyatts = rel->rd_index->indnkeyatts;
+
+		natts = BTreeTupleGetNAtts(left, rel);
+		itupdesc->natts = Min(indnkeyatts, natts);
+		memset(&isnull, 0xFF, sizeof(isnull));
+		index_deform_tuple(left, itupdesc, values, isnull);
+		rel->rd_index->indnkeyatts = natts;
+
+		/*
+		 * Since the regression tests should pass when the instrumentation
+		 * patch is applied, be prepared for BuildIndexValueDescription() to
+		 * return NULL due to security considerations.
+		 */
+		lkey_desc = BuildIndexValueDescription(rel, values, isnull);
+		if (lkey_desc && right)
+		{
+			/*
+			 * Revolting hack: modify tuple descriptor to have number of key
+			 * columns actually present in caller's pivot tuples
+			 */
+			natts = BTreeTupleGetNAtts(right, rel);
+			itupdesc->natts = Min(indnkeyatts, natts);
+			memset(&isnull, 0xFF, sizeof(isnull));
+			index_deform_tuple(right, itupdesc, values, isnull);
+			rel->rd_index->indnkeyatts = natts;
+			rkey_desc = BuildIndexValueDescription(rel, values, isnull);
+			elog(DEBUG1, "%s blk %u sk > %s, sk <= %s %s",
+				 RelationGetRelationName(rel), blk, lkey_desc, rkey_desc,
+				 extra);
+			pfree(rkey_desc);
+		}
+		else
+			elog(DEBUG1, "%s blk %u sk check %s %s",
+				 RelationGetRelationName(rel), blk, lkey_desc, extra);
+
+		/* Cleanup */
+		itupdesc->natts = IndexRelationGetNumberOfAttributes(rel);
+		rel->rd_index->indnkeyatts = indnkeyatts;
+		if (lkey_desc)
+			pfree(lkey_desc);
+	}
+}
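
The boundary-key rules described in the header comment of _bt_choose_scan_keys earlier in this file can be hard to follow from the loop alone, so here is a minimal standalone sketch of them. It is not the patch's code: Strat, Key and choose_start_keys are hypothetical names, and the sketch deliberately leaves out the NOT NULL deduction, row comparisons and the new "prefix" argument. It only shows the core idea: at most one key per attribute, equality overrides an inequality, and nothing to the right of a strict > or < (or of an attribute without a usable key) is used.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the btree strategy numbers. */
typedef enum { STRAT_LT = 1, STRAT_LE, STRAT_EQ, STRAT_GE, STRAT_GT } Strat;

/* Hypothetical, stripped-down scan key: attribute number plus operator. */
typedef struct
{
	int			attno;			/* 1-based index column */
	Strat		strat;
} Key;

/*
 * Choose starting boundary keys from keys[] (sorted by attno).
 * Stores at most one key per attribute in start[] and returns how many
 * were stored; *total receives the overall positioning strategy.
 */
static int
choose_start_keys(const Key *keys, int nkeys, bool forward,
				  const Key **start, Strat *total)
{
	int			count = 0;
	int			curattr = 1;
	const Key  *chosen = NULL;

	assert(nkeys >= 0);
	*total = STRAT_EQ;

	/* One extra pass (i == nkeys) finishes off the last attribute. */
	for (int i = 0;; i++)
	{
		if (i >= nkeys || keys[i].attno != curattr)
		{
			/* No usable boundary for curattr: stop here. */
			if (chosen == NULL)
				break;
			start[count++] = chosen;

			if (chosen->strat != STRAT_EQ)
			{
				*total = chosen->strat;
				/* A strict bound ends the usable prefix. */
				if (chosen->strat == STRAT_GT || chosen->strat == STRAT_LT)
					break;
			}

			/* Stop at the end, or if the next attribute has no key. */
			if (i >= nkeys || keys[i].attno != curattr + 1)
				break;
			curattr++;
			chosen = NULL;
		}

		switch (keys[i].strat)
		{
			case STRAT_EQ:
				chosen = &keys[i];	/* overrides any inequality */
				break;
			case STRAT_GE:
			case STRAT_GT:
				if (chosen == NULL && forward)
					chosen = &keys[i];
				break;
			case STRAT_LE:
			case STRAT_LT:
				if (chosen == NULL && !forward)
					chosen = &keys[i];
				break;
		}
	}
	return count;
}
```

With keys "a = ?, b >= ?, c > ?, d = ?" a forward scan uses the first three keys and stops at the strict bound on c; a backward scan over the same keys can only use the equality key on a.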
diff --git a/src/backend/access/spgist/spgutils.c b/src/backend/access/spgist/spgutils.c
index 4924ae1c59..b5db76cf48 100644
--- a/src/backend/access/spgist/spgutils.c
+++ b/src/backend/access/spgist/spgutils.c
@@ -68,6 +68,9 @@ spghandler(PG_FUNCTION_ARGS)
 	amroutine->ambulkdelete = spgbulkdelete;
 	amroutine->amvacuumcleanup = spgvacuumcleanup;
 	amroutine->amcanreturn = spgcanreturn;
+	amroutine->amskip = NULL;
+	amroutine->ambeginskipscan = NULL;
+	amroutine->amgetskiptuple = NULL;
 	amroutine->amcostestimate = spgcostestimate;
 	amroutine->amoptions = spgoptions;
 	amroutine->amproperty = spgproperty;
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 58141d8393..6a9e34b6d1 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -143,6 +143,7 @@ static void ExplainXMLTag(const char *tagname, int flags, ExplainState *es);
 static void ExplainIndentText(ExplainState *es);
 static void ExplainJSONLineEnding(ExplainState *es);
 static void ExplainYAMLLineStarting(ExplainState *es);
+static void ExplainIndexSkipScanKeys(int skipPrefixSize, ExplainState *es);
 static void escape_yaml(StringInfo buf, const char *str);
 
 
@@ -1054,6 +1055,22 @@ ExplainPreScanNode(PlanState *planstate, Bitmapset **rels_used)
 	return planstate_tree_walker(planstate, ExplainPreScanNode, rels_used);
 }
 
+/*
+ * ExplainIndexSkipScanKeys -
+ *	  Append information about index skip scan to es->str.
+ *
+ * Emits the skip prefix size as "Distinct Prefix" (non-text formats only).
+ */
+static void
+ExplainIndexSkipScanKeys(int skipPrefixSize, ExplainState *es)
+{
+	if (skipPrefixSize > 0)
+	{
+		if (es->format != EXPLAIN_FORMAT_TEXT)
+			ExplainPropertyInteger("Distinct Prefix", NULL, skipPrefixSize, es);
+	}
+}
+
 /*
  * ExplainNode -
  *	  Appends a description of a plan tree to es->str
@@ -1388,6 +1405,9 @@ ExplainNode(PlanState *planstate, List *ancestors,
 			{
 				IndexScan  *indexscan = (IndexScan *) plan;
 
+				if (indexscan->indexdistinct)
+					ExplainIndexSkipScanKeys(indexscan->indexskipprefixsize, es);
+
 				ExplainIndexScanDetails(indexscan->indexid,
 										indexscan->indexorderdir,
 										es);
@@ -1398,6 +1418,9 @@ ExplainNode(PlanState *planstate, List *ancestors,
 			{
 				IndexOnlyScan *indexonlyscan = (IndexOnlyScan *) plan;
 
+				if (indexonlyscan->indexdistinct)
+					ExplainIndexSkipScanKeys(indexonlyscan->indexskipprefixsize, es);
+
 				ExplainIndexScanDetails(indexonlyscan->indexid,
 										indexonlyscan->indexorderdir,
 										es);
@@ -1657,6 +1680,8 @@ ExplainNode(PlanState *planstate, List *ancestors,
 	switch (nodeTag(plan))
 	{
 		case T_IndexScan:
+			if (((IndexScan *) plan)->indexskipprefixsize > 0)
+				ExplainPropertyText("Skip scan", ((IndexScan *) plan)->indexdistinct ? "Distinct only" : "All", es);
 			show_scan_qual(((IndexScan *) plan)->indexqualorig,
 						   "Index Cond", planstate, ancestors, es);
 			if (((IndexScan *) plan)->indexqualorig)
@@ -1670,6 +1695,8 @@ ExplainNode(PlanState *planstate, List *ancestors,
 										   planstate, es);
 			break;
 		case T_IndexOnlyScan:
+			if (((IndexOnlyScan *) plan)->indexskipprefixsize > 0)
+				ExplainPropertyText("Skip scan", ((IndexOnlyScan *) plan)->indexdistinct ? "Distinct only" : "All", es);
 			show_scan_qual(((IndexOnlyScan *) plan)->indexqual,
 						   "Index Cond", planstate, ancestors, es);
 			if (((IndexOnlyScan *) plan)->indexqual)
@@ -1686,6 +1713,8 @@ ExplainNode(PlanState *planstate, List *ancestors,
 									 planstate->instrument->ntuples2, 0, es);
 			break;
 		case T_BitmapIndexScan:
+			if (((BitmapIndexScan *) plan)->indexskipprefixsize > 0)
+				ExplainPropertyText("Skip scan", "All", es);
 			show_scan_qual(((BitmapIndexScan *) plan)->indexqualorig,
 						   "Index Cond", planstate, ancestors, es);
 			break;
diff --git a/src/backend/executor/execScan.c b/src/backend/executor/execScan.c
index 642805d90c..0e77f241f9 100644
--- a/src/backend/executor/execScan.c
+++ b/src/backend/executor/execScan.c
@@ -133,6 +133,14 @@ ExecScanFetch(ScanState *node,
 	return (*accessMtd) (node);
 }
 
+TupleTableSlot *
+ExecScan(ScanState *node,
+		 ExecScanAccessMtd accessMtd,	/* function returning a tuple */
+		 ExecScanRecheckMtd recheckMtd)
+{
+	return ExecScanExtended(node, accessMtd, recheckMtd, NULL);
+}
+
 /* ----------------------------------------------------------------
  *		ExecScan
  *
@@ -155,9 +163,10 @@ ExecScanFetch(ScanState *node,
  * ----------------------------------------------------------------
  */
 TupleTableSlot *
-ExecScan(ScanState *node,
+ExecScanExtended(ScanState *node,
 		 ExecScanAccessMtd accessMtd,	/* function returning a tuple */
-		 ExecScanRecheckMtd recheckMtd)
+		 ExecScanRecheckMtd recheckMtd,
+		 ExecScanSkipMtd skipMtd)
 {
 	ExprContext *econtext;
 	ExprState  *qual;
@@ -170,6 +179,20 @@ ExecScan(ScanState *node,
 	projInfo = node->ps.ps_ProjInfo;
 	econtext = node->ps.ps_ExprContext;
 
+	if (skipMtd != NULL && node->ss_FirstTupleEmitted)
+	{
+		bool cont = skipMtd(node);
+		if (!cont)
+		{
+			node->ss_FirstTupleEmitted = false;
+			return ExecClearTuple(node->ss_ScanTupleSlot);
+		}
+	}
+	else
+	{
+		node->ss_FirstTupleEmitted = true;
+	}
+
 	/* interrupt checks are in ExecScanFetch */
 
 	/*
@@ -178,8 +201,13 @@ ExecScan(ScanState *node,
 	 */
 	if (!qual && !projInfo)
 	{
+		TupleTableSlot *slot;
+
 		ResetExprContext(econtext);
-		return ExecScanFetch(node, accessMtd, recheckMtd);
+		slot = ExecScanFetch(node, accessMtd, recheckMtd);
+		if (TupIsNull(slot))
+			node->ss_FirstTupleEmitted = false;
+		return slot;
 	}
 
 	/*
@@ -206,6 +234,7 @@ ExecScan(ScanState *node,
 		 */
 		if (TupIsNull(slot))
 		{
+			node->ss_FirstTupleEmitted = false;
 			if (projInfo)
 				return ExecClearTuple(projInfo->pi_state.resultslot);
 			else
diff --git a/src/backend/executor/nodeBitmapIndexscan.c b/src/backend/executor/nodeBitmapIndexscan.c
index 81a1208157..602c64fc91 100644
--- a/src/backend/executor/nodeBitmapIndexscan.c
+++ b/src/backend/executor/nodeBitmapIndexscan.c
@@ -22,13 +22,14 @@
 #include "postgres.h"
 
 #include "access/genam.h"
+#include "access/relscan.h"
 #include "executor/execdebug.h"
 #include "executor/nodeBitmapIndexscan.h"
 #include "executor/nodeIndexscan.h"
 #include "miscadmin.h"
+#include "utils/rel.h"
 #include "utils/memutils.h"
 
-
 /* ----------------------------------------------------------------
  *		ExecBitmapIndexScan
  *
@@ -308,10 +309,20 @@ ExecInitBitmapIndexScan(BitmapIndexScan *node, EState *estate, int eflags)
 	/*
 	 * Initialize scan descriptor.
 	 */
-	indexstate->biss_ScanDesc =
-		index_beginscan_bitmap(indexstate->biss_RelationDesc,
-							   estate->es_snapshot,
-							   indexstate->biss_NumScanKeys);
+	if (node->indexskipprefixsize > 0)
+	{
+		indexstate->biss_ScanDesc =
+			index_beginscan_bitmap_skip(indexstate->biss_RelationDesc,
+				estate->es_snapshot,
+				indexstate->biss_NumScanKeys,
+				Min(IndexRelationGetNumberOfKeyAttributes(indexstate->biss_RelationDesc),
+					node->indexskipprefixsize));
+	}
+	else
+		indexstate->biss_ScanDesc =
+			index_beginscan_bitmap(indexstate->biss_RelationDesc,
+								   estate->es_snapshot,
+								   indexstate->biss_NumScanKeys);
 
 	/*
 	 * If no run-time keys to calculate, go ahead and pass the scankeys to the
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 5617ac29e7..aadba4a0fe 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -41,6 +41,7 @@
 #include "miscadmin.h"
 #include "storage/bufmgr.h"
 #include "storage/predicate.h"
+#include "storage/itemptr.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
 
@@ -49,6 +50,37 @@ static TupleTableSlot *IndexOnlyNext(IndexOnlyScanState *node);
 static void StoreIndexTuple(TupleTableSlot *slot, IndexTuple itup,
 							TupleDesc itupdesc);
 
+static bool
+IndexOnlySkip(IndexOnlyScanState *node)
+{
+	EState	   *estate;
+	ScanDirection direction;
+	IndexScanDesc scandesc;
+	IndexOnlyScan *indexonlyscan = (IndexOnlyScan *) node->ss.ps.plan;
+
+	if (!node->ioss_Distinct)
+		return true;
+
+	/*
+	 * extract necessary information from index scan node
+	 */
+	estate = node->ss.ps.state;
+	direction = estate->es_direction;
+	/* flip direction if this is an overall backward scan */
+	if (ScanDirectionIsBackward(indexonlyscan->indexorderdir))
+	{
+		if (ScanDirectionIsForward(direction))
+			direction = BackwardScanDirection;
+		else if (ScanDirectionIsBackward(direction))
+			direction = ForwardScanDirection;
+	}
+	scandesc = node->ioss_ScanDesc;
+
+	if (!index_skip(scandesc, direction, indexonlyscan->indexorderdir))
+		return false;
+
+	return true;
+}
 
 /* ----------------------------------------------------------------
  *		IndexOnlyNext
@@ -65,6 +97,8 @@ IndexOnlyNext(IndexOnlyScanState *node)
 	IndexScanDesc scandesc;
 	TupleTableSlot *slot;
 	ItemPointer tid;
+	ItemPointerData startTid;
+	IndexOnlyScan *indexonlyscan = (IndexOnlyScan *) node->ss.ps.plan;
 
 	/*
 	 * extract necessary information from index scan node
@@ -72,7 +106,7 @@ IndexOnlyNext(IndexOnlyScanState *node)
 	estate = node->ss.ps.state;
 	direction = estate->es_direction;
 	/* flip direction if this is an overall backward scan */
-	if (ScanDirectionIsBackward(((IndexOnlyScan *) node->ss.ps.plan)->indexorderdir))
+	if (ScanDirectionIsBackward(indexonlyscan->indexorderdir))
 	{
 		if (ScanDirectionIsForward(direction))
 			direction = BackwardScanDirection;
@@ -90,11 +124,19 @@ IndexOnlyNext(IndexOnlyScanState *node)
 		 * serially executing an index only scan that was planned to be
 		 * parallel.
 		 */
-		scandesc = index_beginscan(node->ss.ss_currentRelation,
-								   node->ioss_RelationDesc,
-								   estate->es_snapshot,
-								   node->ioss_NumScanKeys,
-								   node->ioss_NumOrderByKeys);
+		if (node->ioss_SkipPrefixSize > 0)
+			scandesc = index_beginscan_skip(node->ss.ss_currentRelation,
+									   node->ioss_RelationDesc,
+									   estate->es_snapshot,
+									   node->ioss_NumScanKeys,
+									   node->ioss_NumOrderByKeys,
+									   Min(IndexRelationGetNumberOfKeyAttributes(node->ioss_RelationDesc), node->ioss_SkipPrefixSize));
+		else
+			scandesc = index_beginscan(node->ss.ss_currentRelation,
+									   node->ioss_RelationDesc,
+									   estate->es_snapshot,
+									   node->ioss_NumScanKeys,
+									   node->ioss_NumOrderByKeys);
 
 		node->ioss_ScanDesc = scandesc;
 
@@ -114,11 +156,16 @@ IndexOnlyNext(IndexOnlyScanState *node)
 						 node->ioss_OrderByKeys,
 						 node->ioss_NumOrderByKeys);
 	}
+	else
+	{
+		ItemPointerCopy(&scandesc->xs_heaptid, &startTid);
+	}
 
 	/*
 	 * OK, now that we have what we need, fetch the next tuple.
 	 */
-	while ((tid = index_getnext_tid(scandesc, direction)) != NULL)
+	while ((tid = node->ioss_SkipPrefixSize > 0 ? index_getnext_tid_skip(scandesc, direction, node->ioss_Distinct ? indexonlyscan->indexorderdir : direction) :
+			index_getnext_tid(scandesc, direction)) != NULL)
 	{
 		bool		tuple_from_heap = false;
 
@@ -314,9 +361,10 @@ ExecIndexOnlyScan(PlanState *pstate)
 	if (node->ioss_NumRuntimeKeys != 0 && !node->ioss_RuntimeKeysReady)
 		ExecReScan((PlanState *) node);
 
-	return ExecScan(&node->ss,
+	return ExecScanExtended(&node->ss,
 					(ExecScanAccessMtd) IndexOnlyNext,
-					(ExecScanRecheckMtd) IndexOnlyRecheck);
+					(ExecScanRecheckMtd) IndexOnlyRecheck,
+					node->ioss_Distinct ? (ExecScanSkipMtd) IndexOnlySkip : NULL);
 }
 
 /* ----------------------------------------------------------------
@@ -503,7 +551,10 @@ ExecInitIndexOnlyScan(IndexOnlyScan *node, EState *estate, int eflags)
 	indexstate = makeNode(IndexOnlyScanState);
 	indexstate->ss.ps.plan = (Plan *) node;
 	indexstate->ss.ps.state = estate;
+	indexstate->ss.ss_FirstTupleEmitted = false;
 	indexstate->ss.ps.ExecProcNode = ExecIndexOnlyScan;
+	indexstate->ioss_SkipPrefixSize = node->indexskipprefixsize;
+	indexstate->ioss_Distinct = node->indexdistinct;
 
 	/*
 	 * Miscellaneous initialization
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index d0a96a38e0..db3b5a3379 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -69,6 +69,37 @@ static void reorderqueue_push(IndexScanState *node, TupleTableSlot *slot,
 							  Datum *orderbyvals, bool *orderbynulls);
 static HeapTuple reorderqueue_pop(IndexScanState *node);
 
+static bool
+IndexSkip(IndexScanState *node)
+{
+	EState	   *estate;
+	ScanDirection direction;
+	IndexScanDesc scandesc;
+	IndexScan *indexscan = (IndexScan *) node->ss.ps.plan;
+
+	if (!node->iss_Distinct)
+		return true;
+
+	/*
+	 * extract necessary information from index scan node
+	 */
+	estate = node->ss.ps.state;
+	direction = estate->es_direction;
+	/* flip direction if this is an overall backward scan */
+	if (ScanDirectionIsBackward(indexscan->indexorderdir))
+	{
+		if (ScanDirectionIsForward(direction))
+			direction = BackwardScanDirection;
+		else if (ScanDirectionIsBackward(direction))
+			direction = ForwardScanDirection;
+	}
+	scandesc = node->iss_ScanDesc;
+
+	if (!index_skip(scandesc, direction, indexscan->indexorderdir))
+		return false;
+
+	return true;
+}
 
 /* ----------------------------------------------------------------
  *		IndexNext
@@ -85,6 +116,7 @@ IndexNext(IndexScanState *node)
 	ScanDirection direction;
 	IndexScanDesc scandesc;
 	TupleTableSlot *slot;
+	IndexScan *indexscan = (IndexScan *) node->ss.ps.plan;
 
 	/*
 	 * extract necessary information from index scan node
@@ -92,7 +124,7 @@ IndexNext(IndexScanState *node)
 	estate = node->ss.ps.state;
 	direction = estate->es_direction;
 	/* flip direction if this is an overall backward scan */
-	if (ScanDirectionIsBackward(((IndexScan *) node->ss.ps.plan)->indexorderdir))
+	if (ScanDirectionIsBackward(indexscan->indexorderdir))
 	{
 		if (ScanDirectionIsForward(direction))
 			direction = BackwardScanDirection;
@@ -109,14 +141,25 @@ IndexNext(IndexScanState *node)
 		 * We reach here if the index scan is not parallel, or if we're
 		 * serially executing an index scan that was planned to be parallel.
 		 */
-		scandesc = index_beginscan(node->ss.ss_currentRelation,
-								   node->iss_RelationDesc,
-								   estate->es_snapshot,
-								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys);
+		if (node->iss_SkipPrefixSize > 0)
+			scandesc = index_beginscan_skip(node->ss.ss_currentRelation,
+									   node->iss_RelationDesc,
+									   estate->es_snapshot,
+									   node->iss_NumScanKeys,
+									   node->iss_NumOrderByKeys,
+									   Min(IndexRelationGetNumberOfKeyAttributes(node->iss_RelationDesc), node->iss_SkipPrefixSize));
+		else
+			scandesc = index_beginscan(node->ss.ss_currentRelation,
+									   node->iss_RelationDesc,
+									   estate->es_snapshot,
+									   node->iss_NumScanKeys,
+									   node->iss_NumOrderByKeys);
 
 		node->iss_ScanDesc = scandesc;
 
+		/* Skip scan relies on xs_want_itup; set it only when skipping over distinct prefixes */
+		node->iss_ScanDesc->xs_want_itup = indexscan->indexdistinct;
+
 		/*
 		 * If no run-time keys to calculate or they are ready, go ahead and
 		 * pass the scankeys to the index AM.
@@ -130,7 +173,9 @@ IndexNext(IndexScanState *node)
 	/*
 	 * ok, now that we have what we need, fetch the next tuple.
 	 */
-	while (index_getnext_slot(scandesc, direction, slot))
+	while (node->iss_SkipPrefixSize > 0 ?
+		   index_getnext_slot_skip(scandesc, direction, node->iss_Distinct ? indexscan->indexorderdir : direction, slot) :
+		   index_getnext_slot(scandesc, direction, slot))
 	{
 		CHECK_FOR_INTERRUPTS();
 
@@ -530,13 +575,15 @@ ExecIndexScan(PlanState *pstate)
 		ExecReScan((PlanState *) node);
 
 	if (node->iss_NumOrderByKeys > 0)
-		return ExecScan(&node->ss,
+		return ExecScanExtended(&node->ss,
 						(ExecScanAccessMtd) IndexNextWithReorder,
-						(ExecScanRecheckMtd) IndexRecheck);
+						(ExecScanRecheckMtd) IndexRecheck,
+						node->iss_Distinct ? (ExecScanSkipMtd) IndexSkip : NULL);
 	else
-		return ExecScan(&node->ss,
+		return ExecScanExtended(&node->ss,
 						(ExecScanAccessMtd) IndexNext,
-						(ExecScanRecheckMtd) IndexRecheck);
+						(ExecScanRecheckMtd) IndexRecheck,
+						node->iss_Distinct ? (ExecScanSkipMtd) IndexSkip : NULL);
 }
 
 /* ----------------------------------------------------------------
@@ -910,6 +957,8 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
 	indexstate->ss.ps.plan = (Plan *) node;
 	indexstate->ss.ps.state = estate;
 	indexstate->ss.ps.ExecProcNode = ExecIndexScan;
+	indexstate->iss_SkipPrefixSize = node->indexskipprefixsize;
+	indexstate->iss_Distinct = node->indexdistinct;
 
 	/*
 	 * Miscellaneous initialization
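
IndexSkip and IndexOnlySkip above both repeat the "flip direction if this is an overall backward scan" step from IndexNext. As a standalone illustration of that step only, here is a small sketch; Dir and effective_direction are made-up names for this example (the enum values mirror ScanDirection), not something the patch defines.

```c
#include <assert.h>

/* Local stand-in for ScanDirection: backward, no movement, forward. */
typedef enum
{
	DIR_BACKWARD = -1,
	DIR_NONE = 0,
	DIR_FORWARD = 1
} Dir;

/*
 * Combine the executor's current direction with the direction the plan
 * wants the index read in.  If the plan reads the index backward, forward
 * and backward swap; "no movement" is passed through unchanged.
 */
static Dir
effective_direction(Dir exec_dir, Dir index_order_dir)
{
	assert(exec_dir >= DIR_BACKWARD && exec_dir <= DIR_FORWARD);

	if (index_order_dir == DIR_BACKWARD)
	{
		if (exec_dir == DIR_FORWARD)
			return DIR_BACKWARD;
		if (exec_dir == DIR_BACKWARD)
			return DIR_FORWARD;
	}
	return exec_dir;
}
```

Since the same few lines now appear in four functions, a helper along these lines might be worth considering, but that is only a suggestion.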
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 1a70625dc8..21af804b4f 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -493,6 +493,8 @@ _copyIndexScan(const IndexScan *from)
 	COPY_NODE_FIELD(indexorderbyorig);
 	COPY_NODE_FIELD(indexorderbyops);
 	COPY_SCALAR_FIELD(indexorderdir);
+	COPY_SCALAR_FIELD(indexskipprefixsize);
+	COPY_SCALAR_FIELD(indexdistinct);
 
 	return newnode;
 }
@@ -518,6 +520,8 @@ _copyIndexOnlyScan(const IndexOnlyScan *from)
 	COPY_NODE_FIELD(indexorderby);
 	COPY_NODE_FIELD(indextlist);
 	COPY_SCALAR_FIELD(indexorderdir);
+	COPY_SCALAR_FIELD(indexskipprefixsize);
+	COPY_SCALAR_FIELD(indexdistinct);
 
 	return newnode;
 }
@@ -542,6 +546,7 @@ _copyBitmapIndexScan(const BitmapIndexScan *from)
 	COPY_SCALAR_FIELD(isshared);
 	COPY_NODE_FIELD(indexqual);
 	COPY_NODE_FIELD(indexqualorig);
+	COPY_SCALAR_FIELD(indexskipprefixsize);
 
 	return newnode;
 }
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 82fcabd9ee..50297fb9dc 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -562,6 +562,8 @@ _outIndexScan(StringInfo str, const IndexScan *node)
 	WRITE_NODE_FIELD(indexorderbyorig);
 	WRITE_NODE_FIELD(indexorderbyops);
 	WRITE_ENUM_FIELD(indexorderdir, ScanDirection);
+	WRITE_INT_FIELD(indexskipprefixsize);
+	WRITE_BOOL_FIELD(indexdistinct);
 }
 
 static void
@@ -576,6 +578,9 @@ _outIndexOnlyScan(StringInfo str, const IndexOnlyScan *node)
 	WRITE_NODE_FIELD(indexorderby);
 	WRITE_NODE_FIELD(indextlist);
 	WRITE_ENUM_FIELD(indexorderdir, ScanDirection);
+	WRITE_INT_FIELD(indexskipprefixsize);
+	WRITE_BOOL_FIELD(indexdistinct);
+
 }
 
 static void
@@ -589,6 +594,7 @@ _outBitmapIndexScan(StringInfo str, const BitmapIndexScan *node)
 	WRITE_BOOL_FIELD(isshared);
 	WRITE_NODE_FIELD(indexqual);
 	WRITE_NODE_FIELD(indexqualorig);
+	WRITE_INT_FIELD(indexskipprefixsize);
 }
 
 static void
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index d5b23a3479..3129beb9d7 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1820,6 +1820,8 @@ _readIndexScan(void)
 	READ_NODE_FIELD(indexorderbyorig);
 	READ_NODE_FIELD(indexorderbyops);
 	READ_ENUM_FIELD(indexorderdir, ScanDirection);
+	READ_INT_FIELD(indexskipprefixsize);
+	READ_BOOL_FIELD(indexdistinct);
 
 	READ_DONE();
 }
@@ -1839,6 +1841,8 @@ _readIndexOnlyScan(void)
 	READ_NODE_FIELD(indexorderby);
 	READ_NODE_FIELD(indextlist);
 	READ_ENUM_FIELD(indexorderdir, ScanDirection);
+	READ_INT_FIELD(indexskipprefixsize);
+	READ_BOOL_FIELD(indexdistinct);
 
 	READ_DONE();
 }
@@ -1857,6 +1861,7 @@ _readBitmapIndexScan(void)
 	READ_BOOL_FIELD(isshared);
 	READ_NODE_FIELD(indexqual);
 	READ_NODE_FIELD(indexqualorig);
+	READ_INT_FIELD(indexskipprefixsize);
 
 	READ_DONE();
 }
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 8cf694b61d..9126296bd6 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -125,6 +125,7 @@ int			max_parallel_workers_per_gather = 2;
 bool		enable_seqscan = true;
 bool		enable_indexscan = true;
 bool		enable_indexonlyscan = true;
+bool		enable_indexskipscan = true;
 bool		enable_bitmapscan = true;
 bool		enable_tidscan = true;
 bool		enable_sort = true;
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index fc25908dc6..948414bd80 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -175,15 +175,20 @@ static IndexScan *make_indexscan(List *qptlist, List *qpqual, Index scanrelid,
 								 Oid indexid, List *indexqual, List *indexqualorig,
 								 List *indexorderby, List *indexorderbyorig,
 								 List *indexorderbyops,
-								 ScanDirection indexscandir);
+								 ScanDirection indexscandir,
+								 int skipprefix,
+								 bool distinct);
 static IndexOnlyScan *make_indexonlyscan(List *qptlist, List *qpqual,
 										 Index scanrelid, Oid indexid,
 										 List *indexqual, List *indexorderby,
 										 List *indextlist,
-										 ScanDirection indexscandir);
+										 ScanDirection indexscandir,
+										 int skipprefix,
+										 bool distinct);
 static BitmapIndexScan *make_bitmap_indexscan(Index scanrelid, Oid indexid,
 											  List *indexqual,
-											  List *indexqualorig);
+											  List *indexqualorig,
+											  int skipPrefixSize);
 static BitmapHeapScan *make_bitmap_heapscan(List *qptlist,
 											List *qpqual,
 											Plan *lefttree,
@@ -2914,7 +2919,9 @@ create_indexscan_plan(PlannerInfo *root,
 												fixed_indexquals,
 												fixed_indexorderbys,
 												best_path->indexinfo->indextlist,
-												best_path->indexscandir);
+												best_path->indexscandir,
+												best_path->indexskipprefix,
+												best_path->indexdistinct);
 	else
 		scan_plan = (Scan *) make_indexscan(tlist,
 											qpqual,
@@ -2925,7 +2932,9 @@ create_indexscan_plan(PlannerInfo *root,
 											fixed_indexorderbys,
 											indexorderbys,
 											indexorderbyops,
-											best_path->indexscandir);
+											best_path->indexscandir,
+											best_path->indexskipprefix,
+											best_path->indexdistinct);
 
 	copy_generic_path_info(&scan_plan->plan, &best_path->path);
 
@@ -3215,7 +3224,8 @@ create_bitmap_subplan(PlannerInfo *root, Path *bitmapqual,
 		plan = (Plan *) make_bitmap_indexscan(iscan->scan.scanrelid,
 											  iscan->indexid,
 											  iscan->indexqual,
-											  iscan->indexqualorig);
+											  iscan->indexqualorig,
+											  iscan->indexskipprefixsize);
 		/* and set its cost/width fields appropriately */
 		plan->startup_cost = 0.0;
 		plan->total_cost = ipath->indextotalcost;
@@ -5186,7 +5196,9 @@ make_indexscan(List *qptlist,
 			   List *indexorderby,
 			   List *indexorderbyorig,
 			   List *indexorderbyops,
-			   ScanDirection indexscandir)
+			   ScanDirection indexscandir,
+			   int skipPrefixSize,
+			   bool distinct)
 {
 	IndexScan  *node = makeNode(IndexScan);
 	Plan	   *plan = &node->scan.plan;
@@ -5203,6 +5215,8 @@ make_indexscan(List *qptlist,
 	node->indexorderbyorig = indexorderbyorig;
 	node->indexorderbyops = indexorderbyops;
 	node->indexorderdir = indexscandir;
+	node->indexskipprefixsize = skipPrefixSize;
+	node->indexdistinct = distinct;
 
 	return node;
 }
@@ -5215,7 +5229,9 @@ make_indexonlyscan(List *qptlist,
 				   List *indexqual,
 				   List *indexorderby,
 				   List *indextlist,
-				   ScanDirection indexscandir)
+				   ScanDirection indexscandir,
+				   int skipPrefixSize,
+				   bool distinct)
 {
 	IndexOnlyScan *node = makeNode(IndexOnlyScan);
 	Plan	   *plan = &node->scan.plan;
@@ -5230,6 +5246,8 @@ make_indexonlyscan(List *qptlist,
 	node->indexorderby = indexorderby;
 	node->indextlist = indextlist;
 	node->indexorderdir = indexscandir;
+	node->indexskipprefixsize = skipPrefixSize;
+	node->indexdistinct = distinct;
 
 	return node;
 }
@@ -5238,7 +5256,8 @@ static BitmapIndexScan *
 make_bitmap_indexscan(Index scanrelid,
 					  Oid indexid,
 					  List *indexqual,
-					  List *indexqualorig)
+					  List *indexqualorig,
+					  int skipPrefixSize)
 {
 	BitmapIndexScan *node = makeNode(BitmapIndexScan);
 	Plan	   *plan = &node->scan.plan;
@@ -5251,6 +5270,7 @@ make_bitmap_indexscan(Index scanrelid,
 	node->indexid = indexid;
 	node->indexqual = indexqual;
 	node->indexqualorig = indexqualorig;
+	node->indexskipprefixsize = skipPrefixSize;
 
 	return node;
 }
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 6a7b55abd2..62b5e5e071 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -4832,6 +4832,70 @@ create_distinct_paths(PlannerInfo *root,
 												  path,
 												  list_length(root->distinct_pathkeys),
 												  numDistinctRows));
+
+				/* Consider index skip scan as well */
+				if (enable_indexskipscan &&
+					IsA(path, IndexPath) &&
+					((IndexPath *) path)->indexinfo->amcanskip &&
+					root->distinct_pathkeys != NIL)
+				{
+					ListCell   		*lc;
+					IndexOptInfo 	*index = NULL;
+					bool 			different_columns_order = false,
+									multiple_froms = false;
+					int 			i = 0;
+					int 			distinctPrefixKeys;
+
+					Assert(path->pathtype == T_IndexOnlyScan ||
+						   path->pathtype == T_IndexScan);
+
+					index = ((IndexPath *) path)->indexinfo;
+					distinctPrefixKeys = list_length(root->query_uniquekeys);
+
+					/*
+					 * Normally distinctPrefixKeys is just the number of
+					 * distinct keys. But suppose the distinct key is a and
+					 * the index contains (b, a), in exactly this order. In
+					 * that case we need to use the position of a in the
+					 * index as distinctPrefixKeys, otherwise the skip would
+					 * happen only on the first column.
+					 */
+					foreach(lc, root->query_uniquekeys)
+					{
+						UniqueKey *uniquekey = (UniqueKey *) lfirst(lc);
+						EquivalenceMember *em =
+							lfirst_node(EquivalenceMember,
+										list_head(uniquekey->eq_clause->ec_members));
+						Var *var = (Var *) em->em_expr;
+
+						Assert(i < index->ncolumns);
+
+						for (i = 0; i < index->ncolumns; i++)
+						{
+							if (index->indexkeys[i] == var->varattno)
+							{
+								distinctPrefixKeys = Max(i + 1, distinctPrefixKeys);
+								break;
+							}
+						}
+					}
+
+					/* we can only do this if scanning from one relation */
+					if (path->pathtype == T_IndexScan &&
+						parse->jointree != NULL &&
+						list_length(parse->jointree->fromlist) > 1)
+							multiple_froms = true;
+
+					if (!different_columns_order &&	!multiple_froms)
+					{
+						add_path(distinct_rel, (Path *)
+								 create_skipscan_unique_path(root,
+															 distinct_rel,
+															 path,
+															 distinctPrefixKeys,
+															 numDistinctRows));
+					}
+				}
 			}
 		}
 
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 278436f102..87d39570b5 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -2916,6 +2916,46 @@ create_upper_unique_path(PlannerInfo *root,
 	return pathnode;
 }
 
+/*
+ * create_skipscan_unique_path
+ *	  Creates a pathnode the same as an existing IndexPath except based on
+ *	  skipping duplicate values.  This may or may not be cheaper than using
+ *	  create_upper_unique_path.
+ *
+ * The input path must be an IndexPath for an index that supports amskip.
+ */
+IndexPath *
+create_skipscan_unique_path(PlannerInfo *root,
+							RelOptInfo *rel,
+							Path *basepath,
+							int distinctPrefixKeys,
+							double numGroups)
+{
+	IndexPath *pathnode = makeNode(IndexPath);
+
+	Assert(IsA(basepath, IndexPath));
+
+	/* We don't want to modify basepath, so make a copy. */
+	memcpy(pathnode, basepath, sizeof(IndexPath));
+
+	/* The size of the prefix we'll use for skipping. */
+	Assert(pathnode->indexinfo->amcanskip);
+	Assert(distinctPrefixKeys > 0);
+	pathnode->indexskipprefix = distinctPrefixKeys;
+	pathnode->indexdistinct = true;
+
+	/*
+	 * The cost to skip to each distinct value should be roughly the same as
+	 * the cost of finding the first key times the number of distinct values
+	 * we expect to find.
+	 */
+	pathnode->path.startup_cost = basepath->startup_cost;
+	pathnode->path.total_cost = basepath->startup_cost * numGroups;
+	pathnode->path.rows = numGroups;
+
+	return pathnode;
+}
+
 /*
  * create_agg_path
  *	  Creates a pathnode that represents performing aggregation/grouping
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index d82fc5ab8b..364e23cbfb 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -271,6 +271,9 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 			info->amoptionalkey = amroutine->amoptionalkey;
 			info->amsearcharray = amroutine->amsearcharray;
 			info->amsearchnulls = amroutine->amsearchnulls;
+			info->amcanskip = (amroutine->amskip != NULL &&
+					amroutine->amgetskiptuple != NULL &&
+					amroutine->ambeginskipscan != NULL);
 			info->amcanparallel = amroutine->amcanparallel;
 			info->amhasgettuple = (amroutine->amgettuple != NULL);
 			info->amhasgetbitmap = amroutine->amgetbitmap != NULL &&
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 7d1f1069f1..1b401f837a 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -960,6 +960,15 @@ static struct config_bool ConfigureNamesBool[] =
 		true,
 		NULL, NULL, NULL
 	},
+	{
+		{"enable_indexskipscan", PGC_USERSET, QUERY_TUNING_METHOD,
+			gettext_noop("Enables the planner's use of index-skip-scan plans."),
+			NULL
+		},
+		&enable_indexskipscan,
+		true,
+		NULL, NULL, NULL
+	},
 	{
 		{"enable_bitmapscan", PGC_USERSET, QUERY_TUNING_METHOD,
 			gettext_noop("Enables the planner's use of bitmap-scan plans."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index c7e46592fb..7f7929ecd8 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -354,6 +354,7 @@
 #enable_hashjoin = on
 #enable_indexscan = on
 #enable_indexonlyscan = on
+#enable_indexskipscan = on
 #enable_material = on
 #enable_mergejoin = on
 #enable_nestloop = on
diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c
index d02e676aa3..adca0c69d2 100644
--- a/src/backend/utils/sort/tuplesort.c
+++ b/src/backend/utils/sort/tuplesort.c
@@ -919,7 +919,7 @@ tuplesort_begin_cluster(TupleDesc tupDesc,
 
 	state->tupDesc = tupDesc;	/* assume we need not copy tupDesc */
 
-	indexScanKey = _bt_mkscankey(indexRel, NULL);
+	indexScanKey = _bt_mkscankey(indexRel, NULL, NULL);
 
 	if (state->indexInfo->ii_Expressions != NULL)
 	{
@@ -1014,7 +1014,7 @@ tuplesort_begin_index_btree(Relation heapRel,
 	state->indexRel = indexRel;
 	state->enforceUnique = enforceUnique;
 
-	indexScanKey = _bt_mkscankey(indexRel, NULL);
+	indexScanKey = _bt_mkscankey(indexRel, NULL, NULL);
 
 	/* Prepare SortSupport data for each column */
 	state->sortKeys = (SortSupport) palloc0(state->nKeys *
diff --git a/src/include/access/amapi.h b/src/include/access/amapi.h
index 3b3e22f73d..c6b352d61f 100644
--- a/src/include/access/amapi.h
+++ b/src/include/access/amapi.h
@@ -119,6 +119,12 @@ typedef IndexScanDesc (*ambeginscan_function) (Relation indexRelation,
 											   int nkeys,
 											   int norderbys);
 
+/* prepare for index scan with skip */
+typedef IndexScanDesc (*ambeginscan_skip_function) (Relation indexRelation,
+											   int nkeys,
+											   int norderbys,
+											   int prefix);
+
 /* (re)start index scan */
 typedef void (*amrescan_function) (IndexScanDesc scan,
 								   ScanKey keys,
@@ -130,6 +136,16 @@ typedef void (*amrescan_function) (IndexScanDesc scan,
 typedef bool (*amgettuple_function) (IndexScanDesc scan,
 									 ScanDirection direction);
 
+/* next valid tuple */
+typedef bool (*amgettuple_with_skip_function) (IndexScanDesc scan,
+											   ScanDirection prefixDir,
+											   ScanDirection postfixDir);
+
+/* skip past duplicates */
+typedef bool (*amskip_function) (IndexScanDesc scan,
+								 ScanDirection prefixDir,
+								 ScanDirection postfixDir);
+
 /* fetch all valid tuples */
 typedef int64 (*amgetbitmap_function) (IndexScanDesc scan,
 									   TIDBitmap *tbm);
@@ -223,12 +239,15 @@ typedef struct IndexAmRoutine
 	ambuildphasename_function ambuildphasename; /* can be NULL */
 	amvalidate_function amvalidate;
 	ambeginscan_function ambeginscan;
+	ambeginscan_skip_function ambeginskipscan;
 	amrescan_function amrescan;
 	amgettuple_function amgettuple; /* can be NULL */
+	amgettuple_with_skip_function amgetskiptuple; /* can be NULL */
 	amgetbitmap_function amgetbitmap;	/* can be NULL */
 	amendscan_function amendscan;
 	ammarkpos_function ammarkpos;	/* can be NULL */
 	amrestrpos_function amrestrpos; /* can be NULL */
+	amskip_function amskip;				/* can be NULL */
 
 	/* interface functions to support parallel index scans */
 	amestimateparallelscan_function amestimateparallelscan; /* can be NULL */
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 7e9364a50c..7cea6c1756 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -149,9 +149,17 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
 									 Relation indexRelation,
 									 Snapshot snapshot,
 									 int nkeys, int norderbys);
+extern IndexScanDesc index_beginscan_skip(Relation heapRelation,
+									 Relation indexRelation,
+									 Snapshot snapshot,
+									 int nkeys, int norderbys, int prefix);
 extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
 											Snapshot snapshot,
 											int nkeys);
+extern IndexScanDesc index_beginscan_bitmap_skip(Relation indexRelation,
+											Snapshot snapshot,
+											int nkeys,
+											int prefix);
 extern void index_rescan(IndexScanDesc scan,
 						 ScanKey keys, int nkeys,
 						 ScanKey orderbys, int norderbys);
@@ -167,10 +175,16 @@ extern IndexScanDesc index_beginscan_parallel(Relation heaprel,
 											  ParallelIndexScanDesc pscan);
 extern ItemPointer index_getnext_tid(IndexScanDesc scan,
 									 ScanDirection direction);
+extern ItemPointer index_getnext_tid_skip(IndexScanDesc scan,
+									 ScanDirection prefixDir,
+									 ScanDirection postfixDir);
 struct TupleTableSlot;
 extern bool index_fetch_heap(IndexScanDesc scan, struct TupleTableSlot *slot);
 extern bool index_getnext_slot(IndexScanDesc scan, ScanDirection direction,
 							   struct TupleTableSlot *slot);
+extern bool index_getnext_slot_skip(IndexScanDesc scan, ScanDirection prefixDir,
+									ScanDirection postfixDir,
+									struct TupleTableSlot *slot);
 extern int64 index_getbitmap(IndexScanDesc scan, TIDBitmap *bitmap);
 
 extern IndexBulkDeleteResult *index_bulk_delete(IndexVacuumInfo *info,
@@ -180,6 +194,8 @@ extern IndexBulkDeleteResult *index_bulk_delete(IndexVacuumInfo *info,
 extern IndexBulkDeleteResult *index_vacuum_cleanup(IndexVacuumInfo *info,
 												   IndexBulkDeleteResult *stats);
 extern bool index_can_return(Relation indexRelation, int attno);
+extern bool index_skip(IndexScanDesc scan, ScanDirection prefixDir,
+					   ScanDirection postfixDir);
 extern RegProcedure index_getprocid(Relation irel, AttrNumber attnum,
 									uint16 procnum);
 extern FmgrInfo *index_getprocinfo(Relation irel, AttrNumber attnum,
diff --git a/src/include/access/nbtree.h b/src/include/access/nbtree.h
index 18206a0c65..015bb63df5 100644
--- a/src/include/access/nbtree.h
+++ b/src/include/access/nbtree.h
@@ -910,6 +910,53 @@ typedef struct BTArrayKeyInfo
 	Datum	   *elem_values;	/* array of num_elems Datums */
 } BTArrayKeyInfo;
 
+typedef struct BTSkipCompareResult
+{
+	bool		equal;
+	int			prefixCmpResult, skCmpResult;
+	bool		prefixSkip, fullKeySkip;
+	int			prefixSkipIndex;
+} BTSkipCompareResult;
+
+typedef enum BTSkipState
+{
+	SkipStateStop,
+	SkipStateSkip,
+	SkipStateSkipExtra,
+	SkipStateNext
+} BTSkipState;
+
+typedef struct BTSkipPosData
+{
+	BTSkipState nextAction;
+	ScanDirection nextDirection;
+	int nextSkipIndex;
+	BTScanInsertData skipScanKey;
+} BTSkipPosData;
+
+typedef struct BTSkipData
+{
+	/* Used to control skipping.
+	 * skipScanKey is a combination of currentTupleKey and fwdScanKey/bwdScanKey.
+	 * currentTupleKey contains the scan keys for the current tuple.
+	 * fwdScanKey contains the scan keys for quals that would be chosen for a forward scan.
+	 * bwdScanKey contains the scan keys for quals that would be chosen for a backward scan.
+	 * We need both, because the chosen scan keys differ by direction: for the
+	 * quals a>2 and a<5, fwd would use a>2, while bwd would use a<5.
+	 */
+	BTScanInsertData	currentTupleKey;
+	BTScanInsertData	fwdScanKey;
+	ScanKeyData			fwdNotNullKeys[INDEX_MAX_KEYS];
+	BTScanInsertData	bwdScanKey;
+	ScanKeyData			bwdNotNullKeys[INDEX_MAX_KEYS];
+	/* length of prefix to skip */
+	int					prefix;
+
+	BTSkipPosData curPos, markPos;
+} BTSkipData;
+
+typedef BTSkipData *BTSkip;
+
 typedef struct BTScanOpaqueData
 {
 	/* these fields are set by _bt_preprocess_keys(): */
@@ -947,6 +994,9 @@ typedef struct BTScanOpaqueData
 	 */
 	int			markItemIndex;	/* itemIndex, or -1 if not valid */
 
+	/* Work space for _bt_skip */
+	BTSkip	skipData;	/* used to control skipping */
+
 	/* keep these last in struct for efficiency */
 	BTScanPosData currPos;		/* current position data */
 	BTScanPosData markPos;		/* marked position, if any */
@@ -961,6 +1011,8 @@ typedef BTScanOpaqueData *BTScanOpaque;
  */
 #define SK_BT_REQFWD	0x00010000	/* required to continue forward scan */
 #define SK_BT_REQBKWD	0x00020000	/* required to continue backward scan */
+#define SK_BT_REQSKIPFWD	0x00040000	/* required to continue forward scan within current prefix */
+#define SK_BT_REQSKIPBKWD	0x00080000	/* required to continue backward scan within current prefix */
 #define SK_BT_INDOPTION_SHIFT  24	/* must clear the above bits */
 #define SK_BT_DESC			(INDOPTION_DESC << SK_BT_INDOPTION_SHIFT)
 #define SK_BT_NULLS_FIRST	(INDOPTION_NULLS_FIRST << SK_BT_INDOPTION_SHIFT)
@@ -1007,9 +1059,12 @@ extern bool btinsert(Relation rel, Datum *values, bool *isnull,
 					 IndexUniqueCheck checkUnique,
 					 struct IndexInfo *indexInfo);
 extern IndexScanDesc btbeginscan(Relation rel, int nkeys, int norderbys);
+extern IndexScanDesc btbeginscan_skip(Relation rel, int nkeys, int norderbys, int skipPrefix);
 extern Size btestimateparallelscan(void);
 extern void btinitparallelscan(void *target);
 extern bool btgettuple(IndexScanDesc scan, ScanDirection dir);
+extern bool btgettuple_skip(IndexScanDesc scan, ScanDirection prefixDir, ScanDirection postfixDir);
+extern bool btskip(IndexScanDesc scan, ScanDirection prefixDir, ScanDirection postfixDir);
 extern int64 btgetbitmap(IndexScanDesc scan, TIDBitmap *tbm);
 extern void btrescan(IndexScanDesc scan, ScanKey scankey, int nscankeys,
 					 ScanKey orderbys, int norderbys);
@@ -1101,15 +1156,79 @@ extern Buffer _bt_moveright(Relation rel, BTScanInsert key, Buffer buf,
 							bool forupdate, BTStack stack, int access, Snapshot snapshot);
 extern OffsetNumber _bt_binsrch_insert(Relation rel, BTInsertState insertstate);
 extern int32 _bt_compare(Relation rel, BTScanInsert key, Page page, OffsetNumber offnum);
-extern bool _bt_first(IndexScanDesc scan, ScanDirection dir);
-extern bool _bt_next(IndexScanDesc scan, ScanDirection dir);
+extern bool _bt_first(IndexScanDesc scan, ScanDirection prefixDir, ScanDirection postfixDir);
+extern bool _bt_next(IndexScanDesc scan, ScanDirection prefixDir, ScanDirection postfixDir);
 extern Buffer _bt_get_endpoint(Relation rel, uint32 level, bool rightmost,
 							   Snapshot snapshot);
+extern Buffer _bt_walk_left(Relation rel, Buffer buf, Snapshot snapshot);
+extern OffsetNumber _bt_binsrch(Relation rel, BTScanInsert key, Buffer buf);
+extern void _bt_initialize_more_data(BTScanOpaque so, ScanDirection dir);
+extern bool _bt_readpage(IndexScanDesc scan, ScanDirection dir,
+						 OffsetNumber *offnum, bool isRegularMode);
+extern bool _bt_steppage(IndexScanDesc scan, ScanDirection dir);
+extern bool _bt_readnextpage(IndexScanDesc scan, BlockNumber blkno, ScanDirection dir);
+extern void _bt_drop_lock_and_maybe_pin(IndexScanDesc scan, BTScanPos sp);
+
+/*
+ * prototypes for functions in nbtskip.c
+ */
+static inline bool
+_bt_skip_enabled(BTScanOpaque so)
+{
+	return so->skipData != NULL;
+}
+
+static inline bool
+_bt_skip_is_regular_mode(ScanDirection prefixDir, ScanDirection postfixDir)
+{
+	return prefixDir == postfixDir;
+}
+
+/* returns whether or not we can use extra quals in the scankey after skipping to a prefix */
+static inline bool
+_bt_has_extra_quals_after_skip(BTSkip skip, ScanDirection dir, int prefix)
+{
+	if (ScanDirectionIsForward(dir))
+	{
+		return skip->fwdScanKey.keysz > prefix;
+	}
+	else
+	{
+		return skip->bwdScanKey.keysz > prefix;
+	}
+}
+
+/* alias of BTScanPosIsValid */
+static inline bool
+_bt_skip_is_always_valid(BTScanOpaque so)
+{
+	return BTScanPosIsValid(so->currPos);
+}
+
+extern bool _bt_skip(IndexScanDesc scan, ScanDirection prefixDir, ScanDirection postfixDir);
+extern void _bt_skip_create_scankeys(Relation rel, BTScanOpaque so);
+extern void _bt_skip_update_scankey_for_extra_skip(IndexScanDesc scan, Relation indexRel,
+					ScanDirection curDir, ScanDirection prefixDir, bool prioritizeEqual, IndexTuple itup);
+extern void _bt_skip_once(IndexScanDesc scan, IndexTuple *curTuple, OffsetNumber *curTupleOffnum,
+						  bool forceSkip, ScanDirection prefixDir, ScanDirection postfixDir);
+extern void _bt_skip_extra_conditions(IndexScanDesc scan, IndexTuple *curTuple, OffsetNumber *curTupleOffnum,
+									  ScanDirection prefixDir, ScanDirection postfixDir, BTSkipCompareResult *cmp);
+extern bool _bt_skip_find_next(IndexScanDesc scan, IndexTuple curTuple, OffsetNumber curTupleOffnum,
+							   ScanDirection prefixDir, ScanDirection postfixDir);
+extern void _bt_skip_until_match(IndexScanDesc scan, IndexTuple *curTuple, OffsetNumber *curTupleOffnum,
+								 ScanDirection prefixDir, ScanDirection postfixDir);
+extern bool _bt_has_results(BTScanOpaque so);
+extern void _bt_compare_current_item(IndexScanDesc scan, IndexTuple tuple, int tupnatts,
+									 ScanDirection dir, bool isRegularMode, BTSkipCompareResult* cmp);
+extern bool _bt_step_back_page(IndexScanDesc scan, IndexTuple *curTuple, OffsetNumber *curTupleOffnum);
+extern bool _bt_step_forward_page(IndexScanDesc scan, BlockNumber next, IndexTuple *curTuple,
+								  OffsetNumber *curTupleOffnum);
+extern bool _bt_checkkeys_skip(IndexScanDesc scan, IndexTuple tuple, int tupnatts,
+							   ScanDirection dir, bool *continuescan, int *prefixskipindex);
 
 /*
  * prototypes for functions in nbtutils.c
  */
-extern BTScanInsert _bt_mkscankey(Relation rel, IndexTuple itup);
 extern void _bt_freestack(BTStack stack);
 extern void _bt_preprocess_array_keys(IndexScanDesc scan);
 extern void _bt_start_array_keys(IndexScanDesc scan, ScanDirection dir);
@@ -1118,7 +1237,7 @@ extern void _bt_mark_array_keys(IndexScanDesc scan);
 extern void _bt_restore_array_keys(IndexScanDesc scan);
 extern void _bt_preprocess_keys(IndexScanDesc scan);
 extern bool _bt_checkkeys(IndexScanDesc scan, IndexTuple tuple,
-						  int tupnatts, ScanDirection dir, bool *continuescan);
+						  int tupnatts, ScanDirection dir, bool *continuescan, int *indexSkipPrefix);
 extern void _bt_killitems(IndexScanDesc scan);
 extern BTCycleId _bt_vacuum_cycleid(Relation rel);
 extern BTCycleId _bt_start_vacuum(Relation rel);
@@ -1140,6 +1259,19 @@ extern bool _bt_check_natts(Relation rel, bool heapkeyspace, Page page,
 extern void _bt_check_third_page(Relation rel, Relation heap,
 								 bool needheaptidspace, Page page, IndexTuple newtup);
 extern bool _bt_allequalimage(Relation rel, bool debugmessage);
+extern bool _bt_checkkeys_threeway(IndexScanDesc scan, IndexTuple tuple, int tupnatts,
+				ScanDirection dir, bool *continuescan, int *prefixSkipIndex);
+extern bool _bt_create_insertion_scan_key(Relation rel, ScanDirection dir,
+				ScanKey *startKeys, int keysCount,
+				BTScanInsert inskey, StrategyNumber *stratTotal,
+				bool *goback);
+extern void _bt_set_bsearch_flags(StrategyNumber stratTotal, ScanDirection dir,
+				bool *nextkey, bool *goback);
+extern int _bt_choose_scan_keys(ScanKey scanKeys, int numberOfKeys, ScanDirection dir,
+				ScanKey *startKeys, ScanKeyData *notnullkeys,
+				StrategyNumber *stratTotal, int prefix);
+extern BTScanInsert _bt_mkscankey(Relation rel, IndexTuple itup, BTScanInsert key);
+extern void print_itup(BlockNumber blk, IndexTuple left, IndexTuple right, Relation rel, char *extra);
 
 /*
  * prototypes for functions in nbtvalidate.c
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 94890512dc..897b445884 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -429,9 +429,13 @@ extern Datum ExecMakeFunctionResultSet(SetExprState *fcache,
  */
 typedef TupleTableSlot *(*ExecScanAccessMtd) (ScanState *node);
 typedef bool (*ExecScanRecheckMtd) (ScanState *node, TupleTableSlot *slot);
+typedef bool (*ExecScanSkipMtd) (ScanState *node);
 
 extern TupleTableSlot *ExecScan(ScanState *node, ExecScanAccessMtd accessMtd,
 								ExecScanRecheckMtd recheckMtd);
+extern TupleTableSlot *ExecScanExtended(ScanState *node, ExecScanAccessMtd accessMtd,
+								ExecScanRecheckMtd recheckMtd,
+								ExecScanSkipMtd skipMtd);
 extern void ExecAssignScanProjectionInfo(ScanState *node);
 extern void ExecAssignScanProjectionInfoWithVarno(ScanState *node, Index varno);
 extern void ExecScanReScan(ScanState *node);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 3d27d50f09..03e5060765 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1331,6 +1331,7 @@ typedef struct ScanState
 	Relation	ss_currentRelation;
 	struct TableScanDescData *ss_currentScanDesc;
 	TupleTableSlot *ss_ScanTupleSlot;
+	bool ss_FirstTupleEmitted;
 } ScanState;
 
 /* ----------------
@@ -1427,6 +1428,8 @@ typedef struct IndexScanState
 	ExprContext *iss_RuntimeContext;
 	Relation	iss_RelationDesc;
 	struct IndexScanDescData *iss_ScanDesc;
+	int			iss_SkipPrefixSize;
+	bool		iss_Distinct;
 
 	/* These are needed for re-checking ORDER BY expr ordering */
 	pairingheap *iss_ReorderQueue;
@@ -1456,6 +1459,8 @@ typedef struct IndexScanState
  *		TableSlot		   slot for holding tuples fetched from the table
  *		VMBuffer		   buffer in use for visibility map testing, if any
  *		PscanLen		   size of parallel index-only scan descriptor
+ *		SkipPrefixSize	   number of keys for skip-based DISTINCT
+ *		FirstTupleEmitted  has the first tuple been emitted
  * ----------------
  */
 typedef struct IndexOnlyScanState
@@ -1474,6 +1479,8 @@ typedef struct IndexOnlyScanState
 	struct IndexScanDescData *ioss_ScanDesc;
 	TupleTableSlot *ioss_TableSlot;
 	Buffer		ioss_VMBuffer;
+	int			ioss_SkipPrefixSize;
+	bool		ioss_Distinct;
 	Size		ioss_PscanLen;
 } IndexOnlyScanState;
 
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index d4816c180d..86dcd057ed 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -839,6 +839,7 @@ struct IndexOptInfo
 	bool		amsearchnulls;	/* can AM search for NULL/NOT NULL entries? */
 	bool		amhasgettuple;	/* does AM have amgettuple interface? */
 	bool		amhasgetbitmap; /* does AM have amgetbitmap interface? */
+	bool		amcanskip;		/* can AM skip duplicate values? */
 	bool		amcanparallel;	/* does AM support parallel scan? */
 	/* Rather than include amapi.h here, we declare amcostestimate like this */
 	void		(*amcostestimate) ();	/* AM's cost estimator */
@@ -1189,6 +1190,9 @@ typedef struct Path
  * we need not recompute them when considering using the same index in a
  * bitmap index/heap scan (see BitmapHeapPath).  The costs of the IndexPath
  * itself represent the costs of an IndexScan or IndexOnlyScan plan type.
+ *
+ * 'indexskipprefix' represents the number of columns to consider for skip
+ * scans.
  *----------
  */
 typedef struct IndexPath
@@ -1201,6 +1205,8 @@ typedef struct IndexPath
 	ScanDirection indexscandir;
 	Cost		indextotalcost;
 	Selectivity indexselectivity;
+	int			indexskipprefix;
+	bool		indexdistinct;
 } IndexPath;
 
 /*
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 4869fe7b6d..49f4de3843 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -409,6 +409,8 @@ typedef struct IndexScan
 	List	   *indexorderbyorig;	/* the same in original form */
 	List	   *indexorderbyops;	/* OIDs of sort ops for ORDER BY exprs */
 	ScanDirection indexorderdir;	/* forward or backward or don't care */
+	int			indexskipprefixsize;	/* the size of the prefix for skip scans */
+	bool		indexdistinct; /* whether only distinct keys are requested */
 } IndexScan;
 
 /* ----------------
@@ -436,6 +438,8 @@ typedef struct IndexOnlyScan
 	List	   *indexorderby;	/* list of index ORDER BY exprs */
 	List	   *indextlist;		/* TargetEntry list describing index's cols */
 	ScanDirection indexorderdir;	/* forward or backward or don't care */
+	int			indexskipprefixsize;	/* the size of the prefix for skip scans */
+	bool		indexdistinct; /* whether only distinct keys are requested */
 } IndexOnlyScan;
 
 /* ----------------
@@ -462,6 +466,7 @@ typedef struct BitmapIndexScan
 	bool		isshared;		/* Create shared bitmap if set */
 	List	   *indexqual;		/* list of index quals (OpExprs) */
 	List	   *indexqualorig;	/* the same in original form */
+	int			indexskipprefixsize;	/* the size of the prefix for skip scans */
 } BitmapIndexScan;
 
 /* ----------------
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 735ba09650..923eecf5f0 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -50,6 +50,7 @@ extern PGDLLIMPORT int max_parallel_workers_per_gather;
 extern PGDLLIMPORT bool enable_seqscan;
 extern PGDLLIMPORT bool enable_indexscan;
 extern PGDLLIMPORT bool enable_indexonlyscan;
+extern PGDLLIMPORT bool enable_indexskipscan;
 extern PGDLLIMPORT bool enable_bitmapscan;
 extern PGDLLIMPORT bool enable_tidscan;
 extern PGDLLIMPORT bool enable_sort;
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index f75ff6f323..6c8c9dadbb 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -201,6 +201,11 @@ extern UpperUniquePath *create_upper_unique_path(PlannerInfo *root,
 												 Path *subpath,
 												 int numCols,
 												 double numGroups);
+extern IndexPath *create_skipscan_unique_path(PlannerInfo *root,
+											  RelOptInfo *rel,
+											  Path *basepath,
+											  int distinctPrefixKeys,
+											  double numGroups);
 extern AggPath *create_agg_path(PlannerInfo *root,
 								RelOptInfo *rel,
 								Path *subpath,
diff --git a/src/interfaces/libpq/encnames.c b/src/interfaces/libpq/encnames.c
new file mode 120000
index 0000000000..ca78618b55
--- /dev/null
+++ b/src/interfaces/libpq/encnames.c
@@ -0,0 +1 @@
+../../../src/backend/utils/mb/encnames.c
\ No newline at end of file
diff --git a/src/interfaces/libpq/wchar.c b/src/interfaces/libpq/wchar.c
new file mode 120000
index 0000000000..a27508f72a
--- /dev/null
+++ b/src/interfaces/libpq/wchar.c
@@ -0,0 +1 @@
+../../../src/backend/utils/mb/wchar.c
\ No newline at end of file
diff --git a/src/test/regress/expected/select_distinct.out b/src/test/regress/expected/select_distinct.out
index 11c6f50fbf..e21afa7990 100644
--- a/src/test/regress/expected/select_distinct.out
+++ b/src/test/regress/expected/select_distinct.out
@@ -306,3 +306,604 @@ SELECT null IS NOT DISTINCT FROM null as "yes";
  t
 (1 row)
 
+-- index only skip scan
+CREATE TABLE distinct_a (a int, b int, c int);
+INSERT INTO distinct_a (
+    SELECT five, tenthous, 10 FROM
+    generate_series(1, 5) five,
+    generate_series(1, 10000) tenthous
+);
+CREATE INDEX ON distinct_a (a, b);
+ANALYZE distinct_a;
+SELECT DISTINCT a FROM distinct_a;
+ a 
+---
+ 1
+ 2
+ 3
+ 4
+ 5
+(5 rows)
+
+SELECT DISTINCT a FROM distinct_a WHERE a = 1;
+ a 
+---
+ 1
+(1 row)
+
+SELECT DISTINCT a FROM distinct_a ORDER BY a DESC;
+ a 
+---
+ 5
+ 4
+ 3
+ 2
+ 1
+(5 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT a FROM distinct_a;
+                       QUERY PLAN                       
+--------------------------------------------------------
+ Index Only Scan using distinct_a_a_b_idx on distinct_a
+   Skip scan: Distinct only
+(2 rows)
+
+-- test index skip scan with a condition on a non unique field
+SELECT DISTINCT ON (a) a, b FROM distinct_a WHERE b = 2;
+ a | b 
+---+---
+ 1 | 2
+ 2 | 2
+ 3 | 2
+ 4 | 2
+ 5 | 2
+(5 rows)
+
+-- test index skip scan backwards
+SELECT DISTINCT ON (a) a, b FROM distinct_a ORDER BY a DESC, b DESC;
+ a |   b   
+---+-------
+ 5 | 10000
+ 4 | 10000
+ 3 | 10000
+ 2 | 10000
+ 1 | 10000
+(5 rows)
+
+-- check colums order
+CREATE INDEX distinct_a_b_a on distinct_a (b, a);
+SELECT DISTINCT a FROM distinct_a WHERE b = 2;
+ a 
+---
+ 1
+ 2
+ 3
+ 4
+ 5
+(5 rows)
+
+SELECT DISTINCT on (a, b) a, b FROM distinct_a WHERE b = 2;
+ a | b 
+---+---
+ 1 | 2
+ 2 | 2
+ 3 | 2
+ 4 | 2
+ 5 | 2
+(5 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT a FROM distinct_a WHERE b = 2;
+                     QUERY PLAN                     
+----------------------------------------------------
+ Index Only Scan using distinct_a_b_a on distinct_a
+   Skip scan: Distinct only
+   Index Cond: (b = 2)
+(3 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT on (a, b) a, b FROM distinct_a WHERE b = 2;
+                     QUERY PLAN                     
+----------------------------------------------------
+ Index Only Scan using distinct_a_b_a on distinct_a
+   Skip scan: Distinct only
+   Index Cond: (b = 2)
+(3 rows)
+
+DROP INDEX distinct_a_b_a;
+-- test opposite scan/index directions inside a cursor
+-- forward/backward
+BEGIN;
+DECLARE c SCROLL CURSOR FOR
+SELECT DISTINCT ON (a) a,b FROM distinct_a ORDER BY a, b;
+FETCH FROM c;
+ a | b 
+---+---
+ 1 | 1
+(1 row)
+
+FETCH BACKWARD FROM c;
+ a | b 
+---+---
+(0 rows)
+
+FETCH 6 FROM c;
+ a | b 
+---+---
+ 1 | 1
+ 2 | 1
+ 3 | 1
+ 4 | 1
+ 5 | 1
+(5 rows)
+
+FETCH BACKWARD 6 FROM c;
+ a | b 
+---+---
+ 5 | 1
+ 4 | 1
+ 3 | 1
+ 2 | 1
+ 1 | 1
+(5 rows)
+
+FETCH 6 FROM c;
+ a | b 
+---+---
+ 1 | 1
+ 2 | 1
+ 3 | 1
+ 4 | 1
+ 5 | 1
+(5 rows)
+
+FETCH BACKWARD 6 FROM c;
+ a | b 
+---+---
+ 5 | 1
+ 4 | 1
+ 3 | 1
+ 2 | 1
+ 1 | 1
+(5 rows)
+
+END;
+-- backward/forward
+BEGIN;
+DECLARE c SCROLL CURSOR FOR
+SELECT DISTINCT ON (a) a,b FROM distinct_a ORDER BY a DESC, b DESC;
+FETCH FROM c;
+ a |   b   
+---+-------
+ 5 | 10000
+(1 row)
+
+FETCH BACKWARD FROM c;
+ a | b 
+---+---
+(0 rows)
+
+FETCH 6 FROM c;
+ a |   b   
+---+-------
+ 5 | 10000
+ 4 | 10000
+ 3 | 10000
+ 2 | 10000
+ 1 | 10000
+(5 rows)
+
+FETCH BACKWARD 6 FROM c;
+ a |   b   
+---+-------
+ 1 | 10000
+ 2 | 10000
+ 3 | 10000
+ 4 | 10000
+ 5 | 10000
+(5 rows)
+
+FETCH 6 FROM c;
+ a |   b   
+---+-------
+ 5 | 10000
+ 4 | 10000
+ 3 | 10000
+ 2 | 10000
+ 1 | 10000
+(5 rows)
+
+FETCH BACKWARD 6 FROM c;
+ a |   b   
+---+-------
+ 1 | 10000
+ 2 | 10000
+ 3 | 10000
+ 4 | 10000
+ 5 | 10000
+(5 rows)
+
+END;
+-- test missing values and skipping from the end
+CREATE TABLE distinct_abc(a int, b int, c int);
+CREATE INDEX ON distinct_abc(a, b, c);
+INSERT INTO distinct_abc
+	VALUES (1, 1, 1),
+		   (1, 1, 2),
+		   (1, 2, 2),
+		   (1, 2, 3),
+		   (2, 2, 1),
+		   (2, 2, 3),
+		   (3, 1, 1),
+		   (3, 1, 2),
+		   (3, 2, 2),
+		   (3, 2, 3);
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a,b,c FROM distinct_abc WHERE c = 2;
+                          QUERY PLAN                          
+--------------------------------------------------------------
+ Index Only Scan using distinct_abc_a_b_c_idx on distinct_abc
+   Skip scan: Distinct only
+   Index Cond: (c = 2)
+(3 rows)
+
+BEGIN;
+DECLARE c SCROLL CURSOR FOR
+SELECT DISTINCT ON (a) a,b,c FROM distinct_abc WHERE c = 2;
+FETCH ALL FROM c;
+ a | b | c 
+---+---+---
+ 1 | 1 | 2
+ 3 | 1 | 2
+(2 rows)
+
+FETCH BACKWARD ALL FROM c;
+ a | b | c 
+---+---+---
+ 3 | 1 | 2
+ 1 | 1 | 2
+(2 rows)
+
+END;
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a,b,c FROM distinct_abc WHERE c = 2
+ORDER BY a DESC, b DESC;
+                              QUERY PLAN                               
+-----------------------------------------------------------------------
+ Index Only Scan Backward using distinct_abc_a_b_c_idx on distinct_abc
+   Skip scan: Distinct only
+   Index Cond: (c = 2)
+(3 rows)
+
+BEGIN;
+DECLARE c SCROLL CURSOR FOR
+SELECT DISTINCT ON (a) a,b,c FROM distinct_abc WHERE c = 2
+ORDER BY a DESC, b DESC;
+FETCH ALL FROM c;
+ a | b | c 
+---+---+---
+ 3 | 2 | 2
+ 1 | 2 | 2
+(2 rows)
+
+FETCH BACKWARD ALL FROM c;
+ a | b | c 
+---+---+---
+ 1 | 2 | 2
+ 3 | 2 | 2
+(2 rows)
+
+END;
+DROP TABLE distinct_abc;
+-- index skip scan
+SELECT DISTINCT ON (a) a, b, c
+FROM distinct_a ORDER BY a;
+ a | b | c  
+---+---+----
+ 1 | 1 | 10
+ 2 | 1 | 10
+ 3 | 1 | 10
+ 4 | 1 | 10
+ 5 | 1 | 10
+(5 rows)
+
+SELECT DISTINCT ON (a) a, b, c
+FROM distinct_a WHERE a = 1 ORDER BY a;
+ a | b | c  
+---+---+----
+ 1 | 1 | 10
+(1 row)
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a, b, c
+FROM distinct_a ORDER BY a;
+                    QUERY PLAN                     
+---------------------------------------------------
+ Index Scan using distinct_a_a_b_idx on distinct_a
+   Skip scan: Distinct only
+(2 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a, b, c
+FROM distinct_a WHERE a = 1 ORDER BY a;
+                     QUERY PLAN                      
+-----------------------------------------------------
+ Unique
+   ->  Bitmap Heap Scan on distinct_a
+         Recheck Cond: (a = 1)
+         ->  Bitmap Index Scan on distinct_a_a_b_idx
+               Index Cond: (a = 1)
+(5 rows)
+
+-- check colums order
+SELECT DISTINCT a FROM distinct_a WHERE b = 2 AND c = 10;
+ a 
+---
+ 1
+ 2
+ 3
+ 4
+ 5
+(5 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT a FROM distinct_a WHERE b = 2 AND c = 10;
+                    QUERY PLAN                     
+---------------------------------------------------
+ Index Scan using distinct_a_a_b_idx on distinct_a
+   Skip scan: Distinct only
+   Index Cond: (b = 2)
+   Filter: (c = 10)
+(4 rows)
+
+-- check projection case
+SELECT DISTINCT a, a FROM distinct_a WHERE b = 2;
+ a | a 
+---+---
+ 1 | 1
+ 2 | 2
+ 3 | 3
+ 4 | 4
+ 5 | 5
+(5 rows)
+
+SELECT DISTINCT a, 1 FROM distinct_a WHERE b = 2;
+ a | ?column? 
+---+----------
+ 1 |        1
+ 2 |        1
+ 3 |        1
+ 4 |        1
+ 5 |        1
+(5 rows)
+
+-- test cursor forward/backward movements
+BEGIN;
+DECLARE c SCROLL CURSOR FOR SELECT DISTINCT a FROM distinct_a;
+FETCH FROM c;
+ a 
+---
+ 1
+(1 row)
+
+FETCH BACKWARD FROM c;
+ a 
+---
+(0 rows)
+
+FETCH 6 FROM c;
+ a 
+---
+ 1
+ 2
+ 3
+ 4
+ 5
+(5 rows)
+
+FETCH BACKWARD 6 FROM c;
+ a 
+---
+ 5
+ 4
+ 3
+ 2
+ 1
+(5 rows)
+
+FETCH 6 FROM c;
+ a 
+---
+ 1
+ 2
+ 3
+ 4
+ 5
+(5 rows)
+
+FETCH BACKWARD 6 FROM c;
+ a 
+---
+ 5
+ 4
+ 3
+ 2
+ 1
+(5 rows)
+
+END;
+DROP TABLE distinct_a;
+-- test tuples visibility
+CREATE TABLE distinct_visibility (a int, b int);
+INSERT INTO distinct_visibility (select a, b from generate_series(1,5) a, generate_series(1, 10000) b);
+CREATE INDEX ON distinct_visibility (a, b);
+ANALYZE distinct_visibility;
+SELECT DISTINCT ON (a) a, b FROM distinct_visibility ORDER BY a, b;
+ a | b 
+---+---
+ 1 | 1
+ 2 | 1
+ 3 | 1
+ 4 | 1
+ 5 | 1
+(5 rows)
+
+DELETE FROM distinct_visibility WHERE a = 2 and b = 1;
+SELECT DISTINCT ON (a) a, b FROM distinct_visibility ORDER BY a, b;
+ a | b 
+---+---
+ 1 | 1
+ 2 | 2
+ 3 | 1
+ 4 | 1
+ 5 | 1
+(5 rows)
+
+SELECT DISTINCT ON (a) a, b FROM distinct_visibility ORDER BY a DESC, b DESC;
+ a |   b   
+---+-------
+ 5 | 10000
+ 4 | 10000
+ 3 | 10000
+ 2 | 10000
+ 1 | 10000
+(5 rows)
+
+DELETE FROM distinct_visibility WHERE a = 2 and b = 10000;
+SELECT DISTINCT ON (a) a, b FROM distinct_visibility ORDER BY a DESC, b DESC;
+ a |   b   
+---+-------
+ 5 | 10000
+ 4 | 10000
+ 3 | 10000
+ 2 |  9999
+ 1 | 10000
+(5 rows)
+
+DROP TABLE distinct_visibility;
+-- test page boundaries
+CREATE TABLE distinct_boundaries AS
+    SELECT a, b::int2 b, (b % 2)::int2 c FROM
+        generate_series(1, 5) a,
+        generate_series(1,366) b;
+CREATE INDEX ON distinct_boundaries (a, b, c);
+ANALYZE distinct_boundaries;
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a, b, c from distinct_boundaries
+WHERE b >= 1 and c = 0 ORDER BY a, b;
+                                 QUERY PLAN                                 
+----------------------------------------------------------------------------
+ Index Only Scan using distinct_boundaries_a_b_c_idx on distinct_boundaries
+   Skip scan: Distinct only
+   Index Cond: ((b >= 1) AND (c = 0))
+(3 rows)
+
+SELECT DISTINCT ON (a) a, b, c from distinct_boundaries
+WHERE b >= 1 and c = 0 ORDER BY a, b;
+ a | b | c 
+---+---+---
+ 1 | 2 | 0
+ 2 | 2 | 0
+ 3 | 2 | 0
+ 4 | 2 | 0
+ 5 | 2 | 0
+(5 rows)
+
+DROP TABLE distinct_boundaries;
+-- test tuple killing
+-- DESC ordering
+CREATE TABLE distinct_killed AS
+    SELECT a, b, b % 2 AS c, 10 AS d
+        FROM generate_series(1, 5) a,
+             generate_series(1,1000) b;
+CREATE INDEX ON distinct_killed (a, b, c, d);
+DELETE FROM distinct_killed where a = 3;
+BEGIN;
+    DECLARE c SCROLL CURSOR FOR
+    SELECT DISTINCT ON (a) a,b,c,d
+    FROM distinct_killed ORDER BY a DESC, b DESC;
+    FETCH FORWARD ALL FROM c;
+ a |  b   | c | d  
+---+------+---+----
+ 5 | 1000 | 0 | 10
+ 4 | 1000 | 0 | 10
+ 2 | 1000 | 0 | 10
+ 1 | 1000 | 0 | 10
+(4 rows)
+
+    FETCH BACKWARD ALL FROM c;
+ a |  b   | c | d  
+---+------+---+----
+ 1 | 1000 | 0 | 10
+ 2 | 1000 | 0 | 10
+ 4 | 1000 | 0 | 10
+ 5 | 1000 | 0 | 10
+(4 rows)
+
+COMMIT;
+DROP TABLE distinct_killed;
+-- regular ordering
+CREATE TABLE distinct_killed AS
+    SELECT a, b, b % 2 AS c, 10 AS d
+        FROM generate_series(1, 5) a,
+             generate_series(1,1000) b;
+CREATE INDEX ON distinct_killed (a, b, c, d);
+DELETE FROM distinct_killed where a = 3;
+BEGIN;
+    DECLARE c SCROLL CURSOR FOR
+    SELECT DISTINCT ON (a) a,b,c,d
+    FROM distinct_killed ORDER BY a, b;
+    FETCH FORWARD ALL FROM c;
+ a | b | c | d  
+---+---+---+----
+ 1 | 1 | 1 | 10
+ 2 | 1 | 1 | 10
+ 4 | 1 | 1 | 10
+ 5 | 1 | 1 | 10
+(4 rows)
+
+    FETCH BACKWARD ALL FROM c;
+ a | b | c | d  
+---+---+---+----
+ 5 | 1 | 1 | 10
+ 4 | 1 | 1 | 10
+ 2 | 1 | 1 | 10
+ 1 | 1 | 1 | 10
+(4 rows)
+
+COMMIT;
+DROP TABLE distinct_killed;
+-- partial delete
+CREATE TABLE distinct_killed AS
+    SELECT a, b, b % 2 AS c, 10 AS d
+        FROM generate_series(1, 5) a,
+             generate_series(1,1000) b;
+CREATE INDEX ON distinct_killed (a, b, c, d);
+DELETE FROM distinct_killed WHERE a = 3 AND b <= 999;
+BEGIN;
+    DECLARE c SCROLL CURSOR FOR
+    SELECT DISTINCT ON (a) a,b,c,d
+    FROM distinct_killed ORDER BY a DESC, b DESC;
+    FETCH FORWARD ALL FROM c;
+ a |  b   | c | d  
+---+------+---+----
+ 5 | 1000 | 0 | 10
+ 4 | 1000 | 0 | 10
+ 3 | 1000 | 0 | 10
+ 2 | 1000 | 0 | 10
+ 1 | 1000 | 0 | 10
+(5 rows)
+
+    FETCH BACKWARD ALL FROM c;
+ a |  b   | c | d  
+---+------+---+----
+ 1 | 1000 | 0 | 10
+ 2 | 1000 | 0 | 10
+ 3 | 1000 | 0 | 10
+ 4 | 1000 | 0 | 10
+ 5 | 1000 | 0 | 10
+(5 rows)
+
+COMMIT;
+DROP TABLE distinct_killed;
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 715842b87a..7e16655f03 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -80,6 +80,7 @@ select name, setting from pg_settings where name like 'enable%';
  enable_hashjoin                | on
  enable_indexonlyscan           | on
  enable_indexscan               | on
+ enable_indexskipscan           | on
  enable_material                | on
  enable_mergejoin               | on
  enable_nestloop                | on
@@ -91,7 +92,7 @@ select name, setting from pg_settings where name like 'enable%';
  enable_seqscan                 | on
  enable_sort                    | on
  enable_tidscan                 | on
-(19 rows)
+(20 rows)
 
 -- Test that the pg_timezone_names and pg_timezone_abbrevs views are
 -- more-or-less working.  We can't test their contents in any great detail
diff --git a/src/test/regress/sql/select_distinct.sql b/src/test/regress/sql/select_distinct.sql
index 33102744eb..0227c98823 100644
--- a/src/test/regress/sql/select_distinct.sql
+++ b/src/test/regress/sql/select_distinct.sql
@@ -135,3 +135,251 @@ SELECT 1 IS NOT DISTINCT FROM 2 as "no";
 SELECT 2 IS NOT DISTINCT FROM 2 as "yes";
 SELECT 2 IS NOT DISTINCT FROM null as "no";
 SELECT null IS NOT DISTINCT FROM null as "yes";
+
+-- index only skip scan
+CREATE TABLE distinct_a (a int, b int, c int);
+INSERT INTO distinct_a (
+    SELECT five, tenthous, 10 FROM
+    generate_series(1, 5) five,
+    generate_series(1, 10000) tenthous
+);
+CREATE INDEX ON distinct_a (a, b);
+ANALYZE distinct_a;
+
+SELECT DISTINCT a FROM distinct_a;
+SELECT DISTINCT a FROM distinct_a WHERE a = 1;
+SELECT DISTINCT a FROM distinct_a ORDER BY a DESC;
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT a FROM distinct_a;
+
+-- test index skip scan with a condition on a non unique field
+SELECT DISTINCT ON (a) a, b FROM distinct_a WHERE b = 2;
+
+-- test index skip scan backwards
+SELECT DISTINCT ON (a) a, b FROM distinct_a ORDER BY a DESC, b DESC;
+
+-- check colums order
+CREATE INDEX distinct_a_b_a on distinct_a (b, a);
+
+SELECT DISTINCT a FROM distinct_a WHERE b = 2;
+SELECT DISTINCT on (a, b) a, b FROM distinct_a WHERE b = 2;
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT a FROM distinct_a WHERE b = 2;
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT on (a, b) a, b FROM distinct_a WHERE b = 2;
+
+DROP INDEX distinct_a_b_a;
+
+-- test opposite scan/index directions inside a cursor
+-- forward/backward
+BEGIN;
+DECLARE c SCROLL CURSOR FOR
+SELECT DISTINCT ON (a) a,b FROM distinct_a ORDER BY a, b;
+
+FETCH FROM c;
+FETCH BACKWARD FROM c;
+
+FETCH 6 FROM c;
+FETCH BACKWARD 6 FROM c;
+
+FETCH 6 FROM c;
+FETCH BACKWARD 6 FROM c;
+
+END;
+
+-- backward/forward
+BEGIN;
+DECLARE c SCROLL CURSOR FOR
+SELECT DISTINCT ON (a) a,b FROM distinct_a ORDER BY a DESC, b DESC;
+
+FETCH FROM c;
+FETCH BACKWARD FROM c;
+
+FETCH 6 FROM c;
+FETCH BACKWARD 6 FROM c;
+
+FETCH 6 FROM c;
+FETCH BACKWARD 6 FROM c;
+
+END;
+
+-- test missing values and skipping from the end
+CREATE TABLE distinct_abc(a int, b int, c int);
+CREATE INDEX ON distinct_abc(a, b, c);
+INSERT INTO distinct_abc
+	VALUES (1, 1, 1),
+		   (1, 1, 2),
+		   (1, 2, 2),
+		   (1, 2, 3),
+		   (2, 2, 1),
+		   (2, 2, 3),
+		   (3, 1, 1),
+		   (3, 1, 2),
+		   (3, 2, 2),
+		   (3, 2, 3);
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a,b,c FROM distinct_abc WHERE c = 2;
+
+BEGIN;
+DECLARE c SCROLL CURSOR FOR
+SELECT DISTINCT ON (a) a,b,c FROM distinct_abc WHERE c = 2;
+
+FETCH ALL FROM c;
+FETCH BACKWARD ALL FROM c;
+
+END;
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a,b,c FROM distinct_abc WHERE c = 2
+ORDER BY a DESC, b DESC;
+
+BEGIN;
+DECLARE c SCROLL CURSOR FOR
+SELECT DISTINCT ON (a) a,b,c FROM distinct_abc WHERE c = 2
+ORDER BY a DESC, b DESC;
+
+FETCH ALL FROM c;
+FETCH BACKWARD ALL FROM c;
+
+END;
+
+DROP TABLE distinct_abc;
+
+-- index skip scan
+SELECT DISTINCT ON (a) a, b, c
+FROM distinct_a ORDER BY a;
+SELECT DISTINCT ON (a) a, b, c
+FROM distinct_a WHERE a = 1 ORDER BY a;
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a, b, c
+FROM distinct_a ORDER BY a;
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a, b, c
+FROM distinct_a WHERE a = 1 ORDER BY a;
+
+-- check colums order
+SELECT DISTINCT a FROM distinct_a WHERE b = 2 AND c = 10;
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT a FROM distinct_a WHERE b = 2 AND c = 10;
+
+-- check projection case
+SELECT DISTINCT a, a FROM distinct_a WHERE b = 2;
+SELECT DISTINCT a, 1 FROM distinct_a WHERE b = 2;
+
+-- test cursor forward/backward movements
+BEGIN;
+DECLARE c SCROLL CURSOR FOR SELECT DISTINCT a FROM distinct_a;
+
+FETCH FROM c;
+FETCH BACKWARD FROM c;
+
+FETCH 6 FROM c;
+FETCH BACKWARD 6 FROM c;
+
+FETCH 6 FROM c;
+FETCH BACKWARD 6 FROM c;
+
+END;
+
+DROP TABLE distinct_a;
+
+-- test tuples visibility
+CREATE TABLE distinct_visibility (a int, b int);
+INSERT INTO distinct_visibility (select a, b from generate_series(1,5) a, generate_series(1, 10000) b);
+CREATE INDEX ON distinct_visibility (a, b);
+ANALYZE distinct_visibility;
+
+SELECT DISTINCT ON (a) a, b FROM distinct_visibility ORDER BY a, b;
+DELETE FROM distinct_visibility WHERE a = 2 and b = 1;
+SELECT DISTINCT ON (a) a, b FROM distinct_visibility ORDER BY a, b;
+
+SELECT DISTINCT ON (a) a, b FROM distinct_visibility ORDER BY a DESC, b DESC;
+DELETE FROM distinct_visibility WHERE a = 2 and b = 10000;
+SELECT DISTINCT ON (a) a, b FROM distinct_visibility ORDER BY a DESC, b DESC;
+DROP TABLE distinct_visibility;
+
+-- test page boundaries
+CREATE TABLE distinct_boundaries AS
+    SELECT a, b::int2 b, (b % 2)::int2 c FROM
+        generate_series(1, 5) a,
+        generate_series(1,366) b;
+
+CREATE INDEX ON distinct_boundaries (a, b, c);
+ANALYZE distinct_boundaries;
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a, b, c from distinct_boundaries
+WHERE b >= 1 and c = 0 ORDER BY a, b;
+
+SELECT DISTINCT ON (a) a, b, c from distinct_boundaries
+WHERE b >= 1 and c = 0 ORDER BY a, b;
+
+DROP TABLE distinct_boundaries;
+
+-- test tuple killing
+
+-- DESC ordering
+CREATE TABLE distinct_killed AS
+    SELECT a, b, b % 2 AS c, 10 AS d
+        FROM generate_series(1, 5) a,
+             generate_series(1,1000) b;
+
+CREATE INDEX ON distinct_killed (a, b, c, d);
+
+DELETE FROM distinct_killed where a = 3;
+
+BEGIN;
+    DECLARE c SCROLL CURSOR FOR
+    SELECT DISTINCT ON (a) a,b,c,d
+    FROM distinct_killed ORDER BY a DESC, b DESC;
+    FETCH FORWARD ALL FROM c;
+    FETCH BACKWARD ALL FROM c;
+COMMIT;
+
+DROP TABLE distinct_killed;
+
+-- regular ordering
+CREATE TABLE distinct_killed AS
+    SELECT a, b, b % 2 AS c, 10 AS d
+        FROM generate_series(1, 5) a,
+             generate_series(1,1000) b;
+
+CREATE INDEX ON distinct_killed (a, b, c, d);
+
+DELETE FROM distinct_killed where a = 3;
+
+BEGIN;
+    DECLARE c SCROLL CURSOR FOR
+    SELECT DISTINCT ON (a) a,b,c,d
+    FROM distinct_killed ORDER BY a, b;
+    FETCH FORWARD ALL FROM c;
+    FETCH BACKWARD ALL FROM c;
+COMMIT;
+
+DROP TABLE distinct_killed;
+
+-- partial delete
+CREATE TABLE distinct_killed AS
+    SELECT a, b, b % 2 AS c, 10 AS d
+        FROM generate_series(1, 5) a,
+             generate_series(1,1000) b;
+
+CREATE INDEX ON distinct_killed (a, b, c, d);
+
+DELETE FROM distinct_killed WHERE a = 3 AND b <= 999;
+
+BEGIN;
+    DECLARE c SCROLL CURSOR FOR
+    SELECT DISTINCT ON (a) a,b,c,d
+    FROM distinct_killed ORDER BY a DESC, b DESC;
+    FETCH FORWARD ALL FROM c;
+    FETCH BACKWARD ALL FROM c;
+COMMIT;
+
+DROP TABLE distinct_killed;
-- 
2.25.0
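
The regression tests in the patch above all exercise one idea: over an index sorted by (a, b), emit the first entry for each distinct value of a, then jump straight to the next value of a instead of reading every entry in between. A minimal sketch of that idea over a sorted in-memory array follows. It is not the patch's nbtree code; `DemoRow`, `first_row_after` and `skip_scan_distinct` are hypothetical names used only for illustration, and a binary search stands in for re-descending the B-tree.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an index entry on (a, b). */
typedef struct
{
	int			a;
	int			b;
} DemoRow;

/*
 * Return the position of the first row in [lo, n) whose "a" is greater
 * than key.  Rows must be sorted by (a, b).  The binary search plays the
 * role of the "skip": it avoids visiting every row of the current group.
 */
static size_t
first_row_after(const DemoRow *rows, size_t lo, size_t n, int key)
{
	size_t		hi = n;

	while (lo < hi)
	{
		size_t		mid = lo + (hi - lo) / 2;

		if (rows[mid].a <= key)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}

/*
 * Emulate SELECT DISTINCT ON (a) a, b ... ORDER BY a, b over sorted rows.
 * Writes at most max_out rows into out and returns how many were written.
 */
static size_t
skip_scan_distinct(const DemoRow *rows, size_t n, DemoRow *out, size_t max_out)
{
	size_t		pos = 0;
	size_t		nout = 0;

	assert(rows != NULL || n == 0);

	while (pos < n && nout < max_out)
	{
		out[nout++] = rows[pos];	/* first row of this "a" group */
		pos = first_row_after(rows, pos + 1, n, rows[pos].a);
	}
	return nout;
}
```

For rows (1,1), (1,2), (2,2), (2,5), (3,1) this returns (1,1), (2,2), (3,1). The second group starts at b = 2 because no (2,1) row exists, which is the same shape as the distinct_visibility case after its DELETE.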

Attachment: 0003-Make-planner-favor-skip-in-index-scans-and-modify-te.patch (application/octet-stream)

From 0e26ce9cdedbf00c5fc47a70846e85b4d7cbcf30 Mon Sep 17 00:00:00 2001
From: Floris van Nee <floris.vannee@gmail.com>
Date: Thu, 19 Mar 2020 10:27:47 +0100
Subject: [PATCH 3/3] Make planner favor skip in index scans and modify test
 outputs

This commit hacks the planner to strongly favor skip scans over regular index scans.
It is intended for testing only, until a proper planner implementation is in place.
It also updates the expected output of the regression tests
to include the skip scan attribute in EXPLAIN output.
---
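
Note for reviewers: a minimal sketch of how the selfuncs.c hunk in this patch changes the walk over index clauses in btcostestimate. It is a simplified model, not the real function: `DemoClause` and `count_boundary_quals` are hypothetical names, each clause is assumed to carry exactly one qual, and the handling of ScalarArrayOpExpr and NullTest quals is left out. `skip_prefix` stands in for path->indexskipprefix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified index clause: one qual per clause. */
typedef struct
{
	int			indexcol;		/* zero-based index column of the qual */
	bool		is_eq;			/* true if the qual is an equality */
} DemoClause;

/*
 * Count the clauses that the loop treats as boundary quals.  The control
 * flow mirrors the patched loop: with skip_prefix == 0 it stops at the
 * first column that lacks an '=' qual; with a nonzero skip_prefix it
 * keeps going.
 */
static int
count_boundary_quals(const DemoClause *clauses, size_t n, int skip_prefix)
{
	int			indexcol = 0;
	bool		eq_qual_here = false;
	int			nbound = 0;

	assert(clauses != NULL || n == 0);

	for (size_t i = 0; i < n; i++)
	{
		if (indexcol != clauses[i].indexcol)
		{
			/* Beginning of a new column's quals */
			if (!eq_qual_here && skip_prefix == 0)
				break;			/* done if no '=' qual for indexcol */
			eq_qual_here = false;
			indexcol++;
			if (indexcol != clauses[i].indexcol)
			{
				if (skip_prefix == 0)
					break;		/* no quals at all for indexcol */
				else
					continue;	/* this clause is not counted */
			}
		}

		if (clauses[i].is_eq)
			eq_qual_here = true;
		nbound++;
	}
	return nbound;
}
```

With an index on (a, b, c) and the clauses b >= 1 and c = 0, this model counts 0 boundary quals when skip_prefix is 0 and 2 when it is 10. With a = 1 and c = 2 it counts 1 in both cases, because the clause on c hits the `continue` branch and is never counted.
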
 src/backend/optimizer/util/pathnode.c         |   7 +
 src/backend/utils/adt/selfuncs.c              |  13 +-
 src/test/regress/expected/aggregates.out      |  59 +++--
 src/test/regress/expected/btree_index.out     |  24 +-
 src/test/regress/expected/cluster.out         |  18 +-
 src/test/regress/expected/create_index.out    |  43 +++-
 src/test/regress/expected/equivclass.out      |  72 ++++--
 src/test/regress/expected/fast_default.out    |   3 +-
 src/test/regress/expected/foreign_key.out     |   4 +-
 src/test/regress/expected/generated.out       |   9 +-
 src/test/regress/expected/groupingsets.out    |   6 +-
 src/test/regress/expected/index_including.out |   6 +-
 src/test/regress/expected/inet.out            |  12 +-
 src/test/regress/expected/inherit.out         | 149 ++++++++---
 src/test/regress/expected/insert_conflict.out |   3 +-
 src/test/regress/expected/interval.out        |   3 +-
 src/test/regress/expected/join.out            | 235 ++++++++++++------
 src/test/regress/expected/limit.out           |   9 +-
 src/test/regress/expected/misc_functions.out  |   6 +-
 src/test/regress/expected/partition_join.out  |  37 ++-
 src/test/regress/expected/partition_prune.out | 151 +++++++++--
 src/test/regress/expected/plancache.out       |   6 +-
 src/test/regress/expected/portals.out         |   3 +-
 src/test/regress/expected/privileges.out      |  15 +-
 src/test/regress/expected/regex.out           |  21 +-
 src/test/regress/expected/rowsecurity.out     |  30 ++-
 src/test/regress/expected/rowtypes.out        |  18 +-
 src/test/regress/expected/select.out          |  29 ++-
 src/test/regress/expected/select_distinct.out |  15 +-
 src/test/regress/expected/select_parallel.out |   9 +-
 src/test/regress/expected/subselect.out       |  10 +-
 src/test/regress/expected/tuplesort.out       |  12 +-
 src/test/regress/expected/union.out           |  36 ++-
 src/test/regress/expected/updatable_views.out |  56 +++--
 34 files changed, 840 insertions(+), 289 deletions(-)

diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 87d39570b5..865c4af8df 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1029,6 +1029,13 @@ create_index_path(PlannerInfo *root,
 	pathnode->indexorderbycols = indexorderbycols;
 	pathnode->indexscandir = indexscandir;
 
+	/*
+	 * TODO: this is just for testing purposes.  We need a better selection
+	 * mechanism for when to use a skip scan and when to use a regular index scan.
+	 */
+	if (!partial_path && index->amcanskip && enable_indexskipscan)
+		pathnode->indexskipprefix = 10;
+
 	cost_index(pathnode, root, loop_count, partial_path);
 
 	return pathnode;
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 8339f4cd7a..91ae33d74c 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -5929,13 +5929,22 @@ btcostestimate(PlannerInfo *root, IndexPath *path, double loop_count,
 
 		if (indexcol != iclause->indexcol)
 		{
+			/*
+			 * TODO: this estimate is wrong, but use it for now for testing
+			 * purposes.  It forces index skip scan to be used as often as possible.
+			 */
 			/* Beginning of a new column's quals */
-			if (!eqQualHere)
+			if (!eqQualHere && path->indexskipprefix == 0)
 				break;			/* done if no '=' qual for indexcol */
 			eqQualHere = false;
 			indexcol++;
 			if (indexcol != iclause->indexcol)
-				break;			/* no quals at all for indexcol */
+			{
+				if (path->indexskipprefix == 0)
+					break;			/* no quals at all for indexcol */
+				else
+					continue;
+			}
 		}
 
 		/* Examine each indexqual associated with this index clause */
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index 3259a22516..2194f009fd 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -720,8 +720,9 @@ explain (costs off)
    InitPlan 1 (returns $0)
      ->  Limit
            ->  Index Only Scan using tenk1_unique1 on tenk1
+                 Skip scan: All
                  Index Cond: (unique1 IS NOT NULL)
-(5 rows)
+(6 rows)
 
 select min(unique1) from tenk1;
  min 
@@ -737,8 +738,9 @@ explain (costs off)
    InitPlan 1 (returns $0)
      ->  Limit
            ->  Index Only Scan Backward using tenk1_unique1 on tenk1
+                 Skip scan: All
                  Index Cond: (unique1 IS NOT NULL)
-(5 rows)
+(6 rows)
 
 select max(unique1) from tenk1;
  max  
@@ -754,8 +756,9 @@ explain (costs off)
    InitPlan 1 (returns $0)
      ->  Limit
            ->  Index Only Scan Backward using tenk1_unique1 on tenk1
+                 Skip scan: All
                  Index Cond: ((unique1 IS NOT NULL) AND (unique1 < 42))
-(5 rows)
+(6 rows)
 
 select max(unique1) from tenk1 where unique1 < 42;
  max 
@@ -771,8 +774,9 @@ explain (costs off)
    InitPlan 1 (returns $0)
      ->  Limit
            ->  Index Only Scan Backward using tenk1_unique1 on tenk1
+                 Skip scan: All
                  Index Cond: ((unique1 IS NOT NULL) AND (unique1 > 42))
-(5 rows)
+(6 rows)
 
 select max(unique1) from tenk1 where unique1 > 42;
  max  
@@ -794,8 +798,9 @@ explain (costs off)
    InitPlan 1 (returns $0)
      ->  Limit
            ->  Index Only Scan Backward using tenk1_unique1 on tenk1
+                 Skip scan: All
                  Index Cond: ((unique1 IS NOT NULL) AND (unique1 > 42000))
-(5 rows)
+(6 rows)
 
 select max(unique1) from tenk1 where unique1 > 42000;
  max 
@@ -813,8 +818,9 @@ explain (costs off)
    InitPlan 1 (returns $0)
      ->  Limit
            ->  Index Only Scan Backward using tenk1_thous_tenthous on tenk1
+                 Skip scan: All
                  Index Cond: ((thousand = 33) AND (tenthous IS NOT NULL))
-(5 rows)
+(6 rows)
 
 select max(tenthous) from tenk1 where thousand = 33;
  max  
@@ -830,8 +836,9 @@ explain (costs off)
    InitPlan 1 (returns $0)
      ->  Limit
            ->  Index Only Scan using tenk1_thous_tenthous on tenk1
+                 Skip scan: All
                  Index Cond: ((thousand = 33) AND (tenthous IS NOT NULL))
-(5 rows)
+(6 rows)
 
 select min(tenthous) from tenk1 where thousand = 33;
  min 
@@ -851,8 +858,9 @@ explain (costs off)
            InitPlan 1 (returns $1)
              ->  Limit
                    ->  Index Only Scan using tenk1_unique1 on tenk1
+                         Skip scan: All
                          Index Cond: ((unique1 IS NOT NULL) AND (unique1 > int4_tbl.f1))
-(7 rows)
+(8 rows)
 
 select f1, (select min(unique1) from tenk1 where unique1 > f1) AS gt
   from int4_tbl;
@@ -875,9 +883,10 @@ explain (costs off)
    InitPlan 1 (returns $0)
      ->  Limit
            ->  Index Only Scan Backward using tenk1_unique2 on tenk1
+                 Skip scan: All
                  Index Cond: (unique2 IS NOT NULL)
    ->  Result
-(7 rows)
+(8 rows)
 
 select distinct max(unique2) from tenk1;
  max  
@@ -894,9 +903,10 @@ explain (costs off)
    InitPlan 1 (returns $0)
      ->  Limit
            ->  Index Only Scan Backward using tenk1_unique2 on tenk1
+                 Skip scan: All
                  Index Cond: (unique2 IS NOT NULL)
    ->  Result
-(7 rows)
+(8 rows)
 
 select max(unique2) from tenk1 order by 1;
  max  
@@ -913,9 +923,10 @@ explain (costs off)
    InitPlan 1 (returns $0)
      ->  Limit
            ->  Index Only Scan Backward using tenk1_unique2 on tenk1
+                 Skip scan: All
                  Index Cond: (unique2 IS NOT NULL)
    ->  Result
-(7 rows)
+(8 rows)
 
 select max(unique2) from tenk1 order by max(unique2);
  max  
@@ -932,9 +943,10 @@ explain (costs off)
    InitPlan 1 (returns $0)
      ->  Limit
            ->  Index Only Scan Backward using tenk1_unique2 on tenk1
+                 Skip scan: All
                  Index Cond: (unique2 IS NOT NULL)
    ->  Result
-(7 rows)
+(8 rows)
 
 select max(unique2) from tenk1 order by max(unique2)+1;
  max  
@@ -951,10 +963,11 @@ explain (costs off)
    InitPlan 1 (returns $0)
      ->  Limit
            ->  Index Only Scan Backward using tenk1_unique2 on tenk1
+                 Skip scan: All
                  Index Cond: (unique2 IS NOT NULL)
    ->  ProjectSet
          ->  Result
-(8 rows)
+(9 rows)
 
 select max(unique2), generate_series(1,3) as g from tenk1 order by g desc;
  max  | g 
@@ -1006,24 +1019,32 @@ explain (costs off)
            ->  Merge Append
                  Sort Key: minmaxtest.f1
                  ->  Index Only Scan using minmaxtesti on minmaxtest minmaxtest_1
+                       Skip scan: All
                        Index Cond: (f1 IS NOT NULL)
                  ->  Index Only Scan using minmaxtest1i on minmaxtest1 minmaxtest_2
+                       Skip scan: All
                        Index Cond: (f1 IS NOT NULL)
                  ->  Index Only Scan Backward using minmaxtest2i on minmaxtest2 minmaxtest_3
+                       Skip scan: All
                        Index Cond: (f1 IS NOT NULL)
                  ->  Index Only Scan using minmaxtest3i on minmaxtest3 minmaxtest_4
+                       Skip scan: All
    InitPlan 2 (returns $1)
      ->  Limit
            ->  Merge Append
                  Sort Key: minmaxtest_5.f1 DESC
                  ->  Index Only Scan Backward using minmaxtesti on minmaxtest minmaxtest_6
+                       Skip scan: All
                        Index Cond: (f1 IS NOT NULL)
                  ->  Index Only Scan Backward using minmaxtest1i on minmaxtest1 minmaxtest_7
+                       Skip scan: All
                        Index Cond: (f1 IS NOT NULL)
                  ->  Index Only Scan using minmaxtest2i on minmaxtest2 minmaxtest_8
+                       Skip scan: All
                        Index Cond: (f1 IS NOT NULL)
                  ->  Index Only Scan Backward using minmaxtest3i on minmaxtest3 minmaxtest_9
-(23 rows)
+                       Skip scan: All
+(31 rows)
 
 select min(f1), max(f1) from minmaxtest;
  min | max 
@@ -1042,27 +1063,35 @@ explain (costs off)
            ->  Merge Append
                  Sort Key: minmaxtest.f1
                  ->  Index Only Scan using minmaxtesti on minmaxtest minmaxtest_1
+                       Skip scan: All
                        Index Cond: (f1 IS NOT NULL)
                  ->  Index Only Scan using minmaxtest1i on minmaxtest1 minmaxtest_2
+                       Skip scan: All
                        Index Cond: (f1 IS NOT NULL)
                  ->  Index Only Scan Backward using minmaxtest2i on minmaxtest2 minmaxtest_3
+                       Skip scan: All
                        Index Cond: (f1 IS NOT NULL)
                  ->  Index Only Scan using minmaxtest3i on minmaxtest3 minmaxtest_4
+                       Skip scan: All
    InitPlan 2 (returns $1)
      ->  Limit
            ->  Merge Append
                  Sort Key: minmaxtest_5.f1 DESC
                  ->  Index Only Scan Backward using minmaxtesti on minmaxtest minmaxtest_6
+                       Skip scan: All
                        Index Cond: (f1 IS NOT NULL)
                  ->  Index Only Scan Backward using minmaxtest1i on minmaxtest1 minmaxtest_7
+                       Skip scan: All
                        Index Cond: (f1 IS NOT NULL)
                  ->  Index Only Scan using minmaxtest2i on minmaxtest2 minmaxtest_8
+                       Skip scan: All
                        Index Cond: (f1 IS NOT NULL)
                  ->  Index Only Scan Backward using minmaxtest3i on minmaxtest3 minmaxtest_9
+                       Skip scan: All
    ->  Sort
          Sort Key: ($0), ($1)
          ->  Result
-(26 rows)
+(34 rows)
 
 select distinct min(f1), max(f1) from minmaxtest;
  min | max 
diff --git a/src/test/regress/expected/btree_index.out b/src/test/regress/expected/btree_index.out
index 1646deb092..7e09e1df8c 100644
--- a/src/test/regress/expected/btree_index.out
+++ b/src/test/regress/expected/btree_index.out
@@ -110,9 +110,10 @@ select proname from pg_proc where proname like E'RI\\_FKey%del' order by 1;
                                   QUERY PLAN                                  
 ------------------------------------------------------------------------------
  Index Only Scan using pg_proc_proname_args_nsp_index on pg_proc
+   Skip scan: All
    Index Cond: ((proname >= 'RI_FKey'::text) AND (proname < 'RI_FKez'::text))
    Filter: (proname ~~ 'RI\_FKey%del'::text)
-(3 rows)
+(4 rows)
 
 select proname from pg_proc where proname like E'RI\\_FKey%del' order by 1;
         proname         
@@ -129,9 +130,10 @@ select proname from pg_proc where proname ilike '00%foo' order by 1;
                              QUERY PLAN                             
 --------------------------------------------------------------------
  Index Only Scan using pg_proc_proname_args_nsp_index on pg_proc
+   Skip scan: All
    Index Cond: ((proname >= '00'::text) AND (proname < '01'::text))
    Filter: (proname ~~* '00%foo'::text)
-(3 rows)
+(4 rows)
 
 select proname from pg_proc where proname ilike '00%foo' order by 1;
  proname 
@@ -143,8 +145,9 @@ select proname from pg_proc where proname ilike 'ri%foo' order by 1;
                            QUERY PLAN                            
 -----------------------------------------------------------------
  Index Only Scan using pg_proc_proname_args_nsp_index on pg_proc
+   Skip scan: All
    Filter: (proname ~~* 'ri%foo'::text)
-(2 rows)
+(3 rows)
 
 set enable_indexscan to false;
 set enable_bitmapscan to true;
@@ -157,8 +160,9 @@ select proname from pg_proc where proname like E'RI\\_FKey%del' order by 1;
    ->  Bitmap Heap Scan on pg_proc
          Filter: (proname ~~ 'RI\_FKey%del'::text)
          ->  Bitmap Index Scan on pg_proc_proname_args_nsp_index
+               Skip scan: All
                Index Cond: ((proname >= 'RI_FKey'::text) AND (proname < 'RI_FKez'::text))
-(6 rows)
+(7 rows)
 
 select proname from pg_proc where proname like E'RI\\_FKey%del' order by 1;
         proname         
@@ -179,8 +183,9 @@ select proname from pg_proc where proname ilike '00%foo' order by 1;
    ->  Bitmap Heap Scan on pg_proc
          Filter: (proname ~~* '00%foo'::text)
          ->  Bitmap Index Scan on pg_proc_proname_args_nsp_index
+               Skip scan: All
                Index Cond: ((proname >= '00'::text) AND (proname < '01'::text))
-(6 rows)
+(7 rows)
 
 select proname from pg_proc where proname ilike '00%foo' order by 1;
  proname 
@@ -192,8 +197,9 @@ select proname from pg_proc where proname ilike 'ri%foo' order by 1;
                            QUERY PLAN                            
 -----------------------------------------------------------------
  Index Only Scan using pg_proc_proname_args_nsp_index on pg_proc
+   Skip scan: All
    Filter: (proname ~~* 'ri%foo'::text)
-(2 rows)
+(3 rows)
 
 reset enable_seqscan;
 reset enable_indexscan;
@@ -240,8 +246,9 @@ select * from btree_bpchar where f1::bpchar like 'foo';
  Bitmap Heap Scan on btree_bpchar
    Filter: ((f1)::bpchar ~~ 'foo'::text)
    ->  Bitmap Index Scan on btree_bpchar_f1_idx
+         Skip scan: All
          Index Cond: ((f1)::bpchar = 'foo'::bpchar)
-(4 rows)
+(5 rows)
 
 select * from btree_bpchar where f1::bpchar like 'foo';
  f1  
@@ -256,8 +263,9 @@ select * from btree_bpchar where f1::bpchar like 'foo%';
  Bitmap Heap Scan on btree_bpchar
    Filter: ((f1)::bpchar ~~ 'foo%'::text)
    ->  Bitmap Index Scan on btree_bpchar_f1_idx
+         Skip scan: All
          Index Cond: (((f1)::bpchar >= 'foo'::bpchar) AND ((f1)::bpchar < 'fop'::bpchar))
-(4 rows)
+(5 rows)
 
 select * from btree_bpchar where f1::bpchar like 'foo%';
   f1  
diff --git a/src/test/regress/expected/cluster.out b/src/test/regress/expected/cluster.out
index bdae8fe00c..e7eeebf7e1 100644
--- a/src/test/regress/expected/cluster.out
+++ b/src/test/regress/expected/cluster.out
@@ -478,8 +478,9 @@ EXPLAIN (COSTS OFF) SELECT * FROM clstr_expression WHERE upper(b) = 'PREFIX3';
                           QUERY PLAN                           
 ---------------------------------------------------------------
  Index Scan using clstr_expression_upper_b on clstr_expression
+   Skip scan: All
    Index Cond: (upper(b) = 'PREFIX3'::text)
-(2 rows)
+(3 rows)
 
 SELECT * FROM clstr_expression WHERE upper(b) = 'PREFIX3';
  id | a |    b    
@@ -491,8 +492,9 @@ EXPLAIN (COSTS OFF) SELECT * FROM clstr_expression WHERE -a = -3 ORDER BY -a, b;
                           QUERY PLAN                           
 ---------------------------------------------------------------
  Index Scan using clstr_expression_minus_a on clstr_expression
+   Skip scan: All
    Index Cond: ((- a) = '-3'::integer)
-(2 rows)
+(3 rows)
 
 SELECT * FROM clstr_expression WHERE -a = -3 ORDER BY -a, b;
  id  | a |     b     
@@ -512,8 +514,9 @@ EXPLAIN (COSTS OFF) SELECT * FROM clstr_expression WHERE upper(b) = 'PREFIX3';
                           QUERY PLAN                           
 ---------------------------------------------------------------
  Index Scan using clstr_expression_upper_b on clstr_expression
+   Skip scan: All
    Index Cond: (upper(b) = 'PREFIX3'::text)
-(2 rows)
+(3 rows)
 
 SELECT * FROM clstr_expression WHERE upper(b) = 'PREFIX3';
  id | a |    b    
@@ -525,8 +528,9 @@ EXPLAIN (COSTS OFF) SELECT * FROM clstr_expression WHERE -a = -3 ORDER BY -a, b;
                           QUERY PLAN                           
 ---------------------------------------------------------------
  Index Scan using clstr_expression_minus_a on clstr_expression
+   Skip scan: All
    Index Cond: ((- a) = '-3'::integer)
-(2 rows)
+(3 rows)
 
 SELECT * FROM clstr_expression WHERE -a = -3 ORDER BY -a, b;
  id  | a |     b     
@@ -546,8 +550,9 @@ EXPLAIN (COSTS OFF) SELECT * FROM clstr_expression WHERE upper(b) = 'PREFIX3';
                           QUERY PLAN                           
 ---------------------------------------------------------------
  Index Scan using clstr_expression_upper_b on clstr_expression
+   Skip scan: All
    Index Cond: (upper(b) = 'PREFIX3'::text)
-(2 rows)
+(3 rows)
 
 SELECT * FROM clstr_expression WHERE upper(b) = 'PREFIX3';
  id | a |    b    
@@ -559,8 +564,9 @@ EXPLAIN (COSTS OFF) SELECT * FROM clstr_expression WHERE -a = -3 ORDER BY -a, b;
                           QUERY PLAN                           
 ---------------------------------------------------------------
  Index Scan using clstr_expression_minus_a on clstr_expression
+   Skip scan: All
    Index Cond: ((- a) = '-3'::integer)
-(2 rows)
+(3 rows)
 
 SELECT * FROM clstr_expression WHERE -a = -3 ORDER BY -a, b;
  id  | a |     b     
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index ae95bb38a6..f8132ecacd 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -1804,12 +1804,15 @@ SELECT * FROM tenk1
    Recheck Cond: (((thousand = 42) AND (tenthous = 1)) OR ((thousand = 42) AND (tenthous = 3)) OR ((thousand = 42) AND (tenthous = 42)))
    ->  BitmapOr
          ->  Bitmap Index Scan on tenk1_thous_tenthous
+               Skip scan: All
                Index Cond: ((thousand = 42) AND (tenthous = 1))
          ->  Bitmap Index Scan on tenk1_thous_tenthous
+               Skip scan: All
                Index Cond: ((thousand = 42) AND (tenthous = 3))
          ->  Bitmap Index Scan on tenk1_thous_tenthous
+               Skip scan: All
                Index Cond: ((thousand = 42) AND (tenthous = 42))
-(9 rows)
+(12 rows)
 
 SELECT * FROM tenk1
   WHERE thousand = 42 AND (tenthous = 1 OR tenthous = 3 OR tenthous = 42);
@@ -1828,13 +1831,16 @@ SELECT count(*) FROM tenk1
          Recheck Cond: ((hundred = 42) AND ((thousand = 42) OR (thousand = 99)))
          ->  BitmapAnd
                ->  Bitmap Index Scan on tenk1_hundred
+                     Skip scan: All
                      Index Cond: (hundred = 42)
                ->  BitmapOr
                      ->  Bitmap Index Scan on tenk1_thous_tenthous
+                           Skip scan: All
                            Index Cond: (thousand = 42)
                      ->  Bitmap Index Scan on tenk1_thous_tenthous
+                           Skip scan: All
                            Index Cond: (thousand = 99)
-(11 rows)
+(14 rows)
 
 SELECT count(*) FROM tenk1
   WHERE hundred = 42 AND (thousand = 42 OR thousand = 99);
@@ -1859,8 +1865,9 @@ EXPLAIN (COSTS OFF)
    ->  Bitmap Heap Scan on dupindexcols
          Recheck Cond: ((f1 >= 'WA'::text) AND (f1 <= 'ZZZ'::text) AND (id < 1000) AND (f1 ~<~ 'YX'::text))
          ->  Bitmap Index Scan on dupindexcols_i
+               Skip scan: All
                Index Cond: ((f1 >= 'WA'::text) AND (f1 <= 'ZZZ'::text) AND (id < 1000) AND (f1 ~<~ 'YX'::text))
-(5 rows)
+(6 rows)
 
 SELECT count(*) FROM dupindexcols
   WHERE f1 BETWEEN 'WA' AND 'ZZZ' and id < 1000 and f1 ~<~ 'YX';
@@ -1880,8 +1887,9 @@ ORDER BY unique1;
                       QUERY PLAN                       
 -------------------------------------------------------
  Index Only Scan using tenk1_unique1 on tenk1
+   Skip scan: All
    Index Cond: (unique1 = ANY ('{1,42,7}'::integer[]))
-(2 rows)
+(3 rows)
 
 SELECT unique1 FROM tenk1
 WHERE unique1 IN (1,42,7)
@@ -1900,9 +1908,10 @@ ORDER BY thousand;
                       QUERY PLAN                       
 -------------------------------------------------------
  Index Only Scan using tenk1_thous_tenthous on tenk1
+   Skip scan: All
    Index Cond: (thousand < 2)
    Filter: (tenthous = ANY ('{1001,3000}'::integer[]))
-(3 rows)
+(4 rows)
 
 SELECT thousand, tenthous FROM tenk1
 WHERE thousand < 2 AND tenthous IN (1001,3000)
@@ -1923,8 +1932,9 @@ ORDER BY thousand;
  Sort
    Sort Key: thousand
    ->  Index Scan using tenk1_thous_tenthous on tenk1
+         Skip scan: All
          Index Cond: ((thousand < 2) AND (tenthous = ANY ('{1001,3000}'::integer[])))
-(4 rows)
+(5 rows)
 
 SELECT thousand, tenthous FROM tenk1
 WHERE thousand < 2 AND tenthous IN (1001,3000)
@@ -1944,8 +1954,9 @@ explain (costs off)
                       QUERY PLAN                      
 ------------------------------------------------------
  Index Scan using tenk1_thous_tenthous on tenk1
+   Skip scan: All
    Index Cond: ((thousand = 1) AND (tenthous = 1001))
-(2 rows)
+(3 rows)
 
 --
 -- Check matching of boolean index columns to WHERE conditions and sort keys
@@ -1957,7 +1968,8 @@ explain (costs off)
 -------------------------------------------------------
  Limit
    ->  Index Scan using boolindex_b_i_key on boolindex
-(2 rows)
+         Skip scan: All
+(3 rows)
 
 explain (costs off)
   select * from boolindex where b order by i limit 10;
@@ -1965,8 +1977,9 @@ explain (costs off)
 -------------------------------------------------------
  Limit
    ->  Index Scan using boolindex_b_i_key on boolindex
+         Skip scan: All
          Index Cond: (b = true)
-(3 rows)
+(4 rows)
 
 explain (costs off)
   select * from boolindex where b = true order by i desc limit 10;
@@ -1974,8 +1987,9 @@ explain (costs off)
 ----------------------------------------------------------------
  Limit
    ->  Index Scan Backward using boolindex_b_i_key on boolindex
+         Skip scan: All
          Index Cond: (b = true)
-(3 rows)
+(4 rows)
 
 explain (costs off)
   select * from boolindex where not b order by i limit 10;
@@ -1983,8 +1997,9 @@ explain (costs off)
 -------------------------------------------------------
  Limit
    ->  Index Scan using boolindex_b_i_key on boolindex
+         Skip scan: All
          Index Cond: (b = false)
-(3 rows)
+(4 rows)
 
 explain (costs off)
   select * from boolindex where b is true order by i desc limit 10;
@@ -1992,8 +2007,9 @@ explain (costs off)
 ----------------------------------------------------------------
  Limit
    ->  Index Scan Backward using boolindex_b_i_key on boolindex
+         Skip scan: All
          Index Cond: (b = true)
-(3 rows)
+(4 rows)
 
 explain (costs off)
   select * from boolindex where b is false order by i desc limit 10;
@@ -2001,8 +2017,9 @@ explain (costs off)
 ----------------------------------------------------------------
  Limit
    ->  Index Scan Backward using boolindex_b_i_key on boolindex
+         Skip scan: All
          Index Cond: (b = false)
-(3 rows)
+(4 rows)
 
 --
 -- REINDEX (VERBOSE)
diff --git a/src/test/regress/expected/equivclass.out b/src/test/regress/expected/equivclass.out
index 126f7047fe..e163b8bf89 100644
--- a/src/test/regress/expected/equivclass.out
+++ b/src/test/regress/expected/equivclass.out
@@ -107,27 +107,30 @@ explain (costs off)
             QUERY PLAN             
 -----------------------------------
  Index Scan using ec0_pkey on ec0
+   Skip scan: All
    Index Cond: (ff = '42'::bigint)
    Filter: (f1 = '42'::bigint)
-(3 rows)
+(4 rows)
 
 explain (costs off)
   select * from ec0 where ff = f1 and f1 = '42'::int8alias1;
               QUERY PLAN               
 ---------------------------------------
  Index Scan using ec0_pkey on ec0
+   Skip scan: All
    Index Cond: (ff = '42'::int8alias1)
    Filter: (f1 = '42'::int8alias1)
-(3 rows)
+(4 rows)
 
 explain (costs off)
   select * from ec1 where ff = f1 and f1 = '42'::int8alias1;
               QUERY PLAN               
 ---------------------------------------
  Index Scan using ec1_pkey on ec1
+   Skip scan: All
    Index Cond: (ff = '42'::int8alias1)
    Filter: (f1 = '42'::int8alias1)
-(3 rows)
+(4 rows)
 
 explain (costs off)
   select * from ec1 where ff = f1 and f1 = '42'::int8alias2;
@@ -144,9 +147,10 @@ explain (costs off)
  Nested Loop
    Join Filter: (ec1.ff = ec2.x1)
    ->  Index Scan using ec1_pkey on ec1
+         Skip scan: All
          Index Cond: ((ff = '42'::bigint) AND (ff = '42'::bigint))
    ->  Seq Scan on ec2
-(5 rows)
+(6 rows)
 
 explain (costs off)
   select * from ec1, ec2 where ff = x1 and ff = '42'::int8alias1;
@@ -154,10 +158,11 @@ explain (costs off)
 ---------------------------------------------
  Nested Loop
    ->  Index Scan using ec1_pkey on ec1
+         Skip scan: All
          Index Cond: (ff = '42'::int8alias1)
    ->  Seq Scan on ec2
          Filter: (x1 = '42'::int8alias1)
-(5 rows)
+(6 rows)
 
 explain (costs off)
   select * from ec1, ec2 where ff = x1 and '42'::int8 = x1;
@@ -166,10 +171,11 @@ explain (costs off)
  Nested Loop
    Join Filter: (ec1.ff = ec2.x1)
    ->  Index Scan using ec1_pkey on ec1
+         Skip scan: All
          Index Cond: (ff = '42'::bigint)
    ->  Seq Scan on ec2
          Filter: ('42'::bigint = x1)
-(6 rows)
+(7 rows)
 
 explain (costs off)
   select * from ec1, ec2 where ff = x1 and x1 = '42'::int8alias1;
@@ -177,10 +183,11 @@ explain (costs off)
 ---------------------------------------------
  Nested Loop
    ->  Index Scan using ec1_pkey on ec1
+         Skip scan: All
          Index Cond: (ff = '42'::int8alias1)
    ->  Seq Scan on ec2
          Filter: (x1 = '42'::int8alias1)
-(5 rows)
+(6 rows)
 
 explain (costs off)
   select * from ec1, ec2 where ff = x1 and x1 = '42'::int8alias2;
@@ -190,8 +197,9 @@ explain (costs off)
    ->  Seq Scan on ec2
          Filter: (x1 = '42'::int8alias2)
    ->  Index Scan using ec1_pkey on ec1
+         Skip scan: All
          Index Cond: (ff = ec2.x1)
-(5 rows)
+(6 rows)
 
 create unique index ec1_expr1 on ec1((ff + 1));
 create unique index ec1_expr2 on ec1((ff + 2 + 1));
@@ -210,15 +218,19 @@ explain (costs off)
 -----------------------------------------------------
  Nested Loop
    ->  Index Scan using ec1_pkey on ec1
+         Skip scan: All
          Index Cond: (ff = '42'::bigint)
    ->  Append
          ->  Index Scan using ec1_expr2 on ec1 ec1_1
+               Skip scan: All
                Index Cond: (((ff + 2) + 1) = ec1.f1)
          ->  Index Scan using ec1_expr3 on ec1 ec1_2
+               Skip scan: All
                Index Cond: (((ff + 3) + 1) = ec1.f1)
          ->  Index Scan using ec1_expr4 on ec1 ec1_3
+               Skip scan: All
                Index Cond: ((ff + 4) = ec1.f1)
-(10 rows)
+(14 rows)
 
 explain (costs off)
   select * from ec1,
@@ -234,16 +246,20 @@ explain (costs off)
  Nested Loop
    Join Filter: ((((ec1_1.ff + 2) + 1)) = ec1.f1)
    ->  Index Scan using ec1_pkey on ec1
+         Skip scan: All
          Index Cond: ((ff = '42'::bigint) AND (ff = '42'::bigint))
          Filter: (ff = f1)
    ->  Append
          ->  Index Scan using ec1_expr2 on ec1 ec1_1
+               Skip scan: All
                Index Cond: (((ff + 2) + 1) = '42'::bigint)
          ->  Index Scan using ec1_expr3 on ec1 ec1_2
+               Skip scan: All
                Index Cond: (((ff + 3) + 1) = '42'::bigint)
          ->  Index Scan using ec1_expr4 on ec1 ec1_3
+               Skip scan: All
                Index Cond: ((ff + 4) = '42'::bigint)
-(12 rows)
+(16 rows)
 
 explain (costs off)
   select * from ec1,
@@ -265,22 +281,29 @@ explain (costs off)
  Nested Loop
    ->  Nested Loop
          ->  Index Scan using ec1_pkey on ec1
+               Skip scan: All
                Index Cond: (ff = '42'::bigint)
          ->  Append
                ->  Index Scan using ec1_expr2 on ec1 ec1_1
+                     Skip scan: All
                      Index Cond: (((ff + 2) + 1) = ec1.f1)
                ->  Index Scan using ec1_expr3 on ec1 ec1_2
+                     Skip scan: All
                      Index Cond: (((ff + 3) + 1) = ec1.f1)
                ->  Index Scan using ec1_expr4 on ec1 ec1_3
+                     Skip scan: All
                      Index Cond: ((ff + 4) = ec1.f1)
    ->  Append
          ->  Index Scan using ec1_expr2 on ec1 ec1_4
+               Skip scan: All
                Index Cond: (((ff + 2) + 1) = (((ec1_1.ff + 2) + 1)))
          ->  Index Scan using ec1_expr3 on ec1 ec1_5
+               Skip scan: All
                Index Cond: (((ff + 3) + 1) = (((ec1_1.ff + 2) + 1)))
          ->  Index Scan using ec1_expr4 on ec1 ec1_6
+               Skip scan: All
                Index Cond: ((ff + 4) = (((ec1_1.ff + 2) + 1)))
-(18 rows)
+(25 rows)
 
 -- let's try that as a mergejoin
 set enable_mergejoin = on;
@@ -307,21 +330,28 @@ explain (costs off)
    ->  Merge Append
          Sort Key: (((ec1_4.ff + 2) + 1))
          ->  Index Scan using ec1_expr2 on ec1 ec1_4
+               Skip scan: All
          ->  Index Scan using ec1_expr3 on ec1 ec1_5
+               Skip scan: All
          ->  Index Scan using ec1_expr4 on ec1 ec1_6
+               Skip scan: All
    ->  Materialize
          ->  Merge Join
                Merge Cond: ((((ec1_1.ff + 2) + 1)) = ec1.f1)
                ->  Merge Append
                      Sort Key: (((ec1_1.ff + 2) + 1))
                      ->  Index Scan using ec1_expr2 on ec1 ec1_1
+                           Skip scan: All
                      ->  Index Scan using ec1_expr3 on ec1 ec1_2
+                           Skip scan: All
                      ->  Index Scan using ec1_expr4 on ec1 ec1_3
+                           Skip scan: All
                ->  Sort
                      Sort Key: ec1.f1 USING <
                      ->  Index Scan using ec1_pkey on ec1
+                           Skip scan: All
                            Index Cond: (ff = '42'::bigint)
-(19 rows)
+(26 rows)
 
 -- check partially indexed scan
 set enable_nestloop = on;
@@ -340,15 +370,18 @@ explain (costs off)
 -----------------------------------------------------
  Nested Loop
    ->  Index Scan using ec1_pkey on ec1
+         Skip scan: All
          Index Cond: (ff = '42'::bigint)
    ->  Append
          ->  Index Scan using ec1_expr2 on ec1 ec1_1
+               Skip scan: All
                Index Cond: (((ff + 2) + 1) = ec1.f1)
          ->  Seq Scan on ec1 ec1_2
                Filter: (((ff + 3) + 1) = ec1.f1)
          ->  Index Scan using ec1_expr4 on ec1 ec1_3
+               Skip scan: All
                Index Cond: ((ff + 4) = ec1.f1)
-(10 rows)
+(13 rows)
 
 -- let's try that as a mergejoin
 set enable_mergejoin = on;
@@ -369,15 +402,18 @@ explain (costs off)
    ->  Merge Append
          Sort Key: (((ec1_1.ff + 2) + 1))
          ->  Index Scan using ec1_expr2 on ec1 ec1_1
+               Skip scan: All
          ->  Sort
                Sort Key: (((ec1_2.ff + 3) + 1))
                ->  Seq Scan on ec1 ec1_2
          ->  Index Scan using ec1_expr4 on ec1 ec1_3
+               Skip scan: All
    ->  Sort
          Sort Key: ec1.f1 USING <
          ->  Index Scan using ec1_pkey on ec1
+               Skip scan: All
                Index Cond: (ff = '42'::bigint)
-(13 rows)
+(16 rows)
 
 -- check effects of row-level security
 set enable_nestloop = on;
@@ -395,10 +431,12 @@ explain (costs off)
 ---------------------------------------------
  Nested Loop
    ->  Index Scan using ec0_pkey on ec0 a
+         Skip scan: All
          Index Cond: (ff = '43'::int8alias1)
    ->  Index Scan using ec1_pkey on ec1 b
+         Skip scan: All
          Index Cond: (ff = '43'::int8alias1)
-(5 rows)
+(7 rows)
 
 set session authorization regress_user_ectest;
 -- with RLS active, the non-leakproof a.ff = 43 clause is not treated
@@ -411,11 +449,13 @@ explain (costs off)
 ---------------------------------------------
  Nested Loop
    ->  Index Scan using ec0_pkey on ec0 a
+         Skip scan: All
          Index Cond: (ff = '43'::int8alias1)
    ->  Index Scan using ec1_pkey on ec1 b
+         Skip scan: All
          Index Cond: (ff = a.ff)
          Filter: (f1 < '5'::int8alias1)
-(6 rows)
+(8 rows)
 
 reset session authorization;
 revoke select on ec0 from regress_user_ectest;
diff --git a/src/test/regress/expected/fast_default.out b/src/test/regress/expected/fast_default.out
index 10bc5ff757..145145b71c 100644
--- a/src/test/regress/expected/fast_default.out
+++ b/src/test/regress/expected/fast_default.out
@@ -431,8 +431,9 @@ DELETE FROM T WHERE pk BETWEEN 10 AND 20 RETURNING *;
          Output: ctid
          Recheck Cond: ((t.pk >= 10) AND (t.pk <= 20))
          ->  Bitmap Index Scan on t_pkey
+               Skip scan: All
                Index Cond: ((t.pk >= 10) AND (t.pk <= 20))
-(7 rows)
+(8 rows)
 
 -- UPDATE
 UPDATE T SET c_text = '"' || c_text || '"'  WHERE pk < 10;
diff --git a/src/test/regress/expected/foreign_key.out b/src/test/regress/expected/foreign_key.out
index 07bd5b6434..d9f5d6bfbe 100644
--- a/src/test/regress/expected/foreign_key.out
+++ b/src/test/regress/expected/foreign_key.out
@@ -1418,14 +1418,16 @@ explain (costs off) delete from t1 where a = 1;
  Delete on t2
    ->  Nested Loop
          ->  Index Scan using t1_pkey on t1
+               Skip scan: All
                Index Cond: (a = 1)
          ->  Seq Scan on t2
                Filter: (b = 1)
  
  Delete on t1
    ->  Index Scan using t1_pkey on t1
+         Skip scan: All
          Index Cond: (a = 1)
-(10 rows)
+(12 rows)
 
 delete from t1 where a = 1;
 -- Test a primary key with attributes located in later attnum positions
diff --git a/src/test/regress/expected/generated.out b/src/test/regress/expected/generated.out
index 620579a6fd..61b267f9aa 100644
--- a/src/test/regress/expected/generated.out
+++ b/src/test/regress/expected/generated.out
@@ -462,8 +462,9 @@ EXPLAIN (COSTS OFF) SELECT * FROM gtest22c WHERE b = 4;
                  QUERY PLAN                  
 ---------------------------------------------
  Index Scan using gtest22c_b_idx on gtest22c
+   Skip scan: All
    Index Cond: (b = 4)
-(2 rows)
+(3 rows)
 
 SELECT * FROM gtest22c WHERE b = 4;
  a | b 
@@ -475,8 +476,9 @@ EXPLAIN (COSTS OFF) SELECT * FROM gtest22c WHERE b * 3 = 6;
                    QUERY PLAN                   
 ------------------------------------------------
  Index Scan using gtest22c_expr_idx on gtest22c
+   Skip scan: All
    Index Cond: ((b * 3) = 6)
-(2 rows)
+(3 rows)
 
 SELECT * FROM gtest22c WHERE b * 3 = 6;
  a | b 
@@ -488,8 +490,9 @@ EXPLAIN (COSTS OFF) SELECT * FROM gtest22c WHERE a = 1 AND b > 0;
                    QUERY PLAN                   
 ------------------------------------------------
  Index Scan using gtest22c_pred_idx on gtest22c
+   Skip scan: All
    Index Cond: (a = 1)
-(2 rows)
+(3 rows)
 
 SELECT * FROM gtest22c WHERE a = 1 AND b > 0;
  a | b 
diff --git a/src/test/regress/expected/groupingsets.out b/src/test/regress/expected/groupingsets.out
index dbe5140b55..164451558c 100644
--- a/src/test/regress/expected/groupingsets.out
+++ b/src/test/regress/expected/groupingsets.out
@@ -466,8 +466,9 @@ explain (costs off)
    InitPlan 1 (returns $0)
      ->  Limit
            ->  Index Only Scan using tenk1_unique1 on tenk1
+                 Skip scan: All
                  Index Cond: (unique1 IS NOT NULL)
-(5 rows)
+(6 rows)
 
 -- Views with GROUPING SET queries
 CREATE VIEW gstest_view AS select a, b, grouping(a,b), sum(c), count(*), max(c)
@@ -1402,7 +1403,8 @@ EXPLAIN (COSTS OFF) SELECT a, b, count(*), max(a), max(b) FROM gstest3 GROUP BY
          Sort Key: b
            Group Key: b
          ->  Index Scan using gstest3_pkey on gstest3
-(8 rows)
+               Skip scan: All
+(9 rows)
 
 SELECT a, b, count(*), max(a), max(b) FROM gstest3 GROUP BY GROUPING SETS(a, b,()) ORDER BY a, b;
  a | b | count | max | max 
diff --git a/src/test/regress/expected/index_including.out b/src/test/regress/expected/index_including.out
index 8e5d53e712..4db90623f9 100644
--- a/src/test/regress/expected/index_including.out
+++ b/src/test/regress/expected/index_including.out
@@ -134,8 +134,9 @@ select * from tbl where (c1,c2,c3) < (2,5,1);
  Bitmap Heap Scan on tbl
    Filter: (ROW(c1, c2, c3) < ROW(2, 5, 1))
    ->  Bitmap Index Scan on covering
+         Skip scan: All
          Index Cond: (ROW(c1, c2) <= ROW(2, 5))
-(4 rows)
+(5 rows)
 
 select * from tbl where (c1,c2,c3) < (2,5,1);
  c1 | c2 | c3 | c4 
@@ -152,9 +153,10 @@ select * from tbl where (c1,c2,c3) < (262,1,1) limit 1;
 ----------------------------------------------------
  Limit
    ->  Index Only Scan using covering on tbl
+         Skip scan: All
          Index Cond: (ROW(c1, c2) <= ROW(262, 1))
          Filter: (ROW(c1, c2, c3) < ROW(262, 1, 1))
-(4 rows)
+(5 rows)
 
 select * from tbl where (c1,c2,c3) < (262,1,1) limit 1;
  c1 | c2 | c3 | c4 
diff --git a/src/test/regress/expected/inet.out b/src/test/regress/expected/inet.out
index 12df25fe9d..3d1fd73fd6 100644
--- a/src/test/regress/expected/inet.out
+++ b/src/test/regress/expected/inet.out
@@ -247,9 +247,10 @@ SELECT * FROM inet_tbl WHERE i<<'192.168.1.0/24'::cidr;
                                   QUERY PLAN                                   
 -------------------------------------------------------------------------------
  Index Scan using inet_idx1 on inet_tbl
+   Skip scan: All
    Index Cond: ((i > '192.168.1.0/24'::inet) AND (i <= '192.168.1.255'::inet))
    Filter: (i << '192.168.1.0/24'::inet)
-(3 rows)
+(4 rows)
 
 SELECT * FROM inet_tbl WHERE i<<'192.168.1.0/24'::cidr;
        c        |        i         
@@ -264,9 +265,10 @@ SELECT * FROM inet_tbl WHERE i<<='192.168.1.0/24'::cidr;
                                    QUERY PLAN                                   
 --------------------------------------------------------------------------------
  Index Scan using inet_idx1 on inet_tbl
+   Skip scan: All
    Index Cond: ((i >= '192.168.1.0/24'::inet) AND (i <= '192.168.1.255'::inet))
    Filter: (i <<= '192.168.1.0/24'::inet)
-(3 rows)
+(4 rows)
 
 SELECT * FROM inet_tbl WHERE i<<='192.168.1.0/24'::cidr;
        c        |        i         
@@ -284,9 +286,10 @@ SELECT * FROM inet_tbl WHERE '192.168.1.0/24'::cidr >>= i;
                                    QUERY PLAN                                   
 --------------------------------------------------------------------------------
  Index Scan using inet_idx1 on inet_tbl
+   Skip scan: All
    Index Cond: ((i >= '192.168.1.0/24'::inet) AND (i <= '192.168.1.255'::inet))
    Filter: ('192.168.1.0/24'::inet >>= i)
-(3 rows)
+(4 rows)
 
 SELECT * FROM inet_tbl WHERE '192.168.1.0/24'::cidr >>= i;
        c        |        i         
@@ -304,9 +307,10 @@ SELECT * FROM inet_tbl WHERE '192.168.1.0/24'::cidr >> i;
                                   QUERY PLAN                                   
 -------------------------------------------------------------------------------
  Index Scan using inet_idx1 on inet_tbl
+   Skip scan: All
    Index Cond: ((i > '192.168.1.0/24'::inet) AND (i <= '192.168.1.255'::inet))
    Filter: ('192.168.1.0/24'::inet >> i)
-(3 rows)
+(4 rows)
 
 SELECT * FROM inet_tbl WHERE '192.168.1.0/24'::cidr >> i;
        c        |        i         
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index dfd0ee414f..2f0be34a15 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1339,12 +1339,15 @@ select * from patest0 join (select f1 from int4_tbl limit 1) ss on id = f1;
          ->  Seq Scan on int4_tbl
    ->  Append
          ->  Index Scan using patest0i on patest0 patest0_1
+               Skip scan: All
                Index Cond: (id = int4_tbl.f1)
          ->  Index Scan using patest1i on patest1 patest0_2
+               Skip scan: All
                Index Cond: (id = int4_tbl.f1)
          ->  Index Scan using patest2i on patest2 patest0_3
+               Skip scan: All
                Index Cond: (id = int4_tbl.f1)
-(10 rows)
+(13 rows)
 
 select * from patest0 join (select f1 from int4_tbl limit 1) ss on id = f1;
  id | x | f1 
@@ -1364,12 +1367,14 @@ select * from patest0 join (select f1 from int4_tbl limit 1) ss on id = f1;
          ->  Seq Scan on int4_tbl
    ->  Append
          ->  Index Scan using patest0i on patest0 patest0_1
+               Skip scan: All
                Index Cond: (id = int4_tbl.f1)
          ->  Index Scan using patest1i on patest1 patest0_2
+               Skip scan: All
                Index Cond: (id = int4_tbl.f1)
          ->  Seq Scan on patest2 patest0_3
                Filter: (int4_tbl.f1 = id)
-(10 rows)
+(12 rows)
 
 select * from patest0 join (select f1 from int4_tbl limit 1) ss on id = f1;
  id | x | f1 
@@ -1466,8 +1471,10 @@ explain (verbose, costs off) select * from matest0 order by 1-id;
    Sort Key: ((1 - matest0.id))
    ->  Index Scan using matest0i on public.matest0 matest0_1
          Output: matest0_1.id, matest0_1.name, (1 - matest0_1.id)
+         Skip scan: All
    ->  Index Scan using matest1i on public.matest1 matest0_2
          Output: matest0_2.id, matest0_2.name, (1 - matest0_2.id)
+         Skip scan: All
    ->  Sort
          Output: matest0_3.id, matest0_3.name, ((1 - matest0_3.id))
          Sort Key: ((1 - matest0_3.id))
@@ -1475,7 +1482,8 @@ explain (verbose, costs off) select * from matest0 order by 1-id;
                Output: matest0_3.id, matest0_3.name, (1 - matest0_3.id)
    ->  Index Scan using matest3i on public.matest3 matest0_4
          Output: matest0_4.id, matest0_4.name, (1 - matest0_4.id)
-(13 rows)
+         Skip scan: All
+(16 rows)
 
 select * from matest0 order by 1-id;
  id |  name  
@@ -1502,9 +1510,11 @@ explain (verbose, costs off) select min(1-id) from matest0;
                        Sort Key: ((1 - matest0.id))
                        ->  Index Scan using matest0i on public.matest0 matest0_1
                              Output: matest0_1.id, (1 - matest0_1.id)
+                             Skip scan: All
                              Index Cond: ((1 - matest0_1.id) IS NOT NULL)
                        ->  Index Scan using matest1i on public.matest1 matest0_2
                              Output: matest0_2.id, (1 - matest0_2.id)
+                             Skip scan: All
                              Index Cond: ((1 - matest0_2.id) IS NOT NULL)
                        ->  Sort
                              Output: matest0_3.id, ((1 - matest0_3.id))
@@ -1513,10 +1523,12 @@ explain (verbose, costs off) select min(1-id) from matest0;
                                    Output: matest0_3.id, (1 - matest0_3.id)
                                    Filter: ((1 - matest0_3.id) IS NOT NULL)
                                    ->  Bitmap Index Scan on matest2_pkey
+                                         Skip scan: All
                        ->  Index Scan using matest3i on public.matest3 matest0_4
                              Output: matest0_4.id, (1 - matest0_4.id)
+                             Skip scan: All
                              Index Cond: ((1 - matest0_4.id) IS NOT NULL)
-(25 rows)
+(29 rows)
 
 select min(1-id) from matest0;
  min 
@@ -1552,15 +1564,19 @@ order by t1.b limit 10;
          ->  Merge Append
                Sort Key: t1.b
                ->  Index Scan using matest0i on matest0 t1_1
+                     Skip scan: All
                ->  Index Scan using matest1i on matest1 t1_2
+                     Skip scan: All
          ->  Materialize
                ->  Merge Append
                      Sort Key: t2.b
                      ->  Index Scan using matest0i on matest0 t2_1
+                           Skip scan: All
                            Filter: (c = d)
                      ->  Index Scan using matest1i on matest1 t2_2
+                           Skip scan: All
                            Filter: (c = d)
-(14 rows)
+(18 rows)
 
 reset enable_nestloop;
 drop table matest0 cascade;
@@ -1582,10 +1598,12 @@ ORDER BY thousand, tenthous;
  Merge Append
    Sort Key: tenk1.thousand, tenk1.tenthous
    ->  Index Only Scan using tenk1_thous_tenthous on tenk1
+         Skip scan: All
    ->  Sort
          Sort Key: tenk1_1.thousand, tenk1_1.thousand
          ->  Index Only Scan using tenk1_thous_tenthous on tenk1 tenk1_1
-(6 rows)
+               Skip scan: All
+(8 rows)
 
 explain (costs off)
 SELECT thousand, tenthous, thousand+tenthous AS x FROM tenk1
@@ -1597,10 +1615,12 @@ ORDER BY thousand, tenthous;
  Merge Append
    Sort Key: tenk1.thousand, tenk1.tenthous
    ->  Index Only Scan using tenk1_thous_tenthous on tenk1
+         Skip scan: All
    ->  Sort
          Sort Key: 42, 42
          ->  Index Only Scan using tenk1_hundred on tenk1 tenk1_1
-(6 rows)
+               Skip scan: All
+(8 rows)
 
 explain (costs off)
 SELECT thousand, tenthous FROM tenk1
@@ -1612,10 +1632,12 @@ ORDER BY thousand, tenthous;
  Merge Append
    Sort Key: tenk1.thousand, tenk1.tenthous
    ->  Index Only Scan using tenk1_thous_tenthous on tenk1
+         Skip scan: All
    ->  Sort
          Sort Key: tenk1_1.thousand, ((random())::integer)
          ->  Index Only Scan using tenk1_thous_tenthous on tenk1 tenk1_1
-(6 rows)
+               Skip scan: All
+(8 rows)
 
 -- Check min/max aggregate optimization
 explain (costs off)
@@ -1631,10 +1653,12 @@ SELECT min(x) FROM
            ->  Merge Append
                  Sort Key: a.unique1
                  ->  Index Only Scan using tenk1_unique1 on tenk1 a
+                       Skip scan: All
                        Index Cond: (unique1 IS NOT NULL)
                  ->  Index Only Scan using tenk1_unique2 on tenk1 b
+                       Skip scan: All
                        Index Cond: (unique2 IS NOT NULL)
-(9 rows)
+(11 rows)
 
 explain (costs off)
 SELECT min(y) FROM
@@ -1649,10 +1673,12 @@ SELECT min(y) FROM
            ->  Merge Append
                  Sort Key: a.unique1
                  ->  Index Only Scan using tenk1_unique1 on tenk1 a
+                       Skip scan: All
                        Index Cond: (unique1 IS NOT NULL)
                  ->  Index Only Scan using tenk1_unique2 on tenk1 b
+                       Skip scan: All
                        Index Cond: (unique2 IS NOT NULL)
-(9 rows)
+(11 rows)
 
 -- XXX planner doesn't recognize that index on unique2 is sufficiently sorted
 explain (costs off)
@@ -1666,10 +1692,12 @@ ORDER BY x, y;
  Merge Append
    Sort Key: a.thousand, a.tenthous
    ->  Index Only Scan using tenk1_thous_tenthous on tenk1 a
+         Skip scan: All
    ->  Sort
          Sort Key: b.unique2, b.unique2
          ->  Index Only Scan using tenk1_unique2 on tenk1 b
-(6 rows)
+               Skip scan: All
+(8 rows)
 
 -- exercise rescan code path via a repeatedly-evaluated subquery
 explain (costs off)
@@ -2069,12 +2097,14 @@ explain (costs off) select min(a), max(a) from parted_minmax where b = '12345';
    InitPlan 1 (returns $0)
      ->  Limit
            ->  Index Only Scan using parted_minmax1i on parted_minmax1 parted_minmax
+                 Skip scan: All
                  Index Cond: ((a IS NOT NULL) AND (b = '12345'::text))
    InitPlan 2 (returns $1)
      ->  Limit
            ->  Index Only Scan Backward using parted_minmax1i on parted_minmax1 parted_minmax_1
+                 Skip scan: All
                  Index Cond: ((a IS NOT NULL) AND (b = '12345'::text))
-(9 rows)
+(11 rows)
 
 select min(a), max(a) from parted_minmax where b = '12345';
  min | max 
@@ -2093,13 +2123,20 @@ explain (costs off) select * from mcrparted order by a, abs(b), c;
  Merge Append
    Sort Key: mcrparted.a, (abs(mcrparted.b)), mcrparted.c
    ->  Index Scan using mcrparted0_a_abs_c_idx on mcrparted0 mcrparted_1
+         Skip scan: All
    ->  Index Scan using mcrparted1_a_abs_c_idx on mcrparted1 mcrparted_2
+         Skip scan: All
    ->  Index Scan using mcrparted2_a_abs_c_idx on mcrparted2 mcrparted_3
+         Skip scan: All
    ->  Index Scan using mcrparted3_a_abs_c_idx on mcrparted3 mcrparted_4
+         Skip scan: All
    ->  Index Scan using mcrparted4_a_abs_c_idx on mcrparted4 mcrparted_5
+         Skip scan: All
    ->  Index Scan using mcrparted5_a_abs_c_idx on mcrparted5 mcrparted_6
+         Skip scan: All
    ->  Index Scan using mcrparted_def_a_abs_c_idx on mcrparted_def mcrparted_7
-(9 rows)
+         Skip scan: All
+(16 rows)
 
 drop table mcrparted_def;
 -- Append is used for a RANGE partitioned table with no default
@@ -2109,12 +2146,18 @@ explain (costs off) select * from mcrparted order by a, abs(b), c;
 -------------------------------------------------------------------------
  Append
    ->  Index Scan using mcrparted0_a_abs_c_idx on mcrparted0 mcrparted_1
+         Skip scan: All
    ->  Index Scan using mcrparted1_a_abs_c_idx on mcrparted1 mcrparted_2
+         Skip scan: All
    ->  Index Scan using mcrparted2_a_abs_c_idx on mcrparted2 mcrparted_3
+         Skip scan: All
    ->  Index Scan using mcrparted3_a_abs_c_idx on mcrparted3 mcrparted_4
+         Skip scan: All
    ->  Index Scan using mcrparted4_a_abs_c_idx on mcrparted4 mcrparted_5
+         Skip scan: All
    ->  Index Scan using mcrparted5_a_abs_c_idx on mcrparted5 mcrparted_6
-(7 rows)
+         Skip scan: All
+(13 rows)
 
 -- Append is used with subpaths in reverse order with backwards index scans
 explain (costs off) select * from mcrparted order by a desc, abs(b) desc, c desc;
@@ -2122,12 +2165,18 @@ explain (costs off) select * from mcrparted order by a desc, abs(b) desc, c desc
 ----------------------------------------------------------------------------------
  Append
    ->  Index Scan Backward using mcrparted5_a_abs_c_idx on mcrparted5 mcrparted_6
+         Skip scan: All
    ->  Index Scan Backward using mcrparted4_a_abs_c_idx on mcrparted4 mcrparted_5
+         Skip scan: All
    ->  Index Scan Backward using mcrparted3_a_abs_c_idx on mcrparted3 mcrparted_4
+         Skip scan: All
    ->  Index Scan Backward using mcrparted2_a_abs_c_idx on mcrparted2 mcrparted_3
+         Skip scan: All
    ->  Index Scan Backward using mcrparted1_a_abs_c_idx on mcrparted1 mcrparted_2
+         Skip scan: All
    ->  Index Scan Backward using mcrparted0_a_abs_c_idx on mcrparted0 mcrparted_1
-(7 rows)
+         Skip scan: All
+(13 rows)
 
 -- check that Append plan is used containing a MergeAppend for sub-partitions
 -- that are unordered.
@@ -2140,15 +2189,22 @@ explain (costs off) select * from mcrparted order by a, abs(b), c;
 ---------------------------------------------------------------------------------------
  Append
    ->  Index Scan using mcrparted0_a_abs_c_idx on mcrparted0 mcrparted_1
+         Skip scan: All
    ->  Index Scan using mcrparted1_a_abs_c_idx on mcrparted1 mcrparted_2
+         Skip scan: All
    ->  Index Scan using mcrparted2_a_abs_c_idx on mcrparted2 mcrparted_3
+         Skip scan: All
    ->  Index Scan using mcrparted3_a_abs_c_idx on mcrparted3 mcrparted_4
+         Skip scan: All
    ->  Index Scan using mcrparted4_a_abs_c_idx on mcrparted4 mcrparted_5
+         Skip scan: All
    ->  Merge Append
          Sort Key: mcrparted_7.a, (abs(mcrparted_7.b)), mcrparted_7.c
          ->  Index Scan using mcrparted5a_a_abs_c_idx on mcrparted5a mcrparted_7
+               Skip scan: All
          ->  Index Scan using mcrparted5_def_a_abs_c_idx on mcrparted5_def mcrparted_8
-(10 rows)
+               Skip scan: All
+(17 rows)
 
 drop table mcrparted5_def;
 -- check that an Append plan is used and the sub-partitions are flattened
@@ -2159,12 +2215,18 @@ explain (costs off) select a, abs(b) from mcrparted order by a, abs(b), c;
 ---------------------------------------------------------------------------
  Append
    ->  Index Scan using mcrparted0_a_abs_c_idx on mcrparted0 mcrparted_1
+         Skip scan: All
    ->  Index Scan using mcrparted1_a_abs_c_idx on mcrparted1 mcrparted_2
+         Skip scan: All
    ->  Index Scan using mcrparted2_a_abs_c_idx on mcrparted2 mcrparted_3
+         Skip scan: All
    ->  Index Scan using mcrparted3_a_abs_c_idx on mcrparted3 mcrparted_4
+         Skip scan: All
    ->  Index Scan using mcrparted4_a_abs_c_idx on mcrparted4 mcrparted_5
+         Skip scan: All
    ->  Index Scan using mcrparted5a_a_abs_c_idx on mcrparted5a mcrparted_6
-(7 rows)
+         Skip scan: All
+(13 rows)
 
 -- check that Append is used when the sub-partitioned tables are pruned
 -- during planning.
@@ -2173,14 +2235,18 @@ explain (costs off) select * from mcrparted where a < 20 order by a, abs(b), c;
 -------------------------------------------------------------------------
  Append
    ->  Index Scan using mcrparted0_a_abs_c_idx on mcrparted0 mcrparted_1
+         Skip scan: All
          Index Cond: (a < 20)
    ->  Index Scan using mcrparted1_a_abs_c_idx on mcrparted1 mcrparted_2
+         Skip scan: All
          Index Cond: (a < 20)
    ->  Index Scan using mcrparted2_a_abs_c_idx on mcrparted2 mcrparted_3
+         Skip scan: All
          Index Cond: (a < 20)
    ->  Index Scan using mcrparted3_a_abs_c_idx on mcrparted3 mcrparted_4
+         Skip scan: All
          Index Cond: (a < 20)
-(9 rows)
+(13 rows)
 
 create table mclparted (a int) partition by list(a);
 create table mclparted1 partition of mclparted for values in(1);
@@ -2192,8 +2258,10 @@ explain (costs off) select * from mclparted order by a;
 ------------------------------------------------------------------------
  Append
    ->  Index Only Scan using mclparted1_a_idx on mclparted1 mclparted_1
+         Skip scan: All
    ->  Index Only Scan using mclparted2_a_idx on mclparted2 mclparted_2
-(3 rows)
+         Skip scan: All
+(5 rows)
 
 -- Ensure a MergeAppend is used when a partition exists with interleaved
 -- datums in the partition bound.
@@ -2205,10 +2273,14 @@ explain (costs off) select * from mclparted order by a;
  Merge Append
    Sort Key: mclparted.a
    ->  Index Only Scan using mclparted1_a_idx on mclparted1 mclparted_1
+         Skip scan: All
    ->  Index Only Scan using mclparted2_a_idx on mclparted2 mclparted_2
+         Skip scan: All
    ->  Index Only Scan using mclparted3_5_a_idx on mclparted3_5 mclparted_3
+         Skip scan: All
    ->  Index Only Scan using mclparted4_a_idx on mclparted4 mclparted_4
-(6 rows)
+         Skip scan: All
+(10 rows)
 
 drop table mclparted;
 -- Ensure subplans which don't have a path with the correct pathkeys get
@@ -2228,12 +2300,15 @@ explain (costs off) select * from mcrparted where a < 20 order by a, abs(b), c l
                ->  Seq Scan on mcrparted0 mcrparted_1
                      Filter: (a < 20)
          ->  Index Scan using mcrparted1_a_abs_c_idx on mcrparted1 mcrparted_2
+               Skip scan: All
                Index Cond: (a < 20)
          ->  Index Scan using mcrparted2_a_abs_c_idx on mcrparted2 mcrparted_3
+               Skip scan: All
                Index Cond: (a < 20)
          ->  Index Scan using mcrparted3_a_abs_c_idx on mcrparted3 mcrparted_4
+               Skip scan: All
                Index Cond: (a < 20)
-(12 rows)
+(15 rows)
 
 set enable_bitmapscan = 0;
 -- Ensure Append node can be used when the partition is ordered by some
@@ -2243,10 +2318,12 @@ explain (costs off) select * from mcrparted where a = 10 order by a, abs(b), c;
 -------------------------------------------------------------------------
  Append
    ->  Index Scan using mcrparted1_a_abs_c_idx on mcrparted1 mcrparted_1
+         Skip scan: All
          Index Cond: (a = 10)
    ->  Index Scan using mcrparted2_a_abs_c_idx on mcrparted2 mcrparted_2
+         Skip scan: All
          Index Cond: (a = 10)
-(5 rows)
+(7 rows)
 
 reset enable_bitmapscan;
 drop table mcrparted;
@@ -2260,8 +2337,10 @@ explain (costs off) select * from bool_lp order by b;
 ----------------------------------------------------------------------------
  Append
    ->  Index Only Scan using bool_lp_false_b_idx on bool_lp_false bool_lp_1
+         Skip scan: All
    ->  Index Only Scan using bool_lp_true_b_idx on bool_lp_true bool_lp_2
-(3 rows)
+         Skip scan: All
+(5 rows)
 
 drop table bool_lp;
 -- Ensure const bool quals can be properly detected as redundant
@@ -2276,40 +2355,48 @@ explain (costs off) select * from bool_rp where b = true order by b,a;
 ----------------------------------------------------------------------------------
  Append
    ->  Index Only Scan using bool_rp_true_1k_b_a_idx on bool_rp_true_1k bool_rp_1
+         Skip scan: All
          Index Cond: (b = true)
    ->  Index Only Scan using bool_rp_true_2k_b_a_idx on bool_rp_true_2k bool_rp_2
+         Skip scan: All
          Index Cond: (b = true)
-(5 rows)
+(7 rows)
 
 explain (costs off) select * from bool_rp where b = false order by b,a;
                                      QUERY PLAN                                     
 ------------------------------------------------------------------------------------
  Append
    ->  Index Only Scan using bool_rp_false_1k_b_a_idx on bool_rp_false_1k bool_rp_1
+         Skip scan: All
          Index Cond: (b = false)
    ->  Index Only Scan using bool_rp_false_2k_b_a_idx on bool_rp_false_2k bool_rp_2
+         Skip scan: All
          Index Cond: (b = false)
-(5 rows)
+(7 rows)
 
 explain (costs off) select * from bool_rp where b = true order by a;
                                     QUERY PLAN                                    
 ----------------------------------------------------------------------------------
  Append
    ->  Index Only Scan using bool_rp_true_1k_b_a_idx on bool_rp_true_1k bool_rp_1
+         Skip scan: All
          Index Cond: (b = true)
    ->  Index Only Scan using bool_rp_true_2k_b_a_idx on bool_rp_true_2k bool_rp_2
+         Skip scan: All
          Index Cond: (b = true)
-(5 rows)
+(7 rows)
 
 explain (costs off) select * from bool_rp where b = false order by a;
                                      QUERY PLAN                                     
 ------------------------------------------------------------------------------------
  Append
    ->  Index Only Scan using bool_rp_false_1k_b_a_idx on bool_rp_false_1k bool_rp_1
+         Skip scan: All
          Index Cond: (b = false)
    ->  Index Only Scan using bool_rp_false_2k_b_a_idx on bool_rp_false_2k bool_rp_2
+         Skip scan: All
          Index Cond: (b = false)
-(5 rows)
+(7 rows)
 
 drop table bool_rp;
 -- Ensure an Append scan is chosen when the partition order is a subset of
@@ -2323,16 +2410,20 @@ explain (costs off) select * from range_parted order by a,b,c;
 -------------------------------------------------------------------------------------
  Append
    ->  Index Only Scan using range_parted1_a_b_c_idx on range_parted1 range_parted_1
+         Skip scan: All
    ->  Index Only Scan using range_parted2_a_b_c_idx on range_parted2 range_parted_2
-(3 rows)
+         Skip scan: All
+(5 rows)
 
 explain (costs off) select * from range_parted order by a desc,b desc,c desc;
                                           QUERY PLAN                                          
 ----------------------------------------------------------------------------------------------
  Append
    ->  Index Only Scan Backward using range_parted2_a_b_c_idx on range_parted2 range_parted_2
+         Skip scan: All
    ->  Index Only Scan Backward using range_parted1_a_b_c_idx on range_parted1 range_parted_1
-(3 rows)
+         Skip scan: All
+(5 rows)
 
 drop table range_parted;
 -- Check that we allow access to a child table's statistics when the user
diff --git a/src/test/regress/expected/insert_conflict.out b/src/test/regress/expected/insert_conflict.out
index 1338b2b23e..e7eae05abf 100644
--- a/src/test/regress/expected/insert_conflict.out
+++ b/src/test/regress/expected/insert_conflict.out
@@ -54,10 +54,11 @@ explain (costs off) insert into insertconflicttest values(0, 'Crowberry') on con
    ->  Result
    SubPlan 1
      ->  Index Only Scan using both_index_expr_key on insertconflicttest ii
+           Skip scan: All
            Index Cond: (key = excluded.key)
    SubPlan 2
      ->  Seq Scan on insertconflicttest ii_1
-(10 rows)
+(11 rows)
 
 -- Neither collation nor operator class specifications are required --
 -- supplying them merely *limits* matches to indexes with matching opclasses
diff --git a/src/test/regress/expected/interval.out b/src/test/regress/expected/interval.out
index f772909e49..09131c5933 100644
--- a/src/test/regress/expected/interval.out
+++ b/src/test/regress/expected/interval.out
@@ -260,7 +260,8 @@ SELECT f1 FROM INTERVAL_TBL_OF r1 ORDER BY f1;
                              QUERY PLAN                             
 --------------------------------------------------------------------
  Index Only Scan using interval_tbl_of_f1_idx on interval_tbl_of r1
-(1 row)
+   Skip scan: All
+(2 rows)
 
 SELECT f1 FROM INTERVAL_TBL_OF r1 ORDER BY f1;
                     f1                     
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 761376b007..789cb32585 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -1857,16 +1857,16 @@ select * from int4_tbl i4, tenk1 a
 where exists(select * from tenk1 b
              where a.twothousand = b.twothousand and a.fivethous <> b.fivethous)
       and i4.f1 = a.tenthous;
-                  QUERY PLAN                  
-----------------------------------------------
+                          QUERY PLAN                          
+--------------------------------------------------------------
  Hash Semi Join
    Hash Cond: (a.twothousand = b.twothousand)
    Join Filter: (a.fivethous <> b.fivethous)
-   ->  Hash Join
-         Hash Cond: (a.tenthous = i4.f1)
-         ->  Seq Scan on tenk1 a
-         ->  Hash
-               ->  Seq Scan on int4_tbl i4
+   ->  Nested Loop
+         ->  Seq Scan on int4_tbl i4
+         ->  Index Scan using tenk1_thous_tenthous on tenk1 a
+               Skip scan: All
+               Index Cond: (tenthous = i4.f1)
    ->  Hash
          ->  Seq Scan on tenk1 b
 (10 rows)
@@ -2293,8 +2293,9 @@ where b.f1 = t.thousand and a.f1 = b.f1 and (a.f1+b.f1+999) = t.tenthous;
          ->  Aggregate
                ->  Seq Scan on int4_tbl i4a
          ->  Index Only Scan using tenk1_thous_tenthous on tenk1 t
+               Skip scan: All
                Index Cond: ((thousand = (sum(i4b.f1))) AND (tenthous = ((((sum(i4a.f1) + 1)) + (sum(i4b.f1))) + 999)))
-(9 rows)
+(10 rows)
 
 select a.f1, b.f1, t.thousand, t.tenthous from
   tenk1 t,
@@ -2373,7 +2374,8 @@ select count(*) from
                ->  Seq Scan on tenk1 x
          ->  Materialize
                ->  Index Scan using tenk1_unique2 on tenk1 y
-(9 rows)
+                     Skip scan: All
+(10 rows)
 
 select count(*) from
   (select * from tenk1 x order by x.thousand, x.twothousand, x.fivethous) x
@@ -2493,10 +2495,11 @@ select count(*) from tenk1 a, tenk1 b
    ->  Hash Join
          Hash Cond: (a.hundred = b.thousand)
          ->  Index Only Scan using tenk1_hundred on tenk1 a
+               Skip scan: All
          ->  Hash
                ->  Seq Scan on tenk1 b
                      Filter: ((fivethous % 10) < 10)
-(7 rows)
+(8 rows)
 
 select count(*) from tenk1 a, tenk1 b
   where a.hundred = b.thousand and (b.fivethous % 10) < 10;
@@ -2702,9 +2705,11 @@ select a.idv, b.idv from tidv a, tidv b where a.idv = b.idv;
  Merge Join
    Merge Cond: (a.idv = b.idv)
    ->  Index Only Scan using tidv_idv_idx on tidv a
+         Skip scan: All
    ->  Materialize
          ->  Index Only Scan using tidv_idv_idx on tidv b
-(5 rows)
+               Skip scan: All
+(7 rows)
 
 set enable_mergejoin = 0;
 explain (costs off)
@@ -2714,8 +2719,9 @@ select a.idv, b.idv from tidv a, tidv b where a.idv = b.idv;
  Nested Loop
    ->  Seq Scan on tidv a
    ->  Index Only Scan using tidv_idv_idx on tidv b
+         Skip scan: All
          Index Cond: (idv = a.idv)
-(4 rows)
+(5 rows)
 
 rollback;
 --
@@ -2874,8 +2880,9 @@ SELECT qq, unique1
          ->  Hash
                ->  Seq Scan on int8_tbl b
    ->  Index Scan using tenk1_unique2 on tenk1 c
+         Skip scan: All
          Index Cond: (unique2 = COALESCE((COALESCE(a.q1, '0'::bigint)), (COALESCE(b.q2, '-1'::bigint))))
-(8 rows)
+(9 rows)
 
 SELECT qq, unique1
   FROM
@@ -2938,13 +2945,16 @@ where nt3.id = 1 and ss2.b3;
  Nested Loop
    ->  Nested Loop
          ->  Index Scan using nt3_pkey on nt3
+               Skip scan: All
                Index Cond: (id = 1)
          ->  Index Scan using nt2_pkey on nt2
+               Skip scan: All
                Index Cond: (id = nt3.nt2_id)
    ->  Index Only Scan using nt1_pkey on nt1
+         Skip scan: All
          Index Cond: (id = nt2.nt1_id)
          Filter: (nt2.b1 AND (id IS NOT NULL))
-(9 rows)
+(12 rows)
 
 select nt3.id
 from nt3 as nt3
@@ -3081,12 +3091,14 @@ where q1 = thousand or q2 = thousand;
                Recheck Cond: ((q1.q1 = thousand) OR (q2.q2 = thousand))
                ->  BitmapOr
                      ->  Bitmap Index Scan on tenk1_thous_tenthous
+                           Skip scan: All
                            Index Cond: (thousand = q1.q1)
                      ->  Bitmap Index Scan on tenk1_thous_tenthous
+                           Skip scan: All
                            Index Cond: (thousand = q2.q2)
    ->  Hash
          ->  Seq Scan on int4_tbl
-(15 rows)
+(17 rows)
 
 explain (costs off)
 select * from
@@ -3104,10 +3116,11 @@ where thousand = (q1 + q2);
          ->  Bitmap Heap Scan on tenk1
                Recheck Cond: (thousand = (q1.q1 + q2.q2))
                ->  Bitmap Index Scan on tenk1_thous_tenthous
+                     Skip scan: All
                      Index Cond: (thousand = (q1.q1 + q2.q2))
    ->  Hash
          ->  Seq Scan on int4_tbl
-(12 rows)
+(13 rows)
 
 --
 -- test ability to generate a suitable plan for a star-schema query
@@ -3116,17 +3129,19 @@ explain (costs off)
 select * from
   tenk1, int8_tbl a, int8_tbl b
 where thousand = a.q1 and tenthous = b.q1 and a.q2 = 1 and b.q2 = 2;
-                             QUERY PLAN                              
----------------------------------------------------------------------
+                         QUERY PLAN                         
+------------------------------------------------------------
  Nested Loop
-   ->  Seq Scan on int8_tbl b
-         Filter: (q2 = 2)
+   Join Filter: (tenk1.thousand = a.q1)
    ->  Nested Loop
-         ->  Seq Scan on int8_tbl a
-               Filter: (q2 = 1)
+         ->  Seq Scan on int8_tbl b
+               Filter: (q2 = 2)
          ->  Index Scan using tenk1_thous_tenthous on tenk1
-               Index Cond: ((thousand = a.q1) AND (tenthous = b.q1))
-(8 rows)
+               Skip scan: All
+               Index Cond: (tenthous = b.q1)
+   ->  Seq Scan on int8_tbl a
+         Filter: (q2 = 1)
+(10 rows)
 
 --
 -- test a corner case in which we shouldn't apply the star-schema optimization
@@ -3154,12 +3169,14 @@ where t1.unique2 < 42 and t1.stringu1 > t2.stringu2;
                      ->  Seq Scan on onerow
                      ->  Seq Scan on onerow onerow_1
                ->  Index Scan using tenk1_unique2 on tenk1 t1
+                     Skip scan: All
                      Index Cond: ((unique2 = (11)) AND (unique2 < 42))
          ->  Index Scan using tenk1_unique1 on tenk1 t2
+               Skip scan: All
                Index Cond: (unique1 = (3))
    ->  Seq Scan on int4_tbl i1
          Filter: (f1 = 0)
-(13 rows)
+(15 rows)
 
 select t1.unique2, t1.stringu1, t2.unique1, t2.stringu2 from
   tenk1 t1
@@ -3220,10 +3237,12 @@ where t1.unique2 < 42 and t1.stringu1 > t2.stringu2;
          ->  Seq Scan on int4_tbl i1
                Filter: (f1 = 0)
          ->  Index Scan using tenk1_unique2 on tenk1 t1
+               Skip scan: All
                Index Cond: ((unique2 = (11)) AND (unique2 < 42))
    ->  Index Scan using tenk1_unique1 on tenk1 t2
+         Skip scan: All
          Index Cond: (unique1 = (3))
-(9 rows)
+(11 rows)
 
 select t1.unique2, t1.stringu1, t2.unique1, t2.stringu2 from
   tenk1 t1
@@ -3280,8 +3299,9 @@ where x = unique1;
                   QUERY PLAN                  
 ----------------------------------------------
  Index Only Scan using tenk1_unique1 on tenk1
+   Skip scan: All
    Index Cond: (unique1 = 1)
-(2 rows)
+(3 rows)
 
 explain (verbose, costs off)
 select unique1, x.*
@@ -3295,32 +3315,36 @@ where x = unique1;
          Output: 1, random()
    ->  Index Only Scan using tenk1_unique1 on public.tenk1
          Output: tenk1.unique1
+         Skip scan: All
          Index Cond: (tenk1.unique1 = (1))
-(7 rows)
+(8 rows)
 
 explain (costs off)
 select unique1 from tenk1, f_immutable_int4(1) x where x = unique1;
                   QUERY PLAN                  
 ----------------------------------------------
  Index Only Scan using tenk1_unique1 on tenk1
+   Skip scan: All
    Index Cond: (unique1 = 1)
-(2 rows)
+(3 rows)
 
 explain (costs off)
 select unique1 from tenk1, lateral f_immutable_int4(1) x where x = unique1;
                   QUERY PLAN                  
 ----------------------------------------------
  Index Only Scan using tenk1_unique1 on tenk1
+   Skip scan: All
    Index Cond: (unique1 = 1)
-(2 rows)
+(3 rows)
 
 explain (costs off)
 select unique1, x from tenk1 join f_immutable_int4(1) x on unique1 = x;
                   QUERY PLAN                  
 ----------------------------------------------
  Index Only Scan using tenk1_unique1 on tenk1
+   Skip scan: All
    Index Cond: (unique1 = 1)
-(2 rows)
+(3 rows)
 
 explain (costs off)
 select unique1, x from tenk1 left join f_immutable_int4(1) x on unique1 = x;
@@ -3329,9 +3353,10 @@ select unique1, x from tenk1 left join f_immutable_int4(1) x on unique1 = x;
  Nested Loop Left Join
    Join Filter: (tenk1.unique1 = 1)
    ->  Index Only Scan using tenk1_unique1 on tenk1
+         Skip scan: All
    ->  Materialize
          ->  Result
-(5 rows)
+(6 rows)
 
 explain (costs off)
 select unique1, x from tenk1 right join f_immutable_int4(1) x on unique1 = x;
@@ -3340,8 +3365,9 @@ select unique1, x from tenk1 right join f_immutable_int4(1) x on unique1 = x;
  Nested Loop Left Join
    ->  Result
    ->  Index Only Scan using tenk1_unique1 on tenk1
+         Skip scan: All
          Index Cond: (unique1 = 1)
-(4 rows)
+(5 rows)
 
 explain (costs off)
 select unique1, x from tenk1 full join f_immutable_int4(1) x on unique1 = x;
@@ -3350,10 +3376,11 @@ select unique1, x from tenk1 full join f_immutable_int4(1) x on unique1 = x;
  Merge Full Join
    Merge Cond: (tenk1.unique1 = (1))
    ->  Index Only Scan using tenk1_unique1 on tenk1
+         Skip scan: All
    ->  Sort
          Sort Key: (1)
          ->  Result
-(6 rows)
+(7 rows)
 
 -- check that pullup of a const function allows further const-folding
 explain (costs off)
@@ -3382,13 +3409,15 @@ where nt3.id = 1 and ss2.b3;
  Nested Loop Left Join
    Filter: ((nt2.b1 OR ((0) = 42)))
    ->  Index Scan using nt3_pkey on nt3
+         Skip scan: All
          Index Cond: (id = 1)
    ->  Nested Loop Left Join
          Join Filter: (0 = nt2.nt1_id)
          ->  Index Scan using nt2_pkey on nt2
+               Skip scan: All
                Index Cond: (id = nt3.nt2_id)
          ->  Result
-(9 rows)
+(11 rows)
 
 drop function f_immutable_int4(int);
 -- test inlining when function returns composite
@@ -3443,18 +3472,22 @@ select * from tenk1 a join tenk1 b on
          Recheck Cond: ((unique1 = 2) OR (hundred = 4))
          ->  BitmapOr
                ->  Bitmap Index Scan on tenk1_unique1
+                     Skip scan: All
                      Index Cond: (unique1 = 2)
                ->  Bitmap Index Scan on tenk1_hundred
+                     Skip scan: All
                      Index Cond: (hundred = 4)
    ->  Materialize
          ->  Bitmap Heap Scan on tenk1 a
                Recheck Cond: ((unique1 = 1) OR (unique2 = 3))
                ->  BitmapOr
                      ->  Bitmap Index Scan on tenk1_unique1
+                           Skip scan: All
                            Index Cond: (unique1 = 1)
                      ->  Bitmap Index Scan on tenk1_unique2
+                           Skip scan: All
                            Index Cond: (unique2 = 3)
-(17 rows)
+(21 rows)
 
 explain (costs off)
 select * from tenk1 a join tenk1 b on
@@ -3470,10 +3503,12 @@ select * from tenk1 a join tenk1 b on
                Recheck Cond: ((unique1 = 1) OR (unique2 = 3))
                ->  BitmapOr
                      ->  Bitmap Index Scan on tenk1_unique1
+                           Skip scan: All
                            Index Cond: (unique1 = 1)
                      ->  Bitmap Index Scan on tenk1_unique2
+                           Skip scan: All
                            Index Cond: (unique2 = 3)
-(12 rows)
+(14 rows)
 
 explain (costs off)
 select * from tenk1 a join tenk1 b on
@@ -3487,20 +3522,25 @@ select * from tenk1 a join tenk1 b on
          Recheck Cond: ((unique1 = 2) OR (hundred = 4))
          ->  BitmapOr
                ->  Bitmap Index Scan on tenk1_unique1
+                     Skip scan: All
                      Index Cond: (unique1 = 2)
                ->  Bitmap Index Scan on tenk1_hundred
+                     Skip scan: All
                      Index Cond: (hundred = 4)
    ->  Materialize
          ->  Bitmap Heap Scan on tenk1 a
                Recheck Cond: ((unique1 = 1) OR (unique2 = 3) OR (unique2 = 7))
                ->  BitmapOr
                      ->  Bitmap Index Scan on tenk1_unique1
+                           Skip scan: All
                            Index Cond: (unique1 = 1)
                      ->  Bitmap Index Scan on tenk1_unique2
+                           Skip scan: All
                            Index Cond: (unique2 = 3)
                      ->  Bitmap Index Scan on tenk1_unique2
+                           Skip scan: All
                            Index Cond: (unique2 = 7)
-(19 rows)
+(24 rows)
 
 --
 -- test placement of movable quals in a parameterized join tree
@@ -3514,16 +3554,19 @@ where t1.unique1 = 1;
 --------------------------------------------------------
  Nested Loop Left Join
    ->  Index Scan using tenk1_unique1 on tenk1 t1
+         Skip scan: All
          Index Cond: (unique1 = 1)
    ->  Nested Loop
          Join Filter: (t1.ten = t3.ten)
          ->  Bitmap Heap Scan on tenk1 t2
                Recheck Cond: (t1.hundred = hundred)
                ->  Bitmap Index Scan on tenk1_hundred
+                     Skip scan: All
                      Index Cond: (hundred = t1.hundred)
          ->  Index Scan using tenk1_unique2 on tenk1 t3
+               Skip scan: All
                Index Cond: (unique2 = t2.thousand)
-(11 rows)
+(14 rows)
 
 explain (costs off)
 select * from tenk1 t1 left join
@@ -3534,16 +3577,19 @@ where t1.unique1 = 1;
 --------------------------------------------------------
  Nested Loop Left Join
    ->  Index Scan using tenk1_unique1 on tenk1 t1
+         Skip scan: All
          Index Cond: (unique1 = 1)
    ->  Nested Loop
          Join Filter: ((t1.ten + t2.ten) = t3.ten)
          ->  Bitmap Heap Scan on tenk1 t2
                Recheck Cond: (t1.hundred = hundred)
                ->  Bitmap Index Scan on tenk1_hundred
+                     Skip scan: All
                      Index Cond: (hundred = t1.hundred)
          ->  Index Scan using tenk1_unique2 on tenk1 t3
+               Skip scan: All
                Index Cond: (unique2 = t2.thousand)
-(11 rows)
+(14 rows)
 
 explain (costs off)
 select count(*) from
@@ -3561,12 +3607,15 @@ select count(*) from
                      ->  Bitmap Heap Scan on tenk1 b
                            Recheck Cond: (thousand = int4_tbl.f1)
                            ->  Bitmap Index Scan on tenk1_thous_tenthous
+                                 Skip scan: All
                                  Index Cond: (thousand = int4_tbl.f1)
                ->  Index Scan using tenk1_unique1 on tenk1 a
+                     Skip scan: All
                      Index Cond: (unique1 = b.unique2)
          ->  Index Only Scan using tenk1_thous_tenthous on tenk1 c
+               Skip scan: All
                Index Cond: (thousand = a.thousand)
-(14 rows)
+(17 rows)
 
 select count(*) from
   tenk1 a join tenk1 b on a.unique1 = b.unique2
@@ -3584,24 +3633,28 @@ select b.unique1 from
   join int4_tbl i1 on b.thousand = f1
   right join int4_tbl i2 on i2.f1 = b.tenthous
   order by 1;
-                                       QUERY PLAN                                        
------------------------------------------------------------------------------------------
+                                   QUERY PLAN                                   
+--------------------------------------------------------------------------------
  Sort
    Sort Key: b.unique1
    ->  Nested Loop Left Join
          ->  Seq Scan on int4_tbl i2
-         ->  Nested Loop Left Join
-               Join Filter: (b.unique1 = 42)
-               ->  Nested Loop
+         ->  Nested Loop
+               Join Filter: (b.thousand = i1.f1)
+               ->  Nested Loop Left Join
+                     Join Filter: (b.unique1 = 42)
                      ->  Nested Loop
-                           ->  Seq Scan on int4_tbl i1
                            ->  Index Scan using tenk1_thous_tenthous on tenk1 b
-                                 Index Cond: ((thousand = i1.f1) AND (tenthous = i2.f1))
-                     ->  Index Scan using tenk1_unique1 on tenk1 a
-                           Index Cond: (unique1 = b.unique2)
-               ->  Index Only Scan using tenk1_thous_tenthous on tenk1 c
-                     Index Cond: (thousand = a.thousand)
-(15 rows)
+                                 Skip scan: All
+                                 Index Cond: (tenthous = i2.f1)
+                           ->  Index Scan using tenk1_unique1 on tenk1 a
+                                 Skip scan: All
+                                 Index Cond: (unique1 = b.unique2)
+                     ->  Index Only Scan using tenk1_thous_tenthous on tenk1 c
+                           Skip scan: All
+                           Index Cond: (thousand = a.thousand)
+               ->  Seq Scan on int4_tbl i1
+(19 rows)
 
 select b.unique1 from
   tenk1 a join tenk1 b on a.unique1 = b.unique2
@@ -3632,8 +3685,9 @@ order by fault;
    Filter: ((COALESCE(tenk1.unique1, '-1'::integer) + int8_tbl.q1) = 122)
    ->  Seq Scan on int8_tbl
    ->  Index Scan using tenk1_unique2 on tenk1
+         Skip scan: All
          Index Cond: (unique2 = int8_tbl.q2)
-(5 rows)
+(6 rows)
 
 select * from
 (
@@ -3687,8 +3741,9 @@ select q1, unique2, thousand, hundred
    Filter: ((COALESCE(b.thousand, 123) = a.q1) AND (a.q1 = COALESCE(b.hundred, 123)))
    ->  Seq Scan on int8_tbl a
    ->  Index Scan using tenk1_unique2 on tenk1 b
+         Skip scan: All
          Index Cond: (unique2 = a.q1)
-(5 rows)
+(6 rows)
 
 select q1, unique2, thousand, hundred
   from int8_tbl a left join tenk1 b on q1 = unique2
@@ -3707,8 +3762,9 @@ select f1, unique2, case when unique2 is null then f1 else 0 end
    Filter: (CASE WHEN (b.unique2 IS NULL) THEN a.f1 ELSE 0 END = 0)
    ->  Seq Scan on int4_tbl a
    ->  Index Only Scan using tenk1_unique2 on tenk1 b
+         Skip scan: All
          Index Cond: (unique2 = a.f1)
-(5 rows)
+(6 rows)
 
 select f1, unique2, case when unique2 is null then f1 else 0 end
   from int4_tbl a left join tenk1 b on f1 = unique2
@@ -3731,14 +3787,17 @@ select a.unique1, b.unique1, c.unique1, coalesce(b.twothousand, a.twothousand)
    ->  Nested Loop Left Join
          Filter: (COALESCE(b.twothousand, a.twothousand) = 44)
          ->  Index Scan using tenk1_unique2 on tenk1 a
+               Skip scan: All
                Index Cond: (unique2 < 10)
          ->  Bitmap Heap Scan on tenk1 b
                Recheck Cond: (thousand = a.unique1)
                ->  Bitmap Index Scan on tenk1_thous_tenthous
+                     Skip scan: All
                      Index Cond: (thousand = a.unique1)
    ->  Index Scan using tenk1_unique2 on tenk1 c
+         Skip scan: All
          Index Cond: ((unique2 = COALESCE(b.twothousand, a.twothousand)) AND (unique2 = 44))
-(11 rows)
+(14 rows)
 
 select a.unique1, b.unique1, c.unique1, coalesce(b.twothousand, a.twothousand)
   from tenk1 a left join tenk1 b on b.thousand = a.unique1                        left join tenk1 c on c.unique2 = coalesce(b.twothousand, a.twothousand)
@@ -3778,8 +3837,9 @@ using (join_key);
                      Output: i1.f1
                ->  Index Only Scan using tenk1_unique2 on public.tenk1 i2
                      Output: i2.unique2
+                     Skip scan: All
                      Index Cond: (i2.unique2 = i1.f1)
-(14 rows)
+(15 rows)
 
 select foo1.join_key as foo1_id, foo3.join_key AS foo3_id, bug_field from
   (values (0),(1)) foo1(join_key)
@@ -4281,8 +4341,9 @@ explain (costs off)
    ->  Seq Scan on int4_tbl a
          Filter: (f1 = 0)
    ->  Index Scan using tenk1_unique2 on tenk1 b
+         Skip scan: All
          Index Cond: (unique2 = 0)
-(6 rows)
+(7 rows)
 
 explain (costs off)
   select * from tenk1 a full join tenk1 b using(unique2) where unique2 = 42;
@@ -4291,10 +4352,12 @@ explain (costs off)
  Merge Full Join
    Merge Cond: (a.unique2 = b.unique2)
    ->  Index Scan using tenk1_unique2 on tenk1 a
+         Skip scan: All
          Index Cond: (unique2 = 42)
    ->  Index Scan using tenk1_unique2 on tenk1 b
+         Skip scan: All
          Index Cond: (unique2 = 42)
-(6 rows)
+(8 rows)
 
 --
 -- test that quals attached to an outer join have correct semantics,
@@ -4424,10 +4487,11 @@ select d.* from d left join (select * from b group by b.id, b.c_id) s
    ->  Group
          Group Key: b.id
          ->  Index Scan using b_pkey on b
+               Skip scan: All
    ->  Sort
          Sort Key: d.a
          ->  Seq Scan on d
-(8 rows)
+(9 rows)
 
 -- similarly, but keying off a DISTINCT clause
 explain (costs off)
@@ -4539,8 +4603,9 @@ select p.* from
  Result
    One-Time Filter: false
    ->  Index Scan using parent_pkey on parent p
+         Skip scan: All
          Index Cond: (k = 1)
-(4 rows)
+(5 rows)
 
 select p.* from
   (parent p left join child c on (p.k = c.k)) join parent x on p.k = x.k
@@ -4632,11 +4697,12 @@ where ss.stringu2 !~* ss.case1;
    ->  Nested Loop
          ->  Seq Scan on int4_tbl i4
          ->  Index Scan using tenk1_unique2 on tenk1 t1
+               Skip scan: All
                Index Cond: (unique2 = i4.f1)
                Filter: (stringu2 !~* CASE ten WHEN 0 THEN 'doh!'::text ELSE NULL::text END)
    ->  Materialize
          ->  Seq Scan on text_tbl t0
-(9 rows)
+(10 rows)
 
 select t0.*
 from
@@ -4723,8 +4789,9 @@ explain (costs off)
  Nested Loop
    ->  Seq Scan on int4_tbl b
    ->  Index Scan using tenk1_unique1 on tenk1 a
+         Skip scan: All
          Index Cond: (unique1 = b.f1)
-(4 rows)
+(5 rows)
 
 select unique2, x.*
 from int4_tbl x, lateral (select unique2 from tenk1 where f1 = unique1) ss;
@@ -4741,8 +4808,9 @@ explain (costs off)
  Nested Loop
    ->  Seq Scan on int4_tbl x
    ->  Index Scan using tenk1_unique1 on tenk1
+         Skip scan: All
          Index Cond: (unique1 = x.f1)
-(4 rows)
+(5 rows)
 
 explain (costs off)
   select unique2, x.*
@@ -4752,8 +4820,9 @@ explain (costs off)
  Nested Loop
    ->  Seq Scan on int4_tbl x
    ->  Index Scan using tenk1_unique1 on tenk1
+         Skip scan: All
          Index Cond: (unique1 = x.f1)
-(4 rows)
+(5 rows)
 
 select unique2, x.*
 from int4_tbl x left join lateral (select unique1, unique2 from tenk1 where f1 = unique1) ss on true;
@@ -4774,8 +4843,9 @@ explain (costs off)
  Nested Loop Left Join
    ->  Seq Scan on int4_tbl x
    ->  Index Scan using tenk1_unique1 on tenk1
+         Skip scan: All
          Index Cond: (unique1 = x.f1)
-(4 rows)
+(5 rows)
 
 -- check scoping of lateral versus parent references
 -- the first of these should return int8_tbl.q2, the second int8_tbl.q1
@@ -4873,8 +4943,10 @@ explain (costs off)
    ->  Merge Join
          Merge Cond: (a.unique1 = b.unique2)
          ->  Index Only Scan using tenk1_unique1 on tenk1 a
+               Skip scan: All
          ->  Index Only Scan using tenk1_unique2 on tenk1 b
-(5 rows)
+               Skip scan: All
+(7 rows)
 
 select count(*) from tenk1 a,
   tenk1 b join lateral (values(a.unique1)) ss(x) on b.unique2 = ss.x;
@@ -4894,10 +4966,12 @@ explain (costs off)
          Hash Cond: ("*VALUES*".column1 = b.unique2)
          ->  Nested Loop
                ->  Index Only Scan using tenk1_unique1 on tenk1 a
+                     Skip scan: All
                ->  Values Scan on "*VALUES*"
          ->  Hash
                ->  Index Only Scan using tenk1_unique2 on tenk1 b
-(8 rows)
+                     Skip scan: All
+(10 rows)
 
 select count(*) from tenk1 a,
   tenk1 b join lateral (values(a.unique1),(-1)) ss(x) on b.unique2 = ss.x;
@@ -5638,8 +5712,9 @@ select * from
                Output: tenk1.unique1
                ->  Index Scan using tenk1_unique2 on public.tenk1
                      Output: tenk1.unique1
+                     Skip scan: All
                      Index Cond: (tenk1.unique2 = "*VALUES*".column2)
-(14 rows)
+(15 rows)
 
 select * from
   (values (0,9998), (1,1000)) v(id,x),
@@ -5867,14 +5942,18 @@ where f.c = 1;
    ->  Nested Loop Left Join
          ->  Nested Loop Left Join
                ->  Index Scan using fkest_c_key on fkest f
+                     Skip scan: All
                      Index Cond: (c = 1)
                ->  Index Only Scan using fkest1_pkey on fkest1 f1
+                     Skip scan: All
                      Index Cond: ((a = f.a) AND (b = f.b))
          ->  Index Only Scan using fkest1_pkey on fkest1 f2
+               Skip scan: All
                Index Cond: ((a = f.a) AND (b = f.b))
    ->  Index Only Scan using fkest1_pkey on fkest1 f3
+         Skip scan: All
          Index Cond: ((a = f.a) AND (b = f.b))
-(11 rows)
+(15 rows)
 
 rollback;
 --
@@ -6164,8 +6243,10 @@ where j1.id1 % 1000 = 1 and j2.id1 % 1000 = 1;
    Merge Cond: (j1.id1 = j2.id1)
    Join Filter: (j1.id2 = j2.id2)
    ->  Index Scan using j1_id1_idx on j1
+         Skip scan: All
    ->  Index Scan using j2_id1_idx on j2
-(5 rows)
+         Skip scan: All
+(7 rows)
 
 select * from j1
 inner join j2 on j1.id1 = j2.id1 and j1.id2 = j2.id2
@@ -6201,15 +6282,18 @@ where exists (select 1 from tenk1 t3
                Group Key: t3.thousand, t3.tenthous
                ->  Index Only Scan using tenk1_thous_tenthous on public.tenk1 t3
                      Output: t3.thousand, t3.tenthous
+                     Skip scan: All
          ->  Hash
                Output: t1.unique1
                ->  Index Only Scan using onek_unique1 on public.onek t1
                      Output: t1.unique1
+                     Skip scan: All
                      Index Cond: (t1.unique1 < 1)
    ->  Index Only Scan using tenk1_hundred on public.tenk1 t2
          Output: t2.hundred
+         Skip scan: All
          Index Cond: (t2.hundred = t3.tenthous)
-(18 rows)
+(21 rows)
 
 -- ... unless it actually is unique
 create table j3 as select unique1, tenthous from onek;
@@ -6229,13 +6313,16 @@ where exists (select 1 from j3
          Output: t1.unique1, j3.tenthous
          ->  Index Only Scan using onek_unique1 on public.onek t1
                Output: t1.unique1
+               Skip scan: All
                Index Cond: (t1.unique1 < 1)
          ->  Index Only Scan using j3_unique1_tenthous_idx on public.j3
                Output: j3.unique1, j3.tenthous
+               Skip scan: All
                Index Cond: (j3.unique1 = t1.unique1)
    ->  Index Only Scan using tenk1_hundred on public.tenk1 t2
          Output: t2.hundred
+         Skip scan: All
          Index Cond: (t2.hundred = j3.tenthous)
-(13 rows)
+(16 rows)
 
 drop table j3;
diff --git a/src/test/regress/expected/limit.out b/src/test/regress/expected/limit.out
index c18f547cbd..3e7085b379 100644
--- a/src/test/regress/expected/limit.out
+++ b/src/test/regress/expected/limit.out
@@ -322,7 +322,8 @@ select unique1, unique2, nextval('testseq')
    Output: unique1, unique2, (nextval('testseq'::regclass))
    ->  Index Scan using tenk1_unique2 on public.tenk1
          Output: unique1, unique2, nextval('testseq'::regclass)
-(4 rows)
+         Skip scan: All
+(5 rows)
 
 select unique1, unique2, nextval('testseq')
   from tenk1 order by unique2 limit 10;
@@ -395,7 +396,8 @@ select unique1, unique2, generate_series(1,10)
          Output: unique1, unique2, generate_series(1, 10)
          ->  Index Scan using tenk1_unique2 on public.tenk1
                Output: unique1, unique2, two, four, ten, twenty, hundred, thousand, twothousand, fivethous, tenthous, odd, even, stringu1, stringu2, string4
-(6 rows)
+               Skip scan: All
+(7 rows)
 
 select unique1, unique2, generate_series(1,10)
   from tenk1 order by unique2 limit 7;
@@ -492,7 +494,8 @@ select sum(tenthous) as s1, sum(tenthous) + random()*0 as s2
          Group Key: tenk1.thousand
          ->  Index Only Scan using tenk1_thous_tenthous on public.tenk1
                Output: thousand, tenthous
-(7 rows)
+               Skip scan: All
+(8 rows)
 
 select sum(tenthous) as s1, sum(tenthous) + random()*0 as s2
   from tenk1 group by thousand order by thousand limit 3;
diff --git a/src/test/regress/expected/misc_functions.out b/src/test/regress/expected/misc_functions.out
index d3acb98d04..ee6bcc4b1b 100644
--- a/src/test/regress/expected/misc_functions.out
+++ b/src/test/regress/expected/misc_functions.out
@@ -232,8 +232,9 @@ WHERE my_int_eq(a.unique2, 42);
    ->  Seq Scan on tenk1 a
          Filter: my_int_eq(unique2, 42)
    ->  Index Scan using tenk1_unique1 on tenk1 b
+         Skip scan: All
          Index Cond: (unique1 = a.unique1)
-(5 rows)
+(6 rows)
 
 -- Also test non-default rowcount estimate
 CREATE FUNCTION my_gen_series(int, int) RETURNS SETOF integer
@@ -258,6 +259,7 @@ SELECT * FROM tenk1 a JOIN my_gen_series(1,10) g ON a.unique1 = g;
  Nested Loop
    ->  Function Scan on my_gen_series g
    ->  Index Scan using tenk1_unique1 on tenk1 a
+         Skip scan: All
          Index Cond: (unique1 = g.g)
-(4 rows)
+(5 rows)
 
diff --git a/src/test/regress/expected/partition_join.out b/src/test/regress/expected/partition_join.out
index b3fbe47bde..a1a52212a7 100644
--- a/src/test/regress/expected/partition_join.out
+++ b/src/test/regress/expected/partition_join.out
@@ -126,8 +126,9 @@ SELECT t1.a, t1.c, t2.b, t2.c FROM prt1 t1 RIGHT JOIN prt2 t2 ON t1.a = t2.b WHE
                ->  Seq Scan on prt2_p3 t2_3
                      Filter: (a = 0)
                ->  Index Scan using iprt1_p3_a on prt1_p3 t1_3
+                     Skip scan: All
                      Index Cond: (a = t2_3.b)
-(20 rows)
+(21 rows)
 
 SELECT t1.a, t1.c, t2.b, t2.c FROM prt1 t1 RIGHT JOIN prt2 t2 ON t1.a = t2.b WHERE t2.a = 0 ORDER BY t1.a, t2.b;
   a  |  c   |  b  |  c   
@@ -366,26 +367,32 @@ SELECT * FROM prt1 t1 LEFT JOIN LATERAL
                      Filter: (b = 0)
                ->  Nested Loop
                      ->  Index Only Scan using iprt1_p1_a on prt1_p1 t2_1
+                           Skip scan: All
                            Index Cond: (a = t1_1.a)
                      ->  Index Scan using iprt2_p1_b on prt2_p1 t3_1
+                           Skip scan: All
                            Index Cond: (b = t2_1.a)
          ->  Nested Loop Left Join
                ->  Seq Scan on prt1_p2 t1_2
                      Filter: (b = 0)
                ->  Nested Loop
                      ->  Index Only Scan using iprt1_p2_a on prt1_p2 t2_2
+                           Skip scan: All
                            Index Cond: (a = t1_2.a)
                      ->  Index Scan using iprt2_p2_b on prt2_p2 t3_2
+                           Skip scan: All
                            Index Cond: (b = t2_2.a)
          ->  Nested Loop Left Join
                ->  Seq Scan on prt1_p3 t1_3
                      Filter: (b = 0)
                ->  Nested Loop
                      ->  Index Only Scan using iprt1_p3_a on prt1_p3 t2_3
+                           Skip scan: All
                            Index Cond: (a = t1_3.a)
                      ->  Index Scan using iprt2_p3_b on prt2_p3 t3_3
+                           Skip scan: All
                            Index Cond: (b = t2_3.a)
-(27 rows)
+(33 rows)
 
 SELECT * FROM prt1 t1 LEFT JOIN LATERAL
 			  (SELECT t2.a AS t2a, t3.a AS t3a, least(t1.a,t2.a,t3.b) FROM prt1 t2 JOIN prt2 t3 ON (t2.a = t3.b)) ss
@@ -609,6 +616,7 @@ SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM prt1 t1, prt2 t2, prt1_e t
                            ->  Seq Scan on prt1_p1 t1_1
                                  Filter: (b = 0)
                ->  Index Scan using iprt1_e_p1_ab2 on prt1_e_p1 t3_1
+                     Skip scan: All
                      Index Cond: (((a + b) / 2) = t2_1.b)
          ->  Nested Loop
                Join Filter: (t1_2.a = ((t3_2.a + t3_2.b) / 2))
@@ -619,6 +627,7 @@ SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM prt1 t1, prt2 t2, prt1_e t
                            ->  Seq Scan on prt1_p2 t1_2
                                  Filter: (b = 0)
                ->  Index Scan using iprt1_e_p2_ab2 on prt1_e_p2 t3_2
+                     Skip scan: All
                      Index Cond: (((a + b) / 2) = t2_2.b)
          ->  Nested Loop
                Join Filter: (t1_3.a = ((t3_3.a + t3_3.b) / 2))
@@ -629,8 +638,9 @@ SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM prt1 t1, prt2 t2, prt1_e t
                            ->  Seq Scan on prt1_p3 t1_3
                                  Filter: (b = 0)
                ->  Index Scan using iprt1_e_p3_ab2 on prt1_e_p3 t3_3
+                     Skip scan: All
                      Index Cond: (((a + b) / 2) = t2_3.b)
-(33 rows)
+(36 rows)
 
 SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM prt1 t1, prt2 t2, prt1_e t3 WHERE t1.a = t2.b AND t1.a = (t3.a + t3.b)/2 AND t1.b = 0 ORDER BY t1.a, t2.b;
   a  |  c   |  b  |  c   | ?column? | c 
@@ -712,6 +722,7 @@ SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2
                            ->  Seq Scan on prt1_e_p1 t3_1
                                  Filter: (c = 0)
                ->  Index Scan using iprt2_p1_b on prt2_p1 t2_1
+                     Skip scan: All
                      Index Cond: (b = t1_1.a)
          ->  Nested Loop Left Join
                ->  Hash Right Join
@@ -721,6 +732,7 @@ SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2
                            ->  Seq Scan on prt1_e_p2 t3_2
                                  Filter: (c = 0)
                ->  Index Scan using iprt2_p2_b on prt2_p2 t2_2
+                     Skip scan: All
                      Index Cond: (b = t1_2.a)
          ->  Nested Loop Left Join
                ->  Hash Right Join
@@ -730,8 +742,9 @@ SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2
                            ->  Seq Scan on prt1_e_p3 t3_3
                                  Filter: (c = 0)
                ->  Index Scan using iprt2_p3_b on prt2_p3 t2_3
+                     Skip scan: All
                      Index Cond: (b = t1_3.a)
-(30 rows)
+(33 rows)
 
 SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2 ON t1.a = t2.b) RIGHT JOIN prt1_e t3 ON (t1.a = (t3.a + t3.b)/2) WHERE t3.c = 0 ORDER BY t1.a, t2.b, t3.a + t3.b;
   a  |  c   |  b  |  c   | ?column? | c 
@@ -826,6 +839,7 @@ SELECT t1.* FROM prt1 t1 WHERE t1.a IN (SELECT t1.b FROM prt2 t1, prt1_e t2 WHER
                                  ->  Seq Scan on prt2_p1 t1_5
                                        Filter: (a = 0)
                ->  Index Scan using iprt1_p1_a on prt1_p1 t1_2
+                     Skip scan: All
                      Index Cond: (a = ((t2_1.a + t2_1.b) / 2))
                      Filter: (b = 0)
          ->  Nested Loop
@@ -839,6 +853,7 @@ SELECT t1.* FROM prt1 t1 WHERE t1.a IN (SELECT t1.b FROM prt2 t1, prt1_e t2 WHER
                                  ->  Seq Scan on prt2_p2 t1_6
                                        Filter: (a = 0)
                ->  Index Scan using iprt1_p2_a on prt1_p2 t1_3
+                     Skip scan: All
                      Index Cond: (a = ((t2_2.a + t2_2.b) / 2))
                      Filter: (b = 0)
          ->  Nested Loop
@@ -849,11 +864,13 @@ SELECT t1.* FROM prt1 t1 WHERE t1.a IN (SELECT t1.b FROM prt2 t1, prt1_e t2 WHER
                            ->  Seq Scan on prt2_p3 t1_7
                                  Filter: (a = 0)
                            ->  Index Scan using iprt1_e_p3_ab2 on prt1_e_p3 t2_3
+                                 Skip scan: All
                                  Index Cond: (((a + b) / 2) = t1_7.b)
                ->  Index Scan using iprt1_p3_a on prt1_p3 t1_4
+                     Skip scan: All
                      Index Cond: (a = ((t2_3.a + t2_3.b) / 2))
                      Filter: (b = 0)
-(41 rows)
+(45 rows)
 
 SELECT t1.* FROM prt1 t1 WHERE t1.a IN (SELECT t1.b FROM prt2 t1, prt1_e t2 WHERE t1.a = 0 AND t1.b = (t2.a + t2.b)/2) AND t1.b = 0 ORDER BY t1.a;
   a  | b |  c   
@@ -881,6 +898,7 @@ SELECT t1.* FROM prt1 t1 WHERE t1.a IN (SELECT t1.b FROM prt2 t1 WHERE t1.b IN (
                                  ->  Seq Scan on prt1_e_p1 t1_9
                                        Filter: (c = 0)
                ->  Index Scan using iprt1_p1_a on prt1_p1 t1_3
+                     Skip scan: All
                      Index Cond: (a = t1_6.b)
                      Filter: (b = 0)
          ->  Nested Loop
@@ -893,6 +911,7 @@ SELECT t1.* FROM prt1 t1 WHERE t1.a IN (SELECT t1.b FROM prt2 t1 WHERE t1.b IN (
                                  ->  Seq Scan on prt1_e_p2 t1_10
                                        Filter: (c = 0)
                ->  Index Scan using iprt1_p2_a on prt1_p2 t1_4
+                     Skip scan: All
                      Index Cond: (a = t1_7.b)
                      Filter: (b = 0)
          ->  Nested Loop
@@ -905,9 +924,10 @@ SELECT t1.* FROM prt1 t1 WHERE t1.a IN (SELECT t1.b FROM prt2 t1 WHERE t1.b IN (
                                  ->  Seq Scan on prt1_e_p3 t1_11
                                        Filter: (c = 0)
                ->  Index Scan using iprt1_p3_a on prt1_p3 t1_5
+                     Skip scan: All
                      Index Cond: (a = t1_8.b)
                      Filter: (b = 0)
-(39 rows)
+(42 rows)
 
 SELECT t1.* FROM prt1 t1 WHERE t1.a IN (SELECT t1.b FROM prt2 t1 WHERE t1.b IN (SELECT (t1.a + t1.b)/2 FROM prt1_e t1 WHERE t1.c = 0)) AND t1.b = 0 ORDER BY t1.a;
   a  | b |  c   
@@ -1933,12 +1953,15 @@ SELECT t1.a, t1.c, t2.b, t2.c FROM prt1 t1 LEFT JOIN prt2 t2 ON (t1.a < t2.b);
          ->  Seq Scan on prt1_p3 t1_3
    ->  Append
          ->  Index Scan using iprt2_p1_b on prt2_p1 t2_1
+               Skip scan: All
                Index Cond: (b > t1.a)
          ->  Index Scan using iprt2_p2_b on prt2_p2 t2_2
+               Skip scan: All
                Index Cond: (b > t1.a)
          ->  Index Scan using iprt2_p3_b on prt2_p3 t2_3
+               Skip scan: All
                Index Cond: (b > t1.a)
-(12 rows)
+(15 rows)
 
 -- equi-join with join condition on partial keys does not qualify for
 -- partitionwise join
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index 9c8f80da87..8708a54ada 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -2070,24 +2070,33 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                            Filter: (a = ANY ('{0,0,1}'::integer[]))
                      ->  Append (actual rows=N loops=N)
                            ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
+                                 Skip scan: All
                                  Index Cond: (a = a.a)
                            ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
+                                 Skip scan: All
                                  Index Cond: (a = a.a)
                            ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
+                                 Skip scan: All
                                  Index Cond: (a = a.a)
                            ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                 Skip scan: All
                                  Index Cond: (a = a.a)
                            ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                 Skip scan: All
                                  Index Cond: (a = a.a)
                            ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                 Skip scan: All
                                  Index Cond: (a = a.a)
                            ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
+                                 Skip scan: All
                                  Index Cond: (a = a.a)
                            ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
+                                 Skip scan: All
                                  Index Cond: (a = a.a)
                            ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
+                                 Skip scan: All
                                  Index Cond: (a = a.a)
-(27 rows)
+(36 rows)
 
 -- Ensure the same partitions are pruned when we make the nested loop
 -- parameter an Expr rather than a plain Param.
@@ -2104,24 +2113,33 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                            Filter: (a = ANY ('{0,0,1}'::integer[]))
                      ->  Append (actual rows=N loops=N)
                            ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
+                                 Skip scan: All
                                  Index Cond: (a = (a.a + 0))
                            ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
+                                 Skip scan: All
                                  Index Cond: (a = (a.a + 0))
                            ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
+                                 Skip scan: All
                                  Index Cond: (a = (a.a + 0))
                            ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                 Skip scan: All
                                  Index Cond: (a = (a.a + 0))
                            ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                 Skip scan: All
                                  Index Cond: (a = (a.a + 0))
                            ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                 Skip scan: All
                                  Index Cond: (a = (a.a + 0))
                            ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
+                                 Skip scan: All
                                  Index Cond: (a = (a.a + 0))
                            ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
+                                 Skip scan: All
                                  Index Cond: (a = (a.a + 0))
                            ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
+                                 Skip scan: All
                                  Index Cond: (a = (a.a + 0))
-(27 rows)
+(36 rows)
 
 insert into lprt_a values(3),(3);
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(1, 0, 3)');
@@ -2137,24 +2155,33 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                            Filter: (a = ANY ('{1,0,3}'::integer[]))
                      ->  Append (actual rows=N loops=N)
                            ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
+                                 Skip scan: All
                                  Index Cond: (a = a.a)
                            ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
+                                 Skip scan: All
                                  Index Cond: (a = a.a)
                            ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
+                                 Skip scan: All
                                  Index Cond: (a = a.a)
                            ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                 Skip scan: All
                                  Index Cond: (a = a.a)
                            ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                 Skip scan: All
                                  Index Cond: (a = a.a)
                            ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                 Skip scan: All
                                  Index Cond: (a = a.a)
                            ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (actual rows=N loops=N)
+                                 Skip scan: All
                                  Index Cond: (a = a.a)
                            ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (actual rows=N loops=N)
+                                 Skip scan: All
                                  Index Cond: (a = a.a)
                            ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (actual rows=N loops=N)
+                                 Skip scan: All
                                  Index Cond: (a = a.a)
-(27 rows)
+(36 rows)
 
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(1, 0, 0)');
                                         explain_parallel_append                                         
@@ -2170,24 +2197,33 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                            Rows Removed by Filter: N
                      ->  Append (actual rows=N loops=N)
                            ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
+                                 Skip scan: All
                                  Index Cond: (a = a.a)
                            ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
+                                 Skip scan: All
                                  Index Cond: (a = a.a)
                            ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
+                                 Skip scan: All
                                  Index Cond: (a = a.a)
                            ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                 Skip scan: All
                                  Index Cond: (a = a.a)
                            ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                 Skip scan: All
                                  Index Cond: (a = a.a)
                            ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                 Skip scan: All
                                  Index Cond: (a = a.a)
                            ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
+                                 Skip scan: All
                                  Index Cond: (a = a.a)
                            ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
+                                 Skip scan: All
                                  Index Cond: (a = a.a)
                            ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
+                                 Skip scan: All
                                  Index Cond: (a = a.a)
-(28 rows)
+(37 rows)
 
 delete from lprt_a where a = 1;
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(1, 0, 0)');
@@ -2204,24 +2240,33 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                            Rows Removed by Filter: N
                      ->  Append (actual rows=N loops=N)
                            ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (never executed)
+                                 Skip scan: All
                                  Index Cond: (a = a.a)
                            ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (never executed)
+                                 Skip scan: All
                                  Index Cond: (a = a.a)
                            ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (never executed)
+                                 Skip scan: All
                                  Index Cond: (a = a.a)
                            ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                 Skip scan: All
                                  Index Cond: (a = a.a)
                            ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                 Skip scan: All
                                  Index Cond: (a = a.a)
                            ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                 Skip scan: All
                                  Index Cond: (a = a.a)
                            ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
+                                 Skip scan: All
                                  Index Cond: (a = a.a)
                            ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
+                                 Skip scan: All
                                  Index Cond: (a = a.a)
                            ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
+                                 Skip scan: All
                                  Index Cond: (a = a.a)
-(28 rows)
+(37 rows)
 
 reset enable_hashjoin;
 reset enable_mergejoin;
@@ -2245,48 +2290,57 @@ select * from ab where a = (select max(a) from lprt_a) and b = (select max(a)-1
          Recheck Cond: (a = $0)
          Filter: (b = $1)
          ->  Bitmap Index Scan on ab_a1_b1_a_idx (never executed)
+               Skip scan: All
                Index Cond: (a = $0)
    ->  Bitmap Heap Scan on ab_a1_b2 ab_2 (never executed)
          Recheck Cond: (a = $0)
          Filter: (b = $1)
          ->  Bitmap Index Scan on ab_a1_b2_a_idx (never executed)
+               Skip scan: All
                Index Cond: (a = $0)
    ->  Bitmap Heap Scan on ab_a1_b3 ab_3 (never executed)
          Recheck Cond: (a = $0)
          Filter: (b = $1)
          ->  Bitmap Index Scan on ab_a1_b3_a_idx (never executed)
+               Skip scan: All
                Index Cond: (a = $0)
    ->  Bitmap Heap Scan on ab_a2_b1 ab_4 (never executed)
          Recheck Cond: (a = $0)
          Filter: (b = $1)
          ->  Bitmap Index Scan on ab_a2_b1_a_idx (never executed)
+               Skip scan: All
                Index Cond: (a = $0)
    ->  Bitmap Heap Scan on ab_a2_b2 ab_5 (never executed)
          Recheck Cond: (a = $0)
          Filter: (b = $1)
          ->  Bitmap Index Scan on ab_a2_b2_a_idx (never executed)
+               Skip scan: All
                Index Cond: (a = $0)
    ->  Bitmap Heap Scan on ab_a2_b3 ab_6 (never executed)
          Recheck Cond: (a = $0)
          Filter: (b = $1)
          ->  Bitmap Index Scan on ab_a2_b3_a_idx (never executed)
+               Skip scan: All
                Index Cond: (a = $0)
    ->  Bitmap Heap Scan on ab_a3_b1 ab_7 (never executed)
          Recheck Cond: (a = $0)
          Filter: (b = $1)
          ->  Bitmap Index Scan on ab_a3_b1_a_idx (never executed)
+               Skip scan: All
                Index Cond: (a = $0)
    ->  Bitmap Heap Scan on ab_a3_b2 ab_8 (actual rows=0 loops=1)
          Recheck Cond: (a = $0)
          Filter: (b = $1)
          ->  Bitmap Index Scan on ab_a3_b2_a_idx (actual rows=0 loops=1)
+               Skip scan: All
                Index Cond: (a = $0)
    ->  Bitmap Heap Scan on ab_a3_b3 ab_9 (never executed)
          Recheck Cond: (a = $0)
          Filter: (b = $1)
          ->  Bitmap Index Scan on ab_a3_b3_a_idx (never executed)
+               Skip scan: All
                Index Cond: (a = $0)
-(52 rows)
+(61 rows)
 
 -- Test run-time partition pruning with UNION ALL parents
 explain (analyze, costs off, summary off, timing off)
@@ -2301,16 +2355,19 @@ select * from (select * from ab where a = 1 union all select * from ab) ab where
                Recheck Cond: (a = 1)
                Filter: (b = $0)
                ->  Bitmap Index Scan on ab_a1_b1_a_idx (actual rows=0 loops=1)
+                     Skip scan: All
                      Index Cond: (a = 1)
          ->  Bitmap Heap Scan on ab_a1_b2 ab_12 (never executed)
                Recheck Cond: (a = 1)
                Filter: (b = $0)
                ->  Bitmap Index Scan on ab_a1_b2_a_idx (never executed)
+                     Skip scan: All
                      Index Cond: (a = 1)
          ->  Bitmap Heap Scan on ab_a1_b3 ab_13 (never executed)
                Recheck Cond: (a = 1)
                Filter: (b = $0)
                ->  Bitmap Index Scan on ab_a1_b3_a_idx (never executed)
+                     Skip scan: All
                      Index Cond: (a = 1)
    ->  Seq Scan on ab_a1_b1 ab_1 (actual rows=0 loops=1)
          Filter: (b = $0)
@@ -2330,7 +2387,7 @@ select * from (select * from ab where a = 1 union all select * from ab) ab where
          Filter: (b = $0)
    ->  Seq Scan on ab_a3_b3 ab_9 (never executed)
          Filter: (b = $0)
-(37 rows)
+(40 rows)
 
 -- A case containing a UNION ALL with a non-partitioned child.
 explain (analyze, costs off, summary off, timing off)
@@ -2345,16 +2402,19 @@ select * from (select * from ab where a = 1 union all (values(10,5)) union all s
                Recheck Cond: (a = 1)
                Filter: (b = $0)
                ->  Bitmap Index Scan on ab_a1_b1_a_idx (actual rows=0 loops=1)
+                     Skip scan: All
                      Index Cond: (a = 1)
          ->  Bitmap Heap Scan on ab_a1_b2 ab_12 (never executed)
                Recheck Cond: (a = 1)
                Filter: (b = $0)
                ->  Bitmap Index Scan on ab_a1_b2_a_idx (never executed)
+                     Skip scan: All
                      Index Cond: (a = 1)
          ->  Bitmap Heap Scan on ab_a1_b3 ab_13 (never executed)
                Recheck Cond: (a = 1)
                Filter: (b = $0)
                ->  Bitmap Index Scan on ab_a1_b3_a_idx (never executed)
+                     Skip scan: All
                      Index Cond: (a = 1)
    ->  Result (actual rows=0 loops=1)
          One-Time Filter: (5 = $0)
@@ -2376,7 +2436,7 @@ select * from (select * from ab where a = 1 union all (values(10,5)) union all s
          Filter: (b = $0)
    ->  Seq Scan on ab_a3_b3 ab_9 (never executed)
          Filter: (b = $0)
-(39 rows)
+(42 rows)
 
 -- Another UNION ALL test, but containing a mix of exec init and exec run-time pruning.
 create table xy_1 (x int, y int);
@@ -2446,63 +2506,75 @@ update ab_a1 set b = 3 from ab where ab.a = 1 and ab.a = ab_a1.a;
                ->  Bitmap Heap Scan on ab_a1_b1 ab_1 (actual rows=0 loops=1)
                      Recheck Cond: (a = 1)
                      ->  Bitmap Index Scan on ab_a1_b1_a_idx (actual rows=0 loops=1)
+                           Skip scan: All
                            Index Cond: (a = 1)
                ->  Bitmap Heap Scan on ab_a1_b2 ab_2 (actual rows=1 loops=1)
                      Recheck Cond: (a = 1)
                      Heap Blocks: exact=1
                      ->  Bitmap Index Scan on ab_a1_b2_a_idx (actual rows=1 loops=1)
+                           Skip scan: All
                            Index Cond: (a = 1)
                ->  Bitmap Heap Scan on ab_a1_b3 ab_3 (actual rows=0 loops=1)
                      Recheck Cond: (a = 1)
                      ->  Bitmap Index Scan on ab_a1_b3_a_idx (actual rows=0 loops=1)
+                           Skip scan: All
                            Index Cond: (a = 1)
          ->  Materialize (actual rows=0 loops=1)
                ->  Bitmap Heap Scan on ab_a1_b1 ab_a1_1 (actual rows=0 loops=1)
                      Recheck Cond: (a = 1)
                      ->  Bitmap Index Scan on ab_a1_b1_a_idx (actual rows=0 loops=1)
+                           Skip scan: All
                            Index Cond: (a = 1)
    ->  Nested Loop (actual rows=1 loops=1)
          ->  Append (actual rows=1 loops=1)
                ->  Bitmap Heap Scan on ab_a1_b1 ab_1 (actual rows=0 loops=1)
                      Recheck Cond: (a = 1)
                      ->  Bitmap Index Scan on ab_a1_b1_a_idx (actual rows=0 loops=1)
+                           Skip scan: All
                            Index Cond: (a = 1)
                ->  Bitmap Heap Scan on ab_a1_b2 ab_2 (actual rows=1 loops=1)
                      Recheck Cond: (a = 1)
                      Heap Blocks: exact=1
                      ->  Bitmap Index Scan on ab_a1_b2_a_idx (actual rows=1 loops=1)
+                           Skip scan: All
                            Index Cond: (a = 1)
                ->  Bitmap Heap Scan on ab_a1_b3 ab_3 (actual rows=0 loops=1)
                      Recheck Cond: (a = 1)
                      ->  Bitmap Index Scan on ab_a1_b3_a_idx (actual rows=1 loops=1)
+                           Skip scan: All
                            Index Cond: (a = 1)
          ->  Materialize (actual rows=1 loops=1)
                ->  Bitmap Heap Scan on ab_a1_b2 ab_a1_2 (actual rows=1 loops=1)
                      Recheck Cond: (a = 1)
                      Heap Blocks: exact=1
                      ->  Bitmap Index Scan on ab_a1_b2_a_idx (actual rows=1 loops=1)
+                           Skip scan: All
                            Index Cond: (a = 1)
    ->  Nested Loop (actual rows=0 loops=1)
          ->  Append (actual rows=1 loops=1)
                ->  Bitmap Heap Scan on ab_a1_b1 ab_1 (actual rows=0 loops=1)
                      Recheck Cond: (a = 1)
                      ->  Bitmap Index Scan on ab_a1_b1_a_idx (actual rows=0 loops=1)
+                           Skip scan: All
                            Index Cond: (a = 1)
                ->  Bitmap Heap Scan on ab_a1_b2 ab_2 (actual rows=1 loops=1)
                      Recheck Cond: (a = 1)
                      Heap Blocks: exact=1
                      ->  Bitmap Index Scan on ab_a1_b2_a_idx (actual rows=1 loops=1)
+                           Skip scan: All
                            Index Cond: (a = 1)
                ->  Bitmap Heap Scan on ab_a1_b3 ab_3 (actual rows=0 loops=1)
                      Recheck Cond: (a = 1)
                      ->  Bitmap Index Scan on ab_a1_b3_a_idx (actual rows=1 loops=1)
+                           Skip scan: All
                            Index Cond: (a = 1)
          ->  Materialize (actual rows=0 loops=1)
                ->  Bitmap Heap Scan on ab_a1_b3 ab_a1_3 (actual rows=0 loops=1)
                      Recheck Cond: (a = 1)
                      ->  Bitmap Index Scan on ab_a1_b3_a_idx (actual rows=1 loops=1)
+                           Skip scan: All
                            Index Cond: (a = 1)
-(65 rows)
+(77 rows)
 
 table ab;
  a | b 
@@ -2593,18 +2665,24 @@ select * from tbl1 join tprt on tbl1.col1 > tprt.col1;
    ->  Seq Scan on tbl1 (actual rows=2 loops=1)
    ->  Append (actual rows=3 loops=2)
          ->  Index Scan using tprt1_idx on tprt_1 (actual rows=2 loops=2)
+               Skip scan: All
                Index Cond: (col1 < tbl1.col1)
          ->  Index Scan using tprt2_idx on tprt_2 (actual rows=2 loops=1)
+               Skip scan: All
                Index Cond: (col1 < tbl1.col1)
          ->  Index Scan using tprt3_idx on tprt_3 (never executed)
+               Skip scan: All
                Index Cond: (col1 < tbl1.col1)
          ->  Index Scan using tprt4_idx on tprt_4 (never executed)
+               Skip scan: All
                Index Cond: (col1 < tbl1.col1)
          ->  Index Scan using tprt5_idx on tprt_5 (never executed)
+               Skip scan: All
                Index Cond: (col1 < tbl1.col1)
          ->  Index Scan using tprt6_idx on tprt_6 (never executed)
+               Skip scan: All
                Index Cond: (col1 < tbl1.col1)
-(15 rows)
+(21 rows)
 
 explain (analyze, costs off, summary off, timing off)
 select * from tbl1 join tprt on tbl1.col1 = tprt.col1;
@@ -2614,18 +2692,24 @@ select * from tbl1 join tprt on tbl1.col1 = tprt.col1;
    ->  Seq Scan on tbl1 (actual rows=2 loops=1)
    ->  Append (actual rows=1 loops=2)
          ->  Index Scan using tprt1_idx on tprt_1 (never executed)
+               Skip scan: All
                Index Cond: (col1 = tbl1.col1)
          ->  Index Scan using tprt2_idx on tprt_2 (actual rows=1 loops=2)
+               Skip scan: All
                Index Cond: (col1 = tbl1.col1)
          ->  Index Scan using tprt3_idx on tprt_3 (never executed)
+               Skip scan: All
                Index Cond: (col1 = tbl1.col1)
          ->  Index Scan using tprt4_idx on tprt_4 (never executed)
+               Skip scan: All
                Index Cond: (col1 = tbl1.col1)
          ->  Index Scan using tprt5_idx on tprt_5 (never executed)
+               Skip scan: All
                Index Cond: (col1 = tbl1.col1)
          ->  Index Scan using tprt6_idx on tprt_6 (never executed)
+               Skip scan: All
                Index Cond: (col1 = tbl1.col1)
-(15 rows)
+(21 rows)
 
 select tbl1.col1, tprt.col1 from tbl1
 inner join tprt on tbl1.col1 > tprt.col1
@@ -2659,18 +2743,24 @@ select * from tbl1 inner join tprt on tbl1.col1 > tprt.col1;
    ->  Seq Scan on tbl1 (actual rows=5 loops=1)
    ->  Append (actual rows=5 loops=5)
          ->  Index Scan using tprt1_idx on tprt_1 (actual rows=2 loops=5)
+               Skip scan: All
                Index Cond: (col1 < tbl1.col1)
          ->  Index Scan using tprt2_idx on tprt_2 (actual rows=3 loops=4)
+               Skip scan: All
                Index Cond: (col1 < tbl1.col1)
          ->  Index Scan using tprt3_idx on tprt_3 (actual rows=1 loops=2)
+               Skip scan: All
                Index Cond: (col1 < tbl1.col1)
          ->  Index Scan using tprt4_idx on tprt_4 (never executed)
+               Skip scan: All
                Index Cond: (col1 < tbl1.col1)
          ->  Index Scan using tprt5_idx on tprt_5 (never executed)
+               Skip scan: All
                Index Cond: (col1 < tbl1.col1)
          ->  Index Scan using tprt6_idx on tprt_6 (never executed)
+               Skip scan: All
                Index Cond: (col1 < tbl1.col1)
-(15 rows)
+(21 rows)
 
 explain (analyze, costs off, summary off, timing off)
 select * from tbl1 inner join tprt on tbl1.col1 = tprt.col1;
@@ -2680,18 +2770,24 @@ select * from tbl1 inner join tprt on tbl1.col1 = tprt.col1;
    ->  Seq Scan on tbl1 (actual rows=5 loops=1)
    ->  Append (actual rows=1 loops=5)
          ->  Index Scan using tprt1_idx on tprt_1 (never executed)
+               Skip scan: All
                Index Cond: (col1 = tbl1.col1)
          ->  Index Scan using tprt2_idx on tprt_2 (actual rows=1 loops=2)
+               Skip scan: All
                Index Cond: (col1 = tbl1.col1)
          ->  Index Scan using tprt3_idx on tprt_3 (actual rows=0 loops=3)
+               Skip scan: All
                Index Cond: (col1 = tbl1.col1)
          ->  Index Scan using tprt4_idx on tprt_4 (never executed)
+               Skip scan: All
                Index Cond: (col1 = tbl1.col1)
          ->  Index Scan using tprt5_idx on tprt_5 (never executed)
+               Skip scan: All
                Index Cond: (col1 = tbl1.col1)
          ->  Index Scan using tprt6_idx on tprt_6 (never executed)
+               Skip scan: All
                Index Cond: (col1 = tbl1.col1)
-(15 rows)
+(21 rows)
 
 select tbl1.col1, tprt.col1 from tbl1
 inner join tprt on tbl1.col1 > tprt.col1
@@ -2744,18 +2840,24 @@ select * from tbl1 join tprt on tbl1.col1 < tprt.col1;
    ->  Seq Scan on tbl1 (actual rows=1 loops=1)
    ->  Append (actual rows=1 loops=1)
          ->  Index Scan using tprt1_idx on tprt_1 (never executed)
+               Skip scan: All
                Index Cond: (col1 > tbl1.col1)
          ->  Index Scan using tprt2_idx on tprt_2 (never executed)
+               Skip scan: All
                Index Cond: (col1 > tbl1.col1)
          ->  Index Scan using tprt3_idx on tprt_3 (never executed)
+               Skip scan: All
                Index Cond: (col1 > tbl1.col1)
          ->  Index Scan using tprt4_idx on tprt_4 (never executed)
+               Skip scan: All
                Index Cond: (col1 > tbl1.col1)
          ->  Index Scan using tprt5_idx on tprt_5 (never executed)
+               Skip scan: All
                Index Cond: (col1 > tbl1.col1)
          ->  Index Scan using tprt6_idx on tprt_6 (actual rows=1 loops=1)
+               Skip scan: All
                Index Cond: (col1 > tbl1.col1)
-(15 rows)
+(21 rows)
 
 select tbl1.col1, tprt.col1 from tbl1
 inner join tprt on tbl1.col1 < tprt.col1
@@ -2776,18 +2878,24 @@ select * from tbl1 join tprt on tbl1.col1 = tprt.col1;
    ->  Seq Scan on tbl1 (actual rows=1 loops=1)
    ->  Append (actual rows=0 loops=1)
          ->  Index Scan using tprt1_idx on tprt_1 (never executed)
+               Skip scan: All
                Index Cond: (col1 = tbl1.col1)
          ->  Index Scan using tprt2_idx on tprt_2 (never executed)
+               Skip scan: All
                Index Cond: (col1 = tbl1.col1)
          ->  Index Scan using tprt3_idx on tprt_3 (never executed)
+               Skip scan: All
                Index Cond: (col1 = tbl1.col1)
          ->  Index Scan using tprt4_idx on tprt_4 (never executed)
+               Skip scan: All
                Index Cond: (col1 = tbl1.col1)
          ->  Index Scan using tprt5_idx on tprt_5 (never executed)
+               Skip scan: All
                Index Cond: (col1 = tbl1.col1)
          ->  Index Scan using tprt6_idx on tprt_6 (never executed)
+               Skip scan: All
                Index Cond: (col1 = tbl1.col1)
-(15 rows)
+(21 rows)
 
 select tbl1.col1, tprt.col1 from tbl1
 inner join tprt on tbl1.col1 = tprt.col1
@@ -3115,12 +3223,14 @@ explain (analyze, costs off, summary off, timing off) execute mt_q1(15);
    Sort Key: ma_test.b
    Subplans Removed: 1
    ->  Index Scan using ma_test_p2_b_idx on ma_test_p2 ma_test_1 (actual rows=1 loops=1)
+         Skip scan: All
          Filter: ((a >= $1) AND ((a % 10) = 5))
          Rows Removed by Filter: 9
    ->  Index Scan using ma_test_p3_b_idx on ma_test_p3 ma_test_2 (actual rows=1 loops=1)
+         Skip scan: All
          Filter: ((a >= $1) AND ((a % 10) = 5))
          Rows Removed by Filter: 9
-(9 rows)
+(11 rows)
 
 execute mt_q1(15);
  a  
@@ -3136,9 +3246,10 @@ explain (analyze, costs off, summary off, timing off) execute mt_q1(25);
    Sort Key: ma_test.b
    Subplans Removed: 2
    ->  Index Scan using ma_test_p3_b_idx on ma_test_p3 ma_test_1 (actual rows=1 loops=1)
+         Skip scan: All
          Filter: ((a >= $1) AND ((a % 10) = 5))
          Rows Removed by Filter: 9
-(6 rows)
+(7 rows)
 
 execute mt_q1(25);
  a  
@@ -3185,14 +3296,18 @@ explain (analyze, costs off, summary off, timing off) select * from ma_test wher
            InitPlan 1 (returns $0)
              ->  Limit (actual rows=1 loops=1)
                    ->  Index Scan using ma_test_p2_b_idx on ma_test_p2 (actual rows=1 loops=1)
+                         Skip scan: All
                          Index Cond: (b IS NOT NULL)
    ->  Index Scan using ma_test_p1_b_idx on ma_test_p1 ma_test_1 (never executed)
+         Skip scan: All
          Filter: (a >= $1)
    ->  Index Scan using ma_test_p2_b_idx on ma_test_p2 ma_test_2 (actual rows=10 loops=1)
+         Skip scan: All
          Filter: (a >= $1)
    ->  Index Scan using ma_test_p3_b_idx on ma_test_p3 ma_test_3 (actual rows=10 loops=1)
+         Skip scan: All
          Filter: (a >= $1)
-(14 rows)
+(18 rows)
 
 reset enable_seqscan;
 reset enable_sort;
diff --git a/src/test/regress/expected/plancache.out b/src/test/regress/expected/plancache.out
index 7d289b8c5e..2e75bad44d 100644
--- a/src/test/regress/expected/plancache.out
+++ b/src/test/regress/expected/plancache.out
@@ -290,8 +290,9 @@ explain (costs off) execute test_mode_pp(2);
 ----------------------------------------------------------
  Aggregate
    ->  Index Only Scan using test_mode_a_idx on test_mode
+         Skip scan: All
          Index Cond: (a = 2)
-(3 rows)
+(4 rows)
 
 -- force generic plan
 set plan_cache_mode to force_generic_plan;
@@ -351,7 +352,8 @@ explain (costs off) execute test_mode_pp(2);
 ----------------------------------------------------------
  Aggregate
    ->  Index Only Scan using test_mode_a_idx on test_mode
+         Skip scan: All
          Index Cond: (a = 2)
-(3 rows)
+(4 rows)
 
 drop table test_mode;
diff --git a/src/test/regress/expected/portals.out b/src/test/regress/expected/portals.out
index dc0d2ef7dd..5c18de3aa0 100644
--- a/src/test/regress/expected/portals.out
+++ b/src/test/regress/expected/portals.out
@@ -1253,8 +1253,9 @@ DECLARE c1 CURSOR FOR SELECT stringu1 FROM onek WHERE stringu1 = 'DZAAAA';
                  QUERY PLAN                  
 ---------------------------------------------
  Index Only Scan using onek_stringu1 on onek
+   Skip scan: All
    Index Cond: (stringu1 = 'DZAAAA'::name)
-(2 rows)
+(3 rows)
 
 DECLARE c1 CURSOR FOR SELECT stringu1 FROM onek WHERE stringu1 = 'DZAAAA';
 FETCH FROM c1;
diff --git a/src/test/regress/expected/privileges.out b/src/test/regress/expected/privileges.out
index c2d037b614..00a113bcd9 100644
--- a/src/test/regress/expected/privileges.out
+++ b/src/test/regress/expected/privileges.out
@@ -212,9 +212,10 @@ EXPLAIN (COSTS OFF) SELECT * FROM atest12v x, atest12v y WHERE x.a = y.b;
    ->  Seq Scan on atest12 atest12_1
          Filter: (b <<< 5)
    ->  Index Scan using atest12_a_idx on atest12
+         Skip scan: All
          Index Cond: (a = atest12_1.b)
          Filter: (b <<< 5)
-(6 rows)
+(7 rows)
 
 -- And this one.
 EXPLAIN (COSTS OFF) SELECT * FROM atest12 x, atest12 y
@@ -225,8 +226,9 @@ EXPLAIN (COSTS OFF) SELECT * FROM atest12 x, atest12 y
    ->  Seq Scan on atest12 y
          Filter: (abs(a) <<< 5)
    ->  Index Scan using atest12_a_idx on atest12 x
+         Skip scan: All
          Index Cond: (a = y.b)
-(5 rows)
+(6 rows)
 
 -- This should also be a nestloop, but the security barrier forces the inner
 -- scan to be materialized
@@ -261,9 +263,10 @@ EXPLAIN (COSTS OFF) SELECT * FROM atest12v x, atest12v y WHERE x.a = y.b;
    ->  Seq Scan on atest12 atest12_1
          Filter: (b <<< 5)
    ->  Index Scan using atest12_a_idx on atest12
+         Skip scan: All
          Index Cond: (a = atest12_1.b)
          Filter: (b <<< 5)
-(6 rows)
+(7 rows)
 
 EXPLAIN (COSTS OFF) SELECT * FROM atest12sbv x, atest12sbv y WHERE x.a = y.b;
                 QUERY PLAN                 
@@ -286,9 +289,10 @@ EXPLAIN (COSTS OFF) SELECT * FROM atest12v x, atest12v y
    ->  Seq Scan on atest12 atest12_1
          Filter: ((b <<< 5) AND (abs(a) <<< 5))
    ->  Index Scan using atest12_a_idx on atest12
+         Skip scan: All
          Index Cond: (a = atest12_1.b)
          Filter: (b <<< 5)
-(6 rows)
+(7 rows)
 
 -- But a security barrier view isolates the leaky operator.
 EXPLAIN (COSTS OFF) SELECT * FROM atest12sbv x, atest12sbv y
@@ -317,9 +321,10 @@ EXPLAIN (COSTS OFF) SELECT * FROM atest12v x, atest12v y WHERE x.a = y.b;
    ->  Seq Scan on atest12 atest12_1
          Filter: (b <<< 5)
    ->  Index Scan using atest12_a_idx on atest12
+         Skip scan: All
          Index Cond: (a = atest12_1.b)
          Filter: (b <<< 5)
-(6 rows)
+(7 rows)
 
 -- But not for this, due to lack of table-wide permissions needed
 -- to make use of the expression index's statistics.
diff --git a/src/test/regress/expected/regex.out b/src/test/regress/expected/regex.out
index 0923ad9b5b..1cd0fc95fa 100644
--- a/src/test/regress/expected/regex.out
+++ b/src/test/regress/expected/regex.out
@@ -299,49 +299,55 @@ explain (costs off) select * from pg_proc where proname ~ '^abc';
                               QUERY PLAN                              
 ----------------------------------------------------------------------
  Index Scan using pg_proc_proname_args_nsp_index on pg_proc
+   Skip scan: All
    Index Cond: ((proname >= 'abc'::text) AND (proname < 'abd'::text))
    Filter: (proname ~ '^abc'::text)
-(3 rows)
+(4 rows)
 
 explain (costs off) select * from pg_proc where proname ~ '^abc$';
                          QUERY PLAN                         
 ------------------------------------------------------------
  Index Scan using pg_proc_proname_args_nsp_index on pg_proc
+   Skip scan: All
    Index Cond: (proname = 'abc'::text)
    Filter: (proname ~ '^abc$'::text)
-(3 rows)
+(4 rows)
 
 explain (costs off) select * from pg_proc where proname ~ '^abcd*e';
                               QUERY PLAN                              
 ----------------------------------------------------------------------
  Index Scan using pg_proc_proname_args_nsp_index on pg_proc
+   Skip scan: All
    Index Cond: ((proname >= 'abc'::text) AND (proname < 'abd'::text))
    Filter: (proname ~ '^abcd*e'::text)
-(3 rows)
+(4 rows)
 
 explain (costs off) select * from pg_proc where proname ~ '^abc+d';
                               QUERY PLAN                              
 ----------------------------------------------------------------------
  Index Scan using pg_proc_proname_args_nsp_index on pg_proc
+   Skip scan: All
    Index Cond: ((proname >= 'abc'::text) AND (proname < 'abd'::text))
    Filter: (proname ~ '^abc+d'::text)
-(3 rows)
+(4 rows)
 
 explain (costs off) select * from pg_proc where proname ~ '^(abc)(def)';
                                  QUERY PLAN                                 
 ----------------------------------------------------------------------------
  Index Scan using pg_proc_proname_args_nsp_index on pg_proc
+   Skip scan: All
    Index Cond: ((proname >= 'abcdef'::text) AND (proname < 'abcdeg'::text))
    Filter: (proname ~ '^(abc)(def)'::text)
-(3 rows)
+(4 rows)
 
 explain (costs off) select * from pg_proc where proname ~ '^(abc)$';
                          QUERY PLAN                         
 ------------------------------------------------------------
  Index Scan using pg_proc_proname_args_nsp_index on pg_proc
+   Skip scan: All
    Index Cond: (proname = 'abc'::text)
    Filter: (proname ~ '^(abc)$'::text)
-(3 rows)
+(4 rows)
 
 explain (costs off) select * from pg_proc where proname ~ '^(abc)?d';
                QUERY PLAN               
@@ -354,9 +360,10 @@ explain (costs off) select * from pg_proc where proname ~ '^abcd(x|(?=\w\w)q)';
                                QUERY PLAN                               
 ------------------------------------------------------------------------
  Index Scan using pg_proc_proname_args_nsp_index on pg_proc
+   Skip scan: All
    Index Cond: ((proname >= 'abcd'::text) AND (proname < 'abce'::text))
    Filter: (proname ~ '^abcd(x|(?=\w\w)q)'::text)
-(3 rows)
+(4 rows)
 
 -- Test for infinite loop in pullback() (CVE-2007-4772)
 select 'a' ~ '($|^)*';
diff --git a/src/test/regress/expected/rowsecurity.out b/src/test/regress/expected/rowsecurity.out
index 9506aaef82..17d1c916cb 100644
--- a/src/test/regress/expected/rowsecurity.out
+++ b/src/test/regress/expected/rowsecurity.out
@@ -271,8 +271,9 @@ EXPLAIN (COSTS OFF) SELECT * FROM document WHERE f_leak(dtitle);
    Filter: ((dlevel <= $0) AND f_leak(dtitle))
    InitPlan 1 (returns $0)
      ->  Index Scan using uaccount_pkey on uaccount
+           Skip scan: All
            Index Cond: (pguser = CURRENT_USER)
-(5 rows)
+(6 rows)
 
 EXPLAIN (COSTS OFF) SELECT * FROM document NATURAL JOIN category WHERE f_leak(dtitle);
                         QUERY PLAN                         
@@ -281,12 +282,13 @@ EXPLAIN (COSTS OFF) SELECT * FROM document NATURAL JOIN category WHERE f_leak(dt
    Hash Cond: (category.cid = document.cid)
    InitPlan 1 (returns $0)
      ->  Index Scan using uaccount_pkey on uaccount
+           Skip scan: All
            Index Cond: (pguser = CURRENT_USER)
    ->  Seq Scan on category
    ->  Hash
          ->  Seq Scan on document
                Filter: ((dlevel <= $0) AND f_leak(dtitle))
-(9 rows)
+(10 rows)
 
 -- viewpoint from regress_rls_dave
 SET SESSION AUTHORIZATION regress_rls_dave;
@@ -335,8 +337,9 @@ EXPLAIN (COSTS OFF) SELECT * FROM document WHERE f_leak(dtitle);
    Filter: ((cid <> 44) AND (cid <> 44) AND (cid < 50) AND (dlevel <= $0) AND f_leak(dtitle))
    InitPlan 1 (returns $0)
      ->  Index Scan using uaccount_pkey on uaccount
+           Skip scan: All
            Index Cond: (pguser = CURRENT_USER)
-(5 rows)
+(6 rows)
 
 EXPLAIN (COSTS OFF) SELECT * FROM document NATURAL JOIN category WHERE f_leak(dtitle);
                                                 QUERY PLAN                                                
@@ -345,12 +348,13 @@ EXPLAIN (COSTS OFF) SELECT * FROM document NATURAL JOIN category WHERE f_leak(dt
    Hash Cond: (category.cid = document.cid)
    InitPlan 1 (returns $0)
      ->  Index Scan using uaccount_pkey on uaccount
+           Skip scan: All
            Index Cond: (pguser = CURRENT_USER)
    ->  Seq Scan on category
    ->  Hash
          ->  Seq Scan on document
                Filter: ((cid <> 44) AND (cid <> 44) AND (cid < 50) AND (dlevel <= $0) AND f_leak(dtitle))
-(9 rows)
+(10 rows)
 
 -- 44 would technically fail for both p2r and p1r, but we should get an error
 -- back from p1r for this because it sorts first
@@ -436,8 +440,9 @@ EXPLAIN (COSTS OFF) SELECT * FROM document NATURAL JOIN category WHERE f_leak(dt
    ->  Seq Scan on document
          Filter: ((dauthor = CURRENT_USER) AND f_leak(dtitle))
    ->  Index Scan using category_pkey on category
+         Skip scan: All
          Index Cond: (cid = document.cid)
-(5 rows)
+(6 rows)
 
 -- interaction of FK/PK constraints
 SET SESSION AUTHORIZATION regress_rls_alice;
@@ -990,6 +995,7 @@ EXPLAIN (COSTS OFF) SELECT * FROM part_document WHERE f_leak(dtitle);
  Append
    InitPlan 1 (returns $0)
      ->  Index Scan using uaccount_pkey on uaccount
+           Skip scan: All
            Index Cond: (pguser = CURRENT_USER)
    ->  Seq Scan on part_document_fiction part_document_1
          Filter: ((dlevel <= $0) AND f_leak(dtitle))
@@ -997,7 +1003,7 @@ EXPLAIN (COSTS OFF) SELECT * FROM part_document WHERE f_leak(dtitle);
          Filter: ((dlevel <= $0) AND f_leak(dtitle))
    ->  Seq Scan on part_document_nonfiction part_document_3
          Filter: ((dlevel <= $0) AND f_leak(dtitle))
-(10 rows)
+(11 rows)
 
 -- viewpoint from regress_rls_carol
 SET SESSION AUTHORIZATION regress_rls_carol;
@@ -1032,6 +1038,7 @@ EXPLAIN (COSTS OFF) SELECT * FROM part_document WHERE f_leak(dtitle);
  Append
    InitPlan 1 (returns $0)
      ->  Index Scan using uaccount_pkey on uaccount
+           Skip scan: All
            Index Cond: (pguser = CURRENT_USER)
    ->  Seq Scan on part_document_fiction part_document_1
          Filter: ((dlevel <= $0) AND f_leak(dtitle))
@@ -1039,7 +1046,7 @@ EXPLAIN (COSTS OFF) SELECT * FROM part_document WHERE f_leak(dtitle);
          Filter: ((dlevel <= $0) AND f_leak(dtitle))
    ->  Seq Scan on part_document_nonfiction part_document_3
          Filter: ((dlevel <= $0) AND f_leak(dtitle))
-(10 rows)
+(11 rows)
 
 -- viewpoint from regress_rls_dave
 SET SESSION AUTHORIZATION regress_rls_dave;
@@ -1063,8 +1070,9 @@ EXPLAIN (COSTS OFF) SELECT * FROM part_document WHERE f_leak(dtitle);
    Filter: ((cid < 55) AND (dlevel <= $0) AND f_leak(dtitle))
    InitPlan 1 (returns $0)
      ->  Index Scan using uaccount_pkey on uaccount
+           Skip scan: All
            Index Cond: (pguser = CURRENT_USER)
-(5 rows)
+(6 rows)
 
 -- pp1 ERROR
 INSERT INTO part_document VALUES (100, 11, 5, 'regress_rls_dave', 'testing pp1'); -- fail
@@ -1141,8 +1149,9 @@ EXPLAIN (COSTS OFF) SELECT * FROM part_document WHERE f_leak(dtitle);
    Filter: ((cid < 55) AND (dlevel <= $0) AND f_leak(dtitle))
    InitPlan 1 (returns $0)
      ->  Index Scan using uaccount_pkey on uaccount
+           Skip scan: All
            Index Cond: (pguser = CURRENT_USER)
-(5 rows)
+(6 rows)
 
 -- viewpoint from regress_rls_carol
 SET SESSION AUTHORIZATION regress_rls_carol;
@@ -1179,6 +1188,7 @@ EXPLAIN (COSTS OFF) SELECT * FROM part_document WHERE f_leak(dtitle);
  Append
    InitPlan 1 (returns $0)
      ->  Index Scan using uaccount_pkey on uaccount
+           Skip scan: All
            Index Cond: (pguser = CURRENT_USER)
    ->  Seq Scan on part_document_fiction part_document_1
          Filter: ((dlevel <= $0) AND f_leak(dtitle))
@@ -1186,7 +1196,7 @@ EXPLAIN (COSTS OFF) SELECT * FROM part_document WHERE f_leak(dtitle);
          Filter: ((dlevel <= $0) AND f_leak(dtitle))
    ->  Seq Scan on part_document_nonfiction part_document_3
          Filter: ((dlevel <= $0) AND f_leak(dtitle))
-(10 rows)
+(11 rows)
 
 -- only owner can change policies
 ALTER POLICY pp1 ON part_document USING (true);    --fail
diff --git a/src/test/regress/expected/rowtypes.out b/src/test/regress/expected/rowtypes.out
index 2a273f8404..4d58053e4f 100644
--- a/src/test/regress/expected/rowtypes.out
+++ b/src/test/regress/expected/rowtypes.out
@@ -259,8 +259,9 @@ order by thousand, tenthous;
                         QUERY PLAN                         
 -----------------------------------------------------------
  Index Only Scan using tenk1_thous_tenthous on tenk1
+   Skip scan: All
    Index Cond: (ROW(thousand, tenthous) >= ROW(997, 5000))
-(2 rows)
+(3 rows)
 
 select thousand, tenthous from tenk1
 where (thousand, tenthous) >= (997, 5000)
@@ -305,8 +306,9 @@ order by thousand, tenthous;
    ->  Bitmap Heap Scan on tenk1
          Filter: (ROW(thousand, tenthous, four) > ROW(998, 5000, 3))
          ->  Bitmap Index Scan on tenk1_thous_tenthous
+               Skip scan: All
                Index Cond: (ROW(thousand, tenthous) >= ROW(998, 5000))
-(6 rows)
+(7 rows)
 
 select thousand, tenthous, four from tenk1
 where (thousand, tenthous, four) > (998, 5000, 3)
@@ -337,8 +339,9 @@ order by thousand, tenthous;
                         QUERY PLAN                        
 ----------------------------------------------------------
  Index Only Scan using tenk1_thous_tenthous on tenk1
+   Skip scan: All
    Index Cond: (ROW(thousand, tenthous) > ROW(998, 5000))
-(2 rows)
+(3 rows)
 
 select thousand, tenthous from tenk1
 where (998, 5000) < (thousand, tenthous)
@@ -373,8 +376,9 @@ order by thousand, hundred;
    ->  Bitmap Heap Scan on tenk1
          Filter: (ROW(998, 5000) < ROW(thousand, hundred))
          ->  Bitmap Index Scan on tenk1_thous_tenthous
+               Skip scan: All
                Index Cond: (thousand >= 998)
-(6 rows)
+(7 rows)
 
 select thousand, hundred from tenk1
 where (998, 5000) < (thousand, hundred)
@@ -405,8 +409,9 @@ select a,b from test_table where (a,b) > ('a','a') order by a,b;
                        QUERY PLAN                       
 --------------------------------------------------------
  Index Only Scan using test_table_a_b_idx on test_table
+   Skip scan: All
    Index Cond: (ROW(a, b) > ROW('a'::text, 'a'::text))
-(2 rows)
+(3 rows)
 
 select a,b from test_table where (a,b) > ('a','a') order by a,b;
  a | b 
@@ -1109,8 +1114,9 @@ select row_to_json(q) from
 -------------------------------------------------------------
  Subquery Scan on q
    ->  Index Only Scan using tenk1_thous_tenthous on tenk1
+         Skip scan: All
          Index Cond: ((thousand = 42) AND (tenthous < 2000))
-(3 rows)
+(4 rows)
 
 select row_to_json(q) from
   (select thousand, tenthous from tenk1
diff --git a/src/test/regress/expected/select.out b/src/test/regress/expected/select.out
index c441049f41..dd7c4117ed 100644
--- a/src/test/regress/expected/select.out
+++ b/src/test/regress/expected/select.out
@@ -742,9 +742,10 @@ select * from onek2 where unique2 = 11 and stringu1 = 'ATAAAA';
                QUERY PLAN                
 -----------------------------------------
  Index Scan using onek2_u2_prtl on onek2
+   Skip scan: All
    Index Cond: (unique2 = 11)
    Filter: (stringu1 = 'ATAAAA'::name)
-(3 rows)
+(4 rows)
 
 select * from onek2 where unique2 = 11 and stringu1 = 'ATAAAA';
  unique1 | unique2 | two | four | ten | twenty | hundred | thousand | twothousand | fivethous | tenthous | odd | even | stringu1 | stringu2 | string4 
@@ -758,18 +759,20 @@ select * from onek2 where unique2 = 11 and stringu1 = 'ATAAAA';
                            QUERY PLAN                            
 -----------------------------------------------------------------
  Index Scan using onek2_u2_prtl on onek2 (actual rows=1 loops=1)
+   Skip scan: All
    Index Cond: (unique2 = 11)
    Filter: (stringu1 = 'ATAAAA'::name)
-(3 rows)
+(4 rows)
 
 explain (costs off)
 select unique2 from onek2 where unique2 = 11 and stringu1 = 'ATAAAA';
                QUERY PLAN                
 -----------------------------------------
  Index Scan using onek2_u2_prtl on onek2
+   Skip scan: All
    Index Cond: (unique2 = 11)
    Filter: (stringu1 = 'ATAAAA'::name)
-(3 rows)
+(4 rows)
 
 select unique2 from onek2 where unique2 = 11 and stringu1 = 'ATAAAA';
  unique2 
@@ -783,8 +786,9 @@ select * from onek2 where unique2 = 11 and stringu1 < 'B';
                QUERY PLAN                
 -----------------------------------------
  Index Scan using onek2_u2_prtl on onek2
+   Skip scan: All
    Index Cond: (unique2 = 11)
-(2 rows)
+(3 rows)
 
 select * from onek2 where unique2 = 11 and stringu1 < 'B';
  unique1 | unique2 | two | four | ten | twenty | hundred | thousand | twothousand | fivethous | tenthous | odd | even | stringu1 | stringu2 | string4 
@@ -797,8 +801,9 @@ select unique2 from onek2 where unique2 = 11 and stringu1 < 'B';
                   QUERY PLAN                  
 ----------------------------------------------
  Index Only Scan using onek2_u2_prtl on onek2
+   Skip scan: All
    Index Cond: (unique2 = 11)
-(2 rows)
+(3 rows)
 
 select unique2 from onek2 where unique2 = 11 and stringu1 < 'B';
  unique2 
@@ -813,9 +818,10 @@ select unique2 from onek2 where unique2 = 11 and stringu1 < 'B' for update;
 -----------------------------------------------
  LockRows
    ->  Index Scan using onek2_u2_prtl on onek2
+         Skip scan: All
          Index Cond: (unique2 = 11)
          Filter: (stringu1 < 'B'::name)
-(4 rows)
+(5 rows)
 
 select unique2 from onek2 where unique2 = 11 and stringu1 < 'B' for update;
  unique2 
@@ -847,8 +853,9 @@ select unique2 from onek2 where unique2 = 11 and stringu1 < 'B';
  Bitmap Heap Scan on onek2
    Recheck Cond: ((unique2 = 11) AND (stringu1 < 'B'::name))
    ->  Bitmap Index Scan on onek2_u2_prtl
+         Skip scan: All
          Index Cond: (unique2 = 11)
-(4 rows)
+(5 rows)
 
 select unique2 from onek2 where unique2 = 11 and stringu1 < 'B';
  unique2 
@@ -868,10 +875,12 @@ select unique1, unique2 from onek2
    Filter: (stringu1 < 'B'::name)
    ->  BitmapOr
          ->  Bitmap Index Scan on onek2_u2_prtl
+               Skip scan: All
                Index Cond: (unique2 = 11)
          ->  Bitmap Index Scan on onek2_u1_prtl
+               Skip scan: All
                Index Cond: (unique1 = 0)
-(8 rows)
+(10 rows)
 
 select unique1, unique2 from onek2
   where (unique2 = 11 or unique1 = 0) and stringu1 < 'B';
@@ -890,10 +899,12 @@ select unique1, unique2 from onek2
    Recheck Cond: (((unique2 = 11) AND (stringu1 < 'B'::name)) OR (unique1 = 0))
    ->  BitmapOr
          ->  Bitmap Index Scan on onek2_u2_prtl
+               Skip scan: All
                Index Cond: (unique2 = 11)
          ->  Bitmap Index Scan on onek2_u1_prtl
+               Skip scan: All
                Index Cond: (unique1 = 0)
-(7 rows)
+(9 rows)
 
 select unique1, unique2 from onek2
   where (unique2 = 11 and stringu1 < 'B') or unique1 = 0;
diff --git a/src/test/regress/expected/select_distinct.out b/src/test/regress/expected/select_distinct.out
index e21afa7990..076c3d571d 100644
--- a/src/test/regress/expected/select_distinct.out
+++ b/src/test/regress/expected/select_distinct.out
@@ -395,18 +395,18 @@ SELECT DISTINCT on (a, b) a, b FROM distinct_a WHERE b = 2;
 
 EXPLAIN (COSTS OFF)
 SELECT DISTINCT a FROM distinct_a WHERE b = 2;
-                     QUERY PLAN                     
-----------------------------------------------------
- Index Only Scan using distinct_a_b_a on distinct_a
+                       QUERY PLAN                       
+--------------------------------------------------------
+ Index Only Scan using distinct_a_a_b_idx on distinct_a
    Skip scan: Distinct only
    Index Cond: (b = 2)
 (3 rows)
 
 EXPLAIN (COSTS OFF)
 SELECT DISTINCT on (a, b) a, b FROM distinct_a WHERE b = 2;
-                     QUERY PLAN                     
-----------------------------------------------------
- Index Only Scan using distinct_a_b_a on distinct_a
+                       QUERY PLAN                       
+--------------------------------------------------------
+ Index Only Scan using distinct_a_a_b_idx on distinct_a
    Skip scan: Distinct only
    Index Cond: (b = 2)
 (3 rows)
@@ -633,8 +633,9 @@ FROM distinct_a WHERE a = 1 ORDER BY a;
    ->  Bitmap Heap Scan on distinct_a
          Recheck Cond: (a = 1)
          ->  Bitmap Index Scan on distinct_a_a_b_idx
+               Skip scan: All
                Index Cond: (a = 1)
-(5 rows)
+(6 rows)
 
 -- check colums order
 SELECT DISTINCT a FROM distinct_a WHERE b = 2 AND c = 10;
diff --git a/src/test/regress/expected/select_parallel.out b/src/test/regress/expected/select_parallel.out
index 96dfb7c8dd..5e6377f258 100644
--- a/src/test/regress/expected/select_parallel.out
+++ b/src/test/regress/expected/select_parallel.out
@@ -523,8 +523,9 @@ explain (costs off)
                ->  Parallel Bitmap Heap Scan on tenk1
                      Recheck Cond: (hundred > 1)
                      ->  Bitmap Index Scan on tenk1_hundred
+                           Skip scan: All
                            Index Cond: (hundred > 1)
-(10 rows)
+(11 rows)
 
 select count(*) from tenk1, tenk2 where tenk1.hundred > 1 and tenk2.thousand=0;
  count 
@@ -621,7 +622,8 @@ explain (costs off)
                      Merge Cond: (tenk1.unique1 = tenk2.unique1)
                      ->  Parallel Index Only Scan using tenk1_unique1 on tenk1
                      ->  Index Only Scan using tenk2_unique1 on tenk2
-(8 rows)
+                           Skip scan: All
+(9 rows)
 
 select  count(*) from tenk1, tenk2 where tenk1.unique1 = tenk2.unique1;
  count 
@@ -949,8 +951,9 @@ explain (costs off)
    Workers Planned: 1
    Single Copy: true
    ->  Index Scan using tenk1_unique1 on tenk1
+         Skip scan: All
          Index Cond: (unique1 = 1)
-(5 rows)
+(6 rows)
 
 ROLLBACK TO SAVEPOINT settings;
 -- exercise record typmod remapping between backends
diff --git a/src/test/regress/expected/subselect.out b/src/test/regress/expected/subselect.out
index 4c6cd5f146..7334413144 100644
--- a/src/test/regress/expected/subselect.out
+++ b/src/test/regress/expected/subselect.out
@@ -895,12 +895,13 @@ where o.ten = 0;
                Filter: (o.ten = 0)
          ->  Index Scan using onek_unique1 on public.onek i
                Output: (hashed SubPlan 1), random()
+               Skip scan: All
                Index Cond: (i.unique1 = o.unique1)
                SubPlan 1
                  ->  Seq Scan on public.int4_tbl
                        Output: int4_tbl.f1
                        Filter: (int4_tbl.f1 <= $0)
-(14 rows)
+(15 rows)
 
 select sum(ss.tst::int) from
   onek o cross join lateral (
@@ -935,11 +936,13 @@ where o.ten = 1;
                      ->  Append
                            ->  Subquery Scan on "*SELECT* 1"
                                  ->  Index Scan using onek_unique1 on onek i1
+                                       Skip scan: All
                                        Index Cond: (unique1 = o.unique1)
                            ->  Subquery Scan on "*SELECT* 2"
                                  ->  Index Scan using onek_unique1 on onek i2
+                                       Skip scan: All
                                        Index Cond: (unique1 = o.unique2)
-(13 rows)
+(15 rows)
 
 select count(*) from
   onek o cross join lateral (
@@ -1095,7 +1098,8 @@ select * from int4_tbl where
    SubPlan 1
      ->  Index Only Scan using tenk1_unique1 on public.tenk1 a
            Output: a.unique1
-(10 rows)
+           Skip scan: All
+(11 rows)
 
 select * from int4_tbl where
   (case when f1 in (select unique1 from tenk1 a) then f1 else null end) in
diff --git a/src/test/regress/expected/tuplesort.out b/src/test/regress/expected/tuplesort.out
index 3fc1998bf2..f47744d2fe 100644
--- a/src/test/regress/expected/tuplesort.out
+++ b/src/test/regress/expected/tuplesort.out
@@ -146,7 +146,8 @@ SELECT id, noabort_increasing, noabort_decreasing FROM abbrev_abort_uuids ORDER
 -----------------------------------------------------------------------------------------
  Limit
    ->  Index Scan using abbrev_abort_uuids__noabort_increasing_idx on abbrev_abort_uuids
-(2 rows)
+         Skip scan: All
+(3 rows)
 
 SELECT id, noabort_increasing, noabort_decreasing FROM abbrev_abort_uuids ORDER BY noabort_increasing LIMIT 5;
   id   |          noabort_increasing          |          noabort_decreasing          
@@ -164,7 +165,8 @@ SELECT id, noabort_increasing, noabort_decreasing FROM abbrev_abort_uuids ORDER
 -----------------------------------------------------------------------------------------
  Limit
    ->  Index Scan using abbrev_abort_uuids__noabort_decreasing_idx on abbrev_abort_uuids
-(2 rows)
+         Skip scan: All
+(3 rows)
 
 SELECT id, noabort_increasing, noabort_decreasing FROM abbrev_abort_uuids ORDER BY noabort_decreasing LIMIT 5;
   id   |          noabort_increasing          |          noabort_decreasing          
@@ -186,7 +188,8 @@ SELECT id, abort_increasing, abort_decreasing FROM abbrev_abort_uuids ORDER BY a
 ---------------------------------------------------------------------------------------
  Limit
    ->  Index Scan using abbrev_abort_uuids__abort_increasing_idx on abbrev_abort_uuids
-(2 rows)
+         Skip scan: All
+(3 rows)
 
 SELECT id, abort_increasing, abort_decreasing FROM abbrev_abort_uuids ORDER BY abort_increasing LIMIT 5;
   id   |           abort_increasing           |           abort_decreasing           
@@ -204,7 +207,8 @@ SELECT id, abort_increasing, abort_decreasing FROM abbrev_abort_uuids ORDER BY a
 ---------------------------------------------------------------------------------------
  Limit
    ->  Index Scan using abbrev_abort_uuids__abort_decreasing_idx on abbrev_abort_uuids
-(2 rows)
+         Skip scan: All
+(3 rows)
 
 SELECT id, abort_increasing, abort_decreasing FROM abbrev_abort_uuids ORDER BY abort_decreasing LIMIT 5;
   id   |           abort_increasing           |           abort_decreasing           
diff --git a/src/test/regress/expected/union.out b/src/test/regress/expected/union.out
index 6e72e92d80..1739a87d46 100644
--- a/src/test/regress/expected/union.out
+++ b/src/test/regress/expected/union.out
@@ -360,7 +360,8 @@ select count(*) from
                            ->  Seq Scan on tenk1
                      ->  Subquery Scan on "*SELECT* 1"
                            ->  Index Only Scan using tenk1_unique1 on tenk1 tenk1_1
-(8 rows)
+                                 Skip scan: All
+(9 rows)
 
 select count(*) from
   ( select unique1 from tenk1 intersect select fivethous from tenk1 ) ss;
@@ -377,10 +378,12 @@ select unique1 from tenk1 except select unique2 from tenk1 where unique2 != 10;
    ->  Append
          ->  Subquery Scan on "*SELECT* 1"
                ->  Index Only Scan using tenk1_unique1 on tenk1
+                     Skip scan: All
          ->  Subquery Scan on "*SELECT* 2"
                ->  Index Only Scan using tenk1_unique2 on tenk1 tenk1_1
+                     Skip scan: All
                      Filter: (unique2 <> 10)
-(7 rows)
+(9 rows)
 
 select unique1 from tenk1 except select unique2 from tenk1 where unique2 != 10;
  unique1 
@@ -404,7 +407,8 @@ select count(*) from
                                  ->  Seq Scan on tenk1
                            ->  Subquery Scan on "*SELECT* 1"
                                  ->  Index Only Scan using tenk1_unique1 on tenk1 tenk1_1
-(10 rows)
+                                       Skip scan: All
+(11 rows)
 
 select count(*) from
   ( select unique1 from tenk1 intersect select fivethous from tenk1 ) ss;
@@ -423,10 +427,12 @@ select unique1 from tenk1 except select unique2 from tenk1 where unique2 != 10;
          ->  Append
                ->  Subquery Scan on "*SELECT* 1"
                      ->  Index Only Scan using tenk1_unique1 on tenk1
+                           Skip scan: All
                ->  Subquery Scan on "*SELECT* 2"
                      ->  Index Only Scan using tenk1_unique2 on tenk1 tenk1_1
+                           Skip scan: All
                            Filter: (unique2 <> 10)
-(9 rows)
+(11 rows)
 
 select unique1 from tenk1 except select unique2 from tenk1 where unique2 != 10;
  unique1 
@@ -711,10 +717,12 @@ explain (costs off)
 ---------------------------------------------
  Append
    ->  Index Scan using t1_ab_idx on t1
+         Skip scan: All
          Index Cond: ((a || b) = 'ab'::text)
    ->  Index Only Scan using t2_pkey on t2
+         Skip scan: All
          Index Cond: (ab = 'ab'::text)
-(5 rows)
+(7 rows)
 
 explain (costs off)
  SELECT * FROM
@@ -728,10 +736,12 @@ explain (costs off)
    Group Key: ((t1.a || t1.b))
    ->  Append
          ->  Index Scan using t1_ab_idx on t1
+               Skip scan: All
                Index Cond: ((a || b) = 'ab'::text)
          ->  Index Only Scan using t2_pkey on t2
+               Skip scan: All
                Index Cond: (ab = 'ab'::text)
-(7 rows)
+(9 rows)
 
 --
 -- Test that ORDER BY for UNION ALL can be pushed down to inheritance
@@ -757,10 +767,14 @@ explain (costs off)
    ->  Merge Append
          Sort Key: ((t1.a || t1.b))
          ->  Index Scan using t1_ab_idx on t1
+               Skip scan: All
          ->  Index Scan using t1c_ab_idx on t1c t1_1
+               Skip scan: All
          ->  Index Scan using t2_pkey on t2
+               Skip scan: All
          ->  Index Scan using t2c_pkey on t2c t2_1
-(7 rows)
+               Skip scan: All
+(11 rows)
 
   SELECT * FROM
   (SELECT a || b AS ab FROM t1
@@ -797,11 +811,13 @@ select event_id
  Merge Append
    Sort Key: events.event_id
    ->  Index Scan using events_pkey on events
+         Skip scan: All
    ->  Sort
          Sort Key: events_1.event_id
          ->  Seq Scan on events_child events_1
    ->  Index Scan using other_events_pkey on other_events
-(7 rows)
+         Skip scan: All
+(9 rows)
 
 drop table events_child, events, other_events;
 reset enable_indexonlyscan;
@@ -1006,10 +1022,12 @@ select * from
    ->  Seq Scan on int4_tbl
    ->  Append
          ->  Index Scan using t3i on t3 a
+               Skip scan: All
                Index Cond: (expensivefunc(x) = int4_tbl.f1)
          ->  Index Scan using t3i on t3 b
+               Skip scan: All
                Index Cond: (expensivefunc(x) = int4_tbl.f1)
-(7 rows)
+(9 rows)
 
 select * from
   (select * from t3 a union all select * from t3 b) ss
diff --git a/src/test/regress/expected/updatable_views.out b/src/test/regress/expected/updatable_views.out
index 5de53f2782..3b0d9d42bf 100644
--- a/src/test/regress/expected/updatable_views.out
+++ b/src/test/regress/expected/updatable_views.out
@@ -422,16 +422,18 @@ EXPLAIN (costs off) UPDATE rw_view1 SET a=6 WHERE a=5;
 --------------------------------------------------
  Update on base_tbl
    ->  Index Scan using base_tbl_pkey on base_tbl
+         Skip scan: All
          Index Cond: ((a > 0) AND (a = 5))
-(3 rows)
+(4 rows)
 
 EXPLAIN (costs off) DELETE FROM rw_view1 WHERE a=5;
                     QUERY PLAN                    
 --------------------------------------------------
  Delete on base_tbl
    ->  Index Scan using base_tbl_pkey on base_tbl
+         Skip scan: All
          Index Cond: ((a > 0) AND (a = 5))
-(3 rows)
+(4 rows)
 
 DROP TABLE base_tbl CASCADE;
 NOTICE:  drop cascades to view rw_view1
@@ -492,16 +494,18 @@ EXPLAIN (costs off) UPDATE rw_view2 SET aaa=5 WHERE aaa=4;
 --------------------------------------------------------
  Update on base_tbl
    ->  Index Scan using base_tbl_pkey on base_tbl
+         Skip scan: All
          Index Cond: ((a < 10) AND (a > 0) AND (a = 4))
-(3 rows)
+(4 rows)
 
 EXPLAIN (costs off) DELETE FROM rw_view2 WHERE aaa=4;
                        QUERY PLAN                       
 --------------------------------------------------------
  Delete on base_tbl
    ->  Index Scan using base_tbl_pkey on base_tbl
+         Skip scan: All
          Index Cond: ((a < 10) AND (a > 0) AND (a = 4))
-(3 rows)
+(4 rows)
 
 DROP TABLE base_tbl CASCADE;
 NOTICE:  drop cascades to 2 other objects
@@ -685,14 +689,16 @@ EXPLAIN (costs off) UPDATE rw_view2 SET a=3 WHERE a=2;
  Update on base_tbl
    ->  Nested Loop
          ->  Index Scan using base_tbl_pkey on base_tbl
+               Skip scan: All
                Index Cond: (a = 2)
          ->  Subquery Scan on rw_view1
                Filter: ((rw_view1.a < 10) AND (rw_view1.a = 2))
                ->  Bitmap Heap Scan on base_tbl base_tbl_1
                      Recheck Cond: (a > 0)
                      ->  Bitmap Index Scan on base_tbl_pkey
+                           Skip scan: All
                            Index Cond: (a > 0)
-(10 rows)
+(12 rows)
 
 EXPLAIN (costs off) DELETE FROM rw_view2 WHERE a=2;
                            QUERY PLAN                           
@@ -700,14 +706,16 @@ EXPLAIN (costs off) DELETE FROM rw_view2 WHERE a=2;
  Delete on base_tbl
    ->  Nested Loop
          ->  Index Scan using base_tbl_pkey on base_tbl
+               Skip scan: All
                Index Cond: (a = 2)
          ->  Subquery Scan on rw_view1
                Filter: ((rw_view1.a < 10) AND (rw_view1.a = 2))
                ->  Bitmap Heap Scan on base_tbl base_tbl_1
                      Recheck Cond: (a > 0)
                      ->  Bitmap Index Scan on base_tbl_pkey
+                           Skip scan: All
                            Index Cond: (a > 0)
-(10 rows)
+(12 rows)
 
 DROP TABLE base_tbl CASCADE;
 NOTICE:  drop cascades to 2 other objects
@@ -919,8 +927,9 @@ EXPLAIN (costs off) UPDATE rw_view2 SET a=3 WHERE a=2;
          ->  Bitmap Heap Scan on base_tbl
                Recheck Cond: (a > 0)
                ->  Bitmap Index Scan on base_tbl_pkey
+                     Skip scan: All
                      Index Cond: (a > 0)
-(7 rows)
+(8 rows)
 
 EXPLAIN (costs off) DELETE FROM rw_view2 WHERE a=2;
                         QUERY PLAN                        
@@ -931,8 +940,9 @@ EXPLAIN (costs off) DELETE FROM rw_view2 WHERE a=2;
          ->  Bitmap Heap Scan on base_tbl
                Recheck Cond: (a > 0)
                ->  Bitmap Index Scan on base_tbl_pkey
+                     Skip scan: All
                      Index Cond: (a > 0)
-(7 rows)
+(8 rows)
 
 DROP TABLE base_tbl CASCADE;
 NOTICE:  drop cascades to 2 other objects
@@ -969,8 +979,9 @@ UPDATE rw_view1 v SET bb='Updated row 2' WHERE rw_view1_aa(v)=2
 --------------------------------------------------
  Update on base_tbl
    ->  Index Scan using base_tbl_pkey on base_tbl
+         Skip scan: All
          Index Cond: (a = 2)
-(3 rows)
+(4 rows)
 
 DROP TABLE base_tbl CASCADE;
 NOTICE:  drop cascades to 2 other objects
@@ -1868,10 +1879,11 @@ EXPLAIN (costs off) INSERT INTO rw_view1 VALUES (5);
    ->  Result
    SubPlan 1
      ->  Index Only Scan using ref_tbl_pkey on ref_tbl r
+           Skip scan: All
            Index Cond: (a = b.a)
    SubPlan 2
      ->  Seq Scan on ref_tbl r_1
-(7 rows)
+(8 rows)
 
 EXPLAIN (costs off) UPDATE rw_view1 SET a = a + 5;
                         QUERY PLAN                         
@@ -1884,10 +1896,11 @@ EXPLAIN (costs off) UPDATE rw_view1 SET a = a + 5;
                ->  Seq Scan on ref_tbl r
    SubPlan 1
      ->  Index Only Scan using ref_tbl_pkey on ref_tbl r_1
+           Skip scan: All
            Index Cond: (a = b.a)
    SubPlan 2
      ->  Seq Scan on ref_tbl r_2
-(11 rows)
+(12 rows)
 
 DROP TABLE base_tbl, ref_tbl CASCADE;
 NOTICE:  drop cascades to view rw_view1
@@ -2219,11 +2232,13 @@ EXPLAIN (costs off) DELETE FROM rw_view1 WHERE id = 1 AND snoop(data);
  Update on base_tbl base_tbl_1
    ->  Nested Loop
          ->  Index Scan using base_tbl_pkey on base_tbl base_tbl_1
+               Skip scan: All
                Index Cond: (id = 1)
          ->  Index Scan using base_tbl_pkey on base_tbl
+               Skip scan: All
                Index Cond: (id = 1)
                Filter: ((NOT deleted) AND snoop(data))
-(7 rows)
+(9 rows)
 
 DELETE FROM rw_view1 WHERE id = 1 AND snoop(data);
 NOTICE:  snooped value: Row 1
@@ -2233,6 +2248,7 @@ EXPLAIN (costs off) INSERT INTO rw_view1 VALUES (2, 'New row 2');
  Insert on base_tbl
    InitPlan 1 (returns $0)
      ->  Index Only Scan using base_tbl_pkey on base_tbl t
+           Skip scan: All
            Index Cond: (id = 2)
    ->  Result
          One-Time Filter: ($0 IS NOT TRUE)
@@ -2240,12 +2256,14 @@ EXPLAIN (costs off) INSERT INTO rw_view1 VALUES (2, 'New row 2');
  Update on base_tbl
    InitPlan 1 (returns $0)
      ->  Index Only Scan using base_tbl_pkey on base_tbl t
+           Skip scan: All
            Index Cond: (id = 2)
    ->  Result
          One-Time Filter: $0
          ->  Index Scan using base_tbl_pkey on base_tbl
+               Skip scan: All
                Index Cond: (id = 2)
-(15 rows)
+(18 rows)
 
 INSERT INTO rw_view1 VALUES (2, 'New row 2');
 SELECT * FROM base_tbl;
@@ -2310,6 +2328,7 @@ UPDATE v1 SET a=100 WHERE snoop(a) AND leakproof(a) AND a < 7 AND a != 6;
    Update on public.t111 t1_3
    ->  Index Scan using t1_a_idx on public.t1
          Output: 100, t1.b, t1.c, t1.ctid
+         Skip scan: All
          Index Cond: ((t1.a > 5) AND (t1.a < 7))
          Filter: ((t1.a <> 6) AND (alternatives: SubPlan 1 or hashed SubPlan 2) AND snoop(t1.a) AND leakproof(t1.a))
          SubPlan 1
@@ -2326,17 +2345,20 @@ UPDATE v1 SET a=100 WHERE snoop(a) AND leakproof(a) AND a < 7 AND a != 6;
                        Output: t12_5.a
    ->  Index Scan using t11_a_idx on public.t11 t1_1
          Output: 100, t1_1.b, t1_1.c, t1_1.d, t1_1.ctid
+         Skip scan: All
          Index Cond: ((t1_1.a > 5) AND (t1_1.a < 7))
          Filter: ((t1_1.a <> 6) AND (alternatives: SubPlan 1 or hashed SubPlan 2) AND snoop(t1_1.a) AND leakproof(t1_1.a))
    ->  Index Scan using t12_a_idx on public.t12 t1_2
          Output: 100, t1_2.b, t1_2.c, t1_2.e, t1_2.ctid
+         Skip scan: All
          Index Cond: ((t1_2.a > 5) AND (t1_2.a < 7))
          Filter: ((t1_2.a <> 6) AND (alternatives: SubPlan 1 or hashed SubPlan 2) AND snoop(t1_2.a) AND leakproof(t1_2.a))
    ->  Index Scan using t111_a_idx on public.t111 t1_3
          Output: 100, t1_3.b, t1_3.c, t1_3.d, t1_3.e, t1_3.ctid
+         Skip scan: All
          Index Cond: ((t1_3.a > 5) AND (t1_3.a < 7))
          Filter: ((t1_3.a <> 6) AND (alternatives: SubPlan 1 or hashed SubPlan 2) AND snoop(t1_3.a) AND leakproof(t1_3.a))
-(33 rows)
+(37 rows)
 
 UPDATE v1 SET a=100 WHERE snoop(a) AND leakproof(a) AND a < 7 AND a != 6;
 SELECT * FROM v1 WHERE a=100; -- Nothing should have been changed to 100
@@ -2360,6 +2382,7 @@ UPDATE v1 SET a=a+1 WHERE snoop(a) AND leakproof(a) AND a = 8;
    Update on public.t111 t1_3
    ->  Index Scan using t1_a_idx on public.t1
          Output: (t1.a + 1), t1.b, t1.c, t1.ctid
+         Skip scan: All
          Index Cond: ((t1.a > 5) AND (t1.a = 8))
          Filter: ((alternatives: SubPlan 1 or hashed SubPlan 2) AND snoop(t1.a) AND leakproof(t1.a))
          SubPlan 1
@@ -2376,17 +2399,20 @@ UPDATE v1 SET a=a+1 WHERE snoop(a) AND leakproof(a) AND a = 8;
                        Output: t12_5.a
    ->  Index Scan using t11_a_idx on public.t11 t1_1
          Output: (t1_1.a + 1), t1_1.b, t1_1.c, t1_1.d, t1_1.ctid
+         Skip scan: All
          Index Cond: ((t1_1.a > 5) AND (t1_1.a = 8))
          Filter: ((alternatives: SubPlan 1 or hashed SubPlan 2) AND snoop(t1_1.a) AND leakproof(t1_1.a))
    ->  Index Scan using t12_a_idx on public.t12 t1_2
          Output: (t1_2.a + 1), t1_2.b, t1_2.c, t1_2.e, t1_2.ctid
+         Skip scan: All
          Index Cond: ((t1_2.a > 5) AND (t1_2.a = 8))
          Filter: ((alternatives: SubPlan 1 or hashed SubPlan 2) AND snoop(t1_2.a) AND leakproof(t1_2.a))
    ->  Index Scan using t111_a_idx on public.t111 t1_3
          Output: (t1_3.a + 1), t1_3.b, t1_3.c, t1_3.d, t1_3.e, t1_3.ctid
+         Skip scan: All
          Index Cond: ((t1_3.a > 5) AND (t1_3.a = 8))
          Filter: ((alternatives: SubPlan 1 or hashed SubPlan 2) AND snoop(t1_3.a) AND leakproof(t1_3.a))
-(33 rows)
+(37 rows)
 
 UPDATE v1 SET a=a+1 WHERE snoop(a) AND leakproof(a) AND a = 8;
 NOTICE:  snooped value: 8
-- 
2.25.0

#55

Floris Van Nee

florisvannee@Optiver.com

almost 6 years ago

In reply to: Floris Van Nee (#54)

3 attachment(s)

It seems that the documentation build was broken. I've fixed it in the attached patch.

I'm unsure which version number to give this patch: continue the numbering from the previous skip scan patches, or start from scratch. It's a rather big change, so one could argue it's mostly a separate patch. I guess it mostly depends on how close the original versions were to being committable. Thoughts?

-Floris

Attachments:

v33-0001-Unique-key.patch (application/octet-stream)

From fd48c4a0067c1c96a2b53fd162bbe9456a9608dd Mon Sep 17 00:00:00 2001
From: jesperpedersen <jesper.pedersen@redhat.com>
Date: Tue, 9 Jul 2019 06:44:57 -0400
Subject: [PATCH 1/3] Unique key

Design by David Rowley.

Author: Jesper Pedersen
---
 src/backend/nodes/outfuncs.c           |  14 +++
 src/backend/nodes/print.c              |  39 +++++++
 src/backend/optimizer/path/Makefile    |   3 +-
 src/backend/optimizer/path/allpaths.c  |   8 ++
 src/backend/optimizer/path/indxpath.c  |  41 +++++++
 src/backend/optimizer/path/pathkeys.c  |  71 ++++++++++--
 src/backend/optimizer/path/uniquekey.c | 147 +++++++++++++++++++++++++
 src/backend/optimizer/plan/planagg.c   |   1 +
 src/backend/optimizer/plan/planmain.c  |   1 +
 src/backend/optimizer/plan/planner.c   |  17 ++-
 src/backend/optimizer/util/pathnode.c  |  12 ++
 src/include/nodes/nodes.h              |   1 +
 src/include/nodes/pathnodes.h          |  18 +++
 src/include/nodes/print.h              |   1 +
 src/include/optimizer/pathnode.h       |   1 +
 src/include/optimizer/paths.h          |  11 ++
 16 files changed, 373 insertions(+), 13 deletions(-)
 create mode 100644 src/backend/optimizer/path/uniquekey.c

diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 89d00444ed..82fcabd9ee 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -1724,6 +1724,7 @@ _outPathInfo(StringInfo str, const Path *node)
 	WRITE_FLOAT_FIELD(startup_cost, "%.2f");
 	WRITE_FLOAT_FIELD(total_cost, "%.2f");
 	WRITE_NODE_FIELD(pathkeys);
+	WRITE_NODE_FIELD(uniquekeys);
 }
 
 /*
@@ -2208,6 +2209,7 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
 	WRITE_NODE_FIELD(eq_classes);
 	WRITE_BOOL_FIELD(ec_merging_done);
 	WRITE_NODE_FIELD(canon_pathkeys);
+	WRITE_NODE_FIELD(canon_uniquekeys);
 	WRITE_NODE_FIELD(left_join_clauses);
 	WRITE_NODE_FIELD(right_join_clauses);
 	WRITE_NODE_FIELD(full_join_clauses);
@@ -2217,6 +2219,7 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
 	WRITE_NODE_FIELD(placeholder_list);
 	WRITE_NODE_FIELD(fkey_list);
 	WRITE_NODE_FIELD(query_pathkeys);
+	WRITE_NODE_FIELD(query_uniquekeys);
 	WRITE_NODE_FIELD(group_pathkeys);
 	WRITE_NODE_FIELD(window_pathkeys);
 	WRITE_NODE_FIELD(distinct_pathkeys);
@@ -2404,6 +2407,14 @@ _outPathKey(StringInfo str, const PathKey *node)
 	WRITE_BOOL_FIELD(pk_nulls_first);
 }
 
+static void
+_outUniqueKey(StringInfo str, const UniqueKey *node)
+{
+	WRITE_NODE_TYPE("UNIQUEKEY");
+
+	WRITE_NODE_FIELD(eq_clause);
+}
+
 static void
 _outPathTarget(StringInfo str, const PathTarget *node)
 {
@@ -4097,6 +4108,9 @@ outNode(StringInfo str, const void *obj)
 			case T_PathKey:
 				_outPathKey(str, obj);
 				break;
+			case T_UniqueKey:
+				_outUniqueKey(str, obj);
+				break;
 			case T_PathTarget:
 				_outPathTarget(str, obj);
 				break;
diff --git a/src/backend/nodes/print.c b/src/backend/nodes/print.c
index 42476724d8..d286b34544 100644
--- a/src/backend/nodes/print.c
+++ b/src/backend/nodes/print.c
@@ -459,6 +459,45 @@ print_pathkeys(const List *pathkeys, const List *rtable)
 	printf(")\n");
 }
 
+/*
+ * print_uniquekeys -
+ *	  uniquekeys list of UniqueKeys
+ */
+void
+print_uniquekeys(const List *uniquekeys, const List *rtable)
+{
+	ListCell   *l;
+
+	printf("(");
+	foreach(l, uniquekeys)
+	{
+		UniqueKey *unique_key = (UniqueKey *) lfirst(l);
+		EquivalenceClass *eclass = (EquivalenceClass *) unique_key->eq_clause;
+		ListCell   *k;
+		bool		first = true;
+
+		/* chase up */
+		while (eclass->ec_merged)
+			eclass = eclass->ec_merged;
+
+		printf("(");
+		foreach(k, eclass->ec_members)
+		{
+			EquivalenceMember *mem = (EquivalenceMember *) lfirst(k);
+
+			if (first)
+				first = false;
+			else
+				printf(", ");
+			print_expr((Node *) mem->em_expr, rtable);
+		}
+		printf(")");
+		if (lnext(uniquekeys, l))
+			printf(", ");
+	}
+	printf(")\n");
+}
+
 /*
  * print_tl
  *	  print targetlist in a more legible way.
diff --git a/src/backend/optimizer/path/Makefile b/src/backend/optimizer/path/Makefile
index 1e199ff66f..63cc1505d9 100644
--- a/src/backend/optimizer/path/Makefile
+++ b/src/backend/optimizer/path/Makefile
@@ -21,6 +21,7 @@ OBJS = \
 	joinpath.o \
 	joinrels.o \
 	pathkeys.o \
-	tidpath.o
+	tidpath.o \
+	uniquekey.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 905bbe77d8..e98ab4eada 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -3954,6 +3954,14 @@ print_path(PlannerInfo *root, Path *path, int indent)
 		print_pathkeys(path->pathkeys, root->parse->rtable);
 	}
 
+	if (path->uniquekeys)
+	{
+		for (i = 0; i < indent; i++)
+			printf("\t");
+		printf("  uniquekeys: ");
+		print_uniquekeys(path->uniquekeys, root->parse->rtable);
+	}
+
 	if (join)
 	{
 		JoinPath   *jp = (JoinPath *) path;
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 2a50272da6..bd1ea53e5c 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -189,6 +189,7 @@ static Expr *match_clause_to_ordering_op(IndexOptInfo *index,
 static bool ec_member_matches_indexcol(PlannerInfo *root, RelOptInfo *rel,
 									   EquivalenceClass *ec, EquivalenceMember *em,
 									   void *arg);
+static List *get_uniquekeys_for_index(PlannerInfo *root, List *pathkeys);
 
 
 /*
@@ -874,6 +875,7 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
 	List	   *orderbyclausecols;
 	List	   *index_pathkeys;
 	List	   *useful_pathkeys;
+	List	   *useful_uniquekeys = NIL;
 	bool		found_lower_saop_clause;
 	bool		pathkeys_possibly_useful;
 	bool		index_is_ordered;
@@ -1036,11 +1038,15 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
 	if (index_clauses != NIL || useful_pathkeys != NIL || useful_predicate ||
 		index_only_scan)
 	{
+		if (has_useful_uniquekeys(root))
+			useful_uniquekeys = get_uniquekeys_for_index(root, useful_pathkeys);
+
 		ipath = create_index_path(root, index,
 								  index_clauses,
 								  orderbyclauses,
 								  orderbyclausecols,
 								  useful_pathkeys,
+								  useful_uniquekeys,
 								  index_is_ordered ?
 								  ForwardScanDirection :
 								  NoMovementScanDirection,
@@ -1063,6 +1069,7 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
 									  orderbyclauses,
 									  orderbyclausecols,
 									  useful_pathkeys,
+									  useful_uniquekeys,
 									  index_is_ordered ?
 									  ForwardScanDirection :
 									  NoMovementScanDirection,
@@ -1093,11 +1100,15 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
 													index_pathkeys);
 		if (useful_pathkeys != NIL)
 		{
+			if (has_useful_uniquekeys(root))
+				useful_uniquekeys = get_uniquekeys_for_index(root, useful_pathkeys);
+
 			ipath = create_index_path(root, index,
 									  index_clauses,
 									  NIL,
 									  NIL,
 									  useful_pathkeys,
+									  useful_uniquekeys,
 									  BackwardScanDirection,
 									  index_only_scan,
 									  outer_relids,
@@ -1115,6 +1126,7 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
 										  NIL,
 										  NIL,
 										  useful_pathkeys,
+										  useful_uniquekeys,
 										  BackwardScanDirection,
 										  index_only_scan,
 										  outer_relids,
@@ -3365,6 +3377,35 @@ match_clause_to_ordering_op(IndexOptInfo *index,
 	return clause;
 }
 
+/*
+ * get_uniquekeys_for_index
+ */
+static List *
+get_uniquekeys_for_index(PlannerInfo *root, List *pathkeys)
+{
+	ListCell *lc;
+
+	if (pathkeys)
+	{
+		List *uniquekeys = NIL;
+		foreach(lc, pathkeys)
+		{
+			UniqueKey *unique_key;
+			PathKey *pk = (PathKey *) lfirst(lc);
+			EquivalenceClass *ec = (EquivalenceClass *) pk->pk_eclass;
+
+			unique_key = makeNode(UniqueKey);
+			unique_key->eq_clause = ec;
+
+			uniquekeys = lappend(uniquekeys, unique_key);
+		}
+
+		if (uniquekeys_contained_in(root->canon_uniquekeys, uniquekeys))
+			return uniquekeys;
+	}
+
+	return NIL;
+}
 
 /****************************************************************************
  *				----  ROUTINES TO DO PARTIAL INDEX PREDICATE TESTS	----
diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index 71b9d42c99..054df9a617 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -29,6 +29,7 @@
 #include "utils/lsyscache.h"
 
 
+static bool pathkey_is_unique(PathKey *new_pathkey, List *pathkeys);
 static bool pathkey_is_redundant(PathKey *new_pathkey, List *pathkeys);
 static bool matches_boolean_partition_clause(RestrictInfo *rinfo,
 											 RelOptInfo *partrel,
@@ -96,6 +97,29 @@ make_canonical_pathkey(PlannerInfo *root,
 	return pk;
 }
 
+/*
+ * pathkey_is_unique
+ *	   Returns true if the new pathkey's equivalence class differs from that
+ *     of every existing member of the pathkey list.
+ */
+static bool
+pathkey_is_unique(PathKey *new_pathkey, List *pathkeys)
+{
+	EquivalenceClass *new_ec = new_pathkey->pk_eclass;
+	ListCell   *lc;
+
+	/* If the same EC is already in the list, then not unique */
+	foreach(lc, pathkeys)
+	{
+		PathKey    *old_pathkey = (PathKey *) lfirst(lc);
+
+		if (new_ec == old_pathkey->pk_eclass)
+			return false;
+	}
+
+	return true;
+}
+
 /*
  * pathkey_is_redundant
  *	   Is a pathkey redundant with one already in the given list?
@@ -135,22 +159,12 @@ static bool
 pathkey_is_redundant(PathKey *new_pathkey, List *pathkeys)
 {
 	EquivalenceClass *new_ec = new_pathkey->pk_eclass;
-	ListCell   *lc;
 
 	/* Check for EC containing a constant --- unconditionally redundant */
 	if (EC_MUST_BE_REDUNDANT(new_ec))
 		return true;
 
-	/* If same EC already used in list, then redundant */
-	foreach(lc, pathkeys)
-	{
-		PathKey    *old_pathkey = (PathKey *) lfirst(lc);
-
-		if (new_ec == old_pathkey->pk_eclass)
-			return true;
-	}
-
-	return false;
+	return !pathkey_is_unique(new_pathkey, pathkeys);
 }
 
 /*
@@ -1098,6 +1112,41 @@ make_pathkeys_for_sortclauses(PlannerInfo *root,
 	return pathkeys;
 }
 
+/*
+ * make_pathkeys_for_uniquekeys
+ *		Generate a pathkeys list to be used for uniquekey clauses
+ */
+List *
+make_pathkeys_for_uniquekeys(PlannerInfo *root,
+							 List *sortclauses,
+							 List *tlist)
+{
+	List	   *pathkeys = NIL;
+	ListCell   *l;
+
+	foreach(l, sortclauses)
+	{
+		SortGroupClause *sortcl = (SortGroupClause *) lfirst(l);
+		Expr	   *sortkey;
+		PathKey    *pathkey;
+
+		sortkey = (Expr *) get_sortgroupclause_expr(sortcl, tlist);
+		Assert(OidIsValid(sortcl->sortop));
+		pathkey = make_pathkey_from_sortop(root,
+										   sortkey,
+										   root->nullable_baserels,
+										   sortcl->sortop,
+										   sortcl->nulls_first,
+										   sortcl->tleSortGroupRef,
+										   true);
+
+		if (pathkey_is_unique(pathkey, pathkeys))
+			pathkeys = lappend(pathkeys, pathkey);
+	}
+
+	return pathkeys;
+}
+
 /****************************************************************************
  *		PATHKEYS AND MERGECLAUSES
  ****************************************************************************/
diff --git a/src/backend/optimizer/path/uniquekey.c b/src/backend/optimizer/path/uniquekey.c
new file mode 100644
index 0000000000..13d4ebb98c
--- /dev/null
+++ b/src/backend/optimizer/path/uniquekey.c
@@ -0,0 +1,147 @@
+/*-------------------------------------------------------------------------
+ *
+ * uniquekey.c
+ *	  Utilities for matching and building unique keys
+ *
+ * Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ *	  src/backend/optimizer/path/uniquekey.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "optimizer/pathnode.h"
+#include "optimizer/paths.h"
+#include "nodes/pg_list.h"
+
+static UniqueKey *make_canonical_uniquekey(PlannerInfo *root, EquivalenceClass *eclass);
+
+/*
+ * Build a list of unique keys
+ */
+List*
+build_uniquekeys(PlannerInfo *root, List *sortclauses)
+{
+	List *result = NIL;
+	List *sortkeys;
+	ListCell *l;
+
+	sortkeys = make_pathkeys_for_uniquekeys(root,
+											sortclauses,
+											root->processed_tlist);
+
+	/* Create a uniquekey and add it to the list */
+	foreach(l, sortkeys)
+	{
+		PathKey    *pathkey = (PathKey *) lfirst(l);
+		EquivalenceClass *ec = pathkey->pk_eclass;
+		UniqueKey *unique_key = make_canonical_uniquekey(root, ec);
+
+		result = lappend(result, unique_key);
+	}
+
+	return result;
+}
+
+/*
+ * uniquekeys_contained_in
+ *	  Are all of the keys in keys2 also present in keys1?
+ */
+bool
+uniquekeys_contained_in(List *keys1, List *keys2)
+{
+	ListCell   *key1,
+			   *key2;
+
+	/*
+	 * Fall out quickly if we are passed two identical lists.  This mostly
+	 * catches the case where both are NIL, but that's common enough to
+	 * warrant the test.
+	 */
+	if (keys1 == keys2)
+		return true;
+
+	foreach(key2, keys2)
+	{
+		bool found = false;
+		UniqueKey  *uniquekey2 = (UniqueKey *) lfirst(key2);
+
+		foreach(key1, keys1)
+		{
+			UniqueKey  *uniquekey1 = (UniqueKey *) lfirst(key1);
+
+			if (uniquekey1->eq_clause == uniquekey2->eq_clause)
+			{
+				found = true;
+				break;
+			}
+		}
+
+		if (!found)
+			return false;
+	}
+
+	return true;
+}
+
+/*
+ * has_useful_uniquekeys
+ *		Detect whether the planner could have any uniquekeys that are
+ *		useful.
+ */
+bool
+has_useful_uniquekeys(PlannerInfo *root)
+{
+	if (root->query_uniquekeys != NIL)
+		return true;	/* there are some */
+	return false;		/* definitely useless */
+}
+
+/*
+ * make_canonical_uniquekey
+ *	  Given the parameters for a UniqueKey, find any pre-existing matching
+ *	  uniquekey in the query's list of "canonical" uniquekeys.  Make a new
+ *	  entry if there's not one already.
+ *
+ * Note that this function must not be used until after we have completed
+ * merging EquivalenceClasses.  (We don't try to enforce that here; instead,
+ * equivclass.c will complain if a merge occurs after root->canon_uniquekeys
+ * has become nonempty.)
+ */
+static UniqueKey *
+make_canonical_uniquekey(PlannerInfo *root,
+						 EquivalenceClass *eclass)
+{
+	UniqueKey  *uk;
+	ListCell   *lc;
+	MemoryContext oldcontext;
+
+	/* The passed eclass might be non-canonical, so chase up to the top */
+	while (eclass->ec_merged)
+		eclass = eclass->ec_merged;
+
+	foreach(lc, root->canon_uniquekeys)
+	{
+		uk = (UniqueKey *) lfirst(lc);
+		if (eclass == uk->eq_clause)
+			return uk;
+	}
+
+	/*
+	 * Be sure canonical uniquekeys are allocated in the main planning context.
+	 * Not an issue in normal planning, but it is for GEQO.
+	 */
+	oldcontext = MemoryContextSwitchTo(root->planner_cxt);
+
+	uk = makeNode(UniqueKey);
+	uk->eq_clause = eclass;
+
+	root->canon_uniquekeys = lappend(root->canon_uniquekeys, uk);
+
+	MemoryContextSwitchTo(oldcontext);
+
+	return uk;
+}
diff --git a/src/backend/optimizer/plan/planagg.c b/src/backend/optimizer/plan/planagg.c
index 8634940efc..dd64775d8f 100644
--- a/src/backend/optimizer/plan/planagg.c
+++ b/src/backend/optimizer/plan/planagg.c
@@ -511,6 +511,7 @@ minmax_qp_callback(PlannerInfo *root, void *extra)
 									  root->parse->targetList);
 
 	root->query_pathkeys = root->sort_pathkeys;
+	root->query_uniquekeys = NIL;
 }
 
 /*
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 62dfc6d44a..3a372af91b 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -70,6 +70,7 @@ query_planner(PlannerInfo *root,
 	root->join_rel_level = NULL;
 	root->join_cur_level = 0;
 	root->canon_pathkeys = NIL;
+	root->canon_uniquekeys = NIL;
 	root->left_join_clauses = NIL;
 	root->right_join_clauses = NIL;
 	root->full_join_clauses = NIL;
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 5da0528382..6a7b55abd2 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3654,15 +3654,30 @@ standard_qp_callback(PlannerInfo *root, void *extra)
 	 * much easier, since we know that the parser ensured that one is a
 	 * superset of the other.
 	 */
+	root->query_uniquekeys = NIL;
+
 	if (root->group_pathkeys)
+	{
 		root->query_pathkeys = root->group_pathkeys;
+
+		if (!root->parse->hasAggs)
+			root->query_uniquekeys = build_uniquekeys(root, qp_extra->groupClause);
+	}
 	else if (root->window_pathkeys)
 		root->query_pathkeys = root->window_pathkeys;
 	else if (list_length(root->distinct_pathkeys) >
 			 list_length(root->sort_pathkeys))
+	{
 		root->query_pathkeys = root->distinct_pathkeys;
+		root->query_uniquekeys = build_uniquekeys(root, parse->distinctClause);
+	}
 	else if (root->sort_pathkeys)
+	{
 		root->query_pathkeys = root->sort_pathkeys;
+
+		if (root->distinct_pathkeys)
+			root->query_uniquekeys = build_uniquekeys(root, parse->distinctClause);
+	}
 	else
 		root->query_pathkeys = NIL;
 }
@@ -6215,7 +6230,7 @@ plan_cluster_use_sort(Oid tableOid, Oid indexOid)
 
 	/* Estimate the cost of index scan */
 	indexScanPath = create_index_path(root, indexInfo,
-									  NIL, NIL, NIL, NIL,
+									  NIL, NIL, NIL, NIL, NIL,
 									  ForwardScanDirection, false,
 									  NULL, 1.0, false);
 
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 8ba8122ee2..278436f102 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -940,6 +940,7 @@ create_seqscan_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->parallel_safe = rel->consider_parallel;
 	pathnode->parallel_workers = parallel_workers;
 	pathnode->pathkeys = NIL;	/* seqscan has unordered result */
+	pathnode->uniquekeys = NIL;
 
 	cost_seqscan(pathnode, root, rel, pathnode->param_info);
 
@@ -964,6 +965,7 @@ create_samplescan_path(PlannerInfo *root, RelOptInfo *rel, Relids required_outer
 	pathnode->parallel_safe = rel->consider_parallel;
 	pathnode->parallel_workers = 0;
 	pathnode->pathkeys = NIL;	/* samplescan has unordered result */
+	pathnode->uniquekeys = NIL;
 
 	cost_samplescan(pathnode, root, rel, pathnode->param_info);
 
@@ -1000,6 +1002,7 @@ create_index_path(PlannerInfo *root,
 				  List *indexorderbys,
 				  List *indexorderbycols,
 				  List *pathkeys,
+				  List *uniquekeys,
 				  ScanDirection indexscandir,
 				  bool indexonly,
 				  Relids required_outer,
@@ -1018,6 +1021,7 @@ create_index_path(PlannerInfo *root,
 	pathnode->path.parallel_safe = rel->consider_parallel;
 	pathnode->path.parallel_workers = 0;
 	pathnode->path.pathkeys = pathkeys;
+	pathnode->path.uniquekeys = uniquekeys;
 
 	pathnode->indexinfo = index;
 	pathnode->indexclauses = indexclauses;
@@ -1061,6 +1065,7 @@ create_bitmap_heap_path(PlannerInfo *root,
 	pathnode->path.parallel_safe = rel->consider_parallel;
 	pathnode->path.parallel_workers = parallel_degree;
 	pathnode->path.pathkeys = NIL;	/* always unordered */
+	pathnode->path.uniquekeys = NIL;
 
 	pathnode->bitmapqual = bitmapqual;
 
@@ -1923,6 +1928,7 @@ create_functionscan_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->parallel_safe = rel->consider_parallel;
 	pathnode->parallel_workers = 0;
 	pathnode->pathkeys = pathkeys;
+	pathnode->uniquekeys = NIL;
 
 	cost_functionscan(pathnode, root, rel, pathnode->param_info);
 
@@ -1949,6 +1955,7 @@ create_tablefuncscan_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->parallel_safe = rel->consider_parallel;
 	pathnode->parallel_workers = 0;
 	pathnode->pathkeys = NIL;	/* result is always unordered */
+	pathnode->uniquekeys = NIL;
 
 	cost_tablefuncscan(pathnode, root, rel, pathnode->param_info);
 
@@ -1975,6 +1982,7 @@ create_valuesscan_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->parallel_safe = rel->consider_parallel;
 	pathnode->parallel_workers = 0;
 	pathnode->pathkeys = NIL;	/* result is always unordered */
+	pathnode->uniquekeys = NIL;
 
 	cost_valuesscan(pathnode, root, rel, pathnode->param_info);
 
@@ -2000,6 +2008,7 @@ create_ctescan_path(PlannerInfo *root, RelOptInfo *rel, Relids required_outer)
 	pathnode->parallel_safe = rel->consider_parallel;
 	pathnode->parallel_workers = 0;
 	pathnode->pathkeys = NIL;	/* XXX for now, result is always unordered */
+	pathnode->uniquekeys = NIL;
 
 	cost_ctescan(pathnode, root, rel, pathnode->param_info);
 
@@ -2026,6 +2035,7 @@ create_namedtuplestorescan_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->parallel_safe = rel->consider_parallel;
 	pathnode->parallel_workers = 0;
 	pathnode->pathkeys = NIL;	/* result is always unordered */
+	pathnode->uniquekeys = NIL;
 
 	cost_namedtuplestorescan(pathnode, root, rel, pathnode->param_info);
 
@@ -2052,6 +2062,7 @@ create_resultscan_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->parallel_safe = rel->consider_parallel;
 	pathnode->parallel_workers = 0;
 	pathnode->pathkeys = NIL;	/* result is always unordered */
+	pathnode->uniquekeys = NIL;
 
 	cost_resultscan(pathnode, root, rel, pathnode->param_info);
 
@@ -2078,6 +2089,7 @@ create_worktablescan_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->parallel_safe = rel->consider_parallel;
 	pathnode->parallel_workers = 0;
 	pathnode->pathkeys = NIL;	/* result is always unordered */
+	pathnode->uniquekeys = NIL;
 
 	/* Cost is the same as for a regular CTE scan */
 	cost_ctescan(pathnode, root, rel, pathnode->param_info);
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 8a76afe8cc..679cc4cc9c 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -261,6 +261,7 @@ typedef enum NodeTag
 	T_EquivalenceMember,
 	T_PathKey,
 	T_PathTarget,
+	T_UniqueKey,
 	T_RestrictInfo,
 	T_IndexClause,
 	T_PlaceHolderVar,
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 0ceb809644..d4816c180d 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -269,6 +269,8 @@ struct PlannerInfo
 
 	List	   *canon_pathkeys; /* list of "canonical" PathKeys */
 
+	List	   *canon_uniquekeys; /* list of "canonical" UniqueKeys */
+
 	List	   *left_join_clauses;	/* list of RestrictInfos for mergejoinable
 									 * outer join clauses w/nonnullable var on
 									 * left */
@@ -297,6 +299,8 @@ struct PlannerInfo
 
 	List	   *query_pathkeys; /* desired pathkeys for query_planner() */
 
+	List	   *query_uniquekeys; /* unique keys used for the query */
+
 	List	   *group_pathkeys; /* groupClause pathkeys, if any */
 	List	   *window_pathkeys;	/* pathkeys of bottom window, if any */
 	List	   *distinct_pathkeys;	/* distinctClause pathkeys, if any */
@@ -1077,6 +1081,15 @@ typedef struct ParamPathInfo
 	List	   *ppi_clauses;	/* join clauses available from outer rels */
 } ParamPathInfo;
 
+/*
+ * UniqueKey
+ */
+typedef struct UniqueKey
+{
+	NodeTag		type;
+
+	EquivalenceClass *eq_clause;	/* equivalence class */
+} UniqueKey;
 
 /*
  * Type "Path" is used as-is for sequential-scan paths, as well as some other
@@ -1106,6 +1119,9 @@ typedef struct ParamPathInfo
  *
  * "pathkeys" is a List of PathKey nodes (see above), describing the sort
  * ordering of the path's output rows.
+ *
+ * "uniquekeys", if not NIL, is a list of UniqueKey nodes (see above),
+ * describing the XXX.
  */
 typedef struct Path
 {
@@ -1129,6 +1145,8 @@ typedef struct Path
 
 	List	   *pathkeys;		/* sort ordering of path's output */
 	/* pathkeys is a List of PathKey nodes; see above */
+
+	List	   *uniquekeys;	/* the unique keys, or NIL if none */
 } Path;
 
 /* Macro for extracting a path's parameterization relids; beware double eval */
diff --git a/src/include/nodes/print.h b/src/include/nodes/print.h
index 6126b491bf..006248bfb5 100644
--- a/src/include/nodes/print.h
+++ b/src/include/nodes/print.h
@@ -28,6 +28,7 @@ extern char *pretty_format_node_dump(const char *dump);
 extern void print_rt(const List *rtable);
 extern void print_expr(const Node *expr, const List *rtable);
 extern void print_pathkeys(const List *pathkeys, const List *rtable);
+extern void print_uniquekeys(const List *uniquekeys, const List *rtable);
 extern void print_tl(const List *tlist, const List *rtable);
 extern void print_slot(TupleTableSlot *slot);
 
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index e450fe112a..f75ff6f323 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -44,6 +44,7 @@ extern IndexPath *create_index_path(PlannerInfo *root,
 									List *indexorderbys,
 									List *indexorderbycols,
 									List *pathkeys,
+									List *uniquekeys,
 									ScanDirection indexscandir,
 									bool indexonly,
 									Relids required_outer,
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 9ab73bd20c..5b6be383b3 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -214,6 +214,9 @@ extern List *build_join_pathkeys(PlannerInfo *root,
 extern List *make_pathkeys_for_sortclauses(PlannerInfo *root,
 										   List *sortclauses,
 										   List *tlist);
+extern List *make_pathkeys_for_uniquekeys(PlannerInfo *root,
+										  List *sortclauses,
+										  List *tlist);
 extern void initialize_mergeclause_eclasses(PlannerInfo *root,
 											RestrictInfo *restrictinfo);
 extern void update_mergeclause_eclasses(PlannerInfo *root,
@@ -240,4 +243,12 @@ extern PathKey *make_canonical_pathkey(PlannerInfo *root,
 extern void add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
 									List *live_childrels);
 
+/*
+ * uniquekey.c
+ *	  Utilities for matching and building unique keys
+ */
+extern List *build_uniquekeys(PlannerInfo *root, List *sortclauses);
+extern bool uniquekeys_contained_in(List *keys1, List *keys2);
+extern bool has_useful_uniquekeys(PlannerInfo *root);
+
 #endif							/* PATHS_H */
-- 
2.25.0
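
For readers following the 0001 patch above: the containment test in uniquekeys_contained_in() is a subset check by pointer identity on the equivalence class. Below is a minimal standalone sketch of that logic. ToyEC, ToyUniqueKey and toy_uniquekeys_contained_in are hypothetical stand-ins made up for illustration, and arrays are used where the planner uses List, so this is not the planner's actual code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the planner's EquivalenceClass. */
typedef struct ToyEC
{
	const char *name;
} ToyEC;

/* Hypothetical stand-in for UniqueKey: it only points at an EC. */
typedef struct ToyUniqueKey
{
	const ToyEC *eq_clause;
} ToyUniqueKey;

/*
 * Return true if every key in keys2 has a counterpart in keys1 that points
 * at the very same equivalence class.  ECs are compared by address, not by
 * content, which is why the patch canonicalizes them first.  An empty keys2
 * is trivially contained.
 */
static bool
toy_uniquekeys_contained_in(const ToyUniqueKey *keys1, size_t n1,
							const ToyUniqueKey *keys2, size_t n2)
{
	for (size_t j = 0; j < n2; j++)
	{
		bool		found = false;

		for (size_t i = 0; i < n1; i++)
		{
			if (keys1[i].eq_clause == keys2[j].eq_clause)
			{
				found = true;
				break;
			}
		}

		if (!found)
			return false;
	}

	return true;
}
```

With canonical keys on (a, b), an index path offering keys on (a) or (a, b) passes the check, while one offering (a, b, c) does not. Note that an empty second list always passes, so a caller that builds that list has to keep the return value of each append, otherwise the check succeeds vacuously.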

v33-0002-Index-skip-scan.patch (application/octet-stream)

From b2927aaeb4dac21ccaec356d2d4ce1ba2ced90b3 Mon Sep 17 00:00:00 2001
From: Floris van Nee <floris.vannee@gmail.com>
Date: Fri, 15 Nov 2019 09:46:53 -0500
Subject: [PATCH 2/3] Index skip scan

Implementation of Index Skip Scan (see Loose Index Scan in the wiki [1])
as part of the IndexOnlyScan, IndexScan and BitmapIndexScan for nbtree.
This patch improves performance of two main types of queries significantly:
- SELECT DISTINCT, SELECT DISTINCT ON
- Regular SELECTs with WHERE-clauses on non-leading index attributes
For example, given an nbtree index on three columns (a,b,c), the following queries
may now be significantly faster:
- SELECT DISTINCT ON (a) * FROM t1
- SELECT * FROM t1 WHERE b=2
- SELECT * FROM t1 WHERE b IN (10,40)
- SELECT DISTINCT ON (a,b) * FROM t1 WHERE c BETWEEN 10 AND 100 ORDER BY a,b,c

Original patch and design were proposed by Thomas Munro [2], then revived and
improved by Dmitry Dolgov and Jesper Pedersen. Floris van Nee further extended
it with a more general and performant skip implementation.

[1] https://wiki.postgresql.org/wiki/Loose_indexscan
[2] https://www.postgresql.org/message-id/flat/CADLWmXXbTSBxP-MzJuPAYSsL_2f0iPm5VWPbCvDbVvfX93FKkw%40mail.gmail.com

Author: Floris van Nee, Jesper Pedersen, Dmitry Dolgov
Reviewed-by: Thomas Munro, David Rowley, Kyotaro Horiguchi, Tomas Vondra, Peter Geoghegan
---
 contrib/amcheck/verify_nbtree.c               |    4 +-
 contrib/bloom/blutils.c                       |    3 +
 doc/src/sgml/config.sgml                      |   15 +
 doc/src/sgml/indexam.sgml                     |  121 +-
 doc/src/sgml/indices.sgml                     |   28 +
 src/backend/access/brin/brin.c                |    3 +
 src/backend/access/gin/ginutil.c              |    3 +
 src/backend/access/gist/gist.c                |    3 +
 src/backend/access/hash/hash.c                |    3 +
 src/backend/access/index/indexam.c            |  163 ++
 src/backend/access/nbtree/Makefile            |    1 +
 src/backend/access/nbtree/nbtinsert.c         |    2 +-
 src/backend/access/nbtree/nbtpage.c           |    2 +-
 src/backend/access/nbtree/nbtree.c            |   56 +-
 src/backend/access/nbtree/nbtsearch.c         |  788 ++++------
 src/backend/access/nbtree/nbtskip.c           | 1317 +++++++++++++++++
 src/backend/access/nbtree/nbtsort.c           |    2 +-
 src/backend/access/nbtree/nbtutils.c          |  821 +++++++++-
 src/backend/access/spgist/spgutils.c          |    3 +
 src/backend/commands/explain.c                |   29 +
 src/backend/executor/execScan.c               |   35 +-
 src/backend/executor/nodeBitmapIndexscan.c    |   21 +-
 src/backend/executor/nodeIndexonlyscan.c      |   69 +-
 src/backend/executor/nodeIndexscan.c          |   71 +-
 src/backend/nodes/copyfuncs.c                 |    5 +
 src/backend/nodes/outfuncs.c                  |    6 +
 src/backend/nodes/readfuncs.c                 |    5 +
 src/backend/optimizer/path/costsize.c         |    1 +
 src/backend/optimizer/plan/createplan.c       |   38 +-
 src/backend/optimizer/plan/planner.c          |   64 +
 src/backend/optimizer/util/pathnode.c         |   40 +
 src/backend/optimizer/util/plancat.c          |    3 +
 src/backend/utils/misc/guc.c                  |    9 +
 src/backend/utils/misc/postgresql.conf.sample |    1 +
 src/backend/utils/sort/tuplesort.c            |    4 +-
 src/include/access/amapi.h                    |   19 +
 src/include/access/genam.h                    |   16 +
 src/include/access/nbtree.h                   |  140 +-
 src/include/executor/executor.h               |    4 +
 src/include/nodes/execnodes.h                 |    7 +
 src/include/nodes/pathnodes.h                 |    6 +
 src/include/nodes/plannodes.h                 |    5 +
 src/include/optimizer/cost.h                  |    1 +
 src/include/optimizer/pathnode.h              |    5 +
 src/interfaces/libpq/encnames.c               |    1 +
 src/interfaces/libpq/wchar.c                  |    1 +
 src/test/regress/expected/select_distinct.out |  601 ++++++++
 src/test/regress/expected/sysviews.out        |    3 +-
 src/test/regress/sql/select_distinct.sql      |  248 ++++
 49 files changed, 4253 insertions(+), 543 deletions(-)
 create mode 100644 src/backend/access/nbtree/nbtskip.c
 create mode 120000 src/interfaces/libpq/encnames.c
 create mode 120000 src/interfaces/libpq/wchar.c

diff --git a/contrib/amcheck/verify_nbtree.c b/contrib/amcheck/verify_nbtree.c
index ceaaa27168..553965beba 100644
--- a/contrib/amcheck/verify_nbtree.c
+++ b/contrib/amcheck/verify_nbtree.c
@@ -2504,7 +2504,7 @@ bt_rootdescend(BtreeCheckState *state, IndexTuple itup)
 	Buffer		lbuf;
 	bool		exists;
 
-	key = _bt_mkscankey(state->rel, itup);
+	key = _bt_mkscankey(state->rel, itup, NULL);
 	Assert(key->heapkeyspace && key->scantid != NULL);
 
 	/*
@@ -2936,7 +2936,7 @@ bt_mkscankey_pivotsearch(Relation rel, IndexTuple itup)
 {
 	BTScanInsert skey;
 
-	skey = _bt_mkscankey(rel, itup);
+	skey = _bt_mkscankey(rel, itup, NULL);
 	skey->pivotsearch = true;
 
 	return skey;
diff --git a/contrib/bloom/blutils.c b/contrib/bloom/blutils.c
index 0104d02f67..f7bdfc959a 100644
--- a/contrib/bloom/blutils.c
+++ b/contrib/bloom/blutils.c
@@ -133,6 +133,9 @@ blhandler(PG_FUNCTION_ARGS)
 	amroutine->ambulkdelete = blbulkdelete;
 	amroutine->amvacuumcleanup = blvacuumcleanup;
 	amroutine->amcanreturn = NULL;
+	amroutine->amskip = NULL;
+	amroutine->ambeginskipscan = NULL;
+	amroutine->amgetskiptuple = NULL;
 	amroutine->amcostestimate = blcostestimate;
 	amroutine->amoptions = bloptions;
 	amroutine->amproperty = NULL;
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 9cc5281f01..7624110ea4 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4597,6 +4597,21 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-enable-indexskipscan" xreflabel="enable_indexskipscan">
+      <term><varname>enable_indexskipscan</varname> (<type>boolean</type>)
+      <indexterm>
+       <primary><varname>enable_indexskipscan</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Enables or disables the query planner's use of index-skip-scan plan
+        types (see <xref linkend="indexes-index-skip-scans"/>). The default is
+        <literal>on</literal>.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-enable-material" xreflabel="enable_material">
       <term><varname>enable_material</varname> (<type>boolean</type>)
       <indexterm>
diff --git a/doc/src/sgml/indexam.sgml b/doc/src/sgml/indexam.sgml
index 37f8d8760a..150038c7f0 100644
--- a/doc/src/sgml/indexam.sgml
+++ b/doc/src/sgml/indexam.sgml
@@ -148,6 +148,9 @@ typedef struct IndexAmRoutine
     amendscan_function amendscan;
     ammarkpos_function ammarkpos;       /* can be NULL */
     amrestrpos_function amrestrpos;     /* can be NULL */
+    amskip_function amskip;                        /* can be NULL */
+    ambeginscan_skip_function ambeginskipscan;     /* can be NULL */
+    amgettuple_with_skip_function amgetskiptuple;  /* can be NULL */
 
     /* interface functions to support parallel index scans */
     amestimateparallelscan_function amestimateparallelscan;    /* can be NULL */
@@ -674,6 +677,122 @@ amrestrpos (IndexScanDesc scan);
    struct may be set to NULL.
   </para>
 
+  <para>
+<programlisting>
+bool
+amskip (IndexScanDesc scan,
+        ScanDirection prefixDir,
+        ScanDirection postfixDir);
+</programlisting>
+  Skip past all tuples where the first 'prefix' columns have the same value as
+  the last tuple returned in the current scan. The arguments are:
+
+   <variablelist>
+    <varlistentry>
+     <term><parameter>scan</parameter></term>
+     <listitem>
+      <para>
+       Index scan information
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><parameter>prefixDir</parameter></term>
+     <listitem>
+      <para>
+       The direction in which the prefix part of the tuple is advancing.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><parameter>postfixDir</parameter></term>
+     <listitem>
+      <para>
+        The direction in which the postfix (everything after the prefix) of the tuple is advancing.
+      </para>
+     </listitem>
+    </varlistentry>
+   </variablelist>
+
+  </para>
+  <para>
+<programlisting>
+IndexScanDesc
+ambeginskipscan (Relation indexRelation,
+                 int nkeys,
+                 int norderbys,
+                 int prefix);
+</programlisting>
+   Prepare for an index scan.  The <literal>nkeys</literal> and <literal>norderbys</literal>
+   parameters indicate the number of quals and ordering operators that will be
+   used in the scan; these may be useful for space allocation purposes.
+   Note that the actual values of the scan keys aren't provided yet.
+   The result must be a palloc'd struct.
+   For implementation reasons the index access method
+   <emphasis>must</emphasis> create this struct by calling
+   <function>RelationGetIndexScan()</function>.  In most cases
+   <function>ambeginscan</function> does little beyond making that call and perhaps
+   acquiring locks;
+   the interesting parts of index-scan startup are in <function>amrescan</function>.
+   If this is a skip scan, <parameter>prefix</parameter> must indicate the length of the
+   prefix that can be skipped over.  Setting <parameter>prefix</parameter> to -1 disables
+   skipping and yields a scan identical to a regular call to <function>ambeginscan</function>.
+  </para>
+  <para>
+<programlisting>
+bool
+amgetskiptuple (IndexScanDesc scan,
+                ScanDirection prefixDir,
+                ScanDirection postfixDir);
+</programlisting>
+     Fetch the next tuple in the given scan, moving in the given
+     directions. <parameter>prefixDir</parameter> is the direction in which the prefix
+     advances; the prefix length was specified in the <function>ambeginskipscan</function>
+     call. <parameter>postfixDir</parameter> is the direction within a prefix; it can differ
+     from <parameter>prefixDir</parameter> in DISTINCT scans when fetching backwards from a cursor.
+     Returns true if a tuple was
+     obtained, false if no matching tuples remain.  In the true case the tuple
+     TID is stored into the <literal>scan</literal> structure.  Note that
+     <quote>success</quote> means only that the index contains an entry that matches
+     the scan keys, not that the tuple necessarily still exists in the heap or
+     will pass the caller's snapshot test.  On success, <function>amgetskiptuple</function>
+     must also set <literal>scan-&gt;xs_recheck</literal> to true or false.
+     False means it is certain that the index entry matches the scan keys.
+     True means this is not certain, and the conditions represented by the
+     scan keys must be rechecked against the heap tuple after fetching it.
+     This provision supports <quote>lossy</quote> index operators.
+     Note that rechecking will extend only to the scan conditions; a partial
+     index predicate (if any) is never rechecked by <function>amgetskiptuple</function>
+     callers.
+    </para>
+
+    <para>
+     If the index supports <link linkend="indexes-index-only-scans">index-only
+     scans</link> (i.e., <function>amcanreturn</function> returns true for it),
+     then on success the AM must also check <literal>scan-&gt;xs_want_itup</literal>,
+     and if that is true it must return the originally indexed data for the
+     index entry.  The data can be returned in the form of an
+     <structname>IndexTuple</structname> pointer stored at <literal>scan-&gt;xs_itup</literal>,
+     with tuple descriptor <literal>scan-&gt;xs_itupdesc</literal>; or in the form of
+     a <structname>HeapTuple</structname> pointer stored at <literal>scan-&gt;xs_hitup</literal>,
+     with tuple descriptor <literal>scan-&gt;xs_hitupdesc</literal>.  (The latter
+     format should be used when reconstructing data that might possibly not fit
+     into an <structname>IndexTuple</structname>.)  In either case,
+     management of the data referenced by the pointer is the access method's
+     responsibility.  The data must remain good at least until the next
+     <function>amgettuple</function>, <function>amrescan</function>, or <function>amendscan</function>
+     call for the scan.
+    </para>
+
+    <para>
+     The <function>amgetskiptuple</function> function need only be provided if the access
+     method supports skip scans.  If it doesn't, the
+     <structfield>amgetskiptuple</structfield> field in its <structname>IndexAmRoutine</structname>
+     struct must be set to NULL.
+    </para>
+
   <para>
    In addition to supporting ordinary index scans, some types of index
    may wish to support <firstterm>parallel index scans</firstterm>, which allow
@@ -689,7 +808,7 @@ amrestrpos (IndexScanDesc scan);
    functions may be implemented to support parallel index scans:
   </para>
 
-  <para>
+    <para>
 <programlisting>
 Size
 amestimateparallelscan (void);
diff --git a/doc/src/sgml/indices.sgml b/doc/src/sgml/indices.sgml
index 86539a781c..b4349039e7 100644
--- a/doc/src/sgml/indices.sgml
+++ b/doc/src/sgml/indices.sgml
@@ -1298,6 +1298,34 @@ SELECT target FROM tests WHERE subject = 'some-subject' AND success;
    and later will recognize such cases and allow index-only scans to be
    generated, but older versions will not.
   </para>
+
+  <sect2 id="indexes-index-skip-scans">
+    <title>Index Skip Scans</title>
+
+    <indexterm zone="indexes-index-skip-scans">
+      <primary>index</primary>
+      <secondary>index-skip scans</secondary>
+    </indexterm>
+    <indexterm zone="indexes-index-skip-scans">
+      <primary>index-skip scan</primary>
+    </indexterm>
+
+    <para>
+     When the rows retrieved from an index scan are then deduplicated by
+     eliminating rows matching on a prefix of index keys (e.g. when using
+     <literal>SELECT DISTINCT</literal>), the planner will consider
+     skipping groups of rows with a matching key prefix. When a row with
+     a particular prefix is found, remaining rows with the same key prefix
+     are skipped.  The larger the number of rows with the same key prefix
+     (i.e. the lower the number of distinct key prefixes in the index),
+     the more efficient this is.
+    </para>
+    <para>
+      Additionally, a skip scan can be considered in regular <literal>SELECT</literal>
+      queries. When filtering on a non-leading attribute of an index, the planner
+      may choose a skip scan.
+    </para>
+  </sect2>
  </sect1>
 
 
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index c481838389..94440781cf 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -114,6 +114,9 @@ brinhandler(PG_FUNCTION_ARGS)
 	amroutine->ambulkdelete = brinbulkdelete;
 	amroutine->amvacuumcleanup = brinvacuumcleanup;
 	amroutine->amcanreturn = NULL;
+	amroutine->amskip = NULL;
+	amroutine->ambeginskipscan = NULL;
+	amroutine->amgetskiptuple = NULL;
 	amroutine->amcostestimate = brincostestimate;
 	amroutine->amoptions = brinoptions;
 	amroutine->amproperty = NULL;
diff --git a/src/backend/access/gin/ginutil.c b/src/backend/access/gin/ginutil.c
index a7e55caf28..ffbac1d1b8 100644
--- a/src/backend/access/gin/ginutil.c
+++ b/src/backend/access/gin/ginutil.c
@@ -65,6 +65,9 @@ ginhandler(PG_FUNCTION_ARGS)
 	amroutine->ambulkdelete = ginbulkdelete;
 	amroutine->amvacuumcleanup = ginvacuumcleanup;
 	amroutine->amcanreturn = NULL;
+	amroutine->amskip = NULL;
+	amroutine->ambeginskipscan = NULL;
+	amroutine->amgetskiptuple = NULL;
 	amroutine->amcostestimate = gincostestimate;
 	amroutine->amoptions = ginoptions;
 	amroutine->amproperty = NULL;
diff --git a/src/backend/access/gist/gist.c b/src/backend/access/gist/gist.c
index 90c46e86a1..0d3691324c 100644
--- a/src/backend/access/gist/gist.c
+++ b/src/backend/access/gist/gist.c
@@ -86,6 +86,9 @@ gisthandler(PG_FUNCTION_ARGS)
 	amroutine->ambulkdelete = gistbulkdelete;
 	amroutine->amvacuumcleanup = gistvacuumcleanup;
 	amroutine->amcanreturn = gistcanreturn;
+	amroutine->amskip = NULL;
+	amroutine->ambeginskipscan = NULL;
+	amroutine->amgetskiptuple = NULL;
 	amroutine->amcostestimate = gistcostestimate;
 	amroutine->amoptions = gistoptions;
 	amroutine->amproperty = gistproperty;
diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c
index 4871b7ff4d..a95a48d57d 100644
--- a/src/backend/access/hash/hash.c
+++ b/src/backend/access/hash/hash.c
@@ -83,6 +83,9 @@ hashhandler(PG_FUNCTION_ARGS)
 	amroutine->ambulkdelete = hashbulkdelete;
 	amroutine->amvacuumcleanup = hashvacuumcleanup;
 	amroutine->amcanreturn = NULL;
+	amroutine->amskip = NULL;
+	amroutine->ambeginskipscan = NULL;
+	amroutine->amgetskiptuple = NULL;
 	amroutine->amcostestimate = hashcostestimate;
 	amroutine->amoptions = hashoptions;
 	amroutine->amproperty = NULL;
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index a5210d0b34..695d2e1273 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -14,7 +14,9 @@
  *		index_open		- open an index relation by relation OID
  *		index_close		- close an index relation
  *		index_beginscan - start a scan of an index with amgettuple
+ *		index_beginscan_skip - start a skip scan of an index with amgetskiptuple
  *		index_beginscan_bitmap - start a scan of an index with amgetbitmap
+ *		index_beginscan_bitmap_skip - start a skip scan of an index with amgetbitmap
  *		index_rescan	- restart a scan of an index
  *		index_endscan	- end a scan
  *		index_insert	- insert an index tuple into a relation
@@ -25,14 +27,17 @@
  *		index_parallelrescan  - (re)start a parallel scan of an index
  *		index_beginscan_parallel - join parallel index scan
  *		index_getnext_tid	- get the next TID from a scan
+ *		index_getnext_tid_skip	- get the next TID from a skip scan
  *		index_fetch_heap		- get the scan's next heap tuple
  *		index_getnext_slot	- get the next tuple from a scan
+ *		index_getnext_slot_skip	- get the next tuple from a skip scan
  *		index_getbitmap - get all tuples from a scan
  *		index_bulk_delete	- bulk deletion of index tuples
  *		index_vacuum_cleanup	- post-deletion cleanup of an index
  *		index_can_return	- does index support index-only scans?
  *		index_getprocid - get a support procedure OID
  *		index_getprocinfo - get a support procedure's lookup info
+ *		index_skip		- advance past duplicate key values in a scan
  *
  * NOTES
  *		This file contains the index_ routines which used
@@ -216,6 +221,78 @@ index_beginscan(Relation heapRelation,
 	return scan;
 }
 
+static IndexScanDesc
+index_beginscan_internal_skip(Relation indexRelation,
+						 int nkeys, int norderbys, int prefix, Snapshot snapshot,
+						 ParallelIndexScanDesc pscan, bool temp_snap)
+{
+	IndexScanDesc scan;
+
+	RELATION_CHECKS;
+	CHECK_REL_PROCEDURE(ambeginskipscan);
+
+	if (!(indexRelation->rd_indam->ampredlocks))
+		PredicateLockRelation(indexRelation, snapshot);
+
+	/*
+	 * We hold a reference count to the relcache entry throughout the scan.
+	 */
+	RelationIncrementReferenceCount(indexRelation);
+
+	/*
+	 * Tell the AM to open a scan.
+	 */
+	scan = indexRelation->rd_indam->ambeginskipscan(indexRelation, nkeys,
+												norderbys, prefix);
+	/* Initialize information for parallel scan. */
+	scan->parallel_scan = pscan;
+	scan->xs_temp_snap = temp_snap;
+
+	return scan;
+}
+
+IndexScanDesc
+index_beginscan_skip(Relation heapRelation,
+				Relation indexRelation,
+				Snapshot snapshot,
+				int nkeys, int norderbys, int prefix)
+{
+	IndexScanDesc scan;
+
+	scan = index_beginscan_internal_skip(indexRelation, nkeys, norderbys, prefix, snapshot, NULL, false);
+
+	/*
+	 * Save additional parameters into the scandesc.  Everything else was set
+	 * up by RelationGetIndexScan.
+	 */
+	scan->heapRelation = heapRelation;
+	scan->xs_snapshot = snapshot;
+
+	/* prepare to fetch index matches from table */
+	scan->xs_heapfetch = table_index_fetch_begin(heapRelation);
+
+	return scan;
+}
+
+IndexScanDesc
+index_beginscan_bitmap_skip(Relation indexRelation,
+					   Snapshot snapshot,
+					   int nkeys,
+					   int prefix)
+{
+	IndexScanDesc scan;
+
+	scan = index_beginscan_internal_skip(indexRelation, nkeys, 0, prefix, snapshot, NULL, false);
+
+	/*
+	 * Save additional parameters into the scandesc.  Everything else was set
+	 * up by RelationGetIndexScan.
+	 */
+	scan->xs_snapshot = snapshot;
+
+	return scan;
+}
+
 /*
  * index_beginscan_bitmap - start a scan of an index with amgetbitmap
  *
@@ -544,6 +621,45 @@ index_getnext_tid(IndexScanDesc scan, ScanDirection direction)
 	return &scan->xs_heaptid;
 }
 
+ItemPointer
+index_getnext_tid_skip(IndexScanDesc scan, ScanDirection prefixDir, ScanDirection postfixDir)
+{
+	bool		found;
+
+	SCAN_CHECKS;
+	CHECK_SCAN_PROCEDURE(amgetskiptuple);
+
+	Assert(TransactionIdIsValid(RecentGlobalXmin));
+
+	/*
+	 * The AM's amgetskiptuple proc finds the next index entry matching the scan
+	 * keys, and puts the TID into scan->xs_heaptid.  It should also set
+	 * scan->xs_recheck and possibly scan->xs_itup/scan->xs_hitup, though we
+	 * pay no attention to those fields here.
+	 */
+	found = scan->indexRelation->rd_indam->amgetskiptuple(scan, prefixDir, postfixDir);
+
+	/* Reset kill flag immediately for safety */
+	scan->kill_prior_tuple = false;
+	scan->xs_heap_continue = false;
+
+	/* If we're out of index entries, we're done */
+	if (!found)
+	{
+		/* release resources (like buffer pins) from table accesses */
+		if (scan->xs_heapfetch)
+			table_index_fetch_reset(scan->xs_heapfetch);
+
+		return NULL;
+	}
+	Assert(ItemPointerIsValid(&scan->xs_heaptid));
+
+	pgstat_count_index_tuples(scan->indexRelation, 1);
+
+	/* Return the TID of the tuple we found. */
+	return &scan->xs_heaptid;
+}
+
 /* ----------------
  *		index_fetch_heap - get the scan's next heap tuple
  *
@@ -635,6 +751,38 @@ index_getnext_slot(IndexScanDesc scan, ScanDirection direction, TupleTableSlot *
 	return false;
 }
 
+bool
+index_getnext_slot_skip(IndexScanDesc scan, ScanDirection prefixDir, ScanDirection postfixDir, TupleTableSlot *slot)
+{
+	for (;;)
+	{
+		if (!scan->xs_heap_continue)
+		{
+			ItemPointer tid;
+
+			/* Time to fetch the next TID from the index */
+			tid = index_getnext_tid_skip(scan, prefixDir, postfixDir);
+
+			/* If we're out of index entries, we're done */
+			if (tid == NULL)
+				break;
+
+			Assert(ItemPointerEquals(tid, &scan->xs_heaptid));
+		}
+
+		/*
+		 * Fetch the next (or only) visible heap tuple for this index entry.
+		 * If we don't find anything, loop around and grab the next TID from
+		 * the index.
+		 */
+		Assert(ItemPointerIsValid(&scan->xs_heaptid));
+		if (index_fetch_heap(scan, slot))
+			return true;
+	}
+
+	return false;
+}
+
 /* ----------------
  *		index_getbitmap - get all tuples at once from an index scan
  *
@@ -730,6 +878,21 @@ index_can_return(Relation indexRelation, int attno)
 	return indexRelation->rd_indam->amcanreturn(indexRelation, attno);
 }
 
+/* ----------------
+ *		index_skip
+ *
+ *		Skip past all tuples where the first 'prefix' columns have the
+ *		same value as the last tuple returned in the current scan.
+ * ----------------
+ */
+bool
+index_skip(IndexScanDesc scan, ScanDirection prefixDir, ScanDirection postfixDir)
+{
+	SCAN_CHECKS;
+	CHECK_SCAN_PROCEDURE(amskip);
+	return scan->indexRelation->rd_indam->amskip(scan, prefixDir, postfixDir);
+}
+
 /* ----------------
  *		index_getprocid
  *
diff --git a/src/backend/access/nbtree/Makefile b/src/backend/access/nbtree/Makefile
index d69808e78c..da96ac00a6 100644
--- a/src/backend/access/nbtree/Makefile
+++ b/src/backend/access/nbtree/Makefile
@@ -19,6 +19,7 @@ OBJS = \
 	nbtpage.o \
 	nbtree.o \
 	nbtsearch.o \
+	nbtskip.o \
 	nbtsort.o \
 	nbtsplitloc.o \
 	nbtutils.o \
diff --git a/src/backend/access/nbtree/nbtinsert.c b/src/backend/access/nbtree/nbtinsert.c
index 00df0e1b88..749c6e3744 100644
--- a/src/backend/access/nbtree/nbtinsert.c
+++ b/src/backend/access/nbtree/nbtinsert.c
@@ -89,7 +89,7 @@ _bt_doinsert(Relation rel, IndexTuple itup,
 	bool		checkingunique = (checkUnique != UNIQUE_CHECK_NO);
 
 	/* we need an insertion scan key to do our search, so build one */
-	itup_key = _bt_mkscankey(rel, itup);
+	itup_key = _bt_mkscankey(rel, itup, NULL);
 
 	if (checkingunique)
 	{
diff --git a/src/backend/access/nbtree/nbtpage.c b/src/backend/access/nbtree/nbtpage.c
index 39b8f17f4b..4d48b8bd63 100644
--- a/src/backend/access/nbtree/nbtpage.c
+++ b/src/backend/access/nbtree/nbtpage.c
@@ -1638,7 +1638,7 @@ _bt_pagedel(Relation rel, Buffer buf)
 				}
 
 				/* we need an insertion scan key for the search, so build one */
-				itup_key = _bt_mkscankey(rel, targetkey);
+				itup_key = _bt_mkscankey(rel, targetkey, NULL);
 				/* find the leftmost leaf page with matching pivot/high key */
 				itup_key->pivotsearch = true;
 				stack = _bt_search(rel, itup_key, &lbuf, BT_READ, NULL);
diff --git a/src/backend/access/nbtree/nbtree.c b/src/backend/access/nbtree/nbtree.c
index 4bb16297c3..2b9e045ae0 100644
--- a/src/backend/access/nbtree/nbtree.c
+++ b/src/backend/access/nbtree/nbtree.c
@@ -136,14 +136,17 @@ bthandler(PG_FUNCTION_ARGS)
 	amroutine->ambulkdelete = btbulkdelete;
 	amroutine->amvacuumcleanup = btvacuumcleanup;
 	amroutine->amcanreturn = btcanreturn;
+	amroutine->amskip = btskip;
 	amroutine->amcostestimate = btcostestimate;
 	amroutine->amoptions = btoptions;
 	amroutine->amproperty = btproperty;
 	amroutine->ambuildphasename = btbuildphasename;
 	amroutine->amvalidate = btvalidate;
 	amroutine->ambeginscan = btbeginscan;
+	amroutine->ambeginskipscan = btbeginscan_skip;
 	amroutine->amrescan = btrescan;
 	amroutine->amgettuple = btgettuple;
+	amroutine->amgetskiptuple = btgettuple_skip;
 	amroutine->amgetbitmap = btgetbitmap;
 	amroutine->amendscan = btendscan;
 	amroutine->ammarkpos = btmarkpos;
@@ -219,6 +222,15 @@ btinsert(Relation rel, Datum *values, bool *isnull,
  */
 bool
 btgettuple(IndexScanDesc scan, ScanDirection dir)
+{
+	return btgettuple_skip(scan, dir, dir);
+}
+
+/*
+ *	btgettuple_skip() -- Get the next tuple in the scan, optionally skipping.
+ */
+bool
+btgettuple_skip(IndexScanDesc scan, ScanDirection prefixDir, ScanDirection postfixDir)
 {
 	BTScanOpaque so = (BTScanOpaque) scan->opaque;
 	bool		res;
@@ -237,7 +249,7 @@ btgettuple(IndexScanDesc scan, ScanDirection dir)
 		if (so->numArrayKeys < 0)
 			return false;
 
-		_bt_start_array_keys(scan, dir);
+		_bt_start_array_keys(scan, prefixDir);
 	}
 
 	/* This loop handles advancing to the next array elements, if any */
@@ -249,7 +261,7 @@ btgettuple(IndexScanDesc scan, ScanDirection dir)
 		 * _bt_first() to get the first item in the scan.
 		 */
 		if (!BTScanPosIsValid(so->currPos))
-			res = _bt_first(scan, dir);
+			res = _bt_first(scan, prefixDir, postfixDir);
 		else
 		{
 			/*
@@ -276,14 +288,14 @@ btgettuple(IndexScanDesc scan, ScanDirection dir)
 			/*
 			 * Now continue the scan.
 			 */
-			res = _bt_next(scan, dir);
+			res = _bt_next(scan, prefixDir, postfixDir);
 		}
 
 		/* If we have a tuple, return it ... */
 		if (res)
 			break;
 		/* ... otherwise see if we have more array keys to deal with */
-	} while (so->numArrayKeys && _bt_advance_array_keys(scan, dir));
+	} while (so->numArrayKeys && _bt_advance_array_keys(scan, prefixDir));
 
 	return res;
 }
@@ -314,7 +326,7 @@ btgetbitmap(IndexScanDesc scan, TIDBitmap *tbm)
 	do
 	{
 		/* Fetch the first page & tuple */
-		if (_bt_first(scan, ForwardScanDirection))
+		if (_bt_first(scan, ForwardScanDirection, ForwardScanDirection))
 		{
 			/* Save tuple ID, and continue scanning */
 			heapTid = &scan->xs_heaptid;
@@ -330,7 +342,7 @@ btgetbitmap(IndexScanDesc scan, TIDBitmap *tbm)
 				if (++so->currPos.itemIndex > so->currPos.lastItem)
 				{
 					/* let _bt_next do the heavy lifting */
-					if (!_bt_next(scan, ForwardScanDirection))
+					if (!_bt_next(scan, ForwardScanDirection, ForwardScanDirection))
 						break;
 				}
 
@@ -351,6 +363,16 @@ btgetbitmap(IndexScanDesc scan, TIDBitmap *tbm)
  */
 IndexScanDesc
 btbeginscan(Relation rel, int nkeys, int norderbys)
+{
+	return btbeginscan_skip(rel, nkeys, norderbys, -1);
+}
+
+
+/*
+ *	btbeginscan_skip() -- start a (skip) scan on a btree index
+ */
+IndexScanDesc
+btbeginscan_skip(Relation rel, int nkeys, int norderbys, int skipPrefix)
 {
 	IndexScanDesc scan;
 	BTScanOpaque so;
@@ -385,10 +407,18 @@ btbeginscan(Relation rel, int nkeys, int norderbys)
 	 */
 	so->currTuples = so->markTuples = NULL;
 
+	so->skipData = NULL;
+
 	scan->xs_itupdesc = RelationGetDescr(rel);
 
 	scan->opaque = so;
 
+	if (skipPrefix > 0)
+	{
+		so->skipData = (BTSkip) palloc0(sizeof(BTSkipData));
+		so->skipData->prefix = skipPrefix;
+	}
+
 	return scan;
 }
 
@@ -452,6 +482,15 @@ btrescan(IndexScanDesc scan, ScanKey scankey, int nscankeys,
 	_bt_preprocess_array_keys(scan);
 }
 
+/*
+ * btskip() -- skip to the beginning of the next key prefix
+ */
+bool
+btskip(IndexScanDesc scan, ScanDirection prefixDir, ScanDirection postfixDir)
+{
+	return _bt_skip(scan, prefixDir, postfixDir);
+}
+
 /*
  *	btendscan() -- close down a scan
  */
@@ -485,6 +524,8 @@ btendscan(IndexScanDesc scan)
 	if (so->currTuples != NULL)
 		pfree(so->currTuples);
 	/* so->markTuples should not be pfree'd, see btrescan */
+	if (_bt_skip_enabled(so))
+		pfree(so->skipData);
 	pfree(so);
 }
 
@@ -568,6 +609,9 @@ btrestrpos(IndexScanDesc scan)
 			if (so->currTuples)
 				memcpy(so->currTuples, so->markTuples,
 					   so->markPos.nextTupleOffset);
+			if (so->skipData)
+				memcpy(&so->skipData->curPos, &so->skipData->markPos,
+					   sizeof(BTSkipPosData));
 		}
 		else
 			BTScanPosInvalidate(so->currPos);
diff --git a/src/backend/access/nbtree/nbtsearch.c b/src/backend/access/nbtree/nbtsearch.c
index 8ff49ce6d6..f0c042c9ba 100644
--- a/src/backend/access/nbtree/nbtsearch.c
+++ b/src/backend/access/nbtree/nbtsearch.c
@@ -17,19 +17,17 @@
 
 #include "access/nbtree.h"
 #include "access/relscan.h"
+#include "catalog/catalog.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "storage/predicate.h"
+#include "utils/guc.h"
 #include "utils/lsyscache.h"
 #include "utils/rel.h"
 
 
-static void _bt_drop_lock_and_maybe_pin(IndexScanDesc scan, BTScanPos sp);
-static OffsetNumber _bt_binsrch(Relation rel, BTScanInsert key, Buffer buf);
 static int	_bt_binsrch_posting(BTScanInsert key, Page page,
 								OffsetNumber offnum);
-static bool _bt_readpage(IndexScanDesc scan, ScanDirection dir,
-						 OffsetNumber offnum);
 static void _bt_saveitem(BTScanOpaque so, int itemIndex,
 						 OffsetNumber offnum, IndexTuple itup);
 static int	_bt_setuppostingitems(BTScanOpaque so, int itemIndex,
@@ -38,14 +36,12 @@ static int	_bt_setuppostingitems(BTScanOpaque so, int itemIndex,
 static inline void _bt_savepostingitem(BTScanOpaque so, int itemIndex,
 									   OffsetNumber offnum,
 									   ItemPointer heapTid, int tupleOffset);
-static bool _bt_steppage(IndexScanDesc scan, ScanDirection dir);
-static bool _bt_readnextpage(IndexScanDesc scan, BlockNumber blkno, ScanDirection dir);
 static bool _bt_parallel_readpage(IndexScanDesc scan, BlockNumber blkno,
 								  ScanDirection dir);
-static Buffer _bt_walk_left(Relation rel, Buffer buf, Snapshot snapshot);
 static bool _bt_endpoint(IndexScanDesc scan, ScanDirection dir);
-static inline void _bt_initialize_more_data(BTScanOpaque so, ScanDirection dir);
-
+static inline bool _bt_checkkeys_extended(IndexScanDesc scan, IndexTuple tuple, int tupnatts,
+										  ScanDirection dir, bool isRegularMode,
+										  bool *continuescan, int *prefixskipindex);
 
 /*
  *	_bt_drop_lock_and_maybe_pin()
@@ -61,7 +57,7 @@ static inline void _bt_initialize_more_data(BTScanOpaque so, ScanDirection dir);
  * will remain in shared memory for as long as it takes to scan the index
  * buffer page.
  */
-static void
+void
 _bt_drop_lock_and_maybe_pin(IndexScanDesc scan, BTScanPos sp)
 {
 	LockBuffer(sp->buf, BUFFER_LOCK_UNLOCK);
@@ -344,7 +340,7 @@ _bt_moveright(Relation rel,
  * the given page.  _bt_binsrch() has no lock or refcount side effects
  * on the buffer.
  */
-static OffsetNumber
+OffsetNumber
 _bt_binsrch(Relation rel,
 			BTScanInsert key,
 			Buffer buf)
@@ -850,25 +846,23 @@ _bt_compare(Relation rel,
  * in locating the scan start position.
  */
 bool
-_bt_first(IndexScanDesc scan, ScanDirection dir)
+_bt_first(IndexScanDesc scan, ScanDirection prefixDir, ScanDirection postfixDir)
 {
 	Relation	rel = scan->indexRelation;
 	BTScanOpaque so = (BTScanOpaque) scan->opaque;
 	Buffer		buf;
 	BTStack		stack;
 	OffsetNumber offnum;
-	StrategyNumber strat;
-	bool		nextkey;
 	bool		goback;
 	BTScanInsertData inskey;
 	ScanKey		startKeys[INDEX_MAX_KEYS];
 	ScanKeyData notnullkeys[INDEX_MAX_KEYS];
 	int			keysCount = 0;
-	int			i;
 	bool		status = true;
 	StrategyNumber strat_total;
 	BTScanPosItem *currItem;
 	BlockNumber blkno;
+	IndexTuple itup;
 
 	Assert(!BTScanPosIsValid(so->currPos));
 
@@ -905,184 +899,13 @@ _bt_first(IndexScanDesc scan, ScanDirection dir)
 		}
 		else if (blkno != InvalidBlockNumber)
 		{
-			if (!_bt_parallel_readpage(scan, blkno, dir))
+			if (!_bt_parallel_readpage(scan, blkno, prefixDir))
 				return false;
 			goto readcomplete;
 		}
 	}
 
-	/*----------
-	 * Examine the scan keys to discover where we need to start the scan.
-	 *
-	 * We want to identify the keys that can be used as starting boundaries;
-	 * these are =, >, or >= keys for a forward scan or =, <, <= keys for
-	 * a backwards scan.  We can use keys for multiple attributes so long as
-	 * the prior attributes had only =, >= (resp. =, <=) keys.  Once we accept
-	 * a > or < boundary or find an attribute with no boundary (which can be
-	 * thought of as the same as "> -infinity"), we can't use keys for any
-	 * attributes to its right, because it would break our simplistic notion
-	 * of what initial positioning strategy to use.
-	 *
-	 * When the scan keys include cross-type operators, _bt_preprocess_keys
-	 * may not be able to eliminate redundant keys; in such cases we will
-	 * arbitrarily pick a usable one for each attribute.  This is correct
-	 * but possibly not optimal behavior.  (For example, with keys like
-	 * "x >= 4 AND x >= 5" we would elect to scan starting at x=4 when
-	 * x=5 would be more efficient.)  Since the situation only arises given
-	 * a poorly-worded query plus an incomplete opfamily, live with it.
-	 *
-	 * When both equality and inequality keys appear for a single attribute
-	 * (again, only possible when cross-type operators appear), we *must*
-	 * select one of the equality keys for the starting point, because
-	 * _bt_checkkeys() will stop the scan as soon as an equality qual fails.
-	 * For example, if we have keys like "x >= 4 AND x = 10" and we elect to
-	 * start at x=4, we will fail and stop before reaching x=10.  If multiple
-	 * equality quals survive preprocessing, however, it doesn't matter which
-	 * one we use --- by definition, they are either redundant or
-	 * contradictory.
-	 *
-	 * Any regular (not SK_SEARCHNULL) key implies a NOT NULL qualifier.
-	 * If the index stores nulls at the end of the index we'll be starting
-	 * from, and we have no boundary key for the column (which means the key
-	 * we deduced NOT NULL from is an inequality key that constrains the other
-	 * end of the index), then we cons up an explicit SK_SEARCHNOTNULL key to
-	 * use as a boundary key.  If we didn't do this, we might find ourselves
-	 * traversing a lot of null entries at the start of the scan.
-	 *
-	 * In this loop, row-comparison keys are treated the same as keys on their
-	 * first (leftmost) columns.  We'll add on lower-order columns of the row
-	 * comparison below, if possible.
-	 *
-	 * The selected scan keys (at most one per index column) are remembered by
-	 * storing their addresses into the local startKeys[] array.
-	 *----------
-	 */
-	strat_total = BTEqualStrategyNumber;
-	if (so->numberOfKeys > 0)
-	{
-		AttrNumber	curattr;
-		ScanKey		chosen;
-		ScanKey		impliesNN;
-		ScanKey		cur;
-
-		/*
-		 * chosen is the so-far-chosen key for the current attribute, if any.
-		 * We don't cast the decision in stone until we reach keys for the
-		 * next attribute.
-		 */
-		curattr = 1;
-		chosen = NULL;
-		/* Also remember any scankey that implies a NOT NULL constraint */
-		impliesNN = NULL;
-
-		/*
-		 * Loop iterates from 0 to numberOfKeys inclusive; we use the last
-		 * pass to handle after-last-key processing.  Actual exit from the
-		 * loop is at one of the "break" statements below.
-		 */
-		for (cur = so->keyData, i = 0;; cur++, i++)
-		{
-			if (i >= so->numberOfKeys || cur->sk_attno != curattr)
-			{
-				/*
-				 * Done looking at keys for curattr.  If we didn't find a
-				 * usable boundary key, see if we can deduce a NOT NULL key.
-				 */
-				if (chosen == NULL && impliesNN != NULL &&
-					((impliesNN->sk_flags & SK_BT_NULLS_FIRST) ?
-					 ScanDirectionIsForward(dir) :
-					 ScanDirectionIsBackward(dir)))
-				{
-					/* Yes, so build the key in notnullkeys[keysCount] */
-					chosen = &notnullkeys[keysCount];
-					ScanKeyEntryInitialize(chosen,
-										   (SK_SEARCHNOTNULL | SK_ISNULL |
-											(impliesNN->sk_flags &
-											 (SK_BT_DESC | SK_BT_NULLS_FIRST))),
-										   curattr,
-										   ((impliesNN->sk_flags & SK_BT_NULLS_FIRST) ?
-											BTGreaterStrategyNumber :
-											BTLessStrategyNumber),
-										   InvalidOid,
-										   InvalidOid,
-										   InvalidOid,
-										   (Datum) 0);
-				}
-
-				/*
-				 * If we still didn't find a usable boundary key, quit; else
-				 * save the boundary key pointer in startKeys.
-				 */
-				if (chosen == NULL)
-					break;
-				startKeys[keysCount++] = chosen;
-
-				/*
-				 * Adjust strat_total, and quit if we have stored a > or <
-				 * key.
-				 */
-				strat = chosen->sk_strategy;
-				if (strat != BTEqualStrategyNumber)
-				{
-					strat_total = strat;
-					if (strat == BTGreaterStrategyNumber ||
-						strat == BTLessStrategyNumber)
-						break;
-				}
-
-				/*
-				 * Done if that was the last attribute, or if next key is not
-				 * in sequence (implying no boundary key is available for the
-				 * next attribute).
-				 */
-				if (i >= so->numberOfKeys ||
-					cur->sk_attno != curattr + 1)
-					break;
-
-				/*
-				 * Reset for next attr.
-				 */
-				curattr = cur->sk_attno;
-				chosen = NULL;
-				impliesNN = NULL;
-			}
-
-			/*
-			 * Can we use this key as a starting boundary for this attr?
-			 *
-			 * If not, does it imply a NOT NULL constraint?  (Because
-			 * SK_SEARCHNULL keys are always assigned BTEqualStrategyNumber,
-			 * *any* inequality key works for that; we need not test.)
-			 */
-			switch (cur->sk_strategy)
-			{
-				case BTLessStrategyNumber:
-				case BTLessEqualStrategyNumber:
-					if (chosen == NULL)
-					{
-						if (ScanDirectionIsBackward(dir))
-							chosen = cur;
-						else
-							impliesNN = cur;
-					}
-					break;
-				case BTEqualStrategyNumber:
-					/* override any non-equality choice */
-					chosen = cur;
-					break;
-				case BTGreaterEqualStrategyNumber:
-				case BTGreaterStrategyNumber:
-					if (chosen == NULL)
-					{
-						if (ScanDirectionIsForward(dir))
-							chosen = cur;
-						else
-							impliesNN = cur;
-					}
-					break;
-			}
-		}
-	}
+	keysCount = _bt_choose_scan_keys(so->keyData, so->numberOfKeys, prefixDir, startKeys, notnullkeys, &strat_total, 0);
 
 	/*
 	 * If we found no usable boundary keys, we have to start from one end of
@@ -1093,260 +916,112 @@ _bt_first(IndexScanDesc scan, ScanDirection dir)
 	{
 		bool		match;
 
-		match = _bt_endpoint(scan, dir);
-
-		if (!match)
+		if (!_bt_skip_enabled(so))
 		{
-			/* No match, so mark (parallel) scan finished */
-			_bt_parallel_done(scan);
-		}
+			match = _bt_endpoint(scan, prefixDir);
 
-		return match;
-	}
+			if (!match)
+			{
+				/* No match, so mark (parallel) scan finished */
+				_bt_parallel_done(scan);
+			}
 
-	/*
-	 * We want to start the scan somewhere within the index.  Set up an
-	 * insertion scankey we can use to search for the boundary point we
-	 * identified above.  The insertion scankey is built using the keys
-	 * identified by startKeys[].  (Remaining insertion scankey fields are
-	 * initialized after initial-positioning strategy is finalized.)
-	 */
-	Assert(keysCount <= INDEX_MAX_KEYS);
-	for (i = 0; i < keysCount; i++)
-	{
-		ScanKey		cur = startKeys[i];
+			return match;
+		}
+		else
+		{
+			Relation	rel = scan->indexRelation;
+			Buffer		buf;
+			Page		page;
+			BTPageOpaque opaque;
+			OffsetNumber start;
+			BTSkipCompareResult cmp = {0};
 
-		Assert(cur->sk_attno == i + 1);
+			_bt_skip_create_scankeys(rel, so);
 
-		if (cur->sk_flags & SK_ROW_HEADER)
-		{
 			/*
-			 * Row comparison header: look to the first row member instead.
-			 *
-			 * The member scankeys are already in insertion format (ie, they
-			 * have sk_func = 3-way-comparison function), but we have to watch
-			 * out for nulls, which _bt_preprocess_keys didn't check. A null
-			 * in the first row member makes the condition unmatchable, just
-			 * like qual_ok = false.
+			 * Scan down to the leftmost or rightmost leaf page and position
+			 * the scan on the leftmost or rightmost item on that page.
+			 * Start the skip scan from there to find the first matching item
 			 */
-			ScanKey		subkey = (ScanKey) DatumGetPointer(cur->sk_argument);
+			buf = _bt_get_endpoint(rel, 0, ScanDirectionIsBackward(prefixDir), scan->xs_snapshot);
 
-			Assert(subkey->sk_flags & SK_ROW_MEMBER);
-			if (subkey->sk_flags & SK_ISNULL)
+			if (!BufferIsValid(buf))
 			{
-				_bt_parallel_done(scan);
+				/*
+				 * Empty index. Lock the whole relation, as nothing finer to lock
+				 * exists.
+				 */
+				PredicateLockRelation(rel, scan->xs_snapshot);
+				BTScanPosInvalidate(so->currPos);
 				return false;
 			}
-			memcpy(inskey.scankeys + i, subkey, sizeof(ScanKeyData));
 
-			/*
-			 * If the row comparison is the last positioning key we accepted,
-			 * try to add additional keys from the lower-order row members.
-			 * (If we accepted independent conditions on additional index
-			 * columns, we use those instead --- doesn't seem worth trying to
-			 * determine which is more restrictive.)  Note that this is OK
-			 * even if the row comparison is of ">" or "<" type, because the
-			 * condition applied to all but the last row member is effectively
-			 * ">=" or "<=", and so the extra keys don't break the positioning
-			 * scheme.  But, by the same token, if we aren't able to use all
-			 * the row members, then the part of the row comparison that we
-			 * did use has to be treated as just a ">=" or "<=" condition, and
-			 * so we'd better adjust strat_total accordingly.
-			 */
-			if (i == keysCount - 1)
+			PredicateLockPage(rel, BufferGetBlockNumber(buf), scan->xs_snapshot);
+			page = BufferGetPage(buf);
+			opaque = (BTPageOpaque) PageGetSpecialPointer(page);
+			Assert(P_ISLEAF(opaque));
+
+			if (ScanDirectionIsForward(prefixDir))
 			{
-				bool		used_all_subkeys = false;
+				/* There could be dead pages to the left, so not this: */
+				/* Assert(P_LEFTMOST(opaque)); */
 
-				Assert(!(subkey->sk_flags & SK_ROW_END));
-				for (;;)
-				{
-					subkey++;
-					Assert(subkey->sk_flags & SK_ROW_MEMBER);
-					if (subkey->sk_attno != keysCount + 1)
-						break;	/* out-of-sequence, can't use it */
-					if (subkey->sk_strategy != cur->sk_strategy)
-						break;	/* wrong direction, can't use it */
-					if (subkey->sk_flags & SK_ISNULL)
-						break;	/* can't use null keys */
-					Assert(keysCount < INDEX_MAX_KEYS);
-					memcpy(inskey.scankeys + keysCount, subkey,
-						   sizeof(ScanKeyData));
-					keysCount++;
-					if (subkey->sk_flags & SK_ROW_END)
-					{
-						used_all_subkeys = true;
-						break;
-					}
-				}
-				if (!used_all_subkeys)
-				{
-					switch (strat_total)
-					{
-						case BTLessStrategyNumber:
-							strat_total = BTLessEqualStrategyNumber;
-							break;
-						case BTGreaterStrategyNumber:
-							strat_total = BTGreaterEqualStrategyNumber;
-							break;
-					}
-				}
-				break;			/* done with outer loop */
+				start = P_FIRSTDATAKEY(opaque);
 			}
-		}
-		else
-		{
-			/*
-			 * Ordinary comparison key.  Transform the search-style scan key
-			 * to an insertion scan key by replacing the sk_func with the
-			 * appropriate btree comparison function.
-			 *
-			 * If scankey operator is not a cross-type comparison, we can use
-			 * the cached comparison function; otherwise gotta look it up in
-			 * the catalogs.  (That can't lead to infinite recursion, since no
-			 * indexscan initiated by syscache lookup will use cross-data-type
-			 * operators.)
-			 *
-			 * We support the convention that sk_subtype == InvalidOid means
-			 * the opclass input type; this is a hack to simplify life for
-			 * ScanKeyInit().
-			 */
-			if (cur->sk_subtype == rel->rd_opcintype[i] ||
-				cur->sk_subtype == InvalidOid)
+			else if (ScanDirectionIsBackward(prefixDir))
 			{
-				FmgrInfo   *procinfo;
-
-				procinfo = index_getprocinfo(rel, cur->sk_attno, BTORDER_PROC);
-				ScanKeyEntryInitializeWithInfo(inskey.scankeys + i,
-											   cur->sk_flags,
-											   cur->sk_attno,
-											   InvalidStrategy,
-											   cur->sk_subtype,
-											   cur->sk_collation,
-											   procinfo,
-											   cur->sk_argument);
+				Assert(P_RIGHTMOST(opaque));
+
+				start = PageGetMaxOffsetNumber(page);
 			}
 			else
 			{
-				RegProcedure cmp_proc;
-
-				cmp_proc = get_opfamily_proc(rel->rd_opfamily[i],
-											 rel->rd_opcintype[i],
-											 cur->sk_subtype,
-											 BTORDER_PROC);
-				if (!RegProcedureIsValid(cmp_proc))
-					elog(ERROR, "missing support function %d(%u,%u) for attribute %d of index \"%s\"",
-						 BTORDER_PROC, rel->rd_opcintype[i], cur->sk_subtype,
-						 cur->sk_attno, RelationGetRelationName(rel));
-				ScanKeyEntryInitialize(inskey.scankeys + i,
-									   cur->sk_flags,
-									   cur->sk_attno,
-									   InvalidStrategy,
-									   cur->sk_subtype,
-									   cur->sk_collation,
-									   cmp_proc,
-									   cur->sk_argument);
+				elog(ERROR, "invalid scan direction: %d", (int) prefixDir);
 			}
-		}
-	}
 
-	/*----------
-	 * Examine the selected initial-positioning strategy to determine exactly
-	 * where we need to start the scan, and set flag variables to control the
-	 * code below.
-	 *
-	 * If nextkey = false, _bt_search and _bt_binsrch will locate the first
-	 * item >= scan key.  If nextkey = true, they will locate the first
-	 * item > scan key.
-	 *
-	 * If goback = true, we will then step back one item, while if
-	 * goback = false, we will start the scan on the located item.
-	 *----------
-	 */
-	switch (strat_total)
-	{
-		case BTLessStrategyNumber:
-
-			/*
-			 * Find first item >= scankey, then back up one to arrive at last
-			 * item < scankey.  (Note: this positioning strategy is only used
-			 * for a backward scan, so that is always the correct starting
-			 * position.)
-			 */
-			nextkey = false;
-			goback = true;
-			break;
-
-		case BTLessEqualStrategyNumber:
-
-			/*
-			 * Find first item > scankey, then back up one to arrive at last
-			 * item <= scankey.  (Note: this positioning strategy is only used
-			 * for a backward scan, so that is always the correct starting
-			 * position.)
-			 */
-			nextkey = true;
-			goback = true;
-			break;
-
-		case BTEqualStrategyNumber:
-
-			/*
-			 * If a backward scan was specified, need to start with last equal
-			 * item not first one.
+			/* remember which buffer we have pinned */
+			so->currPos.buf = buf;
+			so->currPos.currPage = BufferGetBlockNumber(so->currPos.buf);
+
+			itup = (IndexTuple) PageGetItem(page, PageGetItemId(page, start));
+			/* In some cases we can (or must) skip further inside the prefix.
+			 * We can do so when extra quals become usable after the skip, e.g.
+			 * WHERE b=2 on an index on (a,b).
+			 * We must do so when not in regular mode (prefixDir != postfixDir),
+			 * because we are then positioned at the end of the prefix, while we
+			 * should be at its beginning.
 			 */
-			if (ScanDirectionIsBackward(dir))
+			if (_bt_has_extra_quals_after_skip(so->skipData, postfixDir, 0) ||
+					!_bt_skip_is_regular_mode(prefixDir, postfixDir))
 			{
-				/*
-				 * This is the same as the <= strategy.  We will check at the
-				 * end whether the found item is actually =.
-				 */
-				nextkey = true;
-				goback = true;
+				_bt_skip_extra_conditions(scan, &itup, &start, prefixDir, postfixDir, &cmp);
 			}
-			else
+			/* now find the next matching tuple */
+			match = _bt_skip_find_next(scan, itup, start, prefixDir, postfixDir);
+			if (!match)
 			{
-				/*
-				 * This is the same as the >= strategy.  We will check at the
-				 * end whether the found item is actually =.
-				 */
-				nextkey = false;
-				goback = false;
+				if (_bt_skip_is_always_valid(so))
+					_bt_drop_lock_and_maybe_pin(scan, &so->currPos);
+				return false;
 			}
-			break;
 
-		case BTGreaterEqualStrategyNumber:
+			_bt_drop_lock_and_maybe_pin(scan, &so->currPos);
 
-			/*
-			 * Find first item >= scankey.  (This is only used for forward
-			 * scans.)
-			 */
-			nextkey = false;
-			goback = false;
-			break;
-
-		case BTGreaterStrategyNumber:
-
-			/*
-			 * Find first item > scankey.  (This is only used for forward
-			 * scans.)
-			 */
-			nextkey = true;
-			goback = false;
-			break;
+			currItem = &so->currPos.items[so->currPos.itemIndex];
+			scan->xs_heaptid = currItem->heapTid;
+			if (scan->xs_want_itup)
+				scan->xs_itup = (IndexTuple) (so->currTuples + currItem->tupleOffset);
 
-		default:
-			/* can't get here, but keep compiler quiet */
-			elog(ERROR, "unrecognized strat_total: %d", (int) strat_total);
-			return false;
+			return true;
+		}
 	}
 
-	/* Initialize remaining insertion scan key fields */
-	_bt_metaversion(rel, &inskey.heapkeyspace, &inskey.allequalimage);
-	inskey.anynullkeys = false; /* unused */
-	inskey.nextkey = nextkey;
-	inskey.pivotsearch = false;
-	inskey.scantid = NULL;
-	inskey.keysz = keysCount;
+	if (!_bt_create_insertion_scan_key(rel, prefixDir, startKeys, keysCount, &inskey, &strat_total,  &goback))
+	{
+		_bt_parallel_done(scan);
+		return false;
+	}
 
 	/*
 	 * Use the manufactured insertion scan key to descend the tree and
@@ -1378,7 +1053,7 @@ _bt_first(IndexScanDesc scan, ScanDirection dir)
 		PredicateLockPage(rel, BufferGetBlockNumber(buf),
 						  scan->xs_snapshot);
 
-	_bt_initialize_more_data(so, dir);
+	_bt_initialize_more_data(so, prefixDir);
 
 	/* position to the precise item on the page */
 	offnum = _bt_binsrch(rel, &inskey, buf);
@@ -1408,23 +1083,79 @@ _bt_first(IndexScanDesc scan, ScanDirection dir)
 	Assert(!BTScanPosIsValid(so->currPos));
 	so->currPos.buf = buf;
 
-	/*
-	 * Now load data from the first page of the scan.
-	 */
-	if (!_bt_readpage(scan, dir, offnum))
+	if (_bt_skip_enabled(so))
 	{
-		/*
-		 * There's no actually-matching data on this page.  Try to advance to
-		 * the next page.  Return false if there's no matching data at all.
+		Page page;
+		BTPageOpaque opaque;
+		OffsetNumber minoff;
+		bool match;
+		BTSkipCompareResult cmp = {0};
+
+		/* first create the skip scan keys */
+		_bt_skip_create_scankeys(rel, so);
+
+		/* remember which page we have pinned */
+		so->currPos.currPage = BufferGetBlockNumber(so->currPos.buf);
+
+		page = BufferGetPage(so->currPos.buf);
+		opaque = (BTPageOpaque) PageGetSpecialPointer(page);
+		minoff = P_FIRSTDATAKEY(opaque);
+		/* _bt_binsrch combined with goback can leave offnum before the first
+		 * item or after the last item on the page. If so, we need to step
+		 * back or forward one page, respectively.
 		 */
-		LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
-		if (!_bt_steppage(scan, dir))
+		if (offnum < minoff)
+		{
+			LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+			if (!_bt_step_back_page(scan, &itup, &offnum))
+				return false;
+		}
+		else if (offnum > PageGetMaxOffsetNumber(page))
+		{
+			BlockNumber next = opaque->btpo_next;
+			LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+			if (!_bt_step_forward_page(scan, next, &itup, &offnum))
+				return false;
+		}
+
+		itup = (IndexTuple) PageGetItem(page, PageGetItemId(page, offnum));
+		/* check if we can skip even more because we can use new conditions */
+		if (_bt_has_extra_quals_after_skip(so->skipData, postfixDir, inskey.keysz) ||
+				!_bt_skip_is_regular_mode(prefixDir, postfixDir))
+		{
+			_bt_skip_extra_conditions(scan, &itup, &offnum, prefixDir, postfixDir, &cmp);
+		}
+		/* now find the tuple */
+		match = _bt_skip_find_next(scan, itup, offnum, prefixDir, postfixDir);
+		if (!match)
+		{
+			if (_bt_skip_is_always_valid(so))
+				_bt_drop_lock_and_maybe_pin(scan, &so->currPos);
 			return false;
+		}
+
+		_bt_drop_lock_and_maybe_pin(scan, &so->currPos);
 	}
 	else
 	{
-		/* Drop the lock, and maybe the pin, on the current page */
-		_bt_drop_lock_and_maybe_pin(scan, &so->currPos);
+		/*
+		 * Now load data from the first page of the scan.
+		 */
+		if (!_bt_readpage(scan, prefixDir, &offnum, true))
+		{
+			/*
+			 * There's no actually-matching data on this page.  Try to advance to
+			 * the next page.  Return false if there's no matching data at all.
+			 */
+			LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+			if (!_bt_steppage(scan, prefixDir))
+				return false;
+		}
+		else
+		{
+			/* Drop the lock, and maybe the pin, on the current page */
+			_bt_drop_lock_and_maybe_pin(scan, &so->currPos);
+		}
 	}
 
 readcomplete:
@@ -1452,29 +1183,113 @@ readcomplete:
  *		so->currPos.buf to InvalidBuffer.
  */
 bool
-_bt_next(IndexScanDesc scan, ScanDirection dir)
+_bt_next(IndexScanDesc scan, ScanDirection prefixDir, ScanDirection postfixDir)
 {
 	BTScanOpaque so = (BTScanOpaque) scan->opaque;
 	BTScanPosItem *currItem;
 
-	/*
-	 * Advance to next tuple on current page; or if there's no more, try to
-	 * step to the next page with data.
-	 */
-	if (ScanDirectionIsForward(dir))
+	if (!_bt_skip_enabled(so))
 	{
-		if (++so->currPos.itemIndex > so->currPos.lastItem)
+		/*
+		 * Advance to next tuple on current page; or if there's no more, try to
+		 * step to the next page with data.
+		 */
+		if (ScanDirectionIsForward(prefixDir))
 		{
-			if (!_bt_steppage(scan, dir))
-				return false;
+			if (++so->currPos.itemIndex > so->currPos.lastItem)
+			{
+				if (!_bt_steppage(scan, prefixDir))
+					return false;
+			}
+		}
+		else
+		{
+			if (--so->currPos.itemIndex < so->currPos.firstItem)
+			{
+				if (!_bt_steppage(scan, prefixDir))
+					return false;
+			}
 		}
 	}
 	else
 	{
-		if (--so->currPos.itemIndex < so->currPos.firstItem)
+		bool match;
+		IndexTuple itup = NULL;
+		OffsetNumber offnum = InvalidOffsetNumber;
+
+		if (ScanDirectionIsForward(postfixDir))
 		{
-			if (!_bt_steppage(scan, dir))
-				return false;
+			if (++so->currPos.itemIndex > so->currPos.lastItem)
+			{
+				if (prefixDir != so->skipData->curPos.nextDirection)
+				{
+					/* this happens when doing a cursor scan and changing
+					 * direction in the meantime. eg. first fetch forwards,
+					 * then backwards.
+					 * we *always* just go to the next page instead of skipping,
+					 * because that's the only safe option.
+					 */
+					so->skipData->curPos.nextAction = SkipStateNext;
+					so->skipData->curPos.nextDirection = prefixDir;
+				}
+
+				if (so->skipData->curPos.nextAction == SkipStateNext)
+				{
+					/* we should just go forwards one page, no skipping is necessary */
+					if (!_bt_step_forward_page(scan, so->currPos.nextPage, &itup, &offnum))
+						return false;
+				}
+				else if (so->skipData->curPos.nextAction == SkipStateStop)
+				{
+					/* we've reached the end of the index, or we cannot find any more keys */
+					BTScanPosUnpinIfPinned(so->currPos);
+					BTScanPosInvalidate(so->currPos);
+					return false;
+				}
+
+				/* now find the next tuple */
+				match = _bt_skip_find_next(scan, itup, offnum, prefixDir, postfixDir);
+				if (!match)
+				{
+					if (_bt_skip_is_always_valid(so))
+						_bt_drop_lock_and_maybe_pin(scan, &so->currPos);
+					return false;
+				}
+				_bt_drop_lock_and_maybe_pin(scan, &so->currPos);
+			}
+		}
+		else
+		{
+			if (--so->currPos.itemIndex < so->currPos.firstItem)
+			{
+				if (prefixDir != so->skipData->curPos.nextDirection)
+				{
+					so->skipData->curPos.nextAction = SkipStateNext;
+					so->skipData->curPos.nextDirection = prefixDir;
+				}
+
+				if (so->skipData->curPos.nextAction == SkipStateNext)
+				{
+					if (!_bt_step_back_page(scan, &itup, &offnum))
+						return false;
+				}
+				else if (so->skipData->curPos.nextAction == SkipStateStop)
+				{
+					BTScanPosUnpinIfPinned(so->currPos);
+					BTScanPosInvalidate(so->currPos);
+					return false;
+				}
+
+				/* now find the next tuple */
+				match = _bt_skip_find_next(scan, itup, offnum, prefixDir, postfixDir);
+				if (!match)
+				{
+					if (_bt_skip_is_always_valid(so))
+						_bt_drop_lock_and_maybe_pin(scan, &so->currPos);
+					return false;
+				}
+				_bt_drop_lock_and_maybe_pin(scan, &so->currPos);
+			}
 		}
 	}
 
@@ -1506,8 +1321,8 @@ _bt_next(IndexScanDesc scan, ScanDirection dir)
  *
  * Returns true if any matching items found on the page, false if none.
  */
-static bool
-_bt_readpage(IndexScanDesc scan, ScanDirection dir, OffsetNumber offnum)
+bool
+_bt_readpage(IndexScanDesc scan, ScanDirection dir, OffsetNumber *offnum, bool isRegularMode)
 {
 	BTScanOpaque so = (BTScanOpaque) scan->opaque;
 	Page		page;
@@ -1517,6 +1332,7 @@ _bt_readpage(IndexScanDesc scan, ScanDirection dir, OffsetNumber offnum)
 	int			itemIndex;
 	bool		continuescan;
 	int			indnatts;
+	int			prefixskipindex;
 
 	/*
 	 * We must have the buffer pinned and locked, but the usual macro can't be
@@ -1575,11 +1391,11 @@ _bt_readpage(IndexScanDesc scan, ScanDirection dir, OffsetNumber offnum)
 		/* load items[] in ascending order */
 		itemIndex = 0;
 
-		offnum = Max(offnum, minoff);
+		*offnum = Max(*offnum, minoff);
 
-		while (offnum <= maxoff)
+		while (*offnum <= maxoff)
 		{
-			ItemId		iid = PageGetItemId(page, offnum);
+			ItemId		iid = PageGetItemId(page, *offnum);
 			IndexTuple	itup;
 
 			/*
@@ -1588,19 +1404,19 @@ _bt_readpage(IndexScanDesc scan, ScanDirection dir, OffsetNumber offnum)
 			 */
 			if (scan->ignore_killed_tuples && ItemIdIsDead(iid))
 			{
-				offnum = OffsetNumberNext(offnum);
+				*offnum = OffsetNumberNext(*offnum);
 				continue;
 			}
 
 			itup = (IndexTuple) PageGetItem(page, iid);
 
-			if (_bt_checkkeys(scan, itup, indnatts, dir, &continuescan))
+			if (_bt_checkkeys_extended(scan, itup, indnatts, dir, isRegularMode, &continuescan, &prefixskipindex))
 			{
 				/* tuple passes all scan key conditions */
 				if (!BTreeTupleIsPosting(itup))
 				{
 					/* Remember it */
-					_bt_saveitem(so, itemIndex, offnum, itup);
+					_bt_saveitem(so, itemIndex, *offnum, itup);
 					itemIndex++;
 				}
 				else
@@ -1612,26 +1428,30 @@ _bt_readpage(IndexScanDesc scan, ScanDirection dir, OffsetNumber offnum)
 					 * TID
 					 */
 					tupleOffset =
-						_bt_setuppostingitems(so, itemIndex, offnum,
+						_bt_setuppostingitems(so, itemIndex, *offnum,
 											  BTreeTupleGetPostingN(itup, 0),
 											  itup);
 					itemIndex++;
 					/* Remember additional TIDs */
 					for (int i = 1; i < BTreeTupleGetNPosting(itup); i++)
 					{
-						_bt_savepostingitem(so, itemIndex, offnum,
+						_bt_savepostingitem(so, itemIndex, *offnum,
 											BTreeTupleGetPostingN(itup, i),
 											tupleOffset);
 						itemIndex++;
 					}
 				}
 			}
+
+			*offnum = OffsetNumberNext(*offnum);
+
 			/* When !continuescan, there can't be any more matches, so stop */
 			if (!continuescan)
 				break;
-
-			offnum = OffsetNumberNext(offnum);
+			if (!isRegularMode && prefixskipindex != -1)
+				break;
 		}
+		*offnum = OffsetNumberPrev(*offnum);
 
 		/*
 		 * We don't need to visit page to the right when the high key
@@ -1651,7 +1471,7 @@ _bt_readpage(IndexScanDesc scan, ScanDirection dir, OffsetNumber offnum)
 			int			truncatt;
 
 			truncatt = BTreeTupleGetNAtts(itup, scan->indexRelation);
-			_bt_checkkeys(scan, itup, truncatt, dir, &continuescan);
+			_bt_checkkeys(scan, itup, truncatt, dir, &continuescan, NULL);
 		}
 
 		if (!continuescan)
@@ -1667,11 +1487,11 @@ _bt_readpage(IndexScanDesc scan, ScanDirection dir, OffsetNumber offnum)
 		/* load items[] in descending order */
 		itemIndex = MaxTIDsPerBTreePage;
 
-		offnum = Min(offnum, maxoff);
+		*offnum = Min(*offnum, maxoff);
 
-		while (offnum >= minoff)
+		while (*offnum >= minoff)
 		{
-			ItemId		iid = PageGetItemId(page, offnum);
+			ItemId		iid = PageGetItemId(page, *offnum);
 			IndexTuple	itup;
 			bool		tuple_alive;
 			bool		passes_quals;
@@ -1688,10 +1508,10 @@ _bt_readpage(IndexScanDesc scan, ScanDirection dir, OffsetNumber offnum)
 			 */
 			if (scan->ignore_killed_tuples && ItemIdIsDead(iid))
 			{
-				Assert(offnum >= P_FIRSTDATAKEY(opaque));
-				if (offnum > P_FIRSTDATAKEY(opaque))
+				Assert(*offnum >= P_FIRSTDATAKEY(opaque));
+				if (*offnum > P_FIRSTDATAKEY(opaque))
 				{
-					offnum = OffsetNumberPrev(offnum);
+					*offnum = OffsetNumberPrev(*offnum);
 					continue;
 				}
 
@@ -1702,8 +1522,8 @@ _bt_readpage(IndexScanDesc scan, ScanDirection dir, OffsetNumber offnum)
 
 			itup = (IndexTuple) PageGetItem(page, iid);
 
-			passes_quals = _bt_checkkeys(scan, itup, indnatts, dir,
-										 &continuescan);
+			passes_quals = _bt_checkkeys_extended(scan, itup, indnatts, dir,
+												  isRegularMode, &continuescan, &prefixskipindex);
 			if (passes_quals && tuple_alive)
 			{
 				/* tuple passes all scan key conditions */
@@ -1711,7 +1531,7 @@ _bt_readpage(IndexScanDesc scan, ScanDirection dir, OffsetNumber offnum)
 				{
 					/* Remember it */
 					itemIndex--;
-					_bt_saveitem(so, itemIndex, offnum, itup);
+					_bt_saveitem(so, itemIndex, *offnum, itup);
 				}
 				else
 				{
@@ -1729,28 +1549,32 @@ _bt_readpage(IndexScanDesc scan, ScanDirection dir, OffsetNumber offnum)
 					 */
 					itemIndex--;
 					tupleOffset =
-						_bt_setuppostingitems(so, itemIndex, offnum,
+						_bt_setuppostingitems(so, itemIndex, *offnum,
 											  BTreeTupleGetPostingN(itup, 0),
 											  itup);
 					/* Remember additional TIDs */
 					for (int i = 1; i < BTreeTupleGetNPosting(itup); i++)
 					{
 						itemIndex--;
-						_bt_savepostingitem(so, itemIndex, offnum,
+						_bt_savepostingitem(so, itemIndex, *offnum,
 											BTreeTupleGetPostingN(itup, i),
 											tupleOffset);
 					}
 				}
 			}
+
+			*offnum = OffsetNumberPrev(*offnum);
+
 			if (!continuescan)
 			{
 				/* there can't be any more matches, so stop */
 				so->currPos.moreLeft = false;
 				break;
 			}
-
-			offnum = OffsetNumberPrev(offnum);
+			if (!isRegularMode && prefixskipindex != -1)
+				break;
 		}
+		*offnum = OffsetNumberNext(*offnum);
 
 		Assert(itemIndex >= 0);
 		so->currPos.firstItem = itemIndex;
@@ -1858,7 +1682,7 @@ _bt_savepostingitem(BTScanOpaque so, int itemIndex, OffsetNumber offnum,
  * read lock, on that page.  If we do not hold the pin, we set so->currPos.buf
  * to InvalidBuffer.  We return true to indicate success.
  */
-static bool
+bool
 _bt_steppage(IndexScanDesc scan, ScanDirection dir)
 {
 	BTScanOpaque so = (BTScanOpaque) scan->opaque;
@@ -1886,6 +1710,9 @@ _bt_steppage(IndexScanDesc scan, ScanDirection dir)
 		if (so->markTuples)
 			memcpy(so->markTuples, so->currTuples,
 				   so->currPos.nextTupleOffset);
+		if (so->skipData)
+			memcpy(&so->skipData->markPos, &so->skipData->curPos,
+				   sizeof(BTSkipPosData));
 		so->markPos.itemIndex = so->markItemIndex;
 		so->markItemIndex = -1;
 	}
@@ -1965,13 +1792,14 @@ _bt_steppage(IndexScanDesc scan, ScanDirection dir)
  * If there are no more matching records in the given direction, we drop all
  * locks and pins, set so->currPos.buf to InvalidBuffer, and return false.
  */
-static bool
+bool
 _bt_readnextpage(IndexScanDesc scan, BlockNumber blkno, ScanDirection dir)
 {
 	BTScanOpaque so = (BTScanOpaque) scan->opaque;
 	Relation	rel;
 	Page		page;
 	BTPageOpaque opaque;
+	OffsetNumber offnum;
 	bool		status = true;
 
 	rel = scan->indexRelation;
@@ -2003,7 +1831,8 @@ _bt_readnextpage(IndexScanDesc scan, BlockNumber blkno, ScanDirection dir)
 				PredicateLockPage(rel, blkno, scan->xs_snapshot);
 				/* see if there are any matches on this page */
 				/* note that this will clear moreRight if we can stop */
-				if (_bt_readpage(scan, dir, P_FIRSTDATAKEY(opaque)))
+				offnum = P_FIRSTDATAKEY(opaque);
+				if (_bt_readpage(scan, dir, &offnum, true))
 					break;
 			}
 			else if (scan->parallel_scan != NULL)
@@ -2105,7 +1934,8 @@ _bt_readnextpage(IndexScanDesc scan, BlockNumber blkno, ScanDirection dir)
 				PredicateLockPage(rel, BufferGetBlockNumber(so->currPos.buf), scan->xs_snapshot);
 				/* see if there are any matches on this page */
 				/* note that this will clear moreLeft if we can stop */
-				if (_bt_readpage(scan, dir, PageGetMaxOffsetNumber(page)))
+				offnum = PageGetMaxOffsetNumber(page);
+				if (_bt_readpage(scan, dir, &offnum, true))
 					break;
 			}
 			else if (scan->parallel_scan != NULL)
@@ -2173,7 +2003,7 @@ _bt_parallel_readpage(IndexScanDesc scan, BlockNumber blkno, ScanDirection dir)
  * to be half-dead; the caller should check that condition and step left
  * again if it's important.
  */
-static Buffer
+Buffer
 _bt_walk_left(Relation rel, Buffer buf, Snapshot snapshot)
 {
 	Page		page;
@@ -2437,7 +2267,7 @@ _bt_endpoint(IndexScanDesc scan, ScanDirection dir)
 	/*
 	 * Now load data from the first page of the scan.
 	 */
-	if (!_bt_readpage(scan, dir, start))
+	if (!_bt_readpage(scan, dir, &start, true))
 	{
 		/*
 		 * There's no actually-matching data on this page.  Try to advance to
@@ -2466,7 +2296,7 @@ _bt_endpoint(IndexScanDesc scan, ScanDirection dir)
  * _bt_initialize_more_data() -- initialize moreLeft/moreRight appropriately
  * for scan direction
  */
-static inline void
+inline void
 _bt_initialize_more_data(BTScanOpaque so, ScanDirection dir)
 {
 	/* initialize moreLeft/moreRight appropriately for scan direction */
@@ -2483,3 +2313,25 @@ _bt_initialize_more_data(BTScanOpaque so, ScanDirection dir)
 	so->numKilled = 0;			/* just paranoia */
 	so->markItemIndex = -1;		/* ditto */
 }
+
+/* Forward the call to either _bt_checkkeys, which is the simplest and
+ * fastest way of checking keys, or to _bt_checkkeys_skip, which is slower
+ * but returns extra information about whether we should stop reading the
+ * current page and skip. The expensive check is only necessary when
+ * !isRegularMode, i.e. when prefixDir != postfixDir, which only happens
+ * when scanning backwards from a cursor.
+ */
+static inline bool
+_bt_checkkeys_extended(IndexScanDesc scan, IndexTuple tuple, int tupnatts,
+					   ScanDirection dir, bool isRegularMode,
+					   bool *continuescan, int *prefixskipindex)
+{
+	if (isRegularMode)
+	{
+		return _bt_checkkeys(scan, tuple, tupnatts, dir, continuescan, prefixskipindex);
+	}
+	else
+	{
+		return _bt_checkkeys_skip(scan, tuple, tupnatts, dir, continuescan, prefixskipindex);
+	}
+}
diff --git a/src/backend/access/nbtree/nbtskip.c b/src/backend/access/nbtree/nbtskip.c
new file mode 100644
index 0000000000..7850230b9f
--- /dev/null
+++ b/src/backend/access/nbtree/nbtskip.c
@@ -0,0 +1,1317 @@
+/*-------------------------------------------------------------------------
+ *
+ * nbtskip.c
+ *	  Search code related to skip scan for postgres btrees.
+ *
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/nbtree/nbtskip.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "access/nbtree.h"
+#include "access/relscan.h"
+#include "catalog/catalog.h"
+#include "miscadmin.h"
+#include "utils/guc.h"
+#include "storage/predicate.h"
+#include "utils/lsyscache.h"
+#include "utils/rel.h"
+
+static inline void _bt_update_scankey_with_tuple(BTScanInsert scankeys,
+											Relation indexRel, IndexTuple itup, int numattrs);
+static inline bool _bt_scankey_within_page(IndexScanDesc scan, BTScanInsert key, Buffer buf);
+static inline int32 _bt_compare_until(Relation rel, BTScanInsert key, IndexTuple itup, int prefix);
+static inline void
+_bt_determine_next_action(IndexScanDesc scan, BTSkipCompareResult *cmp, OffsetNumber firstOffnum,
+						  OffsetNumber lastOffnum, ScanDirection postfixDir, BTSkipState *nextAction);
+static inline void
+_bt_determine_next_action_after_skip(BTScanOpaque so, BTSkipCompareResult *cmp, ScanDirection prefixDir,
+									 ScanDirection postfixDir, int skipped, BTSkipState *nextAction);
+static inline void
+_bt_determine_next_action_after_skip_extra(BTScanOpaque so, BTSkipCompareResult *cmp, BTSkipState *nextAction);
+static inline void _bt_copy_scankey(BTScanInsert to, BTScanInsert from, int numattrs);
+static inline IndexTuple _bt_get_tuple_from_offset(BTScanOpaque so, OffsetNumber curTupleOffnum);
+static void _bt_skip_update_scankey_after_read(IndexScanDesc scan, IndexTuple curTuple,
+											   ScanDirection prefixDir, ScanDirection postfixDir);
+static void _bt_skip_update_scankey_for_prefix_skip(IndexScanDesc scan, Relation indexRel,
+										int prefix, IndexTuple itup, ScanDirection prefixDir);
+static bool _bt_try_in_page_skip(IndexScanDesc scan, ScanDirection prefixDir);
+
+/*
+ * returns whether the scan should continue, i.e. we are not yet at its end.
+ * the scan position can be invalid even though we still
+ * should continue the scan. this happens for example when
+ * we're scanning with prefixDir!=postfixDir. when looking at the first
+ * prefix, we traverse the items within the prefix from max to min.
+ * if none of them match, we actually run off the start of the index,
+ * meaning none of the tuples within this prefix match. the scan pos becomes
+ * invalid, however, we do need to look further to the next prefix.
+ * therefore, this function still returns true in this particular case.
+ */
+static inline bool
+_bt_skip_is_valid(BTScanOpaque so, ScanDirection prefixDir, ScanDirection postfixDir)
+{
+	return BTScanPosIsValid(so->currPos) ||
+			(!_bt_skip_is_regular_mode(prefixDir, postfixDir) &&
+			 so->skipData->curPos.nextAction != SkipStateStop);
+}
+
+/* try finding the next tuple to skip to within the local tuple storage.
+ * local tuple storage is filled during _bt_readpage with all matching
+ * tuples on that page. if we can find the next prefix here it saves
+ * us doing a scan from root.
+ * Note that this optimization only works in regular mode (prefixDir == postfixDir).
+ * If this is not the case, the local tuple workspace will always only
+ * contain tuples of one specific prefix (_bt_readpage will stop at
+ * the end of a prefix)
+ */
+static bool
+_bt_try_in_page_skip(IndexScanDesc scan, ScanDirection prefixDir)
+{
+	BTScanOpaque so = (BTScanOpaque) scan->opaque;
+	BTScanPosItem *currItem;
+	BTSkip skip = so->skipData;
+	IndexTuple itup = NULL;
+	bool goback;
+	int low, high, starthigh, startlow;
+	int32		result,
+				cmpval;
+	BTScanInsert key = &so->skipData->curPos.skipScanKey;
+
+	currItem = &so->currPos.items[so->currPos.itemIndex];
+	itup = (IndexTuple) (so->currTuples + currItem->tupleOffset);
+
+	_bt_skip_update_scankey_for_prefix_skip(scan, scan->indexRelation, skip->prefix, itup, prefixDir);
+
+	_bt_set_bsearch_flags(key->scankeys[key->keysz - 1].sk_strategy, prefixDir, &key->nextkey, &goback);
+
+	/* Requesting nextkey semantics while using scantid seems nonsensical */
+	Assert(!key->nextkey || key->scantid == NULL);
+	/* scantid-set callers must use _bt_binsrch_insert() on leaf pages */
+
+	startlow = low = ScanDirectionIsForward(prefixDir) ? so->currPos.itemIndex + 1 : so->currPos.firstItem;
+	starthigh = high = ScanDirectionIsForward(prefixDir) ? so->currPos.lastItem : so->currPos.itemIndex - 1;
+
+	/*
+	 * If no items remain in local tuple storage beyond the current item (in
+	 * the direction of the skip), there is nothing to search here.  This can
+	 * happen when the current item is the last one that _bt_readpage saved
+	 * for this page.  Report failure; finding the next prefix then requires
+	 * a scan from the root.
+	 */
+	if (unlikely(high < low))
+		return false;
+
+	/*
+	 * Binary search to find the first key on the page >= scan key, or first
+	 * key > scankey when nextkey is true.
+	 *
+	 * For nextkey=false (cmpval=1), the loop invariant is: all slots before
+	 * 'low' are < scan key, all slots at or after 'high' are >= scan key.
+	 *
+	 * For nextkey=true (cmpval=0), the loop invariant is: all slots before
+	 * 'low' are <= scan key, all slots at or after 'high' are > scan key.
+	 *
+	 * We can fall out when high == low.
+	 */
+	high++;						/* establish the loop invariant for high */
+
+	cmpval = key->nextkey ? 0 : 1;	/* select comparison value */
+
+	while (high > low)
+	{
+		int mid = low + ((high - low) / 2);
+
+		/* We have low <= mid < high, so mid points at a real slot */
+
+		currItem = &so->currPos.items[mid];
+		itup = (IndexTuple) (so->currTuples + currItem->tupleOffset);
+		result = _bt_compare_until(scan->indexRelation, key, itup, skip->prefix);
+
+		if (result >= cmpval)
+			low = mid + 1;
+		else
+			high = mid;
+	}
+
+	if (high > starthigh)
+		return false;
+
+	if (goback)
+	{
+		low--;
+		if (low < startlow)
+			return false;
+	}
+
+	so->currPos.itemIndex = low;
+
+	return true;
+}
+
+/*
+ *  _bt_skip() -- Skip items that have the same prefix as the most recently
+ * 				  fetched index tuple.
+ *
+ * in: pinned, not locked
+ * out: pinned, not locked (unless end of scan, then unpinned)
+ */
+bool
+_bt_skip(IndexScanDesc scan, ScanDirection prefixDir, ScanDirection postfixDir)
+{
+	BTScanOpaque so = (BTScanOpaque) scan->opaque;
+	BTScanPosItem *currItem;
+	IndexTuple itup = NULL;
+	OffsetNumber curTupleOffnum = InvalidOffsetNumber;
+	BTSkipCompareResult cmp;
+	BTSkip skip = so->skipData;
+	OffsetNumber first;
+
+	/* in page skip only works when prefixDir == postfixDir */
+	if (!_bt_skip_is_regular_mode(prefixDir, postfixDir) || !_bt_try_in_page_skip(scan, prefixDir))
+	{
+		currItem = &so->currPos.items[so->currPos.itemIndex];
+		itup = (IndexTuple) (so->currTuples + currItem->tupleOffset);
+
+		so->skipData->curPos.nextSkipIndex = so->skipData->prefix;
+		_bt_skip_once(scan, &itup, &curTupleOffnum, true, prefixDir, postfixDir);
+		_bt_skip_until_match(scan, &itup, &curTupleOffnum, prefixDir, postfixDir);
+		if (!_bt_skip_is_always_valid(so))
+			return false;
+
+		first = curTupleOffnum;
+		_bt_readpage(scan, postfixDir, &curTupleOffnum, _bt_skip_is_regular_mode(prefixDir, postfixDir));
+		if (DEBUG1 >= log_min_messages || DEBUG1 >= client_min_messages)
+		{
+			print_itup(BufferGetBlockNumber(so->currPos.buf), _bt_get_tuple_from_offset(so, first), NULL, scan->indexRelation,
+						"first item on page compared after skip");
+			print_itup(BufferGetBlockNumber(so->currPos.buf), _bt_get_tuple_from_offset(so, curTupleOffnum), NULL, scan->indexRelation,
+						"last item on page compared after skip");
+		}
+		_bt_compare_current_item(scan, _bt_get_tuple_from_offset(so, curTupleOffnum),
+								 IndexRelationGetNumberOfAttributes(scan->indexRelation),
+								 postfixDir, _bt_skip_is_regular_mode(prefixDir, postfixDir), &cmp);
+		_bt_determine_next_action(scan, &cmp, first, curTupleOffnum, postfixDir, &skip->curPos.nextAction);
+		skip->curPos.nextDirection = prefixDir;
+		skip->curPos.nextSkipIndex = cmp.prefixSkipIndex;
+		_bt_skip_update_scankey_after_read(scan, _bt_get_tuple_from_offset(so, curTupleOffnum), prefixDir, postfixDir);
+
+		_bt_drop_lock_and_maybe_pin(scan, &so->currPos);
+	}
+
+	/* prepare for the call to _bt_next, because _bt_next increments this to get to the tuple we want to be at */
+	if (ScanDirectionIsForward(postfixDir))
+		so->currPos.itemIndex--;
+	else
+		so->currPos.itemIndex++;
+
+	return true;
+}
+
+static IndexTuple
+_bt_get_tuple_from_offset(BTScanOpaque so, OffsetNumber curTupleOffnum)
+{
+	Page page = BufferGetPage(so->currPos.buf);
+	return (IndexTuple) PageGetItem(page, PageGetItemId(page, curTupleOffnum));
+}
+
+static void
+_bt_determine_next_action(IndexScanDesc scan, BTSkipCompareResult *cmp, OffsetNumber firstOffnum, OffsetNumber lastOffnum, ScanDirection postfixDir, BTSkipState *nextAction)
+{
+	BTScanOpaque so = (BTScanOpaque) scan->opaque;
+
+	if (cmp->fullKeySkip)
+		*nextAction = SkipStateStop;
+	else if (ScanDirectionIsForward(postfixDir))
+	{
+		OffsetNumber firstItem = firstOffnum, lastItem = lastOffnum;
+		if (cmp->prefixSkip)
+		{
+			*nextAction = SkipStateSkip;
+		}
+		else
+		{
+			IndexTuple toCmp;
+			if (so->currPos.lastItem >= so->currPos.firstItem)
+				toCmp = _bt_get_tuple_from_offset(so, so->currPos.items[so->currPos.lastItem].indexOffset);
+			else
+				toCmp = _bt_get_tuple_from_offset(so, firstItem);
+			_bt_update_scankey_with_tuple(&so->skipData->currentTupleKey,
+										  scan->indexRelation, toCmp, RelationGetNumberOfAttributes(scan->indexRelation));
+			if (_bt_has_extra_quals_after_skip(so->skipData, postfixDir, so->skipData->prefix) && !cmp->equal &&
+					(cmp->prefixCmpResult != 0 ||
+					 _bt_compare_until(scan->indexRelation, &so->skipData->currentTupleKey,
+									   _bt_get_tuple_from_offset(so, lastItem), so->skipData->prefix) != 0))
+				*nextAction = SkipStateSkipExtra;
+			else
+				*nextAction = SkipStateNext;
+		}
+	}
+	else
+	{
+		OffsetNumber firstItem = lastOffnum, lastItem = firstOffnum;
+		if (cmp->prefixSkip)
+		{
+			*nextAction = SkipStateSkip;
+		}
+		else
+		{
+			IndexTuple toCmp;
+			if (so->currPos.lastItem >= so->currPos.firstItem)
+				toCmp = _bt_get_tuple_from_offset(so, so->currPos.items[so->currPos.firstItem].indexOffset);
+			else
+				toCmp = _bt_get_tuple_from_offset(so, lastItem);
+			_bt_update_scankey_with_tuple(&so->skipData->currentTupleKey,
+										  scan->indexRelation, toCmp, RelationGetNumberOfAttributes(scan->indexRelation));
+			if (_bt_has_extra_quals_after_skip(so->skipData, postfixDir, so->skipData->prefix) && !cmp->equal &&
+					(cmp->prefixCmpResult != 0 ||
+					 _bt_compare_until(scan->indexRelation, &so->skipData->currentTupleKey,
+									   _bt_get_tuple_from_offset(so, firstItem), so->skipData->prefix) != 0))
+				*nextAction = SkipStateSkipExtra;
+			else
+				*nextAction = SkipStateNext;
+		}
+	}
+}
+
+static inline bool
+_bt_should_prefix_skip(BTSkipCompareResult *cmp)
+{
+	return cmp->prefixSkip || cmp->prefixCmpResult != 0;
+}
+
+static inline void
+_bt_determine_next_action_after_skip(BTScanOpaque so, BTSkipCompareResult *cmp, ScanDirection prefixDir,
+									 ScanDirection postfixDir, int skipped, BTSkipState *nextAction)
+{
+	if (!_bt_skip_is_always_valid(so) || cmp->fullKeySkip)
+		*nextAction = SkipStateStop;
+	else if (cmp->equal && _bt_skip_is_regular_mode(prefixDir, postfixDir))
+		*nextAction = SkipStateNext;
+	else if (_bt_should_prefix_skip(cmp) && _bt_skip_is_regular_mode(prefixDir, postfixDir) &&
+			 ((ScanDirectionIsForward(prefixDir) && cmp->skCmpResult == -1) ||
+			  (ScanDirectionIsBackward(prefixDir) && cmp->skCmpResult == 1)))
+		*nextAction = SkipStateSkip;
+	else if (!_bt_skip_is_regular_mode(prefixDir, postfixDir) ||
+			 _bt_has_extra_quals_after_skip(so->skipData, postfixDir, skipped) ||
+			 cmp->prefixCmpResult != 0)
+		*nextAction = SkipStateSkipExtra;
+	else
+		*nextAction = SkipStateNext;
+}
+
+static inline void
+_bt_determine_next_action_after_skip_extra(BTScanOpaque so, BTSkipCompareResult *cmp, BTSkipState *nextAction)
+{
+	if (!_bt_skip_is_always_valid(so) || cmp->fullKeySkip)
+		*nextAction = SkipStateStop;
+	else if (cmp->equal)
+		*nextAction = SkipStateNext;
+	else if (_bt_should_prefix_skip(cmp))
+		*nextAction = SkipStateSkip;
+	else
+		*nextAction = SkipStateNext;
+}
+
+/* just a debug function that prints a scankey. will be removed for final patch */
+static inline void
+_print_skey(IndexScanDesc scan, BTScanInsert scanKey)
+{
+	Oid			typOutput;
+	bool		varlenatype;
+	char	   *val;
+	int i;
+	Relation rel = scan->indexRelation;
+
+	for (i = 0; i < scanKey->keysz; i++)
+	{
+		ScanKey cur = &scanKey->scankeys[i];
+		if (!IsCatalogRelation(rel))
+		{
+			if (!(cur->sk_flags & SK_ISNULL))
+			{
+				if (cur->sk_subtype != InvalidOid)
+					getTypeOutputInfo(cur->sk_subtype,
+									  &typOutput, &varlenatype);
+				else
+					getTypeOutputInfo(rel->rd_opcintype[i],
+									  &typOutput, &varlenatype);
+				val = OidOutputFunctionCall(typOutput, cur->sk_argument);
+				if (val)
+				{
+					elog(DEBUG1, "%s sk attr %d val: %s (%s, %s)",
+						 RelationGetRelationName(rel), i, val,
+						 (cur->sk_flags & SK_BT_NULLS_FIRST) != 0 ? "NULLS FIRST" : "NULLS LAST",
+						 (cur->sk_flags & SK_BT_DESC) != 0 ? "DESC" : "ASC");
+					pfree(val);
+				}
+			}
+			else
+			{
+				elog(DEBUG1, "%s sk attr %d val: NULL (%s, %s)",
+					 RelationGetRelationName(rel), i,
+					 (cur->sk_flags & SK_BT_NULLS_FIRST) != 0 ? "NULLS FIRST" : "NULLS LAST",
+					 (cur->sk_flags & SK_BT_DESC) != 0 ? "DESC" : "ASC");
+			}
+		}
+	}
+}
+
+bool
+_bt_checkkeys_skip(IndexScanDesc scan, IndexTuple tuple, int tupnatts,
+				   ScanDirection dir, bool *continuescan, int *prefixskipindex)
+{
+	BTScanOpaque 	so = (BTScanOpaque) scan->opaque;
+	BTSkip skip = so->skipData;
+
+	bool match = _bt_checkkeys(scan, tuple, tupnatts, dir, continuescan, prefixskipindex);
+	int prefixCmpResult = _bt_compare_until(scan->indexRelation, &skip->curPos.skipScanKey, tuple, skip->prefix);
+	if (*prefixskipindex == -1 && prefixCmpResult != 0)
+	{
+		*prefixskipindex = skip->prefix;
+		return false;
+	}
+	else
+	{
+		bool newcont;
+		_bt_checkkeys_threeway(scan, tuple, tupnatts, dir, &newcont, prefixskipindex);
+		if (*prefixskipindex == -1 && prefixCmpResult != 0)
+		{
+			*prefixskipindex = skip->prefix;
+			return false;
+		}
+	}
+	return match;
+}
+
+/*
+ * Compare a scankey with a given tuple but only the first prefix columns
+ * This function returns 0 if the first 'prefix' columns are equal
+ * -1 if key < itup for the first prefix columns
+ * 1 if key > itup for the first prefix columns
+ */
+int32
+_bt_compare_until(Relation rel,
+			BTScanInsert key,
+			IndexTuple itup,
+			int prefix)
+{
+	TupleDesc	itupdesc = RelationGetDescr(rel);
+	ScanKey		scankey;
+	int			ncmpkey;
+
+	Assert(key->keysz <= IndexRelationGetNumberOfKeyAttributes(rel));
+
+	ncmpkey = Min(prefix, key->keysz);
+	scankey = key->scankeys;
+	for (int i = 1; i <= ncmpkey; i++)
+	{
+		Datum		datum;
+		bool		isNull;
+		int32		result;
+
+		datum = index_getattr(itup, scankey->sk_attno, itupdesc, &isNull);
+
+		/* see comments about NULLs handling in btbuild */
+		if (scankey->sk_flags & SK_ISNULL)	/* key is NULL */
+		{
+			if (isNull)
+				result = 0;		/* NULL "=" NULL */
+			else if (scankey->sk_flags & SK_BT_NULLS_FIRST)
+				result = -1;	/* NULL "<" NOT_NULL */
+			else
+				result = 1;		/* NULL ">" NOT_NULL */
+		}
+		else if (isNull)		/* key is NOT_NULL and item is NULL */
+		{
+			if (scankey->sk_flags & SK_BT_NULLS_FIRST)
+				result = 1;		/* NOT_NULL ">" NULL */
+			else
+				result = -1;	/* NOT_NULL "<" NULL */
+		}
+		else
+		{
+			/*
+			 * The sk_func needs to be passed the index value as left arg and
+			 * the sk_argument as right arg (they might be of different
+			 * types).  Since it is convenient for callers to think of
+			 * _bt_compare as comparing the scankey to the index item, we have
+			 * to flip the sign of the comparison result.  (Unless it's a DESC
+			 * column, in which case we *don't* flip the sign.)
+			 */
+			result = DatumGetInt32(FunctionCall2Coll(&scankey->sk_func,
+													 scankey->sk_collation,
+													 datum,
+													 scankey->sk_argument));
+
+			if (!(scankey->sk_flags & SK_BT_DESC))
+				INVERT_COMPARE_RESULT(result);
+		}
+
+		/* if the keys are unequal, return the difference */
+		if (result != 0)
+			return result;
+
+		scankey++;
+	}
+	return 0;
+}
+
+
+/*
+ * Creates the initial scankeys for skipping and stores them in the skipData
+ * structure.
+ */
+void
+_bt_skip_create_scankeys(Relation rel, BTScanOpaque so)
+{
+	int keysCount;
+	BTSkip skip = so->skipData;
+	StrategyNumber stratTotal;
+	ScanKey		keyPointers[INDEX_MAX_KEYS];
+	bool goback;
+	/* we need to create both forward and backward keys because the scan direction
+	 * may change at any moment in scans with a cursor.
+	 * we could technically delay creation of the second until first use as an optimization
+	 * but that is not implemented yet.
+	 */
+	keysCount = _bt_choose_scan_keys(so->keyData, so->numberOfKeys, ForwardScanDirection,
+									 keyPointers, skip->fwdNotNullKeys, &stratTotal, skip->prefix);
+	_bt_create_insertion_scan_key(rel, ForwardScanDirection, keyPointers, keysCount,
+								  &skip->fwdScanKey, &stratTotal, &goback);
+
+	keysCount = _bt_choose_scan_keys(so->keyData, so->numberOfKeys, BackwardScanDirection,
+									 keyPointers, skip->bwdNotNullKeys, &stratTotal, skip->prefix);
+	_bt_create_insertion_scan_key(rel, BackwardScanDirection, keyPointers, keysCount,
+								  &skip->bwdScanKey, &stratTotal, &goback);
+
+	_bt_metaversion(rel, &skip->curPos.skipScanKey.heapkeyspace,
+					&skip->curPos.skipScanKey.allequalimage);
+	skip->curPos.skipScanKey.anynullkeys = false; /* unused */
+	skip->curPos.skipScanKey.nextkey = false;
+	skip->curPos.skipScanKey.pivotsearch = false;
+	skip->curPos.skipScanKey.scantid = NULL;
+	skip->curPos.skipScanKey.keysz = 0;
+
+	/* Set up the scankey for the current tuple as well. We do not
+	 * necessarily use the data from the current tuple yet,
+	 * but the rest of the data structure must be set up correctly
+	 * for when we use it to create skip->curPos.skipScanKey keys later.
+	 */
+	_bt_mkscankey(rel, NULL, &skip->currentTupleKey);
+}
+
+/*
+ * _bt_scankey_within_page() -- check if the provided scankey could be found
+ * 								within a page, specified by the buffer.
+ */
+static inline bool
+_bt_scankey_within_page(IndexScanDesc scan, BTScanInsert key,
+						Buffer buf)
+{
+	/* @todo: optimization is still possible here to
+	 * only check either the low or the high, depending on
+	 * which direction *we came from* AND which direction
+	 * *we are planning to scan*
+	 */
+	OffsetNumber low, high;
+	Page page = BufferGetPage(buf);
+	BTPageOpaque opaque = (BTPageOpaque) PageGetSpecialPointer(page);
+	int			ans_lo, ans_hi;
+
+	low = P_FIRSTDATAKEY(opaque);
+	high = PageGetMaxOffsetNumber(page);
+
+	if (unlikely(high < low))
+		return false;
+
+	ans_lo = _bt_compare(scan->indexRelation,
+					   key, page, low);
+	ans_hi = _bt_compare(scan->indexRelation,
+					   key, page, high);
+	if (key->nextkey)
+	{
+		/* sk < last && sk >= first */
+		return ans_lo >= 0 && ans_hi < 0;
+	}
+	else
+	{
+		/* sk <= last && sk > first */
+		return ans_lo > 0 && ans_hi <= 0;
+	}
+}
+
+/* in: pinned and locked, out: pinned and locked (unless end of scan) */
+static void
+_bt_skip_find(IndexScanDesc scan, IndexTuple *curTuple, OffsetNumber *curTupleOffnum,
+			  BTScanInsert scanKey, ScanDirection dir)
+{
+	BTScanOpaque 	so = (BTScanOpaque) scan->opaque;
+	OffsetNumber offnum;
+	BTStack stack;
+	Buffer buf;
+	bool goback;
+	Page		page;
+	BTPageOpaque opaque;
+	OffsetNumber minoff;
+	Relation rel = scan->indexRelation;
+	bool fromroot = true;
+
+	_bt_set_bsearch_flags(scanKey->scankeys[scanKey->keysz - 1].sk_strategy, dir, &scanKey->nextkey, &goback);
+
+	if ((DEBUG1 >= log_min_messages || DEBUG1 >= client_min_messages) && !IsCatalogRelation(rel))
+	{
+		if (*curTuple != NULL)
+			print_itup(BufferGetBlockNumber(so->currPos.buf), *curTuple, NULL, rel,
+						"before btree search");
+
+		elog(DEBUG1, "%s searching tree with %d keys, nextkey=%d, goback=%d",
+			 RelationGetRelationName(rel), scanKey->keysz, scanKey->nextkey,
+			 goback);
+
+		_print_skey(scan, scanKey);
+	}
+
+	if (*curTupleOffnum == InvalidOffsetNumber)
+	{
+		BTScanPosUnpinIfPinned(so->currPos);
+	}
+	else
+	{
+		if (_bt_scankey_within_page(scan, scanKey, so->currPos.buf))
+		{
+			elog(DEBUG1, "sk found within current page");
+
+			offnum = _bt_binsrch(scan->indexRelation, scanKey, so->currPos.buf);
+			fromroot = false;
+		}
+		else
+		{
+			LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+			ReleaseBuffer(so->currPos.buf);
+			so->currPos.buf = InvalidBuffer;
+		}
+	}
+
+	/*
+	 * We haven't found the scan key within the current page, so scan from
+	 * the root. Use _bt_search and _bt_binsrch to get the buffer and offset
+	 * number.
+	 */
+	if (fromroot)
+	{
+		stack = _bt_search(scan->indexRelation, scanKey,
+						   &buf, BT_READ, scan->xs_snapshot);
+		_bt_freestack(stack);
+		so->currPos.buf = buf;
+
+		offnum = _bt_binsrch(scan->indexRelation, scanKey, buf);
+
+		/* Lock the page for SERIALIZABLE transactions */
+		PredicateLockPage(scan->indexRelation, BufferGetBlockNumber(so->currPos.buf),
+						  scan->xs_snapshot);
+	}
+
+	page = BufferGetPage(so->currPos.buf);
+	opaque = (BTPageOpaque) PageGetSpecialPointer(page);
+
+	if (goback)
+	{
+		offnum = OffsetNumberPrev(offnum);
+		minoff = P_FIRSTDATAKEY(opaque);
+		if (offnum < minoff)
+		{
+			LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+			if (!_bt_step_back_page(scan, curTuple, curTupleOffnum))
+				return;
+			page = BufferGetPage(so->currPos.buf);
+			opaque = (BTPageOpaque) PageGetSpecialPointer(page);
+			offnum = PageGetMaxOffsetNumber(page);
+		}
+	}
+	else if (offnum > PageGetMaxOffsetNumber(page))
+	{
+		BlockNumber next = opaque->btpo_next;
+		LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+		if (!_bt_step_forward_page(scan, next, curTuple, curTupleOffnum))
+			return;
+		page = BufferGetPage(so->currPos.buf);
+		opaque = (BTPageOpaque) PageGetSpecialPointer(page);
+		offnum = P_FIRSTDATAKEY(opaque);
+	}
+
+	/* We know in which direction to look */
+	_bt_initialize_more_data(so, dir);
+
+	*curTupleOffnum = offnum;
+	*curTuple = (IndexTuple) PageGetItem(page, PageGetItemId(page, offnum));
+	so->currPos.currPage = BufferGetBlockNumber(so->currPos.buf);
+
+	if (DEBUG1 >= log_min_messages || DEBUG1 >= client_min_messages)
+		print_itup(BufferGetBlockNumber(so->currPos.buf), *curTuple, NULL, rel,
+					"after btree search");
+}
+
+static inline bool
+_bt_step_one_page(IndexScanDesc scan, ScanDirection dir, IndexTuple *curTuple,
+				  OffsetNumber *curTupleOffnum)
+{
+	if (ScanDirectionIsForward(dir))
+	{
+		BTScanOpaque so = (BTScanOpaque) scan->opaque;
+		return _bt_step_forward_page(scan, so->currPos.nextPage, curTuple, curTupleOffnum);
+	}
+	else
+	{
+		return _bt_step_back_page(scan, curTuple, curTupleOffnum);
+	}
+}
+
+/* in: possibly pinned, but unlocked, out: pinned and locked */
+bool
+_bt_step_forward_page(IndexScanDesc scan, BlockNumber next, IndexTuple *curTuple,
+					  OffsetNumber *curTupleOffnum)
+{
+	BTScanOpaque so = (BTScanOpaque) scan->opaque;
+	Relation rel = scan->indexRelation;
+	BlockNumber blkno = next;
+	Page page;
+	BTPageOpaque opaque;
+
+	Assert(BTScanPosIsValid(so->currPos));
+
+	/* Before leaving current page, deal with any killed items */
+	if (so->numKilled > 0)
+		_bt_killitems(scan);
+
+	/*
+	 * Before we modify currPos, make a copy of the page data if there was a
+	 * mark position that needs it.
+	 */
+	if (so->markItemIndex >= 0)
+	{
+		/* bump pin on current buffer for assignment to mark buffer */
+		if (BTScanPosIsPinned(so->currPos))
+			IncrBufferRefCount(so->currPos.buf);
+		memcpy(&so->markPos, &so->currPos,
+			   offsetof(BTScanPosData, items[1]) +
+			   so->currPos.lastItem * sizeof(BTScanPosItem));
+		if (so->markTuples)
+			memcpy(so->markTuples, so->currTuples,
+				   so->currPos.nextTupleOffset);
+		so->markPos.itemIndex = so->markItemIndex;
+		if (so->skipData)
+			memcpy(&so->skipData->markPos, &so->skipData->curPos,
+				   sizeof(BTSkipPosData));
+		so->markItemIndex = -1;
+	}
+
+	/* Remember we left a page with data */
+	so->currPos.moreLeft = true;
+
+	/* release the previous buffer, if pinned */
+	BTScanPosUnpinIfPinned(so->currPos);
+
+	{
+		for (;;)
+		{
+			/*
+			 * if we're at end of scan, give up and mark parallel scan as
+			 * done, so that all the workers can finish their scan
+			 */
+			if (blkno == P_NONE)
+			{
+				_bt_parallel_done(scan);
+				BTScanPosInvalidate(so->currPos);
+				return false;
+			}
+
+			/* check for interrupts while we're not holding any buffer lock */
+			CHECK_FOR_INTERRUPTS();
+			/* step right one page */
+			so->currPos.buf = _bt_getbuf(rel, blkno, BT_READ);
+			page = BufferGetPage(so->currPos.buf);
+			TestForOldSnapshot(scan->xs_snapshot, rel, page);
+			opaque = (BTPageOpaque) PageGetSpecialPointer(page);
+			/* check for deleted page */
+			if (!P_IGNORE(opaque))
+			{
+				PredicateLockPage(rel, blkno, scan->xs_snapshot);
+				*curTupleOffnum = P_FIRSTDATAKEY(opaque);
+				*curTuple = _bt_get_tuple_from_offset(so, *curTupleOffnum);
+				break;
+			}
+
+			blkno = opaque->btpo_next;
+			_bt_relbuf(rel, so->currPos.buf);
+		}
+	}
+
+	return true;
+}
+
+/* in: possibly pinned, but unlocked, out: pinned and locked */
+bool
+_bt_step_back_page(IndexScanDesc scan, IndexTuple *curTuple, OffsetNumber *curTupleOffnum)
+{
+	BTScanOpaque so = (BTScanOpaque) scan->opaque;
+
+	Assert(BTScanPosIsValid(so->currPos));
+
+	/* Before leaving current page, deal with any killed items */
+	if (so->numKilled > 0)
+		_bt_killitems(scan);
+
+	/*
+	 * Before we modify currPos, make a copy of the page data if there was a
+	 * mark position that needs it.
+	 */
+	if (so->markItemIndex >= 0)
+	{
+		/* bump pin on current buffer for assignment to mark buffer */
+		if (BTScanPosIsPinned(so->currPos))
+			IncrBufferRefCount(so->currPos.buf);
+		memcpy(&so->markPos, &so->currPos,
+			   offsetof(BTScanPosData, items[1]) +
+			   so->currPos.lastItem * sizeof(BTScanPosItem));
+		if (so->markTuples)
+			memcpy(so->markTuples, so->currTuples,
+				   so->currPos.nextTupleOffset);
+		if (so->skipData)
+			memcpy(&so->skipData->markPos, &so->skipData->curPos,
+				   sizeof(BTSkipPosData));
+		so->markPos.itemIndex = so->markItemIndex;
+		so->markItemIndex = -1;
+	}
+
+	/* Remember we left a page with data */
+	so->currPos.moreRight = true;
+
+	/* Not parallel, so just use our own notion of the current page */
+
+	{
+		Relation	rel;
+		Page		page;
+		BTPageOpaque opaque;
+
+		rel = scan->indexRelation;
+
+		if (BTScanPosIsPinned(so->currPos))
+			LockBuffer(so->currPos.buf, BT_READ);
+		else
+			so->currPos.buf = _bt_getbuf(rel, so->currPos.currPage, BT_READ);
+
+		for (;;)
+		{
+			/* Step to next physical page */
+			so->currPos.buf = _bt_walk_left(rel, so->currPos.buf,
+											scan->xs_snapshot);
+
+			/* if we're physically at end of index, return failure */
+			if (so->currPos.buf == InvalidBuffer)
+			{
+				BTScanPosInvalidate(so->currPos);
+				return false;
+			}
+
+			/*
+			 * Okay, we managed to move left to a non-deleted page. Done if
+			 * it's not half-dead and contains matching tuples. Else loop back
+			 * and do it all again.
+			 */
+			page = BufferGetPage(so->currPos.buf);
+			TestForOldSnapshot(scan->xs_snapshot, rel, page);
+			opaque = (BTPageOpaque) PageGetSpecialPointer(page);
+			if (!P_IGNORE(opaque))
+			{
+				PredicateLockPage(rel, BufferGetBlockNumber(so->currPos.buf), scan->xs_snapshot);
+				*curTupleOffnum = PageGetMaxOffsetNumber(page);
+				*curTuple = _bt_get_tuple_from_offset(so, *curTupleOffnum);
+				break;
+			}
+		}
+	}
+
+	return true;
+}
+
+/* holds lock as long as curTupleOffnum != InvalidOffsetNumber */
+bool
+_bt_skip_find_next(IndexScanDesc scan, IndexTuple curTuple, OffsetNumber curTupleOffnum,
+				   ScanDirection prefixDir, ScanDirection postfixDir)
+{
+	BTScanOpaque so = (BTScanOpaque) scan->opaque;
+	BTSkip skip = so->skipData;
+	BTSkipCompareResult cmp;
+
+	while (_bt_skip_is_valid(so, prefixDir, postfixDir))
+	{
+		bool found;
+		_bt_skip_until_match(scan, &curTuple, &curTupleOffnum, prefixDir, postfixDir);
+
+		while (_bt_skip_is_always_valid(so))
+		{
+			OffsetNumber first = curTupleOffnum;
+			found = _bt_readpage(scan, postfixDir, &curTupleOffnum,
+								 _bt_skip_is_regular_mode(prefixDir, postfixDir));
+			if (DEBUG1 >= log_min_messages || DEBUG1 >= client_min_messages)
+			{
+				print_itup(BufferGetBlockNumber(so->currPos.buf),
+						   _bt_get_tuple_from_offset(so, first), NULL, scan->indexRelation,
+							"first item on page compared");
+				print_itup(BufferGetBlockNumber(so->currPos.buf),
+						   _bt_get_tuple_from_offset(so, curTupleOffnum), NULL, scan->indexRelation,
+							"last item on page compared");
+			}
+			_bt_compare_current_item(scan, _bt_get_tuple_from_offset(so, curTupleOffnum),
+									 IndexRelationGetNumberOfAttributes(scan->indexRelation),
+									 postfixDir, _bt_skip_is_regular_mode(prefixDir, postfixDir), &cmp);
+			_bt_determine_next_action(scan, &cmp, first, curTupleOffnum,
+									  postfixDir, &skip->curPos.nextAction);
+			skip->curPos.nextDirection = prefixDir;
+			skip->curPos.nextSkipIndex = cmp.prefixSkipIndex;
+
+			if (found)
+			{
+				_bt_skip_update_scankey_after_read(scan, _bt_get_tuple_from_offset(so, curTupleOffnum),
+												   prefixDir, postfixDir);
+				return true;
+			}
+			else if (skip->curPos.nextAction == SkipStateNext)
+			{
+				if (curTupleOffnum != InvalidOffsetNumber)
+					LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+				if (!_bt_step_one_page(scan, postfixDir, &curTuple, &curTupleOffnum))
+					return false;
+			}
+			else if (skip->curPos.nextAction == SkipStateSkip || skip->curPos.nextAction == SkipStateSkipExtra)
+			{
+				curTuple = _bt_get_tuple_from_offset(so, curTupleOffnum);
+				_bt_skip_update_scankey_after_read(scan, curTuple, prefixDir, postfixDir);
+				LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+				curTupleOffnum = InvalidOffsetNumber;
+				curTuple = NULL;
+				break;
+			}
+			else if (skip->curPos.nextAction == SkipStateStop)
+			{
+				LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+				BTScanPosUnpinIfPinned(so->currPos);
+				BTScanPosInvalidate(so->currPos);
+				return false;
+			}
+			else
+			{
+				Assert(false);
+			}
+		}
+	}
+	return false;
+}
+
+void
+_bt_skip_until_match(IndexScanDesc scan, IndexTuple *curTuple, OffsetNumber *curTupleOffnum,
+					 ScanDirection prefixDir, ScanDirection postfixDir)
+{
+	BTScanOpaque 	so = (BTScanOpaque) scan->opaque;
+	BTSkip skip = so->skipData;
+	while (_bt_skip_is_valid(so, prefixDir, postfixDir) &&
+		   (skip->curPos.nextAction == SkipStateSkip || skip->curPos.nextAction == SkipStateSkipExtra))
+	{
+		_bt_skip_once(scan, curTuple, curTupleOffnum,
+					  skip->curPos.nextAction == SkipStateSkip, prefixDir, postfixDir);
+	}
+}
+
+void
+_bt_compare_current_item(IndexScanDesc scan, IndexTuple tuple, int tupnatts, ScanDirection dir,
+						 bool isRegularMode, BTSkipCompareResult* cmp)
+{
+	BTScanOpaque 	so = (BTScanOpaque) scan->opaque;
+	BTSkip skip = so->skipData;
+
+	if (_bt_skip_is_always_valid(so))
+	{
+		bool continuescan = true;
+
+		cmp->equal = _bt_checkkeys(scan, tuple, tupnatts, dir, &continuescan, &cmp->prefixSkipIndex);
+		cmp->fullKeySkip = !continuescan;
+		/* The prefix can be shorter than the scankey because extra quals may
+		 * have been added, so we need to compare both. @todo this can be optimized into one function call */
+		cmp->prefixCmpResult = _bt_compare_until(scan->indexRelation, &skip->curPos.skipScanKey, tuple, skip->prefix);
+		cmp->skCmpResult = _bt_compare_until(scan->indexRelation,
+											 &skip->curPos.skipScanKey, tuple, skip->curPos.skipScanKey.keysz);
+		if (cmp->prefixSkipIndex == -1)
+		{
+			cmp->prefixSkipIndex = skip->prefix;
+			cmp->prefixSkip = ScanDirectionIsForward(dir) ? cmp->prefixCmpResult < 0 : cmp->prefixCmpResult > 0;
+		}
+		else
+		{
+			int newskip = -1;
+			_bt_checkkeys_threeway(scan, tuple, tupnatts, dir, &continuescan, &newskip);
+			if (newskip != -1)
+			{
+				cmp->prefixSkip = true;
+				cmp->prefixSkipIndex = newskip;
+			}
+			else
+			{
+				cmp->prefixSkip = ScanDirectionIsForward(dir) ? cmp->prefixCmpResult < 0 : cmp->prefixCmpResult > 0;
+				cmp->prefixSkipIndex = skip->prefix;
+			}
+		}
+
+		if (DEBUG1 >= log_min_messages || DEBUG1 >= client_min_messages)
+		{
+			print_itup(BufferGetBlockNumber(so->currPos.buf), tuple, NULL, scan->indexRelation,
+						"compare item");
+			_print_skey(scan, &skip->curPos.skipScanKey);
+			elog(DEBUG1, "result: eq: %d fkskip: %d pfxskip: %d prefixcmpres: %d prefixskipidx: %d", cmp->equal, cmp->fullKeySkip,
+				 _bt_should_prefix_skip(cmp), cmp->prefixCmpResult, cmp->prefixSkipIndex);
+		}
+	}
+	else
+	{
+		/* we cannot stop the scan if !isRegularMode - then we do need to skip to the next prefix */
+		cmp->fullKeySkip = isRegularMode;
+		cmp->equal = false;
+		cmp->prefixCmpResult = -2;
+		cmp->prefixSkip = true;
+		cmp->prefixSkipIndex = skip->prefix;
+		cmp->skCmpResult = -2;
+	}
+}
+
+void
+_bt_skip_once(IndexScanDesc scan, IndexTuple *curTuple, OffsetNumber *curTupleOffnum,
+			  bool forceSkip, ScanDirection prefixDir, ScanDirection postfixDir)
+{
+	BTScanOpaque 	so = (BTScanOpaque) scan->opaque;
+	BTSkip skip = so->skipData;
+	BTSkipCompareResult cmp;
+	bool doskip = forceSkip;
+	int skipIndex = skip->curPos.nextSkipIndex;
+	skip->curPos.nextAction = SkipStateSkipExtra;
+
+	while (doskip)
+	{
+		int toskip = skipIndex;
+		if (*curTuple != NULL)
+		{
+			if (skip->prefix <= skipIndex || !_bt_skip_is_regular_mode(prefixDir, postfixDir))
+			{
+				toskip = skip->prefix;
+			}
+
+			_bt_skip_update_scankey_for_prefix_skip(scan, scan->indexRelation,
+													toskip, *curTuple, prefixDir);
+		}
+
+		_bt_skip_find(scan, curTuple, curTupleOffnum, &skip->curPos.skipScanKey, prefixDir);
+
+		if (_bt_skip_is_always_valid(so))
+		{
+			_bt_skip_update_scankey_for_extra_skip(scan, scan->indexRelation,
+												   prefixDir, prefixDir, true, *curTuple);
+			_bt_compare_current_item(scan, *curTuple,
+									 IndexRelationGetNumberOfAttributes(scan->indexRelation),
+									 prefixDir,
+									 _bt_skip_is_regular_mode(prefixDir, postfixDir), &cmp);
+			skipIndex = cmp.prefixSkipIndex;
+			_bt_determine_next_action_after_skip(so, &cmp, prefixDir,
+												 postfixDir, toskip, &skip->curPos.nextAction);
+		}
+		else
+		{
+			skip->curPos.nextAction = SkipStateStop;
+		}
+		doskip = skip->curPos.nextAction == SkipStateSkip;
+	}
+	if (skip->curPos.nextAction != SkipStateStop && skip->curPos.nextAction != SkipStateNext)
+		_bt_skip_extra_conditions(scan, curTuple, curTupleOffnum, prefixDir, postfixDir, &cmp);
+}
+
+void
+_bt_skip_extra_conditions(IndexScanDesc scan, IndexTuple *curTuple, OffsetNumber *curTupleOffnum,
+						  ScanDirection prefixDir, ScanDirection postfixDir, BTSkipCompareResult *cmp)
+{
+	BTScanOpaque 	so = (BTScanOpaque) scan->opaque;
+	BTSkip skip = so->skipData;
+	bool regularMode = _bt_skip_is_regular_mode(prefixDir, postfixDir);
+	if (_bt_skip_is_always_valid(so))
+	{
+		do
+		{
+			if (*curTuple != NULL)
+				_bt_skip_update_scankey_for_extra_skip(scan, scan->indexRelation,
+													   postfixDir, prefixDir, false, *curTuple);
+			_bt_skip_find(scan, curTuple, curTupleOffnum, &skip->curPos.skipScanKey, postfixDir);
+			_bt_compare_current_item(scan, *curTuple,
+									 IndexRelationGetNumberOfAttributes(scan->indexRelation),
+									 postfixDir, _bt_skip_is_regular_mode(prefixDir, postfixDir), cmp);
+		} while (regularMode && cmp->prefixCmpResult != 0 && !cmp->equal && !cmp->fullKeySkip);
+		skip->curPos.nextSkipIndex = cmp->prefixSkipIndex;
+	}
+	_bt_determine_next_action_after_skip_extra(so, cmp, &skip->curPos.nextAction);
+}
+
+static void
+_bt_skip_update_scankey_after_read(IndexScanDesc scan, IndexTuple curTuple,
+								   ScanDirection prefixDir, ScanDirection postfixDir)
+{
+	BTScanOpaque 	so = (BTScanOpaque) scan->opaque;
+	BTSkip skip = so->skipData;
+	if (skip->curPos.nextAction == SkipStateSkip)
+	{
+		int toskip = skip->curPos.nextSkipIndex;
+		if (skip->prefix <= skip->curPos.nextSkipIndex ||
+				!_bt_skip_is_regular_mode(prefixDir, postfixDir))
+		{
+			toskip = skip->prefix;
+		}
+
+		if (_bt_skip_is_regular_mode(prefixDir, postfixDir))
+			_bt_skip_update_scankey_for_prefix_skip(scan, scan->indexRelation,
+													toskip, curTuple, prefixDir);
+		else
+			_bt_skip_update_scankey_for_prefix_skip(scan, scan->indexRelation,
+													toskip, NULL, prefixDir);
+	}
+	else if (skip->curPos.nextAction == SkipStateSkipExtra)
+	{
+		_bt_skip_update_scankey_for_extra_skip(scan, scan->indexRelation,
+											   postfixDir, prefixDir, false, curTuple);
+	}
+}
+
+static inline int
+_bt_compare_one(ScanKey scankey, Datum datum2, bool isNull2)
+{
+	int32		result;
+	Datum datum1 = scankey->sk_argument;
+	bool isNull1 = scankey->sk_flags & SK_ISNULL;
+	/* see comments about NULLs handling in btbuild */
+	if (isNull1)	/* key is NULL */
+	{
+		if (isNull2)
+			result = 0;		/* NULL "=" NULL */
+		else if (scankey->sk_flags & SK_BT_NULLS_FIRST)
+			result = -1;	/* NULL "<" NOT_NULL */
+		else
+			result = 1;		/* NULL ">" NOT_NULL */
+	}
+	else if (isNull2)		/* key is NOT_NULL and item is NULL */
+	{
+		if (scankey->sk_flags & SK_BT_NULLS_FIRST)
+			result = 1;		/* NOT_NULL ">" NULL */
+		else
+			result = -1;	/* NOT_NULL "<" NULL */
+	}
+	else
+	{
+		/*
+		 * The sk_func needs to be passed the index value as left arg and
+		 * the sk_argument as right arg (they might be of different
+		 * types).  Since it is convenient for callers to think of
+		 * _bt_compare as comparing the scankey to the index item, we have
+		 * to flip the sign of the comparison result.  (Unless it's a DESC
+		 * column, in which case we *don't* flip the sign.)
+		 */
+		result = DatumGetInt32(FunctionCall2Coll(&scankey->sk_func,
+												 scankey->sk_collation,
+												 datum2,
+												 datum1));
+
+		if (!(scankey->sk_flags & SK_BT_DESC))
+			INVERT_COMPARE_RESULT(result);
+	}
+	return result;
+}
+
+/*
+ * set up new values for the existing scankeys
+ * based on the current index tuple
+ */
+static inline void
+_bt_update_scankey_with_tuple(BTScanInsert insertKey, Relation indexRel, IndexTuple itup, int numattrs)
+{
+	TupleDesc		itupdesc;
+	int				i;
+	ScanKey			scankeys = insertKey->scankeys;
+
+	insertKey->keysz = numattrs;
+	itupdesc = RelationGetDescr(indexRel);
+	for (i = 0; i < numattrs; i++)
+	{
+		Datum datum;
+		bool null;
+		int flags;
+
+		datum = index_getattr(itup, i + 1, itupdesc, &null);
+		flags = (null ? SK_ISNULL : 0) |
+				(indexRel->rd_indoption[i] << SK_BT_INDOPTION_SHIFT);
+		scankeys[i].sk_flags = flags;
+		scankeys[i].sk_argument = datum;
+	}
+}
+
+/* copy the elements important to a skip from one insertion sk to another */
+static inline void
+_bt_copy_scankey(BTScanInsert to, BTScanInsert from, int numattrs)
+{
+	memcpy(to->scankeys, from->scankeys, sizeof(ScanKeyData) * (unsigned long)numattrs);
+	to->nextkey = from->nextkey;
+	to->keysz = numattrs;
+}
+
+/*
+ * Updates the existing scankey for skipping to the next prefix.
+ * alwaysUsePrefix determines how many attrs the scankey will have:
+ * when true, it will always have skip->prefix number of attributes;
+ * otherwise, the value can be less, which will be determined by the comparison
+ * result with the current tuple.
+ * For example, take SELECT * FROM tbl WHERE b<2 with an index on (a,b,c), skipping with prefix size=2.
+ * If we encounter the tuple (1,3,1), it does not match the qual b<2. However, we also know that
+ * it is not useful to skip to any next qual with prefix=2 (e.g. (1,4)), because that will definitely not
+ * match either. We do want to skip to e.g. (2,0), though. Therefore, we skip over prefix=1 in this case.
+ *
+ * The provided itup may be NULL. This happens when we don't want to use the current tuple to update
+ * the scankey, but instead want to use the existing curPos.skipScanKey to fill currentTupleKey. This accounts
+ * for some edge cases.
+ */
+static void
+_bt_skip_update_scankey_for_prefix_skip(IndexScanDesc scan, Relation indexRel,
+										int prefix, IndexTuple itup, ScanDirection prefixDir)
+{
+	BTScanOpaque so = (BTScanOpaque) scan->opaque;
+	BTSkip skip = so->skipData;
+	/* we use skip->prefix if alwaysUsePrefix is set or if skip->prefix is smaller than whatever the
+	 * comparison result provided, such that we never skip more than skip->prefix
+	 */
+	int numattrs = prefix;
+
+	if (itup != NULL)
+	{
+		_bt_update_scankey_with_tuple(&skip->currentTupleKey, indexRel, itup, numattrs);
+		_bt_copy_scankey(&skip->curPos.skipScanKey, &skip->currentTupleKey, numattrs);
+	}
+	else
+	{
+		skip->curPos.skipScanKey.keysz = numattrs;
+		_bt_copy_scankey(&skip->currentTupleKey, &skip->curPos.skipScanKey, numattrs);
+	}
+	/* update strategy for last attribute as we will use this to determine the
+	 * rest of the flags (goback) when doing the actual tree search
+	 */
+	skip->currentTupleKey.scankeys[numattrs - 1].sk_strategy =
+			skip->curPos.skipScanKey.scankeys[numattrs - 1].sk_strategy =
+			ScanDirectionIsForward(prefixDir) ? BTGreaterStrategyNumber : BTLessStrategyNumber;
+}
+
+/* update the scankey for skipping the 'extra' conditions, opportunities
+ * that arise when we have just skipped to a new prefix and can try to skip
+ * within the prefix to the right tuple by using extra quals when available
+ *
+ * @todo as an optimization it should be possible to optimize calls to this function
+ * and to _bt_skip_update_scankey_for_prefix_skip to some more specific functions that
+ * will need to do less copying of data.
+ */
+void
+_bt_skip_update_scankey_for_extra_skip(IndexScanDesc scan, Relation indexRel, ScanDirection curDir,
+									   ScanDirection prefixDir, bool prioritizeEqual, IndexTuple itup)
+{
+	BTScanOpaque 	so = (BTScanOpaque) scan->opaque;
+	BTSkip skip = so->skipData;
+	BTScanInsert toCopy;
+	int i, left, lastNonTuple = skip->prefix;
+
+	/* first make sure that currentTupleKey is correct at all times */
+	_bt_skip_update_scankey_for_prefix_skip(scan, indexRel, skip->prefix, itup, prefixDir);
+	/* then do the actual work to set up curPos.skipScanKey - distinguish between work that depends on prefixDir
+	 * (those attributes between attribute number 1 and 'prefix' inclusive)
+	 * and work that depends on curDir
+	 * (those attributes between attribute number 'prefix' + 1 and fwdScanKey.keysz inclusive)
+	 */
+	if (ScanDirectionIsForward(prefixDir))
+	{
+		/*
+		 * if prefixDir is Forward, we need to choose between fwdScanKey or
+		 * currentTupleKey. we need to choose the most restrictive one -
+		 * in most cases this means choosing eg. a>5 over a=2 when scanning forward,
+		 * unless prioritizeEqual is set. this is done for certain special cases
+		 */
+		for (i = 0; i < skip->prefix; i++)
+		{
+			ScanKey scankey = &skip->fwdScanKey.scankeys[i];
+			ScanKey scankeyItem = &skip->currentTupleKey.scankeys[i];
+			if (scankey->sk_attno != 0 && (_bt_compare_one(scankey, scankeyItem->sk_argument, scankeyItem->sk_flags & SK_ISNULL) > 0
+										   || (prioritizeEqual && scankey->sk_strategy == BTEqualStrategyNumber)))
+			{
+				memcpy(skip->curPos.skipScanKey.scankeys + i, scankey, sizeof(ScanKeyData));
+				lastNonTuple = i;
+			}
+			else
+			{
+				if (lastNonTuple < i)
+					break;
+				memcpy(skip->curPos.skipScanKey.scankeys + i, scankeyItem, sizeof(ScanKeyData));
+			}
+			/* for now choose equal here - @todo: this could be improved a bit by choosing the strategy
+			 * from the scankeys, but it doesn't matter a lot
+			 */
+			skip->curPos.skipScanKey.scankeys[i].sk_strategy = BTEqualStrategyNumber;
+		}
+	}
+	else
+	{
+		/* similar for backward but in opposite direction */
+		for (i = 0; i < skip->prefix; i++)
+		{
+			ScanKey scankey = &skip->bwdScanKey.scankeys[i];
+			ScanKey scankeyItem = &skip->currentTupleKey.scankeys[i];
+			if (scankey->sk_attno != 0 && (_bt_compare_one(scankey, scankeyItem->sk_argument, scankeyItem->sk_flags & SK_ISNULL) < 0
+										   || (prioritizeEqual && scankey->sk_strategy == BTEqualStrategyNumber)))
+			{
+				memcpy(skip->curPos.skipScanKey.scankeys + i, scankey, sizeof(ScanKeyData));
+				lastNonTuple = i;
+			}
+			else
+			{
+				if (lastNonTuple < i)
+					break;
+				memcpy(skip->curPos.skipScanKey.scankeys + i, scankeyItem, sizeof(ScanKeyData));
+			}
+			skip->curPos.skipScanKey.scankeys[i].sk_strategy = BTEqualStrategyNumber;
+		}
+	}
+
+	/*
+	 * the remaining keys are the quals after the prefix
+	 */
+	if (ScanDirectionIsForward(curDir))
+		toCopy = &skip->fwdScanKey;
+	else
+		toCopy = &skip->bwdScanKey;
+
+	if (lastNonTuple >= skip->prefix - 1)
+	{
+		left = toCopy->keysz - skip->prefix;
+		if (left > 0)
+		{
+			memcpy(skip->curPos.skipScanKey.scankeys + skip->prefix, toCopy->scankeys + i, sizeof(ScanKeyData) * (unsigned long)left);
+		}
+		skip->curPos.skipScanKey.keysz = toCopy->keysz;
+	}
+	else
+	{
+		skip->curPos.skipScanKey.keysz = lastNonTuple + 1;
+	}
+}
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index 9111e2789c..135953da5f 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -554,7 +554,7 @@ _bt_leafbuild(BTSpool *btspool, BTSpool *btspool2)
 
 	wstate.heap = btspool->heap;
 	wstate.index = btspool->index;
-	wstate.inskey = _bt_mkscankey(wstate.index, NULL);
+	wstate.inskey = _bt_mkscankey(wstate.index, NULL, NULL);
 	/* _bt_mkscankey() won't set allequalimage without metapage */
 	wstate.inskey->allequalimage = _bt_allequalimage(wstate.index, true);
 	wstate.btws_use_wal = RelationNeedsWAL(wstate.index);
diff --git a/src/backend/access/nbtree/nbtutils.c b/src/backend/access/nbtree/nbtutils.c
index 54afa6f417..d5d30ac5b6 100644
--- a/src/backend/access/nbtree/nbtutils.c
+++ b/src/backend/access/nbtree/nbtutils.c
@@ -49,10 +49,10 @@ static bool _bt_compare_scankey_args(IndexScanDesc scan, ScanKey op,
 									 ScanKey leftarg, ScanKey rightarg,
 									 bool *result);
 static bool _bt_fix_scankey_strategy(ScanKey skey, int16 *indoption);
-static void _bt_mark_scankey_required(ScanKey skey);
+static void _bt_mark_scankey_required(ScanKey skey, int forwardReqFlag, int backwardReqFlag);
 static bool _bt_check_rowcompare(ScanKey skey,
 								 IndexTuple tuple, int tupnatts, TupleDesc tupdesc,
-								 ScanDirection dir, bool *continuescan);
+								 ScanDirection dir, bool *continuescan, int *prefixskipindex);
 static int	_bt_keep_natts(Relation rel, IndexTuple lastleft,
 						   IndexTuple firstright, BTScanInsert itup_key);
 
@@ -87,9 +87,8 @@ static int	_bt_keep_natts(Relation rel, IndexTuple lastleft,
  *		field themselves.
  */
 BTScanInsert
-_bt_mkscankey(Relation rel, IndexTuple itup)
+_bt_mkscankey(Relation rel, IndexTuple itup, BTScanInsert key)
 {
-	BTScanInsert key;
 	ScanKey		skey;
 	TupleDesc	itupdesc;
 	int			indnkeyatts;
@@ -109,8 +108,10 @@ _bt_mkscankey(Relation rel, IndexTuple itup)
 	 * Truncated attributes and non-key attributes are omitted from the final
 	 * scan key.
 	 */
-	key = palloc(offsetof(BTScanInsertData, scankeys) +
-				 sizeof(ScanKeyData) * indnkeyatts);
+	if (key == NULL)
+		key = palloc(offsetof(BTScanInsertData, scankeys) +
+					 sizeof(ScanKeyData) * indnkeyatts);
+
 	if (itup)
 		_bt_metaversion(rel, &key->heapkeyspace, &key->allequalimage);
 	else
@@ -155,7 +156,7 @@ _bt_mkscankey(Relation rel, IndexTuple itup)
 		ScanKeyEntryInitializeWithInfo(&skey[i],
 									   flags,
 									   (AttrNumber) (i + 1),
-									   InvalidStrategy,
+									   BTEqualStrategyNumber,
 									   InvalidOid,
 									   rel->rd_indcollation[i],
 									   procinfo,
@@ -745,7 +746,7 @@ _bt_preprocess_keys(IndexScanDesc scan)
 	int			numberOfKeys = scan->numberOfKeys;
 	int16	   *indoption = scan->indexRelation->rd_indoption;
 	int			new_numberOfKeys;
-	int			numberOfEqualCols;
+	int			numberOfEqualCols, numberOfEqualColsSincePrefix;
 	ScanKey		inkeys;
 	ScanKey		outkeys;
 	ScanKey		cur;
@@ -754,6 +755,7 @@ _bt_preprocess_keys(IndexScanDesc scan)
 	int			i,
 				j;
 	AttrNumber	attno;
+	int			prefix = 0;
 
 	/* initialize result variables */
 	so->qual_ok = true;
@@ -762,6 +764,11 @@ _bt_preprocess_keys(IndexScanDesc scan)
 	if (numberOfKeys < 1)
 		return;					/* done if qual-less scan */
 
+	if (_bt_skip_enabled(so))
+	{
+		prefix = so->skipData->prefix;
+	}
+
 	/*
 	 * Read so->arrayKeyData if array keys are present, else scan->keyData
 	 */
@@ -786,7 +793,9 @@ _bt_preprocess_keys(IndexScanDesc scan)
 		so->numberOfKeys = 1;
 		/* We can mark the qual as required if it's for first index col */
 		if (cur->sk_attno == 1)
-			_bt_mark_scankey_required(outkeys);
+			_bt_mark_scankey_required(outkeys, SK_BT_REQFWD, SK_BT_REQBKWD);
+		if (cur->sk_attno <= prefix + 1)
+			_bt_mark_scankey_required(outkeys, SK_BT_REQSKIPFWD, SK_BT_REQSKIPBKWD);
 		return;
 	}
 
@@ -795,6 +804,8 @@ _bt_preprocess_keys(IndexScanDesc scan)
 	 */
 	new_numberOfKeys = 0;
 	numberOfEqualCols = 0;
+	numberOfEqualColsSincePrefix = 0;
+
 
 	/*
 	 * Initialize for processing of keys for attr 1.
@@ -830,6 +841,8 @@ _bt_preprocess_keys(IndexScanDesc scan)
 		if (i == numberOfKeys || cur->sk_attno != attno)
 		{
 			int			priorNumberOfEqualCols = numberOfEqualCols;
+			int			priorNumberOfEqualColsSincePrefix = numberOfEqualColsSincePrefix;
+
 
 			/* check input keys are correctly ordered */
 			if (i < numberOfKeys && cur->sk_attno < attno)
@@ -880,6 +893,8 @@ _bt_preprocess_keys(IndexScanDesc scan)
 				}
 				/* track number of attrs for which we have "=" keys */
 				numberOfEqualCols++;
+				if (attno > prefix)
+					numberOfEqualColsSincePrefix++;
 			}
 
 			/* try to keep only one of <, <= */
@@ -929,7 +944,9 @@ _bt_preprocess_keys(IndexScanDesc scan)
 
 					memcpy(outkey, xform[j], sizeof(ScanKeyData));
 					if (priorNumberOfEqualCols == attno - 1)
-						_bt_mark_scankey_required(outkey);
+						_bt_mark_scankey_required(outkey, SK_BT_REQFWD, SK_BT_REQBKWD);
+					if (attno <= prefix || priorNumberOfEqualColsSincePrefix == attno - prefix - 1)
+						_bt_mark_scankey_required(outkey, SK_BT_REQSKIPFWD, SK_BT_REQSKIPBKWD);
 				}
 			}
 
@@ -954,7 +971,9 @@ _bt_preprocess_keys(IndexScanDesc scan)
 
 			memcpy(outkey, cur, sizeof(ScanKeyData));
 			if (numberOfEqualCols == attno - 1)
-				_bt_mark_scankey_required(outkey);
+				_bt_mark_scankey_required(outkey, SK_BT_REQFWD, SK_BT_REQBKWD);
+			if (attno <= prefix || numberOfEqualColsSincePrefix == attno - prefix - 1)
+				_bt_mark_scankey_required(outkey, SK_BT_REQSKIPFWD, SK_BT_REQSKIPBKWD);
 
 			/*
 			 * We don't support RowCompare using equality; such a qual would
@@ -997,7 +1016,9 @@ _bt_preprocess_keys(IndexScanDesc scan)
 
 				memcpy(outkey, cur, sizeof(ScanKeyData));
 				if (numberOfEqualCols == attno - 1)
-					_bt_mark_scankey_required(outkey);
+					_bt_mark_scankey_required(outkey, SK_BT_REQFWD, SK_BT_REQBKWD);
+				if (attno <= prefix || numberOfEqualColsSincePrefix == attno - prefix - 1)
+					_bt_mark_scankey_required(outkey, SK_BT_REQSKIPFWD, SK_BT_REQSKIPBKWD);
 			}
 		}
 	}
@@ -1295,7 +1316,7 @@ _bt_fix_scankey_strategy(ScanKey skey, int16 *indoption)
  * anyway on a rescan.  Something to keep an eye on though.
  */
 static void
-_bt_mark_scankey_required(ScanKey skey)
+_bt_mark_scankey_required(ScanKey skey, int forwardReqFlag, int backwardReqFlag)
 {
 	int			addflags;
 
@@ -1303,14 +1324,14 @@ _bt_mark_scankey_required(ScanKey skey)
 	{
 		case BTLessStrategyNumber:
 		case BTLessEqualStrategyNumber:
-			addflags = SK_BT_REQFWD;
+			addflags = forwardReqFlag;
 			break;
 		case BTEqualStrategyNumber:
-			addflags = SK_BT_REQFWD | SK_BT_REQBKWD;
+			addflags = forwardReqFlag | backwardReqFlag;
 			break;
 		case BTGreaterEqualStrategyNumber:
 		case BTGreaterStrategyNumber:
-			addflags = SK_BT_REQBKWD;
+			addflags = backwardReqFlag;
 			break;
 		default:
 			elog(ERROR, "unrecognized StrategyNumber: %d",
@@ -1353,17 +1374,22 @@ _bt_mark_scankey_required(ScanKey skey)
  */
 bool
 _bt_checkkeys(IndexScanDesc scan, IndexTuple tuple, int tupnatts,
-			  ScanDirection dir, bool *continuescan)
+			  ScanDirection dir, bool *continuescan, int *prefixSkipIndex)
 {
 	TupleDesc	tupdesc;
 	BTScanOpaque so;
 	int			keysz;
 	int			ikey;
 	ScanKey		key;
+	int pfx;
+
+	if (prefixSkipIndex == NULL)
+		prefixSkipIndex = &pfx;
 
 	Assert(BTreeTupleGetNAtts(tuple, scan->indexRelation) == tupnatts);
 
 	*continuescan = true;		/* default assumption */
+	*prefixSkipIndex = -1;
 
 	tupdesc = RelationGetDescr(scan->indexRelation);
 	so = (BTScanOpaque) scan->opaque;
@@ -1392,7 +1418,7 @@ _bt_checkkeys(IndexScanDesc scan, IndexTuple tuple, int tupnatts,
 		if (key->sk_flags & SK_ROW_HEADER)
 		{
 			if (_bt_check_rowcompare(key, tuple, tupnatts, tupdesc, dir,
-									 continuescan))
+									 continuescan, prefixSkipIndex))
 				continue;
 			return false;
 		}
@@ -1429,6 +1455,13 @@ _bt_checkkeys(IndexScanDesc scan, IndexTuple tuple, int tupnatts,
 					 ScanDirectionIsBackward(dir))
 				*continuescan = false;
 
+			if ((key->sk_flags & SK_BT_REQSKIPFWD) &&
+				ScanDirectionIsForward(dir))
+				*prefixSkipIndex = key->sk_attno - 1;
+			else if ((key->sk_flags & SK_BT_REQSKIPBKWD) &&
+					 ScanDirectionIsBackward(dir))
+				*prefixSkipIndex = key->sk_attno - 1;
+
 			/*
 			 * In any case, this indextuple doesn't match the qual.
 			 */
@@ -1452,6 +1485,10 @@ _bt_checkkeys(IndexScanDesc scan, IndexTuple tuple, int tupnatts,
 				if ((key->sk_flags & (SK_BT_REQFWD | SK_BT_REQBKWD)) &&
 					ScanDirectionIsBackward(dir))
 					*continuescan = false;
+
+				if ((key->sk_flags & (SK_BT_REQSKIPFWD | SK_BT_REQSKIPBKWD)) &&
+					ScanDirectionIsBackward(dir))
+					*prefixSkipIndex = key->sk_attno - 1;
 			}
 			else
 			{
@@ -1468,6 +1505,9 @@ _bt_checkkeys(IndexScanDesc scan, IndexTuple tuple, int tupnatts,
 				if ((key->sk_flags & (SK_BT_REQFWD | SK_BT_REQBKWD)) &&
 					ScanDirectionIsForward(dir))
 					*continuescan = false;
+				if ((key->sk_flags & (SK_BT_REQSKIPFWD | SK_BT_REQSKIPBKWD)) &&
+					ScanDirectionIsForward(dir))
+					*prefixSkipIndex = key->sk_attno - 1;
 			}
 
 			/*
@@ -1498,6 +1538,206 @@ _bt_checkkeys(IndexScanDesc scan, IndexTuple tuple, int tupnatts,
 					 ScanDirectionIsBackward(dir))
 				*continuescan = false;
 
+			if ((key->sk_flags & SK_BT_REQSKIPFWD) &&
+				ScanDirectionIsForward(dir))
+				*prefixSkipIndex = key->sk_attno - 1;
+			else if ((key->sk_flags & SK_BT_REQSKIPBKWD) &&
+					 ScanDirectionIsBackward(dir))
+				*prefixSkipIndex = key->sk_attno - 1;
+
+			/*
+			 * In any case, this indextuple doesn't match the qual.
+			 */
+			return false;
+		}
+	}
+
+	/* If we get here, the tuple passes all index quals. */
+	return true;
+}
+
+bool
+_bt_checkkeys_threeway(IndexScanDesc scan, IndexTuple tuple, int tupnatts,
+			  ScanDirection dir, bool *continuescan, int *prefixSkipIndex)
+{
+	TupleDesc	tupdesc;
+	BTScanOpaque so;
+	int			keysz;
+	int			ikey;
+	ScanKey		key;
+	int pfx;
+	BTScanInsert keys;
+
+	if (prefixSkipIndex == NULL)
+		prefixSkipIndex = &pfx;
+
+	Assert(BTreeTupleGetNAtts(tuple, scan->indexRelation) == tupnatts);
+
+	*continuescan = true;		/* default assumption */
+	*prefixSkipIndex = -1;
+
+	tupdesc = RelationGetDescr(scan->indexRelation);
+	so = (BTScanOpaque) scan->opaque;
+	if (ScanDirectionIsForward(dir))
+		keys = &so->skipData->bwdScanKey;
+	else
+		keys = &so->skipData->fwdScanKey;
+
+	keysz = keys->keysz;
+
+	for (key = keys->scankeys, ikey = 0; ikey < keysz; key++, ikey++)
+	{
+		Datum		datum;
+		bool		isNull;
+		int		cmpresult;
+
+		if (key->sk_attno == 0)
+			continue;
+
+		if (key->sk_attno > tupnatts)
+		{
+			/*
+			 * This attribute is truncated (must be high key).  The value for
+			 * this attribute in the first non-pivot tuple on the page to the
+			 * right could be any possible value.  Assume that truncated
+			 * attribute passes the qual.
+			 */
+			Assert(ScanDirectionIsForward(dir));
+			continue;
+		}
+
+		/* row-comparison keys are not handled by this function */
+		Assert((key->sk_flags & SK_ROW_HEADER) == 0);
+
+		datum = index_getattr(tuple,
+							  key->sk_attno,
+							  tupdesc,
+							  &isNull);
+
+		if (key->sk_flags & SK_ISNULL)
+		{
+			/* Handle IS NULL/NOT NULL tests */
+			if (key->sk_flags & SK_SEARCHNULL)
+			{
+				if (isNull)
+					continue;	/* tuple satisfies this qual */
+			}
+			else
+			{
+				Assert(key->sk_flags & SK_SEARCHNOTNULL);
+				if (!isNull)
+					continue;	/* tuple satisfies this qual */
+			}
+
+			/*
+			 * Tuple fails this qual.  If it's a required qual for the current
+			 * scan direction, then we can conclude no further tuples will
+			 * pass, either.
+			 */
+			if ((key->sk_flags & SK_BT_REQFWD) &&
+				ScanDirectionIsForward(dir))
+				*continuescan = false;
+			else if ((key->sk_flags & SK_BT_REQBKWD) &&
+					 ScanDirectionIsBackward(dir))
+				*continuescan = false;
+
+			if ((key->sk_flags & SK_BT_REQSKIPFWD) &&
+				ScanDirectionIsForward(dir))
+				*prefixSkipIndex = key->sk_attno - 1;
+			else if ((key->sk_flags & SK_BT_REQSKIPBKWD) &&
+					 ScanDirectionIsBackward(dir))
+				*prefixSkipIndex = key->sk_attno - 1;
+
+			/*
+			 * In any case, this indextuple doesn't match the qual.
+			 */
+			return false;
+		}
+
+		if (isNull)
+		{
+			if (key->sk_flags & SK_BT_NULLS_FIRST)
+			{
+				/*
+				 * Since NULLs are sorted before non-NULLs, we know we have
+				 * reached the lower limit of the range of values for this
+				 * index attr.  On a backward scan, we can stop if this qual
+				 * is one of the "must match" subset.  We can stop regardless
+				 * of whether the qual is > or <, so long as it's required,
+				 * because it's not possible for any future tuples to pass. On
+				 * a forward scan, however, we must keep going, because we may
+				 * have initially positioned to the start of the index.
+				 */
+				if ((key->sk_flags & (SK_BT_REQFWD | SK_BT_REQBKWD)) &&
+					ScanDirectionIsBackward(dir))
+					*continuescan = false;
+
+				if ((key->sk_flags & (SK_BT_REQSKIPFWD | SK_BT_REQSKIPBKWD)) &&
+					ScanDirectionIsBackward(dir))
+					*prefixSkipIndex = key->sk_attno - 1;
+			}
+			else
+			{
+				/*
+				 * Since NULLs are sorted after non-NULLs, we know we have
+				 * reached the upper limit of the range of values for this
+				 * index attr.  On a forward scan, we can stop if this qual is
+				 * one of the "must match" subset.  We can stop regardless of
+				 * whether the qual is > or <, so long as it's required,
+				 * because it's not possible for any future tuples to pass. On
+				 * a backward scan, however, we must keep going, because we
+				 * may have initially positioned to the end of the index.
+				 */
+				if ((key->sk_flags & (SK_BT_REQFWD | SK_BT_REQBKWD)) &&
+					ScanDirectionIsForward(dir))
+					*continuescan = false;
+				if ((key->sk_flags & (SK_BT_REQSKIPFWD | SK_BT_REQSKIPBKWD)) &&
+					ScanDirectionIsForward(dir))
+					*prefixSkipIndex = key->sk_attno - 1;
+			}
+
+			/*
+			 * In any case, this indextuple doesn't match the qual.
+			 */
+			return false;
+		}
+
+
+		/* Perform the test --- three-way comparison not bool operator */
+		cmpresult = DatumGetInt32(FunctionCall2Coll(&key->sk_func,
+													key->sk_collation,
+													datum,
+													key->sk_argument));
+
+		if (key->sk_flags & SK_BT_DESC)
+			INVERT_COMPARE_RESULT(cmpresult);
+
+		if (cmpresult != 0)
+		{
+			/*
+			 * Tuple fails this qual.  If it's a required qual for the current
+			 * scan direction, then we can conclude no further tuples will
+			 * pass, either.
+			 *
+			 * Note: because we stop the scan as soon as any required equality
+			 * qual fails, it is critical that equality quals be used for the
+			 * initial positioning in _bt_first() when they are available. See
+			 * comments in _bt_first().
+			 */
+			if ((key->sk_flags & SK_BT_REQFWD) &&
+				ScanDirectionIsForward(dir) && cmpresult > 0)
+				*continuescan = false;
+			else if ((key->sk_flags & SK_BT_REQBKWD) &&
+					 ScanDirectionIsBackward(dir) && cmpresult < 0)
+				*continuescan = false;
+
+			if ((key->sk_flags & SK_BT_REQSKIPFWD) &&
+				ScanDirectionIsForward(dir) && cmpresult > 0)
+				*prefixSkipIndex = key->sk_attno - 1;
+			else if ((key->sk_flags & SK_BT_REQSKIPBKWD) &&
+					 ScanDirectionIsBackward(dir) && cmpresult < 0)
+				*prefixSkipIndex = key->sk_attno - 1;
+
 			/*
 			 * In any case, this indextuple doesn't match the qual.
 			 */
@@ -1520,7 +1760,7 @@ _bt_checkkeys(IndexScanDesc scan, IndexTuple tuple, int tupnatts,
  */
 static bool
 _bt_check_rowcompare(ScanKey skey, IndexTuple tuple, int tupnatts,
-					 TupleDesc tupdesc, ScanDirection dir, bool *continuescan)
+					 TupleDesc tupdesc, ScanDirection dir, bool *continuescan, int *prefixSkipIndex)
 {
 	ScanKey		subkey = (ScanKey) DatumGetPointer(skey->sk_argument);
 	int32		cmpresult = 0;
@@ -1576,6 +1816,10 @@ _bt_check_rowcompare(ScanKey skey, IndexTuple tuple, int tupnatts,
 				if ((subkey->sk_flags & (SK_BT_REQFWD | SK_BT_REQBKWD)) &&
 					ScanDirectionIsBackward(dir))
 					*continuescan = false;
+
+				if ((subkey->sk_flags & (SK_BT_REQSKIPFWD | SK_BT_REQSKIPBKWD)) &&
+					ScanDirectionIsBackward(dir))
+					*prefixSkipIndex = subkey->sk_attno - 1;
 			}
 			else
 			{
@@ -1592,6 +1836,10 @@ _bt_check_rowcompare(ScanKey skey, IndexTuple tuple, int tupnatts,
 				if ((subkey->sk_flags & (SK_BT_REQFWD | SK_BT_REQBKWD)) &&
 					ScanDirectionIsForward(dir))
 					*continuescan = false;
+
+				if ((subkey->sk_flags & (SK_BT_REQSKIPFWD | SK_BT_REQSKIPBKWD)) &&
+					ScanDirectionIsForward(dir))
+					*prefixSkipIndex = subkey->sk_attno - 1;
 			}
 
 			/*
@@ -1616,6 +1864,13 @@ _bt_check_rowcompare(ScanKey skey, IndexTuple tuple, int tupnatts,
 			else if ((subkey->sk_flags & SK_BT_REQBKWD) &&
 					 ScanDirectionIsBackward(dir))
 				*continuescan = false;
+
+			if ((subkey->sk_flags & SK_BT_REQSKIPFWD) &&
+				ScanDirectionIsForward(dir))
+				*prefixSkipIndex = subkey->sk_attno - 1;
+			else if ((subkey->sk_flags & SK_BT_REQSKIPBKWD) &&
+					 ScanDirectionIsBackward(dir))
+				*prefixSkipIndex = subkey->sk_attno - 1;
 			return false;
 		}
 
@@ -1678,6 +1933,13 @@ _bt_check_rowcompare(ScanKey skey, IndexTuple tuple, int tupnatts,
 		else if ((subkey->sk_flags & SK_BT_REQBKWD) &&
 				 ScanDirectionIsBackward(dir))
 			*continuescan = false;
+
+		if ((subkey->sk_flags & SK_BT_REQSKIPFWD) &&
+			ScanDirectionIsForward(dir))
+			*prefixSkipIndex = subkey->sk_attno - 1;
+		else if ((subkey->sk_flags & SK_BT_REQSKIPBKWD) &&
+				 ScanDirectionIsBackward(dir))
+			*prefixSkipIndex = subkey->sk_attno - 1;
 	}
 
 	return result;
@@ -2767,3 +3029,524 @@ _bt_allequalimage(Relation rel, bool debugmessage)
 
 	return allequalimage;
 }
+
+void _bt_set_bsearch_flags(StrategyNumber stratTotal, ScanDirection dir, bool* nextkey, bool* goback)
+{
+	/*----------
+	 * Examine the selected initial-positioning strategy to determine exactly
+	 * where we need to start the scan, and set flag variables to control the
+	 * code below.
+	 *
+	 * If nextkey = false, _bt_search and _bt_binsrch will locate the first
+	 * item >= scan key.  If nextkey = true, they will locate the first
+	 * item > scan key.
+	 *
+	 * If goback = true, we will then step back one item, while if
+	 * goback = false, we will start the scan on the located item.
+	 *----------
+	 */
+	switch (stratTotal)
+	{
+		case BTLessStrategyNumber:
+
+			/*
+			 * Find first item >= scankey, then back up one to arrive at last
+			 * item < scankey.  (Note: this positioning strategy is only used
+			 * for a backward scan, so that is always the correct starting
+			 * position.)
+			 */
+			*nextkey = false;
+			*goback = true;
+			break;
+
+		case BTLessEqualStrategyNumber:
+
+			/*
+			 * Find first item > scankey, then back up one to arrive at last
+			 * item <= scankey.  (Note: this positioning strategy is only used
+			 * for a backward scan, so that is always the correct starting
+			 * position.)
+			 */
+			*nextkey = true;
+			*goback = true;
+			break;
+
+		case BTEqualStrategyNumber:
+
+			/*
+			 * If a backward scan was specified, need to start with last equal
+			 * item not first one.
+			 */
+			if (ScanDirectionIsBackward(dir))
+			{
+				/*
+				 * This is the same as the <= strategy.  We will check at the
+				 * end whether the found item is actually =.
+				 */
+				*nextkey = true;
+				*goback = true;
+			}
+			else
+			{
+				/*
+				 * This is the same as the >= strategy.  We will check at the
+				 * end whether the found item is actually =.
+				 */
+				*nextkey = false;
+				*goback = false;
+			}
+			break;
+
+		case BTGreaterEqualStrategyNumber:
+
+			/*
+			 * Find first item >= scankey.  (This is only used for forward
+			 * scans.)
+			 */
+			*nextkey = false;
+			*goback = false;
+			break;
+
+		case BTGreaterStrategyNumber:
+
+			/*
+			 * Find first item > scankey.  (This is only used for forward
+			 * scans.)
+			 */
+			*nextkey = true;
+			*goback = false;
+			break;
+
+		default:
+			/* can't get here, but keep compiler quiet */
+			elog(ERROR, "unrecognized strat_total: %d", (int) stratTotal);
+	}
+}
+
+bool _bt_create_insertion_scan_key(Relation	rel, ScanDirection dir, ScanKey* startKeys, int keysCount, BTScanInsert inskey, StrategyNumber* stratTotal,  bool* goback)
+{
+	int i;
+	bool nextkey;
+
+	/*
+	 * We want to start the scan somewhere within the index.  Set up an
+	 * insertion scankey we can use to search for the boundary point we
+	 * identified above.  The insertion scankey is built using the keys
+	 * identified by startKeys[].  (Remaining insertion scankey fields are
+	 * initialized after initial-positioning strategy is finalized.)
+	 */
+	Assert(keysCount <= INDEX_MAX_KEYS);
+	for (i = 0; i < keysCount; i++)
+	{
+		ScanKey		cur = startKeys[i];
+
+		if (cur == NULL)
+		{
+			inskey->scankeys[i].sk_attno = 0;
+			continue;
+		}
+
+		Assert(cur->sk_attno == i + 1);
+
+		if (cur->sk_flags & SK_ROW_HEADER)
+		{
+			/*
+			 * Row comparison header: look to the first row member instead.
+			 *
+			 * The member scankeys are already in insertion format (ie, they
+			 * have sk_func = 3-way-comparison function), but we have to watch
+			 * out for nulls, which _bt_preprocess_keys didn't check. A null
+			 * in the first row member makes the condition unmatchable, just
+			 * like qual_ok = false.
+			 */
+			ScanKey		subkey = (ScanKey) DatumGetPointer(cur->sk_argument);
+
+			Assert(subkey->sk_flags & SK_ROW_MEMBER);
+			if (subkey->sk_flags & SK_ISNULL)
+			{
+				return false;
+			}
+			memcpy(inskey->scankeys + i, subkey, sizeof(ScanKeyData));
+
+			/*
+			 * If the row comparison is the last positioning key we accepted,
+			 * try to add additional keys from the lower-order row members.
+			 * (If we accepted independent conditions on additional index
+			 * columns, we use those instead --- doesn't seem worth trying to
+			 * determine which is more restrictive.)  Note that this is OK
+			 * even if the row comparison is of ">" or "<" type, because the
+			 * condition applied to all but the last row member is effectively
+			 * ">=" or "<=", and so the extra keys don't break the positioning
+			 * scheme.  But, by the same token, if we aren't able to use all
+			 * the row members, then the part of the row comparison that we
+			 * did use has to be treated as just a ">=" or "<=" condition, and
+			 * so we'd better adjust strat_total accordingly.
+			 */
+			if (i == keysCount - 1)
+			{
+				bool		used_all_subkeys = false;
+
+				Assert(!(subkey->sk_flags & SK_ROW_END));
+				for (;;)
+				{
+					subkey++;
+					Assert(subkey->sk_flags & SK_ROW_MEMBER);
+					if (subkey->sk_attno != keysCount + 1)
+						break;	/* out-of-sequence, can't use it */
+					if (subkey->sk_strategy != cur->sk_strategy)
+						break;	/* wrong direction, can't use it */
+					if (subkey->sk_flags & SK_ISNULL)
+						break;	/* can't use null keys */
+					Assert(keysCount < INDEX_MAX_KEYS);
+					memcpy(inskey->scankeys + keysCount, subkey,
+						   sizeof(ScanKeyData));
+					keysCount++;
+					if (subkey->sk_flags & SK_ROW_END)
+					{
+						used_all_subkeys = true;
+						break;
+					}
+				}
+				if (!used_all_subkeys)
+				{
+					switch (*stratTotal)
+					{
+						case BTLessStrategyNumber:
+							*stratTotal = BTLessEqualStrategyNumber;
+							break;
+						case BTGreaterStrategyNumber:
+							*stratTotal = BTGreaterEqualStrategyNumber;
+							break;
+					}
+				}
+				break;			/* done with outer loop */
+			}
+		}
+		else
+		{
+			/*
+			 * Ordinary comparison key.  Transform the search-style scan key
+			 * to an insertion scan key by replacing the sk_func with the
+			 * appropriate btree comparison function.
+			 *
+			 * If scankey operator is not a cross-type comparison, we can use
+			 * the cached comparison function; otherwise gotta look it up in
+			 * the catalogs.  (That can't lead to infinite recursion, since no
+			 * indexscan initiated by syscache lookup will use cross-data-type
+			 * operators.)
+			 *
+			 * We support the convention that sk_subtype == InvalidOid means
+			 * the opclass input type; this is a hack to simplify life for
+			 * ScanKeyInit().
+			 */
+			if (cur->sk_subtype == rel->rd_opcintype[i] ||
+				cur->sk_subtype == InvalidOid)
+			{
+				FmgrInfo   *procinfo;
+
+				procinfo = index_getprocinfo(rel, cur->sk_attno, BTORDER_PROC);
+				ScanKeyEntryInitializeWithInfo(inskey->scankeys + i,
+											   cur->sk_flags,
+											   cur->sk_attno,
+											   cur->sk_strategy,
+											   cur->sk_subtype,
+											   cur->sk_collation,
+											   procinfo,
+											   cur->sk_argument);
+			}
+			else
+			{
+				RegProcedure cmp_proc;
+
+				cmp_proc = get_opfamily_proc(rel->rd_opfamily[i],
+											 rel->rd_opcintype[i],
+											 cur->sk_subtype,
+											 BTORDER_PROC);
+				if (!RegProcedureIsValid(cmp_proc))
+					elog(ERROR, "missing support function %d(%u,%u) for attribute %d of index \"%s\"",
+						 BTORDER_PROC, rel->rd_opcintype[i], cur->sk_subtype,
+						 cur->sk_attno, RelationGetRelationName(rel));
+				ScanKeyEntryInitialize(inskey->scankeys + i,
+									   cur->sk_flags,
+									   cur->sk_attno,
+									   cur->sk_strategy,
+									   cur->sk_subtype,
+									   cur->sk_collation,
+									   cmp_proc,
+									   cur->sk_argument);
+			}
+		}
+	}
+
+	_bt_set_bsearch_flags(*stratTotal, dir, &nextkey, goback);
+
+	/* Initialize remaining insertion scan key fields */
+	_bt_metaversion(rel, &inskey->heapkeyspace, &inskey->allequalimage);
+	inskey->anynullkeys = false; /* unused */
+	inskey->nextkey = nextkey;
+	inskey->pivotsearch = false;
+	inskey->scantid = NULL;
+	inskey->keysz = keysCount;
+
+	return true;
+}
+
+/*----------
+ * Examine the scan keys to discover where we need to start the scan.
+ *
+ * We want to identify the keys that can be used as starting boundaries;
+ * these are =, >, or >= keys for a forward scan or =, <, <= keys for
+ * a backwards scan.  We can use keys for multiple attributes so long as
+ * the prior attributes had only =, >= (resp. =, <=) keys.  Once we accept
+ * a > or < boundary or find an attribute with no boundary (which can be
+ * thought of as the same as "> -infinity"), we can't use keys for any
+ * attributes to its right, because it would break our simplistic notion
+ * of what initial positioning strategy to use.
+ *
+ * When the scan keys include cross-type operators, _bt_preprocess_keys
+ * may not be able to eliminate redundant keys; in such cases we will
+ * arbitrarily pick a usable one for each attribute.  This is correct
+ * but possibly not optimal behavior.  (For example, with keys like
+ * "x >= 4 AND x >= 5" we would elect to scan starting at x=4 when
+ * x=5 would be more efficient.)  Since the situation only arises given
+ * a poorly-worded query plus an incomplete opfamily, live with it.
+ *
+ * When both equality and inequality keys appear for a single attribute
+ * (again, only possible when cross-type operators appear), we *must*
+ * select one of the equality keys for the starting point, because
+ * _bt_checkkeys() will stop the scan as soon as an equality qual fails.
+ * For example, if we have keys like "x >= 4 AND x = 10" and we elect to
+ * start at x=4, we will fail and stop before reaching x=10.  If multiple
+ * equality quals survive preprocessing, however, it doesn't matter which
+ * one we use --- by definition, they are either redundant or
+ * contradictory.
+ *
+ * Any regular (not SK_SEARCHNULL) key implies a NOT NULL qualifier.
+ * If the index stores nulls at the end of the index we'll be starting
+ * from, and we have no boundary key for the column (which means the key
+ * we deduced NOT NULL from is an inequality key that constrains the other
+ * end of the index), then we cons up an explicit SK_SEARCHNOTNULL key to
+ * use as a boundary key.  If we didn't do this, we might find ourselves
+ * traversing a lot of null entries at the start of the scan.
+ *
+ * In this loop, row-comparison keys are treated the same as keys on their
+ * first (leftmost) columns.  We'll add on lower-order columns of the row
+ * comparison below, if possible.
+ *
+ * The selected scan keys (at most one per index column) are remembered by
+ * storing their addresses into the caller-supplied startKeys[] array.
+ *----------
+ */
+int _bt_choose_scan_keys(ScanKey scanKeys, int numberOfKeys, ScanDirection dir, ScanKey* startKeys, ScanKeyData* notnullkeys, StrategyNumber* stratTotal, int prefix)
+{
+	StrategyNumber strat;
+	int			keysCount = 0;
+	int			i;
+
+	*stratTotal = BTEqualStrategyNumber;
+	if (numberOfKeys > 0 || prefix > 0)
+	{
+		AttrNumber	curattr;
+		ScanKey		chosen;
+		ScanKey		impliesNN;
+		ScanKey		cur;
+
+		/*
+		 * chosen is the so-far-chosen key for the current attribute, if any.
+		 * We don't cast the decision in stone until we reach keys for the
+		 * next attribute.
+		 */
+		curattr = 1;
+		chosen = NULL;
+		/* Also remember any scankey that implies a NOT NULL constraint */
+		impliesNN = NULL;
+
+		/*
+		 * Loop iterates from 0 to numberOfKeys inclusive; we use the last
+		 * pass to handle after-last-key processing.  Actual exit from the
+		 * loop is at one of the "break" statements below.
+		 */
+		for (cur = scanKeys, i = 0;; cur++, i++)
+		{
+			if (i >= numberOfKeys || cur->sk_attno != curattr)
+			{
+				/*
+				 * Done looking at keys for curattr.  If we didn't find a
+				 * usable boundary key, see if we can deduce a NOT NULL key.
+				 */
+				if (chosen == NULL && impliesNN != NULL &&
+					((impliesNN->sk_flags & SK_BT_NULLS_FIRST) ?
+					 ScanDirectionIsForward(dir) :
+					 ScanDirectionIsBackward(dir)))
+				{
+					/* Yes, so build the key in notnullkeys[keysCount] */
+					chosen = &notnullkeys[keysCount];
+					ScanKeyEntryInitialize(chosen,
+										   (SK_SEARCHNOTNULL | SK_ISNULL |
+											(impliesNN->sk_flags &
+											 (SK_BT_DESC | SK_BT_NULLS_FIRST))),
+										   curattr,
+										   ((impliesNN->sk_flags & SK_BT_NULLS_FIRST) ?
+											BTGreaterStrategyNumber :
+											BTLessStrategyNumber),
+										   InvalidOid,
+										   InvalidOid,
+										   InvalidOid,
+										   (Datum) 0);
+				}
+
+				/*
+				 * If we found no usable boundary key and are past the skip prefix,
+				 * quit; else save the pointer (NULL within the prefix) in startKeys.
+				 */
+				if (chosen == NULL && curattr > prefix)
+					break;
+				startKeys[keysCount++] = chosen;
+
+				/*
+				 * Adjust *stratTotal, and quit if we have stored a > or <
+				 * key.  Keys inside the skip prefix don't affect it.
+				 */
+				if (chosen != NULL && curattr > prefix)
+				{
+					strat = chosen->sk_strategy;
+					if (strat != BTEqualStrategyNumber)
+					{
+						*stratTotal = strat;
+						if (strat == BTGreaterStrategyNumber ||
+							strat == BTLessStrategyNumber)
+							break;
+					}
+				}
+
+				/*
+				 * Done if that was the last attribute, or if next key is not
+				 * in sequence (implying no boundary key is available for the
+				 * next attribute).
+				 */
+				if (i >= numberOfKeys)
+				{
+					curattr++;
+					while (curattr <= prefix)
+					{
+						startKeys[keysCount++] = NULL;
+						curattr++;
+					}
+					break;
+				}
+				else if (cur->sk_attno != curattr + 1)
+				{
+					curattr++;
+					while (curattr < cur->sk_attno && curattr <= prefix)
+					{
+						startKeys[keysCount++] = NULL;
+						curattr++;
+					}
+					if (curattr > prefix && curattr != cur->sk_attno)
+						break;
+				}
+				else
+				{
+					curattr++;
+				}
+
+				/*
+				 * Reset for next attr.
+				 */
+				chosen = NULL;
+				impliesNN = NULL;
+			}
+
+			/*
+			 * Can we use this key as a starting boundary for this attr?
+			 *
+			 * If not, does it imply a NOT NULL constraint?  (Because
+			 * SK_SEARCHNULL keys are always assigned BTEqualStrategyNumber,
+			 * *any* inequality key works for that; we need not test.)
+			 */
+			switch (cur->sk_strategy)
+			{
+				case BTLessStrategyNumber:
+				case BTLessEqualStrategyNumber:
+					if (chosen == NULL)
+					{
+						if (ScanDirectionIsBackward(dir))
+							chosen = cur;
+						else
+							impliesNN = cur;
+					}
+					break;
+				case BTEqualStrategyNumber:
+					/* override any non-equality choice */
+					chosen = cur;
+					break;
+				case BTGreaterEqualStrategyNumber:
+				case BTGreaterStrategyNumber:
+					if (chosen == NULL)
+					{
+						if (ScanDirectionIsForward(dir))
+							chosen = cur;
+						else
+							impliesNN = cur;
+					}
+					break;
+			}
+		}
+	}
+	return keysCount;
+}
+
+void print_itup(BlockNumber blk, IndexTuple left, IndexTuple right, Relation rel, char *extra)
+{
+	bool		isnull[INDEX_MAX_KEYS];
+	Datum		values[INDEX_MAX_KEYS];
+	char	   *lkey_desc = NULL;
+	char	   *rkey_desc;
+
+	/* Avoid infinite recursion -- don't instrument catalog indexes */
+	if (!IsCatalogRelation(rel))
+	{
+		TupleDesc	itupdesc = RelationGetDescr(rel);
+		int			natts;
+		int			indnkeyatts = rel->rd_index->indnkeyatts;
+
+		natts = BTreeTupleGetNAtts(left, rel);
+		itupdesc->natts = Min(indnkeyatts, natts);
+		memset(&isnull, 0xFF, sizeof(isnull));
+		index_deform_tuple(left, itupdesc, values, isnull);
+		rel->rd_index->indnkeyatts = natts;
+
+		/*
+		 * Since the regression tests should pass when the instrumentation
+		 * patch is applied, be prepared for BuildIndexValueDescription() to
+		 * return NULL due to security considerations.
+		 */
+		lkey_desc = BuildIndexValueDescription(rel, values, isnull);
+		if (lkey_desc && right)
+		{
+			/*
+			 * Revolting hack: modify tuple descriptor to have number of key
+			 * columns actually present in caller's pivot tuples
+			 */
+			natts = BTreeTupleGetNAtts(right, rel);
+			itupdesc->natts = Min(indnkeyatts, natts);
+			memset(&isnull, 0xFF, sizeof(isnull));
+			index_deform_tuple(right, itupdesc, values, isnull);
+			rel->rd_index->indnkeyatts = natts;
+			rkey_desc = BuildIndexValueDescription(rel, values, isnull);
+			elog(DEBUG1, "%s blk %u sk > %s, sk <= %s %s",
+				 RelationGetRelationName(rel), blk, lkey_desc, rkey_desc,
+				 extra);
+			pfree(rkey_desc);
+		}
+		else
+			elog(DEBUG1, "%s blk %u sk check %s %s",
+				 RelationGetRelationName(rel), blk, lkey_desc, extra);
+
+		/* Cleanup */
+		itupdesc->natts = IndexRelationGetNumberOfAttributes(rel);
+		rel->rd_index->indnkeyatts = indnkeyatts;
+		if (lkey_desc)
+			pfree(lkey_desc);
+	}
+}
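Review note on _bt_choose_scan_keys above: the positioning rule in its header comment, plus the new prefix handling, can be sketched in isolation. This is a toy model, not the patch's code. It assumes a forward scan, at most one key per attribute, no NOT NULL deduction and no row comparisons; the names ToyStrat and toy_choose_start_keys_forward are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the btree strategy numbers. */
typedef enum ToyStrat
{
	TOY_NONE = 0, TOY_LT, TOY_LE, TOY_EQ, TOY_GE, TOY_GT
} ToyStrat;

/*
 * keys[i] is the single strategy available for attribute i+1 (TOY_NONE if
 * there is none).  Returns the number of leading attributes that get a
 * start-key slot; *total receives the overall positioning strategy.
 * Attributes inside the skip prefix always get a slot, even without a key,
 * and never change *total.
 */
static int
toy_choose_start_keys_forward(const ToyStrat *keys, int nkeys, int prefix,
							  ToyStrat *total)
{
	int			count = 0;

	assert(keys != NULL || nkeys == 0);
	*total = TOY_EQ;
	for (int attr = 1; attr <= nkeys || attr <= prefix; attr++)
	{
		ToyStrat	s = (attr <= nkeys) ? keys[attr - 1] : TOY_NONE;
		int			usable = (s == TOY_EQ || s == TOY_GE || s == TOY_GT);

		if (!usable && attr > prefix)
			break;				/* no boundary for this column: stop here */
		count++;				/* a real key, or a placeholder in the prefix */
		if (usable && attr > prefix && s != TOY_EQ)
		{
			*total = s;
			if (s == TOY_GT)
				break;			/* strict bound: columns to the right are unusable */
		}
	}
	return count;
}
```

With keys (=, >=, =) and no prefix, all three columns are used and the total strategy becomes >=. With (=, >, =) the scan stops after the second column. With a prefix of 1, a missing key on the first column no longer ends the selection.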
diff --git a/src/backend/access/spgist/spgutils.c b/src/backend/access/spgist/spgutils.c
index 4924ae1c59..b5db76cf48 100644
--- a/src/backend/access/spgist/spgutils.c
+++ b/src/backend/access/spgist/spgutils.c
@@ -68,6 +68,9 @@ spghandler(PG_FUNCTION_ARGS)
 	amroutine->ambulkdelete = spgbulkdelete;
 	amroutine->amvacuumcleanup = spgvacuumcleanup;
 	amroutine->amcanreturn = spgcanreturn;
+	amroutine->amskip = NULL;
+	amroutine->ambeginskipscan = NULL;
+	amroutine->amgetskiptuple = NULL;
 	amroutine->amcostestimate = spgcostestimate;
 	amroutine->amoptions = spgoptions;
 	amroutine->amproperty = spgproperty;
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 58141d8393..6a9e34b6d1 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -143,6 +143,7 @@ static void ExplainXMLTag(const char *tagname, int flags, ExplainState *es);
 static void ExplainIndentText(ExplainState *es);
 static void ExplainJSONLineEnding(ExplainState *es);
 static void ExplainYAMLLineStarting(ExplainState *es);
+static void ExplainIndexSkipScanKeys(int skipPrefixSize, ExplainState *es);
 static void escape_yaml(StringInfo buf, const char *str);
 
 
@@ -1054,6 +1055,22 @@ ExplainPreScanNode(PlanState *planstate, Bitmapset **rels_used)
 	return planstate_tree_walker(planstate, ExplainPreScanNode, rels_used);
 }
 
+/*
+ * ExplainIndexSkipScanKeys -
+ *	  Append information about index skip scan to es->str.
+ *
+ * Prints the distinct prefix size; only emitted in non-text formats.
+ */
+static void
+ExplainIndexSkipScanKeys(int skipPrefixSize, ExplainState *es)
+{
+	if (skipPrefixSize > 0)
+	{
+		if (es->format != EXPLAIN_FORMAT_TEXT)
+			ExplainPropertyInteger("Distinct Prefix", NULL, skipPrefixSize, es);
+	}
+}
+
 /*
  * ExplainNode -
  *	  Appends a description of a plan tree to es->str
@@ -1388,6 +1405,9 @@ ExplainNode(PlanState *planstate, List *ancestors,
 			{
 				IndexScan  *indexscan = (IndexScan *) plan;
 
+				if (indexscan->indexdistinct)
+					ExplainIndexSkipScanKeys(indexscan->indexskipprefixsize, es);
+
 				ExplainIndexScanDetails(indexscan->indexid,
 										indexscan->indexorderdir,
 										es);
@@ -1398,6 +1418,9 @@ ExplainNode(PlanState *planstate, List *ancestors,
 			{
 				IndexOnlyScan *indexonlyscan = (IndexOnlyScan *) plan;
 
+				if (indexonlyscan->indexdistinct)
+					ExplainIndexSkipScanKeys(indexonlyscan->indexskipprefixsize, es);
+
 				ExplainIndexScanDetails(indexonlyscan->indexid,
 										indexonlyscan->indexorderdir,
 										es);
@@ -1657,6 +1680,8 @@ ExplainNode(PlanState *planstate, List *ancestors,
 	switch (nodeTag(plan))
 	{
 		case T_IndexScan:
+			if (((IndexScan *) plan)->indexskipprefixsize > 0)
+				ExplainPropertyText("Skip scan", ((IndexScan *) plan)->indexdistinct ? "Distinct only" : "All", es);
 			show_scan_qual(((IndexScan *) plan)->indexqualorig,
 						   "Index Cond", planstate, ancestors, es);
 			if (((IndexScan *) plan)->indexqualorig)
@@ -1670,6 +1695,8 @@ ExplainNode(PlanState *planstate, List *ancestors,
 										   planstate, es);
 			break;
 		case T_IndexOnlyScan:
+			if (((IndexOnlyScan *) plan)->indexskipprefixsize > 0)
+				ExplainPropertyText("Skip scan", ((IndexOnlyScan *) plan)->indexdistinct ? "Distinct only" : "All", es);
 			show_scan_qual(((IndexOnlyScan *) plan)->indexqual,
 						   "Index Cond", planstate, ancestors, es);
 			if (((IndexOnlyScan *) plan)->indexqual)
@@ -1686,6 +1713,8 @@ ExplainNode(PlanState *planstate, List *ancestors,
 									 planstate->instrument->ntuples2, 0, es);
 			break;
 		case T_BitmapIndexScan:
+			if (((BitmapIndexScan *) plan)->indexskipprefixsize > 0)
+				ExplainPropertyText("Skip scan", "All", es);
 			show_scan_qual(((BitmapIndexScan *) plan)->indexqualorig,
 						   "Index Cond", planstate, ancestors, es);
 			break;
diff --git a/src/backend/executor/execScan.c b/src/backend/executor/execScan.c
index 642805d90c..0e77f241f9 100644
--- a/src/backend/executor/execScan.c
+++ b/src/backend/executor/execScan.c
@@ -133,6 +133,14 @@ ExecScanFetch(ScanState *node,
 	return (*accessMtd) (node);
 }
 
+TupleTableSlot *
+ExecScan(ScanState *node,
+		 ExecScanAccessMtd accessMtd,	/* function returning a tuple */
+		 ExecScanRecheckMtd recheckMtd)
+{
+	return ExecScanExtended(node, accessMtd, recheckMtd, NULL);
+}
+
 /* ----------------------------------------------------------------
  *		ExecScan
  *
@@ -155,9 +163,10 @@ ExecScanFetch(ScanState *node,
  * ----------------------------------------------------------------
  */
 TupleTableSlot *
-ExecScan(ScanState *node,
+ExecScanExtended(ScanState *node,
 		 ExecScanAccessMtd accessMtd,	/* function returning a tuple */
-		 ExecScanRecheckMtd recheckMtd)
+		 ExecScanRecheckMtd recheckMtd,
+		 ExecScanSkipMtd skipMtd)
 {
 	ExprContext *econtext;
 	ExprState  *qual;
@@ -170,6 +179,20 @@ ExecScan(ScanState *node,
 	projInfo = node->ps.ps_ProjInfo;
 	econtext = node->ps.ps_ExprContext;
 
+	if (skipMtd != NULL && node->ss_FirstTupleEmitted)
+	{
+		bool cont = skipMtd(node);
+		if (!cont)
+		{
+			node->ss_FirstTupleEmitted = false;
+			return ExecClearTuple(node->ss_ScanTupleSlot);
+		}
+	}
+	else
+	{
+		node->ss_FirstTupleEmitted = true;
+	}
+
 	/* interrupt checks are in ExecScanFetch */
 
 	/*
@@ -178,8 +201,13 @@ ExecScan(ScanState *node,
 	 */
 	if (!qual && !projInfo)
 	{
+		TupleTableSlot *slot;
+
 		ResetExprContext(econtext);
-		return ExecScanFetch(node, accessMtd, recheckMtd);
+		slot = ExecScanFetch(node, accessMtd, recheckMtd);
+		if (TupIsNull(slot))
+			node->ss_FirstTupleEmitted = false;
+		return slot;
 	}
 
 	/*
@@ -206,6 +234,7 @@ ExecScan(ScanState *node,
 		 */
 		if (TupIsNull(slot))
 		{
+			node->ss_FirstTupleEmitted = false;
 			if (projInfo)
 				return ExecClearTuple(projInfo->pi_state.resultslot);
 			else
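Review note on the ExecScanExtended change above: the ss_FirstTupleEmitted gating is easier to reason about as a small state machine. The sketch below is a toy model over a sorted int array, not executor code. ToyScan, toy_skip and toy_exec_scan are hypothetical names, and the skip callback simply steps over duplicates of the last returned value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct ToyScan
{
	const int  *vals;			/* sorted input values */
	int			n;				/* number of values */
	int			pos;			/* next position to read */
	bool		first_emitted;	/* models ss_FirstTupleEmitted */
	int			skip_calls;		/* how often the skip callback ran */
} ToyScan;

/* Skip callback: step past duplicates of the value returned last. */
static bool
toy_skip(ToyScan *s)
{
	int			last;

	assert(s->pos > 0);			/* only valid after a value was emitted */
	last = s->vals[s->pos - 1];
	s->skip_calls++;
	while (s->pos < s->n && s->vals[s->pos] == last)
		s->pos++;
	return s->pos < s->n;
}

/* Returns true and sets *out when a value is produced. */
static bool
toy_exec_scan(ToyScan *s, bool (*skip) (ToyScan *), int *out)
{
	if (skip != NULL && s->first_emitted)
	{
		if (!skip(s))
		{
			s->first_emitted = false;	/* reset so a rescan starts fresh */
			return false;
		}
	}
	else
		s->first_emitted = true;

	if (s->pos >= s->n)
	{
		s->first_emitted = false;
		return false;
	}
	*out = s->vals[s->pos++];
	return true;
}
```

In this model the callback never runs before the first value is returned, and the flag is cleared whenever the scan reports end of data, which mirrors the three places the patch resets ss_FirstTupleEmitted.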
diff --git a/src/backend/executor/nodeBitmapIndexscan.c b/src/backend/executor/nodeBitmapIndexscan.c
index 81a1208157..602c64fc91 100644
--- a/src/backend/executor/nodeBitmapIndexscan.c
+++ b/src/backend/executor/nodeBitmapIndexscan.c
@@ -22,13 +22,14 @@
 #include "postgres.h"
 
 #include "access/genam.h"
+#include "access/relscan.h"
 #include "executor/execdebug.h"
 #include "executor/nodeBitmapIndexscan.h"
 #include "executor/nodeIndexscan.h"
 #include "miscadmin.h"
+#include "utils/rel.h"
 #include "utils/memutils.h"
 
-
 /* ----------------------------------------------------------------
  *		ExecBitmapIndexScan
  *
@@ -308,10 +309,20 @@ ExecInitBitmapIndexScan(BitmapIndexScan *node, EState *estate, int eflags)
 	/*
 	 * Initialize scan descriptor.
 	 */
-	indexstate->biss_ScanDesc =
-		index_beginscan_bitmap(indexstate->biss_RelationDesc,
-							   estate->es_snapshot,
-							   indexstate->biss_NumScanKeys);
+	if (node->indexskipprefixsize > 0)
+	{
+		indexstate->biss_ScanDesc =
+			index_beginscan_bitmap_skip(indexstate->biss_RelationDesc,
+				estate->es_snapshot,
+				indexstate->biss_NumScanKeys,
+				Min(IndexRelationGetNumberOfKeyAttributes(indexstate->biss_RelationDesc),
+					node->indexskipprefixsize));
+	}
+	else
+		indexstate->biss_ScanDesc =
+			index_beginscan_bitmap(indexstate->biss_RelationDesc,
+								   estate->es_snapshot,
+								   indexstate->biss_NumScanKeys);
 
 	/*
 	 * If no run-time keys to calculate, go ahead and pass the scankeys to the
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 5617ac29e7..aadba4a0fe 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -41,6 +41,7 @@
 #include "miscadmin.h"
 #include "storage/bufmgr.h"
 #include "storage/predicate.h"
+#include "storage/itemptr.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
 
@@ -49,6 +50,37 @@ static TupleTableSlot *IndexOnlyNext(IndexOnlyScanState *node);
 static void StoreIndexTuple(TupleTableSlot *slot, IndexTuple itup,
 							TupleDesc itupdesc);
 
+static bool
+IndexOnlySkip(IndexOnlyScanState *node)
+{
+	EState	   *estate;
+	ScanDirection direction;
+	IndexScanDesc scandesc;
+	IndexOnlyScan *indexonlyscan = (IndexOnlyScan *) node->ss.ps.plan;
+
+	if (!node->ioss_Distinct)
+		return true;
+
+	/*
+	 * extract necessary information from index scan node
+	 */
+	estate = node->ss.ps.state;
+	direction = estate->es_direction;
+	/* flip direction if this is an overall backward scan */
+	if (ScanDirectionIsBackward(indexonlyscan->indexorderdir))
+	{
+		if (ScanDirectionIsForward(direction))
+			direction = BackwardScanDirection;
+		else if (ScanDirectionIsBackward(direction))
+			direction = ForwardScanDirection;
+	}
+	scandesc = node->ioss_ScanDesc;
+
+	if (!index_skip(scandesc, direction, indexonlyscan->indexorderdir))
+		return false;
+
+	return true;
+}
 
 /* ----------------------------------------------------------------
  *		IndexOnlyNext
@@ -65,6 +97,8 @@ IndexOnlyNext(IndexOnlyScanState *node)
 	IndexScanDesc scandesc;
 	TupleTableSlot *slot;
 	ItemPointer tid;
+	ItemPointerData startTid;
+	IndexOnlyScan *indexonlyscan = (IndexOnlyScan *) node->ss.ps.plan;
 
 	/*
 	 * extract necessary information from index scan node
@@ -72,7 +106,7 @@ IndexOnlyNext(IndexOnlyScanState *node)
 	estate = node->ss.ps.state;
 	direction = estate->es_direction;
 	/* flip direction if this is an overall backward scan */
-	if (ScanDirectionIsBackward(((IndexOnlyScan *) node->ss.ps.plan)->indexorderdir))
+	if (ScanDirectionIsBackward(indexonlyscan->indexorderdir))
 	{
 		if (ScanDirectionIsForward(direction))
 			direction = BackwardScanDirection;
@@ -90,11 +124,19 @@ IndexOnlyNext(IndexOnlyScanState *node)
 		 * serially executing an index only scan that was planned to be
 		 * parallel.
 		 */
-		scandesc = index_beginscan(node->ss.ss_currentRelation,
-								   node->ioss_RelationDesc,
-								   estate->es_snapshot,
-								   node->ioss_NumScanKeys,
-								   node->ioss_NumOrderByKeys);
+		if (node->ioss_SkipPrefixSize > 0)
+			scandesc = index_beginscan_skip(node->ss.ss_currentRelation,
+									   node->ioss_RelationDesc,
+									   estate->es_snapshot,
+									   node->ioss_NumScanKeys,
+									   node->ioss_NumOrderByKeys,
+									   Min(IndexRelationGetNumberOfKeyAttributes(node->ioss_RelationDesc), node->ioss_SkipPrefixSize));
+		else
+			scandesc = index_beginscan(node->ss.ss_currentRelation,
+									   node->ioss_RelationDesc,
+									   estate->es_snapshot,
+									   node->ioss_NumScanKeys,
+									   node->ioss_NumOrderByKeys);
 
 		node->ioss_ScanDesc = scandesc;
 
@@ -114,11 +156,16 @@ IndexOnlyNext(IndexOnlyScanState *node)
 						 node->ioss_OrderByKeys,
 						 node->ioss_NumOrderByKeys);
 	}
+	else
+	{
+		ItemPointerCopy(&scandesc->xs_heaptid, &startTid);
+	}
 
 	/*
 	 * OK, now that we have what we need, fetch the next tuple.
 	 */
-	while ((tid = index_getnext_tid(scandesc, direction)) != NULL)
+	while ((tid = node->ioss_SkipPrefixSize > 0 ? index_getnext_tid_skip(scandesc, direction, node->ioss_Distinct ? indexonlyscan->indexorderdir : direction) :
+			index_getnext_tid(scandesc, direction)) != NULL)
 	{
 		bool		tuple_from_heap = false;
 
@@ -314,9 +361,10 @@ ExecIndexOnlyScan(PlanState *pstate)
 	if (node->ioss_NumRuntimeKeys != 0 && !node->ioss_RuntimeKeysReady)
 		ExecReScan((PlanState *) node);
 
-	return ExecScan(&node->ss,
+	return ExecScanExtended(&node->ss,
 					(ExecScanAccessMtd) IndexOnlyNext,
-					(ExecScanRecheckMtd) IndexOnlyRecheck);
+					(ExecScanRecheckMtd) IndexOnlyRecheck,
+					node->ioss_Distinct ? (ExecScanSkipMtd) IndexOnlySkip : NULL);
 }
 
 /* ----------------------------------------------------------------
@@ -503,7 +551,10 @@ ExecInitIndexOnlyScan(IndexOnlyScan *node, EState *estate, int eflags)
 	indexstate = makeNode(IndexOnlyScanState);
 	indexstate->ss.ps.plan = (Plan *) node;
 	indexstate->ss.ps.state = estate;
+	indexstate->ss.ss_FirstTupleEmitted = false;
 	indexstate->ss.ps.ExecProcNode = ExecIndexOnlyScan;
+	indexstate->ioss_SkipPrefixSize = node->indexskipprefixsize;
+	indexstate->ioss_Distinct = node->indexdistinct;
 
 	/*
 	 * Miscellaneous initialization
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index d0a96a38e0..db3b5a3379 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -69,6 +69,37 @@ static void reorderqueue_push(IndexScanState *node, TupleTableSlot *slot,
 							  Datum *orderbyvals, bool *orderbynulls);
 static HeapTuple reorderqueue_pop(IndexScanState *node);
 
+static bool
+IndexSkip(IndexScanState *node)
+{
+	EState	   *estate;
+	ScanDirection direction;
+	IndexScanDesc scandesc;
+	IndexScan *indexscan = (IndexScan *) node->ss.ps.plan;
+
+	if (!node->iss_Distinct)
+		return true;
+
+	/*
+	 * extract necessary information from index scan node
+	 */
+	estate = node->ss.ps.state;
+	direction = estate->es_direction;
+	/* flip direction if this is an overall backward scan */
+	if (ScanDirectionIsBackward(indexscan->indexorderdir))
+	{
+		if (ScanDirectionIsForward(direction))
+			direction = BackwardScanDirection;
+		else if (ScanDirectionIsBackward(direction))
+			direction = ForwardScanDirection;
+	}
+	scandesc = node->iss_ScanDesc;
+
+	if (!index_skip(scandesc, direction, indexscan->indexorderdir))
+		return false;
+
+	return true;
+}
 
 /* ----------------------------------------------------------------
  *		IndexNext
@@ -85,6 +116,7 @@ IndexNext(IndexScanState *node)
 	ScanDirection direction;
 	IndexScanDesc scandesc;
 	TupleTableSlot *slot;
+	IndexScan *indexscan = (IndexScan *) node->ss.ps.plan;
 
 	/*
 	 * extract necessary information from index scan node
@@ -92,7 +124,7 @@ IndexNext(IndexScanState *node)
 	estate = node->ss.ps.state;
 	direction = estate->es_direction;
 	/* flip direction if this is an overall backward scan */
-	if (ScanDirectionIsBackward(((IndexScan *) node->ss.ps.plan)->indexorderdir))
+	if (ScanDirectionIsBackward(indexscan->indexorderdir))
 	{
 		if (ScanDirectionIsForward(direction))
 			direction = BackwardScanDirection;
@@ -109,14 +141,25 @@ IndexNext(IndexScanState *node)
 		 * We reach here if the index scan is not parallel, or if we're
 		 * serially executing an index scan that was planned to be parallel.
 		 */
-		scandesc = index_beginscan(node->ss.ss_currentRelation,
-								   node->iss_RelationDesc,
-								   estate->es_snapshot,
-								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys);
+		if (node->iss_SkipPrefixSize > 0)
+			scandesc = index_beginscan_skip(node->ss.ss_currentRelation,
+									   node->iss_RelationDesc,
+									   estate->es_snapshot,
+									   node->iss_NumScanKeys,
+									   node->iss_NumOrderByKeys,
+									   Min(IndexRelationGetNumberOfKeyAttributes(node->iss_RelationDesc), node->iss_SkipPrefixSize));
+		else
+			scandesc = index_beginscan(node->ss.ss_currentRelation,
+									   node->iss_RelationDesc,
+									   estate->es_snapshot,
+									   node->iss_NumScanKeys,
+									   node->iss_NumOrderByKeys);
 
 		node->iss_ScanDesc = scandesc;
 
+		/* Distinct skip scan relies on xs_want_itup, so request index tuples only then */
+		node->iss_ScanDesc->xs_want_itup = indexscan->indexdistinct;
+
 		/*
 		 * If no run-time keys to calculate or they are ready, go ahead and
 		 * pass the scankeys to the index AM.
@@ -130,7 +173,9 @@ IndexNext(IndexScanState *node)
 	/*
 	 * ok, now that we have what we need, fetch the next tuple.
 	 */
-	while (index_getnext_slot(scandesc, direction, slot))
+	while (node->iss_SkipPrefixSize > 0 ?
+		   index_getnext_slot_skip(scandesc, direction, node->iss_Distinct ? indexscan->indexorderdir : direction, slot) :
+		   index_getnext_slot(scandesc, direction, slot))
 	{
 		CHECK_FOR_INTERRUPTS();
 
@@ -530,13 +575,15 @@ ExecIndexScan(PlanState *pstate)
 		ExecReScan((PlanState *) node);
 
 	if (node->iss_NumOrderByKeys > 0)
-		return ExecScan(&node->ss,
+		return ExecScanExtended(&node->ss,
 						(ExecScanAccessMtd) IndexNextWithReorder,
-						(ExecScanRecheckMtd) IndexRecheck);
+						(ExecScanRecheckMtd) IndexRecheck,
+						node->iss_Distinct ? (ExecScanSkipMtd) IndexSkip : NULL);
 	else
-		return ExecScan(&node->ss,
+		return ExecScanExtended(&node->ss,
 						(ExecScanAccessMtd) IndexNext,
-						(ExecScanRecheckMtd) IndexRecheck);
+						(ExecScanRecheckMtd) IndexRecheck,
+						node->iss_Distinct ? (ExecScanSkipMtd) IndexSkip : NULL);
 }
 
 /* ----------------------------------------------------------------
@@ -910,6 +957,8 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
 	indexstate->ss.ps.plan = (Plan *) node;
 	indexstate->ss.ps.state = estate;
 	indexstate->ss.ps.ExecProcNode = ExecIndexScan;
+	indexstate->iss_SkipPrefixSize = node->indexskipprefixsize;
+	indexstate->iss_Distinct = node->indexdistinct;
 
 	/*
 	 * Miscellaneous initialization
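Review note: the "flip direction if this is an overall backward scan" block now appears in IndexNext, IndexSkip, IndexOnlyNext and IndexOnlySkip. A shared helper might be worth considering. The sketch below only illustrates the logic with a stand-in enum; ToyDir and toy_effective_dir are hypothetical names, not something the patch defines.

```c
#include <assert.h>

/* Stand-in for ScanDirection; the values mirror backward/none/forward. */
typedef enum ToyDir
{
	TOY_BACKWARD = -1,
	TOY_NOMOVEMENT = 0,
	TOY_FORWARD = 1
} ToyDir;

/*
 * Combine the executor's direction with the direction the index scan was
 * planned in.  A scan planned backward inverts forward and backward
 * requests; "no movement" is passed through unchanged.
 */
static ToyDir
toy_effective_dir(ToyDir exec_dir, ToyDir index_order_dir)
{
	assert(exec_dir >= TOY_BACKWARD && exec_dir <= TOY_FORWARD);
	if (index_order_dir == TOY_BACKWARD)
	{
		if (exec_dir == TOY_FORWARD)
			return TOY_BACKWARD;
		else if (exec_dir == TOY_BACKWARD)
			return TOY_FORWARD;
	}
	return exec_dir;
}
```

Each of the four call sites could then compute its direction with a single call, which would keep the skip and non-skip paths from drifting apart.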
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 1a70625dc8..21af804b4f 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -493,6 +493,8 @@ _copyIndexScan(const IndexScan *from)
 	COPY_NODE_FIELD(indexorderbyorig);
 	COPY_NODE_FIELD(indexorderbyops);
 	COPY_SCALAR_FIELD(indexorderdir);
+	COPY_SCALAR_FIELD(indexskipprefixsize);
+	COPY_SCALAR_FIELD(indexdistinct);
 
 	return newnode;
 }
@@ -518,6 +520,8 @@ _copyIndexOnlyScan(const IndexOnlyScan *from)
 	COPY_NODE_FIELD(indexorderby);
 	COPY_NODE_FIELD(indextlist);
 	COPY_SCALAR_FIELD(indexorderdir);
+	COPY_SCALAR_FIELD(indexskipprefixsize);
+	COPY_SCALAR_FIELD(indexdistinct);
 
 	return newnode;
 }
@@ -542,6 +546,7 @@ _copyBitmapIndexScan(const BitmapIndexScan *from)
 	COPY_SCALAR_FIELD(isshared);
 	COPY_NODE_FIELD(indexqual);
 	COPY_NODE_FIELD(indexqualorig);
+	COPY_SCALAR_FIELD(indexskipprefixsize);
 
 	return newnode;
 }
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 82fcabd9ee..50297fb9dc 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -562,6 +562,8 @@ _outIndexScan(StringInfo str, const IndexScan *node)
 	WRITE_NODE_FIELD(indexorderbyorig);
 	WRITE_NODE_FIELD(indexorderbyops);
 	WRITE_ENUM_FIELD(indexorderdir, ScanDirection);
+	WRITE_INT_FIELD(indexskipprefixsize);
+	WRITE_BOOL_FIELD(indexdistinct);
 }
 
 static void
@@ -576,6 +578,9 @@ _outIndexOnlyScan(StringInfo str, const IndexOnlyScan *node)
 	WRITE_NODE_FIELD(indexorderby);
 	WRITE_NODE_FIELD(indextlist);
 	WRITE_ENUM_FIELD(indexorderdir, ScanDirection);
+	WRITE_INT_FIELD(indexskipprefixsize);
+	WRITE_BOOL_FIELD(indexdistinct);
+
 }
 
 static void
@@ -589,6 +594,7 @@ _outBitmapIndexScan(StringInfo str, const BitmapIndexScan *node)
 	WRITE_BOOL_FIELD(isshared);
 	WRITE_NODE_FIELD(indexqual);
 	WRITE_NODE_FIELD(indexqualorig);
+	WRITE_INT_FIELD(indexskipprefixsize);
 }
 
 static void
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index d5b23a3479..3129beb9d7 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1820,6 +1820,8 @@ _readIndexScan(void)
 	READ_NODE_FIELD(indexorderbyorig);
 	READ_NODE_FIELD(indexorderbyops);
 	READ_ENUM_FIELD(indexorderdir, ScanDirection);
+	READ_INT_FIELD(indexskipprefixsize);
+	READ_BOOL_FIELD(indexdistinct);
 
 	READ_DONE();
 }
@@ -1839,6 +1841,8 @@ _readIndexOnlyScan(void)
 	READ_NODE_FIELD(indexorderby);
 	READ_NODE_FIELD(indextlist);
 	READ_ENUM_FIELD(indexorderdir, ScanDirection);
+	READ_INT_FIELD(indexskipprefixsize);
+	READ_BOOL_FIELD(indexdistinct);
 
 	READ_DONE();
 }
@@ -1857,6 +1861,7 @@ _readBitmapIndexScan(void)
 	READ_BOOL_FIELD(isshared);
 	READ_NODE_FIELD(indexqual);
 	READ_NODE_FIELD(indexqualorig);
+	READ_INT_FIELD(indexskipprefixsize);
 
 	READ_DONE();
 }
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 8cf694b61d..9126296bd6 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -125,6 +125,7 @@ int			max_parallel_workers_per_gather = 2;
 bool		enable_seqscan = true;
 bool		enable_indexscan = true;
 bool		enable_indexonlyscan = true;
+bool		enable_indexskipscan = true;
 bool		enable_bitmapscan = true;
 bool		enable_tidscan = true;
 bool		enable_sort = true;
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index fc25908dc6..948414bd80 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -175,15 +175,20 @@ static IndexScan *make_indexscan(List *qptlist, List *qpqual, Index scanrelid,
 								 Oid indexid, List *indexqual, List *indexqualorig,
 								 List *indexorderby, List *indexorderbyorig,
 								 List *indexorderbyops,
-								 ScanDirection indexscandir);
+								 ScanDirection indexscandir,
+								 int skipprefix,
+								 bool distinct);
 static IndexOnlyScan *make_indexonlyscan(List *qptlist, List *qpqual,
 										 Index scanrelid, Oid indexid,
 										 List *indexqual, List *indexorderby,
 										 List *indextlist,
-										 ScanDirection indexscandir);
+										 ScanDirection indexscandir,
+										 int skipprefix,
+										 bool distinct);
 static BitmapIndexScan *make_bitmap_indexscan(Index scanrelid, Oid indexid,
 											  List *indexqual,
-											  List *indexqualorig);
+											  List *indexqualorig,
+											  int skipPrefixSize);
 static BitmapHeapScan *make_bitmap_heapscan(List *qptlist,
 											List *qpqual,
 											Plan *lefttree,
@@ -2914,7 +2919,9 @@ create_indexscan_plan(PlannerInfo *root,
 												fixed_indexquals,
 												fixed_indexorderbys,
 												best_path->indexinfo->indextlist,
-												best_path->indexscandir);
+												best_path->indexscandir,
+												best_path->indexskipprefix,
+												best_path->indexdistinct);
 	else
 		scan_plan = (Scan *) make_indexscan(tlist,
 											qpqual,
@@ -2925,7 +2932,9 @@ create_indexscan_plan(PlannerInfo *root,
 											fixed_indexorderbys,
 											indexorderbys,
 											indexorderbyops,
-											best_path->indexscandir);
+											best_path->indexscandir,
+											best_path->indexskipprefix,
+											best_path->indexdistinct);
 
 	copy_generic_path_info(&scan_plan->plan, &best_path->path);
 
@@ -3215,7 +3224,8 @@ create_bitmap_subplan(PlannerInfo *root, Path *bitmapqual,
 		plan = (Plan *) make_bitmap_indexscan(iscan->scan.scanrelid,
 											  iscan->indexid,
 											  iscan->indexqual,
-											  iscan->indexqualorig);
+											  iscan->indexqualorig,
+											  iscan->indexskipprefixsize);
 		/* and set its cost/width fields appropriately */
 		plan->startup_cost = 0.0;
 		plan->total_cost = ipath->indextotalcost;
@@ -5186,7 +5196,9 @@ make_indexscan(List *qptlist,
 			   List *indexorderby,
 			   List *indexorderbyorig,
 			   List *indexorderbyops,
-			   ScanDirection indexscandir)
+			   ScanDirection indexscandir,
+			   int skipPrefixSize,
+			   bool distinct)
 {
 	IndexScan  *node = makeNode(IndexScan);
 	Plan	   *plan = &node->scan.plan;
@@ -5203,6 +5215,8 @@ make_indexscan(List *qptlist,
 	node->indexorderbyorig = indexorderbyorig;
 	node->indexorderbyops = indexorderbyops;
 	node->indexorderdir = indexscandir;
+	node->indexskipprefixsize = skipPrefixSize;
+	node->indexdistinct = distinct;
 
 	return node;
 }
@@ -5215,7 +5229,9 @@ make_indexonlyscan(List *qptlist,
 				   List *indexqual,
 				   List *indexorderby,
 				   List *indextlist,
-				   ScanDirection indexscandir)
+				   ScanDirection indexscandir,
+				   int skipPrefixSize,
+				   bool distinct)
 {
 	IndexOnlyScan *node = makeNode(IndexOnlyScan);
 	Plan	   *plan = &node->scan.plan;
@@ -5230,6 +5246,8 @@ make_indexonlyscan(List *qptlist,
 	node->indexorderby = indexorderby;
 	node->indextlist = indextlist;
 	node->indexorderdir = indexscandir;
+	node->indexskipprefixsize = skipPrefixSize;
+	node->indexdistinct = distinct;
 
 	return node;
 }
@@ -5238,7 +5256,8 @@ static BitmapIndexScan *
 make_bitmap_indexscan(Index scanrelid,
 					  Oid indexid,
 					  List *indexqual,
-					  List *indexqualorig)
+					  List *indexqualorig,
+					  int skipPrefixSize)
 {
 	BitmapIndexScan *node = makeNode(BitmapIndexScan);
 	Plan	   *plan = &node->scan.plan;
@@ -5251,6 +5270,7 @@ make_bitmap_indexscan(Index scanrelid,
 	node->indexid = indexid;
 	node->indexqual = indexqual;
 	node->indexqualorig = indexqualorig;
+	node->indexskipprefixsize = skipPrefixSize;
 
 	return node;
 }
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 6a7b55abd2..62b5e5e071 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -4832,6 +4832,70 @@ create_distinct_paths(PlannerInfo *root,
 												  path,
 												  list_length(root->distinct_pathkeys),
 												  numDistinctRows));
+
+				/* Consider index skip scan as well */
+				if (enable_indexskipscan &&
+					IsA(path, IndexPath) &&
+					((IndexPath *) path)->indexinfo->amcanskip &&
+					root->distinct_pathkeys != NIL)
+				{
+					ListCell   		*lc;
+					IndexOptInfo 	*index = NULL;
+					bool 			different_columns_order = false,
+									multiple_froms = false;
+					int 			i = 0;
+					int 			distinctPrefixKeys;
+
+					Assert(path->pathtype == T_IndexOnlyScan ||
+						   path->pathtype == T_IndexScan);
+
+					index = ((IndexPath *) path)->indexinfo;
+					distinctPrefixKeys = list_length(root->query_uniquekeys);
+
+					/*
+					 * Normally distinctPrefixKeys is just the number of
+					 * distinct keys.  But suppose the distinct key is a and
+					 * the index contains (b, a), in exactly that order.  In
+					 * that case we need to use the position of a in the
+					 * index as distinctPrefixKeys; otherwise the skip would
+					 * happen on the first column only.
+					 */
+					foreach(lc, root->query_uniquekeys)
+					{
+						UniqueKey *uniquekey = (UniqueKey *) lfirst(lc);
+						EquivalenceMember *em =
+							lfirst_node(EquivalenceMember,
+										list_head(uniquekey->eq_clause->ec_members));
+						Var *var = (Var *) em->em_expr;
+
+						Assert(i < index->ncolumns);
+
+						for (i = 0; i < index->ncolumns; i++)
+						{
+							if (index->indexkeys[i] == var->varattno)
+							{
+								distinctPrefixKeys = Max(i + 1, distinctPrefixKeys);
+								break;
+							}
+						}
+					}
+
+					/* we can only do this if scanning from one relation */
+					if (path->pathtype == T_IndexScan &&
+						parse->jointree != NULL &&
+						list_length(parse->jointree->fromlist) > 1)
+							multiple_froms = true;
+
+					if (!different_columns_order && !multiple_froms)
+					{
+						add_path(distinct_rel, (Path *)
+								 create_skipscan_unique_path(root,
+															 distinct_rel,
+															 path,
+															 distinctPrefixKeys,
+															 numDistinctRows));
+					}
+				}
 			}
 		}
 
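Reviewer's sketch, not part of the patch: the comment in create_distinct_paths describes how distinctPrefixKeys is widened when the distinct column is not the leading index column. The standalone function below mirrors that loop under simplifying assumptions. skip_prefix_size and its array arguments are hypothetical stand-ins for root->query_uniquekeys and index->indexkeys, and columns are identified by plain attribute numbers.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the planner loop.  index_keys[] holds the
 * attribute number of each index column, in index order; distinct_attnos[]
 * holds the attribute numbers of the DISTINCT keys.
 */
static int
skip_prefix_size(const int *index_keys, int nindexkeys,
				 const int *distinct_attnos, int ndistinct)
{
	int			prefix = ndistinct; /* start from the number of distinct keys */

	assert(index_keys != NULL && distinct_attnos != NULL);

	for (int d = 0; d < ndistinct; d++)
	{
		for (int i = 0; i < nindexkeys; i++)
		{
			if (index_keys[i] == distinct_attnos[d])
			{
				/* widen the prefix so it covers this column's position */
				if (i + 1 > prefix)
					prefix = i + 1;
				break;
			}
		}
	}

	return prefix;
}
```

With a as attribute 1 and b as attribute 2, an index on (b, a) and a single distinct key a give a prefix of 2, whereas an index on (a, b) gives a prefix of 1.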
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 278436f102..87d39570b5 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -2916,6 +2916,46 @@ create_upper_unique_path(PlannerInfo *root,
 	return pathnode;
 }
 
+/*
+ * create_skipscan_unique_path
+ *	  Creates a pathnode the same as an existing IndexPath except based on
+ *	  skipping duplicate values.  This may or may not be cheaper than using
+ *	  create_upper_unique_path.
+ *
+ * The input path must be an IndexPath for an index that supports amskip.
+ */
+IndexPath *
+create_skipscan_unique_path(PlannerInfo *root,
+							RelOptInfo *rel,
+							Path *basepath,
+							int distinctPrefixKeys,
+							double numGroups)
+{
+	IndexPath *pathnode = makeNode(IndexPath);
+
+	Assert(IsA(basepath, IndexPath));
+
+	/* We don't want to modify basepath, so make a copy. */
+	memcpy(pathnode, basepath, sizeof(IndexPath));
+
+	/* The size of the prefix we'll use for skipping. */
+	Assert(pathnode->indexinfo->amcanskip);
+	Assert(distinctPrefixKeys > 0);
+	pathnode->indexskipprefix = distinctPrefixKeys;
+	pathnode->indexdistinct = true;
+
+	/*
+	 * The cost to skip to each distinct value should be roughly the same as
+	 * the cost of finding the first key times the number of distinct values
+	 * we expect to find.
+	 */
+	pathnode->path.startup_cost = basepath->startup_cost;
+	pathnode->path.total_cost = basepath->startup_cost * numGroups;
+	pathnode->path.rows = numGroups;
+
+	return pathnode;
+}
+
 /*
  * create_agg_path
  *	  Creates a pathnode that represents performing aggregation/grouping
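Reviewer's sketch, not part of the patch: the cost estimate in create_skipscan_unique_path comes down to the three assignments at the end of that function. The fragment below restates them in isolation. SkipCostSketch and skipscan_cost_sketch are hypothetical names used only for illustration.

```c
#include <assert.h>

/* Hypothetical container for the three path fields the patch overwrites. */
typedef struct SkipCostSketch
{
	double		startup_cost;
	double		total_cost;
	double		rows;
} SkipCostSketch;

/*
 * Mirror of the patch's estimate: one index descent (approximated by the
 * base path's startup cost) per expected distinct group.
 */
static SkipCostSketch
skipscan_cost_sketch(double base_startup_cost, double num_groups)
{
	SkipCostSketch cost;

	assert(num_groups >= 0.0);

	cost.startup_cost = base_startup_cost;
	cost.total_cost = base_startup_cost * num_groups;
	cost.rows = num_groups;

	return cost;
}
```

Note that this estimate ignores the base path's total cost: a base path with startup cost 0.5 and 4 expected groups is costed at 2.0, regardless of how many tuples the underlying scan would otherwise return.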
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index d82fc5ab8b..364e23cbfb 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -271,6 +271,9 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 			info->amoptionalkey = amroutine->amoptionalkey;
 			info->amsearcharray = amroutine->amsearcharray;
 			info->amsearchnulls = amroutine->amsearchnulls;
+			info->amcanskip = (amroutine->amskip != NULL &&
+					amroutine->amgetskiptuple != NULL &&
+					amroutine->ambeginskipscan != NULL);
 			info->amcanparallel = amroutine->amcanparallel;
 			info->amhasgettuple = (amroutine->amgettuple != NULL);
 			info->amhasgetbitmap = amroutine->amgetbitmap != NULL &&
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 7d1f1069f1..1b401f837a 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -960,6 +960,15 @@ static struct config_bool ConfigureNamesBool[] =
 		true,
 		NULL, NULL, NULL
 	},
+	{
+		{"enable_indexskipscan", PGC_USERSET, QUERY_TUNING_METHOD,
+			gettext_noop("Enables the planner's use of index-skip-scan plans."),
+			NULL
+		},
+		&enable_indexskipscan,
+		true,
+		NULL, NULL, NULL
+	},
 	{
 		{"enable_bitmapscan", PGC_USERSET, QUERY_TUNING_METHOD,
 			gettext_noop("Enables the planner's use of bitmap-scan plans."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index c7e46592fb..7f7929ecd8 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -354,6 +354,7 @@
 #enable_hashjoin = on
 #enable_indexscan = on
 #enable_indexonlyscan = on
+#enable_indexskipscan = on
 #enable_material = on
 #enable_mergejoin = on
 #enable_nestloop = on
diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c
index d02e676aa3..adca0c69d2 100644
--- a/src/backend/utils/sort/tuplesort.c
+++ b/src/backend/utils/sort/tuplesort.c
@@ -919,7 +919,7 @@ tuplesort_begin_cluster(TupleDesc tupDesc,
 
 	state->tupDesc = tupDesc;	/* assume we need not copy tupDesc */
 
-	indexScanKey = _bt_mkscankey(indexRel, NULL);
+	indexScanKey = _bt_mkscankey(indexRel, NULL, NULL);
 
 	if (state->indexInfo->ii_Expressions != NULL)
 	{
@@ -1014,7 +1014,7 @@ tuplesort_begin_index_btree(Relation heapRel,
 	state->indexRel = indexRel;
 	state->enforceUnique = enforceUnique;
 
-	indexScanKey = _bt_mkscankey(indexRel, NULL);
+	indexScanKey = _bt_mkscankey(indexRel, NULL, NULL);
 
 	/* Prepare SortSupport data for each column */
 	state->sortKeys = (SortSupport) palloc0(state->nKeys *
diff --git a/src/include/access/amapi.h b/src/include/access/amapi.h
index 3b3e22f73d..c6b352d61f 100644
--- a/src/include/access/amapi.h
+++ b/src/include/access/amapi.h
@@ -119,6 +119,12 @@ typedef IndexScanDesc (*ambeginscan_function) (Relation indexRelation,
 											   int nkeys,
 											   int norderbys);
 
+/* prepare for index scan with skip */
+typedef IndexScanDesc (*ambeginscan_skip_function) (Relation indexRelation,
+											   int nkeys,
+											   int norderbys,
+											   int prefix);
+
 /* (re)start index scan */
 typedef void (*amrescan_function) (IndexScanDesc scan,
 								   ScanKey keys,
@@ -130,6 +136,16 @@ typedef void (*amrescan_function) (IndexScanDesc scan,
 typedef bool (*amgettuple_function) (IndexScanDesc scan,
 									 ScanDirection direction);
 
+/* next valid tuple */
+typedef bool (*amgettuple_with_skip_function) (IndexScanDesc scan,
+											   ScanDirection prefixDir,
+											   ScanDirection postfixDir);
+
+/* skip past duplicates */
+typedef bool (*amskip_function) (IndexScanDesc scan,
+								 ScanDirection prefixDir,
+								 ScanDirection postfixDir);
+
 /* fetch all valid tuples */
 typedef int64 (*amgetbitmap_function) (IndexScanDesc scan,
 									   TIDBitmap *tbm);
@@ -223,12 +239,15 @@ typedef struct IndexAmRoutine
 	ambuildphasename_function ambuildphasename; /* can be NULL */
 	amvalidate_function amvalidate;
 	ambeginscan_function ambeginscan;
+	ambeginscan_skip_function ambeginskipscan;
 	amrescan_function amrescan;
 	amgettuple_function amgettuple; /* can be NULL */
+	amgettuple_with_skip_function amgetskiptuple; /* can be NULL */
 	amgetbitmap_function amgetbitmap;	/* can be NULL */
 	amendscan_function amendscan;
 	ammarkpos_function ammarkpos;	/* can be NULL */
 	amrestrpos_function amrestrpos; /* can be NULL */
+	amskip_function amskip;				/* can be NULL */
 
 	/* interface functions to support parallel index scans */
 	amestimateparallelscan_function amestimateparallelscan; /* can be NULL */
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 7e9364a50c..7cea6c1756 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -149,9 +149,17 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
 									 Relation indexRelation,
 									 Snapshot snapshot,
 									 int nkeys, int norderbys);
+extern IndexScanDesc index_beginscan_skip(Relation heapRelation,
+									 Relation indexRelation,
+									 Snapshot snapshot,
+									 int nkeys, int norderbys, int prefix);
 extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
 											Snapshot snapshot,
 											int nkeys);
+extern IndexScanDesc index_beginscan_bitmap_skip(Relation indexRelation,
+											Snapshot snapshot,
+											int nkeys,
+											int prefix);
 extern void index_rescan(IndexScanDesc scan,
 						 ScanKey keys, int nkeys,
 						 ScanKey orderbys, int norderbys);
@@ -167,10 +175,16 @@ extern IndexScanDesc index_beginscan_parallel(Relation heaprel,
 											  ParallelIndexScanDesc pscan);
 extern ItemPointer index_getnext_tid(IndexScanDesc scan,
 									 ScanDirection direction);
+extern ItemPointer index_getnext_tid_skip(IndexScanDesc scan,
+									 ScanDirection prefixDir,
+									 ScanDirection postfixDir);
 struct TupleTableSlot;
 extern bool index_fetch_heap(IndexScanDesc scan, struct TupleTableSlot *slot);
 extern bool index_getnext_slot(IndexScanDesc scan, ScanDirection direction,
 							   struct TupleTableSlot *slot);
+extern bool index_getnext_slot_skip(IndexScanDesc scan, ScanDirection prefixDir,
+									ScanDirection postfixDir,
+									struct TupleTableSlot *slot);
 extern int64 index_getbitmap(IndexScanDesc scan, TIDBitmap *bitmap);
 
 extern IndexBulkDeleteResult *index_bulk_delete(IndexVacuumInfo *info,
@@ -180,6 +194,8 @@ extern IndexBulkDeleteResult *index_bulk_delete(IndexVacuumInfo *info,
 extern IndexBulkDeleteResult *index_vacuum_cleanup(IndexVacuumInfo *info,
 												   IndexBulkDeleteResult *stats);
 extern bool index_can_return(Relation indexRelation, int attno);
+extern bool index_skip(IndexScanDesc scan, ScanDirection prefixDir,
+					   ScanDirection postfixDir);
 extern RegProcedure index_getprocid(Relation irel, AttrNumber attnum,
 									uint16 procnum);
 extern FmgrInfo *index_getprocinfo(Relation irel, AttrNumber attnum,
diff --git a/src/include/access/nbtree.h b/src/include/access/nbtree.h
index 18206a0c65..015bb63df5 100644
--- a/src/include/access/nbtree.h
+++ b/src/include/access/nbtree.h
@@ -910,6 +910,53 @@ typedef struct BTArrayKeyInfo
 	Datum	   *elem_values;	/* array of num_elems Datums */
 } BTArrayKeyInfo;
 
+typedef struct BTSkipCompareResult
+{
+	bool		equal;
+	int			prefixCmpResult, skCmpResult;
+	bool		prefixSkip, fullKeySkip;
+	int			prefixSkipIndex;
+} BTSkipCompareResult;
+
+typedef enum BTSkipState
+{
+	SkipStateStop,
+	SkipStateSkip,
+	SkipStateSkipExtra,
+	SkipStateNext
+} BTSkipState;
+
+typedef struct BTSkipPosData
+{
+	BTSkipState nextAction;
+	ScanDirection nextDirection;
+	int nextSkipIndex;
+	BTScanInsertData skipScanKey;
+} BTSkipPosData;
+
+typedef struct BTSkipData
+{
+	/*
+	 * Used to control skipping.  skipScanKey is built from currentTupleKey
+	 * (the scan keys for the current tuple) combined with fwdScanKey or
+	 * bwdScanKey (the scan keys for the quals that would be chosen for a
+	 * forward or backward scan, respectively).  Both are needed because the
+	 * chosen keys differ by direction: for quals a>2 and a<5, the forward
+	 * scan key would hold a>2, while the backward one would hold a<5.
+	 */
+	BTScanInsertData	currentTupleKey;
+	BTScanInsertData	fwdScanKey;
+	ScanKeyData			fwdNotNullKeys[INDEX_MAX_KEYS];
+	BTScanInsertData	bwdScanKey;
+	ScanKeyData			bwdNotNullKeys[INDEX_MAX_KEYS];
+	/* length of prefix to skip */
+	int					prefix;
+
+	BTSkipPosData curPos, markPos;
+} BTSkipData;
+
+typedef BTSkipData *BTSkip;
+
 typedef struct BTScanOpaqueData
 {
 	/* these fields are set by _bt_preprocess_keys(): */
@@ -947,6 +994,9 @@ typedef struct BTScanOpaqueData
 	 */
 	int			markItemIndex;	/* itemIndex, or -1 if not valid */
 
+	/* Work space for _bt_skip */
+	BTSkip	skipData;	/* used to control skipping */
+
 	/* keep these last in struct for efficiency */
 	BTScanPosData currPos;		/* current position data */
 	BTScanPosData markPos;		/* marked position, if any */
@@ -961,6 +1011,8 @@ typedef BTScanOpaqueData *BTScanOpaque;
  */
 #define SK_BT_REQFWD	0x00010000	/* required to continue forward scan */
 #define SK_BT_REQBKWD	0x00020000	/* required to continue backward scan */
+#define SK_BT_REQSKIPFWD	0x00040000	/* required to continue forward scan within current prefix */
+#define SK_BT_REQSKIPBKWD	0x00080000	/* required to continue backward scan within current prefix */
 #define SK_BT_INDOPTION_SHIFT  24	/* must clear the above bits */
 #define SK_BT_DESC			(INDOPTION_DESC << SK_BT_INDOPTION_SHIFT)
 #define SK_BT_NULLS_FIRST	(INDOPTION_NULLS_FIRST << SK_BT_INDOPTION_SHIFT)
@@ -1007,9 +1059,12 @@ extern bool btinsert(Relation rel, Datum *values, bool *isnull,
 					 IndexUniqueCheck checkUnique,
 					 struct IndexInfo *indexInfo);
 extern IndexScanDesc btbeginscan(Relation rel, int nkeys, int norderbys);
+extern IndexScanDesc btbeginscan_skip(Relation rel, int nkeys, int norderbys, int skipPrefix);
 extern Size btestimateparallelscan(void);
 extern void btinitparallelscan(void *target);
 extern bool btgettuple(IndexScanDesc scan, ScanDirection dir);
+extern bool btgettuple_skip(IndexScanDesc scan, ScanDirection prefixDir, ScanDirection postfixDir);
+extern bool btskip(IndexScanDesc scan, ScanDirection prefixDir, ScanDirection postfixDir);
 extern int64 btgetbitmap(IndexScanDesc scan, TIDBitmap *tbm);
 extern void btrescan(IndexScanDesc scan, ScanKey scankey, int nscankeys,
 					 ScanKey orderbys, int norderbys);
@@ -1101,15 +1156,79 @@ extern Buffer _bt_moveright(Relation rel, BTScanInsert key, Buffer buf,
 							bool forupdate, BTStack stack, int access, Snapshot snapshot);
 extern OffsetNumber _bt_binsrch_insert(Relation rel, BTInsertState insertstate);
 extern int32 _bt_compare(Relation rel, BTScanInsert key, Page page, OffsetNumber offnum);
-extern bool _bt_first(IndexScanDesc scan, ScanDirection dir);
-extern bool _bt_next(IndexScanDesc scan, ScanDirection dir);
+extern bool _bt_first(IndexScanDesc scan, ScanDirection prefixDir, ScanDirection postfixDir);
+extern bool _bt_next(IndexScanDesc scan, ScanDirection prefixDir, ScanDirection postfixDir);
 extern Buffer _bt_get_endpoint(Relation rel, uint32 level, bool rightmost,
 							   Snapshot snapshot);
+extern Buffer _bt_walk_left(Relation rel, Buffer buf, Snapshot snapshot);
+extern OffsetNumber _bt_binsrch(Relation rel, BTScanInsert key, Buffer buf);
+extern void _bt_initialize_more_data(BTScanOpaque so, ScanDirection dir);
+extern bool _bt_readpage(IndexScanDesc scan, ScanDirection dir,
+						 OffsetNumber *offnum, bool isRegularMode);
+extern bool _bt_steppage(IndexScanDesc scan, ScanDirection dir);
+extern bool _bt_readnextpage(IndexScanDesc scan, BlockNumber blkno, ScanDirection dir);
+extern void _bt_drop_lock_and_maybe_pin(IndexScanDesc scan, BTScanPos sp);
+
+/*
+ * prototypes for functions in nbtskip.c
+ */
+static inline bool
+_bt_skip_enabled(BTScanOpaque so)
+{
+	return so->skipData != NULL;
+}
+
+static inline bool
+_bt_skip_is_regular_mode(ScanDirection prefixDir, ScanDirection postfixDir)
+{
+	return prefixDir == postfixDir;
+}
+
+/* returns whether or not we can use extra quals in the scankey after skipping to a prefix */
+static inline bool
+_bt_has_extra_quals_after_skip(BTSkip skip, ScanDirection dir, int prefix)
+{
+	if (ScanDirectionIsForward(dir))
+	{
+		return skip->fwdScanKey.keysz > prefix;
+	}
+	else
+	{
+		return skip->bwdScanKey.keysz > prefix;
+	}
+}
+
+/* alias of BTScanPosIsValid */
+static inline bool
+_bt_skip_is_always_valid(BTScanOpaque so)
+{
+	return BTScanPosIsValid(so->currPos);
+}
+
+extern bool _bt_skip(IndexScanDesc scan, ScanDirection prefixDir, ScanDirection postfixDir);
+extern void _bt_skip_create_scankeys(Relation rel, BTScanOpaque so);
+extern void _bt_skip_update_scankey_for_extra_skip(IndexScanDesc scan, Relation indexRel,
+					ScanDirection curDir, ScanDirection prefixDir, bool prioritizeEqual, IndexTuple itup);
+extern void _bt_skip_once(IndexScanDesc scan, IndexTuple *curTuple, OffsetNumber *curTupleOffnum,
+						  bool forceSkip, ScanDirection prefixDir, ScanDirection postfixDir);
+extern void _bt_skip_extra_conditions(IndexScanDesc scan, IndexTuple *curTuple, OffsetNumber *curTupleOffnum,
+									  ScanDirection prefixDir, ScanDirection postfixDir, BTSkipCompareResult *cmp);
+extern bool _bt_skip_find_next(IndexScanDesc scan, IndexTuple curTuple, OffsetNumber curTupleOffnum,
+							   ScanDirection prefixDir, ScanDirection postfixDir);
+extern void _bt_skip_until_match(IndexScanDesc scan, IndexTuple *curTuple, OffsetNumber *curTupleOffnum,
+								 ScanDirection prefixDir, ScanDirection postfixDir);
+extern bool _bt_has_results(BTScanOpaque so);
+extern void _bt_compare_current_item(IndexScanDesc scan, IndexTuple tuple, int tupnatts,
+									 ScanDirection dir, bool isRegularMode, BTSkipCompareResult* cmp);
+extern bool _bt_step_back_page(IndexScanDesc scan, IndexTuple *curTuple, OffsetNumber *curTupleOffnum);
+extern bool _bt_step_forward_page(IndexScanDesc scan, BlockNumber next, IndexTuple *curTuple,
+								  OffsetNumber *curTupleOffnum);
+extern bool _bt_checkkeys_skip(IndexScanDesc scan, IndexTuple tuple, int tupnatts,
+							   ScanDirection dir, bool *continuescan, int *prefixskipindex);
 
 /*
  * prototypes for functions in nbtutils.c
  */
-extern BTScanInsert _bt_mkscankey(Relation rel, IndexTuple itup);
 extern void _bt_freestack(BTStack stack);
 extern void _bt_preprocess_array_keys(IndexScanDesc scan);
 extern void _bt_start_array_keys(IndexScanDesc scan, ScanDirection dir);
@@ -1118,7 +1237,7 @@ extern void _bt_mark_array_keys(IndexScanDesc scan);
 extern void _bt_restore_array_keys(IndexScanDesc scan);
 extern void _bt_preprocess_keys(IndexScanDesc scan);
 extern bool _bt_checkkeys(IndexScanDesc scan, IndexTuple tuple,
-						  int tupnatts, ScanDirection dir, bool *continuescan);
+						  int tupnatts, ScanDirection dir, bool *continuescan, int *indexSkipPrefix);
 extern void _bt_killitems(IndexScanDesc scan);
 extern BTCycleId _bt_vacuum_cycleid(Relation rel);
 extern BTCycleId _bt_start_vacuum(Relation rel);
@@ -1140,6 +1259,19 @@ extern bool _bt_check_natts(Relation rel, bool heapkeyspace, Page page,
 extern void _bt_check_third_page(Relation rel, Relation heap,
 								 bool needheaptidspace, Page page, IndexTuple newtup);
 extern bool _bt_allequalimage(Relation rel, bool debugmessage);
+extern bool _bt_checkkeys_threeway(IndexScanDesc scan, IndexTuple tuple, int tupnatts,
+				ScanDirection dir, bool *continuescan, int *prefixSkipIndex);
+extern bool _bt_create_insertion_scan_key(Relation	rel, ScanDirection dir,
+				ScanKey* startKeys, int keysCount,
+				BTScanInsert inskey, StrategyNumber* stratTotal,
+				bool* goback);
+extern void _bt_set_bsearch_flags(StrategyNumber stratTotal, ScanDirection dir,
+		bool* nextkey, bool* goback);
+extern int _bt_choose_scan_keys(ScanKey scanKeys, int numberOfKeys, ScanDirection dir,
+ScanKey* startKeys, ScanKeyData* notnullkeys,
+  StrategyNumber* stratTotal, int prefix);
+extern BTScanInsert _bt_mkscankey(Relation rel, IndexTuple itup, BTScanInsert key);
+extern void print_itup(BlockNumber blk, IndexTuple left, IndexTuple right, Relation rel, char *extra);
 
 /*
  * prototypes for functions in nbtvalidate.c
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 94890512dc..897b445884 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -429,9 +429,13 @@ extern Datum ExecMakeFunctionResultSet(SetExprState *fcache,
  */
 typedef TupleTableSlot *(*ExecScanAccessMtd) (ScanState *node);
 typedef bool (*ExecScanRecheckMtd) (ScanState *node, TupleTableSlot *slot);
+typedef bool (*ExecScanSkipMtd) (ScanState *node);
 
 extern TupleTableSlot *ExecScan(ScanState *node, ExecScanAccessMtd accessMtd,
 								ExecScanRecheckMtd recheckMtd);
+extern TupleTableSlot *ExecScanExtended(ScanState *node, ExecScanAccessMtd accessMtd,
+								ExecScanRecheckMtd recheckMtd,
+								ExecScanSkipMtd skipMtd);
 extern void ExecAssignScanProjectionInfo(ScanState *node);
 extern void ExecAssignScanProjectionInfoWithVarno(ScanState *node, Index varno);
 extern void ExecScanReScan(ScanState *node);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 3d27d50f09..03e5060765 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1331,6 +1331,7 @@ typedef struct ScanState
 	Relation	ss_currentRelation;
 	struct TableScanDescData *ss_currentScanDesc;
 	TupleTableSlot *ss_ScanTupleSlot;
+	bool ss_FirstTupleEmitted;
 } ScanState;
 
 /* ----------------
@@ -1427,6 +1428,8 @@ typedef struct IndexScanState
 	ExprContext *iss_RuntimeContext;
 	Relation	iss_RelationDesc;
 	struct IndexScanDescData *iss_ScanDesc;
+	int			iss_SkipPrefixSize;
+	bool		iss_Distinct;
 
 	/* These are needed for re-checking ORDER BY expr ordering */
 	pairingheap *iss_ReorderQueue;
@@ -1456,6 +1459,8 @@ typedef struct IndexScanState
  *		TableSlot		   slot for holding tuples fetched from the table
  *		VMBuffer		   buffer in use for visibility map testing, if any
  *		PscanLen		   size of parallel index-only scan descriptor
+ *		SkipPrefixSize	   number of keys for skip-based DISTINCT
+ *		Distinct		   whether only distinct keys are requested
  * ----------------
  */
 typedef struct IndexOnlyScanState
@@ -1474,6 +1479,8 @@ typedef struct IndexOnlyScanState
 	struct IndexScanDescData *ioss_ScanDesc;
 	TupleTableSlot *ioss_TableSlot;
 	Buffer		ioss_VMBuffer;
+	int			ioss_SkipPrefixSize;
+	bool		ioss_Distinct;
 	Size		ioss_PscanLen;
 } IndexOnlyScanState;
 
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index d4816c180d..86dcd057ed 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -839,6 +839,7 @@ struct IndexOptInfo
 	bool		amsearchnulls;	/* can AM search for NULL/NOT NULL entries? */
 	bool		amhasgettuple;	/* does AM have amgettuple interface? */
 	bool		amhasgetbitmap; /* does AM have amgetbitmap interface? */
+	bool		amcanskip;		/* can AM skip duplicate values? */
 	bool		amcanparallel;	/* does AM support parallel scan? */
 	/* Rather than include amapi.h here, we declare amcostestimate like this */
 	void		(*amcostestimate) ();	/* AM's cost estimator */
@@ -1189,6 +1190,9 @@ typedef struct Path
  * we need not recompute them when considering using the same index in a
  * bitmap index/heap scan (see BitmapHeapPath).  The costs of the IndexPath
  * itself represent the costs of an IndexScan or IndexOnlyScan plan type.
+ *
+ * 'indexskipprefix' represents the number of columns to consider for skip
+ * scans.
  *----------
  */
 typedef struct IndexPath
@@ -1201,6 +1205,8 @@ typedef struct IndexPath
 	ScanDirection indexscandir;
 	Cost		indextotalcost;
 	Selectivity indexselectivity;
+	int			indexskipprefix;
+	bool		indexdistinct;
 } IndexPath;
 
 /*
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 4869fe7b6d..49f4de3843 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -409,6 +409,8 @@ typedef struct IndexScan
 	List	   *indexorderbyorig;	/* the same in original form */
 	List	   *indexorderbyops;	/* OIDs of sort ops for ORDER BY exprs */
 	ScanDirection indexorderdir;	/* forward or backward or don't care */
+	int			indexskipprefixsize;	/* the size of the prefix for skip scans */
+	bool		indexdistinct; /* whether only distinct keys are requested */
 } IndexScan;
 
 /* ----------------
@@ -436,6 +438,8 @@ typedef struct IndexOnlyScan
 	List	   *indexorderby;	/* list of index ORDER BY exprs */
 	List	   *indextlist;		/* TargetEntry list describing index's cols */
 	ScanDirection indexorderdir;	/* forward or backward or don't care */
+	int			indexskipprefixsize;	/* the size of the prefix for skip scans */
+	bool		indexdistinct; /* whether only distinct keys are requested */
 } IndexOnlyScan;
 
 /* ----------------
@@ -462,6 +466,7 @@ typedef struct BitmapIndexScan
 	bool		isshared;		/* Create shared bitmap if set */
 	List	   *indexqual;		/* list of index quals (OpExprs) */
 	List	   *indexqualorig;	/* the same in original form */
+	int			indexskipprefixsize;	/* the size of the prefix for skip scans */
 } BitmapIndexScan;
 
 /* ----------------
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 735ba09650..923eecf5f0 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -50,6 +50,7 @@ extern PGDLLIMPORT int max_parallel_workers_per_gather;
 extern PGDLLIMPORT bool enable_seqscan;
 extern PGDLLIMPORT bool enable_indexscan;
 extern PGDLLIMPORT bool enable_indexonlyscan;
+extern PGDLLIMPORT bool enable_indexskipscan;
 extern PGDLLIMPORT bool enable_bitmapscan;
 extern PGDLLIMPORT bool enable_tidscan;
 extern PGDLLIMPORT bool enable_sort;
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index f75ff6f323..6c8c9dadbb 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -201,6 +201,11 @@ extern UpperUniquePath *create_upper_unique_path(PlannerInfo *root,
 												 Path *subpath,
 												 int numCols,
 												 double numGroups);
+extern IndexPath *create_skipscan_unique_path(PlannerInfo *root,
+											  RelOptInfo *rel,
+											  Path *basepath,
+											  int distinctPrefixKeys,
+											  double numGroups);
 extern AggPath *create_agg_path(PlannerInfo *root,
 								RelOptInfo *rel,
 								Path *subpath,
diff --git a/src/interfaces/libpq/encnames.c b/src/interfaces/libpq/encnames.c
new file mode 120000
index 0000000000..ca78618b55
--- /dev/null
+++ b/src/interfaces/libpq/encnames.c
@@ -0,0 +1 @@
+../../../src/backend/utils/mb/encnames.c
\ No newline at end of file
diff --git a/src/interfaces/libpq/wchar.c b/src/interfaces/libpq/wchar.c
new file mode 120000
index 0000000000..a27508f72a
--- /dev/null
+++ b/src/interfaces/libpq/wchar.c
@@ -0,0 +1 @@
+../../../src/backend/utils/mb/wchar.c
\ No newline at end of file
diff --git a/src/test/regress/expected/select_distinct.out b/src/test/regress/expected/select_distinct.out
index 11c6f50fbf..e21afa7990 100644
--- a/src/test/regress/expected/select_distinct.out
+++ b/src/test/regress/expected/select_distinct.out
@@ -306,3 +306,604 @@ SELECT null IS NOT DISTINCT FROM null as "yes";
  t
 (1 row)
 
+-- index only skip scan
+CREATE TABLE distinct_a (a int, b int, c int);
+INSERT INTO distinct_a (
+    SELECT five, tenthous, 10 FROM
+    generate_series(1, 5) five,
+    generate_series(1, 10000) tenthous
+);
+CREATE INDEX ON distinct_a (a, b);
+ANALYZE distinct_a;
+SELECT DISTINCT a FROM distinct_a;
+ a 
+---
+ 1
+ 2
+ 3
+ 4
+ 5
+(5 rows)
+
+SELECT DISTINCT a FROM distinct_a WHERE a = 1;
+ a 
+---
+ 1
+(1 row)
+
+SELECT DISTINCT a FROM distinct_a ORDER BY a DESC;
+ a 
+---
+ 5
+ 4
+ 3
+ 2
+ 1
+(5 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT a FROM distinct_a;
+                       QUERY PLAN                       
+--------------------------------------------------------
+ Index Only Scan using distinct_a_a_b_idx on distinct_a
+   Skip scan: Distinct only
+(2 rows)
+
+-- test index skip scan with a condition on a non unique field
+SELECT DISTINCT ON (a) a, b FROM distinct_a WHERE b = 2;
+ a | b 
+---+---
+ 1 | 2
+ 2 | 2
+ 3 | 2
+ 4 | 2
+ 5 | 2
+(5 rows)
+
+-- test index skip scan backwards
+SELECT DISTINCT ON (a) a, b FROM distinct_a ORDER BY a DESC, b DESC;
+ a |   b   
+---+-------
+ 5 | 10000
+ 4 | 10000
+ 3 | 10000
+ 2 | 10000
+ 1 | 10000
+(5 rows)
+
+-- check colums order
+CREATE INDEX distinct_a_b_a on distinct_a (b, a);
+SELECT DISTINCT a FROM distinct_a WHERE b = 2;
+ a 
+---
+ 1
+ 2
+ 3
+ 4
+ 5
+(5 rows)
+
+SELECT DISTINCT on (a, b) a, b FROM distinct_a WHERE b = 2;
+ a | b 
+---+---
+ 1 | 2
+ 2 | 2
+ 3 | 2
+ 4 | 2
+ 5 | 2
+(5 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT a FROM distinct_a WHERE b = 2;
+                     QUERY PLAN                     
+----------------------------------------------------
+ Index Only Scan using distinct_a_b_a on distinct_a
+   Skip scan: Distinct only
+   Index Cond: (b = 2)
+(3 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT on (a, b) a, b FROM distinct_a WHERE b = 2;
+                     QUERY PLAN                     
+----------------------------------------------------
+ Index Only Scan using distinct_a_b_a on distinct_a
+   Skip scan: Distinct only
+   Index Cond: (b = 2)
+(3 rows)
+
+DROP INDEX distinct_a_b_a;
+-- test opposite scan/index directions inside a cursor
+-- forward/backward
+BEGIN;
+DECLARE c SCROLL CURSOR FOR
+SELECT DISTINCT ON (a) a,b FROM distinct_a ORDER BY a, b;
+FETCH FROM c;
+ a | b 
+---+---
+ 1 | 1
+(1 row)
+
+FETCH BACKWARD FROM c;
+ a | b 
+---+---
+(0 rows)
+
+FETCH 6 FROM c;
+ a | b 
+---+---
+ 1 | 1
+ 2 | 1
+ 3 | 1
+ 4 | 1
+ 5 | 1
+(5 rows)
+
+FETCH BACKWARD 6 FROM c;
+ a | b 
+---+---
+ 5 | 1
+ 4 | 1
+ 3 | 1
+ 2 | 1
+ 1 | 1
+(5 rows)
+
+FETCH 6 FROM c;
+ a | b 
+---+---
+ 1 | 1
+ 2 | 1
+ 3 | 1
+ 4 | 1
+ 5 | 1
+(5 rows)
+
+FETCH BACKWARD 6 FROM c;
+ a | b 
+---+---
+ 5 | 1
+ 4 | 1
+ 3 | 1
+ 2 | 1
+ 1 | 1
+(5 rows)
+
+END;
+-- backward/forward
+BEGIN;
+DECLARE c SCROLL CURSOR FOR
+SELECT DISTINCT ON (a) a,b FROM distinct_a ORDER BY a DESC, b DESC;
+FETCH FROM c;
+ a |   b   
+---+-------
+ 5 | 10000
+(1 row)
+
+FETCH BACKWARD FROM c;
+ a | b 
+---+---
+(0 rows)
+
+FETCH 6 FROM c;
+ a |   b   
+---+-------
+ 5 | 10000
+ 4 | 10000
+ 3 | 10000
+ 2 | 10000
+ 1 | 10000
+(5 rows)
+
+FETCH BACKWARD 6 FROM c;
+ a |   b   
+---+-------
+ 1 | 10000
+ 2 | 10000
+ 3 | 10000
+ 4 | 10000
+ 5 | 10000
+(5 rows)
+
+FETCH 6 FROM c;
+ a |   b   
+---+-------
+ 5 | 10000
+ 4 | 10000
+ 3 | 10000
+ 2 | 10000
+ 1 | 10000
+(5 rows)
+
+FETCH BACKWARD 6 FROM c;
+ a |   b   
+---+-------
+ 1 | 10000
+ 2 | 10000
+ 3 | 10000
+ 4 | 10000
+ 5 | 10000
+(5 rows)
+
+END;
+-- test missing values and skipping from the end
+CREATE TABLE distinct_abc(a int, b int, c int);
+CREATE INDEX ON distinct_abc(a, b, c);
+INSERT INTO distinct_abc
+	VALUES (1, 1, 1),
+		   (1, 1, 2),
+		   (1, 2, 2),
+		   (1, 2, 3),
+		   (2, 2, 1),
+		   (2, 2, 3),
+		   (3, 1, 1),
+		   (3, 1, 2),
+		   (3, 2, 2),
+		   (3, 2, 3);
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a,b,c FROM distinct_abc WHERE c = 2;
+                          QUERY PLAN                          
+--------------------------------------------------------------
+ Index Only Scan using distinct_abc_a_b_c_idx on distinct_abc
+   Skip scan: Distinct only
+   Index Cond: (c = 2)
+(3 rows)
+
+BEGIN;
+DECLARE c SCROLL CURSOR FOR
+SELECT DISTINCT ON (a) a,b,c FROM distinct_abc WHERE c = 2;
+FETCH ALL FROM c;
+ a | b | c 
+---+---+---
+ 1 | 1 | 2
+ 3 | 1 | 2
+(2 rows)
+
+FETCH BACKWARD ALL FROM c;
+ a | b | c 
+---+---+---
+ 3 | 1 | 2
+ 1 | 1 | 2
+(2 rows)
+
+END;
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a,b,c FROM distinct_abc WHERE c = 2
+ORDER BY a DESC, b DESC;
+                              QUERY PLAN                               
+-----------------------------------------------------------------------
+ Index Only Scan Backward using distinct_abc_a_b_c_idx on distinct_abc
+   Skip scan: Distinct only
+   Index Cond: (c = 2)
+(3 rows)
+
+BEGIN;
+DECLARE c SCROLL CURSOR FOR
+SELECT DISTINCT ON (a) a,b,c FROM distinct_abc WHERE c = 2
+ORDER BY a DESC, b DESC;
+FETCH ALL FROM c;
+ a | b | c 
+---+---+---
+ 3 | 2 | 2
+ 1 | 2 | 2
+(2 rows)
+
+FETCH BACKWARD ALL FROM c;
+ a | b | c 
+---+---+---
+ 1 | 2 | 2
+ 3 | 2 | 2
+(2 rows)
+
+END;
+DROP TABLE distinct_abc;
+-- index skip scan
+SELECT DISTINCT ON (a) a, b, c
+FROM distinct_a ORDER BY a;
+ a | b | c  
+---+---+----
+ 1 | 1 | 10
+ 2 | 1 | 10
+ 3 | 1 | 10
+ 4 | 1 | 10
+ 5 | 1 | 10
+(5 rows)
+
+SELECT DISTINCT ON (a) a, b, c
+FROM distinct_a WHERE a = 1 ORDER BY a;
+ a | b | c  
+---+---+----
+ 1 | 1 | 10
+(1 row)
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a, b, c
+FROM distinct_a ORDER BY a;
+                    QUERY PLAN                     
+---------------------------------------------------
+ Index Scan using distinct_a_a_b_idx on distinct_a
+   Skip scan: Distinct only
+(2 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a, b, c
+FROM distinct_a WHERE a = 1 ORDER BY a;
+                     QUERY PLAN                      
+-----------------------------------------------------
+ Unique
+   ->  Bitmap Heap Scan on distinct_a
+         Recheck Cond: (a = 1)
+         ->  Bitmap Index Scan on distinct_a_a_b_idx
+               Index Cond: (a = 1)
+(5 rows)
+
+-- check colums order
+SELECT DISTINCT a FROM distinct_a WHERE b = 2 AND c = 10;
+ a 
+---
+ 1
+ 2
+ 3
+ 4
+ 5
+(5 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT a FROM distinct_a WHERE b = 2 AND c = 10;
+                    QUERY PLAN                     
+---------------------------------------------------
+ Index Scan using distinct_a_a_b_idx on distinct_a
+   Skip scan: Distinct only
+   Index Cond: (b = 2)
+   Filter: (c = 10)
+(4 rows)
+
+-- check projection case
+SELECT DISTINCT a, a FROM distinct_a WHERE b = 2;
+ a | a 
+---+---
+ 1 | 1
+ 2 | 2
+ 3 | 3
+ 4 | 4
+ 5 | 5
+(5 rows)
+
+SELECT DISTINCT a, 1 FROM distinct_a WHERE b = 2;
+ a | ?column? 
+---+----------
+ 1 |        1
+ 2 |        1
+ 3 |        1
+ 4 |        1
+ 5 |        1
+(5 rows)
+
+-- test cursor forward/backward movements
+BEGIN;
+DECLARE c SCROLL CURSOR FOR SELECT DISTINCT a FROM distinct_a;
+FETCH FROM c;
+ a 
+---
+ 1
+(1 row)
+
+FETCH BACKWARD FROM c;
+ a 
+---
+(0 rows)
+
+FETCH 6 FROM c;
+ a 
+---
+ 1
+ 2
+ 3
+ 4
+ 5
+(5 rows)
+
+FETCH BACKWARD 6 FROM c;
+ a 
+---
+ 5
+ 4
+ 3
+ 2
+ 1
+(5 rows)
+
+FETCH 6 FROM c;
+ a 
+---
+ 1
+ 2
+ 3
+ 4
+ 5
+(5 rows)
+
+FETCH BACKWARD 6 FROM c;
+ a 
+---
+ 5
+ 4
+ 3
+ 2
+ 1
+(5 rows)
+
+END;
+DROP TABLE distinct_a;
+-- test tuples visibility
+CREATE TABLE distinct_visibility (a int, b int);
+INSERT INTO distinct_visibility (select a, b from generate_series(1,5) a, generate_series(1, 10000) b);
+CREATE INDEX ON distinct_visibility (a, b);
+ANALYZE distinct_visibility;
+SELECT DISTINCT ON (a) a, b FROM distinct_visibility ORDER BY a, b;
+ a | b 
+---+---
+ 1 | 1
+ 2 | 1
+ 3 | 1
+ 4 | 1
+ 5 | 1
+(5 rows)
+
+DELETE FROM distinct_visibility WHERE a = 2 and b = 1;
+SELECT DISTINCT ON (a) a, b FROM distinct_visibility ORDER BY a, b;
+ a | b 
+---+---
+ 1 | 1
+ 2 | 2
+ 3 | 1
+ 4 | 1
+ 5 | 1
+(5 rows)
+
+SELECT DISTINCT ON (a) a, b FROM distinct_visibility ORDER BY a DESC, b DESC;
+ a |   b   
+---+-------
+ 5 | 10000
+ 4 | 10000
+ 3 | 10000
+ 2 | 10000
+ 1 | 10000
+(5 rows)
+
+DELETE FROM distinct_visibility WHERE a = 2 and b = 10000;
+SELECT DISTINCT ON (a) a, b FROM distinct_visibility ORDER BY a DESC, b DESC;
+ a |   b   
+---+-------
+ 5 | 10000
+ 4 | 10000
+ 3 | 10000
+ 2 |  9999
+ 1 | 10000
+(5 rows)
+
+DROP TABLE distinct_visibility;
+-- test page boundaries
+CREATE TABLE distinct_boundaries AS
+    SELECT a, b::int2 b, (b % 2)::int2 c FROM
+        generate_series(1, 5) a,
+        generate_series(1,366) b;
+CREATE INDEX ON distinct_boundaries (a, b, c);
+ANALYZE distinct_boundaries;
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a, b, c from distinct_boundaries
+WHERE b >= 1 and c = 0 ORDER BY a, b;
+                                 QUERY PLAN                                 
+----------------------------------------------------------------------------
+ Index Only Scan using distinct_boundaries_a_b_c_idx on distinct_boundaries
+   Skip scan: Distinct only
+   Index Cond: ((b >= 1) AND (c = 0))
+(3 rows)
+
+SELECT DISTINCT ON (a) a, b, c from distinct_boundaries
+WHERE b >= 1 and c = 0 ORDER BY a, b;
+ a | b | c 
+---+---+---
+ 1 | 2 | 0
+ 2 | 2 | 0
+ 3 | 2 | 0
+ 4 | 2 | 0
+ 5 | 2 | 0
+(5 rows)
+
+DROP TABLE distinct_boundaries;
+-- test tuple killing
+-- DESC ordering
+CREATE TABLE distinct_killed AS
+    SELECT a, b, b % 2 AS c, 10 AS d
+        FROM generate_series(1, 5) a,
+             generate_series(1,1000) b;
+CREATE INDEX ON distinct_killed (a, b, c, d);
+DELETE FROM distinct_killed where a = 3;
+BEGIN;
+    DECLARE c SCROLL CURSOR FOR
+    SELECT DISTINCT ON (a) a,b,c,d
+    FROM distinct_killed ORDER BY a DESC, b DESC;
+    FETCH FORWARD ALL FROM c;
+ a |  b   | c | d  
+---+------+---+----
+ 5 | 1000 | 0 | 10
+ 4 | 1000 | 0 | 10
+ 2 | 1000 | 0 | 10
+ 1 | 1000 | 0 | 10
+(4 rows)
+
+    FETCH BACKWARD ALL FROM c;
+ a |  b   | c | d  
+---+------+---+----
+ 1 | 1000 | 0 | 10
+ 2 | 1000 | 0 | 10
+ 4 | 1000 | 0 | 10
+ 5 | 1000 | 0 | 10
+(4 rows)
+
+COMMIT;
+DROP TABLE distinct_killed;
+-- regular ordering
+CREATE TABLE distinct_killed AS
+    SELECT a, b, b % 2 AS c, 10 AS d
+        FROM generate_series(1, 5) a,
+             generate_series(1,1000) b;
+CREATE INDEX ON distinct_killed (a, b, c, d);
+DELETE FROM distinct_killed where a = 3;
+BEGIN;
+    DECLARE c SCROLL CURSOR FOR
+    SELECT DISTINCT ON (a) a,b,c,d
+    FROM distinct_killed ORDER BY a, b;
+    FETCH FORWARD ALL FROM c;
+ a | b | c | d  
+---+---+---+----
+ 1 | 1 | 1 | 10
+ 2 | 1 | 1 | 10
+ 4 | 1 | 1 | 10
+ 5 | 1 | 1 | 10
+(4 rows)
+
+    FETCH BACKWARD ALL FROM c;
+ a | b | c | d  
+---+---+---+----
+ 5 | 1 | 1 | 10
+ 4 | 1 | 1 | 10
+ 2 | 1 | 1 | 10
+ 1 | 1 | 1 | 10
+(4 rows)
+
+COMMIT;
+DROP TABLE distinct_killed;
+-- partial delete
+CREATE TABLE distinct_killed AS
+    SELECT a, b, b % 2 AS c, 10 AS d
+        FROM generate_series(1, 5) a,
+             generate_series(1,1000) b;
+CREATE INDEX ON distinct_killed (a, b, c, d);
+DELETE FROM distinct_killed WHERE a = 3 AND b <= 999;
+BEGIN;
+    DECLARE c SCROLL CURSOR FOR
+    SELECT DISTINCT ON (a) a,b,c,d
+    FROM distinct_killed ORDER BY a DESC, b DESC;
+    FETCH FORWARD ALL FROM c;
+ a |  b   | c | d  
+---+------+---+----
+ 5 | 1000 | 0 | 10
+ 4 | 1000 | 0 | 10
+ 3 | 1000 | 0 | 10
+ 2 | 1000 | 0 | 10
+ 1 | 1000 | 0 | 10
+(5 rows)
+
+    FETCH BACKWARD ALL FROM c;
+ a |  b   | c | d  
+---+------+---+----
+ 1 | 1000 | 0 | 10
+ 2 | 1000 | 0 | 10
+ 3 | 1000 | 0 | 10
+ 4 | 1000 | 0 | 10
+ 5 | 1000 | 0 | 10
+(5 rows)
+
+COMMIT;
+DROP TABLE distinct_killed;
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 715842b87a..7e16655f03 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -80,6 +80,7 @@ select name, setting from pg_settings where name like 'enable%';
  enable_hashjoin                | on
  enable_indexonlyscan           | on
  enable_indexscan               | on
+ enable_indexskipscan           | on
  enable_material                | on
  enable_mergejoin               | on
  enable_nestloop                | on
@@ -91,7 +92,7 @@ select name, setting from pg_settings where name like 'enable%';
  enable_seqscan                 | on
  enable_sort                    | on
  enable_tidscan                 | on
-(19 rows)
+(20 rows)
 
 -- Test that the pg_timezone_names and pg_timezone_abbrevs views are
 -- more-or-less working.  We can't test their contents in any great detail
diff --git a/src/test/regress/sql/select_distinct.sql b/src/test/regress/sql/select_distinct.sql
index 33102744eb..0227c98823 100644
--- a/src/test/regress/sql/select_distinct.sql
+++ b/src/test/regress/sql/select_distinct.sql
@@ -135,3 +135,251 @@ SELECT 1 IS NOT DISTINCT FROM 2 as "no";
 SELECT 2 IS NOT DISTINCT FROM 2 as "yes";
 SELECT 2 IS NOT DISTINCT FROM null as "no";
 SELECT null IS NOT DISTINCT FROM null as "yes";
+
+-- index only skip scan
+CREATE TABLE distinct_a (a int, b int, c int);
+INSERT INTO distinct_a (
+    SELECT five, tenthous, 10 FROM
+    generate_series(1, 5) five,
+    generate_series(1, 10000) tenthous
+);
+CREATE INDEX ON distinct_a (a, b);
+ANALYZE distinct_a;
+
+SELECT DISTINCT a FROM distinct_a;
+SELECT DISTINCT a FROM distinct_a WHERE a = 1;
+SELECT DISTINCT a FROM distinct_a ORDER BY a DESC;
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT a FROM distinct_a;
+
+-- test index skip scan with a condition on a non unique field
+SELECT DISTINCT ON (a) a, b FROM distinct_a WHERE b = 2;
+
+-- test index skip scan backwards
+SELECT DISTINCT ON (a) a, b FROM distinct_a ORDER BY a DESC, b DESC;
+
+-- check colums order
+CREATE INDEX distinct_a_b_a on distinct_a (b, a);
+
+SELECT DISTINCT a FROM distinct_a WHERE b = 2;
+SELECT DISTINCT on (a, b) a, b FROM distinct_a WHERE b = 2;
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT a FROM distinct_a WHERE b = 2;
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT on (a, b) a, b FROM distinct_a WHERE b = 2;
+
+DROP INDEX distinct_a_b_a;
+
+-- test opposite scan/index directions inside a cursor
+-- forward/backward
+BEGIN;
+DECLARE c SCROLL CURSOR FOR
+SELECT DISTINCT ON (a) a,b FROM distinct_a ORDER BY a, b;
+
+FETCH FROM c;
+FETCH BACKWARD FROM c;
+
+FETCH 6 FROM c;
+FETCH BACKWARD 6 FROM c;
+
+FETCH 6 FROM c;
+FETCH BACKWARD 6 FROM c;
+
+END;
+
+-- backward/forward
+BEGIN;
+DECLARE c SCROLL CURSOR FOR
+SELECT DISTINCT ON (a) a,b FROM distinct_a ORDER BY a DESC, b DESC;
+
+FETCH FROM c;
+FETCH BACKWARD FROM c;
+
+FETCH 6 FROM c;
+FETCH BACKWARD 6 FROM c;
+
+FETCH 6 FROM c;
+FETCH BACKWARD 6 FROM c;
+
+END;
+
+-- test missing values and skipping from the end
+CREATE TABLE distinct_abc(a int, b int, c int);
+CREATE INDEX ON distinct_abc(a, b, c);
+INSERT INTO distinct_abc
+	VALUES (1, 1, 1),
+		   (1, 1, 2),
+		   (1, 2, 2),
+		   (1, 2, 3),
+		   (2, 2, 1),
+		   (2, 2, 3),
+		   (3, 1, 1),
+		   (3, 1, 2),
+		   (3, 2, 2),
+		   (3, 2, 3);
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a,b,c FROM distinct_abc WHERE c = 2;
+
+BEGIN;
+DECLARE c SCROLL CURSOR FOR
+SELECT DISTINCT ON (a) a,b,c FROM distinct_abc WHERE c = 2;
+
+FETCH ALL FROM c;
+FETCH BACKWARD ALL FROM c;
+
+END;
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a,b,c FROM distinct_abc WHERE c = 2
+ORDER BY a DESC, b DESC;
+
+BEGIN;
+DECLARE c SCROLL CURSOR FOR
+SELECT DISTINCT ON (a) a,b,c FROM distinct_abc WHERE c = 2
+ORDER BY a DESC, b DESC;
+
+FETCH ALL FROM c;
+FETCH BACKWARD ALL FROM c;
+
+END;
+
+DROP TABLE distinct_abc;
+
+-- index skip scan
+SELECT DISTINCT ON (a) a, b, c
+FROM distinct_a ORDER BY a;
+SELECT DISTINCT ON (a) a, b, c
+FROM distinct_a WHERE a = 1 ORDER BY a;
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a, b, c
+FROM distinct_a ORDER BY a;
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a, b, c
+FROM distinct_a WHERE a = 1 ORDER BY a;
+
+-- check colums order
+SELECT DISTINCT a FROM distinct_a WHERE b = 2 AND c = 10;
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT a FROM distinct_a WHERE b = 2 AND c = 10;
+
+-- check projection case
+SELECT DISTINCT a, a FROM distinct_a WHERE b = 2;
+SELECT DISTINCT a, 1 FROM distinct_a WHERE b = 2;
+
+-- test cursor forward/backward movements
+BEGIN;
+DECLARE c SCROLL CURSOR FOR SELECT DISTINCT a FROM distinct_a;
+
+FETCH FROM c;
+FETCH BACKWARD FROM c;
+
+FETCH 6 FROM c;
+FETCH BACKWARD 6 FROM c;
+
+FETCH 6 FROM c;
+FETCH BACKWARD 6 FROM c;
+
+END;
+
+DROP TABLE distinct_a;
+
+-- test tuples visibility
+CREATE TABLE distinct_visibility (a int, b int);
+INSERT INTO distinct_visibility (select a, b from generate_series(1,5) a, generate_series(1, 10000) b);
+CREATE INDEX ON distinct_visibility (a, b);
+ANALYZE distinct_visibility;
+
+SELECT DISTINCT ON (a) a, b FROM distinct_visibility ORDER BY a, b;
+DELETE FROM distinct_visibility WHERE a = 2 and b = 1;
+SELECT DISTINCT ON (a) a, b FROM distinct_visibility ORDER BY a, b;
+
+SELECT DISTINCT ON (a) a, b FROM distinct_visibility ORDER BY a DESC, b DESC;
+DELETE FROM distinct_visibility WHERE a = 2 and b = 10000;
+SELECT DISTINCT ON (a) a, b FROM distinct_visibility ORDER BY a DESC, b DESC;
+DROP TABLE distinct_visibility;
+
+-- test page boundaries
+CREATE TABLE distinct_boundaries AS
+    SELECT a, b::int2 b, (b % 2)::int2 c FROM
+        generate_series(1, 5) a,
+        generate_series(1,366) b;
+
+CREATE INDEX ON distinct_boundaries (a, b, c);
+ANALYZE distinct_boundaries;
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a, b, c from distinct_boundaries
+WHERE b >= 1 and c = 0 ORDER BY a, b;
+
+SELECT DISTINCT ON (a) a, b, c from distinct_boundaries
+WHERE b >= 1 and c = 0 ORDER BY a, b;
+
+DROP TABLE distinct_boundaries;
+
+-- test tuple killing
+
+-- DESC ordering
+CREATE TABLE distinct_killed AS
+    SELECT a, b, b % 2 AS c, 10 AS d
+        FROM generate_series(1, 5) a,
+             generate_series(1,1000) b;
+
+CREATE INDEX ON distinct_killed (a, b, c, d);
+
+DELETE FROM distinct_killed where a = 3;
+
+BEGIN;
+    DECLARE c SCROLL CURSOR FOR
+    SELECT DISTINCT ON (a) a,b,c,d
+    FROM distinct_killed ORDER BY a DESC, b DESC;
+    FETCH FORWARD ALL FROM c;
+    FETCH BACKWARD ALL FROM c;
+COMMIT;
+
+DROP TABLE distinct_killed;
+
+-- regular ordering
+CREATE TABLE distinct_killed AS
+    SELECT a, b, b % 2 AS c, 10 AS d
+        FROM generate_series(1, 5) a,
+             generate_series(1,1000) b;
+
+CREATE INDEX ON distinct_killed (a, b, c, d);
+
+DELETE FROM distinct_killed where a = 3;
+
+BEGIN;
+    DECLARE c SCROLL CURSOR FOR
+    SELECT DISTINCT ON (a) a,b,c,d
+    FROM distinct_killed ORDER BY a, b;
+    FETCH FORWARD ALL FROM c;
+    FETCH BACKWARD ALL FROM c;
+COMMIT;
+
+DROP TABLE distinct_killed;
+
+-- partial delete
+CREATE TABLE distinct_killed AS
+    SELECT a, b, b % 2 AS c, 10 AS d
+        FROM generate_series(1, 5) a,
+             generate_series(1,1000) b;
+
+CREATE INDEX ON distinct_killed (a, b, c, d);
+
+DELETE FROM distinct_killed WHERE a = 3 AND b <= 999;
+
+BEGIN;
+    DECLARE c SCROLL CURSOR FOR
+    SELECT DISTINCT ON (a) a,b,c,d
+    FROM distinct_killed ORDER BY a DESC, b DESC;
+    FETCH FORWARD ALL FROM c;
+    FETCH BACKWARD ALL FROM c;
+COMMIT;
+
+DROP TABLE distinct_killed;
-- 
2.25.0
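
For readers following the regression tests in the patch above: the DISTINCT ON (a) cases amount to returning the first index entry for each distinct value of the leading column, then jumping past the remaining entries with that value. Below is a minimal C sketch of that idea over a sorted in-memory array, assuming a two-column key. The names Row, first_greater and distinct_on_a are hypothetical and not taken from the patch, which operates on btree pages (see _bt_skip) rather than on an array.

```c
#include <assert.h>
#include <stddef.h>

/* One index entry with a two-column key, kept sorted by (a, b). */
typedef struct
{
	int			a;
	int			b;
} Row;

/*
 * Return the first position in [lo, n) whose leading column is greater
 * than key, or n if there is none.  This binary search plays the role of
 * the "skip": it jumps over all remaining entries that share the prefix.
 */
static size_t
first_greater(const Row *rows, size_t lo, size_t n, int key)
{
	size_t		hi = n;

	while (lo < hi)
	{
		size_t		mid = lo + (hi - lo) / 2;

		if (rows[mid].a <= key)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}

/*
 * Emulate SELECT DISTINCT ON (a) a, b ... ORDER BY a, b over sorted rows.
 * Writes at most out_cap rows into out and returns how many were written.
 */
static size_t
distinct_on_a(const Row *rows, size_t n, Row *out, size_t out_cap)
{
	size_t		count = 0;
	size_t		pos = 0;

	assert(rows != NULL || n == 0);

	while (pos < n && count < out_cap)
	{
		/* The first entry of each prefix group is the one to return. */
		out[count++] = rows[pos];
		/* Skip the rest of the group instead of stepping through it. */
		pos = first_greater(rows, pos + 1, n, rows[pos].a);
	}
	return count;
}
```

With the rows (1,1), (1,2), (2,2), (2,5), (4,1) as input, distinct_on_a returns (1,1), (2,2), (4,1), which has the same shape as the DISTINCT ON (a) ... ORDER BY a, b results in select_distinct.out. A regular index scan would instead visit every entry and leave the duplicate removal to a Unique node.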

v33-0003-Make-planner-favor-skip-in-index-scans-and-modify-te.patch (application/octet-stream)

From a0e154fb3b2357f44bec05471878aab8fb0b710a Mon Sep 17 00:00:00 2001
From: Floris van Nee <floris.vannee@gmail.com>
Date: Thu, 19 Mar 2020 10:27:47 +0100
Subject: [PATCH 3/3] Make planner favor skip in index scans and modify test
 outputs

This commit hacks the planner to strongly favor skip scans over regular index
scans. It is intended for testing only, until a proper planner implementation
is in place. It also updates the expected output of the affected tests to
include the skip scan attribute in EXPLAIN output.
---
 .../postgres_fdw/expected/postgres_fdw.out    |  47 ++--
 src/backend/optimizer/util/pathnode.c         |   7 +
 src/backend/utils/adt/selfuncs.c              |  13 +-
 .../expected/drop-index-concurrently-1.out    |   1 +
 .../isolation/expected/eval-plan-qual.out     |   2 +
 src/test/regress/expected/aggregates.out      |  59 +++--
 src/test/regress/expected/btree_index.out     |  24 +-
 src/test/regress/expected/cluster.out         |  18 +-
 src/test/regress/expected/create_index.out    |  43 +++-
 src/test/regress/expected/equivclass.out      |  72 ++++--
 src/test/regress/expected/fast_default.out    |   3 +-
 src/test/regress/expected/foreign_key.out     |   4 +-
 src/test/regress/expected/generated.out       |   9 +-
 src/test/regress/expected/groupingsets.out    |   6 +-
 src/test/regress/expected/index_including.out |   6 +-
 src/test/regress/expected/inet.out            |  12 +-
 src/test/regress/expected/inherit.out         | 149 ++++++++---
 src/test/regress/expected/insert_conflict.out |   3 +-
 src/test/regress/expected/interval.out        |   3 +-
 src/test/regress/expected/join.out            | 235 ++++++++++++------
 src/test/regress/expected/limit.out           |   9 +-
 src/test/regress/expected/misc_functions.out  |   6 +-
 src/test/regress/expected/partition_join.out  |  37 ++-
 src/test/regress/expected/partition_prune.out | 151 +++++++++--
 src/test/regress/expected/plancache.out       |   6 +-
 src/test/regress/expected/portals.out         |   3 +-
 src/test/regress/expected/privileges.out      |  15 +-
 src/test/regress/expected/regex.out           |  21 +-
 src/test/regress/expected/rowsecurity.out     |  30 ++-
 src/test/regress/expected/rowtypes.out        |  18 +-
 src/test/regress/expected/select.out          |  29 ++-
 src/test/regress/expected/select_distinct.out |  15 +-
 src/test/regress/expected/select_parallel.out |   9 +-
 src/test/regress/expected/subselect.out       |  10 +-
 src/test/regress/expected/tuplesort.out       |  12 +-
 src/test/regress/expected/union.out           |  36 ++-
 src/test/regress/expected/updatable_views.out |  56 +++--
 37 files changed, 875 insertions(+), 304 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 62c2697920..2db85952ec 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -444,7 +444,8 @@ EXPLAIN (VERBOSE, COSTS OFF)
                Remote SQL: SELECT "C 1" FROM "S 1"."T 1" ORDER BY "C 1" ASC NULLS LAST
          ->  Index Only Scan using t1_pkey on "S 1"."T 1" t2
                Output: t2."C 1"
-(11 rows)
+               Skip scan: All
+(12 rows)
 
 SELECT t1.c1, t2."C 1" FROM ft2 t1 JOIN "S 1"."T 1" t2 ON (t1.c1 = t2."C 1") OFFSET 100 LIMIT 10;
  c1  | C 1 
@@ -478,7 +479,8 @@ EXPLAIN (VERBOSE, COSTS OFF)
                Remote SQL: SELECT "C 1" FROM "S 1"."T 1" ORDER BY "C 1" ASC NULLS LAST
          ->  Index Only Scan using t1_pkey on "S 1"."T 1" t2
                Output: t2."C 1"
-(11 rows)
+               Skip scan: All
+(12 rows)
 
 SELECT t1.c1, t2."C 1" FROM ft2 t1 LEFT JOIN "S 1"."T 1" t2 ON (t1.c1 = t2."C 1") OFFSET 100 LIMIT 10;
  c1  | C 1 
@@ -513,7 +515,8 @@ EXPLAIN (VERBOSE, COSTS OFF)
                Remote SQL: SELECT r3."C 1" FROM ("S 1"."T 1" r2 INNER JOIN "S 1"."T 1" r3 ON (((r2."C 1" = r3."C 1")))) ORDER BY r2."C 1" ASC NULLS LAST
          ->  Index Only Scan using t1_pkey on "S 1"."T 1" t1
                Output: t1."C 1"
-(12 rows)
+               Skip scan: All
+(13 rows)
 
 SELECT t1."C 1" FROM "S 1"."T 1" t1 left join ft1 t2 join ft2 t3 on (t2.c1 = t3.c1) on (t3.c1 = t1."C 1") OFFSET 100 LIMIT 10;
  C 1 
@@ -549,7 +552,8 @@ EXPLAIN (VERBOSE, COSTS OFF)
                Remote SQL: SELECT r3."C 1", r2."C 1" FROM ("S 1"."T 1" r3 LEFT JOIN "S 1"."T 1" r2 ON (((r2."C 1" = r3."C 1")))) ORDER BY r3."C 1" ASC NULLS LAST
          ->  Index Only Scan using t1_pkey on "S 1"."T 1" t1
                Output: t1."C 1"
-(12 rows)
+               Skip scan: All
+(13 rows)
 
 SELECT t1."C 1", t2.c1, t3.c1 FROM "S 1"."T 1" t1 left join ft1 t2 full join ft2 t3 on (t2.c1 = t3.c1) on (t3.c1 = t1."C 1") OFFSET 100 LIMIT 10;
  C 1 | c1  | c1  
@@ -583,7 +587,8 @@ EXPLAIN (VERBOSE, COSTS OFF)
                Remote SQL: SELECT r2."C 1", r3."C 1" FROM ("S 1"."T 1" r2 FULL JOIN "S 1"."T 1" r3 ON (((r2."C 1" = r3."C 1")))) ORDER BY r3."C 1" ASC NULLS LAST
          ->  Index Only Scan using t1_pkey on "S 1"."T 1" t1
                Output: t1."C 1"
-(12 rows)
+               Skip scan: All
+(13 rows)
 
 SELECT t1."C 1", t2.c1, t3.c1 FROM "S 1"."T 1" t1 full join ft1 t2 full join ft2 t3 on (t2.c1 = t3.c1) on (t3.c1 = t1."C 1") OFFSET 100 LIMIT 10;
  C 1 | c1  | c1  
@@ -711,11 +716,12 @@ EXPLAIN (VERBOSE, COSTS OFF)
    Output: a."C 1", a.c2, a.c3, a.c4, a.c5, a.c6, a.c7, a.c8, b.c1, b.c2, b.c3, b.c4, b.c5, b.c6, b.c7, b.c8
    ->  Index Scan using t1_pkey on "S 1"."T 1" a
          Output: a."C 1", a.c2, a.c3, a.c4, a.c5, a.c6, a.c7, a.c8
+         Skip scan: All
          Index Cond: (a."C 1" = 47)
    ->  Foreign Scan on public.ft2 b
          Output: b.c1, b.c2, b.c3, b.c4, b.c5, b.c6, b.c7, b.c8
          Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (($1::integer = "C 1"))
-(8 rows)
+(9 rows)
 
 SELECT * FROM ft2 a, ft2 b WHERE a.c1 = 47 AND b.c1 = a.c2;
  c1 | c2 |  c3   |              c4              |            c5            | c6 |     c7     | c8  | c1 | c2 |  c3   |              c4              |            c5            | c6 |     c7     | c8  
@@ -2126,6 +2132,7 @@ SELECT t1."C 1" FROM "S 1"."T 1" t1, LATERAL (SELECT DISTINCT t2.c1, t3.c1 FROM
          Output: t1."C 1"
          ->  Index Scan using t1_pkey on "S 1"."T 1" t1
                Output: t1."C 1", t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
+               Skip scan: All
          ->  HashAggregate
                Output: t2.c1, t3.c1
                Group Key: t2.c1, t3.c1
@@ -2133,7 +2140,7 @@ SELECT t1."C 1" FROM "S 1"."T 1" t1, LATERAL (SELECT DISTINCT t2.c1, t3.c1 FROM
                      Output: t2.c1, t3.c1
                      Relations: (public.ft1 t2) INNER JOIN (public.ft2 t3)
                      Remote SQL: SELECT r1."C 1", r2."C 1" FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")) AND ((r1.c2 = $1::integer))))
-(13 rows)
+(14 rows)
 
 SELECT t1."C 1" FROM "S 1"."T 1" t1, LATERAL (SELECT DISTINCT t2.c1, t3.c1 FROM ft1 t2, ft2 t3 WHERE t2.c1 = t3.c1 AND t2.c2 = t1.c2) q ORDER BY t1."C 1" OFFSET 10 LIMIT 10;
  C 1 
@@ -2288,7 +2295,8 @@ SELECT * FROM ft1, ft2, ft4, ft5, local_tbl WHERE ft1.c1 = ft2.c1 AND ft1.c2 = f
                                  Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 4" FOR UPDATE
          ->  Index Scan using local_tbl_pkey on public.local_tbl
                Output: local_tbl.c1, local_tbl.c2, local_tbl.c3, local_tbl.ctid
-(47 rows)
+               Skip scan: All
+(48 rows)
 
 SELECT * FROM ft1, ft2, ft4, ft5, local_tbl WHERE ft1.c1 = ft2.c1 AND ft1.c2 = ft4.c1
     AND ft1.c2 = ft5.c1 AND ft1.c2 = local_tbl.c1 AND ft1.c1 < 100 AND ft2.c1 < 100 FOR UPDATE;
@@ -3332,6 +3340,7 @@ select c2, sum from "S 1"."T 1" t1, lateral (select sum(t2.c1 + t1."C 1") sum fr
          Output: t1.c2, qry.sum
          ->  Index Scan using t1_pkey on "S 1"."T 1" t1
                Output: t1."C 1", t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
+               Skip scan: All
                Index Cond: (t1."C 1" < 100)
                Filter: (t1.c2 < 3)
          ->  Subquery Scan on qry
@@ -3341,7 +3350,7 @@ select c2, sum from "S 1"."T 1" t1, lateral (select sum(t2.c1 + t1."C 1") sum fr
                      Output: (sum((t2.c1 + t1."C 1"))), t2.c1
                      Relations: Aggregate on (public.ft2 t2)
                      Remote SQL: SELECT sum(("C 1" + $1::integer)), "C 1" FROM "S 1"."T 1" GROUP BY 2
-(16 rows)
+(17 rows)
 
 select c2, sum from "S 1"."T 1" t1, lateral (select sum(t2.c1 + t1."C 1") sum from ft2 t2 group by t2.c1) qry where t1.c2 * 2 = qry.sum and t1.c2 < 3 and t1."C 1" < 100 order by 1;
  c2 | sum 
@@ -3372,6 +3381,7 @@ ORDER BY ref_0."C 1";
          Output: ref_0.c2, ref_0."C 1", ref_1.c3, (ref_0.c2)
          ->  Index Scan using t1_pkey on "S 1"."T 1" ref_0
                Output: ref_0."C 1", ref_0.c2, ref_0.c3, ref_0.c4, ref_0.c5, ref_0.c6, ref_0.c7, ref_0.c8
+               Skip scan: All
                Index Cond: (ref_0."C 1" < 10)
          ->  Foreign Scan on public.ft1 ref_1
                Output: ref_1.c3, ref_0.c2
@@ -3381,7 +3391,7 @@ ORDER BY ref_0."C 1";
          ->  Foreign Scan on public.ft2 ref_3
                Output: ref_3.c3
                Remote SQL: SELECT c3 FROM "S 1"."T 1" WHERE ((c3 = '00001'::text))
-(15 rows)
+(16 rows)
 
 SELECT ref_0.c2, subq_1.*
 FROM
@@ -4149,11 +4159,12 @@ explain (verbose, costs off) select * from ft3 f, loct3 l
    Output: f.f1, f.f2, f.f3, l.f1, l.f2, l.f3
    ->  Index Scan using loct3_f1_key on public.loct3 l
          Output: l.f1, l.f2, l.f3
+         Skip scan: All
          Index Cond: (l.f1 = 'foo'::text)
    ->  Foreign Scan on public.ft3 f
          Output: f.f1, f.f2, f.f3
          Remote SQL: SELECT f1, f2, f3 FROM public.loct3 WHERE (($1::character varying(10) = f3))
-(8 rows)
+(9 rows)
 
 -- can't be sent to remote
 explain (verbose, costs off) select * from ft3 where f1 COLLATE "POSIX" = 'foo';
@@ -4207,8 +4218,9 @@ explain (verbose, costs off) select * from ft3 f, loct3 l
          Output: l.f1, l.f2, l.f3
          ->  Index Scan using loct3_f1_key on public.loct3 l
                Output: l.f1, l.f2, l.f3
+               Skip scan: All
                Index Cond: (l.f1 = 'foo'::text)
-(12 rows)
+(13 rows)
 
 -- ===================================================================
 -- test writable foreign table stuff
@@ -7361,12 +7373,14 @@ explain (verbose, costs off)
                      Sort Key: foo.f1
                      ->  Index Scan using i_foo_f1 on public.foo foo_1
                            Output: foo_1.f1, foo_1.f2
+                           Skip scan: All
                      ->  Foreign Scan on public.foo2 foo_2
                            Output: foo_2.f1, foo_2.f2
                            Remote SQL: SELECT f1, f2 FROM public.loct1 ORDER BY f1 ASC NULLS LAST
                ->  Index Only Scan using i_loct1_f1 on public.loct1
                      Output: loct1.f1
-(17 rows)
+                     Skip scan: All
+(19 rows)
 
 select foo.f1, loct1.f1 from foo join loct1 on (foo.f1 = loct1.f1) order by foo.f2 offset 10 limit 10;
  f1 | f1 
@@ -7401,12 +7415,14 @@ explain (verbose, costs off)
                      Sort Key: foo.f1
                      ->  Index Scan using i_foo_f1 on public.foo foo_1
                            Output: foo_1.f1, foo_1.f2
+                           Skip scan: All
                      ->  Foreign Scan on public.foo2 foo_2
                            Output: foo_2.f1, foo_2.f2
                            Remote SQL: SELECT f1, f2 FROM public.loct1 ORDER BY f1 ASC NULLS LAST
                ->  Index Only Scan using i_loct1_f1 on public.loct1
                      Output: loct1.f1
-(17 rows)
+                     Skip scan: All
+(19 rows)
 
 select foo.f1, loct1.f1 from foo left join loct1 on (foo.f1 = loct1.f1) order by foo.f2 offset 10 limit 10;
  f1 | f1 
@@ -7447,10 +7463,11 @@ delete from foo where f1 < 5 returning *;
    Foreign Delete on public.foo2 foo_1
    ->  Index Scan using i_foo_f1 on public.foo
          Output: foo.ctid
+         Skip scan: All
          Index Cond: (foo.f1 < 5)
    ->  Foreign Delete on public.foo2 foo_1
          Remote SQL: DELETE FROM public.loct1 WHERE ((f1 < 5)) RETURNING f1, f2
-(9 rows)
+(10 rows)
 
 delete from foo where f1 < 5 returning *;
  f1 | f2 
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 87d39570b5..865c4af8df 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1029,6 +1029,13 @@ create_index_path(PlannerInfo *root,
 	pathnode->indexorderbycols = indexorderbycols;
 	pathnode->indexscandir = indexscandir;
 
+	/*
+	 * @todo This is for testing purposes only.  We need a better mechanism
+	 * for choosing between a skip scan and a regular index scan.
+	 */
+	if (!partial_path && index->amcanskip && enable_indexskipscan)
+		pathnode->indexskipprefix = 10;
+
 	cost_index(pathnode, root, loop_count, partial_path);
 
 	return pathnode;
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 8339f4cd7a..91ae33d74c 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -5929,13 +5929,22 @@ btcostestimate(PlannerInfo *root, IndexPath *path, double loop_count,
 
 		if (indexcol != iclause->indexcol)
 		{
+			/*
+			 * @todo This estimate is wrong, but use it for now for testing
+			 * purposes: it forces index skip scan to be used as often as possible.
+			 */
 			/* Beginning of a new column's quals */
-			if (!eqQualHere)
+			if (!eqQualHere && path->indexskipprefix == 0)
 				break;			/* done if no '=' qual for indexcol */
 			eqQualHere = false;
 			indexcol++;
 			if (indexcol != iclause->indexcol)
-				break;			/* no quals at all for indexcol */
+			{
+				if (path->indexskipprefix == 0)
+					break;			/* no quals at all for indexcol */
+				else
+					continue;
+			}
 		}
 
 		/* Examine each indexqual associated with this index clause */
diff --git a/src/test/isolation/expected/drop-index-concurrently-1.out b/src/test/isolation/expected/drop-index-concurrently-1.out
index 75dff56bc4..448256db95 100644
--- a/src/test/isolation/expected/drop-index-concurrently-1.out
+++ b/src/test/isolation/expected/drop-index-concurrently-1.out
@@ -15,6 +15,7 @@ QUERY PLAN
 Sort           
   Sort Key: id 
   ->  Index Scan using test_dc_data on test_dc
+        Skip scan: All
         Index Cond: (data = 34)
 step explains: EXPLAIN (COSTS OFF) EXECUTE getrow_seq;
 QUERY PLAN     
diff --git a/src/test/isolation/expected/eval-plan-qual.out b/src/test/isolation/expected/eval-plan-qual.out
index 3e55a55c63..751988faff 100644
--- a/src/test/isolation/expected/eval-plan-qual.out
+++ b/src/test/isolation/expected/eval-plan-qual.out
@@ -837,7 +837,9 @@ LockRows
   ->  Merge Join
         Merge Cond: (a.id = b.id)
         ->  Index Scan using jointest_id_idx on jointest a
+              Skip scan: All
         ->  Index Scan using jointest_id_idx on jointest b
+              Skip scan: All
 id             data           id             data           
 
 1              0              1              0              
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index 3259a22516..2194f009fd 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -720,8 +720,9 @@ explain (costs off)
    InitPlan 1 (returns $0)
      ->  Limit
            ->  Index Only Scan using tenk1_unique1 on tenk1
+                 Skip scan: All
                  Index Cond: (unique1 IS NOT NULL)
-(5 rows)
+(6 rows)
 
 select min(unique1) from tenk1;
  min 
@@ -737,8 +738,9 @@ explain (costs off)
    InitPlan 1 (returns $0)
      ->  Limit
            ->  Index Only Scan Backward using tenk1_unique1 on tenk1
+                 Skip scan: All
                  Index Cond: (unique1 IS NOT NULL)
-(5 rows)
+(6 rows)
 
 select max(unique1) from tenk1;
  max  
@@ -754,8 +756,9 @@ explain (costs off)
    InitPlan 1 (returns $0)
      ->  Limit
            ->  Index Only Scan Backward using tenk1_unique1 on tenk1
+                 Skip scan: All
                  Index Cond: ((unique1 IS NOT NULL) AND (unique1 < 42))
-(5 rows)
+(6 rows)
 
 select max(unique1) from tenk1 where unique1 < 42;
  max 
@@ -771,8 +774,9 @@ explain (costs off)
    InitPlan 1 (returns $0)
      ->  Limit
            ->  Index Only Scan Backward using tenk1_unique1 on tenk1
+                 Skip scan: All
                  Index Cond: ((unique1 IS NOT NULL) AND (unique1 > 42))
-(5 rows)
+(6 rows)
 
 select max(unique1) from tenk1 where unique1 > 42;
  max  
@@ -794,8 +798,9 @@ explain (costs off)
    InitPlan 1 (returns $0)
      ->  Limit
            ->  Index Only Scan Backward using tenk1_unique1 on tenk1
+                 Skip scan: All
                  Index Cond: ((unique1 IS NOT NULL) AND (unique1 > 42000))
-(5 rows)
+(6 rows)
 
 select max(unique1) from tenk1 where unique1 > 42000;
  max 
@@ -813,8 +818,9 @@ explain (costs off)
    InitPlan 1 (returns $0)
      ->  Limit
            ->  Index Only Scan Backward using tenk1_thous_tenthous on tenk1
+                 Skip scan: All
                  Index Cond: ((thousand = 33) AND (tenthous IS NOT NULL))
-(5 rows)
+(6 rows)
 
 select max(tenthous) from tenk1 where thousand = 33;
  max  
@@ -830,8 +836,9 @@ explain (costs off)
    InitPlan 1 (returns $0)
      ->  Limit
            ->  Index Only Scan using tenk1_thous_tenthous on tenk1
+                 Skip scan: All
                  Index Cond: ((thousand = 33) AND (tenthous IS NOT NULL))
-(5 rows)
+(6 rows)
 
 select min(tenthous) from tenk1 where thousand = 33;
  min 
@@ -851,8 +858,9 @@ explain (costs off)
            InitPlan 1 (returns $1)
              ->  Limit
                    ->  Index Only Scan using tenk1_unique1 on tenk1
+                         Skip scan: All
                          Index Cond: ((unique1 IS NOT NULL) AND (unique1 > int4_tbl.f1))
-(7 rows)
+(8 rows)
 
 select f1, (select min(unique1) from tenk1 where unique1 > f1) AS gt
   from int4_tbl;
@@ -875,9 +883,10 @@ explain (costs off)
    InitPlan 1 (returns $0)
      ->  Limit
            ->  Index Only Scan Backward using tenk1_unique2 on tenk1
+                 Skip scan: All
                  Index Cond: (unique2 IS NOT NULL)
    ->  Result
-(7 rows)
+(8 rows)
 
 select distinct max(unique2) from tenk1;
  max  
@@ -894,9 +903,10 @@ explain (costs off)
    InitPlan 1 (returns $0)
      ->  Limit
            ->  Index Only Scan Backward using tenk1_unique2 on tenk1
+                 Skip scan: All
                  Index Cond: (unique2 IS NOT NULL)
    ->  Result
-(7 rows)
+(8 rows)
 
 select max(unique2) from tenk1 order by 1;
  max  
@@ -913,9 +923,10 @@ explain (costs off)
    InitPlan 1 (returns $0)
      ->  Limit
            ->  Index Only Scan Backward using tenk1_unique2 on tenk1
+                 Skip scan: All
                  Index Cond: (unique2 IS NOT NULL)
    ->  Result
-(7 rows)
+(8 rows)
 
 select max(unique2) from tenk1 order by max(unique2);
  max  
@@ -932,9 +943,10 @@ explain (costs off)
    InitPlan 1 (returns $0)
      ->  Limit
            ->  Index Only Scan Backward using tenk1_unique2 on tenk1
+                 Skip scan: All
                  Index Cond: (unique2 IS NOT NULL)
    ->  Result
-(7 rows)
+(8 rows)
 
 select max(unique2) from tenk1 order by max(unique2)+1;
  max  
@@ -951,10 +963,11 @@ explain (costs off)
    InitPlan 1 (returns $0)
      ->  Limit
            ->  Index Only Scan Backward using tenk1_unique2 on tenk1
+                 Skip scan: All
                  Index Cond: (unique2 IS NOT NULL)
    ->  ProjectSet
          ->  Result
-(8 rows)
+(9 rows)
 
 select max(unique2), generate_series(1,3) as g from tenk1 order by g desc;
  max  | g 
@@ -1006,24 +1019,32 @@ explain (costs off)
            ->  Merge Append
                  Sort Key: minmaxtest.f1
                  ->  Index Only Scan using minmaxtesti on minmaxtest minmaxtest_1
+                       Skip scan: All
                        Index Cond: (f1 IS NOT NULL)
                  ->  Index Only Scan using minmaxtest1i on minmaxtest1 minmaxtest_2
+                       Skip scan: All
                        Index Cond: (f1 IS NOT NULL)
                  ->  Index Only Scan Backward using minmaxtest2i on minmaxtest2 minmaxtest_3
+                       Skip scan: All
                        Index Cond: (f1 IS NOT NULL)
                  ->  Index Only Scan using minmaxtest3i on minmaxtest3 minmaxtest_4
+                       Skip scan: All
    InitPlan 2 (returns $1)
      ->  Limit
            ->  Merge Append
                  Sort Key: minmaxtest_5.f1 DESC
                  ->  Index Only Scan Backward using minmaxtesti on minmaxtest minmaxtest_6
+                       Skip scan: All
                        Index Cond: (f1 IS NOT NULL)
                  ->  Index Only Scan Backward using minmaxtest1i on minmaxtest1 minmaxtest_7
+                       Skip scan: All
                        Index Cond: (f1 IS NOT NULL)
                  ->  Index Only Scan using minmaxtest2i on minmaxtest2 minmaxtest_8
+                       Skip scan: All
                        Index Cond: (f1 IS NOT NULL)
                  ->  Index Only Scan Backward using minmaxtest3i on minmaxtest3 minmaxtest_9
-(23 rows)
+                       Skip scan: All
+(31 rows)
 
 select min(f1), max(f1) from minmaxtest;
  min | max 
@@ -1042,27 +1063,35 @@ explain (costs off)
            ->  Merge Append
                  Sort Key: minmaxtest.f1
                  ->  Index Only Scan using minmaxtesti on minmaxtest minmaxtest_1
+                       Skip scan: All
                        Index Cond: (f1 IS NOT NULL)
                  ->  Index Only Scan using minmaxtest1i on minmaxtest1 minmaxtest_2
+                       Skip scan: All
                        Index Cond: (f1 IS NOT NULL)
                  ->  Index Only Scan Backward using minmaxtest2i on minmaxtest2 minmaxtest_3
+                       Skip scan: All
                        Index Cond: (f1 IS NOT NULL)
                  ->  Index Only Scan using minmaxtest3i on minmaxtest3 minmaxtest_4
+                       Skip scan: All
    InitPlan 2 (returns $1)
      ->  Limit
            ->  Merge Append
                  Sort Key: minmaxtest_5.f1 DESC
                  ->  Index Only Scan Backward using minmaxtesti on minmaxtest minmaxtest_6
+                       Skip scan: All
                        Index Cond: (f1 IS NOT NULL)
                  ->  Index Only Scan Backward using minmaxtest1i on minmaxtest1 minmaxtest_7
+                       Skip scan: All
                        Index Cond: (f1 IS NOT NULL)
                  ->  Index Only Scan using minmaxtest2i on minmaxtest2 minmaxtest_8
+                       Skip scan: All
                        Index Cond: (f1 IS NOT NULL)
                  ->  Index Only Scan Backward using minmaxtest3i on minmaxtest3 minmaxtest_9
+                       Skip scan: All
    ->  Sort
          Sort Key: ($0), ($1)
          ->  Result
-(26 rows)
+(34 rows)
 
 select distinct min(f1), max(f1) from minmaxtest;
  min | max 
diff --git a/src/test/regress/expected/btree_index.out b/src/test/regress/expected/btree_index.out
index 1646deb092..7e09e1df8c 100644
--- a/src/test/regress/expected/btree_index.out
+++ b/src/test/regress/expected/btree_index.out
@@ -110,9 +110,10 @@ select proname from pg_proc where proname like E'RI\\_FKey%del' order by 1;
                                   QUERY PLAN                                  
 ------------------------------------------------------------------------------
  Index Only Scan using pg_proc_proname_args_nsp_index on pg_proc
+   Skip scan: All
    Index Cond: ((proname >= 'RI_FKey'::text) AND (proname < 'RI_FKez'::text))
    Filter: (proname ~~ 'RI\_FKey%del'::text)
-(3 rows)
+(4 rows)
 
 select proname from pg_proc where proname like E'RI\\_FKey%del' order by 1;
         proname         
@@ -129,9 +130,10 @@ select proname from pg_proc where proname ilike '00%foo' order by 1;
                              QUERY PLAN                             
 --------------------------------------------------------------------
  Index Only Scan using pg_proc_proname_args_nsp_index on pg_proc
+   Skip scan: All
    Index Cond: ((proname >= '00'::text) AND (proname < '01'::text))
    Filter: (proname ~~* '00%foo'::text)
-(3 rows)
+(4 rows)
 
 select proname from pg_proc where proname ilike '00%foo' order by 1;
  proname 
@@ -143,8 +145,9 @@ select proname from pg_proc where proname ilike 'ri%foo' order by 1;
                            QUERY PLAN                            
 -----------------------------------------------------------------
  Index Only Scan using pg_proc_proname_args_nsp_index on pg_proc
+   Skip scan: All
    Filter: (proname ~~* 'ri%foo'::text)
-(2 rows)
+(3 rows)
 
 set enable_indexscan to false;
 set enable_bitmapscan to true;
@@ -157,8 +160,9 @@ select proname from pg_proc where proname like E'RI\\_FKey%del' order by 1;
    ->  Bitmap Heap Scan on pg_proc
          Filter: (proname ~~ 'RI\_FKey%del'::text)
          ->  Bitmap Index Scan on pg_proc_proname_args_nsp_index
+               Skip scan: All
                Index Cond: ((proname >= 'RI_FKey'::text) AND (proname < 'RI_FKez'::text))
-(6 rows)
+(7 rows)
 
 select proname from pg_proc where proname like E'RI\\_FKey%del' order by 1;
         proname         
@@ -179,8 +183,9 @@ select proname from pg_proc where proname ilike '00%foo' order by 1;
    ->  Bitmap Heap Scan on pg_proc
          Filter: (proname ~~* '00%foo'::text)
          ->  Bitmap Index Scan on pg_proc_proname_args_nsp_index
+               Skip scan: All
                Index Cond: ((proname >= '00'::text) AND (proname < '01'::text))
-(6 rows)
+(7 rows)
 
 select proname from pg_proc where proname ilike '00%foo' order by 1;
  proname 
@@ -192,8 +197,9 @@ select proname from pg_proc where proname ilike 'ri%foo' order by 1;
                            QUERY PLAN                            
 -----------------------------------------------------------------
  Index Only Scan using pg_proc_proname_args_nsp_index on pg_proc
+   Skip scan: All
    Filter: (proname ~~* 'ri%foo'::text)
-(2 rows)
+(3 rows)
 
 reset enable_seqscan;
 reset enable_indexscan;
@@ -240,8 +246,9 @@ select * from btree_bpchar where f1::bpchar like 'foo';
  Bitmap Heap Scan on btree_bpchar
    Filter: ((f1)::bpchar ~~ 'foo'::text)
    ->  Bitmap Index Scan on btree_bpchar_f1_idx
+         Skip scan: All
          Index Cond: ((f1)::bpchar = 'foo'::bpchar)
-(4 rows)
+(5 rows)
 
 select * from btree_bpchar where f1::bpchar like 'foo';
  f1  
@@ -256,8 +263,9 @@ select * from btree_bpchar where f1::bpchar like 'foo%';
  Bitmap Heap Scan on btree_bpchar
    Filter: ((f1)::bpchar ~~ 'foo%'::text)
    ->  Bitmap Index Scan on btree_bpchar_f1_idx
+         Skip scan: All
          Index Cond: (((f1)::bpchar >= 'foo'::bpchar) AND ((f1)::bpchar < 'fop'::bpchar))
-(4 rows)
+(5 rows)
 
 select * from btree_bpchar where f1::bpchar like 'foo%';
   f1  
diff --git a/src/test/regress/expected/cluster.out b/src/test/regress/expected/cluster.out
index bdae8fe00c..e7eeebf7e1 100644
--- a/src/test/regress/expected/cluster.out
+++ b/src/test/regress/expected/cluster.out
@@ -478,8 +478,9 @@ EXPLAIN (COSTS OFF) SELECT * FROM clstr_expression WHERE upper(b) = 'PREFIX3';
                           QUERY PLAN                           
 ---------------------------------------------------------------
  Index Scan using clstr_expression_upper_b on clstr_expression
+   Skip scan: All
    Index Cond: (upper(b) = 'PREFIX3'::text)
-(2 rows)
+(3 rows)
 
 SELECT * FROM clstr_expression WHERE upper(b) = 'PREFIX3';
  id | a |    b    
@@ -491,8 +492,9 @@ EXPLAIN (COSTS OFF) SELECT * FROM clstr_expression WHERE -a = -3 ORDER BY -a, b;
                           QUERY PLAN                           
 ---------------------------------------------------------------
  Index Scan using clstr_expression_minus_a on clstr_expression
+   Skip scan: All
    Index Cond: ((- a) = '-3'::integer)
-(2 rows)
+(3 rows)
 
 SELECT * FROM clstr_expression WHERE -a = -3 ORDER BY -a, b;
  id  | a |     b     
@@ -512,8 +514,9 @@ EXPLAIN (COSTS OFF) SELECT * FROM clstr_expression WHERE upper(b) = 'PREFIX3';
                           QUERY PLAN                           
 ---------------------------------------------------------------
  Index Scan using clstr_expression_upper_b on clstr_expression
+   Skip scan: All
    Index Cond: (upper(b) = 'PREFIX3'::text)
-(2 rows)
+(3 rows)
 
 SELECT * FROM clstr_expression WHERE upper(b) = 'PREFIX3';
  id | a |    b    
@@ -525,8 +528,9 @@ EXPLAIN (COSTS OFF) SELECT * FROM clstr_expression WHERE -a = -3 ORDER BY -a, b;
                           QUERY PLAN                           
 ---------------------------------------------------------------
  Index Scan using clstr_expression_minus_a on clstr_expression
+   Skip scan: All
    Index Cond: ((- a) = '-3'::integer)
-(2 rows)
+(3 rows)
 
 SELECT * FROM clstr_expression WHERE -a = -3 ORDER BY -a, b;
  id  | a |     b     
@@ -546,8 +550,9 @@ EXPLAIN (COSTS OFF) SELECT * FROM clstr_expression WHERE upper(b) = 'PREFIX3';
                           QUERY PLAN                           
 ---------------------------------------------------------------
  Index Scan using clstr_expression_upper_b on clstr_expression
+   Skip scan: All
    Index Cond: (upper(b) = 'PREFIX3'::text)
-(2 rows)
+(3 rows)
 
 SELECT * FROM clstr_expression WHERE upper(b) = 'PREFIX3';
  id | a |    b    
@@ -559,8 +564,9 @@ EXPLAIN (COSTS OFF) SELECT * FROM clstr_expression WHERE -a = -3 ORDER BY -a, b;
                           QUERY PLAN                           
 ---------------------------------------------------------------
  Index Scan using clstr_expression_minus_a on clstr_expression
+   Skip scan: All
    Index Cond: ((- a) = '-3'::integer)
-(2 rows)
+(3 rows)
 
 SELECT * FROM clstr_expression WHERE -a = -3 ORDER BY -a, b;
  id  | a |     b     
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index ae95bb38a6..f8132ecacd 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -1804,12 +1804,15 @@ SELECT * FROM tenk1
    Recheck Cond: (((thousand = 42) AND (tenthous = 1)) OR ((thousand = 42) AND (tenthous = 3)) OR ((thousand = 42) AND (tenthous = 42)))
    ->  BitmapOr
          ->  Bitmap Index Scan on tenk1_thous_tenthous
+               Skip scan: All
                Index Cond: ((thousand = 42) AND (tenthous = 1))
          ->  Bitmap Index Scan on tenk1_thous_tenthous
+               Skip scan: All
                Index Cond: ((thousand = 42) AND (tenthous = 3))
          ->  Bitmap Index Scan on tenk1_thous_tenthous
+               Skip scan: All
                Index Cond: ((thousand = 42) AND (tenthous = 42))
-(9 rows)
+(12 rows)
 
 SELECT * FROM tenk1
   WHERE thousand = 42 AND (tenthous = 1 OR tenthous = 3 OR tenthous = 42);
@@ -1828,13 +1831,16 @@ SELECT count(*) FROM tenk1
          Recheck Cond: ((hundred = 42) AND ((thousand = 42) OR (thousand = 99)))
          ->  BitmapAnd
                ->  Bitmap Index Scan on tenk1_hundred
+                     Skip scan: All
                      Index Cond: (hundred = 42)
                ->  BitmapOr
                      ->  Bitmap Index Scan on tenk1_thous_tenthous
+                           Skip scan: All
                            Index Cond: (thousand = 42)
                      ->  Bitmap Index Scan on tenk1_thous_tenthous
+                           Skip scan: All
                            Index Cond: (thousand = 99)
-(11 rows)
+(14 rows)
 
 SELECT count(*) FROM tenk1
   WHERE hundred = 42 AND (thousand = 42 OR thousand = 99);
@@ -1859,8 +1865,9 @@ EXPLAIN (COSTS OFF)
    ->  Bitmap Heap Scan on dupindexcols
          Recheck Cond: ((f1 >= 'WA'::text) AND (f1 <= 'ZZZ'::text) AND (id < 1000) AND (f1 ~<~ 'YX'::text))
          ->  Bitmap Index Scan on dupindexcols_i
+               Skip scan: All
                Index Cond: ((f1 >= 'WA'::text) AND (f1 <= 'ZZZ'::text) AND (id < 1000) AND (f1 ~<~ 'YX'::text))
-(5 rows)
+(6 rows)
 
 SELECT count(*) FROM dupindexcols
   WHERE f1 BETWEEN 'WA' AND 'ZZZ' and id < 1000 and f1 ~<~ 'YX';
@@ -1880,8 +1887,9 @@ ORDER BY unique1;
                       QUERY PLAN                       
 -------------------------------------------------------
  Index Only Scan using tenk1_unique1 on tenk1
+   Skip scan: All
    Index Cond: (unique1 = ANY ('{1,42,7}'::integer[]))
-(2 rows)
+(3 rows)
 
 SELECT unique1 FROM tenk1
 WHERE unique1 IN (1,42,7)
@@ -1900,9 +1908,10 @@ ORDER BY thousand;
                       QUERY PLAN                       
 -------------------------------------------------------
  Index Only Scan using tenk1_thous_tenthous on tenk1
+   Skip scan: All
    Index Cond: (thousand < 2)
    Filter: (tenthous = ANY ('{1001,3000}'::integer[]))
-(3 rows)
+(4 rows)
 
 SELECT thousand, tenthous FROM tenk1
 WHERE thousand < 2 AND tenthous IN (1001,3000)
@@ -1923,8 +1932,9 @@ ORDER BY thousand;
  Sort
    Sort Key: thousand
    ->  Index Scan using tenk1_thous_tenthous on tenk1
+         Skip scan: All
          Index Cond: ((thousand < 2) AND (tenthous = ANY ('{1001,3000}'::integer[])))
-(4 rows)
+(5 rows)
 
 SELECT thousand, tenthous FROM tenk1
 WHERE thousand < 2 AND tenthous IN (1001,3000)
@@ -1944,8 +1954,9 @@ explain (costs off)
                       QUERY PLAN                      
 ------------------------------------------------------
  Index Scan using tenk1_thous_tenthous on tenk1
+   Skip scan: All
    Index Cond: ((thousand = 1) AND (tenthous = 1001))
-(2 rows)
+(3 rows)
 
 --
 -- Check matching of boolean index columns to WHERE conditions and sort keys
@@ -1957,7 +1968,8 @@ explain (costs off)
 -------------------------------------------------------
  Limit
    ->  Index Scan using boolindex_b_i_key on boolindex
-(2 rows)
+         Skip scan: All
+(3 rows)
 
 explain (costs off)
   select * from boolindex where b order by i limit 10;
@@ -1965,8 +1977,9 @@ explain (costs off)
 -------------------------------------------------------
  Limit
    ->  Index Scan using boolindex_b_i_key on boolindex
+         Skip scan: All
          Index Cond: (b = true)
-(3 rows)
+(4 rows)
 
 explain (costs off)
   select * from boolindex where b = true order by i desc limit 10;
@@ -1974,8 +1987,9 @@ explain (costs off)
 ----------------------------------------------------------------
  Limit
    ->  Index Scan Backward using boolindex_b_i_key on boolindex
+         Skip scan: All
          Index Cond: (b = true)
-(3 rows)
+(4 rows)
 
 explain (costs off)
   select * from boolindex where not b order by i limit 10;
@@ -1983,8 +1997,9 @@ explain (costs off)
 -------------------------------------------------------
  Limit
    ->  Index Scan using boolindex_b_i_key on boolindex
+         Skip scan: All
          Index Cond: (b = false)
-(3 rows)
+(4 rows)
 
 explain (costs off)
   select * from boolindex where b is true order by i desc limit 10;
@@ -1992,8 +2007,9 @@ explain (costs off)
 ----------------------------------------------------------------
  Limit
    ->  Index Scan Backward using boolindex_b_i_key on boolindex
+         Skip scan: All
          Index Cond: (b = true)
-(3 rows)
+(4 rows)
 
 explain (costs off)
   select * from boolindex where b is false order by i desc limit 10;
@@ -2001,8 +2017,9 @@ explain (costs off)
 ----------------------------------------------------------------
  Limit
    ->  Index Scan Backward using boolindex_b_i_key on boolindex
+         Skip scan: All
          Index Cond: (b = false)
-(3 rows)
+(4 rows)
 
 --
 -- REINDEX (VERBOSE)
diff --git a/src/test/regress/expected/equivclass.out b/src/test/regress/expected/equivclass.out
index 126f7047fe..e163b8bf89 100644
--- a/src/test/regress/expected/equivclass.out
+++ b/src/test/regress/expected/equivclass.out
@@ -107,27 +107,30 @@ explain (costs off)
             QUERY PLAN             
 -----------------------------------
  Index Scan using ec0_pkey on ec0
+   Skip scan: All
    Index Cond: (ff = '42'::bigint)
    Filter: (f1 = '42'::bigint)
-(3 rows)
+(4 rows)
 
 explain (costs off)
   select * from ec0 where ff = f1 and f1 = '42'::int8alias1;
               QUERY PLAN               
 ---------------------------------------
  Index Scan using ec0_pkey on ec0
+   Skip scan: All
    Index Cond: (ff = '42'::int8alias1)
    Filter: (f1 = '42'::int8alias1)
-(3 rows)
+(4 rows)
 
 explain (costs off)
   select * from ec1 where ff = f1 and f1 = '42'::int8alias1;
               QUERY PLAN               
 ---------------------------------------
  Index Scan using ec1_pkey on ec1
+   Skip scan: All
    Index Cond: (ff = '42'::int8alias1)
    Filter: (f1 = '42'::int8alias1)
-(3 rows)
+(4 rows)
 
 explain (costs off)
   select * from ec1 where ff = f1 and f1 = '42'::int8alias2;
@@ -144,9 +147,10 @@ explain (costs off)
  Nested Loop
    Join Filter: (ec1.ff = ec2.x1)
    ->  Index Scan using ec1_pkey on ec1
+         Skip scan: All
          Index Cond: ((ff = '42'::bigint) AND (ff = '42'::bigint))
    ->  Seq Scan on ec2
-(5 rows)
+(6 rows)
 
 explain (costs off)
   select * from ec1, ec2 where ff = x1 and ff = '42'::int8alias1;
@@ -154,10 +158,11 @@ explain (costs off)
 ---------------------------------------------
  Nested Loop
    ->  Index Scan using ec1_pkey on ec1
+         Skip scan: All
          Index Cond: (ff = '42'::int8alias1)
    ->  Seq Scan on ec2
          Filter: (x1 = '42'::int8alias1)
-(5 rows)
+(6 rows)
 
 explain (costs off)
   select * from ec1, ec2 where ff = x1 and '42'::int8 = x1;
@@ -166,10 +171,11 @@ explain (costs off)
  Nested Loop
    Join Filter: (ec1.ff = ec2.x1)
    ->  Index Scan using ec1_pkey on ec1
+         Skip scan: All
          Index Cond: (ff = '42'::bigint)
    ->  Seq Scan on ec2
          Filter: ('42'::bigint = x1)
-(6 rows)
+(7 rows)
 
 explain (costs off)
   select * from ec1, ec2 where ff = x1 and x1 = '42'::int8alias1;
@@ -177,10 +183,11 @@ explain (costs off)
 ---------------------------------------------
  Nested Loop
    ->  Index Scan using ec1_pkey on ec1
+         Skip scan: All
          Index Cond: (ff = '42'::int8alias1)
    ->  Seq Scan on ec2
          Filter: (x1 = '42'::int8alias1)
-(5 rows)
+(6 rows)
 
 explain (costs off)
   select * from ec1, ec2 where ff = x1 and x1 = '42'::int8alias2;
@@ -190,8 +197,9 @@ explain (costs off)
    ->  Seq Scan on ec2
          Filter: (x1 = '42'::int8alias2)
    ->  Index Scan using ec1_pkey on ec1
+         Skip scan: All
          Index Cond: (ff = ec2.x1)
-(5 rows)
+(6 rows)
 
 create unique index ec1_expr1 on ec1((ff + 1));
 create unique index ec1_expr2 on ec1((ff + 2 + 1));
@@ -210,15 +218,19 @@ explain (costs off)
 -----------------------------------------------------
  Nested Loop
    ->  Index Scan using ec1_pkey on ec1
+         Skip scan: All
          Index Cond: (ff = '42'::bigint)
    ->  Append
          ->  Index Scan using ec1_expr2 on ec1 ec1_1
+               Skip scan: All
                Index Cond: (((ff + 2) + 1) = ec1.f1)
          ->  Index Scan using ec1_expr3 on ec1 ec1_2
+               Skip scan: All
                Index Cond: (((ff + 3) + 1) = ec1.f1)
          ->  Index Scan using ec1_expr4 on ec1 ec1_3
+               Skip scan: All
                Index Cond: ((ff + 4) = ec1.f1)
-(10 rows)
+(14 rows)
 
 explain (costs off)
   select * from ec1,
@@ -234,16 +246,20 @@ explain (costs off)
  Nested Loop
    Join Filter: ((((ec1_1.ff + 2) + 1)) = ec1.f1)
    ->  Index Scan using ec1_pkey on ec1
+         Skip scan: All
          Index Cond: ((ff = '42'::bigint) AND (ff = '42'::bigint))
          Filter: (ff = f1)
    ->  Append
          ->  Index Scan using ec1_expr2 on ec1 ec1_1
+               Skip scan: All
                Index Cond: (((ff + 2) + 1) = '42'::bigint)
          ->  Index Scan using ec1_expr3 on ec1 ec1_2
+               Skip scan: All
                Index Cond: (((ff + 3) + 1) = '42'::bigint)
          ->  Index Scan using ec1_expr4 on ec1 ec1_3
+               Skip scan: All
                Index Cond: ((ff + 4) = '42'::bigint)
-(12 rows)
+(16 rows)
 
 explain (costs off)
   select * from ec1,
@@ -265,22 +281,29 @@ explain (costs off)
  Nested Loop
    ->  Nested Loop
          ->  Index Scan using ec1_pkey on ec1
+               Skip scan: All
                Index Cond: (ff = '42'::bigint)
          ->  Append
                ->  Index Scan using ec1_expr2 on ec1 ec1_1
+                     Skip scan: All
                      Index Cond: (((ff + 2) + 1) = ec1.f1)
                ->  Index Scan using ec1_expr3 on ec1 ec1_2
+                     Skip scan: All
                      Index Cond: (((ff + 3) + 1) = ec1.f1)
                ->  Index Scan using ec1_expr4 on ec1 ec1_3
+                     Skip scan: All
                      Index Cond: ((ff + 4) = ec1.f1)
    ->  Append
          ->  Index Scan using ec1_expr2 on ec1 ec1_4
+               Skip scan: All
                Index Cond: (((ff + 2) + 1) = (((ec1_1.ff + 2) + 1)))
          ->  Index Scan using ec1_expr3 on ec1 ec1_5
+               Skip scan: All
                Index Cond: (((ff + 3) + 1) = (((ec1_1.ff + 2) + 1)))
          ->  Index Scan using ec1_expr4 on ec1 ec1_6
+               Skip scan: All
                Index Cond: ((ff + 4) = (((ec1_1.ff + 2) + 1)))
-(18 rows)
+(25 rows)
 
 -- let's try that as a mergejoin
 set enable_mergejoin = on;
@@ -307,21 +330,28 @@ explain (costs off)
    ->  Merge Append
          Sort Key: (((ec1_4.ff + 2) + 1))
          ->  Index Scan using ec1_expr2 on ec1 ec1_4
+               Skip scan: All
          ->  Index Scan using ec1_expr3 on ec1 ec1_5
+               Skip scan: All
          ->  Index Scan using ec1_expr4 on ec1 ec1_6
+               Skip scan: All
    ->  Materialize
          ->  Merge Join
                Merge Cond: ((((ec1_1.ff + 2) + 1)) = ec1.f1)
                ->  Merge Append
                      Sort Key: (((ec1_1.ff + 2) + 1))
                      ->  Index Scan using ec1_expr2 on ec1 ec1_1
+                           Skip scan: All
                      ->  Index Scan using ec1_expr3 on ec1 ec1_2
+                           Skip scan: All
                      ->  Index Scan using ec1_expr4 on ec1 ec1_3
+                           Skip scan: All
                ->  Sort
                      Sort Key: ec1.f1 USING <
                      ->  Index Scan using ec1_pkey on ec1
+                           Skip scan: All
                            Index Cond: (ff = '42'::bigint)
-(19 rows)
+(26 rows)
 
 -- check partially indexed scan
 set enable_nestloop = on;
@@ -340,15 +370,18 @@ explain (costs off)
 -----------------------------------------------------
  Nested Loop
    ->  Index Scan using ec1_pkey on ec1
+         Skip scan: All
          Index Cond: (ff = '42'::bigint)
    ->  Append
          ->  Index Scan using ec1_expr2 on ec1 ec1_1
+               Skip scan: All
                Index Cond: (((ff + 2) + 1) = ec1.f1)
          ->  Seq Scan on ec1 ec1_2
                Filter: (((ff + 3) + 1) = ec1.f1)
          ->  Index Scan using ec1_expr4 on ec1 ec1_3
+               Skip scan: All
                Index Cond: ((ff + 4) = ec1.f1)
-(10 rows)
+(13 rows)
 
 -- let's try that as a mergejoin
 set enable_mergejoin = on;
@@ -369,15 +402,18 @@ explain (costs off)
    ->  Merge Append
          Sort Key: (((ec1_1.ff + 2) + 1))
          ->  Index Scan using ec1_expr2 on ec1 ec1_1
+               Skip scan: All
          ->  Sort
                Sort Key: (((ec1_2.ff + 3) + 1))
                ->  Seq Scan on ec1 ec1_2
          ->  Index Scan using ec1_expr4 on ec1 ec1_3
+               Skip scan: All
    ->  Sort
          Sort Key: ec1.f1 USING <
          ->  Index Scan using ec1_pkey on ec1
+               Skip scan: All
                Index Cond: (ff = '42'::bigint)
-(13 rows)
+(16 rows)
 
 -- check effects of row-level security
 set enable_nestloop = on;
@@ -395,10 +431,12 @@ explain (costs off)
 ---------------------------------------------
  Nested Loop
    ->  Index Scan using ec0_pkey on ec0 a
+         Skip scan: All
          Index Cond: (ff = '43'::int8alias1)
    ->  Index Scan using ec1_pkey on ec1 b
+         Skip scan: All
          Index Cond: (ff = '43'::int8alias1)
-(5 rows)
+(7 rows)
 
 set session authorization regress_user_ectest;
 -- with RLS active, the non-leakproof a.ff = 43 clause is not treated
@@ -411,11 +449,13 @@ explain (costs off)
 ---------------------------------------------
  Nested Loop
    ->  Index Scan using ec0_pkey on ec0 a
+         Skip scan: All
          Index Cond: (ff = '43'::int8alias1)
    ->  Index Scan using ec1_pkey on ec1 b
+         Skip scan: All
          Index Cond: (ff = a.ff)
          Filter: (f1 < '5'::int8alias1)
-(6 rows)
+(8 rows)
 
 reset session authorization;
 revoke select on ec0 from regress_user_ectest;
diff --git a/src/test/regress/expected/fast_default.out b/src/test/regress/expected/fast_default.out
index 10bc5ff757..145145b71c 100644
--- a/src/test/regress/expected/fast_default.out
+++ b/src/test/regress/expected/fast_default.out
@@ -431,8 +431,9 @@ DELETE FROM T WHERE pk BETWEEN 10 AND 20 RETURNING *;
          Output: ctid
          Recheck Cond: ((t.pk >= 10) AND (t.pk <= 20))
          ->  Bitmap Index Scan on t_pkey
+               Skip scan: All
                Index Cond: ((t.pk >= 10) AND (t.pk <= 20))
-(7 rows)
+(8 rows)
 
 -- UPDATE
 UPDATE T SET c_text = '"' || c_text || '"'  WHERE pk < 10;
diff --git a/src/test/regress/expected/foreign_key.out b/src/test/regress/expected/foreign_key.out
index 07bd5b6434..d9f5d6bfbe 100644
--- a/src/test/regress/expected/foreign_key.out
+++ b/src/test/regress/expected/foreign_key.out
@@ -1418,14 +1418,16 @@ explain (costs off) delete from t1 where a = 1;
  Delete on t2
    ->  Nested Loop
          ->  Index Scan using t1_pkey on t1
+               Skip scan: All
                Index Cond: (a = 1)
          ->  Seq Scan on t2
                Filter: (b = 1)
  
  Delete on t1
    ->  Index Scan using t1_pkey on t1
+         Skip scan: All
          Index Cond: (a = 1)
-(10 rows)
+(12 rows)
 
 delete from t1 where a = 1;
 -- Test a primary key with attributes located in later attnum positions
diff --git a/src/test/regress/expected/generated.out b/src/test/regress/expected/generated.out
index 620579a6fd..61b267f9aa 100644
--- a/src/test/regress/expected/generated.out
+++ b/src/test/regress/expected/generated.out
@@ -462,8 +462,9 @@ EXPLAIN (COSTS OFF) SELECT * FROM gtest22c WHERE b = 4;
                  QUERY PLAN                  
 ---------------------------------------------
  Index Scan using gtest22c_b_idx on gtest22c
+   Skip scan: All
    Index Cond: (b = 4)
-(2 rows)
+(3 rows)
 
 SELECT * FROM gtest22c WHERE b = 4;
  a | b 
@@ -475,8 +476,9 @@ EXPLAIN (COSTS OFF) SELECT * FROM gtest22c WHERE b * 3 = 6;
                    QUERY PLAN                   
 ------------------------------------------------
  Index Scan using gtest22c_expr_idx on gtest22c
+   Skip scan: All
    Index Cond: ((b * 3) = 6)
-(2 rows)
+(3 rows)
 
 SELECT * FROM gtest22c WHERE b * 3 = 6;
  a | b 
@@ -488,8 +490,9 @@ EXPLAIN (COSTS OFF) SELECT * FROM gtest22c WHERE a = 1 AND b > 0;
                    QUERY PLAN                   
 ------------------------------------------------
  Index Scan using gtest22c_pred_idx on gtest22c
+   Skip scan: All
    Index Cond: (a = 1)
-(2 rows)
+(3 rows)
 
 SELECT * FROM gtest22c WHERE a = 1 AND b > 0;
  a | b 
diff --git a/src/test/regress/expected/groupingsets.out b/src/test/regress/expected/groupingsets.out
index dbe5140b55..164451558c 100644
--- a/src/test/regress/expected/groupingsets.out
+++ b/src/test/regress/expected/groupingsets.out
@@ -466,8 +466,9 @@ explain (costs off)
    InitPlan 1 (returns $0)
      ->  Limit
            ->  Index Only Scan using tenk1_unique1 on tenk1
+                 Skip scan: All
                  Index Cond: (unique1 IS NOT NULL)
-(5 rows)
+(6 rows)
 
 -- Views with GROUPING SET queries
 CREATE VIEW gstest_view AS select a, b, grouping(a,b), sum(c), count(*), max(c)
@@ -1402,7 +1403,8 @@ EXPLAIN (COSTS OFF) SELECT a, b, count(*), max(a), max(b) FROM gstest3 GROUP BY
          Sort Key: b
            Group Key: b
          ->  Index Scan using gstest3_pkey on gstest3
-(8 rows)
+               Skip scan: All
+(9 rows)
 
 SELECT a, b, count(*), max(a), max(b) FROM gstest3 GROUP BY GROUPING SETS(a, b,()) ORDER BY a, b;
  a | b | count | max | max 
diff --git a/src/test/regress/expected/index_including.out b/src/test/regress/expected/index_including.out
index 8e5d53e712..4db90623f9 100644
--- a/src/test/regress/expected/index_including.out
+++ b/src/test/regress/expected/index_including.out
@@ -134,8 +134,9 @@ select * from tbl where (c1,c2,c3) < (2,5,1);
  Bitmap Heap Scan on tbl
    Filter: (ROW(c1, c2, c3) < ROW(2, 5, 1))
    ->  Bitmap Index Scan on covering
+         Skip scan: All
          Index Cond: (ROW(c1, c2) <= ROW(2, 5))
-(4 rows)
+(5 rows)
 
 select * from tbl where (c1,c2,c3) < (2,5,1);
  c1 | c2 | c3 | c4 
@@ -152,9 +153,10 @@ select * from tbl where (c1,c2,c3) < (262,1,1) limit 1;
 ----------------------------------------------------
  Limit
    ->  Index Only Scan using covering on tbl
+         Skip scan: All
          Index Cond: (ROW(c1, c2) <= ROW(262, 1))
          Filter: (ROW(c1, c2, c3) < ROW(262, 1, 1))
-(4 rows)
+(5 rows)
 
 select * from tbl where (c1,c2,c3) < (262,1,1) limit 1;
  c1 | c2 | c3 | c4 
diff --git a/src/test/regress/expected/inet.out b/src/test/regress/expected/inet.out
index 12df25fe9d..3d1fd73fd6 100644
--- a/src/test/regress/expected/inet.out
+++ b/src/test/regress/expected/inet.out
@@ -247,9 +247,10 @@ SELECT * FROM inet_tbl WHERE i<<'192.168.1.0/24'::cidr;
                                   QUERY PLAN                                   
 -------------------------------------------------------------------------------
  Index Scan using inet_idx1 on inet_tbl
+   Skip scan: All
    Index Cond: ((i > '192.168.1.0/24'::inet) AND (i <= '192.168.1.255'::inet))
    Filter: (i << '192.168.1.0/24'::inet)
-(3 rows)
+(4 rows)
 
 SELECT * FROM inet_tbl WHERE i<<'192.168.1.0/24'::cidr;
        c        |        i         
@@ -264,9 +265,10 @@ SELECT * FROM inet_tbl WHERE i<<='192.168.1.0/24'::cidr;
                                    QUERY PLAN                                   
 --------------------------------------------------------------------------------
  Index Scan using inet_idx1 on inet_tbl
+   Skip scan: All
    Index Cond: ((i >= '192.168.1.0/24'::inet) AND (i <= '192.168.1.255'::inet))
    Filter: (i <<= '192.168.1.0/24'::inet)
-(3 rows)
+(4 rows)
 
 SELECT * FROM inet_tbl WHERE i<<='192.168.1.0/24'::cidr;
        c        |        i         
@@ -284,9 +286,10 @@ SELECT * FROM inet_tbl WHERE '192.168.1.0/24'::cidr >>= i;
                                    QUERY PLAN                                   
 --------------------------------------------------------------------------------
  Index Scan using inet_idx1 on inet_tbl
+   Skip scan: All
    Index Cond: ((i >= '192.168.1.0/24'::inet) AND (i <= '192.168.1.255'::inet))
    Filter: ('192.168.1.0/24'::inet >>= i)
-(3 rows)
+(4 rows)
 
 SELECT * FROM inet_tbl WHERE '192.168.1.0/24'::cidr >>= i;
        c        |        i         
@@ -304,9 +307,10 @@ SELECT * FROM inet_tbl WHERE '192.168.1.0/24'::cidr >> i;
                                   QUERY PLAN                                   
 -------------------------------------------------------------------------------
  Index Scan using inet_idx1 on inet_tbl
+   Skip scan: All
    Index Cond: ((i > '192.168.1.0/24'::inet) AND (i <= '192.168.1.255'::inet))
    Filter: ('192.168.1.0/24'::inet >> i)
-(3 rows)
+(4 rows)
 
 SELECT * FROM inet_tbl WHERE '192.168.1.0/24'::cidr >> i;
        c        |        i         
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index dfd0ee414f..2f0be34a15 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1339,12 +1339,15 @@ select * from patest0 join (select f1 from int4_tbl limit 1) ss on id = f1;
          ->  Seq Scan on int4_tbl
    ->  Append
          ->  Index Scan using patest0i on patest0 patest0_1
+               Skip scan: All
                Index Cond: (id = int4_tbl.f1)
          ->  Index Scan using patest1i on patest1 patest0_2
+               Skip scan: All
                Index Cond: (id = int4_tbl.f1)
          ->  Index Scan using patest2i on patest2 patest0_3
+               Skip scan: All
                Index Cond: (id = int4_tbl.f1)
-(10 rows)
+(13 rows)
 
 select * from patest0 join (select f1 from int4_tbl limit 1) ss on id = f1;
  id | x | f1 
@@ -1364,12 +1367,14 @@ select * from patest0 join (select f1 from int4_tbl limit 1) ss on id = f1;
          ->  Seq Scan on int4_tbl
    ->  Append
          ->  Index Scan using patest0i on patest0 patest0_1
+               Skip scan: All
                Index Cond: (id = int4_tbl.f1)
          ->  Index Scan using patest1i on patest1 patest0_2
+               Skip scan: All
                Index Cond: (id = int4_tbl.f1)
          ->  Seq Scan on patest2 patest0_3
                Filter: (int4_tbl.f1 = id)
-(10 rows)
+(12 rows)
 
 select * from patest0 join (select f1 from int4_tbl limit 1) ss on id = f1;
  id | x | f1 
@@ -1466,8 +1471,10 @@ explain (verbose, costs off) select * from matest0 order by 1-id;
    Sort Key: ((1 - matest0.id))
    ->  Index Scan using matest0i on public.matest0 matest0_1
          Output: matest0_1.id, matest0_1.name, (1 - matest0_1.id)
+         Skip scan: All
    ->  Index Scan using matest1i on public.matest1 matest0_2
          Output: matest0_2.id, matest0_2.name, (1 - matest0_2.id)
+         Skip scan: All
    ->  Sort
          Output: matest0_3.id, matest0_3.name, ((1 - matest0_3.id))
          Sort Key: ((1 - matest0_3.id))
@@ -1475,7 +1482,8 @@ explain (verbose, costs off) select * from matest0 order by 1-id;
                Output: matest0_3.id, matest0_3.name, (1 - matest0_3.id)
    ->  Index Scan using matest3i on public.matest3 matest0_4
          Output: matest0_4.id, matest0_4.name, (1 - matest0_4.id)
-(13 rows)
+         Skip scan: All
+(16 rows)
 
 select * from matest0 order by 1-id;
  id |  name  
@@ -1502,9 +1510,11 @@ explain (verbose, costs off) select min(1-id) from matest0;
                        Sort Key: ((1 - matest0.id))
                        ->  Index Scan using matest0i on public.matest0 matest0_1
                              Output: matest0_1.id, (1 - matest0_1.id)
+                             Skip scan: All
                              Index Cond: ((1 - matest0_1.id) IS NOT NULL)
                        ->  Index Scan using matest1i on public.matest1 matest0_2
                              Output: matest0_2.id, (1 - matest0_2.id)
+                             Skip scan: All
                              Index Cond: ((1 - matest0_2.id) IS NOT NULL)
                        ->  Sort
                              Output: matest0_3.id, ((1 - matest0_3.id))
@@ -1513,10 +1523,12 @@ explain (verbose, costs off) select min(1-id) from matest0;
                                    Output: matest0_3.id, (1 - matest0_3.id)
                                    Filter: ((1 - matest0_3.id) IS NOT NULL)
                                    ->  Bitmap Index Scan on matest2_pkey
+                                         Skip scan: All
                        ->  Index Scan using matest3i on public.matest3 matest0_4
                              Output: matest0_4.id, (1 - matest0_4.id)
+                             Skip scan: All
                              Index Cond: ((1 - matest0_4.id) IS NOT NULL)
-(25 rows)
+(29 rows)
 
 select min(1-id) from matest0;
  min 
@@ -1552,15 +1564,19 @@ order by t1.b limit 10;
          ->  Merge Append
                Sort Key: t1.b
                ->  Index Scan using matest0i on matest0 t1_1
+                     Skip scan: All
                ->  Index Scan using matest1i on matest1 t1_2
+                     Skip scan: All
          ->  Materialize
                ->  Merge Append
                      Sort Key: t2.b
                      ->  Index Scan using matest0i on matest0 t2_1
+                           Skip scan: All
                            Filter: (c = d)
                      ->  Index Scan using matest1i on matest1 t2_2
+                           Skip scan: All
                            Filter: (c = d)
-(14 rows)
+(18 rows)
 
 reset enable_nestloop;
 drop table matest0 cascade;
@@ -1582,10 +1598,12 @@ ORDER BY thousand, tenthous;
  Merge Append
    Sort Key: tenk1.thousand, tenk1.tenthous
    ->  Index Only Scan using tenk1_thous_tenthous on tenk1
+         Skip scan: All
    ->  Sort
          Sort Key: tenk1_1.thousand, tenk1_1.thousand
          ->  Index Only Scan using tenk1_thous_tenthous on tenk1 tenk1_1
-(6 rows)
+               Skip scan: All
+(8 rows)
 
 explain (costs off)
 SELECT thousand, tenthous, thousand+tenthous AS x FROM tenk1
@@ -1597,10 +1615,12 @@ ORDER BY thousand, tenthous;
  Merge Append
    Sort Key: tenk1.thousand, tenk1.tenthous
    ->  Index Only Scan using tenk1_thous_tenthous on tenk1
+         Skip scan: All
    ->  Sort
          Sort Key: 42, 42
          ->  Index Only Scan using tenk1_hundred on tenk1 tenk1_1
-(6 rows)
+               Skip scan: All
+(8 rows)
 
 explain (costs off)
 SELECT thousand, tenthous FROM tenk1
@@ -1612,10 +1632,12 @@ ORDER BY thousand, tenthous;
  Merge Append
    Sort Key: tenk1.thousand, tenk1.tenthous
    ->  Index Only Scan using tenk1_thous_tenthous on tenk1
+         Skip scan: All
    ->  Sort
          Sort Key: tenk1_1.thousand, ((random())::integer)
          ->  Index Only Scan using tenk1_thous_tenthous on tenk1 tenk1_1
-(6 rows)
+               Skip scan: All
+(8 rows)
 
 -- Check min/max aggregate optimization
 explain (costs off)
@@ -1631,10 +1653,12 @@ SELECT min(x) FROM
            ->  Merge Append
                  Sort Key: a.unique1
                  ->  Index Only Scan using tenk1_unique1 on tenk1 a
+                       Skip scan: All
                        Index Cond: (unique1 IS NOT NULL)
                  ->  Index Only Scan using tenk1_unique2 on tenk1 b
+                       Skip scan: All
                        Index Cond: (unique2 IS NOT NULL)
-(9 rows)
+(11 rows)
 
 explain (costs off)
 SELECT min(y) FROM
@@ -1649,10 +1673,12 @@ SELECT min(y) FROM
            ->  Merge Append
                  Sort Key: a.unique1
                  ->  Index Only Scan using tenk1_unique1 on tenk1 a
+                       Skip scan: All
                        Index Cond: (unique1 IS NOT NULL)
                  ->  Index Only Scan using tenk1_unique2 on tenk1 b
+                       Skip scan: All
                        Index Cond: (unique2 IS NOT NULL)
-(9 rows)
+(11 rows)
 
 -- XXX planner doesn't recognize that index on unique2 is sufficiently sorted
 explain (costs off)
@@ -1666,10 +1692,12 @@ ORDER BY x, y;
  Merge Append
    Sort Key: a.thousand, a.tenthous
    ->  Index Only Scan using tenk1_thous_tenthous on tenk1 a
+         Skip scan: All
    ->  Sort
          Sort Key: b.unique2, b.unique2
          ->  Index Only Scan using tenk1_unique2 on tenk1 b
-(6 rows)
+               Skip scan: All
+(8 rows)
 
 -- exercise rescan code path via a repeatedly-evaluated subquery
 explain (costs off)
@@ -2069,12 +2097,14 @@ explain (costs off) select min(a), max(a) from parted_minmax where b = '12345';
    InitPlan 1 (returns $0)
      ->  Limit
            ->  Index Only Scan using parted_minmax1i on parted_minmax1 parted_minmax
+                 Skip scan: All
                  Index Cond: ((a IS NOT NULL) AND (b = '12345'::text))
    InitPlan 2 (returns $1)
      ->  Limit
            ->  Index Only Scan Backward using parted_minmax1i on parted_minmax1 parted_minmax_1
+                 Skip scan: All
                  Index Cond: ((a IS NOT NULL) AND (b = '12345'::text))
-(9 rows)
+(11 rows)
 
 select min(a), max(a) from parted_minmax where b = '12345';
  min | max 
@@ -2093,13 +2123,20 @@ explain (costs off) select * from mcrparted order by a, abs(b), c;
  Merge Append
    Sort Key: mcrparted.a, (abs(mcrparted.b)), mcrparted.c
    ->  Index Scan using mcrparted0_a_abs_c_idx on mcrparted0 mcrparted_1
+         Skip scan: All
    ->  Index Scan using mcrparted1_a_abs_c_idx on mcrparted1 mcrparted_2
+         Skip scan: All
    ->  Index Scan using mcrparted2_a_abs_c_idx on mcrparted2 mcrparted_3
+         Skip scan: All
    ->  Index Scan using mcrparted3_a_abs_c_idx on mcrparted3 mcrparted_4
+         Skip scan: All
    ->  Index Scan using mcrparted4_a_abs_c_idx on mcrparted4 mcrparted_5
+         Skip scan: All
    ->  Index Scan using mcrparted5_a_abs_c_idx on mcrparted5 mcrparted_6
+         Skip scan: All
    ->  Index Scan using mcrparted_def_a_abs_c_idx on mcrparted_def mcrparted_7
-(9 rows)
+         Skip scan: All
+(16 rows)
 
 drop table mcrparted_def;
 -- Append is used for a RANGE partitioned table with no default
@@ -2109,12 +2146,18 @@ explain (costs off) select * from mcrparted order by a, abs(b), c;
 -------------------------------------------------------------------------
  Append
    ->  Index Scan using mcrparted0_a_abs_c_idx on mcrparted0 mcrparted_1
+         Skip scan: All
    ->  Index Scan using mcrparted1_a_abs_c_idx on mcrparted1 mcrparted_2
+         Skip scan: All
    ->  Index Scan using mcrparted2_a_abs_c_idx on mcrparted2 mcrparted_3
+         Skip scan: All
    ->  Index Scan using mcrparted3_a_abs_c_idx on mcrparted3 mcrparted_4
+         Skip scan: All
    ->  Index Scan using mcrparted4_a_abs_c_idx on mcrparted4 mcrparted_5
+         Skip scan: All
    ->  Index Scan using mcrparted5_a_abs_c_idx on mcrparted5 mcrparted_6
-(7 rows)
+         Skip scan: All
+(13 rows)
 
 -- Append is used with subpaths in reverse order with backwards index scans
 explain (costs off) select * from mcrparted order by a desc, abs(b) desc, c desc;
@@ -2122,12 +2165,18 @@ explain (costs off) select * from mcrparted order by a desc, abs(b) desc, c desc
 ----------------------------------------------------------------------------------
  Append
    ->  Index Scan Backward using mcrparted5_a_abs_c_idx on mcrparted5 mcrparted_6
+         Skip scan: All
    ->  Index Scan Backward using mcrparted4_a_abs_c_idx on mcrparted4 mcrparted_5
+         Skip scan: All
    ->  Index Scan Backward using mcrparted3_a_abs_c_idx on mcrparted3 mcrparted_4
+         Skip scan: All
    ->  Index Scan Backward using mcrparted2_a_abs_c_idx on mcrparted2 mcrparted_3
+         Skip scan: All
    ->  Index Scan Backward using mcrparted1_a_abs_c_idx on mcrparted1 mcrparted_2
+         Skip scan: All
    ->  Index Scan Backward using mcrparted0_a_abs_c_idx on mcrparted0 mcrparted_1
-(7 rows)
+         Skip scan: All
+(13 rows)
 
 -- check that Append plan is used containing a MergeAppend for sub-partitions
 -- that are unordered.
@@ -2140,15 +2189,22 @@ explain (costs off) select * from mcrparted order by a, abs(b), c;
 ---------------------------------------------------------------------------------------
  Append
    ->  Index Scan using mcrparted0_a_abs_c_idx on mcrparted0 mcrparted_1
+         Skip scan: All
    ->  Index Scan using mcrparted1_a_abs_c_idx on mcrparted1 mcrparted_2
+         Skip scan: All
    ->  Index Scan using mcrparted2_a_abs_c_idx on mcrparted2 mcrparted_3
+         Skip scan: All
    ->  Index Scan using mcrparted3_a_abs_c_idx on mcrparted3 mcrparted_4
+         Skip scan: All
    ->  Index Scan using mcrparted4_a_abs_c_idx on mcrparted4 mcrparted_5
+         Skip scan: All
    ->  Merge Append
          Sort Key: mcrparted_7.a, (abs(mcrparted_7.b)), mcrparted_7.c
          ->  Index Scan using mcrparted5a_a_abs_c_idx on mcrparted5a mcrparted_7
+               Skip scan: All
          ->  Index Scan using mcrparted5_def_a_abs_c_idx on mcrparted5_def mcrparted_8
-(10 rows)
+               Skip scan: All
+(17 rows)
 
 drop table mcrparted5_def;
 -- check that an Append plan is used and the sub-partitions are flattened
@@ -2159,12 +2215,18 @@ explain (costs off) select a, abs(b) from mcrparted order by a, abs(b), c;
 ---------------------------------------------------------------------------
  Append
    ->  Index Scan using mcrparted0_a_abs_c_idx on mcrparted0 mcrparted_1
+         Skip scan: All
    ->  Index Scan using mcrparted1_a_abs_c_idx on mcrparted1 mcrparted_2
+         Skip scan: All
    ->  Index Scan using mcrparted2_a_abs_c_idx on mcrparted2 mcrparted_3
+         Skip scan: All
    ->  Index Scan using mcrparted3_a_abs_c_idx on mcrparted3 mcrparted_4
+         Skip scan: All
    ->  Index Scan using mcrparted4_a_abs_c_idx on mcrparted4 mcrparted_5
+         Skip scan: All
    ->  Index Scan using mcrparted5a_a_abs_c_idx on mcrparted5a mcrparted_6
-(7 rows)
+         Skip scan: All
+(13 rows)
 
 -- check that Append is used when the sub-partitioned tables are pruned
 -- during planning.
@@ -2173,14 +2235,18 @@ explain (costs off) select * from mcrparted where a < 20 order by a, abs(b), c;
 -------------------------------------------------------------------------
  Append
    ->  Index Scan using mcrparted0_a_abs_c_idx on mcrparted0 mcrparted_1
+         Skip scan: All
          Index Cond: (a < 20)
    ->  Index Scan using mcrparted1_a_abs_c_idx on mcrparted1 mcrparted_2
+         Skip scan: All
          Index Cond: (a < 20)
    ->  Index Scan using mcrparted2_a_abs_c_idx on mcrparted2 mcrparted_3
+         Skip scan: All
          Index Cond: (a < 20)
    ->  Index Scan using mcrparted3_a_abs_c_idx on mcrparted3 mcrparted_4
+         Skip scan: All
          Index Cond: (a < 20)
-(9 rows)
+(13 rows)
 
 create table mclparted (a int) partition by list(a);
 create table mclparted1 partition of mclparted for values in(1);
@@ -2192,8 +2258,10 @@ explain (costs off) select * from mclparted order by a;
 ------------------------------------------------------------------------
  Append
    ->  Index Only Scan using mclparted1_a_idx on mclparted1 mclparted_1
+         Skip scan: All
    ->  Index Only Scan using mclparted2_a_idx on mclparted2 mclparted_2
-(3 rows)
+         Skip scan: All
+(5 rows)
 
 -- Ensure a MergeAppend is used when a partition exists with interleaved
 -- datums in the partition bound.
@@ -2205,10 +2273,14 @@ explain (costs off) select * from mclparted order by a;
  Merge Append
    Sort Key: mclparted.a
    ->  Index Only Scan using mclparted1_a_idx on mclparted1 mclparted_1
+         Skip scan: All
    ->  Index Only Scan using mclparted2_a_idx on mclparted2 mclparted_2
+         Skip scan: All
    ->  Index Only Scan using mclparted3_5_a_idx on mclparted3_5 mclparted_3
+         Skip scan: All
    ->  Index Only Scan using mclparted4_a_idx on mclparted4 mclparted_4
-(6 rows)
+         Skip scan: All
+(10 rows)
 
 drop table mclparted;
 -- Ensure subplans which don't have a path with the correct pathkeys get
@@ -2228,12 +2300,15 @@ explain (costs off) select * from mcrparted where a < 20 order by a, abs(b), c l
                ->  Seq Scan on mcrparted0 mcrparted_1
                      Filter: (a < 20)
          ->  Index Scan using mcrparted1_a_abs_c_idx on mcrparted1 mcrparted_2
+               Skip scan: All
                Index Cond: (a < 20)
          ->  Index Scan using mcrparted2_a_abs_c_idx on mcrparted2 mcrparted_3
+               Skip scan: All
                Index Cond: (a < 20)
          ->  Index Scan using mcrparted3_a_abs_c_idx on mcrparted3 mcrparted_4
+               Skip scan: All
                Index Cond: (a < 20)
-(12 rows)
+(15 rows)
 
 set enable_bitmapscan = 0;
 -- Ensure Append node can be used when the partition is ordered by some
@@ -2243,10 +2318,12 @@ explain (costs off) select * from mcrparted where a = 10 order by a, abs(b), c;
 -------------------------------------------------------------------------
  Append
    ->  Index Scan using mcrparted1_a_abs_c_idx on mcrparted1 mcrparted_1
+         Skip scan: All
          Index Cond: (a = 10)
    ->  Index Scan using mcrparted2_a_abs_c_idx on mcrparted2 mcrparted_2
+         Skip scan: All
          Index Cond: (a = 10)
-(5 rows)
+(7 rows)
 
 reset enable_bitmapscan;
 drop table mcrparted;
@@ -2260,8 +2337,10 @@ explain (costs off) select * from bool_lp order by b;
 ----------------------------------------------------------------------------
  Append
    ->  Index Only Scan using bool_lp_false_b_idx on bool_lp_false bool_lp_1
+         Skip scan: All
    ->  Index Only Scan using bool_lp_true_b_idx on bool_lp_true bool_lp_2
-(3 rows)
+         Skip scan: All
+(5 rows)
 
 drop table bool_lp;
 -- Ensure const bool quals can be properly detected as redundant
@@ -2276,40 +2355,48 @@ explain (costs off) select * from bool_rp where b = true order by b,a;
 ----------------------------------------------------------------------------------
  Append
    ->  Index Only Scan using bool_rp_true_1k_b_a_idx on bool_rp_true_1k bool_rp_1
+         Skip scan: All
          Index Cond: (b = true)
    ->  Index Only Scan using bool_rp_true_2k_b_a_idx on bool_rp_true_2k bool_rp_2
+         Skip scan: All
          Index Cond: (b = true)
-(5 rows)
+(7 rows)
 
 explain (costs off) select * from bool_rp where b = false order by b,a;
                                      QUERY PLAN                                     
 ------------------------------------------------------------------------------------
  Append
    ->  Index Only Scan using bool_rp_false_1k_b_a_idx on bool_rp_false_1k bool_rp_1
+         Skip scan: All
          Index Cond: (b = false)
    ->  Index Only Scan using bool_rp_false_2k_b_a_idx on bool_rp_false_2k bool_rp_2
+         Skip scan: All
          Index Cond: (b = false)
-(5 rows)
+(7 rows)
 
 explain (costs off) select * from bool_rp where b = true order by a;
                                     QUERY PLAN                                    
 ----------------------------------------------------------------------------------
  Append
    ->  Index Only Scan using bool_rp_true_1k_b_a_idx on bool_rp_true_1k bool_rp_1
+         Skip scan: All
          Index Cond: (b = true)
    ->  Index Only Scan using bool_rp_true_2k_b_a_idx on bool_rp_true_2k bool_rp_2
+         Skip scan: All
          Index Cond: (b = true)
-(5 rows)
+(7 rows)
 
 explain (costs off) select * from bool_rp where b = false order by a;
                                      QUERY PLAN                                     
 ------------------------------------------------------------------------------------
  Append
    ->  Index Only Scan using bool_rp_false_1k_b_a_idx on bool_rp_false_1k bool_rp_1
+         Skip scan: All
          Index Cond: (b = false)
    ->  Index Only Scan using bool_rp_false_2k_b_a_idx on bool_rp_false_2k bool_rp_2
+         Skip scan: All
          Index Cond: (b = false)
-(5 rows)
+(7 rows)
 
 drop table bool_rp;
 -- Ensure an Append scan is chosen when the partition order is a subset of
@@ -2323,16 +2410,20 @@ explain (costs off) select * from range_parted order by a,b,c;
 -------------------------------------------------------------------------------------
  Append
    ->  Index Only Scan using range_parted1_a_b_c_idx on range_parted1 range_parted_1
+         Skip scan: All
    ->  Index Only Scan using range_parted2_a_b_c_idx on range_parted2 range_parted_2
-(3 rows)
+         Skip scan: All
+(5 rows)
 
 explain (costs off) select * from range_parted order by a desc,b desc,c desc;
                                           QUERY PLAN                                          
 ----------------------------------------------------------------------------------------------
  Append
    ->  Index Only Scan Backward using range_parted2_a_b_c_idx on range_parted2 range_parted_2
+         Skip scan: All
    ->  Index Only Scan Backward using range_parted1_a_b_c_idx on range_parted1 range_parted_1
-(3 rows)
+         Skip scan: All
+(5 rows)
 
 drop table range_parted;
 -- Check that we allow access to a child table's statistics when the user
diff --git a/src/test/regress/expected/insert_conflict.out b/src/test/regress/expected/insert_conflict.out
index 1338b2b23e..e7eae05abf 100644
--- a/src/test/regress/expected/insert_conflict.out
+++ b/src/test/regress/expected/insert_conflict.out
@@ -54,10 +54,11 @@ explain (costs off) insert into insertconflicttest values(0, 'Crowberry') on con
    ->  Result
    SubPlan 1
      ->  Index Only Scan using both_index_expr_key on insertconflicttest ii
+           Skip scan: All
            Index Cond: (key = excluded.key)
    SubPlan 2
      ->  Seq Scan on insertconflicttest ii_1
-(10 rows)
+(11 rows)
 
 -- Neither collation nor operator class specifications are required --
 -- supplying them merely *limits* matches to indexes with matching opclasses
diff --git a/src/test/regress/expected/interval.out b/src/test/regress/expected/interval.out
index f772909e49..09131c5933 100644
--- a/src/test/regress/expected/interval.out
+++ b/src/test/regress/expected/interval.out
@@ -260,7 +260,8 @@ SELECT f1 FROM INTERVAL_TBL_OF r1 ORDER BY f1;
                              QUERY PLAN                             
 --------------------------------------------------------------------
  Index Only Scan using interval_tbl_of_f1_idx on interval_tbl_of r1
-(1 row)
+   Skip scan: All
+(2 rows)
 
 SELECT f1 FROM INTERVAL_TBL_OF r1 ORDER BY f1;
                     f1                     
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 761376b007..789cb32585 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -1857,16 +1857,16 @@ select * from int4_tbl i4, tenk1 a
 where exists(select * from tenk1 b
              where a.twothousand = b.twothousand and a.fivethous <> b.fivethous)
       and i4.f1 = a.tenthous;
-                  QUERY PLAN                  
-----------------------------------------------
+                          QUERY PLAN                          
+--------------------------------------------------------------
  Hash Semi Join
    Hash Cond: (a.twothousand = b.twothousand)
    Join Filter: (a.fivethous <> b.fivethous)
-   ->  Hash Join
-         Hash Cond: (a.tenthous = i4.f1)
-         ->  Seq Scan on tenk1 a
-         ->  Hash
-               ->  Seq Scan on int4_tbl i4
+   ->  Nested Loop
+         ->  Seq Scan on int4_tbl i4
+         ->  Index Scan using tenk1_thous_tenthous on tenk1 a
+               Skip scan: All
+               Index Cond: (tenthous = i4.f1)
    ->  Hash
          ->  Seq Scan on tenk1 b
 (10 rows)
@@ -2293,8 +2293,9 @@ where b.f1 = t.thousand and a.f1 = b.f1 and (a.f1+b.f1+999) = t.tenthous;
          ->  Aggregate
                ->  Seq Scan on int4_tbl i4a
          ->  Index Only Scan using tenk1_thous_tenthous on tenk1 t
+               Skip scan: All
                Index Cond: ((thousand = (sum(i4b.f1))) AND (tenthous = ((((sum(i4a.f1) + 1)) + (sum(i4b.f1))) + 999)))
-(9 rows)
+(10 rows)
 
 select a.f1, b.f1, t.thousand, t.tenthous from
   tenk1 t,
@@ -2373,7 +2374,8 @@ select count(*) from
                ->  Seq Scan on tenk1 x
          ->  Materialize
                ->  Index Scan using tenk1_unique2 on tenk1 y
-(9 rows)
+                     Skip scan: All
+(10 rows)
 
 select count(*) from
   (select * from tenk1 x order by x.thousand, x.twothousand, x.fivethous) x
@@ -2493,10 +2495,11 @@ select count(*) from tenk1 a, tenk1 b
    ->  Hash Join
          Hash Cond: (a.hundred = b.thousand)
          ->  Index Only Scan using tenk1_hundred on tenk1 a
+               Skip scan: All
          ->  Hash
                ->  Seq Scan on tenk1 b
                      Filter: ((fivethous % 10) < 10)
-(7 rows)
+(8 rows)
 
 select count(*) from tenk1 a, tenk1 b
   where a.hundred = b.thousand and (b.fivethous % 10) < 10;
@@ -2702,9 +2705,11 @@ select a.idv, b.idv from tidv a, tidv b where a.idv = b.idv;
  Merge Join
    Merge Cond: (a.idv = b.idv)
    ->  Index Only Scan using tidv_idv_idx on tidv a
+         Skip scan: All
    ->  Materialize
          ->  Index Only Scan using tidv_idv_idx on tidv b
-(5 rows)
+               Skip scan: All
+(7 rows)
 
 set enable_mergejoin = 0;
 explain (costs off)
@@ -2714,8 +2719,9 @@ select a.idv, b.idv from tidv a, tidv b where a.idv = b.idv;
  Nested Loop
    ->  Seq Scan on tidv a
    ->  Index Only Scan using tidv_idv_idx on tidv b
+         Skip scan: All
          Index Cond: (idv = a.idv)
-(4 rows)
+(5 rows)
 
 rollback;
 --
@@ -2874,8 +2880,9 @@ SELECT qq, unique1
          ->  Hash
                ->  Seq Scan on int8_tbl b
    ->  Index Scan using tenk1_unique2 on tenk1 c
+         Skip scan: All
          Index Cond: (unique2 = COALESCE((COALESCE(a.q1, '0'::bigint)), (COALESCE(b.q2, '-1'::bigint))))
-(8 rows)
+(9 rows)
 
 SELECT qq, unique1
   FROM
@@ -2938,13 +2945,16 @@ where nt3.id = 1 and ss2.b3;
  Nested Loop
    ->  Nested Loop
          ->  Index Scan using nt3_pkey on nt3
+               Skip scan: All
                Index Cond: (id = 1)
          ->  Index Scan using nt2_pkey on nt2
+               Skip scan: All
                Index Cond: (id = nt3.nt2_id)
    ->  Index Only Scan using nt1_pkey on nt1
+         Skip scan: All
          Index Cond: (id = nt2.nt1_id)
          Filter: (nt2.b1 AND (id IS NOT NULL))
-(9 rows)
+(12 rows)
 
 select nt3.id
 from nt3 as nt3
@@ -3081,12 +3091,14 @@ where q1 = thousand or q2 = thousand;
                Recheck Cond: ((q1.q1 = thousand) OR (q2.q2 = thousand))
                ->  BitmapOr
                      ->  Bitmap Index Scan on tenk1_thous_tenthous
+                           Skip scan: All
                            Index Cond: (thousand = q1.q1)
                      ->  Bitmap Index Scan on tenk1_thous_tenthous
+                           Skip scan: All
                            Index Cond: (thousand = q2.q2)
    ->  Hash
          ->  Seq Scan on int4_tbl
-(15 rows)
+(17 rows)
 
 explain (costs off)
 select * from
@@ -3104,10 +3116,11 @@ where thousand = (q1 + q2);
          ->  Bitmap Heap Scan on tenk1
                Recheck Cond: (thousand = (q1.q1 + q2.q2))
                ->  Bitmap Index Scan on tenk1_thous_tenthous
+                     Skip scan: All
                      Index Cond: (thousand = (q1.q1 + q2.q2))
    ->  Hash
          ->  Seq Scan on int4_tbl
-(12 rows)
+(13 rows)
 
 --
 -- test ability to generate a suitable plan for a star-schema query
@@ -3116,17 +3129,19 @@ explain (costs off)
 select * from
   tenk1, int8_tbl a, int8_tbl b
 where thousand = a.q1 and tenthous = b.q1 and a.q2 = 1 and b.q2 = 2;
-                             QUERY PLAN                              
----------------------------------------------------------------------
+                         QUERY PLAN                         
+------------------------------------------------------------
  Nested Loop
-   ->  Seq Scan on int8_tbl b
-         Filter: (q2 = 2)
+   Join Filter: (tenk1.thousand = a.q1)
    ->  Nested Loop
-         ->  Seq Scan on int8_tbl a
-               Filter: (q2 = 1)
+         ->  Seq Scan on int8_tbl b
+               Filter: (q2 = 2)
          ->  Index Scan using tenk1_thous_tenthous on tenk1
-               Index Cond: ((thousand = a.q1) AND (tenthous = b.q1))
-(8 rows)
+               Skip scan: All
+               Index Cond: (tenthous = b.q1)
+   ->  Seq Scan on int8_tbl a
+         Filter: (q2 = 1)
+(10 rows)
 
 --
 -- test a corner case in which we shouldn't apply the star-schema optimization
@@ -3154,12 +3169,14 @@ where t1.unique2 < 42 and t1.stringu1 > t2.stringu2;
                      ->  Seq Scan on onerow
                      ->  Seq Scan on onerow onerow_1
                ->  Index Scan using tenk1_unique2 on tenk1 t1
+                     Skip scan: All
                      Index Cond: ((unique2 = (11)) AND (unique2 < 42))
          ->  Index Scan using tenk1_unique1 on tenk1 t2
+               Skip scan: All
                Index Cond: (unique1 = (3))
    ->  Seq Scan on int4_tbl i1
          Filter: (f1 = 0)
-(13 rows)
+(15 rows)
 
 select t1.unique2, t1.stringu1, t2.unique1, t2.stringu2 from
   tenk1 t1
@@ -3220,10 +3237,12 @@ where t1.unique2 < 42 and t1.stringu1 > t2.stringu2;
          ->  Seq Scan on int4_tbl i1
                Filter: (f1 = 0)
          ->  Index Scan using tenk1_unique2 on tenk1 t1
+               Skip scan: All
                Index Cond: ((unique2 = (11)) AND (unique2 < 42))
    ->  Index Scan using tenk1_unique1 on tenk1 t2
+         Skip scan: All
          Index Cond: (unique1 = (3))
-(9 rows)
+(11 rows)
 
 select t1.unique2, t1.stringu1, t2.unique1, t2.stringu2 from
   tenk1 t1
@@ -3280,8 +3299,9 @@ where x = unique1;
                   QUERY PLAN                  
 ----------------------------------------------
  Index Only Scan using tenk1_unique1 on tenk1
+   Skip scan: All
    Index Cond: (unique1 = 1)
-(2 rows)
+(3 rows)
 
 explain (verbose, costs off)
 select unique1, x.*
@@ -3295,32 +3315,36 @@ where x = unique1;
          Output: 1, random()
    ->  Index Only Scan using tenk1_unique1 on public.tenk1
          Output: tenk1.unique1
+         Skip scan: All
          Index Cond: (tenk1.unique1 = (1))
-(7 rows)
+(8 rows)
 
 explain (costs off)
 select unique1 from tenk1, f_immutable_int4(1) x where x = unique1;
                   QUERY PLAN                  
 ----------------------------------------------
  Index Only Scan using tenk1_unique1 on tenk1
+   Skip scan: All
    Index Cond: (unique1 = 1)
-(2 rows)
+(3 rows)
 
 explain (costs off)
 select unique1 from tenk1, lateral f_immutable_int4(1) x where x = unique1;
                   QUERY PLAN                  
 ----------------------------------------------
  Index Only Scan using tenk1_unique1 on tenk1
+   Skip scan: All
    Index Cond: (unique1 = 1)
-(2 rows)
+(3 rows)
 
 explain (costs off)
 select unique1, x from tenk1 join f_immutable_int4(1) x on unique1 = x;
                   QUERY PLAN                  
 ----------------------------------------------
  Index Only Scan using tenk1_unique1 on tenk1
+   Skip scan: All
    Index Cond: (unique1 = 1)
-(2 rows)
+(3 rows)
 
 explain (costs off)
 select unique1, x from tenk1 left join f_immutable_int4(1) x on unique1 = x;
@@ -3329,9 +3353,10 @@ select unique1, x from tenk1 left join f_immutable_int4(1) x on unique1 = x;
  Nested Loop Left Join
    Join Filter: (tenk1.unique1 = 1)
    ->  Index Only Scan using tenk1_unique1 on tenk1
+         Skip scan: All
    ->  Materialize
          ->  Result
-(5 rows)
+(6 rows)
 
 explain (costs off)
 select unique1, x from tenk1 right join f_immutable_int4(1) x on unique1 = x;
@@ -3340,8 +3365,9 @@ select unique1, x from tenk1 right join f_immutable_int4(1) x on unique1 = x;
  Nested Loop Left Join
    ->  Result
    ->  Index Only Scan using tenk1_unique1 on tenk1
+         Skip scan: All
          Index Cond: (unique1 = 1)
-(4 rows)
+(5 rows)
 
 explain (costs off)
 select unique1, x from tenk1 full join f_immutable_int4(1) x on unique1 = x;
@@ -3350,10 +3376,11 @@ select unique1, x from tenk1 full join f_immutable_int4(1) x on unique1 = x;
  Merge Full Join
    Merge Cond: (tenk1.unique1 = (1))
    ->  Index Only Scan using tenk1_unique1 on tenk1
+         Skip scan: All
    ->  Sort
          Sort Key: (1)
          ->  Result
-(6 rows)
+(7 rows)
 
 -- check that pullup of a const function allows further const-folding
 explain (costs off)
@@ -3382,13 +3409,15 @@ where nt3.id = 1 and ss2.b3;
  Nested Loop Left Join
    Filter: ((nt2.b1 OR ((0) = 42)))
    ->  Index Scan using nt3_pkey on nt3
+         Skip scan: All
          Index Cond: (id = 1)
    ->  Nested Loop Left Join
          Join Filter: (0 = nt2.nt1_id)
          ->  Index Scan using nt2_pkey on nt2
+               Skip scan: All
                Index Cond: (id = nt3.nt2_id)
          ->  Result
-(9 rows)
+(11 rows)
 
 drop function f_immutable_int4(int);
 -- test inlining when function returns composite
@@ -3443,18 +3472,22 @@ select * from tenk1 a join tenk1 b on
          Recheck Cond: ((unique1 = 2) OR (hundred = 4))
          ->  BitmapOr
                ->  Bitmap Index Scan on tenk1_unique1
+                     Skip scan: All
                      Index Cond: (unique1 = 2)
                ->  Bitmap Index Scan on tenk1_hundred
+                     Skip scan: All
                      Index Cond: (hundred = 4)
    ->  Materialize
          ->  Bitmap Heap Scan on tenk1 a
                Recheck Cond: ((unique1 = 1) OR (unique2 = 3))
                ->  BitmapOr
                      ->  Bitmap Index Scan on tenk1_unique1
+                           Skip scan: All
                            Index Cond: (unique1 = 1)
                      ->  Bitmap Index Scan on tenk1_unique2
+                           Skip scan: All
                            Index Cond: (unique2 = 3)
-(17 rows)
+(21 rows)
 
 explain (costs off)
 select * from tenk1 a join tenk1 b on
@@ -3470,10 +3503,12 @@ select * from tenk1 a join tenk1 b on
                Recheck Cond: ((unique1 = 1) OR (unique2 = 3))
                ->  BitmapOr
                      ->  Bitmap Index Scan on tenk1_unique1
+                           Skip scan: All
                            Index Cond: (unique1 = 1)
                      ->  Bitmap Index Scan on tenk1_unique2
+                           Skip scan: All
                            Index Cond: (unique2 = 3)
-(12 rows)
+(14 rows)
 
 explain (costs off)
 select * from tenk1 a join tenk1 b on
@@ -3487,20 +3522,25 @@ select * from tenk1 a join tenk1 b on
          Recheck Cond: ((unique1 = 2) OR (hundred = 4))
          ->  BitmapOr
                ->  Bitmap Index Scan on tenk1_unique1
+                     Skip scan: All
                      Index Cond: (unique1 = 2)
                ->  Bitmap Index Scan on tenk1_hundred
+                     Skip scan: All
                      Index Cond: (hundred = 4)
    ->  Materialize
          ->  Bitmap Heap Scan on tenk1 a
                Recheck Cond: ((unique1 = 1) OR (unique2 = 3) OR (unique2 = 7))
                ->  BitmapOr
                      ->  Bitmap Index Scan on tenk1_unique1
+                           Skip scan: All
                            Index Cond: (unique1 = 1)
                      ->  Bitmap Index Scan on tenk1_unique2
+                           Skip scan: All
                            Index Cond: (unique2 = 3)
                      ->  Bitmap Index Scan on tenk1_unique2
+                           Skip scan: All
                            Index Cond: (unique2 = 7)
-(19 rows)
+(24 rows)
 
 --
 -- test placement of movable quals in a parameterized join tree
@@ -3514,16 +3554,19 @@ where t1.unique1 = 1;
 --------------------------------------------------------
  Nested Loop Left Join
    ->  Index Scan using tenk1_unique1 on tenk1 t1
+         Skip scan: All
          Index Cond: (unique1 = 1)
    ->  Nested Loop
          Join Filter: (t1.ten = t3.ten)
          ->  Bitmap Heap Scan on tenk1 t2
                Recheck Cond: (t1.hundred = hundred)
                ->  Bitmap Index Scan on tenk1_hundred
+                     Skip scan: All
                      Index Cond: (hundred = t1.hundred)
          ->  Index Scan using tenk1_unique2 on tenk1 t3
+               Skip scan: All
                Index Cond: (unique2 = t2.thousand)
-(11 rows)
+(14 rows)
 
 explain (costs off)
 select * from tenk1 t1 left join
@@ -3534,16 +3577,19 @@ where t1.unique1 = 1;
 --------------------------------------------------------
  Nested Loop Left Join
    ->  Index Scan using tenk1_unique1 on tenk1 t1
+         Skip scan: All
          Index Cond: (unique1 = 1)
    ->  Nested Loop
          Join Filter: ((t1.ten + t2.ten) = t3.ten)
          ->  Bitmap Heap Scan on tenk1 t2
                Recheck Cond: (t1.hundred = hundred)
                ->  Bitmap Index Scan on tenk1_hundred
+                     Skip scan: All
                      Index Cond: (hundred = t1.hundred)
          ->  Index Scan using tenk1_unique2 on tenk1 t3
+               Skip scan: All
                Index Cond: (unique2 = t2.thousand)
-(11 rows)
+(14 rows)
 
 explain (costs off)
 select count(*) from
@@ -3561,12 +3607,15 @@ select count(*) from
                      ->  Bitmap Heap Scan on tenk1 b
                            Recheck Cond: (thousand = int4_tbl.f1)
                            ->  Bitmap Index Scan on tenk1_thous_tenthous
+                                 Skip scan: All
                                  Index Cond: (thousand = int4_tbl.f1)
                ->  Index Scan using tenk1_unique1 on tenk1 a
+                     Skip scan: All
                      Index Cond: (unique1 = b.unique2)
          ->  Index Only Scan using tenk1_thous_tenthous on tenk1 c
+               Skip scan: All
                Index Cond: (thousand = a.thousand)
-(14 rows)
+(17 rows)
 
 select count(*) from
   tenk1 a join tenk1 b on a.unique1 = b.unique2
@@ -3584,24 +3633,28 @@ select b.unique1 from
   join int4_tbl i1 on b.thousand = f1
   right join int4_tbl i2 on i2.f1 = b.tenthous
   order by 1;
-                                       QUERY PLAN                                        
------------------------------------------------------------------------------------------
+                                   QUERY PLAN                                   
+--------------------------------------------------------------------------------
  Sort
    Sort Key: b.unique1
    ->  Nested Loop Left Join
          ->  Seq Scan on int4_tbl i2
-         ->  Nested Loop Left Join
-               Join Filter: (b.unique1 = 42)
-               ->  Nested Loop
+         ->  Nested Loop
+               Join Filter: (b.thousand = i1.f1)
+               ->  Nested Loop Left Join
+                     Join Filter: (b.unique1 = 42)
                      ->  Nested Loop
-                           ->  Seq Scan on int4_tbl i1
                            ->  Index Scan using tenk1_thous_tenthous on tenk1 b
-                                 Index Cond: ((thousand = i1.f1) AND (tenthous = i2.f1))
-                     ->  Index Scan using tenk1_unique1 on tenk1 a
-                           Index Cond: (unique1 = b.unique2)
-               ->  Index Only Scan using tenk1_thous_tenthous on tenk1 c
-                     Index Cond: (thousand = a.thousand)
-(15 rows)
+                                 Skip scan: All
+                                 Index Cond: (tenthous = i2.f1)
+                           ->  Index Scan using tenk1_unique1 on tenk1 a
+                                 Skip scan: All
+                                 Index Cond: (unique1 = b.unique2)
+                     ->  Index Only Scan using tenk1_thous_tenthous on tenk1 c
+                           Skip scan: All
+                           Index Cond: (thousand = a.thousand)
+               ->  Seq Scan on int4_tbl i1
+(19 rows)
 
 select b.unique1 from
   tenk1 a join tenk1 b on a.unique1 = b.unique2
@@ -3632,8 +3685,9 @@ order by fault;
    Filter: ((COALESCE(tenk1.unique1, '-1'::integer) + int8_tbl.q1) = 122)
    ->  Seq Scan on int8_tbl
    ->  Index Scan using tenk1_unique2 on tenk1
+         Skip scan: All
          Index Cond: (unique2 = int8_tbl.q2)
-(5 rows)
+(6 rows)
 
 select * from
 (
@@ -3687,8 +3741,9 @@ select q1, unique2, thousand, hundred
    Filter: ((COALESCE(b.thousand, 123) = a.q1) AND (a.q1 = COALESCE(b.hundred, 123)))
    ->  Seq Scan on int8_tbl a
    ->  Index Scan using tenk1_unique2 on tenk1 b
+         Skip scan: All
          Index Cond: (unique2 = a.q1)
-(5 rows)
+(6 rows)
 
 select q1, unique2, thousand, hundred
   from int8_tbl a left join tenk1 b on q1 = unique2
@@ -3707,8 +3762,9 @@ select f1, unique2, case when unique2 is null then f1 else 0 end
    Filter: (CASE WHEN (b.unique2 IS NULL) THEN a.f1 ELSE 0 END = 0)
    ->  Seq Scan on int4_tbl a
    ->  Index Only Scan using tenk1_unique2 on tenk1 b
+         Skip scan: All
          Index Cond: (unique2 = a.f1)
-(5 rows)
+(6 rows)
 
 select f1, unique2, case when unique2 is null then f1 else 0 end
   from int4_tbl a left join tenk1 b on f1 = unique2
@@ -3731,14 +3787,17 @@ select a.unique1, b.unique1, c.unique1, coalesce(b.twothousand, a.twothousand)
    ->  Nested Loop Left Join
          Filter: (COALESCE(b.twothousand, a.twothousand) = 44)
          ->  Index Scan using tenk1_unique2 on tenk1 a
+               Skip scan: All
                Index Cond: (unique2 < 10)
          ->  Bitmap Heap Scan on tenk1 b
                Recheck Cond: (thousand = a.unique1)
                ->  Bitmap Index Scan on tenk1_thous_tenthous
+                     Skip scan: All
                      Index Cond: (thousand = a.unique1)
    ->  Index Scan using tenk1_unique2 on tenk1 c
+         Skip scan: All
          Index Cond: ((unique2 = COALESCE(b.twothousand, a.twothousand)) AND (unique2 = 44))
-(11 rows)
+(14 rows)
 
 select a.unique1, b.unique1, c.unique1, coalesce(b.twothousand, a.twothousand)
   from tenk1 a left join tenk1 b on b.thousand = a.unique1                        left join tenk1 c on c.unique2 = coalesce(b.twothousand, a.twothousand)
@@ -3778,8 +3837,9 @@ using (join_key);
                      Output: i1.f1
                ->  Index Only Scan using tenk1_unique2 on public.tenk1 i2
                      Output: i2.unique2
+                     Skip scan: All
                      Index Cond: (i2.unique2 = i1.f1)
-(14 rows)
+(15 rows)
 
 select foo1.join_key as foo1_id, foo3.join_key AS foo3_id, bug_field from
   (values (0),(1)) foo1(join_key)
@@ -4281,8 +4341,9 @@ explain (costs off)
    ->  Seq Scan on int4_tbl a
          Filter: (f1 = 0)
    ->  Index Scan using tenk1_unique2 on tenk1 b
+         Skip scan: All
          Index Cond: (unique2 = 0)
-(6 rows)
+(7 rows)
 
 explain (costs off)
   select * from tenk1 a full join tenk1 b using(unique2) where unique2 = 42;
@@ -4291,10 +4352,12 @@ explain (costs off)
  Merge Full Join
    Merge Cond: (a.unique2 = b.unique2)
    ->  Index Scan using tenk1_unique2 on tenk1 a
+         Skip scan: All
          Index Cond: (unique2 = 42)
    ->  Index Scan using tenk1_unique2 on tenk1 b
+         Skip scan: All
          Index Cond: (unique2 = 42)
-(6 rows)
+(8 rows)
 
 --
 -- test that quals attached to an outer join have correct semantics,
@@ -4424,10 +4487,11 @@ select d.* from d left join (select * from b group by b.id, b.c_id) s
    ->  Group
          Group Key: b.id
          ->  Index Scan using b_pkey on b
+               Skip scan: All
    ->  Sort
          Sort Key: d.a
          ->  Seq Scan on d
-(8 rows)
+(9 rows)
 
 -- similarly, but keying off a DISTINCT clause
 explain (costs off)
@@ -4539,8 +4603,9 @@ select p.* from
  Result
    One-Time Filter: false
    ->  Index Scan using parent_pkey on parent p
+         Skip scan: All
          Index Cond: (k = 1)
-(4 rows)
+(5 rows)
 
 select p.* from
   (parent p left join child c on (p.k = c.k)) join parent x on p.k = x.k
@@ -4632,11 +4697,12 @@ where ss.stringu2 !~* ss.case1;
    ->  Nested Loop
          ->  Seq Scan on int4_tbl i4
          ->  Index Scan using tenk1_unique2 on tenk1 t1
+               Skip scan: All
                Index Cond: (unique2 = i4.f1)
                Filter: (stringu2 !~* CASE ten WHEN 0 THEN 'doh!'::text ELSE NULL::text END)
    ->  Materialize
          ->  Seq Scan on text_tbl t0
-(9 rows)
+(10 rows)
 
 select t0.*
 from
@@ -4723,8 +4789,9 @@ explain (costs off)
  Nested Loop
    ->  Seq Scan on int4_tbl b
    ->  Index Scan using tenk1_unique1 on tenk1 a
+         Skip scan: All
          Index Cond: (unique1 = b.f1)
-(4 rows)
+(5 rows)
 
 select unique2, x.*
 from int4_tbl x, lateral (select unique2 from tenk1 where f1 = unique1) ss;
@@ -4741,8 +4808,9 @@ explain (costs off)
  Nested Loop
    ->  Seq Scan on int4_tbl x
    ->  Index Scan using tenk1_unique1 on tenk1
+         Skip scan: All
          Index Cond: (unique1 = x.f1)
-(4 rows)
+(5 rows)
 
 explain (costs off)
   select unique2, x.*
@@ -4752,8 +4820,9 @@ explain (costs off)
  Nested Loop
    ->  Seq Scan on int4_tbl x
    ->  Index Scan using tenk1_unique1 on tenk1
+         Skip scan: All
          Index Cond: (unique1 = x.f1)
-(4 rows)
+(5 rows)
 
 select unique2, x.*
 from int4_tbl x left join lateral (select unique1, unique2 from tenk1 where f1 = unique1) ss on true;
@@ -4774,8 +4843,9 @@ explain (costs off)
  Nested Loop Left Join
    ->  Seq Scan on int4_tbl x
    ->  Index Scan using tenk1_unique1 on tenk1
+         Skip scan: All
          Index Cond: (unique1 = x.f1)
-(4 rows)
+(5 rows)
 
 -- check scoping of lateral versus parent references
 -- the first of these should return int8_tbl.q2, the second int8_tbl.q1
@@ -4873,8 +4943,10 @@ explain (costs off)
    ->  Merge Join
          Merge Cond: (a.unique1 = b.unique2)
          ->  Index Only Scan using tenk1_unique1 on tenk1 a
+               Skip scan: All
          ->  Index Only Scan using tenk1_unique2 on tenk1 b
-(5 rows)
+               Skip scan: All
+(7 rows)
 
 select count(*) from tenk1 a,
   tenk1 b join lateral (values(a.unique1)) ss(x) on b.unique2 = ss.x;
@@ -4894,10 +4966,12 @@ explain (costs off)
          Hash Cond: ("*VALUES*".column1 = b.unique2)
          ->  Nested Loop
                ->  Index Only Scan using tenk1_unique1 on tenk1 a
+                     Skip scan: All
                ->  Values Scan on "*VALUES*"
          ->  Hash
                ->  Index Only Scan using tenk1_unique2 on tenk1 b
-(8 rows)
+                     Skip scan: All
+(10 rows)
 
 select count(*) from tenk1 a,
   tenk1 b join lateral (values(a.unique1),(-1)) ss(x) on b.unique2 = ss.x;
@@ -5638,8 +5712,9 @@ select * from
                Output: tenk1.unique1
                ->  Index Scan using tenk1_unique2 on public.tenk1
                      Output: tenk1.unique1
+                     Skip scan: All
                      Index Cond: (tenk1.unique2 = "*VALUES*".column2)
-(14 rows)
+(15 rows)
 
 select * from
   (values (0,9998), (1,1000)) v(id,x),
@@ -5867,14 +5942,18 @@ where f.c = 1;
    ->  Nested Loop Left Join
          ->  Nested Loop Left Join
                ->  Index Scan using fkest_c_key on fkest f
+                     Skip scan: All
                      Index Cond: (c = 1)
                ->  Index Only Scan using fkest1_pkey on fkest1 f1
+                     Skip scan: All
                      Index Cond: ((a = f.a) AND (b = f.b))
          ->  Index Only Scan using fkest1_pkey on fkest1 f2
+               Skip scan: All
                Index Cond: ((a = f.a) AND (b = f.b))
    ->  Index Only Scan using fkest1_pkey on fkest1 f3
+         Skip scan: All
          Index Cond: ((a = f.a) AND (b = f.b))
-(11 rows)
+(15 rows)
 
 rollback;
 --
@@ -6164,8 +6243,10 @@ where j1.id1 % 1000 = 1 and j2.id1 % 1000 = 1;
    Merge Cond: (j1.id1 = j2.id1)
    Join Filter: (j1.id2 = j2.id2)
    ->  Index Scan using j1_id1_idx on j1
+         Skip scan: All
    ->  Index Scan using j2_id1_idx on j2
-(5 rows)
+         Skip scan: All
+(7 rows)
 
 select * from j1
 inner join j2 on j1.id1 = j2.id1 and j1.id2 = j2.id2
@@ -6201,15 +6282,18 @@ where exists (select 1 from tenk1 t3
                Group Key: t3.thousand, t3.tenthous
                ->  Index Only Scan using tenk1_thous_tenthous on public.tenk1 t3
                      Output: t3.thousand, t3.tenthous
+                     Skip scan: All
          ->  Hash
                Output: t1.unique1
                ->  Index Only Scan using onek_unique1 on public.onek t1
                      Output: t1.unique1
+                     Skip scan: All
                      Index Cond: (t1.unique1 < 1)
    ->  Index Only Scan using tenk1_hundred on public.tenk1 t2
          Output: t2.hundred
+         Skip scan: All
          Index Cond: (t2.hundred = t3.tenthous)
-(18 rows)
+(21 rows)
 
 -- ... unless it actually is unique
 create table j3 as select unique1, tenthous from onek;
@@ -6229,13 +6313,16 @@ where exists (select 1 from j3
          Output: t1.unique1, j3.tenthous
          ->  Index Only Scan using onek_unique1 on public.onek t1
                Output: t1.unique1
+               Skip scan: All
                Index Cond: (t1.unique1 < 1)
          ->  Index Only Scan using j3_unique1_tenthous_idx on public.j3
                Output: j3.unique1, j3.tenthous
+               Skip scan: All
                Index Cond: (j3.unique1 = t1.unique1)
    ->  Index Only Scan using tenk1_hundred on public.tenk1 t2
          Output: t2.hundred
+         Skip scan: All
          Index Cond: (t2.hundred = j3.tenthous)
-(13 rows)
+(16 rows)
 
 drop table j3;
diff --git a/src/test/regress/expected/limit.out b/src/test/regress/expected/limit.out
index c18f547cbd..3e7085b379 100644
--- a/src/test/regress/expected/limit.out
+++ b/src/test/regress/expected/limit.out
@@ -322,7 +322,8 @@ select unique1, unique2, nextval('testseq')
    Output: unique1, unique2, (nextval('testseq'::regclass))
    ->  Index Scan using tenk1_unique2 on public.tenk1
          Output: unique1, unique2, nextval('testseq'::regclass)
-(4 rows)
+         Skip scan: All
+(5 rows)
 
 select unique1, unique2, nextval('testseq')
   from tenk1 order by unique2 limit 10;
@@ -395,7 +396,8 @@ select unique1, unique2, generate_series(1,10)
          Output: unique1, unique2, generate_series(1, 10)
          ->  Index Scan using tenk1_unique2 on public.tenk1
                Output: unique1, unique2, two, four, ten, twenty, hundred, thousand, twothousand, fivethous, tenthous, odd, even, stringu1, stringu2, string4
-(6 rows)
+               Skip scan: All
+(7 rows)
 
 select unique1, unique2, generate_series(1,10)
   from tenk1 order by unique2 limit 7;
@@ -492,7 +494,8 @@ select sum(tenthous) as s1, sum(tenthous) + random()*0 as s2
          Group Key: tenk1.thousand
          ->  Index Only Scan using tenk1_thous_tenthous on public.tenk1
                Output: thousand, tenthous
-(7 rows)
+               Skip scan: All
+(8 rows)
 
 select sum(tenthous) as s1, sum(tenthous) + random()*0 as s2
   from tenk1 group by thousand order by thousand limit 3;
diff --git a/src/test/regress/expected/misc_functions.out b/src/test/regress/expected/misc_functions.out
index d3acb98d04..ee6bcc4b1b 100644
--- a/src/test/regress/expected/misc_functions.out
+++ b/src/test/regress/expected/misc_functions.out
@@ -232,8 +232,9 @@ WHERE my_int_eq(a.unique2, 42);
    ->  Seq Scan on tenk1 a
          Filter: my_int_eq(unique2, 42)
    ->  Index Scan using tenk1_unique1 on tenk1 b
+         Skip scan: All
          Index Cond: (unique1 = a.unique1)
-(5 rows)
+(6 rows)
 
 -- Also test non-default rowcount estimate
 CREATE FUNCTION my_gen_series(int, int) RETURNS SETOF integer
@@ -258,6 +259,7 @@ SELECT * FROM tenk1 a JOIN my_gen_series(1,10) g ON a.unique1 = g;
  Nested Loop
    ->  Function Scan on my_gen_series g
    ->  Index Scan using tenk1_unique1 on tenk1 a
+         Skip scan: All
          Index Cond: (unique1 = g.g)
-(4 rows)
+(5 rows)
 
diff --git a/src/test/regress/expected/partition_join.out b/src/test/regress/expected/partition_join.out
index b3fbe47bde..a1a52212a7 100644
--- a/src/test/regress/expected/partition_join.out
+++ b/src/test/regress/expected/partition_join.out
@@ -126,8 +126,9 @@ SELECT t1.a, t1.c, t2.b, t2.c FROM prt1 t1 RIGHT JOIN prt2 t2 ON t1.a = t2.b WHE
                ->  Seq Scan on prt2_p3 t2_3
                      Filter: (a = 0)
                ->  Index Scan using iprt1_p3_a on prt1_p3 t1_3
+                     Skip scan: All
                      Index Cond: (a = t2_3.b)
-(20 rows)
+(21 rows)
 
 SELECT t1.a, t1.c, t2.b, t2.c FROM prt1 t1 RIGHT JOIN prt2 t2 ON t1.a = t2.b WHERE t2.a = 0 ORDER BY t1.a, t2.b;
   a  |  c   |  b  |  c   
@@ -366,26 +367,32 @@ SELECT * FROM prt1 t1 LEFT JOIN LATERAL
                      Filter: (b = 0)
                ->  Nested Loop
                      ->  Index Only Scan using iprt1_p1_a on prt1_p1 t2_1
+                           Skip scan: All
                            Index Cond: (a = t1_1.a)
                      ->  Index Scan using iprt2_p1_b on prt2_p1 t3_1
+                           Skip scan: All
                            Index Cond: (b = t2_1.a)
          ->  Nested Loop Left Join
                ->  Seq Scan on prt1_p2 t1_2
                      Filter: (b = 0)
                ->  Nested Loop
                      ->  Index Only Scan using iprt1_p2_a on prt1_p2 t2_2
+                           Skip scan: All
                            Index Cond: (a = t1_2.a)
                      ->  Index Scan using iprt2_p2_b on prt2_p2 t3_2
+                           Skip scan: All
                            Index Cond: (b = t2_2.a)
          ->  Nested Loop Left Join
                ->  Seq Scan on prt1_p3 t1_3
                      Filter: (b = 0)
                ->  Nested Loop
                      ->  Index Only Scan using iprt1_p3_a on prt1_p3 t2_3
+                           Skip scan: All
                            Index Cond: (a = t1_3.a)
                      ->  Index Scan using iprt2_p3_b on prt2_p3 t3_3
+                           Skip scan: All
                            Index Cond: (b = t2_3.a)
-(27 rows)
+(33 rows)
 
 SELECT * FROM prt1 t1 LEFT JOIN LATERAL
 			  (SELECT t2.a AS t2a, t3.a AS t3a, least(t1.a,t2.a,t3.b) FROM prt1 t2 JOIN prt2 t3 ON (t2.a = t3.b)) ss
@@ -609,6 +616,7 @@ SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM prt1 t1, prt2 t2, prt1_e t
                            ->  Seq Scan on prt1_p1 t1_1
                                  Filter: (b = 0)
                ->  Index Scan using iprt1_e_p1_ab2 on prt1_e_p1 t3_1
+                     Skip scan: All
                      Index Cond: (((a + b) / 2) = t2_1.b)
          ->  Nested Loop
                Join Filter: (t1_2.a = ((t3_2.a + t3_2.b) / 2))
@@ -619,6 +627,7 @@ SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM prt1 t1, prt2 t2, prt1_e t
                            ->  Seq Scan on prt1_p2 t1_2
                                  Filter: (b = 0)
                ->  Index Scan using iprt1_e_p2_ab2 on prt1_e_p2 t3_2
+                     Skip scan: All
                      Index Cond: (((a + b) / 2) = t2_2.b)
          ->  Nested Loop
                Join Filter: (t1_3.a = ((t3_3.a + t3_3.b) / 2))
@@ -629,8 +638,9 @@ SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM prt1 t1, prt2 t2, prt1_e t
                            ->  Seq Scan on prt1_p3 t1_3
                                  Filter: (b = 0)
                ->  Index Scan using iprt1_e_p3_ab2 on prt1_e_p3 t3_3
+                     Skip scan: All
                      Index Cond: (((a + b) / 2) = t2_3.b)
-(33 rows)
+(36 rows)
 
 SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM prt1 t1, prt2 t2, prt1_e t3 WHERE t1.a = t2.b AND t1.a = (t3.a + t3.b)/2 AND t1.b = 0 ORDER BY t1.a, t2.b;
   a  |  c   |  b  |  c   | ?column? | c 
@@ -712,6 +722,7 @@ SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2
                            ->  Seq Scan on prt1_e_p1 t3_1
                                  Filter: (c = 0)
                ->  Index Scan using iprt2_p1_b on prt2_p1 t2_1
+                     Skip scan: All
                      Index Cond: (b = t1_1.a)
          ->  Nested Loop Left Join
                ->  Hash Right Join
@@ -721,6 +732,7 @@ SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2
                            ->  Seq Scan on prt1_e_p2 t3_2
                                  Filter: (c = 0)
                ->  Index Scan using iprt2_p2_b on prt2_p2 t2_2
+                     Skip scan: All
                      Index Cond: (b = t1_2.a)
          ->  Nested Loop Left Join
                ->  Hash Right Join
@@ -730,8 +742,9 @@ SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2
                            ->  Seq Scan on prt1_e_p3 t3_3
                                  Filter: (c = 0)
                ->  Index Scan using iprt2_p3_b on prt2_p3 t2_3
+                     Skip scan: All
                      Index Cond: (b = t1_3.a)
-(30 rows)
+(33 rows)
 
 SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2 ON t1.a = t2.b) RIGHT JOIN prt1_e t3 ON (t1.a = (t3.a + t3.b)/2) WHERE t3.c = 0 ORDER BY t1.a, t2.b, t3.a + t3.b;
   a  |  c   |  b  |  c   | ?column? | c 
@@ -826,6 +839,7 @@ SELECT t1.* FROM prt1 t1 WHERE t1.a IN (SELECT t1.b FROM prt2 t1, prt1_e t2 WHER
                                  ->  Seq Scan on prt2_p1 t1_5
                                        Filter: (a = 0)
                ->  Index Scan using iprt1_p1_a on prt1_p1 t1_2
+                     Skip scan: All
                      Index Cond: (a = ((t2_1.a + t2_1.b) / 2))
                      Filter: (b = 0)
          ->  Nested Loop
@@ -839,6 +853,7 @@ SELECT t1.* FROM prt1 t1 WHERE t1.a IN (SELECT t1.b FROM prt2 t1, prt1_e t2 WHER
                                  ->  Seq Scan on prt2_p2 t1_6
                                        Filter: (a = 0)
                ->  Index Scan using iprt1_p2_a on prt1_p2 t1_3
+                     Skip scan: All
                      Index Cond: (a = ((t2_2.a + t2_2.b) / 2))
                      Filter: (b = 0)
          ->  Nested Loop
@@ -849,11 +864,13 @@ SELECT t1.* FROM prt1 t1 WHERE t1.a IN (SELECT t1.b FROM prt2 t1, prt1_e t2 WHER
                            ->  Seq Scan on prt2_p3 t1_7
                                  Filter: (a = 0)
                            ->  Index Scan using iprt1_e_p3_ab2 on prt1_e_p3 t2_3
+                                 Skip scan: All
                                  Index Cond: (((a + b) / 2) = t1_7.b)
                ->  Index Scan using iprt1_p3_a on prt1_p3 t1_4
+                     Skip scan: All
                      Index Cond: (a = ((t2_3.a + t2_3.b) / 2))
                      Filter: (b = 0)
-(41 rows)
+(45 rows)
 
 SELECT t1.* FROM prt1 t1 WHERE t1.a IN (SELECT t1.b FROM prt2 t1, prt1_e t2 WHERE t1.a = 0 AND t1.b = (t2.a + t2.b)/2) AND t1.b = 0 ORDER BY t1.a;
   a  | b |  c   
@@ -881,6 +898,7 @@ SELECT t1.* FROM prt1 t1 WHERE t1.a IN (SELECT t1.b FROM prt2 t1 WHERE t1.b IN (
                                  ->  Seq Scan on prt1_e_p1 t1_9
                                        Filter: (c = 0)
                ->  Index Scan using iprt1_p1_a on prt1_p1 t1_3
+                     Skip scan: All
                      Index Cond: (a = t1_6.b)
                      Filter: (b = 0)
          ->  Nested Loop
@@ -893,6 +911,7 @@ SELECT t1.* FROM prt1 t1 WHERE t1.a IN (SELECT t1.b FROM prt2 t1 WHERE t1.b IN (
                                  ->  Seq Scan on prt1_e_p2 t1_10
                                        Filter: (c = 0)
                ->  Index Scan using iprt1_p2_a on prt1_p2 t1_4
+                     Skip scan: All
                      Index Cond: (a = t1_7.b)
                      Filter: (b = 0)
          ->  Nested Loop
@@ -905,9 +924,10 @@ SELECT t1.* FROM prt1 t1 WHERE t1.a IN (SELECT t1.b FROM prt2 t1 WHERE t1.b IN (
                                  ->  Seq Scan on prt1_e_p3 t1_11
                                        Filter: (c = 0)
                ->  Index Scan using iprt1_p3_a on prt1_p3 t1_5
+                     Skip scan: All
                      Index Cond: (a = t1_8.b)
                      Filter: (b = 0)
-(39 rows)
+(42 rows)
 
 SELECT t1.* FROM prt1 t1 WHERE t1.a IN (SELECT t1.b FROM prt2 t1 WHERE t1.b IN (SELECT (t1.a + t1.b)/2 FROM prt1_e t1 WHERE t1.c = 0)) AND t1.b = 0 ORDER BY t1.a;
   a  | b |  c   
@@ -1933,12 +1953,15 @@ SELECT t1.a, t1.c, t2.b, t2.c FROM prt1 t1 LEFT JOIN prt2 t2 ON (t1.a < t2.b);
          ->  Seq Scan on prt1_p3 t1_3
    ->  Append
          ->  Index Scan using iprt2_p1_b on prt2_p1 t2_1
+               Skip scan: All
                Index Cond: (b > t1.a)
          ->  Index Scan using iprt2_p2_b on prt2_p2 t2_2
+               Skip scan: All
                Index Cond: (b > t1.a)
          ->  Index Scan using iprt2_p3_b on prt2_p3 t2_3
+               Skip scan: All
                Index Cond: (b > t1.a)
-(12 rows)
+(15 rows)
 
 -- equi-join with join condition on partial keys does not qualify for
 -- partitionwise join
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index 9c8f80da87..8708a54ada 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -2070,24 +2070,33 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                            Filter: (a = ANY ('{0,0,1}'::integer[]))
                      ->  Append (actual rows=N loops=N)
                            ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
+                                 Skip scan: All
                                  Index Cond: (a = a.a)
                            ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
+                                 Skip scan: All
                                  Index Cond: (a = a.a)
                            ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
+                                 Skip scan: All
                                  Index Cond: (a = a.a)
                            ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                 Skip scan: All
                                  Index Cond: (a = a.a)
                            ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                 Skip scan: All
                                  Index Cond: (a = a.a)
                            ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                 Skip scan: All
                                  Index Cond: (a = a.a)
                            ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
+                                 Skip scan: All
                                  Index Cond: (a = a.a)
                            ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
+                                 Skip scan: All
                                  Index Cond: (a = a.a)
                            ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
+                                 Skip scan: All
                                  Index Cond: (a = a.a)
-(27 rows)
+(36 rows)
 
 -- Ensure the same partitions are pruned when we make the nested loop
 -- parameter an Expr rather than a plain Param.
@@ -2104,24 +2113,33 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                            Filter: (a = ANY ('{0,0,1}'::integer[]))
                      ->  Append (actual rows=N loops=N)
                            ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
+                                 Skip scan: All
                                  Index Cond: (a = (a.a + 0))
                            ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
+                                 Skip scan: All
                                  Index Cond: (a = (a.a + 0))
                            ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
+                                 Skip scan: All
                                  Index Cond: (a = (a.a + 0))
                            ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                 Skip scan: All
                                  Index Cond: (a = (a.a + 0))
                            ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                 Skip scan: All
                                  Index Cond: (a = (a.a + 0))
                            ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                 Skip scan: All
                                  Index Cond: (a = (a.a + 0))
                            ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
+                                 Skip scan: All
                                  Index Cond: (a = (a.a + 0))
                            ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
+                                 Skip scan: All
                                  Index Cond: (a = (a.a + 0))
                            ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
+                                 Skip scan: All
                                  Index Cond: (a = (a.a + 0))
-(27 rows)
+(36 rows)
 
 insert into lprt_a values(3),(3);
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(1, 0, 3)');
@@ -2137,24 +2155,33 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                            Filter: (a = ANY ('{1,0,3}'::integer[]))
                      ->  Append (actual rows=N loops=N)
                            ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
+                                 Skip scan: All
                                  Index Cond: (a = a.a)
                            ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
+                                 Skip scan: All
                                  Index Cond: (a = a.a)
                            ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
+                                 Skip scan: All
                                  Index Cond: (a = a.a)
                            ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                 Skip scan: All
                                  Index Cond: (a = a.a)
                            ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                 Skip scan: All
                                  Index Cond: (a = a.a)
                            ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                 Skip scan: All
                                  Index Cond: (a = a.a)
                            ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (actual rows=N loops=N)
+                                 Skip scan: All
                                  Index Cond: (a = a.a)
                            ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (actual rows=N loops=N)
+                                 Skip scan: All
                                  Index Cond: (a = a.a)
                            ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (actual rows=N loops=N)
+                                 Skip scan: All
                                  Index Cond: (a = a.a)
-(27 rows)
+(36 rows)
 
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(1, 0, 0)');
                                         explain_parallel_append                                         
@@ -2170,24 +2197,33 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                            Rows Removed by Filter: N
                      ->  Append (actual rows=N loops=N)
                            ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
+                                 Skip scan: All
                                  Index Cond: (a = a.a)
                            ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (actual rows=N loops=N)
+                                 Skip scan: All
                                  Index Cond: (a = a.a)
                            ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (actual rows=N loops=N)
+                                 Skip scan: All
                                  Index Cond: (a = a.a)
                            ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                 Skip scan: All
                                  Index Cond: (a = a.a)
                            ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                 Skip scan: All
                                  Index Cond: (a = a.a)
                            ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                 Skip scan: All
                                  Index Cond: (a = a.a)
                            ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
+                                 Skip scan: All
                                  Index Cond: (a = a.a)
                            ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
+                                 Skip scan: All
                                  Index Cond: (a = a.a)
                            ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
+                                 Skip scan: All
                                  Index Cond: (a = a.a)
-(28 rows)
+(37 rows)
 
 delete from lprt_a where a = 1;
 select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on ab.a = a.a where a.a in(1, 0, 0)');
@@ -2204,24 +2240,33 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                            Rows Removed by Filter: N
                      ->  Append (actual rows=N loops=N)
                            ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (never executed)
+                                 Skip scan: All
                                  Index Cond: (a = a.a)
                            ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 ab_2 (never executed)
+                                 Skip scan: All
                                  Index Cond: (a = a.a)
                            ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 ab_3 (never executed)
+                                 Skip scan: All
                                  Index Cond: (a = a.a)
                            ->  Index Scan using ab_a2_b1_a_idx on ab_a2_b1 ab_4 (never executed)
+                                 Skip scan: All
                                  Index Cond: (a = a.a)
                            ->  Index Scan using ab_a2_b2_a_idx on ab_a2_b2 ab_5 (never executed)
+                                 Skip scan: All
                                  Index Cond: (a = a.a)
                            ->  Index Scan using ab_a2_b3_a_idx on ab_a2_b3 ab_6 (never executed)
+                                 Skip scan: All
                                  Index Cond: (a = a.a)
                            ->  Index Scan using ab_a3_b1_a_idx on ab_a3_b1 ab_7 (never executed)
+                                 Skip scan: All
                                  Index Cond: (a = a.a)
                            ->  Index Scan using ab_a3_b2_a_idx on ab_a3_b2 ab_8 (never executed)
+                                 Skip scan: All
                                  Index Cond: (a = a.a)
                            ->  Index Scan using ab_a3_b3_a_idx on ab_a3_b3 ab_9 (never executed)
+                                 Skip scan: All
                                  Index Cond: (a = a.a)
-(28 rows)
+(37 rows)
 
 reset enable_hashjoin;
 reset enable_mergejoin;
@@ -2245,48 +2290,57 @@ select * from ab where a = (select max(a) from lprt_a) and b = (select max(a)-1
          Recheck Cond: (a = $0)
          Filter: (b = $1)
          ->  Bitmap Index Scan on ab_a1_b1_a_idx (never executed)
+               Skip scan: All
                Index Cond: (a = $0)
    ->  Bitmap Heap Scan on ab_a1_b2 ab_2 (never executed)
          Recheck Cond: (a = $0)
          Filter: (b = $1)
          ->  Bitmap Index Scan on ab_a1_b2_a_idx (never executed)
+               Skip scan: All
                Index Cond: (a = $0)
    ->  Bitmap Heap Scan on ab_a1_b3 ab_3 (never executed)
          Recheck Cond: (a = $0)
          Filter: (b = $1)
          ->  Bitmap Index Scan on ab_a1_b3_a_idx (never executed)
+               Skip scan: All
                Index Cond: (a = $0)
    ->  Bitmap Heap Scan on ab_a2_b1 ab_4 (never executed)
          Recheck Cond: (a = $0)
          Filter: (b = $1)
          ->  Bitmap Index Scan on ab_a2_b1_a_idx (never executed)
+               Skip scan: All
                Index Cond: (a = $0)
    ->  Bitmap Heap Scan on ab_a2_b2 ab_5 (never executed)
          Recheck Cond: (a = $0)
          Filter: (b = $1)
          ->  Bitmap Index Scan on ab_a2_b2_a_idx (never executed)
+               Skip scan: All
                Index Cond: (a = $0)
    ->  Bitmap Heap Scan on ab_a2_b3 ab_6 (never executed)
          Recheck Cond: (a = $0)
          Filter: (b = $1)
          ->  Bitmap Index Scan on ab_a2_b3_a_idx (never executed)
+               Skip scan: All
                Index Cond: (a = $0)
    ->  Bitmap Heap Scan on ab_a3_b1 ab_7 (never executed)
          Recheck Cond: (a = $0)
          Filter: (b = $1)
          ->  Bitmap Index Scan on ab_a3_b1_a_idx (never executed)
+               Skip scan: All
                Index Cond: (a = $0)
    ->  Bitmap Heap Scan on ab_a3_b2 ab_8 (actual rows=0 loops=1)
          Recheck Cond: (a = $0)
          Filter: (b = $1)
          ->  Bitmap Index Scan on ab_a3_b2_a_idx (actual rows=0 loops=1)
+               Skip scan: All
                Index Cond: (a = $0)
    ->  Bitmap Heap Scan on ab_a3_b3 ab_9 (never executed)
          Recheck Cond: (a = $0)
          Filter: (b = $1)
          ->  Bitmap Index Scan on ab_a3_b3_a_idx (never executed)
+               Skip scan: All
                Index Cond: (a = $0)
-(52 rows)
+(61 rows)
 
 -- Test run-time partition pruning with UNION ALL parents
 explain (analyze, costs off, summary off, timing off)
@@ -2301,16 +2355,19 @@ select * from (select * from ab where a = 1 union all select * from ab) ab where
                Recheck Cond: (a = 1)
                Filter: (b = $0)
                ->  Bitmap Index Scan on ab_a1_b1_a_idx (actual rows=0 loops=1)
+                     Skip scan: All
                      Index Cond: (a = 1)
          ->  Bitmap Heap Scan on ab_a1_b2 ab_12 (never executed)
                Recheck Cond: (a = 1)
                Filter: (b = $0)
                ->  Bitmap Index Scan on ab_a1_b2_a_idx (never executed)
+                     Skip scan: All
                      Index Cond: (a = 1)
          ->  Bitmap Heap Scan on ab_a1_b3 ab_13 (never executed)
                Recheck Cond: (a = 1)
                Filter: (b = $0)
                ->  Bitmap Index Scan on ab_a1_b3_a_idx (never executed)
+                     Skip scan: All
                      Index Cond: (a = 1)
    ->  Seq Scan on ab_a1_b1 ab_1 (actual rows=0 loops=1)
          Filter: (b = $0)
@@ -2330,7 +2387,7 @@ select * from (select * from ab where a = 1 union all select * from ab) ab where
          Filter: (b = $0)
    ->  Seq Scan on ab_a3_b3 ab_9 (never executed)
          Filter: (b = $0)
-(37 rows)
+(40 rows)
 
 -- A case containing a UNION ALL with a non-partitioned child.
 explain (analyze, costs off, summary off, timing off)
@@ -2345,16 +2402,19 @@ select * from (select * from ab where a = 1 union all (values(10,5)) union all s
                Recheck Cond: (a = 1)
                Filter: (b = $0)
                ->  Bitmap Index Scan on ab_a1_b1_a_idx (actual rows=0 loops=1)
+                     Skip scan: All
                      Index Cond: (a = 1)
          ->  Bitmap Heap Scan on ab_a1_b2 ab_12 (never executed)
                Recheck Cond: (a = 1)
                Filter: (b = $0)
                ->  Bitmap Index Scan on ab_a1_b2_a_idx (never executed)
+                     Skip scan: All
                      Index Cond: (a = 1)
          ->  Bitmap Heap Scan on ab_a1_b3 ab_13 (never executed)
                Recheck Cond: (a = 1)
                Filter: (b = $0)
                ->  Bitmap Index Scan on ab_a1_b3_a_idx (never executed)
+                     Skip scan: All
                      Index Cond: (a = 1)
    ->  Result (actual rows=0 loops=1)
          One-Time Filter: (5 = $0)
@@ -2376,7 +2436,7 @@ select * from (select * from ab where a = 1 union all (values(10,5)) union all s
          Filter: (b = $0)
    ->  Seq Scan on ab_a3_b3 ab_9 (never executed)
          Filter: (b = $0)
-(39 rows)
+(42 rows)
 
 -- Another UNION ALL test, but containing a mix of exec init and exec run-time pruning.
 create table xy_1 (x int, y int);
@@ -2446,63 +2506,75 @@ update ab_a1 set b = 3 from ab where ab.a = 1 and ab.a = ab_a1.a;
                ->  Bitmap Heap Scan on ab_a1_b1 ab_1 (actual rows=0 loops=1)
                      Recheck Cond: (a = 1)
                      ->  Bitmap Index Scan on ab_a1_b1_a_idx (actual rows=0 loops=1)
+                           Skip scan: All
                            Index Cond: (a = 1)
                ->  Bitmap Heap Scan on ab_a1_b2 ab_2 (actual rows=1 loops=1)
                      Recheck Cond: (a = 1)
                      Heap Blocks: exact=1
                      ->  Bitmap Index Scan on ab_a1_b2_a_idx (actual rows=1 loops=1)
+                           Skip scan: All
                            Index Cond: (a = 1)
                ->  Bitmap Heap Scan on ab_a1_b3 ab_3 (actual rows=0 loops=1)
                      Recheck Cond: (a = 1)
                      ->  Bitmap Index Scan on ab_a1_b3_a_idx (actual rows=0 loops=1)
+                           Skip scan: All
                            Index Cond: (a = 1)
          ->  Materialize (actual rows=0 loops=1)
                ->  Bitmap Heap Scan on ab_a1_b1 ab_a1_1 (actual rows=0 loops=1)
                      Recheck Cond: (a = 1)
                      ->  Bitmap Index Scan on ab_a1_b1_a_idx (actual rows=0 loops=1)
+                           Skip scan: All
                            Index Cond: (a = 1)
    ->  Nested Loop (actual rows=1 loops=1)
          ->  Append (actual rows=1 loops=1)
                ->  Bitmap Heap Scan on ab_a1_b1 ab_1 (actual rows=0 loops=1)
                      Recheck Cond: (a = 1)
                      ->  Bitmap Index Scan on ab_a1_b1_a_idx (actual rows=0 loops=1)
+                           Skip scan: All
                            Index Cond: (a = 1)
                ->  Bitmap Heap Scan on ab_a1_b2 ab_2 (actual rows=1 loops=1)
                      Recheck Cond: (a = 1)
                      Heap Blocks: exact=1
                      ->  Bitmap Index Scan on ab_a1_b2_a_idx (actual rows=1 loops=1)
+                           Skip scan: All
                            Index Cond: (a = 1)
                ->  Bitmap Heap Scan on ab_a1_b3 ab_3 (actual rows=0 loops=1)
                      Recheck Cond: (a = 1)
                      ->  Bitmap Index Scan on ab_a1_b3_a_idx (actual rows=1 loops=1)
+                           Skip scan: All
                            Index Cond: (a = 1)
          ->  Materialize (actual rows=1 loops=1)
                ->  Bitmap Heap Scan on ab_a1_b2 ab_a1_2 (actual rows=1 loops=1)
                      Recheck Cond: (a = 1)
                      Heap Blocks: exact=1
                      ->  Bitmap Index Scan on ab_a1_b2_a_idx (actual rows=1 loops=1)
+                           Skip scan: All
                            Index Cond: (a = 1)
    ->  Nested Loop (actual rows=0 loops=1)
          ->  Append (actual rows=1 loops=1)
                ->  Bitmap Heap Scan on ab_a1_b1 ab_1 (actual rows=0 loops=1)
                      Recheck Cond: (a = 1)
                      ->  Bitmap Index Scan on ab_a1_b1_a_idx (actual rows=0 loops=1)
+                           Skip scan: All
                            Index Cond: (a = 1)
                ->  Bitmap Heap Scan on ab_a1_b2 ab_2 (actual rows=1 loops=1)
                      Recheck Cond: (a = 1)
                      Heap Blocks: exact=1
                      ->  Bitmap Index Scan on ab_a1_b2_a_idx (actual rows=1 loops=1)
+                           Skip scan: All
                            Index Cond: (a = 1)
                ->  Bitmap Heap Scan on ab_a1_b3 ab_3 (actual rows=0 loops=1)
                      Recheck Cond: (a = 1)
                      ->  Bitmap Index Scan on ab_a1_b3_a_idx (actual rows=1 loops=1)
+                           Skip scan: All
                            Index Cond: (a = 1)
          ->  Materialize (actual rows=0 loops=1)
                ->  Bitmap Heap Scan on ab_a1_b3 ab_a1_3 (actual rows=0 loops=1)
                      Recheck Cond: (a = 1)
                      ->  Bitmap Index Scan on ab_a1_b3_a_idx (actual rows=1 loops=1)
+                           Skip scan: All
                            Index Cond: (a = 1)
-(65 rows)
+(77 rows)
 
 table ab;
  a | b 
@@ -2593,18 +2665,24 @@ select * from tbl1 join tprt on tbl1.col1 > tprt.col1;
    ->  Seq Scan on tbl1 (actual rows=2 loops=1)
    ->  Append (actual rows=3 loops=2)
          ->  Index Scan using tprt1_idx on tprt_1 (actual rows=2 loops=2)
+               Skip scan: All
                Index Cond: (col1 < tbl1.col1)
          ->  Index Scan using tprt2_idx on tprt_2 (actual rows=2 loops=1)
+               Skip scan: All
                Index Cond: (col1 < tbl1.col1)
          ->  Index Scan using tprt3_idx on tprt_3 (never executed)
+               Skip scan: All
                Index Cond: (col1 < tbl1.col1)
          ->  Index Scan using tprt4_idx on tprt_4 (never executed)
+               Skip scan: All
                Index Cond: (col1 < tbl1.col1)
          ->  Index Scan using tprt5_idx on tprt_5 (never executed)
+               Skip scan: All
                Index Cond: (col1 < tbl1.col1)
          ->  Index Scan using tprt6_idx on tprt_6 (never executed)
+               Skip scan: All
                Index Cond: (col1 < tbl1.col1)
-(15 rows)
+(21 rows)
 
 explain (analyze, costs off, summary off, timing off)
 select * from tbl1 join tprt on tbl1.col1 = tprt.col1;
@@ -2614,18 +2692,24 @@ select * from tbl1 join tprt on tbl1.col1 = tprt.col1;
    ->  Seq Scan on tbl1 (actual rows=2 loops=1)
    ->  Append (actual rows=1 loops=2)
          ->  Index Scan using tprt1_idx on tprt_1 (never executed)
+               Skip scan: All
                Index Cond: (col1 = tbl1.col1)
          ->  Index Scan using tprt2_idx on tprt_2 (actual rows=1 loops=2)
+               Skip scan: All
                Index Cond: (col1 = tbl1.col1)
          ->  Index Scan using tprt3_idx on tprt_3 (never executed)
+               Skip scan: All
                Index Cond: (col1 = tbl1.col1)
          ->  Index Scan using tprt4_idx on tprt_4 (never executed)
+               Skip scan: All
                Index Cond: (col1 = tbl1.col1)
          ->  Index Scan using tprt5_idx on tprt_5 (never executed)
+               Skip scan: All
                Index Cond: (col1 = tbl1.col1)
          ->  Index Scan using tprt6_idx on tprt_6 (never executed)
+               Skip scan: All
                Index Cond: (col1 = tbl1.col1)
-(15 rows)
+(21 rows)
 
 select tbl1.col1, tprt.col1 from tbl1
 inner join tprt on tbl1.col1 > tprt.col1
@@ -2659,18 +2743,24 @@ select * from tbl1 inner join tprt on tbl1.col1 > tprt.col1;
    ->  Seq Scan on tbl1 (actual rows=5 loops=1)
    ->  Append (actual rows=5 loops=5)
          ->  Index Scan using tprt1_idx on tprt_1 (actual rows=2 loops=5)
+               Skip scan: All
                Index Cond: (col1 < tbl1.col1)
          ->  Index Scan using tprt2_idx on tprt_2 (actual rows=3 loops=4)
+               Skip scan: All
                Index Cond: (col1 < tbl1.col1)
          ->  Index Scan using tprt3_idx on tprt_3 (actual rows=1 loops=2)
+               Skip scan: All
                Index Cond: (col1 < tbl1.col1)
          ->  Index Scan using tprt4_idx on tprt_4 (never executed)
+               Skip scan: All
                Index Cond: (col1 < tbl1.col1)
          ->  Index Scan using tprt5_idx on tprt_5 (never executed)
+               Skip scan: All
                Index Cond: (col1 < tbl1.col1)
          ->  Index Scan using tprt6_idx on tprt_6 (never executed)
+               Skip scan: All
                Index Cond: (col1 < tbl1.col1)
-(15 rows)
+(21 rows)
 
 explain (analyze, costs off, summary off, timing off)
 select * from tbl1 inner join tprt on tbl1.col1 = tprt.col1;
@@ -2680,18 +2770,24 @@ select * from tbl1 inner join tprt on tbl1.col1 = tprt.col1;
    ->  Seq Scan on tbl1 (actual rows=5 loops=1)
    ->  Append (actual rows=1 loops=5)
          ->  Index Scan using tprt1_idx on tprt_1 (never executed)
+               Skip scan: All
                Index Cond: (col1 = tbl1.col1)
          ->  Index Scan using tprt2_idx on tprt_2 (actual rows=1 loops=2)
+               Skip scan: All
                Index Cond: (col1 = tbl1.col1)
          ->  Index Scan using tprt3_idx on tprt_3 (actual rows=0 loops=3)
+               Skip scan: All
                Index Cond: (col1 = tbl1.col1)
          ->  Index Scan using tprt4_idx on tprt_4 (never executed)
+               Skip scan: All
                Index Cond: (col1 = tbl1.col1)
          ->  Index Scan using tprt5_idx on tprt_5 (never executed)
+               Skip scan: All
                Index Cond: (col1 = tbl1.col1)
          ->  Index Scan using tprt6_idx on tprt_6 (never executed)
+               Skip scan: All
                Index Cond: (col1 = tbl1.col1)
-(15 rows)
+(21 rows)
 
 select tbl1.col1, tprt.col1 from tbl1
 inner join tprt on tbl1.col1 > tprt.col1
@@ -2744,18 +2840,24 @@ select * from tbl1 join tprt on tbl1.col1 < tprt.col1;
    ->  Seq Scan on tbl1 (actual rows=1 loops=1)
    ->  Append (actual rows=1 loops=1)
          ->  Index Scan using tprt1_idx on tprt_1 (never executed)
+               Skip scan: All
                Index Cond: (col1 > tbl1.col1)
          ->  Index Scan using tprt2_idx on tprt_2 (never executed)
+               Skip scan: All
                Index Cond: (col1 > tbl1.col1)
          ->  Index Scan using tprt3_idx on tprt_3 (never executed)
+               Skip scan: All
                Index Cond: (col1 > tbl1.col1)
          ->  Index Scan using tprt4_idx on tprt_4 (never executed)
+               Skip scan: All
                Index Cond: (col1 > tbl1.col1)
          ->  Index Scan using tprt5_idx on tprt_5 (never executed)
+               Skip scan: All
                Index Cond: (col1 > tbl1.col1)
          ->  Index Scan using tprt6_idx on tprt_6 (actual rows=1 loops=1)
+               Skip scan: All
                Index Cond: (col1 > tbl1.col1)
-(15 rows)
+(21 rows)
 
 select tbl1.col1, tprt.col1 from tbl1
 inner join tprt on tbl1.col1 < tprt.col1
@@ -2776,18 +2878,24 @@ select * from tbl1 join tprt on tbl1.col1 = tprt.col1;
    ->  Seq Scan on tbl1 (actual rows=1 loops=1)
    ->  Append (actual rows=0 loops=1)
          ->  Index Scan using tprt1_idx on tprt_1 (never executed)
+               Skip scan: All
                Index Cond: (col1 = tbl1.col1)
          ->  Index Scan using tprt2_idx on tprt_2 (never executed)
+               Skip scan: All
                Index Cond: (col1 = tbl1.col1)
          ->  Index Scan using tprt3_idx on tprt_3 (never executed)
+               Skip scan: All
                Index Cond: (col1 = tbl1.col1)
          ->  Index Scan using tprt4_idx on tprt_4 (never executed)
+               Skip scan: All
                Index Cond: (col1 = tbl1.col1)
          ->  Index Scan using tprt5_idx on tprt_5 (never executed)
+               Skip scan: All
                Index Cond: (col1 = tbl1.col1)
          ->  Index Scan using tprt6_idx on tprt_6 (never executed)
+               Skip scan: All
                Index Cond: (col1 = tbl1.col1)
-(15 rows)
+(21 rows)
 
 select tbl1.col1, tprt.col1 from tbl1
 inner join tprt on tbl1.col1 = tprt.col1
@@ -3115,12 +3223,14 @@ explain (analyze, costs off, summary off, timing off) execute mt_q1(15);
    Sort Key: ma_test.b
    Subplans Removed: 1
    ->  Index Scan using ma_test_p2_b_idx on ma_test_p2 ma_test_1 (actual rows=1 loops=1)
+         Skip scan: All
          Filter: ((a >= $1) AND ((a % 10) = 5))
          Rows Removed by Filter: 9
    ->  Index Scan using ma_test_p3_b_idx on ma_test_p3 ma_test_2 (actual rows=1 loops=1)
+         Skip scan: All
          Filter: ((a >= $1) AND ((a % 10) = 5))
          Rows Removed by Filter: 9
-(9 rows)
+(11 rows)
 
 execute mt_q1(15);
  a  
@@ -3136,9 +3246,10 @@ explain (analyze, costs off, summary off, timing off) execute mt_q1(25);
    Sort Key: ma_test.b
    Subplans Removed: 2
    ->  Index Scan using ma_test_p3_b_idx on ma_test_p3 ma_test_1 (actual rows=1 loops=1)
+         Skip scan: All
          Filter: ((a >= $1) AND ((a % 10) = 5))
          Rows Removed by Filter: 9
-(6 rows)
+(7 rows)
 
 execute mt_q1(25);
  a  
@@ -3185,14 +3296,18 @@ explain (analyze, costs off, summary off, timing off) select * from ma_test wher
            InitPlan 1 (returns $0)
              ->  Limit (actual rows=1 loops=1)
                    ->  Index Scan using ma_test_p2_b_idx on ma_test_p2 (actual rows=1 loops=1)
+                         Skip scan: All
                          Index Cond: (b IS NOT NULL)
    ->  Index Scan using ma_test_p1_b_idx on ma_test_p1 ma_test_1 (never executed)
+         Skip scan: All
          Filter: (a >= $1)
    ->  Index Scan using ma_test_p2_b_idx on ma_test_p2 ma_test_2 (actual rows=10 loops=1)
+         Skip scan: All
          Filter: (a >= $1)
    ->  Index Scan using ma_test_p3_b_idx on ma_test_p3 ma_test_3 (actual rows=10 loops=1)
+         Skip scan: All
          Filter: (a >= $1)
-(14 rows)
+(18 rows)
 
 reset enable_seqscan;
 reset enable_sort;
diff --git a/src/test/regress/expected/plancache.out b/src/test/regress/expected/plancache.out
index 7d289b8c5e..2e75bad44d 100644
--- a/src/test/regress/expected/plancache.out
+++ b/src/test/regress/expected/plancache.out
@@ -290,8 +290,9 @@ explain (costs off) execute test_mode_pp(2);
 ----------------------------------------------------------
  Aggregate
    ->  Index Only Scan using test_mode_a_idx on test_mode
+         Skip scan: All
          Index Cond: (a = 2)
-(3 rows)
+(4 rows)
 
 -- force generic plan
 set plan_cache_mode to force_generic_plan;
@@ -351,7 +352,8 @@ explain (costs off) execute test_mode_pp(2);
 ----------------------------------------------------------
  Aggregate
    ->  Index Only Scan using test_mode_a_idx on test_mode
+         Skip scan: All
          Index Cond: (a = 2)
-(3 rows)
+(4 rows)
 
 drop table test_mode;
diff --git a/src/test/regress/expected/portals.out b/src/test/regress/expected/portals.out
index dc0d2ef7dd..5c18de3aa0 100644
--- a/src/test/regress/expected/portals.out
+++ b/src/test/regress/expected/portals.out
@@ -1253,8 +1253,9 @@ DECLARE c1 CURSOR FOR SELECT stringu1 FROM onek WHERE stringu1 = 'DZAAAA';
                  QUERY PLAN                  
 ---------------------------------------------
  Index Only Scan using onek_stringu1 on onek
+   Skip scan: All
    Index Cond: (stringu1 = 'DZAAAA'::name)
-(2 rows)
+(3 rows)
 
 DECLARE c1 CURSOR FOR SELECT stringu1 FROM onek WHERE stringu1 = 'DZAAAA';
 FETCH FROM c1;
diff --git a/src/test/regress/expected/privileges.out b/src/test/regress/expected/privileges.out
index c2d037b614..00a113bcd9 100644
--- a/src/test/regress/expected/privileges.out
+++ b/src/test/regress/expected/privileges.out
@@ -212,9 +212,10 @@ EXPLAIN (COSTS OFF) SELECT * FROM atest12v x, atest12v y WHERE x.a = y.b;
    ->  Seq Scan on atest12 atest12_1
          Filter: (b <<< 5)
    ->  Index Scan using atest12_a_idx on atest12
+         Skip scan: All
          Index Cond: (a = atest12_1.b)
          Filter: (b <<< 5)
-(6 rows)
+(7 rows)
 
 -- And this one.
 EXPLAIN (COSTS OFF) SELECT * FROM atest12 x, atest12 y
@@ -225,8 +226,9 @@ EXPLAIN (COSTS OFF) SELECT * FROM atest12 x, atest12 y
    ->  Seq Scan on atest12 y
          Filter: (abs(a) <<< 5)
    ->  Index Scan using atest12_a_idx on atest12 x
+         Skip scan: All
          Index Cond: (a = y.b)
-(5 rows)
+(6 rows)
 
 -- This should also be a nestloop, but the security barrier forces the inner
 -- scan to be materialized
@@ -261,9 +263,10 @@ EXPLAIN (COSTS OFF) SELECT * FROM atest12v x, atest12v y WHERE x.a = y.b;
    ->  Seq Scan on atest12 atest12_1
          Filter: (b <<< 5)
    ->  Index Scan using atest12_a_idx on atest12
+         Skip scan: All
          Index Cond: (a = atest12_1.b)
          Filter: (b <<< 5)
-(6 rows)
+(7 rows)
 
 EXPLAIN (COSTS OFF) SELECT * FROM atest12sbv x, atest12sbv y WHERE x.a = y.b;
                 QUERY PLAN                 
@@ -286,9 +289,10 @@ EXPLAIN (COSTS OFF) SELECT * FROM atest12v x, atest12v y
    ->  Seq Scan on atest12 atest12_1
          Filter: ((b <<< 5) AND (abs(a) <<< 5))
    ->  Index Scan using atest12_a_idx on atest12
+         Skip scan: All
          Index Cond: (a = atest12_1.b)
          Filter: (b <<< 5)
-(6 rows)
+(7 rows)
 
 -- But a security barrier view isolates the leaky operator.
 EXPLAIN (COSTS OFF) SELECT * FROM atest12sbv x, atest12sbv y
@@ -317,9 +321,10 @@ EXPLAIN (COSTS OFF) SELECT * FROM atest12v x, atest12v y WHERE x.a = y.b;
    ->  Seq Scan on atest12 atest12_1
          Filter: (b <<< 5)
    ->  Index Scan using atest12_a_idx on atest12
+         Skip scan: All
          Index Cond: (a = atest12_1.b)
          Filter: (b <<< 5)
-(6 rows)
+(7 rows)
 
 -- But not for this, due to lack of table-wide permissions needed
 -- to make use of the expression index's statistics.
diff --git a/src/test/regress/expected/regex.out b/src/test/regress/expected/regex.out
index 0923ad9b5b..1cd0fc95fa 100644
--- a/src/test/regress/expected/regex.out
+++ b/src/test/regress/expected/regex.out
@@ -299,49 +299,55 @@ explain (costs off) select * from pg_proc where proname ~ '^abc';
                               QUERY PLAN                              
 ----------------------------------------------------------------------
  Index Scan using pg_proc_proname_args_nsp_index on pg_proc
+   Skip scan: All
    Index Cond: ((proname >= 'abc'::text) AND (proname < 'abd'::text))
    Filter: (proname ~ '^abc'::text)
-(3 rows)
+(4 rows)
 
 explain (costs off) select * from pg_proc where proname ~ '^abc$';
                          QUERY PLAN                         
 ------------------------------------------------------------
  Index Scan using pg_proc_proname_args_nsp_index on pg_proc
+   Skip scan: All
    Index Cond: (proname = 'abc'::text)
    Filter: (proname ~ '^abc$'::text)
-(3 rows)
+(4 rows)
 
 explain (costs off) select * from pg_proc where proname ~ '^abcd*e';
                               QUERY PLAN                              
 ----------------------------------------------------------------------
  Index Scan using pg_proc_proname_args_nsp_index on pg_proc
+   Skip scan: All
    Index Cond: ((proname >= 'abc'::text) AND (proname < 'abd'::text))
    Filter: (proname ~ '^abcd*e'::text)
-(3 rows)
+(4 rows)
 
 explain (costs off) select * from pg_proc where proname ~ '^abc+d';
                               QUERY PLAN                              
 ----------------------------------------------------------------------
  Index Scan using pg_proc_proname_args_nsp_index on pg_proc
+   Skip scan: All
    Index Cond: ((proname >= 'abc'::text) AND (proname < 'abd'::text))
    Filter: (proname ~ '^abc+d'::text)
-(3 rows)
+(4 rows)
 
 explain (costs off) select * from pg_proc where proname ~ '^(abc)(def)';
                                  QUERY PLAN                                 
 ----------------------------------------------------------------------------
  Index Scan using pg_proc_proname_args_nsp_index on pg_proc
+   Skip scan: All
    Index Cond: ((proname >= 'abcdef'::text) AND (proname < 'abcdeg'::text))
    Filter: (proname ~ '^(abc)(def)'::text)
-(3 rows)
+(4 rows)
 
 explain (costs off) select * from pg_proc where proname ~ '^(abc)$';
                          QUERY PLAN                         
 ------------------------------------------------------------
  Index Scan using pg_proc_proname_args_nsp_index on pg_proc
+   Skip scan: All
    Index Cond: (proname = 'abc'::text)
    Filter: (proname ~ '^(abc)$'::text)
-(3 rows)
+(4 rows)
 
 explain (costs off) select * from pg_proc where proname ~ '^(abc)?d';
                QUERY PLAN               
@@ -354,9 +360,10 @@ explain (costs off) select * from pg_proc where proname ~ '^abcd(x|(?=\w\w)q)';
                                QUERY PLAN                               
 ------------------------------------------------------------------------
  Index Scan using pg_proc_proname_args_nsp_index on pg_proc
+   Skip scan: All
    Index Cond: ((proname >= 'abcd'::text) AND (proname < 'abce'::text))
    Filter: (proname ~ '^abcd(x|(?=\w\w)q)'::text)
-(3 rows)
+(4 rows)
 
 -- Test for infinite loop in pullback() (CVE-2007-4772)
 select 'a' ~ '($|^)*';
diff --git a/src/test/regress/expected/rowsecurity.out b/src/test/regress/expected/rowsecurity.out
index 9506aaef82..17d1c916cb 100644
--- a/src/test/regress/expected/rowsecurity.out
+++ b/src/test/regress/expected/rowsecurity.out
@@ -271,8 +271,9 @@ EXPLAIN (COSTS OFF) SELECT * FROM document WHERE f_leak(dtitle);
    Filter: ((dlevel <= $0) AND f_leak(dtitle))
    InitPlan 1 (returns $0)
      ->  Index Scan using uaccount_pkey on uaccount
+           Skip scan: All
            Index Cond: (pguser = CURRENT_USER)
-(5 rows)
+(6 rows)
 
 EXPLAIN (COSTS OFF) SELECT * FROM document NATURAL JOIN category WHERE f_leak(dtitle);
                         QUERY PLAN                         
@@ -281,12 +282,13 @@ EXPLAIN (COSTS OFF) SELECT * FROM document NATURAL JOIN category WHERE f_leak(dt
    Hash Cond: (category.cid = document.cid)
    InitPlan 1 (returns $0)
      ->  Index Scan using uaccount_pkey on uaccount
+           Skip scan: All
            Index Cond: (pguser = CURRENT_USER)
    ->  Seq Scan on category
    ->  Hash
          ->  Seq Scan on document
                Filter: ((dlevel <= $0) AND f_leak(dtitle))
-(9 rows)
+(10 rows)
 
 -- viewpoint from regress_rls_dave
 SET SESSION AUTHORIZATION regress_rls_dave;
@@ -335,8 +337,9 @@ EXPLAIN (COSTS OFF) SELECT * FROM document WHERE f_leak(dtitle);
    Filter: ((cid <> 44) AND (cid <> 44) AND (cid < 50) AND (dlevel <= $0) AND f_leak(dtitle))
    InitPlan 1 (returns $0)
      ->  Index Scan using uaccount_pkey on uaccount
+           Skip scan: All
            Index Cond: (pguser = CURRENT_USER)
-(5 rows)
+(6 rows)
 
 EXPLAIN (COSTS OFF) SELECT * FROM document NATURAL JOIN category WHERE f_leak(dtitle);
                                                 QUERY PLAN                                                
@@ -345,12 +348,13 @@ EXPLAIN (COSTS OFF) SELECT * FROM document NATURAL JOIN category WHERE f_leak(dt
    Hash Cond: (category.cid = document.cid)
    InitPlan 1 (returns $0)
      ->  Index Scan using uaccount_pkey on uaccount
+           Skip scan: All
            Index Cond: (pguser = CURRENT_USER)
    ->  Seq Scan on category
    ->  Hash
          ->  Seq Scan on document
                Filter: ((cid <> 44) AND (cid <> 44) AND (cid < 50) AND (dlevel <= $0) AND f_leak(dtitle))
-(9 rows)
+(10 rows)
 
 -- 44 would technically fail for both p2r and p1r, but we should get an error
 -- back from p1r for this because it sorts first
@@ -436,8 +440,9 @@ EXPLAIN (COSTS OFF) SELECT * FROM document NATURAL JOIN category WHERE f_leak(dt
    ->  Seq Scan on document
          Filter: ((dauthor = CURRENT_USER) AND f_leak(dtitle))
    ->  Index Scan using category_pkey on category
+         Skip scan: All
          Index Cond: (cid = document.cid)
-(5 rows)
+(6 rows)
 
 -- interaction of FK/PK constraints
 SET SESSION AUTHORIZATION regress_rls_alice;
@@ -990,6 +995,7 @@ EXPLAIN (COSTS OFF) SELECT * FROM part_document WHERE f_leak(dtitle);
  Append
    InitPlan 1 (returns $0)
      ->  Index Scan using uaccount_pkey on uaccount
+           Skip scan: All
            Index Cond: (pguser = CURRENT_USER)
    ->  Seq Scan on part_document_fiction part_document_1
          Filter: ((dlevel <= $0) AND f_leak(dtitle))
@@ -997,7 +1003,7 @@ EXPLAIN (COSTS OFF) SELECT * FROM part_document WHERE f_leak(dtitle);
          Filter: ((dlevel <= $0) AND f_leak(dtitle))
    ->  Seq Scan on part_document_nonfiction part_document_3
          Filter: ((dlevel <= $0) AND f_leak(dtitle))
-(10 rows)
+(11 rows)
 
 -- viewpoint from regress_rls_carol
 SET SESSION AUTHORIZATION regress_rls_carol;
@@ -1032,6 +1038,7 @@ EXPLAIN (COSTS OFF) SELECT * FROM part_document WHERE f_leak(dtitle);
  Append
    InitPlan 1 (returns $0)
      ->  Index Scan using uaccount_pkey on uaccount
+           Skip scan: All
            Index Cond: (pguser = CURRENT_USER)
    ->  Seq Scan on part_document_fiction part_document_1
          Filter: ((dlevel <= $0) AND f_leak(dtitle))
@@ -1039,7 +1046,7 @@ EXPLAIN (COSTS OFF) SELECT * FROM part_document WHERE f_leak(dtitle);
          Filter: ((dlevel <= $0) AND f_leak(dtitle))
    ->  Seq Scan on part_document_nonfiction part_document_3
          Filter: ((dlevel <= $0) AND f_leak(dtitle))
-(10 rows)
+(11 rows)
 
 -- viewpoint from regress_rls_dave
 SET SESSION AUTHORIZATION regress_rls_dave;
@@ -1063,8 +1070,9 @@ EXPLAIN (COSTS OFF) SELECT * FROM part_document WHERE f_leak(dtitle);
    Filter: ((cid < 55) AND (dlevel <= $0) AND f_leak(dtitle))
    InitPlan 1 (returns $0)
      ->  Index Scan using uaccount_pkey on uaccount
+           Skip scan: All
            Index Cond: (pguser = CURRENT_USER)
-(5 rows)
+(6 rows)
 
 -- pp1 ERROR
 INSERT INTO part_document VALUES (100, 11, 5, 'regress_rls_dave', 'testing pp1'); -- fail
@@ -1141,8 +1149,9 @@ EXPLAIN (COSTS OFF) SELECT * FROM part_document WHERE f_leak(dtitle);
    Filter: ((cid < 55) AND (dlevel <= $0) AND f_leak(dtitle))
    InitPlan 1 (returns $0)
      ->  Index Scan using uaccount_pkey on uaccount
+           Skip scan: All
            Index Cond: (pguser = CURRENT_USER)
-(5 rows)
+(6 rows)
 
 -- viewpoint from regress_rls_carol
 SET SESSION AUTHORIZATION regress_rls_carol;
@@ -1179,6 +1188,7 @@ EXPLAIN (COSTS OFF) SELECT * FROM part_document WHERE f_leak(dtitle);
  Append
    InitPlan 1 (returns $0)
      ->  Index Scan using uaccount_pkey on uaccount
+           Skip scan: All
            Index Cond: (pguser = CURRENT_USER)
    ->  Seq Scan on part_document_fiction part_document_1
          Filter: ((dlevel <= $0) AND f_leak(dtitle))
@@ -1186,7 +1196,7 @@ EXPLAIN (COSTS OFF) SELECT * FROM part_document WHERE f_leak(dtitle);
          Filter: ((dlevel <= $0) AND f_leak(dtitle))
    ->  Seq Scan on part_document_nonfiction part_document_3
          Filter: ((dlevel <= $0) AND f_leak(dtitle))
-(10 rows)
+(11 rows)
 
 -- only owner can change policies
 ALTER POLICY pp1 ON part_document USING (true);    --fail
diff --git a/src/test/regress/expected/rowtypes.out b/src/test/regress/expected/rowtypes.out
index 2a273f8404..4d58053e4f 100644
--- a/src/test/regress/expected/rowtypes.out
+++ b/src/test/regress/expected/rowtypes.out
@@ -259,8 +259,9 @@ order by thousand, tenthous;
                         QUERY PLAN                         
 -----------------------------------------------------------
  Index Only Scan using tenk1_thous_tenthous on tenk1
+   Skip scan: All
    Index Cond: (ROW(thousand, tenthous) >= ROW(997, 5000))
-(2 rows)
+(3 rows)
 
 select thousand, tenthous from tenk1
 where (thousand, tenthous) >= (997, 5000)
@@ -305,8 +306,9 @@ order by thousand, tenthous;
    ->  Bitmap Heap Scan on tenk1
          Filter: (ROW(thousand, tenthous, four) > ROW(998, 5000, 3))
          ->  Bitmap Index Scan on tenk1_thous_tenthous
+               Skip scan: All
                Index Cond: (ROW(thousand, tenthous) >= ROW(998, 5000))
-(6 rows)
+(7 rows)
 
 select thousand, tenthous, four from tenk1
 where (thousand, tenthous, four) > (998, 5000, 3)
@@ -337,8 +339,9 @@ order by thousand, tenthous;
                         QUERY PLAN                        
 ----------------------------------------------------------
  Index Only Scan using tenk1_thous_tenthous on tenk1
+   Skip scan: All
    Index Cond: (ROW(thousand, tenthous) > ROW(998, 5000))
-(2 rows)
+(3 rows)
 
 select thousand, tenthous from tenk1
 where (998, 5000) < (thousand, tenthous)
@@ -373,8 +376,9 @@ order by thousand, hundred;
    ->  Bitmap Heap Scan on tenk1
          Filter: (ROW(998, 5000) < ROW(thousand, hundred))
          ->  Bitmap Index Scan on tenk1_thous_tenthous
+               Skip scan: All
                Index Cond: (thousand >= 998)
-(6 rows)
+(7 rows)
 
 select thousand, hundred from tenk1
 where (998, 5000) < (thousand, hundred)
@@ -405,8 +409,9 @@ select a,b from test_table where (a,b) > ('a','a') order by a,b;
                        QUERY PLAN                       
 --------------------------------------------------------
  Index Only Scan using test_table_a_b_idx on test_table
+   Skip scan: All
    Index Cond: (ROW(a, b) > ROW('a'::text, 'a'::text))
-(2 rows)
+(3 rows)
 
 select a,b from test_table where (a,b) > ('a','a') order by a,b;
  a | b 
@@ -1109,8 +1114,9 @@ select row_to_json(q) from
 -------------------------------------------------------------
  Subquery Scan on q
    ->  Index Only Scan using tenk1_thous_tenthous on tenk1
+         Skip scan: All
          Index Cond: ((thousand = 42) AND (tenthous < 2000))
-(3 rows)
+(4 rows)
 
 select row_to_json(q) from
   (select thousand, tenthous from tenk1
diff --git a/src/test/regress/expected/select.out b/src/test/regress/expected/select.out
index c441049f41..dd7c4117ed 100644
--- a/src/test/regress/expected/select.out
+++ b/src/test/regress/expected/select.out
@@ -742,9 +742,10 @@ select * from onek2 where unique2 = 11 and stringu1 = 'ATAAAA';
                QUERY PLAN                
 -----------------------------------------
  Index Scan using onek2_u2_prtl on onek2
+   Skip scan: All
    Index Cond: (unique2 = 11)
    Filter: (stringu1 = 'ATAAAA'::name)
-(3 rows)
+(4 rows)
 
 select * from onek2 where unique2 = 11 and stringu1 = 'ATAAAA';
  unique1 | unique2 | two | four | ten | twenty | hundred | thousand | twothousand | fivethous | tenthous | odd | even | stringu1 | stringu2 | string4 
@@ -758,18 +759,20 @@ select * from onek2 where unique2 = 11 and stringu1 = 'ATAAAA';
                            QUERY PLAN                            
 -----------------------------------------------------------------
  Index Scan using onek2_u2_prtl on onek2 (actual rows=1 loops=1)
+   Skip scan: All
    Index Cond: (unique2 = 11)
    Filter: (stringu1 = 'ATAAAA'::name)
-(3 rows)
+(4 rows)
 
 explain (costs off)
 select unique2 from onek2 where unique2 = 11 and stringu1 = 'ATAAAA';
                QUERY PLAN                
 -----------------------------------------
  Index Scan using onek2_u2_prtl on onek2
+   Skip scan: All
    Index Cond: (unique2 = 11)
    Filter: (stringu1 = 'ATAAAA'::name)
-(3 rows)
+(4 rows)
 
 select unique2 from onek2 where unique2 = 11 and stringu1 = 'ATAAAA';
  unique2 
@@ -783,8 +786,9 @@ select * from onek2 where unique2 = 11 and stringu1 < 'B';
                QUERY PLAN                
 -----------------------------------------
  Index Scan using onek2_u2_prtl on onek2
+   Skip scan: All
    Index Cond: (unique2 = 11)
-(2 rows)
+(3 rows)
 
 select * from onek2 where unique2 = 11 and stringu1 < 'B';
  unique1 | unique2 | two | four | ten | twenty | hundred | thousand | twothousand | fivethous | tenthous | odd | even | stringu1 | stringu2 | string4 
@@ -797,8 +801,9 @@ select unique2 from onek2 where unique2 = 11 and stringu1 < 'B';
                   QUERY PLAN                  
 ----------------------------------------------
  Index Only Scan using onek2_u2_prtl on onek2
+   Skip scan: All
    Index Cond: (unique2 = 11)
-(2 rows)
+(3 rows)
 
 select unique2 from onek2 where unique2 = 11 and stringu1 < 'B';
  unique2 
@@ -813,9 +818,10 @@ select unique2 from onek2 where unique2 = 11 and stringu1 < 'B' for update;
 -----------------------------------------------
  LockRows
    ->  Index Scan using onek2_u2_prtl on onek2
+         Skip scan: All
          Index Cond: (unique2 = 11)
          Filter: (stringu1 < 'B'::name)
-(4 rows)
+(5 rows)
 
 select unique2 from onek2 where unique2 = 11 and stringu1 < 'B' for update;
  unique2 
@@ -847,8 +853,9 @@ select unique2 from onek2 where unique2 = 11 and stringu1 < 'B';
  Bitmap Heap Scan on onek2
    Recheck Cond: ((unique2 = 11) AND (stringu1 < 'B'::name))
    ->  Bitmap Index Scan on onek2_u2_prtl
+         Skip scan: All
          Index Cond: (unique2 = 11)
-(4 rows)
+(5 rows)
 
 select unique2 from onek2 where unique2 = 11 and stringu1 < 'B';
  unique2 
@@ -868,10 +875,12 @@ select unique1, unique2 from onek2
    Filter: (stringu1 < 'B'::name)
    ->  BitmapOr
          ->  Bitmap Index Scan on onek2_u2_prtl
+               Skip scan: All
                Index Cond: (unique2 = 11)
          ->  Bitmap Index Scan on onek2_u1_prtl
+               Skip scan: All
                Index Cond: (unique1 = 0)
-(8 rows)
+(10 rows)
 
 select unique1, unique2 from onek2
   where (unique2 = 11 or unique1 = 0) and stringu1 < 'B';
@@ -890,10 +899,12 @@ select unique1, unique2 from onek2
    Recheck Cond: (((unique2 = 11) AND (stringu1 < 'B'::name)) OR (unique1 = 0))
    ->  BitmapOr
          ->  Bitmap Index Scan on onek2_u2_prtl
+               Skip scan: All
                Index Cond: (unique2 = 11)
          ->  Bitmap Index Scan on onek2_u1_prtl
+               Skip scan: All
                Index Cond: (unique1 = 0)
-(7 rows)
+(9 rows)
 
 select unique1, unique2 from onek2
   where (unique2 = 11 and stringu1 < 'B') or unique1 = 0;
diff --git a/src/test/regress/expected/select_distinct.out b/src/test/regress/expected/select_distinct.out
index e21afa7990..076c3d571d 100644
--- a/src/test/regress/expected/select_distinct.out
+++ b/src/test/regress/expected/select_distinct.out
@@ -395,18 +395,18 @@ SELECT DISTINCT on (a, b) a, b FROM distinct_a WHERE b = 2;
 
 EXPLAIN (COSTS OFF)
 SELECT DISTINCT a FROM distinct_a WHERE b = 2;
-                     QUERY PLAN                     
-----------------------------------------------------
- Index Only Scan using distinct_a_b_a on distinct_a
+                       QUERY PLAN                       
+--------------------------------------------------------
+ Index Only Scan using distinct_a_a_b_idx on distinct_a
    Skip scan: Distinct only
    Index Cond: (b = 2)
 (3 rows)
 
 EXPLAIN (COSTS OFF)
 SELECT DISTINCT on (a, b) a, b FROM distinct_a WHERE b = 2;
-                     QUERY PLAN                     
-----------------------------------------------------
- Index Only Scan using distinct_a_b_a on distinct_a
+                       QUERY PLAN                       
+--------------------------------------------------------
+ Index Only Scan using distinct_a_a_b_idx on distinct_a
    Skip scan: Distinct only
    Index Cond: (b = 2)
 (3 rows)
@@ -633,8 +633,9 @@ FROM distinct_a WHERE a = 1 ORDER BY a;
    ->  Bitmap Heap Scan on distinct_a
          Recheck Cond: (a = 1)
          ->  Bitmap Index Scan on distinct_a_a_b_idx
+               Skip scan: All
                Index Cond: (a = 1)
-(5 rows)
+(6 rows)
 
 -- check colums order
 SELECT DISTINCT a FROM distinct_a WHERE b = 2 AND c = 10;
diff --git a/src/test/regress/expected/select_parallel.out b/src/test/regress/expected/select_parallel.out
index 96dfb7c8dd..5e6377f258 100644
--- a/src/test/regress/expected/select_parallel.out
+++ b/src/test/regress/expected/select_parallel.out
@@ -523,8 +523,9 @@ explain (costs off)
                ->  Parallel Bitmap Heap Scan on tenk1
                      Recheck Cond: (hundred > 1)
                      ->  Bitmap Index Scan on tenk1_hundred
+                           Skip scan: All
                            Index Cond: (hundred > 1)
-(10 rows)
+(11 rows)
 
 select count(*) from tenk1, tenk2 where tenk1.hundred > 1 and tenk2.thousand=0;
  count 
@@ -621,7 +622,8 @@ explain (costs off)
                      Merge Cond: (tenk1.unique1 = tenk2.unique1)
                      ->  Parallel Index Only Scan using tenk1_unique1 on tenk1
                      ->  Index Only Scan using tenk2_unique1 on tenk2
-(8 rows)
+                           Skip scan: All
+(9 rows)
 
 select  count(*) from tenk1, tenk2 where tenk1.unique1 = tenk2.unique1;
  count 
@@ -949,8 +951,9 @@ explain (costs off)
    Workers Planned: 1
    Single Copy: true
    ->  Index Scan using tenk1_unique1 on tenk1
+         Skip scan: All
          Index Cond: (unique1 = 1)
-(5 rows)
+(6 rows)
 
 ROLLBACK TO SAVEPOINT settings;
 -- exercise record typmod remapping between backends
diff --git a/src/test/regress/expected/subselect.out b/src/test/regress/expected/subselect.out
index 4c6cd5f146..7334413144 100644
--- a/src/test/regress/expected/subselect.out
+++ b/src/test/regress/expected/subselect.out
@@ -895,12 +895,13 @@ where o.ten = 0;
                Filter: (o.ten = 0)
          ->  Index Scan using onek_unique1 on public.onek i
                Output: (hashed SubPlan 1), random()
+               Skip scan: All
                Index Cond: (i.unique1 = o.unique1)
                SubPlan 1
                  ->  Seq Scan on public.int4_tbl
                        Output: int4_tbl.f1
                        Filter: (int4_tbl.f1 <= $0)
-(14 rows)
+(15 rows)
 
 select sum(ss.tst::int) from
   onek o cross join lateral (
@@ -935,11 +936,13 @@ where o.ten = 1;
                      ->  Append
                            ->  Subquery Scan on "*SELECT* 1"
                                  ->  Index Scan using onek_unique1 on onek i1
+                                       Skip scan: All
                                        Index Cond: (unique1 = o.unique1)
                            ->  Subquery Scan on "*SELECT* 2"
                                  ->  Index Scan using onek_unique1 on onek i2
+                                       Skip scan: All
                                        Index Cond: (unique1 = o.unique2)
-(13 rows)
+(15 rows)
 
 select count(*) from
   onek o cross join lateral (
@@ -1095,7 +1098,8 @@ select * from int4_tbl where
    SubPlan 1
      ->  Index Only Scan using tenk1_unique1 on public.tenk1 a
            Output: a.unique1
-(10 rows)
+           Skip scan: All
+(11 rows)
 
 select * from int4_tbl where
   (case when f1 in (select unique1 from tenk1 a) then f1 else null end) in
diff --git a/src/test/regress/expected/tuplesort.out b/src/test/regress/expected/tuplesort.out
index 3fc1998bf2..f47744d2fe 100644
--- a/src/test/regress/expected/tuplesort.out
+++ b/src/test/regress/expected/tuplesort.out
@@ -146,7 +146,8 @@ SELECT id, noabort_increasing, noabort_decreasing FROM abbrev_abort_uuids ORDER
 -----------------------------------------------------------------------------------------
  Limit
    ->  Index Scan using abbrev_abort_uuids__noabort_increasing_idx on abbrev_abort_uuids
-(2 rows)
+         Skip scan: All
+(3 rows)
 
 SELECT id, noabort_increasing, noabort_decreasing FROM abbrev_abort_uuids ORDER BY noabort_increasing LIMIT 5;
   id   |          noabort_increasing          |          noabort_decreasing          
@@ -164,7 +165,8 @@ SELECT id, noabort_increasing, noabort_decreasing FROM abbrev_abort_uuids ORDER
 -----------------------------------------------------------------------------------------
  Limit
    ->  Index Scan using abbrev_abort_uuids__noabort_decreasing_idx on abbrev_abort_uuids
-(2 rows)
+         Skip scan: All
+(3 rows)
 
 SELECT id, noabort_increasing, noabort_decreasing FROM abbrev_abort_uuids ORDER BY noabort_decreasing LIMIT 5;
   id   |          noabort_increasing          |          noabort_decreasing          
@@ -186,7 +188,8 @@ SELECT id, abort_increasing, abort_decreasing FROM abbrev_abort_uuids ORDER BY a
 ---------------------------------------------------------------------------------------
  Limit
    ->  Index Scan using abbrev_abort_uuids__abort_increasing_idx on abbrev_abort_uuids
-(2 rows)
+         Skip scan: All
+(3 rows)
 
 SELECT id, abort_increasing, abort_decreasing FROM abbrev_abort_uuids ORDER BY abort_increasing LIMIT 5;
   id   |           abort_increasing           |           abort_decreasing           
@@ -204,7 +207,8 @@ SELECT id, abort_increasing, abort_decreasing FROM abbrev_abort_uuids ORDER BY a
 ---------------------------------------------------------------------------------------
  Limit
    ->  Index Scan using abbrev_abort_uuids__abort_decreasing_idx on abbrev_abort_uuids
-(2 rows)
+         Skip scan: All
+(3 rows)
 
 SELECT id, abort_increasing, abort_decreasing FROM abbrev_abort_uuids ORDER BY abort_decreasing LIMIT 5;
   id   |           abort_increasing           |           abort_decreasing           
diff --git a/src/test/regress/expected/union.out b/src/test/regress/expected/union.out
index 6e72e92d80..1739a87d46 100644
--- a/src/test/regress/expected/union.out
+++ b/src/test/regress/expected/union.out
@@ -360,7 +360,8 @@ select count(*) from
                            ->  Seq Scan on tenk1
                      ->  Subquery Scan on "*SELECT* 1"
                            ->  Index Only Scan using tenk1_unique1 on tenk1 tenk1_1
-(8 rows)
+                                 Skip scan: All
+(9 rows)
 
 select count(*) from
   ( select unique1 from tenk1 intersect select fivethous from tenk1 ) ss;
@@ -377,10 +378,12 @@ select unique1 from tenk1 except select unique2 from tenk1 where unique2 != 10;
    ->  Append
          ->  Subquery Scan on "*SELECT* 1"
                ->  Index Only Scan using tenk1_unique1 on tenk1
+                     Skip scan: All
          ->  Subquery Scan on "*SELECT* 2"
                ->  Index Only Scan using tenk1_unique2 on tenk1 tenk1_1
+                     Skip scan: All
                      Filter: (unique2 <> 10)
-(7 rows)
+(9 rows)
 
 select unique1 from tenk1 except select unique2 from tenk1 where unique2 != 10;
  unique1 
@@ -404,7 +407,8 @@ select count(*) from
                                  ->  Seq Scan on tenk1
                            ->  Subquery Scan on "*SELECT* 1"
                                  ->  Index Only Scan using tenk1_unique1 on tenk1 tenk1_1
-(10 rows)
+                                       Skip scan: All
+(11 rows)
 
 select count(*) from
   ( select unique1 from tenk1 intersect select fivethous from tenk1 ) ss;
@@ -423,10 +427,12 @@ select unique1 from tenk1 except select unique2 from tenk1 where unique2 != 10;
          ->  Append
                ->  Subquery Scan on "*SELECT* 1"
                      ->  Index Only Scan using tenk1_unique1 on tenk1
+                           Skip scan: All
                ->  Subquery Scan on "*SELECT* 2"
                      ->  Index Only Scan using tenk1_unique2 on tenk1 tenk1_1
+                           Skip scan: All
                            Filter: (unique2 <> 10)
-(9 rows)
+(11 rows)
 
 select unique1 from tenk1 except select unique2 from tenk1 where unique2 != 10;
  unique1 
@@ -711,10 +717,12 @@ explain (costs off)
 ---------------------------------------------
  Append
    ->  Index Scan using t1_ab_idx on t1
+         Skip scan: All
          Index Cond: ((a || b) = 'ab'::text)
    ->  Index Only Scan using t2_pkey on t2
+         Skip scan: All
          Index Cond: (ab = 'ab'::text)
-(5 rows)
+(7 rows)
 
 explain (costs off)
  SELECT * FROM
@@ -728,10 +736,12 @@ explain (costs off)
    Group Key: ((t1.a || t1.b))
    ->  Append
          ->  Index Scan using t1_ab_idx on t1
+               Skip scan: All
                Index Cond: ((a || b) = 'ab'::text)
          ->  Index Only Scan using t2_pkey on t2
+               Skip scan: All
                Index Cond: (ab = 'ab'::text)
-(7 rows)
+(9 rows)
 
 --
 -- Test that ORDER BY for UNION ALL can be pushed down to inheritance
@@ -757,10 +767,14 @@ explain (costs off)
    ->  Merge Append
          Sort Key: ((t1.a || t1.b))
          ->  Index Scan using t1_ab_idx on t1
+               Skip scan: All
          ->  Index Scan using t1c_ab_idx on t1c t1_1
+               Skip scan: All
          ->  Index Scan using t2_pkey on t2
+               Skip scan: All
          ->  Index Scan using t2c_pkey on t2c t2_1
-(7 rows)
+               Skip scan: All
+(11 rows)
 
   SELECT * FROM
   (SELECT a || b AS ab FROM t1
@@ -797,11 +811,13 @@ select event_id
  Merge Append
    Sort Key: events.event_id
    ->  Index Scan using events_pkey on events
+         Skip scan: All
    ->  Sort
          Sort Key: events_1.event_id
          ->  Seq Scan on events_child events_1
    ->  Index Scan using other_events_pkey on other_events
-(7 rows)
+         Skip scan: All
+(9 rows)
 
 drop table events_child, events, other_events;
 reset enable_indexonlyscan;
@@ -1006,10 +1022,12 @@ select * from
    ->  Seq Scan on int4_tbl
    ->  Append
          ->  Index Scan using t3i on t3 a
+               Skip scan: All
                Index Cond: (expensivefunc(x) = int4_tbl.f1)
          ->  Index Scan using t3i on t3 b
+               Skip scan: All
                Index Cond: (expensivefunc(x) = int4_tbl.f1)
-(7 rows)
+(9 rows)
 
 select * from
   (select * from t3 a union all select * from t3 b) ss
diff --git a/src/test/regress/expected/updatable_views.out b/src/test/regress/expected/updatable_views.out
index 5de53f2782..3b0d9d42bf 100644
--- a/src/test/regress/expected/updatable_views.out
+++ b/src/test/regress/expected/updatable_views.out
@@ -422,16 +422,18 @@ EXPLAIN (costs off) UPDATE rw_view1 SET a=6 WHERE a=5;
 --------------------------------------------------
  Update on base_tbl
    ->  Index Scan using base_tbl_pkey on base_tbl
+         Skip scan: All
          Index Cond: ((a > 0) AND (a = 5))
-(3 rows)
+(4 rows)
 
 EXPLAIN (costs off) DELETE FROM rw_view1 WHERE a=5;
                     QUERY PLAN                    
 --------------------------------------------------
  Delete on base_tbl
    ->  Index Scan using base_tbl_pkey on base_tbl
+         Skip scan: All
          Index Cond: ((a > 0) AND (a = 5))
-(3 rows)
+(4 rows)
 
 DROP TABLE base_tbl CASCADE;
 NOTICE:  drop cascades to view rw_view1
@@ -492,16 +494,18 @@ EXPLAIN (costs off) UPDATE rw_view2 SET aaa=5 WHERE aaa=4;
 --------------------------------------------------------
  Update on base_tbl
    ->  Index Scan using base_tbl_pkey on base_tbl
+         Skip scan: All
          Index Cond: ((a < 10) AND (a > 0) AND (a = 4))
-(3 rows)
+(4 rows)
 
 EXPLAIN (costs off) DELETE FROM rw_view2 WHERE aaa=4;
                        QUERY PLAN                       
 --------------------------------------------------------
  Delete on base_tbl
    ->  Index Scan using base_tbl_pkey on base_tbl
+         Skip scan: All
          Index Cond: ((a < 10) AND (a > 0) AND (a = 4))
-(3 rows)
+(4 rows)
 
 DROP TABLE base_tbl CASCADE;
 NOTICE:  drop cascades to 2 other objects
@@ -685,14 +689,16 @@ EXPLAIN (costs off) UPDATE rw_view2 SET a=3 WHERE a=2;
  Update on base_tbl
    ->  Nested Loop
          ->  Index Scan using base_tbl_pkey on base_tbl
+               Skip scan: All
                Index Cond: (a = 2)
          ->  Subquery Scan on rw_view1
                Filter: ((rw_view1.a < 10) AND (rw_view1.a = 2))
                ->  Bitmap Heap Scan on base_tbl base_tbl_1
                      Recheck Cond: (a > 0)
                      ->  Bitmap Index Scan on base_tbl_pkey
+                           Skip scan: All
                            Index Cond: (a > 0)
-(10 rows)
+(12 rows)
 
 EXPLAIN (costs off) DELETE FROM rw_view2 WHERE a=2;
                            QUERY PLAN                           
@@ -700,14 +706,16 @@ EXPLAIN (costs off) DELETE FROM rw_view2 WHERE a=2;
  Delete on base_tbl
    ->  Nested Loop
          ->  Index Scan using base_tbl_pkey on base_tbl
+               Skip scan: All
                Index Cond: (a = 2)
          ->  Subquery Scan on rw_view1
                Filter: ((rw_view1.a < 10) AND (rw_view1.a = 2))
                ->  Bitmap Heap Scan on base_tbl base_tbl_1
                      Recheck Cond: (a > 0)
                      ->  Bitmap Index Scan on base_tbl_pkey
+                           Skip scan: All
                            Index Cond: (a > 0)
-(10 rows)
+(12 rows)
 
 DROP TABLE base_tbl CASCADE;
 NOTICE:  drop cascades to 2 other objects
@@ -919,8 +927,9 @@ EXPLAIN (costs off) UPDATE rw_view2 SET a=3 WHERE a=2;
          ->  Bitmap Heap Scan on base_tbl
                Recheck Cond: (a > 0)
                ->  Bitmap Index Scan on base_tbl_pkey
+                     Skip scan: All
                      Index Cond: (a > 0)
-(7 rows)
+(8 rows)
 
 EXPLAIN (costs off) DELETE FROM rw_view2 WHERE a=2;
                         QUERY PLAN                        
@@ -931,8 +940,9 @@ EXPLAIN (costs off) DELETE FROM rw_view2 WHERE a=2;
          ->  Bitmap Heap Scan on base_tbl
                Recheck Cond: (a > 0)
                ->  Bitmap Index Scan on base_tbl_pkey
+                     Skip scan: All
                      Index Cond: (a > 0)
-(7 rows)
+(8 rows)
 
 DROP TABLE base_tbl CASCADE;
 NOTICE:  drop cascades to 2 other objects
@@ -969,8 +979,9 @@ UPDATE rw_view1 v SET bb='Updated row 2' WHERE rw_view1_aa(v)=2
 --------------------------------------------------
  Update on base_tbl
    ->  Index Scan using base_tbl_pkey on base_tbl
+         Skip scan: All
          Index Cond: (a = 2)
-(3 rows)
+(4 rows)
 
 DROP TABLE base_tbl CASCADE;
 NOTICE:  drop cascades to 2 other objects
@@ -1868,10 +1879,11 @@ EXPLAIN (costs off) INSERT INTO rw_view1 VALUES (5);
    ->  Result
    SubPlan 1
      ->  Index Only Scan using ref_tbl_pkey on ref_tbl r
+           Skip scan: All
            Index Cond: (a = b.a)
    SubPlan 2
      ->  Seq Scan on ref_tbl r_1
-(7 rows)
+(8 rows)
 
 EXPLAIN (costs off) UPDATE rw_view1 SET a = a + 5;
                         QUERY PLAN                         
@@ -1884,10 +1896,11 @@ EXPLAIN (costs off) UPDATE rw_view1 SET a = a + 5;
                ->  Seq Scan on ref_tbl r
    SubPlan 1
      ->  Index Only Scan using ref_tbl_pkey on ref_tbl r_1
+           Skip scan: All
            Index Cond: (a = b.a)
    SubPlan 2
      ->  Seq Scan on ref_tbl r_2
-(11 rows)
+(12 rows)
 
 DROP TABLE base_tbl, ref_tbl CASCADE;
 NOTICE:  drop cascades to view rw_view1
@@ -2219,11 +2232,13 @@ EXPLAIN (costs off) DELETE FROM rw_view1 WHERE id = 1 AND snoop(data);
  Update on base_tbl base_tbl_1
    ->  Nested Loop
          ->  Index Scan using base_tbl_pkey on base_tbl base_tbl_1
+               Skip scan: All
                Index Cond: (id = 1)
          ->  Index Scan using base_tbl_pkey on base_tbl
+               Skip scan: All
                Index Cond: (id = 1)
                Filter: ((NOT deleted) AND snoop(data))
-(7 rows)
+(9 rows)
 
 DELETE FROM rw_view1 WHERE id = 1 AND snoop(data);
 NOTICE:  snooped value: Row 1
@@ -2233,6 +2248,7 @@ EXPLAIN (costs off) INSERT INTO rw_view1 VALUES (2, 'New row 2');
  Insert on base_tbl
    InitPlan 1 (returns $0)
      ->  Index Only Scan using base_tbl_pkey on base_tbl t
+           Skip scan: All
            Index Cond: (id = 2)
    ->  Result
          One-Time Filter: ($0 IS NOT TRUE)
@@ -2240,12 +2256,14 @@ EXPLAIN (costs off) INSERT INTO rw_view1 VALUES (2, 'New row 2');
  Update on base_tbl
    InitPlan 1 (returns $0)
      ->  Index Only Scan using base_tbl_pkey on base_tbl t
+           Skip scan: All
            Index Cond: (id = 2)
    ->  Result
          One-Time Filter: $0
          ->  Index Scan using base_tbl_pkey on base_tbl
+               Skip scan: All
                Index Cond: (id = 2)
-(15 rows)
+(18 rows)
 
 INSERT INTO rw_view1 VALUES (2, 'New row 2');
 SELECT * FROM base_tbl;
@@ -2310,6 +2328,7 @@ UPDATE v1 SET a=100 WHERE snoop(a) AND leakproof(a) AND a < 7 AND a != 6;
    Update on public.t111 t1_3
    ->  Index Scan using t1_a_idx on public.t1
          Output: 100, t1.b, t1.c, t1.ctid
+         Skip scan: All
          Index Cond: ((t1.a > 5) AND (t1.a < 7))
          Filter: ((t1.a <> 6) AND (alternatives: SubPlan 1 or hashed SubPlan 2) AND snoop(t1.a) AND leakproof(t1.a))
          SubPlan 1
@@ -2326,17 +2345,20 @@ UPDATE v1 SET a=100 WHERE snoop(a) AND leakproof(a) AND a < 7 AND a != 6;
                        Output: t12_5.a
    ->  Index Scan using t11_a_idx on public.t11 t1_1
          Output: 100, t1_1.b, t1_1.c, t1_1.d, t1_1.ctid
+         Skip scan: All
          Index Cond: ((t1_1.a > 5) AND (t1_1.a < 7))
          Filter: ((t1_1.a <> 6) AND (alternatives: SubPlan 1 or hashed SubPlan 2) AND snoop(t1_1.a) AND leakproof(t1_1.a))
    ->  Index Scan using t12_a_idx on public.t12 t1_2
          Output: 100, t1_2.b, t1_2.c, t1_2.e, t1_2.ctid
+         Skip scan: All
          Index Cond: ((t1_2.a > 5) AND (t1_2.a < 7))
          Filter: ((t1_2.a <> 6) AND (alternatives: SubPlan 1 or hashed SubPlan 2) AND snoop(t1_2.a) AND leakproof(t1_2.a))
    ->  Index Scan using t111_a_idx on public.t111 t1_3
          Output: 100, t1_3.b, t1_3.c, t1_3.d, t1_3.e, t1_3.ctid
+         Skip scan: All
          Index Cond: ((t1_3.a > 5) AND (t1_3.a < 7))
          Filter: ((t1_3.a <> 6) AND (alternatives: SubPlan 1 or hashed SubPlan 2) AND snoop(t1_3.a) AND leakproof(t1_3.a))
-(33 rows)
+(37 rows)
 
 UPDATE v1 SET a=100 WHERE snoop(a) AND leakproof(a) AND a < 7 AND a != 6;
 SELECT * FROM v1 WHERE a=100; -- Nothing should have been changed to 100
@@ -2360,6 +2382,7 @@ UPDATE v1 SET a=a+1 WHERE snoop(a) AND leakproof(a) AND a = 8;
    Update on public.t111 t1_3
    ->  Index Scan using t1_a_idx on public.t1
          Output: (t1.a + 1), t1.b, t1.c, t1.ctid
+         Skip scan: All
          Index Cond: ((t1.a > 5) AND (t1.a = 8))
          Filter: ((alternatives: SubPlan 1 or hashed SubPlan 2) AND snoop(t1.a) AND leakproof(t1.a))
          SubPlan 1
@@ -2376,17 +2399,20 @@ UPDATE v1 SET a=a+1 WHERE snoop(a) AND leakproof(a) AND a = 8;
                        Output: t12_5.a
    ->  Index Scan using t11_a_idx on public.t11 t1_1
          Output: (t1_1.a + 1), t1_1.b, t1_1.c, t1_1.d, t1_1.ctid
+         Skip scan: All
          Index Cond: ((t1_1.a > 5) AND (t1_1.a = 8))
          Filter: ((alternatives: SubPlan 1 or hashed SubPlan 2) AND snoop(t1_1.a) AND leakproof(t1_1.a))
    ->  Index Scan using t12_a_idx on public.t12 t1_2
          Output: (t1_2.a + 1), t1_2.b, t1_2.c, t1_2.e, t1_2.ctid
+         Skip scan: All
          Index Cond: ((t1_2.a > 5) AND (t1_2.a = 8))
          Filter: ((alternatives: SubPlan 1 or hashed SubPlan 2) AND snoop(t1_2.a) AND leakproof(t1_2.a))
    ->  Index Scan using t111_a_idx on public.t111 t1_3
          Output: (t1_3.a + 1), t1_3.b, t1_3.c, t1_3.d, t1_3.e, t1_3.ctid
+         Skip scan: All
          Index Cond: ((t1_3.a > 5) AND (t1_3.a = 8))
          Filter: ((alternatives: SubPlan 1 or hashed SubPlan 2) AND snoop(t1_3.a) AND leakproof(t1_3.a))
-(33 rows)
+(37 rows)
 
 UPDATE v1 SET a=a+1 WHERE snoop(a) AND leakproof(a) AND a = 8;
 NOTICE:  snooped value: 8
-- 
2.25.0

#56

Thomas Munro

thomas.munro@gmail.com

almost 6 years ago

In reply to: Floris Van Nee (#55)

Re: Index Skip Scan

Hi Floris,

On Sun, Mar 22, 2020 at 11:00 AM Floris Van Nee
<florisvannee@optiver.com> wrote:

create index on t1 (a,b,c);

select * from t1 where b in (100, 200);
Execution Time: 2.464 ms
Execution Time: 252.224 ms
Execution Time: 244.872 ms

Wow. This is very cool work and I'm sure it will become a major
headline feature of PG14 if the requisite planner brains can be sorted
out.

On Mon, Mar 23, 2020 at 1:55 AM Floris Van Nee <florisvannee@optiver.com> wrote:

I'm unsure which version number to give this patch (to continue with numbers from previous skip scan patches, or to start numbering from scratch again). It's a rather big change, so one could argue it's mostly a separate patch. I guess it mostly depends on how close the original versions were to be committable. Thoughts?

I don't know, but from the sidelines, it'd be nice to see the unique
path part go into PG13, where IIUC it can power the "useless unique
removal" patch.

#57

Andy Fan

zhihui.fan1213@gmail.com

almost 6 years ago

In reply to: Thomas Munro (#56)

Re: Index Skip Scan

On Mon, Mar 23, 2020 at 1:55 AM Floris Van Nee <florisvannee@optiver.com>
wrote:

I'm unsure which version number to give this patch (to continue with
numbers from previous skip scan patches, or to start numbering from scratch
again). It's a rather big change, so one could argue it's mostly a separate
patch. I guess it mostly depends on how close the original versions were to
be committable. Thoughts?

I don't know, but from the sidelines, it'd be nice to see the unique
path part go into PG13, where IIUC it can power the "useless unique
removal" patch.

Actually, I wrote a patch to remove the distinct clause some time ago [1],
and it later came to use the UniqueKey idea as well; see [2] for the current
status.

[1]: /messages/by-id/CAKU4AWqOORqW900O-+L4L2+0xknsEqpfcs9FF7SeiO9TmpeZOg@mail.gmail.com

[2]: /messages/by-id/CAKU4AWrwZMAL=uaFUDMf4WGOVkEL3ONbatqju9nSXTUucpp_pw@mail.gmail.com

#58

Dmitry Dolgov

9erthalion6@gmail.com

almost 6 years ago

In reply to: David Rowley (#50)

2 attachment(s)

Re: Index Skip Scan

On Wed, Mar 11, 2020 at 11:17:51AM +1300, David Rowley wrote:

Yes, I was complaining that a ProjectionPath breaks the optimisation
and I don't believe there's any reason that it should.

I believe the way to make that work correctly requires paying
attention to the Path's uniquekeys rather than what type of path it
is.

Thanks for the suggestion. As a result of the discussion I've modified
the patch. Does it look similar to what you had in mind?

In this version, if all conditions are met and there are corresponding
unique keys, a new index skip scan path is added to unique_pathlist.
If the requested distinct clauses match the unique keys,
create_distinct_paths can choose this path without needing to know what
kind of path it is. Unique keys are also passed through ProjectionPath,
so the optimization for the example mentioned earlier in this thread
should now work (I've added one test for that).
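
To make the selection rule above concrete, here is a minimal, self-contained sketch. It is not code from the patch: ToyPath, toy_has_key, toy_keys_match and toy_choose_distinct_path are hypothetical names, equivalence classes are reduced to plain integer ids, and "match" is assumed to mean "the same set of ids". The real comparison lives in uniquekey.c, which is not shown in this message and may well differ. The only point illustrated is that the chooser looks at a path's unique keys and cost, never at what kind of path it is.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a planner path; NOT the PostgreSQL Path struct. */
typedef struct ToyPath
{
	const char *label;		/* human-readable description only */
	double		total_cost;	/* estimated total cost */
	const int  *uniquekeys;	/* equivalence-class ids the output is unique on */
	size_t		nkeys;		/* number of entries in uniquekeys */
} ToyPath;

/* Return true if 'id' occurs in keys[0..n-1]. */
static bool
toy_has_key(const int *keys, size_t n, int id)
{
	for (size_t i = 0; i < n; i++)
		if (keys[i] == id)
			return true;
	return false;
}

/* Assumed meaning of "match": both lists name the same set of ids. */
static bool
toy_keys_match(const int *a, size_t na, const int *b, size_t nb)
{
	for (size_t i = 0; i < na; i++)
		if (!toy_has_key(b, nb, a[i]))
			return false;
	for (size_t i = 0; i < nb; i++)
		if (!toy_has_key(a, na, b[i]))
			return false;
	return true;
}

/*
 * Return the cheapest path whose unique keys match the requested ones,
 * or NULL if there is none.  The path's type is never inspected.
 */
static const ToyPath *
toy_choose_distinct_path(const ToyPath *paths, size_t npaths,
						 const int *wanted, size_t nwanted)
{
	const ToyPath *best = NULL;

	assert(paths != NULL || npaths == 0);

	for (size_t i = 0; i < npaths; i++)
	{
		const ToyPath *p = &paths[i];

		if (!toy_keys_match(p->uniquekeys, p->nkeys, wanted, nwanted))
			continue;			/* not known to be unique on these keys */
		if (best == NULL || p->total_cost < best->total_cost)
			best = p;
	}
	return best;
}
```

In this toy model a projection placed on top of a skip scan stays eligible as long as it carries the same key ids, which is the behaviour the ProjectionPath change is meant to give.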

I haven't changed anything about the UniqueKey structure itself (one of the
suggestions was to use Expr instead of EquivalenceClass), but I believe
we need to figure out anyway how the two existing implementations of this
idea (in this patch and in [1]) can be connected.

[1]: /messages/by-id/CAKU4AWrwZMAL=uaFUDMf4WGOVkEL3ONbatqju9nSXTUucpp_pw@mail.gmail.com

Attachments:

v33-0001-Unique-key.patch (text/x-diff; charset=us-ascii)

From c7af8157da82db1cedf02e6ec0de355b56275680 Mon Sep 17 00:00:00 2001
From: Dmitrii Dolgov <9erthalion6@gmail.com>
Date: Tue, 24 Mar 2020 17:04:32 +0100
Subject: [PATCH v33 1/2] Unique key

Design by David Rowley.

Author: Jesper Pedersen
---
 src/backend/nodes/outfuncs.c          | 14 ++++++
 src/backend/nodes/print.c             | 39 +++++++++++++++
 src/backend/optimizer/path/Makefile   |  3 +-
 src/backend/optimizer/path/allpaths.c |  8 +++
 src/backend/optimizer/path/indxpath.c | 41 ++++++++++++++++
 src/backend/optimizer/path/pathkeys.c | 71 ++++++++++++++++++++++-----
 src/backend/optimizer/plan/planagg.c  |  1 +
 src/backend/optimizer/plan/planmain.c |  1 +
 src/backend/optimizer/plan/planner.c  | 37 +++++++++++++-
 src/backend/optimizer/util/pathnode.c | 46 +++++++++++++----
 src/include/nodes/nodes.h             |  1 +
 src/include/nodes/pathnodes.h         | 19 +++++++
 src/include/nodes/print.h             |  1 +
 src/include/optimizer/pathnode.h      |  2 +
 src/include/optimizer/paths.h         | 11 +++++
 15 files changed, 272 insertions(+), 23 deletions(-)

diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index d76fae44b8..16083e7a7e 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -1723,6 +1723,7 @@ _outPathInfo(StringInfo str, const Path *node)
 	WRITE_FLOAT_FIELD(startup_cost, "%.2f");
 	WRITE_FLOAT_FIELD(total_cost, "%.2f");
 	WRITE_NODE_FIELD(pathkeys);
+	WRITE_NODE_FIELD(uniquekeys);
 }
 
 /*
@@ -2205,6 +2206,7 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
 	WRITE_NODE_FIELD(eq_classes);
 	WRITE_BOOL_FIELD(ec_merging_done);
 	WRITE_NODE_FIELD(canon_pathkeys);
+	WRITE_NODE_FIELD(canon_uniquekeys);
 	WRITE_NODE_FIELD(left_join_clauses);
 	WRITE_NODE_FIELD(right_join_clauses);
 	WRITE_NODE_FIELD(full_join_clauses);
@@ -2214,6 +2216,7 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
 	WRITE_NODE_FIELD(placeholder_list);
 	WRITE_NODE_FIELD(fkey_list);
 	WRITE_NODE_FIELD(query_pathkeys);
+	WRITE_NODE_FIELD(query_uniquekeys);
 	WRITE_NODE_FIELD(group_pathkeys);
 	WRITE_NODE_FIELD(window_pathkeys);
 	WRITE_NODE_FIELD(distinct_pathkeys);
@@ -2401,6 +2404,14 @@ _outPathKey(StringInfo str, const PathKey *node)
 	WRITE_BOOL_FIELD(pk_nulls_first);
 }
 
+static void
+_outUniqueKey(StringInfo str, const UniqueKey *node)
+{
+	WRITE_NODE_TYPE("UNIQUEKEY");
+
+	WRITE_NODE_FIELD(eq_clause);
+}
+
 static void
 _outPathTarget(StringInfo str, const PathTarget *node)
 {
@@ -4092,6 +4103,9 @@ outNode(StringInfo str, const void *obj)
 			case T_PathKey:
 				_outPathKey(str, obj);
 				break;
+			case T_UniqueKey:
+				_outUniqueKey(str, obj);
+				break;
 			case T_PathTarget:
 				_outPathTarget(str, obj);
 				break;
diff --git a/src/backend/nodes/print.c b/src/backend/nodes/print.c
index 42476724d8..d286b34544 100644
--- a/src/backend/nodes/print.c
+++ b/src/backend/nodes/print.c
@@ -459,6 +459,45 @@ print_pathkeys(const List *pathkeys, const List *rtable)
 	printf(")\n");
 }
 
+/*
+ * print_uniquekeys -
+ *	  uniquekeys list of UniqueKeys
+ */
+void
+print_uniquekeys(const List *uniquekeys, const List *rtable)
+{
+	ListCell   *l;
+
+	printf("(");
+	foreach(l, uniquekeys)
+	{
+		UniqueKey *unique_key = (UniqueKey *) lfirst(l);
+		EquivalenceClass *eclass = (EquivalenceClass *) unique_key->eq_clause;
+		ListCell   *k;
+		bool		first = true;
+
+		/* chase up */
+		while (eclass->ec_merged)
+			eclass = eclass->ec_merged;
+
+		printf("(");
+		foreach(k, eclass->ec_members)
+		{
+			EquivalenceMember *mem = (EquivalenceMember *) lfirst(k);
+
+			if (first)
+				first = false;
+			else
+				printf(", ");
+			print_expr((Node *) mem->em_expr, rtable);
+		}
+		printf(")");
+		if (lnext(uniquekeys, l))
+			printf(", ");
+	}
+	printf(")\n");
+}
+
 /*
  * print_tl
  *	  print targetlist in a more legible way.
diff --git a/src/backend/optimizer/path/Makefile b/src/backend/optimizer/path/Makefile
index 1e199ff66f..63cc1505d9 100644
--- a/src/backend/optimizer/path/Makefile
+++ b/src/backend/optimizer/path/Makefile
@@ -21,6 +21,7 @@ OBJS = \
 	joinpath.o \
 	joinrels.o \
 	pathkeys.o \
-	tidpath.o
+	tidpath.o \
+	uniquekey.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 8286d9cf34..bbc13e6141 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -3954,6 +3954,14 @@ print_path(PlannerInfo *root, Path *path, int indent)
 		print_pathkeys(path->pathkeys, root->parse->rtable);
 	}
 
+	if (path->uniquekeys)
+	{
+		for (i = 0; i < indent; i++)
+			printf("\t");
+		printf("  uniquekeys: ");
+		print_uniquekeys(path->uniquekeys, root->parse->rtable);
+	}
+
 	if (join)
 	{
 		JoinPath   *jp = (JoinPath *) path;
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 2a50272da6..363f5349f1 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -189,6 +189,7 @@ static Expr *match_clause_to_ordering_op(IndexOptInfo *index,
 static bool ec_member_matches_indexcol(PlannerInfo *root, RelOptInfo *rel,
 									   EquivalenceClass *ec, EquivalenceMember *em,
 									   void *arg);
+static List *get_uniquekeys_for_index(PlannerInfo *root, List *pathkeys);
 
 
 /*
@@ -874,6 +875,7 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
 	List	   *orderbyclausecols;
 	List	   *index_pathkeys;
 	List	   *useful_pathkeys;
+	List	   *useful_uniquekeys = NIL;
 	bool		found_lower_saop_clause;
 	bool		pathkeys_possibly_useful;
 	bool		index_is_ordered;
@@ -1036,11 +1038,15 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
 	if (index_clauses != NIL || useful_pathkeys != NIL || useful_predicate ||
 		index_only_scan)
 	{
+		if (has_useful_uniquekeys(root))
+			useful_uniquekeys = get_uniquekeys_for_index(root, useful_pathkeys);
+
 		ipath = create_index_path(root, index,
 								  index_clauses,
 								  orderbyclauses,
 								  orderbyclausecols,
 								  useful_pathkeys,
+								  useful_uniquekeys,
 								  index_is_ordered ?
 								  ForwardScanDirection :
 								  NoMovementScanDirection,
@@ -1063,6 +1069,7 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
 									  orderbyclauses,
 									  orderbyclausecols,
 									  useful_pathkeys,
+									  useful_uniquekeys,
 									  index_is_ordered ?
 									  ForwardScanDirection :
 									  NoMovementScanDirection,
@@ -1093,11 +1100,15 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
 													index_pathkeys);
 		if (useful_pathkeys != NIL)
 		{
+			if (has_useful_uniquekeys(root))
+				useful_uniquekeys = get_uniquekeys_for_index(root, useful_pathkeys);
+
 			ipath = create_index_path(root, index,
 									  index_clauses,
 									  NIL,
 									  NIL,
 									  useful_pathkeys,
+									  useful_uniquekeys,
 									  BackwardScanDirection,
 									  index_only_scan,
 									  outer_relids,
@@ -1115,6 +1126,7 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
 										  NIL,
 										  NIL,
 										  useful_pathkeys,
+										  useful_uniquekeys,
 										  BackwardScanDirection,
 										  index_only_scan,
 										  outer_relids,
@@ -3365,6 +3377,35 @@ match_clause_to_ordering_op(IndexOptInfo *index,
 	return clause;
 }
 
+/*
+ * get_uniquekeys_for_index
+ */
+static List *
+get_uniquekeys_for_index(PlannerInfo *root, List *pathkeys)
+{
+	ListCell *lc;
+
+	if (pathkeys)
+	{
+		List *uniquekeys = NIL;
+		foreach(lc, pathkeys)
+		{
+			UniqueKey *unique_key;
+			PathKey *pk = (PathKey *) lfirst(lc);
+			EquivalenceClass *ec = (EquivalenceClass *) pk->pk_eclass;
+
+			unique_key = makeNode(UniqueKey);
+			unique_key->eq_clause = ec;
+
+			uniquekeys = lappend(uniquekeys, unique_key);
+		}
+
+		if (uniquekeys_contained_in(root->canon_uniquekeys, uniquekeys))
+			return uniquekeys;
+	}
+
+	return NIL;
+}
 
 /****************************************************************************
  *				----  ROUTINES TO DO PARTIAL INDEX PREDICATE TESTS	----
diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index 71b9d42c99..054df9a617 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -29,6 +29,7 @@
 #include "utils/lsyscache.h"
 
 
+static bool pathkey_is_unique(PathKey *new_pathkey, List *pathkeys);
 static bool pathkey_is_redundant(PathKey *new_pathkey, List *pathkeys);
 static bool matches_boolean_partition_clause(RestrictInfo *rinfo,
 											 RelOptInfo *partrel,
@@ -96,6 +97,29 @@ make_canonical_pathkey(PlannerInfo *root,
 	return pk;
 }
 
+/*
+ * pathkey_is_unique
+ *	   Checks if the new pathkey's equivalence class is the same as that of
+ *     any existing member of the pathkey list.
+ */
+static bool
+pathkey_is_unique(PathKey *new_pathkey, List *pathkeys)
+{
+	EquivalenceClass *new_ec = new_pathkey->pk_eclass;
+	ListCell   *lc;
+
+	/* If the same EC is already in the list, then not unique */
+	foreach(lc, pathkeys)
+	{
+		PathKey    *old_pathkey = (PathKey *) lfirst(lc);
+
+		if (new_ec == old_pathkey->pk_eclass)
+			return false;
+	}
+
+	return true;
+}
+
 /*
  * pathkey_is_redundant
  *	   Is a pathkey redundant with one already in the given list?
@@ -135,22 +159,12 @@ static bool
 pathkey_is_redundant(PathKey *new_pathkey, List *pathkeys)
 {
 	EquivalenceClass *new_ec = new_pathkey->pk_eclass;
-	ListCell   *lc;
 
 	/* Check for EC containing a constant --- unconditionally redundant */
 	if (EC_MUST_BE_REDUNDANT(new_ec))
 		return true;
 
-	/* If same EC already used in list, then redundant */
-	foreach(lc, pathkeys)
-	{
-		PathKey    *old_pathkey = (PathKey *) lfirst(lc);
-
-		if (new_ec == old_pathkey->pk_eclass)
-			return true;
-	}
-
-	return false;
+	return !pathkey_is_unique(new_pathkey, pathkeys);
 }
 
 /*
@@ -1098,6 +1112,41 @@ make_pathkeys_for_sortclauses(PlannerInfo *root,
 	return pathkeys;
 }
 
+/*
+ * make_pathkeys_for_uniquekeys
+ *		Generate a pathkeys list to be used for uniquekey clauses
+ */
+List *
+make_pathkeys_for_uniquekeys(PlannerInfo *root,
+							 List *sortclauses,
+							 List *tlist)
+{
+	List	   *pathkeys = NIL;
+	ListCell   *l;
+
+	foreach(l, sortclauses)
+	{
+		SortGroupClause *sortcl = (SortGroupClause *) lfirst(l);
+		Expr	   *sortkey;
+		PathKey    *pathkey;
+
+		sortkey = (Expr *) get_sortgroupclause_expr(sortcl, tlist);
+		Assert(OidIsValid(sortcl->sortop));
+		pathkey = make_pathkey_from_sortop(root,
+										   sortkey,
+										   root->nullable_baserels,
+										   sortcl->sortop,
+										   sortcl->nulls_first,
+										   sortcl->tleSortGroupRef,
+										   true);
+
+		if (pathkey_is_unique(pathkey, pathkeys))
+			pathkeys = lappend(pathkeys, pathkey);
+	}
+
+	return pathkeys;
+}
+
 /****************************************************************************
  *		PATHKEYS AND MERGECLAUSES
  ****************************************************************************/
diff --git a/src/backend/optimizer/plan/planagg.c b/src/backend/optimizer/plan/planagg.c
index 8634940efc..dd64775d8f 100644
--- a/src/backend/optimizer/plan/planagg.c
+++ b/src/backend/optimizer/plan/planagg.c
@@ -511,6 +511,7 @@ minmax_qp_callback(PlannerInfo *root, void *extra)
 									  root->parse->targetList);
 
 	root->query_pathkeys = root->sort_pathkeys;
+	root->query_uniquekeys = NIL;
 }
 
 /*
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 62dfc6d44a..3a372af91b 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -70,6 +70,7 @@ query_planner(PlannerInfo *root,
 	root->join_rel_level = NULL;
 	root->join_cur_level = 0;
 	root->canon_pathkeys = NIL;
+	root->canon_uniquekeys = NIL;
 	root->left_join_clauses = NIL;
 	root->right_join_clauses = NIL;
 	root->full_join_clauses = NIL;
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index d6f2153593..a7de8476d9 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3657,15 +3657,30 @@ standard_qp_callback(PlannerInfo *root, void *extra)
 	 * much easier, since we know that the parser ensured that one is a
 	 * superset of the other.
 	 */
+	root->query_uniquekeys = NIL;
+
 	if (root->group_pathkeys)
+	{
 		root->query_pathkeys = root->group_pathkeys;
+
+		if (!root->parse->hasAggs)
+			root->query_uniquekeys = build_uniquekeys(root, qp_extra->groupClause);
+	}
 	else if (root->window_pathkeys)
 		root->query_pathkeys = root->window_pathkeys;
 	else if (list_length(root->distinct_pathkeys) >
 			 list_length(root->sort_pathkeys))
+	{
 		root->query_pathkeys = root->distinct_pathkeys;
+		root->query_uniquekeys = build_uniquekeys(root, parse->distinctClause);
+	}
 	else if (root->sort_pathkeys)
+	{
 		root->query_pathkeys = root->sort_pathkeys;
+
+		if (root->distinct_pathkeys)
+			root->query_uniquekeys = build_uniquekeys(root, parse->distinctClause);
+	}
 	else
 		root->query_pathkeys = NIL;
 }
@@ -6222,7 +6237,7 @@ plan_cluster_use_sort(Oid tableOid, Oid indexOid)
 
 	/* Estimate the cost of index scan */
 	indexScanPath = create_index_path(root, indexInfo,
-									  NIL, NIL, NIL, NIL,
+									  NIL, NIL, NIL, NIL, NIL,
 									  ForwardScanDirection, false,
 									  NULL, 1.0, false);
 
@@ -7107,6 +7122,26 @@ apply_scanjoin_target_to_paths(PlannerInfo *root,
 		}
 	}
 
+	foreach(lc, rel->unique_pathlist)
+	{
+		Path	   *subpath = (Path *) lfirst(lc);
+
+		/* Shouldn't have any parameterized paths anymore */
+		Assert(subpath->param_info == NULL);
+
+		if (tlist_same_exprs)
+			subpath->pathtarget->sortgrouprefs =
+				scanjoin_target->sortgrouprefs;
+		else
+		{
+			Path	   *newpath;
+
+			newpath = (Path *) create_projection_path(root, rel, subpath,
+													  scanjoin_target);
+			lfirst(lc) = newpath;
+		}
+	}
+
 	/*
 	 * Now, if final scan/join target contains SRFs, insert ProjectSetPath(s)
 	 * atop each existing path.  (Note that this function doesn't look at the
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index e6d08aede5..a4dfafbb59 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -361,9 +361,9 @@ set_cheapest(RelOptInfo *parent_rel)
 }
 
 /*
- * add_path
+ * add_path_to
  *	  Consider a potential implementation path for the specified parent rel,
- *	  and add it to the rel's pathlist if it is worthy of consideration.
+ *	  and add it to the specified pathlist if it is worthy of consideration.
  *	  A path is worthy if it has a better sort order (better pathkeys) or
  *	  cheaper cost (on either dimension), or generates fewer rows, than any
  *	  existing path that has the same or superset parameterization rels.
@@ -416,10 +416,10 @@ set_cheapest(RelOptInfo *parent_rel)
  * 'parent_rel' is the relation entry to which the path corresponds.
  * 'new_path' is a potential path for parent_rel.
  *
- * Returns nothing, but modifies parent_rel->pathlist.
+ * Returns modified pathlist.
  */
-void
-add_path(RelOptInfo *parent_rel, Path *new_path)
+static List *
+add_path_to(RelOptInfo *parent_rel, List *pathlist, Path *new_path)
 {
 	bool		accept_new = true;	/* unless we find a superior old path */
 	int			insert_at = 0;	/* where to insert new item */
@@ -440,7 +440,7 @@ add_path(RelOptInfo *parent_rel, Path *new_path)
 	 * for more than one old path to be tossed out because new_path dominates
 	 * it.
 	 */
-	foreach(p1, parent_rel->pathlist)
+	foreach(p1, pathlist)
 	{
 		Path	   *old_path = (Path *) lfirst(p1);
 		bool		remove_old = false; /* unless new proves superior */
@@ -584,8 +584,7 @@ add_path(RelOptInfo *parent_rel, Path *new_path)
 		 */
 		if (remove_old)
 		{
-			parent_rel->pathlist = foreach_delete_current(parent_rel->pathlist,
-														  p1);
+			pathlist = foreach_delete_current(pathlist, p1);
 
 			/*
 			 * Delete the data pointed-to by the deleted cell, if possible
@@ -612,8 +611,7 @@ add_path(RelOptInfo *parent_rel, Path *new_path)
 	if (accept_new)
 	{
 		/* Accept the new path: insert it at proper place in pathlist */
-		parent_rel->pathlist =
-			list_insert_nth(parent_rel->pathlist, insert_at, new_path);
+		pathlist = list_insert_nth(pathlist, insert_at, new_path);
 	}
 	else
 	{
@@ -621,6 +619,15 @@ add_path(RelOptInfo *parent_rel, Path *new_path)
 		if (!IsA(new_path, IndexPath))
 			pfree(new_path);
 	}
+
+	return pathlist;
+}
+
+void
+add_path(RelOptInfo *parent_rel, Path *new_path)
+{
+	parent_rel->pathlist = add_path_to(parent_rel,
+									   parent_rel->pathlist, new_path);
 }
 
 /*
@@ -915,6 +922,13 @@ add_partial_path_precheck(RelOptInfo *parent_rel, Cost total_cost,
 	return true;
 }
 
+void
+add_unique_path(RelOptInfo *parent_rel, Path *new_path)
+{
+	parent_rel->unique_pathlist = add_path_to(parent_rel,
+											  parent_rel->unique_pathlist,
+											  new_path);
+}
 
 /*****************************************************************************
  *		PATH NODE CREATION ROUTINES
@@ -940,6 +954,7 @@ create_seqscan_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->parallel_safe = rel->consider_parallel;
 	pathnode->parallel_workers = parallel_workers;
 	pathnode->pathkeys = NIL;	/* seqscan has unordered result */
+	pathnode->uniquekeys = NIL;
 
 	cost_seqscan(pathnode, root, rel, pathnode->param_info);
 
@@ -964,6 +979,7 @@ create_samplescan_path(PlannerInfo *root, RelOptInfo *rel, Relids required_outer
 	pathnode->parallel_safe = rel->consider_parallel;
 	pathnode->parallel_workers = 0;
 	pathnode->pathkeys = NIL;	/* samplescan has unordered result */
+	pathnode->uniquekeys = NIL;
 
 	cost_samplescan(pathnode, root, rel, pathnode->param_info);
 
@@ -1000,6 +1016,7 @@ create_index_path(PlannerInfo *root,
 				  List *indexorderbys,
 				  List *indexorderbycols,
 				  List *pathkeys,
+				  List *uniquekeys,
 				  ScanDirection indexscandir,
 				  bool indexonly,
 				  Relids required_outer,
@@ -1018,6 +1035,7 @@ create_index_path(PlannerInfo *root,
 	pathnode->path.parallel_safe = rel->consider_parallel;
 	pathnode->path.parallel_workers = 0;
 	pathnode->path.pathkeys = pathkeys;
+	pathnode->path.uniquekeys = uniquekeys;
 
 	pathnode->indexinfo = index;
 	pathnode->indexclauses = indexclauses;
@@ -1061,6 +1079,7 @@ create_bitmap_heap_path(PlannerInfo *root,
 	pathnode->path.parallel_safe = rel->consider_parallel;
 	pathnode->path.parallel_workers = parallel_degree;
 	pathnode->path.pathkeys = NIL;	/* always unordered */
+	pathnode->path.uniquekeys = NIL;
 
 	pathnode->bitmapqual = bitmapqual;
 
@@ -1922,6 +1941,7 @@ create_functionscan_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->parallel_safe = rel->consider_parallel;
 	pathnode->parallel_workers = 0;
 	pathnode->pathkeys = pathkeys;
+	pathnode->uniquekeys = NIL;
 
 	cost_functionscan(pathnode, root, rel, pathnode->param_info);
 
@@ -1948,6 +1968,7 @@ create_tablefuncscan_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->parallel_safe = rel->consider_parallel;
 	pathnode->parallel_workers = 0;
 	pathnode->pathkeys = NIL;	/* result is always unordered */
+	pathnode->uniquekeys = NIL;
 
 	cost_tablefuncscan(pathnode, root, rel, pathnode->param_info);
 
@@ -1974,6 +1995,7 @@ create_valuesscan_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->parallel_safe = rel->consider_parallel;
 	pathnode->parallel_workers = 0;
 	pathnode->pathkeys = NIL;	/* result is always unordered */
+	pathnode->uniquekeys = NIL;
 
 	cost_valuesscan(pathnode, root, rel, pathnode->param_info);
 
@@ -1999,6 +2021,7 @@ create_ctescan_path(PlannerInfo *root, RelOptInfo *rel, Relids required_outer)
 	pathnode->parallel_safe = rel->consider_parallel;
 	pathnode->parallel_workers = 0;
 	pathnode->pathkeys = NIL;	/* XXX for now, result is always unordered */
+	pathnode->uniquekeys = NIL;
 
 	cost_ctescan(pathnode, root, rel, pathnode->param_info);
 
@@ -2025,6 +2048,7 @@ create_namedtuplestorescan_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->parallel_safe = rel->consider_parallel;
 	pathnode->parallel_workers = 0;
 	pathnode->pathkeys = NIL;	/* result is always unordered */
+	pathnode->uniquekeys = NIL;
 
 	cost_namedtuplestorescan(pathnode, root, rel, pathnode->param_info);
 
@@ -2051,6 +2075,7 @@ create_resultscan_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->parallel_safe = rel->consider_parallel;
 	pathnode->parallel_workers = 0;
 	pathnode->pathkeys = NIL;	/* result is always unordered */
+	pathnode->uniquekeys = NIL;
 
 	cost_resultscan(pathnode, root, rel, pathnode->param_info);
 
@@ -2077,6 +2102,7 @@ create_worktablescan_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->parallel_safe = rel->consider_parallel;
 	pathnode->parallel_workers = 0;
 	pathnode->pathkeys = NIL;	/* result is always unordered */
+	pathnode->uniquekeys = NIL;
 
 	/* Cost is the same as for a regular CTE scan */
 	cost_ctescan(pathnode, root, rel, pathnode->param_info);
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index baced7eec0..a1511b46ea 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -261,6 +261,7 @@ typedef enum NodeTag
 	T_EquivalenceMember,
 	T_PathKey,
 	T_PathTarget,
+	T_UniqueKey,
 	T_RestrictInfo,
 	T_IndexClause,
 	T_PlaceHolderVar,
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 3d3be197e0..0de27f0ef3 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -269,6 +269,8 @@ struct PlannerInfo
 
 	List	   *canon_pathkeys; /* list of "canonical" PathKeys */
 
+	List	   *canon_uniquekeys; /* list of "canonical" UniqueKeys */
+
 	List	   *left_join_clauses;	/* list of RestrictInfos for mergejoinable
 									 * outer join clauses w/nonnullable var on
 									 * left */
@@ -297,6 +299,8 @@ struct PlannerInfo
 
 	List	   *query_pathkeys; /* desired pathkeys for query_planner() */
 
+	List	   *query_uniquekeys; /* unique keys used for the query */
+
 	List	   *group_pathkeys; /* groupClause pathkeys, if any */
 	List	   *window_pathkeys;	/* pathkeys of bottom window, if any */
 	List	   *distinct_pathkeys;	/* distinctClause pathkeys, if any */
@@ -657,6 +661,7 @@ typedef struct RelOptInfo
 	List	   *pathlist;		/* Path structures */
 	List	   *ppilist;		/* ParamPathInfos used in pathlist */
 	List	   *partial_pathlist;	/* partial Paths */
+	List	   *unique_pathlist;	/* unique Paths */
 	struct Path *cheapest_startup_path;
 	struct Path *cheapest_total_path;
 	struct Path *cheapest_unique_path;
@@ -1077,6 +1082,15 @@ typedef struct ParamPathInfo
 	List	   *ppi_clauses;	/* join clauses available from outer rels */
 } ParamPathInfo;
 
+/*
+ * UniqueKey
+ */
+typedef struct UniqueKey
+{
+	NodeTag		type;
+
+	EquivalenceClass *eq_clause;	/* equivalence class */
+} UniqueKey;
 
 /*
  * Type "Path" is used as-is for sequential-scan paths, as well as some other
@@ -1106,6 +1120,9 @@ typedef struct ParamPathInfo
  *
  * "pathkeys" is a List of PathKey nodes (see above), describing the sort
  * ordering of the path's output rows.
+ *
+ * "uniquekeys", if not NIL, is a list of UniqueKey nodes (see above),
+ * describing the XXX.
  */
 typedef struct Path
 {
@@ -1129,6 +1146,8 @@ typedef struct Path
 
 	List	   *pathkeys;		/* sort ordering of path's output */
 	/* pathkeys is a List of PathKey nodes; see above */
+
+	List	   *uniquekeys;	/* the unique keys, or NIL if none */
 } Path;
 
 /* Macro for extracting a path's parameterization relids; beware double eval */
diff --git a/src/include/nodes/print.h b/src/include/nodes/print.h
index 6126b491bf..006248bfb5 100644
--- a/src/include/nodes/print.h
+++ b/src/include/nodes/print.h
@@ -28,6 +28,7 @@ extern char *pretty_format_node_dump(const char *dump);
 extern void print_rt(const List *rtable);
 extern void print_expr(const Node *expr, const List *rtable);
 extern void print_pathkeys(const List *pathkeys, const List *rtable);
+extern void print_uniquekeys(const List *uniquekeys, const List *rtable);
 extern void print_tl(const List *tlist, const List *rtable);
 extern void print_slot(TupleTableSlot *slot);
 
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index e450fe112a..fd25997af5 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -34,6 +34,7 @@ extern void add_partial_path(RelOptInfo *parent_rel, Path *new_path);
 extern bool add_partial_path_precheck(RelOptInfo *parent_rel,
 									  Cost total_cost, List *pathkeys);
 
+extern void add_unique_path(RelOptInfo *parent_rel, Path *new_path);
 extern Path *create_seqscan_path(PlannerInfo *root, RelOptInfo *rel,
 								 Relids required_outer, int parallel_workers);
 extern Path *create_samplescan_path(PlannerInfo *root, RelOptInfo *rel,
@@ -44,6 +45,7 @@ extern IndexPath *create_index_path(PlannerInfo *root,
 									List *indexorderbys,
 									List *indexorderbycols,
 									List *pathkeys,
+									List *uniquekeys,
 									ScanDirection indexscandir,
 									bool indexonly,
 									Relids required_outer,
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 9ab73bd20c..5b6be383b3 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -214,6 +214,9 @@ extern List *build_join_pathkeys(PlannerInfo *root,
 extern List *make_pathkeys_for_sortclauses(PlannerInfo *root,
 										   List *sortclauses,
 										   List *tlist);
+extern List *make_pathkeys_for_uniquekeys(PlannerInfo *root,
+										  List *sortclauses,
+										  List *tlist);
 extern void initialize_mergeclause_eclasses(PlannerInfo *root,
 											RestrictInfo *restrictinfo);
 extern void update_mergeclause_eclasses(PlannerInfo *root,
@@ -240,4 +243,12 @@ extern PathKey *make_canonical_pathkey(PlannerInfo *root,
 extern void add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
 									List *live_childrels);
 
+/*
+ * uniquekey.c
+ *	  Utilities for matching and building unique keys
+ */
+extern List *build_uniquekeys(PlannerInfo *root, List *sortclauses);
+extern bool uniquekeys_contained_in(List *keys1, List *keys2);
+extern bool has_useful_uniquekeys(PlannerInfo *root);
+
 #endif							/* PATHS_H */
-- 
2.21.0
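
A note on the pathnode.c hunks above: the body of add_path() becomes a helper, add_path_to(), that takes a list and returns the updated list, so add_path() and add_unique_path() can share it for pathlist and unique_pathlist. Below is a toy sketch of that shape only. ToyPath, ToyRel and the toy_* functions are hypothetical names, not planner code. The real add_path() also weighs pathkeys, parameterization and parallel safety to discard dominated paths, whereas this sketch merely keeps each list ordered by cost.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for Path and RelOptInfo. */
typedef struct ToyPath
{
    double          cost;
    struct ToyPath *next;
} ToyPath;

typedef struct ToyRel
{
    ToyPath *pathlist;          /* ordinary paths */
    ToyPath *unique_pathlist;   /* paths that carry unique keys */
} ToyRel;

/*
 * Shared helper: insert new_path into 'list' in ascending cost order and
 * return the (possibly new) head.  The caller decides which list to update.
 */
ToyPath *
toy_add_path_to(ToyPath *list, ToyPath *new_path)
{
    ToyPath **link = &list;

    assert(new_path != NULL);

    /* Walk to the first element that is more expensive than new_path. */
    while (*link != NULL && (*link)->cost <= new_path->cost)
        link = &(*link)->next;

    new_path->next = *link;
    *link = new_path;
    return list;
}

/* Thin wrappers: same logic, different target list. */
void
toy_add_path(ToyRel *rel, ToyPath *new_path)
{
    rel->pathlist = toy_add_path_to(rel->pathlist, new_path);
}

void
toy_add_unique_path(ToyRel *rel, ToyPath *new_path)
{
    rel->unique_pathlist = toy_add_path_to(rel->unique_pathlist, new_path);
}
```

Because the helper returns the list instead of writing to a fixed field, a further path list would only need one more thin wrapper.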

v33-0002-Index-skip-scan.patch (text/x-diff; charset=us-ascii)
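
The commit message of this patch describes looking for the next distinct value on the current page first and descending from the root only when it is not found there. The following is a toy model of that idea, not the nbtree code: a sorted int array stands in for the leaf level, PAGE_SIZE is a made-up page capacity, and a binary search over the whole array stands in for a descent from the root. All names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE 4             /* toy "leaf page" capacity (assumption) */

/* First index in [lo, hi) whose key is strictly greater than 'key'. */
size_t
toy_upper_bound(const int *keys, size_t lo, size_t hi, int key)
{
    while (lo < hi)
    {
        size_t mid = lo + (hi - lo) / 2;

        if (keys[mid] <= key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/*
 * Return the position of the first key greater than keys[pos], or n if
 * there is none.  *root_descents counts how often the whole "tree" had to
 * be searched because the current page could not answer the question.
 */
size_t
toy_skip(const int *keys, size_t n, size_t pos, int *root_descents)
{
    size_t page_end;

    assert(pos < n);

    page_end = (pos / PAGE_SIZE + 1) * PAGE_SIZE;
    if (page_end > n)
        page_end = n;

    /* Cheap case: the next distinct key lives on the current page. */
    if (keys[page_end - 1] > keys[pos])
        return toy_upper_bound(keys, pos + 1, page_end, keys[pos]);

    /* Otherwise search the whole array, i.e. "descend from the root". */
    (*root_descents)++;
    return toy_upper_bound(keys, 0, n, keys[pos]);
}

/* Visit one entry per distinct key, as a DISTINCT skip scan would. */
size_t
toy_count_distinct(const int *keys, size_t n, int *root_descents)
{
    size_t pos = 0;
    size_t count = 0;

    *root_descents = 0;
    while (pos < n)
    {
        count++;
        pos = toy_skip(keys, n, pos, root_descents);
    }
    return count;
}
```

With few distinct values almost every skip needs a descent but passes over many entries. With many distinct values most skips stay on the current page, which is the case the page check is meant to keep cheap.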

From 231eec6a78fb40818d1687512720fd7d61f3203a Mon Sep 17 00:00:00 2001
From: jesperpedersen <jesper.pedersen@redhat.com>
Date: Fri, 15 Nov 2019 09:46:53 -0500
Subject: [PATCH v33 2/2] Index skip scan

Implementation of Index Skip Scan (see Loose Index Scan in the wiki [1])
on top of IndexOnlyScan and IndexScan. To make it suitable both when
there is a small number of distinct values and when there is a
significant number of them, the following approach is taken: instead of
searching from the root for every value, we first search on the current
page, and only if the value is not found there do we continue searching
from the root.

Original patch and design were proposed by Thomas Munro [2], revived and
improved by Dmitry Dolgov and Jesper Pedersen.

[1] https://wiki.postgresql.org/wiki/Loose_indexscan
[2] https://www.postgresql.org/message-id/flat/CADLWmXXbTSBxP-MzJuPAYSsL_2f0iPm5VWPbCvDbVvfX93FKkw%40mail.gmail.com

Author: Jesper Pedersen, Dmitry Dolgov
Reviewed-by: Thomas Munro, David Rowley, Floris Van Nee, Kyotaro Horiguchi, Tomas Vondra, Peter Geoghegan
---
 contrib/bloom/blutils.c                       |   1 +
 doc/src/sgml/config.sgml                      |  15 +
 doc/src/sgml/indexam.sgml                     |  63 ++
 doc/src/sgml/indices.sgml                     |  23 +
 src/backend/access/brin/brin.c                |   1 +
 src/backend/access/gin/ginutil.c              |   1 +
 src/backend/access/gist/gist.c                |   1 +
 src/backend/access/hash/hash.c                |   1 +
 src/backend/access/index/indexam.c            |  18 +
 src/backend/access/nbtree/nbtree.c            |  13 +
 src/backend/access/nbtree/nbtsearch.c         | 469 ++++++++++++-
 src/backend/access/spgist/spgutils.c          |   1 +
 src/backend/commands/explain.c                |  25 +
 src/backend/executor/nodeIndexonlyscan.c      |  97 ++-
 src/backend/executor/nodeIndexscan.c          |  56 +-
 src/backend/nodes/copyfuncs.c                 |   2 +
 src/backend/nodes/outfuncs.c                  |   2 +
 src/backend/nodes/readfuncs.c                 |   2 +
 src/backend/optimizer/path/costsize.c         |   1 +
 src/backend/optimizer/path/indxpath.c         |  37 ++
 src/backend/optimizer/plan/createplan.c       |  20 +-
 src/backend/optimizer/plan/planner.c          |  10 +-
 src/backend/optimizer/util/pathnode.c         |  68 ++
 src/backend/optimizer/util/plancat.c          |   1 +
 src/backend/utils/misc/guc.c                  |   9 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/access/amapi.h                    |   8 +
 src/include/access/genam.h                    |   2 +
 src/include/access/nbtree.h                   |   7 +
 src/include/access/sdir.h                     |   7 +
 src/include/nodes/execnodes.h                 |   6 +
 src/include/nodes/pathnodes.h                 |   5 +
 src/include/nodes/plannodes.h                 |   4 +
 src/include/optimizer/cost.h                  |   1 +
 src/include/optimizer/pathnode.h              |   3 +
 src/test/regress/expected/select_distinct.out | 621 ++++++++++++++++++
 src/test/regress/expected/sysviews.out        |   3 +-
 src/test/regress/sql/select_distinct.sql      | 254 +++++++
 38 files changed, 1845 insertions(+), 14 deletions(-)

diff --git a/contrib/bloom/blutils.c b/contrib/bloom/blutils.c
index 0104d02f67..a018b7f3d0 100644
--- a/contrib/bloom/blutils.c
+++ b/contrib/bloom/blutils.c
@@ -133,6 +133,7 @@ blhandler(PG_FUNCTION_ARGS)
 	amroutine->ambulkdelete = blbulkdelete;
 	amroutine->amvacuumcleanup = blvacuumcleanup;
 	amroutine->amcanreturn = NULL;
+	amroutine->amskip = NULL;
 	amroutine->amcostestimate = blcostestimate;
 	amroutine->amoptions = bloptions;
 	amroutine->amproperty = NULL;
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index e07dc01e80..36ba75b077 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4517,6 +4517,21 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-enable-indexskipscan" xreflabel="enable_indexskipscan">
+      <term><varname>enable_indexskipscan</varname> (<type>boolean</type>)
+      <indexterm>
+       <primary><varname>enable_indexskipscan</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Enables or disables the query planner's use of index-skip-scan plan
+        types (see <xref linkend="indexes-index-skip-scans"/>). The default is
+        <literal>on</literal>.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-enable-material" xreflabel="enable_material">
       <term><varname>enable_material</varname> (<type>boolean</type>)
       <indexterm>
diff --git a/doc/src/sgml/indexam.sgml b/doc/src/sgml/indexam.sgml
index 37f8d8760a..a726d80878 100644
--- a/doc/src/sgml/indexam.sgml
+++ b/doc/src/sgml/indexam.sgml
@@ -148,6 +148,7 @@ typedef struct IndexAmRoutine
     amendscan_function amendscan;
     ammarkpos_function ammarkpos;       /* can be NULL */
     amrestrpos_function amrestrpos;     /* can be NULL */
+    amskip_function amskip;             /* can be NULL */
 
     /* interface functions to support parallel index scans */
     amestimateparallelscan_function amestimateparallelscan;    /* can be NULL */
@@ -691,6 +692,68 @@ amrestrpos (IndexScanDesc scan);
 
   <para>
 <programlisting>
+bool
+amskip (IndexScanDesc scan,
+        ScanDirection direction,
+        ScanDirection indexdir,
+        bool scanstart,
+        int prefix);
+</programlisting>
+  Skip past all tuples where the first 'prefix' columns have the same value as
+  the last tuple returned in the current scan. The arguments are:
+
+   <variablelist>
+    <varlistentry>
+     <term><parameter>scan</parameter></term>
+     <listitem>
+      <para>
+       Index scan information
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><parameter>direction</parameter></term>
+     <listitem>
+      <para>
+       The direction in which data is advancing.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><parameter>indexdir</parameter></term>
+     <listitem>
+      <para>
+        The index direction, in which data must be read.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><parameter>scanstart</parameter></term>
+     <listitem>
+      <para>
+        Whether this is the start of the scan.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><parameter>prefix</parameter></term>
+     <listitem>
+      <para>
+        The number of leading index columns that make up the distinct prefix.
+      </para>
+     </listitem>
+    </varlistentry>
+
+   </variablelist>
+
+  </para>
+
+  <para>
+<programlisting>
 Size
 amestimateparallelscan (void);
 </programlisting>
diff --git a/doc/src/sgml/indices.sgml b/doc/src/sgml/indices.sgml
index c54bf0dbbd..c429d98fc7 100644
--- a/doc/src/sgml/indices.sgml
+++ b/doc/src/sgml/indices.sgml
@@ -1254,6 +1254,29 @@ SELECT target FROM tests WHERE subject = 'some-subject' AND success;
    and later will recognize such cases and allow index-only scans to be
    generated, but older versions will not.
   </para>
+
+  <sect2 id="indexes-index-skip-scans">
+    <title>Index Skip Scans</title>
+
+    <indexterm zone="indexes-index-skip-scans">
+      <primary>index</primary>
+      <secondary>index-skip scans</secondary>
+    </indexterm>
+    <indexterm zone="indexes-index-skip-scans">
+      <primary>index-skip scan</primary>
+    </indexterm>
+
+    <para>
+     When the rows retrieved from an index scan are then deduplicated by
+     eliminating rows matching on a prefix of index keys (e.g. when using
+     <literal>SELECT DISTINCT</literal>), the planner will consider
+     skipping groups of rows with a matching key prefix. When a row with
+     a particular prefix is found, remaining rows with the same key prefix
+     are skipped.  The larger the number of rows with the same key prefix
+     (i.e. the lower the number of distinct key prefixes in the index),
+     the more efficient this is.
+    </para>
+  </sect2>
  </sect1>
 
 
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 2e8f67ef10..4db31bb211 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -113,6 +113,7 @@ brinhandler(PG_FUNCTION_ARGS)
 	amroutine->ambulkdelete = brinbulkdelete;
 	amroutine->amvacuumcleanup = brinvacuumcleanup;
 	amroutine->amcanreturn = NULL;
+	amroutine->amskip = NULL;
 	amroutine->amcostestimate = brincostestimate;
 	amroutine->amoptions = brinoptions;
 	amroutine->amproperty = NULL;
diff --git a/src/backend/access/gin/ginutil.c b/src/backend/access/gin/ginutil.c
index a7e55caf28..8dd1d30d2a 100644
--- a/src/backend/access/gin/ginutil.c
+++ b/src/backend/access/gin/ginutil.c
@@ -65,6 +65,7 @@ ginhandler(PG_FUNCTION_ARGS)
 	amroutine->ambulkdelete = ginbulkdelete;
 	amroutine->amvacuumcleanup = ginvacuumcleanup;
 	amroutine->amcanreturn = NULL;
+	amroutine->amskip = NULL;
 	amroutine->amcostestimate = gincostestimate;
 	amroutine->amoptions = ginoptions;
 	amroutine->amproperty = NULL;
diff --git a/src/backend/access/gist/gist.c b/src/backend/access/gist/gist.c
index aefc302ed2..8c692f7fb4 100644
--- a/src/backend/access/gist/gist.c
+++ b/src/backend/access/gist/gist.c
@@ -86,6 +86,7 @@ gisthandler(PG_FUNCTION_ARGS)
 	amroutine->ambulkdelete = gistbulkdelete;
 	amroutine->amvacuumcleanup = gistvacuumcleanup;
 	amroutine->amcanreturn = gistcanreturn;
+	amroutine->amskip = NULL;
 	amroutine->amcostestimate = gistcostestimate;
 	amroutine->amoptions = gistoptions;
 	amroutine->amproperty = gistproperty;
diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c
index 4871b7ff4d..e5fa4c7864 100644
--- a/src/backend/access/hash/hash.c
+++ b/src/backend/access/hash/hash.c
@@ -83,6 +83,7 @@ hashhandler(PG_FUNCTION_ARGS)
 	amroutine->ambulkdelete = hashbulkdelete;
 	amroutine->amvacuumcleanup = hashvacuumcleanup;
 	amroutine->amcanreturn = NULL;
+	amroutine->amskip = NULL;
 	amroutine->amcostestimate = hashcostestimate;
 	amroutine->amoptions = hashoptions;
 	amroutine->amproperty = NULL;
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 01539b6bd6..1047a35ade 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -33,6 +33,7 @@
  *		index_can_return	- does index support index-only scans?
  *		index_getprocid - get a support procedure OID
  *		index_getprocinfo - get a support procedure's lookup info
+ *		index_skip		- advance past duplicate key values in a scan
  *
  * NOTES
  *		This file contains the index_ routines which used
@@ -730,6 +731,23 @@ index_can_return(Relation indexRelation, int attno)
 	return indexRelation->rd_indam->amcanreturn(indexRelation, attno);
 }
 
+/* ----------------
+ *		index_skip
+ *
+ *		Skip past all tuples where the first 'prefix' columns have the
+ *		same value as the last tuple returned in the current scan.
+ * ----------------
+ */
+bool
+index_skip(IndexScanDesc scan, ScanDirection direction,
+		   ScanDirection indexdir, bool scanstart, int prefix)
+{
+	SCAN_CHECKS;
+
+	return scan->indexRelation->rd_indam->amskip(scan, direction,
+												 indexdir, scanstart, prefix);
+}
+
 /* ----------------
  *		index_getprocid
  *
diff --git a/src/backend/access/nbtree/nbtree.c b/src/backend/access/nbtree/nbtree.c
index 5254bc7ef5..8fde56fe60 100644
--- a/src/backend/access/nbtree/nbtree.c
+++ b/src/backend/access/nbtree/nbtree.c
@@ -132,6 +132,7 @@ bthandler(PG_FUNCTION_ARGS)
 	amroutine->ambulkdelete = btbulkdelete;
 	amroutine->amvacuumcleanup = btvacuumcleanup;
 	amroutine->amcanreturn = btcanreturn;
+	amroutine->amskip = btskip;
 	amroutine->amcostestimate = btcostestimate;
 	amroutine->amoptions = btoptions;
 	amroutine->amproperty = btproperty;
@@ -381,6 +382,8 @@ btbeginscan(Relation rel, int nkeys, int norderbys)
 	 */
 	so->currTuples = so->markTuples = NULL;
 
+	so->skipScanKey = NULL;
+
 	scan->xs_itupdesc = RelationGetDescr(rel);
 
 	scan->opaque = so;
@@ -448,6 +451,16 @@ btrescan(IndexScanDesc scan, ScanKey scankey, int nscankeys,
 	_bt_preprocess_array_keys(scan);
 }
 
+/*
+ * btskip() -- skip to the beginning of the next key prefix
+ */
+bool
+btskip(IndexScanDesc scan, ScanDirection direction,
+	   ScanDirection indexdir, bool start, int prefix)
+{
+	return _bt_skip(scan, direction, indexdir, start, prefix);
+}
+
 /*
  *	btendscan() -- close down a scan
  */
diff --git a/src/backend/access/nbtree/nbtsearch.c b/src/backend/access/nbtree/nbtsearch.c
index c573814f01..e2b549355b 100644
--- a/src/backend/access/nbtree/nbtsearch.c
+++ b/src/backend/access/nbtree/nbtsearch.c
@@ -37,7 +37,10 @@ static bool _bt_parallel_readpage(IndexScanDesc scan, BlockNumber blkno,
 static Buffer _bt_walk_left(Relation rel, Buffer buf, Snapshot snapshot);
 static bool _bt_endpoint(IndexScanDesc scan, ScanDirection dir);
 static inline void _bt_initialize_more_data(BTScanOpaque so, ScanDirection dir);
-
+static inline void _bt_update_skip_scankeys(IndexScanDesc scan,
+											Relation indexRel);
+static inline bool _bt_scankey_within_page(IndexScanDesc scan, BTScanInsert key,
+										Buffer buf, ScanDirection dir);
 
 /*
  *	_bt_drop_lock_and_maybe_pin()
@@ -1375,6 +1378,419 @@ _bt_next(IndexScanDesc scan, ScanDirection dir)
 	return true;
 }
 
+/*
+ *  _bt_skip() -- Skip items that have the same prefix as the most recently
+ * 				  fetched index tuple.
+ *
+ * 		The current position is set so that a subsequent call to _bt_next will
+ * 		fetch the first tuple that differs in the leading 'prefix' keys.
+ *
+ * 		There are four different kinds of skipping (depending on dir and
+ * 		indexdir) that are important to distinguish, especially in the
+ * 		presence of an index condition:
+ *
+ * 		* Advancing forward and reading forward
+ * 			simple scan
+ *
+ * 		* Advancing forward and reading backward
+ * 			scan inside a cursor fetching backward, when skipping is necessary
+ * 			right from the start
+ *
+ * 		* Advancing backward and reading forward
+ * 			scan with order by desc inside a cursor fetching forward, when
+ * 			skipping is necessary right from the start
+ *
+ * 		* Advancing backward and reading backward
+ * 			simple scan with order by desc
+ *
+ *      The current page is searched for the next unique value. If none is found
+ *      we will do a scan from the root in order to find the next page with
+ *      a unique value.
+ */
+bool
+_bt_skip(IndexScanDesc scan, ScanDirection dir,
+		 ScanDirection indexdir, bool scanstart, int prefix)
+{
+	BTScanOpaque so = (BTScanOpaque) scan->opaque;
+	BTStack stack;
+	Buffer buf;
+	OffsetNumber offnum;
+	BTScanPosItem *currItem;
+	Relation 	 indexRel = scan->indexRelation;
+
+	/* We want to return tuples, and we need a starting point */
+	Assert(scan->xs_want_itup);
+	Assert(scan->xs_itup);
+
+	if (so->numKilled > 0)
+		_bt_killitems(scan);
+
+	/* If skipScanKey is NULL then we initialize it with _bt_mkscankey */
+	if (so->skipScanKey == NULL)
+	{
+		so->skipScanKey = _bt_mkscankey(indexRel, scan->xs_itup);
+		so->skipScanKey->keysz = prefix;
+		so->skipScanKey->scantid = NULL;
+	}
+	so->skipScanKey->nextkey = ScanDirectionIsForward(dir);
+	_bt_update_skip_scankeys(scan, indexRel);
+
+	/* Check if the next unique key can be found within the current page.
+	 * Since we do not lock the current page between jumps, it's possible
+	 * that it was split since the last time we saw it. This is fine in
+	 * case of scanning forward, since pages split to the right and we are
+	 * still on the leftmost page. In case of scanning backwards it's
+	 * possible to lose some pages and we need to remember the previous
+	 * page, and then follow the right link from the current page until we
+	 * find the original one.
+	 *
+	 * The whole point of checking the current page is to protect against
+	 * statistics mismatches and keep performance acceptable when there
+	 * are too many distinct values for jumping. It's not clear that the
+	 * complexity of this solution in case of backward scan is justified,
+	 * so for now just avoid it.
+	 */
+	if (BufferIsValid(so->currPos.buf) && ScanDirectionIsForward(dir))
+	{
+		LockBuffer(so->currPos.buf, BT_READ);
+
+		if (_bt_scankey_within_page(scan, so->skipScanKey, so->currPos.buf, dir))
+		{
+			bool keyFound = false;
+
+			offnum = _bt_binsrch(scan->indexRelation, so->skipScanKey, so->currPos.buf);
+
+			/* Lock the page for SERIALIZABLE transactions */
+			PredicateLockPage(scan->indexRelation, BufferGetBlockNumber(so->currPos.buf),
+							  scan->xs_snapshot);
+
+			/* We know in which direction to look */
+			_bt_initialize_more_data(so, dir);
+
+			/* Now read the data */
+			keyFound = _bt_readpage(scan, dir, offnum);
+
+			LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+			ReleaseBuffer(so->currPos.buf);
+			so->currPos.buf = InvalidBuffer;
+
+			if (keyFound)
+			{
+				/* set IndexTuple */
+				currItem = &so->currPos.items[so->currPos.itemIndex];
+				scan->xs_heaptid = currItem->heapTid;
+				scan->xs_itup = (IndexTuple) (so->currTuples + currItem->tupleOffset);
+				return true;
+			}
+		}
+		else
+		{
+			LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+		}
+	}
+
+	if (BufferIsValid(so->currPos.buf))
+	{
+		ReleaseBuffer(so->currPos.buf);
+		so->currPos.buf = InvalidBuffer;
+	}
+
+	/*
+	 * We haven't found scan key within the current page, so let's scan from
+	 * the root. Use _bt_search and _bt_binsrch to get the buffer and offset
+	 * number
+	 */
+	so->skipScanKey->nextkey = ScanDirectionIsForward(dir);
+	stack = _bt_search(scan->indexRelation, so->skipScanKey,
+					   &buf, BT_READ, scan->xs_snapshot);
+	_bt_freestack(stack);
+	so->currPos.buf = buf;
+	offnum = _bt_binsrch(scan->indexRelation, so->skipScanKey, buf);
+
+	/* Lock the page for SERIALIZABLE transactions */
+	PredicateLockPage(scan->indexRelation, BufferGetBlockNumber(buf),
+					  scan->xs_snapshot);
+
+	/* We know in which direction to look */
+	_bt_initialize_more_data(so, dir);
+
+	/*
+	 * Simplest case is when both directions are forward, when we are already
+	 * at the next distinct key at the beginning of the series (so everything
+	 * else would be done in _bt_readpage)
+	 *
+	 * The case when both directions are backwards is also simple, but we need
+	 * to go one step back, since we need a last element from the previous
+	 * series.
+	 */
+	if (ScanDirectionIsBackward(dir) && ScanDirectionIsBackward(indexdir))
+		 offnum = OffsetNumberPrev(offnum);
+
+	/*
+	 * Advance backward but read forward. At this moment we are at the next
+	 * distinct key at the beginning of the series. If the scan has just
+	 * started, we can read forward without doing anything else. Otherwise
+	 * find the previous distinct key and the beginning of its series and read
+	 * forward from there. To do so, go back one step, perform binary search
+	 * to find the first item in the series and let _bt_readpage do everything
+	 * else.
+	 */
+	else if (ScanDirectionIsBackward(dir) && ScanDirectionIsForward(indexdir))
+	{
+		if (!scanstart)
+		{
+			/* Reading forward means we expect to see more data on the right */
+			so->currPos.moreRight = true;
+
+			offnum = _bt_binsrch(scan->indexRelation, so->skipScanKey, buf);
+
+			/* One step back to find a previous value */
+			_bt_readpage(scan, dir, offnum);
+
+			LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+			if (_bt_next(scan, dir))
+			{
+				LockBuffer(so->currPos.buf, BT_READ);
+				_bt_update_skip_scankeys(scan, indexRel);
+
+				/*
+				 * Now find the last item of the sequence for the current
+				 * value, with the intention of doing OffsetNumberNext. As a
+				 * result we end up on the first element of the sequence.
+				 */
+				if (_bt_scankey_within_page(scan, so->skipScanKey, so->currPos.buf, dir))
+					offnum = _bt_binsrch(scan->indexRelation, so->skipScanKey, buf);
+				else
+				{
+					if (BufferIsValid(so->currPos.buf))
+					{
+						/* Before leaving current page, deal with any killed items */
+						if (so->numKilled > 0)
+							_bt_killitems(scan);
+
+						LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+						ReleaseBuffer(so->currPos.buf);
+						so->currPos.buf = InvalidBuffer;
+					}
+
+					stack = _bt_search(scan->indexRelation, so->skipScanKey,
+									   &buf, BT_READ, scan->xs_snapshot);
+					_bt_freestack(stack);
+					so->currPos.buf = buf;
+					offnum = _bt_binsrch(scan->indexRelation, so->skipScanKey, buf);
+				}
+			}
+			else
+			{
+				pfree(so->skipScanKey);
+				so->skipScanKey = NULL;
+				return false;
+			}
+		}
+	}
+
+	/*
+	 * Advance forward but read backward. At this moment we are at the next
+	 * distinct key at the beginning of the series. If the scan has just
+	 * started, we can go one step back and read forward without doing
+	 * anything else. Otherwise find the next distinct key and the beginning
+	 * of its series, go one step back and read backward from there.
+	 *
+	 * An interesting situation can happen if one of the distinct keys does
+	 * not pass the corresponding index condition at all. In this case reading
+	 * backward can lead to a previous distinct key being found, creating a
+	 * loop. To avoid that, check the value to be returned, and jump one more
+	 * time if it's the same as at the beginning. Note that we do not check
+	 * visibility here, and dead tuples could also lead to the same situation.
+	 * This has to be checked on the caller side.
+	 */
+	else if (ScanDirectionIsForward(dir) && ScanDirectionIsBackward(indexdir))
+	{
+		if (scanstart)
+			offnum = OffsetNumberPrev(offnum);
+		else
+		{
+			OffsetNumber nextOffset,
+						startOffset,
+						jumpOffset;
+
+			IndexTuple startItup = CopyIndexTuple(scan->xs_itup);
+			Page page = BufferGetPage(so->currPos.buf);
+
+			/* We are at the end and need to return */
+			if ((offnum > PageGetMaxOffsetNumber(page)) &&
+				(so->currPos.nextPage == P_NONE))
+			{
+				LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+
+				BTScanPosUnpinIfPinned(so->currPos);
+				BTScanPosInvalidate(so->currPos);
+
+				pfree(so->skipScanKey);
+				so->skipScanKey = NULL;
+				return false;
+			}
+
+			nextOffset = startOffset = ItemPointerGetOffsetNumber(&scan->xs_itup->t_tid);
+
+			/* Reading backwards means we expect to see more data on the left */
+			so->currPos.moreLeft = true;
+
+			while (nextOffset == startOffset)
+			{
+				IndexTuple itup;
+				CHECK_FOR_INTERRUPTS();
+
+				/*
+				 * Find a next index tuple to update scan key. It could be at
+				 * the end, so check for max offset
+				 */
+				if (!_bt_readpage(scan, ForwardScanDirection, offnum))
+				{
+					/*
+					 * There's no actually-matching data on this page.  Try to
+					 * advance to the next page. Return false if there's no
+					 * matching data at all.
+					 */
+					LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+					if (!_bt_steppage(scan, dir))
+					{
+						pfree(so->skipScanKey);
+						so->skipScanKey = NULL;
+						return false;
+					}
+					LockBuffer(so->currPos.buf, BT_READ);
+				}
+
+				currItem = &so->currPos.items[so->currPos.firstItem];
+				itup = (IndexTuple) (so->currTuples + currItem->tupleOffset);
+
+				scan->xs_itup = itup;
+				so->skipScanKey->nextkey = ScanDirectionIsForward(dir);
+
+				_bt_update_skip_scankeys(scan, indexRel);
+				if (BufferIsValid(so->currPos.buf))
+				{
+					/* Before leaving current page, deal with any killed items */
+					if (so->numKilled > 0)
+						_bt_killitems(scan);
+
+					LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+					ReleaseBuffer(so->currPos.buf);
+					so->currPos.buf = InvalidBuffer;
+				}
+
+				stack = _bt_search(scan->indexRelation, so->skipScanKey,
+								   &buf, BT_READ, scan->xs_snapshot);
+				_bt_freestack(stack);
+				so->currPos.buf = buf;
+				jumpOffset = offnum = _bt_binsrch(scan->indexRelation, so->skipScanKey, buf);
+				offnum = OffsetNumberPrev(offnum);
+
+				if (!_bt_readpage(scan, indexdir, offnum))
+				{
+					/*
+					 * There's no actually-matching data on this page.  Try to
+					 * advance to the next page. Return false if there's no
+					 * matching data at all.
+					 */
+					LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+					if (!_bt_steppage(scan, indexdir))
+					{
+						pfree(so->skipScanKey);
+						so->skipScanKey = NULL;
+						return false;
+					}
+					LockBuffer(so->currPos.buf, BT_READ);
+				}
+
+				currItem = &so->currPos.items[so->currPos.lastItem];
+				itup = (IndexTuple) (so->currTuples + currItem->tupleOffset);
+				nextOffset = ItemPointerGetOffsetNumber(&itup->t_tid);
+
+				/*
+				 * To check whether we returned to the same tuple, try to find
+				 * startItup on the current page. For that we need to update
+				 * the scankey to match the whole tuple and clear nextkey so the
+				 * search returns that exact tuple, not the next one. If
+				 * nextOffset is the same as before, we are in a loop: restore
+				 * offnum to the jump position and jump further.
+				 */
+				scan->xs_itup = startItup;
+				_bt_update_skip_scankeys(scan, indexRel);
+
+				so->skipScanKey->keysz = IndexRelationGetNumberOfKeyAttributes(indexRel);
+				so->skipScanKey->nextkey = false;
+
+				if (_bt_scankey_within_page(scan, so->skipScanKey,
+											so->currPos.buf, dir))
+				{
+					OffsetNumber maxoff;
+					startOffset = _bt_binsrch(scan->indexRelation,
+											  so->skipScanKey,
+											  so->currPos.buf);
+
+					page = BufferGetPage(so->currPos.buf);
+					maxoff = PageGetMaxOffsetNumber(page);
+
+					if (nextOffset <= startOffset)
+					{
+						offnum = jumpOffset;
+						nextOffset = startOffset;
+					}
+
+					if ((offnum > maxoff) && (so->currPos.nextPage == P_NONE))
+					{
+						LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+
+						BTScanPosUnpinIfPinned(so->currPos);
+						BTScanPosInvalidate(so->currPos);
+
+						pfree(so->skipScanKey);
+						so->skipScanKey = NULL;
+						return false;
+					}
+				}
+
+				/* Restore the original scankey options */
+				so->skipScanKey->keysz = prefix;
+				so->skipScanKey->nextkey = ScanDirectionIsForward(dir);
+			}
+		}
+	}
+
+	/* Now read the data */
+	if (!_bt_readpage(scan, indexdir, offnum))
+	{
+		/*
+		 * There's no actually-matching data on this page.  Try to advance to
+		 * the next page.  Return false if there's no matching data at all.
+		 */
+		LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+		if (!_bt_steppage(scan, dir))
+		{
+			pfree(so->skipScanKey);
+			so->skipScanKey = NULL;
+			return false;
+		}
+	}
+	else
+	{
+		/* Drop the lock, and maybe the pin, on the current page */
+		LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+	}
+
+	/* And set IndexTuple */
+	currItem = &so->currPos.items[so->currPos.itemIndex];
+	scan->xs_heaptid = currItem->heapTid;
+	scan->xs_itup = (IndexTuple) (so->currTuples + currItem->tupleOffset);
+
+	so->currPos.moreLeft = true;
+	so->currPos.moreRight = true;
+
+	return true;
+}
+
 /*
  *	_bt_readpage() -- Load data from current index page into so->currPos
  *
@@ -2246,3 +2662,54 @@ _bt_initialize_more_data(BTScanOpaque so, ScanDirection dir)
 	so->numKilled = 0;			/* just paranoia */
 	so->markItemIndex = -1;		/* ditto */
 }
+
+/*
+ * _bt_update_skip_scankeys() -- set up new values for the existing scankeys
+ * 								 based on the current index tuple
+ */
+static inline void
+_bt_update_skip_scankeys(IndexScanDesc scan, Relation indexRel)
+{
+	TupleDesc		itupdesc;
+	int			indnkeyatts,
+				i;
+	BTScanOpaque 	so = (BTScanOpaque) scan->opaque;
+	ScanKey			scankeys = so->skipScanKey->scankeys;
+
+	itupdesc = RelationGetDescr(indexRel);
+	indnkeyatts = IndexRelationGetNumberOfKeyAttributes(indexRel);
+	for (i = 0; i < indnkeyatts; i++)
+	{
+		Datum datum;
+		bool null;
+		int flags;
+
+		datum = index_getattr(scan->xs_itup, i + 1, itupdesc, &null);
+		flags = (null ? SK_ISNULL : 0) |
+				(indexRel->rd_indoption[i] << SK_BT_INDOPTION_SHIFT);
+		scankeys[i].sk_flags = flags;
+		scankeys[i].sk_argument = datum;
+	}
+}
+
+/*
+ * _bt_scankey_within_page() -- check if the provided scankey could be found
+ * 								within a page, specified by the buffer.
+ */
+static inline bool
+_bt_scankey_within_page(IndexScanDesc scan, BTScanInsert key,
+						Buffer buf, ScanDirection dir)
+{
+	OffsetNumber low, high;
+	Page page = BufferGetPage(buf);
+	BTPageOpaque opaque = (BTPageOpaque) PageGetSpecialPointer(page);
+
+	low = P_FIRSTDATAKEY(opaque);
+	high = PageGetMaxOffsetNumber(page);
+
+	if (unlikely(high < low))
+		return false;
+
+	return (_bt_compare(scan->indexRelation, key, page, low) > 0 &&
+			_bt_compare(scan->indexRelation, key, page, high) <= 0);
+}
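Reviewer sketch, not part of the patch: the boundary behaviour of _bt_scankey_within_page() is easy to misread, so here is a toy model of it on a sorted array of ints. toy_compare() is a hypothetical stand-in for _bt_compare(), assumed to return a negative, zero or positive value when the key sorts before, equal to, or after the item; nothing below touches real nbtree structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for _bt_compare(): <0, 0 or >0 when key is less
 * than, equal to, or greater than the item at items[off].
 */
static int
toy_compare(int key, const int *items, size_t off)
{
	return (key > items[off]) - (key < items[off]);
}

/*
 * Toy version of the within-page test: true when key sorts strictly after
 * the first item and not after the last item of a sorted "page".
 */
static bool
toy_key_within_page(int key, const int *items, size_t nitems)
{
	assert(items != NULL || nitems == 0);

	if (nitems == 0)
		return false;			/* mirrors the "high < low" early exit */

	return toy_compare(key, items, 0) > 0 &&
		toy_compare(key, items, nitems - 1) <= 0;
}
```

With a page holding {10, 20, 30}, keys 20 and 30 count as within the page, while 10 does not, because the first comparison is strict. Whether that strictness is intended for a key equal to the first item on the page is worth a comment in the patch.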
diff --git a/src/backend/access/spgist/spgutils.c b/src/backend/access/spgist/spgutils.c
index 4924ae1c59..fa09a4685e 100644
--- a/src/backend/access/spgist/spgutils.c
+++ b/src/backend/access/spgist/spgutils.c
@@ -68,6 +68,7 @@ spghandler(PG_FUNCTION_ARGS)
 	amroutine->ambulkdelete = spgbulkdelete;
 	amroutine->amvacuumcleanup = spgvacuumcleanup;
 	amroutine->amcanreturn = spgcanreturn;
+	amroutine->amskip = NULL;
 	amroutine->amcostestimate = spgcostestimate;
 	amroutine->amoptions = spgoptions;
 	amroutine->amproperty = spgproperty;
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index c367c750b1..a7dd874531 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -141,6 +141,7 @@ static void ExplainXMLTag(const char *tagname, int flags, ExplainState *es);
 static void ExplainIndentText(ExplainState *es);
 static void ExplainJSONLineEnding(ExplainState *es);
 static void ExplainYAMLLineStarting(ExplainState *es);
+static void ExplainIndexSkipScanKeys(int skipPrefixSize, ExplainState *es);
 static void escape_yaml(StringInfo buf, const char *str);
 
 
@@ -1052,6 +1053,22 @@ ExplainPreScanNode(PlanState *planstate, Bitmapset **rels_used)
 	return planstate_tree_walker(planstate, ExplainPreScanNode, rels_used);
 }
 
+/*
+ * ExplainIndexSkipScanKeys -
+ *	  Append information about index skip scan to es->str.
+ *
+ * Can be used to print the skip prefix size.
+ */
+static void
+ExplainIndexSkipScanKeys(int skipPrefixSize, ExplainState *es)
+{
+	if (skipPrefixSize > 0)
+	{
+		if (es->format != EXPLAIN_FORMAT_TEXT)
+			ExplainPropertyInteger("Distinct Prefix", NULL, skipPrefixSize, es);
+	}
+}
+
 /*
  * ExplainNode -
  *	  Appends a description of a plan tree to es->str
@@ -1386,6 +1403,8 @@ ExplainNode(PlanState *planstate, List *ancestors,
 			{
 				IndexScan  *indexscan = (IndexScan *) plan;
 
+				ExplainIndexSkipScanKeys(indexscan->indexskipprefixsize, es);
+
 				ExplainIndexScanDetails(indexscan->indexid,
 										indexscan->indexorderdir,
 										es);
@@ -1396,6 +1415,8 @@ ExplainNode(PlanState *planstate, List *ancestors,
 			{
 				IndexOnlyScan *indexonlyscan = (IndexOnlyScan *) plan;
 
+				ExplainIndexSkipScanKeys(indexonlyscan->indexskipprefixsize, es);
+
 				ExplainIndexScanDetails(indexonlyscan->indexid,
 										indexonlyscan->indexorderdir,
 										es);
@@ -1655,6 +1676,8 @@ ExplainNode(PlanState *planstate, List *ancestors,
 	switch (nodeTag(plan))
 	{
 		case T_IndexScan:
+			if (((IndexScan *) plan)->indexskipprefixsize > 0)
+				ExplainPropertyBool("Skip scan", true, es);
 			show_scan_qual(((IndexScan *) plan)->indexqualorig,
 						   "Index Cond", planstate, ancestors, es);
 			if (((IndexScan *) plan)->indexqualorig)
@@ -1668,6 +1691,8 @@ ExplainNode(PlanState *planstate, List *ancestors,
 										   planstate, es);
 			break;
 		case T_IndexOnlyScan:
+			if (((IndexOnlyScan *) plan)->indexskipprefixsize > 0)
+				ExplainPropertyBool("Skip scan", true, es);
 			show_scan_qual(((IndexOnlyScan *) plan)->indexqual,
 						   "Index Cond", planstate, ancestors, es);
 			if (((IndexOnlyScan *) plan)->indexqual)
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 5617ac29e7..c4e4b087a7 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -41,6 +41,7 @@
 #include "miscadmin.h"
 #include "storage/bufmgr.h"
 #include "storage/predicate.h"
+#include "storage/itemptr.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
 
@@ -62,9 +63,26 @@ IndexOnlyNext(IndexOnlyScanState *node)
 	EState	   *estate;
 	ExprContext *econtext;
 	ScanDirection direction;
+	ScanDirection readDirection;
 	IndexScanDesc scandesc;
 	TupleTableSlot *slot;
 	ItemPointer tid;
+	ItemPointerData startTid;
+	IndexOnlyScan *indexonlyscan = (IndexOnlyScan *) node->ss.ps.plan;
+
+	/*
+	 * Tells whether the current position was reached via skipping. In that
+	 * case there is no need to call index_getnext_tid.
+	 */
+	bool skipped = false;
+
+	/*
+	 * An index-only scan must be aware that, when skipping, visibility checks
+	 * can bring us back to the starting point. In that situation we need to
+	 * jump further, and the number of skip attempts tells us how far to
+	 * jump.
+	 */
+	int skipAttempts = 0;
 
 	/*
 	 * extract necessary information from index scan node
@@ -72,7 +90,7 @@ IndexOnlyNext(IndexOnlyScanState *node)
 	estate = node->ss.ps.state;
 	direction = estate->es_direction;
 	/* flip direction if this is an overall backward scan */
-	if (ScanDirectionIsBackward(((IndexOnlyScan *) node->ss.ps.plan)->indexorderdir))
+	if (ScanDirectionIsBackward(indexonlyscan->indexorderdir))
 	{
 		if (ScanDirectionIsForward(direction))
 			direction = BackwardScanDirection;
@@ -114,16 +132,87 @@ IndexOnlyNext(IndexOnlyScanState *node)
 						 node->ioss_OrderByKeys,
 						 node->ioss_NumOrderByKeys);
 	}
+	else
+	{
+		ItemPointerCopy(&scandesc->xs_heaptid, &startTid);
+	}
+
+	/*
+	 * Check if we need to skip to the next key prefix, because we've been
+	 * asked to implement DISTINCT.
+	 *
+	 * When fetching from a cursor in the direction opposite to the general
+	 * scan direction, the result must be what normal fetching would have
+	 * returned, but in reverse order. In other words, return the last or
+	 * first scanned tuple in a DISTINCT set, depending on the cursor
+	 * direction. Because of that, we also skip when the first tuple has not
+	 * been emitted yet but the directions are opposite.
+	 */
+	if (node->ioss_SkipPrefixSize > 0 &&
+		(node->ioss_FirstTupleEmitted ||
+		 ScanDirectionsAreOpposite(direction, indexonlyscan->indexorderdir)))
+	{
+		if (!index_skip(scandesc, direction, indexonlyscan->indexorderdir,
+						!node->ioss_FirstTupleEmitted, node->ioss_SkipPrefixSize))
+		{
+			/*
+			 * Reached end of index. At this point currPos is invalidated, and
+			 * we need to reset ioss_FirstTupleEmitted; otherwise, after going
+			 * backwards, reaching the end of the index, and going forward
+			 * again, we would apply the skip again. That would be incorrect
+			 * and lead to an extra skipped item.
+			 */
+			node->ioss_FirstTupleEmitted = false;
+			return ExecClearTuple(slot);
+		}
+		else
+		{
+			skipAttempts = 1;
+			skipped = true;
+			tid = &scandesc->xs_heaptid;
+		}
+	}
+
+	readDirection = skipped ? indexonlyscan->indexorderdir : direction;
 
 	/*
 	 * OK, now that we have what we need, fetch the next tuple.
 	 */
-	while ((tid = index_getnext_tid(scandesc, direction)) != NULL)
+	while (skipped || (tid = index_getnext_tid(scandesc, readDirection)) != NULL)
 	{
 		bool		tuple_from_heap = false;
 
 		CHECK_FOR_INTERRUPTS();
 
+		/*
+		 * While doing an index-only skip scan with advancing and reading in
+		 * different directions, we can return to the position where we
+		 * started after the visibility check. Recognize such situations and
+		 * skip further.
+		 */
+		if ((readDirection != direction) &&
+			ItemPointerIsValid(&startTid) && ItemPointerEquals(&startTid, tid))
+		{
+			int i;
+			skipAttempts += 1;
+
+			for (i = 0; i < skipAttempts; i++)
+			{
+				if (!index_skip(scandesc, direction,
+								indexonlyscan->indexorderdir,
+								!node->ioss_FirstTupleEmitted,
+								node->ioss_SkipPrefixSize))
+				{
+					node->ioss_FirstTupleEmitted = false;
+					return ExecClearTuple(slot);
+				}
+			}
+
+			tid = &scandesc->xs_heaptid;
+		}
+
+		skipped = false;
+
 		/*
 		 * We can skip the heap fetch if the TID references a heap page on
 		 * which all tuples are known visible to everybody.  In any case,
@@ -250,6 +339,8 @@ IndexOnlyNext(IndexOnlyScanState *node)
 							  ItemPointerGetBlockNumber(tid),
 							  estate->es_snapshot);
 
+		node->ioss_FirstTupleEmitted = true;
+
 		return slot;
 	}
 
@@ -504,6 +595,8 @@ ExecInitIndexOnlyScan(IndexOnlyScan *node, EState *estate, int eflags)
 	indexstate->ss.ps.plan = (Plan *) node;
 	indexstate->ss.ps.state = estate;
 	indexstate->ss.ps.ExecProcNode = ExecIndexOnlyScan;
+	indexstate->ioss_SkipPrefixSize = node->indexskipprefixsize;
+	indexstate->ioss_FirstTupleEmitted = false;
 
 	/*
 	 * Miscellaneous initialization
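Reviewer sketch, not part of the patch: the condition that decides whether IndexOnlyNext() and IndexNext() call index_skip() before fetching can be modelled as a small predicate. The enum below is a local copy of the three scan directions, and the toy_* names are hypothetical; the real code uses ScanDirectionsAreOpposite() and the ioss_/iss_ state fields.

```c
#include <assert.h>
#include <stdbool.h>

/* Local copy of the three scan directions, mirroring access/sdir.h. */
typedef enum ToyScanDirection
{
	TOY_BACKWARD = -1,
	TOY_NO_MOVEMENT = 0,
	TOY_FORWARD = 1
} ToyScanDirection;

/* True iff one direction is forward and the other is backward. */
static bool
toy_directions_opposite(ToyScanDirection a, ToyScanDirection b)
{
	return (a == TOY_FORWARD && b == TOY_BACKWARD) ||
		(a == TOY_BACKWARD && b == TOY_FORWARD);
}

/* Should the node call index_skip() before fetching the next tuple? */
static bool
toy_should_skip(int skip_prefix_size, bool first_tuple_emitted,
				ToyScanDirection direction, ToyScanDirection indexorderdir)
{
	assert(skip_prefix_size >= 0);

	return skip_prefix_size > 0 &&
		(first_tuple_emitted ||
		 toy_directions_opposite(direction, indexorderdir));
}
```

The case to look at is the last one in the comment above the condition: no tuple has been emitted yet, but the cursor direction is opposite to the index order direction, so the skip still has to happen to land on the right end of the DISTINCT group.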
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index d0a96a38e0..449aaec3ac 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -85,6 +85,13 @@ IndexNext(IndexScanState *node)
 	ScanDirection direction;
 	IndexScanDesc scandesc;
 	TupleTableSlot *slot;
+	IndexScan *indexscan = (IndexScan *) node->ss.ps.plan;
+
+	/*
+	 * Tells whether the current position was reached via skipping. In that
+	 * case there is no need to call index_getnext_slot.
+	 */
+	bool skipped = false;
 
 	/*
 	 * extract necessary information from index scan node
@@ -92,7 +99,7 @@ IndexNext(IndexScanState *node)
 	estate = node->ss.ps.state;
 	direction = estate->es_direction;
 	/* flip direction if this is an overall backward scan */
-	if (ScanDirectionIsBackward(((IndexScan *) node->ss.ps.plan)->indexorderdir))
+	if (ScanDirectionIsBackward(indexscan->indexorderdir))
 	{
 		if (ScanDirectionIsForward(direction))
 			direction = BackwardScanDirection;
@@ -117,6 +124,12 @@ IndexNext(IndexScanState *node)
 
 		node->iss_ScanDesc = scandesc;
 
+		/* Index skip scan assumes xs_want_itup, so set it to true */
+		if (indexscan->indexskipprefixsize > 0)
+			node->iss_ScanDesc->xs_want_itup = true;
+		else
+			node->iss_ScanDesc->xs_want_itup = false;
+
 		/*
 		 * If no run-time keys to calculate or they are ready, go ahead and
 		 * pass the scankeys to the index AM.
@@ -127,12 +140,48 @@ IndexNext(IndexScanState *node)
 						 node->iss_OrderByKeys, node->iss_NumOrderByKeys);
 	}
 
+	/*
+	 * Check if we need to skip to the next key prefix, because we've been
+	 * asked to implement DISTINCT.
+	 *
+	 * When fetching from a cursor in the direction opposite to the general
+	 * scan direction, the result must be what normal fetching would have
+	 * returned, but in reverse order. In other words, return the last or
+	 * first scanned tuple in a DISTINCT set, depending on the cursor
+	 * direction. Because of that, we also skip when the first tuple has not
+	 * been emitted yet but the directions are opposite.
+	 */
+	if (node->iss_SkipPrefixSize > 0 &&
+		(node->iss_FirstTupleEmitted ||
+		 ScanDirectionsAreOpposite(direction, indexscan->indexorderdir)))
+	{
+		if (!index_skip(scandesc, direction, indexscan->indexorderdir,
+					   !node->iss_FirstTupleEmitted, node->iss_SkipPrefixSize))
+		{
+			/*
+			 * Reached end of index. At this point currPos is invalidated, and
+			 * we need to reset iss_FirstTupleEmitted; otherwise, after going
+			 * backwards, reaching the end of the index, and going forward
+			 * again, we would apply the skip again. That would be incorrect
+			 * and lead to an extra skipped item.
+			 */
+			node->iss_FirstTupleEmitted = false;
+			return ExecClearTuple(slot);
+		}
+		else
+		{
+			skipped = true;
+			index_fetch_heap(scandesc, slot);
+		}
+	}
+
 	/*
 	 * ok, now that we have what we need, fetch the next tuple.
 	 */
-	while (index_getnext_slot(scandesc, direction, slot))
+	while (skipped || index_getnext_slot(scandesc, direction, slot))
 	{
 		CHECK_FOR_INTERRUPTS();
+		skipped = false;
 
 		/*
 		 * If the index was lossy, we have to recheck the index quals using
@@ -149,6 +198,7 @@ IndexNext(IndexScanState *node)
 			}
 		}
 
+		node->iss_FirstTupleEmitted = true;
 		return slot;
 	}
 
@@ -910,6 +960,8 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
 	indexstate->ss.ps.plan = (Plan *) node;
 	indexstate->ss.ps.state = estate;
 	indexstate->ss.ps.ExecProcNode = ExecIndexScan;
+	indexstate->iss_SkipPrefixSize = node->indexskipprefixsize;
+	indexstate->iss_FirstTupleEmitted = false;
 
 	/*
 	 * Miscellaneous initialization
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 54ad62bb7f..e0cfd710c4 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -493,6 +493,7 @@ _copyIndexScan(const IndexScan *from)
 	COPY_NODE_FIELD(indexorderbyorig);
 	COPY_NODE_FIELD(indexorderbyops);
 	COPY_SCALAR_FIELD(indexorderdir);
+	COPY_SCALAR_FIELD(indexskipprefixsize);
 
 	return newnode;
 }
@@ -518,6 +519,7 @@ _copyIndexOnlyScan(const IndexOnlyScan *from)
 	COPY_NODE_FIELD(indexorderby);
 	COPY_NODE_FIELD(indextlist);
 	COPY_SCALAR_FIELD(indexorderdir);
+	COPY_SCALAR_FIELD(indexskipprefixsize);
 
 	return newnode;
 }
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 16083e7a7e..5f723cda4b 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -562,6 +562,7 @@ _outIndexScan(StringInfo str, const IndexScan *node)
 	WRITE_NODE_FIELD(indexorderbyorig);
 	WRITE_NODE_FIELD(indexorderbyops);
 	WRITE_ENUM_FIELD(indexorderdir, ScanDirection);
+	WRITE_INT_FIELD(indexskipprefixsize);
 }
 
 static void
@@ -576,6 +577,7 @@ _outIndexOnlyScan(StringInfo str, const IndexOnlyScan *node)
 	WRITE_NODE_FIELD(indexorderby);
 	WRITE_NODE_FIELD(indextlist);
 	WRITE_ENUM_FIELD(indexorderdir, ScanDirection);
+	WRITE_INT_FIELD(indexskipprefixsize);
 }
 
 static void
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 551ce6c41c..028d03a56d 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1820,6 +1820,7 @@ _readIndexScan(void)
 	READ_NODE_FIELD(indexorderbyorig);
 	READ_NODE_FIELD(indexorderbyops);
 	READ_ENUM_FIELD(indexorderdir, ScanDirection);
+	READ_INT_FIELD(indexskipprefixsize);
 
 	READ_DONE();
 }
@@ -1839,6 +1840,7 @@ _readIndexOnlyScan(void)
 	READ_NODE_FIELD(indexorderby);
 	READ_NODE_FIELD(indextlist);
 	READ_ENUM_FIELD(indexorderdir, ScanDirection);
+	READ_INT_FIELD(indexskipprefixsize);
 
 	READ_DONE();
 }
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index b5a0033721..710edf160a 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -124,6 +124,7 @@ int			max_parallel_workers_per_gather = 2;
 bool		enable_seqscan = true;
 bool		enable_indexscan = true;
 bool		enable_indexonlyscan = true;
+bool		enable_indexskipscan = true;
 bool		enable_bitmapscan = true;
 bool		enable_tidscan = true;
 bool		enable_sort = true;
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 363f5349f1..fc3ec200d4 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -791,6 +791,16 @@ get_index_paths(PlannerInfo *root, RelOptInfo *rel,
 	{
 		IndexPath  *ipath = (IndexPath *) lfirst(lc);
 
+		/*
+		 * Keep unique paths produced by index skip scans in a separate
+		 * pathlist, so that they are not used when they are not needed.
+		 */
+		if (ipath->indexskipprefix != 0)
+		{
+			add_unique_path(rel, (Path *) ipath);
+			continue;
+		}
+
 		if (index->amhasgettuple)
 			add_path(rel, (Path *) ipath);
 
@@ -880,6 +890,8 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
 	bool		pathkeys_possibly_useful;
 	bool		index_is_ordered;
 	bool		index_only_scan;
+	bool		not_empty_qual;
+	bool		can_skip;
 	int			indexcol;
 
 	/*
@@ -1029,6 +1041,17 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
 	index_only_scan = (scantype != ST_BITMAPSCAN &&
 					   check_index_only(rel, index));
 
+	/* Check if an index skip scan is possible. */
+	can_skip = enable_indexskipscan && index->amcanskip;
+
+	/*
+	 * For a plain index scan (not an index-only scan), skip scan is not
+	 * supported when qual conditions are present. Check whether there are any.
+	 */
+	not_empty_qual = (root->parse->jointree != NULL &&
+					  root->parse->jointree->quals != NULL &&
+					  list_length((List *) root->parse->jointree->quals) != 0);
+
 	/*
 	 * 4. Generate an indexscan path if there are relevant restriction clauses
 	 * in the current clauses, OR the index ordering is potentially useful for
@@ -1056,6 +1079,13 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
 								  false);
 		result = lappend(result, ipath);
 
+		/* Consider index skip scan as well */
+		if (useful_uniquekeys != NULL && can_skip &&
+			(index_only_scan || !not_empty_qual))
+			result = lappend(result,
+							 create_skipscan_unique_path(root, index,
+								 						 (Path *) ipath));
+
 		/*
 		 * If appropriate, consider parallel index scan.  We don't allow
 		 * parallel index scan for bitmap index scans.
@@ -1116,6 +1146,13 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
 									  false);
 			result = lappend(result, ipath);
 
+			/* Consider index skip scan as well */
+			if (useful_uniquekeys != NULL && can_skip &&
+				(index_only_scan || !not_empty_qual))
+				result = lappend(result,
+								 create_skipscan_unique_path(root, index,
+															 (Path *) ipath));
+
 			/* If appropriate, consider parallel index scan */
 			if (index->amcanparallel &&
 				rel->consider_parallel && outer_relids == NULL &&
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index dff826a828..7b32f2cc7e 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -175,12 +175,14 @@ static IndexScan *make_indexscan(List *qptlist, List *qpqual, Index scanrelid,
 								 Oid indexid, List *indexqual, List *indexqualorig,
 								 List *indexorderby, List *indexorderbyorig,
 								 List *indexorderbyops,
-								 ScanDirection indexscandir);
+								 ScanDirection indexscandir,
+								 int skipprefix);
 static IndexOnlyScan *make_indexonlyscan(List *qptlist, List *qpqual,
 										 Index scanrelid, Oid indexid,
 										 List *indexqual, List *indexorderby,
 										 List *indextlist,
-										 ScanDirection indexscandir);
+										 ScanDirection indexscandir,
+										 int skipprefix);
 static BitmapIndexScan *make_bitmap_indexscan(Index scanrelid, Oid indexid,
 											  List *indexqual,
 											  List *indexqualorig);
@@ -2910,7 +2912,8 @@ create_indexscan_plan(PlannerInfo *root,
 												fixed_indexquals,
 												fixed_indexorderbys,
 												best_path->indexinfo->indextlist,
-												best_path->indexscandir);
+												best_path->indexscandir,
+												best_path->indexskipprefix);
 	else
 		scan_plan = (Scan *) make_indexscan(tlist,
 											qpqual,
@@ -2921,7 +2924,8 @@ create_indexscan_plan(PlannerInfo *root,
 											fixed_indexorderbys,
 											indexorderbys,
 											indexorderbyops,
-											best_path->indexscandir);
+											best_path->indexscandir,
+											best_path->indexskipprefix);
 
 	copy_generic_path_info(&scan_plan->plan, &best_path->path);
 
@@ -5184,7 +5188,8 @@ make_indexscan(List *qptlist,
 			   List *indexorderby,
 			   List *indexorderbyorig,
 			   List *indexorderbyops,
-			   ScanDirection indexscandir)
+			   ScanDirection indexscandir,
+			   int skipPrefixSize)
 {
 	IndexScan  *node = makeNode(IndexScan);
 	Plan	   *plan = &node->scan.plan;
@@ -5201,6 +5206,7 @@ make_indexscan(List *qptlist,
 	node->indexorderbyorig = indexorderbyorig;
 	node->indexorderbyops = indexorderbyops;
 	node->indexorderdir = indexscandir;
+	node->indexskipprefixsize = skipPrefixSize;
 
 	return node;
 }
@@ -5213,7 +5219,8 @@ make_indexonlyscan(List *qptlist,
 				   List *indexqual,
 				   List *indexorderby,
 				   List *indextlist,
-				   ScanDirection indexscandir)
+				   ScanDirection indexscandir,
+				   int skipPrefixSize)
 {
 	IndexOnlyScan *node = makeNode(IndexOnlyScan);
 	Plan	   *plan = &node->scan.plan;
@@ -5228,6 +5235,7 @@ make_indexonlyscan(List *qptlist,
 	node->indexorderby = indexorderby;
 	node->indextlist = indextlist;
 	node->indexorderdir = indexscandir;
+	node->indexskipprefixsize = skipPrefixSize;
 
 	return node;
 }
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index a7de8476d9..88305df5c3 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -4828,13 +4828,19 @@ create_distinct_paths(PlannerInfo *root,
 			Path	   *path = (Path *) lfirst(lc);
 
 			if (pathkeys_contained_in(needed_pathkeys, path->pathkeys))
-			{
 				add_path(distinct_rel, (Path *)
 						 create_upper_unique_path(root, distinct_rel,
 												  path,
 												  list_length(root->distinct_pathkeys),
 												  numDistinctRows));
-			}
+		}
+
+		foreach(lc, input_rel->unique_pathlist)
+		{
+			Path	   *path = (Path *) lfirst(lc);
+
+			if (uniquekeys_contained_in(needed_pathkeys, path->uniquekeys))
+				add_path(distinct_rel, path);
 		}
 
 		/* For explicit-sort case, always use the more rigorous clause */
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index a4dfafbb59..b0ce17b0d6 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -2564,6 +2564,7 @@ create_projection_path(PlannerInfo *root,
 	pathnode->path.pathkeys = subpath->pathkeys;
 
 	pathnode->subpath = subpath;
+	pathnode->path.uniquekeys = subpath->uniquekeys;
 
 	/*
 	 * We might not need a separate Result node.  If the input plan node type
@@ -2929,6 +2930,73 @@ create_upper_unique_path(PlannerInfo *root,
 	return pathnode;
 }
 
+/*
+ * create_skipscan_unique_path
+ *	  Creates a pathnode the same as an existing IndexPath except based on
+ *	  skipping duplicate values.  This may or may not be cheaper than using
+ *	  create_upper_unique_path.
+ *
+ * The input path must be an IndexPath for an index that supports amskip.
+ */
+IndexPath *
+create_skipscan_unique_path(PlannerInfo *root, IndexOptInfo *index,
+							Path *basepath)
+{
+	IndexPath 	*pathnode = makeNode(IndexPath);
+	int 		numDistinctRows;
+	int 		distinctPrefixKeys;
+	ListCell 	*lc;
+	List 	   	*exprs = NIL;
+
+
+	distinctPrefixKeys = list_length(root->query_uniquekeys);
+
+	Assert(IsA(basepath, IndexPath));
+
+	/* We don't want to modify basepath, so make a copy. */
+	memcpy(pathnode, basepath, sizeof(IndexPath));
+
+	/*
+	 * Normally we can think of distinctPrefixKeys as just the number
+	 * of distinct keys. But suppose the distinct key is a and the
+	 * index contains (b, a), in exactly this order. In that situation
+	 * we need to use the position of a in the index as
+	 * distinctPrefixKeys; otherwise the skip would happen only on the
+	 * first column.
+	 */
+	foreach(lc, root->query_uniquekeys)
+	{
+		UniqueKey *uniquekey = (UniqueKey *) lfirst(lc);
+		EquivalenceMember *em =
+			lfirst_node(EquivalenceMember,
+						list_head(uniquekey->eq_clause->ec_members));
+		Var *var = (Var *) em->em_expr;
+
+		exprs = lappend(exprs, em->em_expr);
+
+		for (int i = 0; i < index->ncolumns; i++)
+		{
+			if (index->indexkeys[i] == var->varattno)
+			{
+				distinctPrefixKeys = Max(i + 1, distinctPrefixKeys);
+				break;
+			}
+		}
+	}
+
+	Assert(distinctPrefixKeys > 0);
+	pathnode->indexskipprefix = distinctPrefixKeys;
+
+	numDistinctRows = estimate_num_groups(root, exprs,
+										  pathnode->path.rows,
+										  NULL);
+
+	pathnode->path.total_cost = pathnode->path.startup_cost * numDistinctRows;
+	pathnode->path.rows = numDistinctRows;
+
+	return pathnode;
+}
+
 /*
  * create_agg_path
  *	  Creates a pathnode that represents performing aggregation/grouping
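Reviewer sketch, not part of the patch: the prefix computation in create_skipscan_unique_path() above can be tried in isolation with plain arrays. Attribute numbers stand in for index->indexkeys and for the Vars behind root->query_uniquekeys; toy_skip_prefix and its parameters are hypothetical names used only for illustration.

```c
#include <assert.h>

/*
 * Toy model of the prefix computation. index_cols[] holds the attribute
 * numbers of the index columns in index order; distinct_cols[] holds the
 * attribute numbers of the DISTINCT keys.
 */
static int
toy_skip_prefix(const int *index_cols, int nindex,
				const int *distinct_cols, int ndistinct)
{
	int			prefix = ndistinct;	/* start from the number of keys */

	assert(nindex >= 0 && ndistinct >= 0);

	for (int k = 0; k < ndistinct; k++)
	{
		for (int i = 0; i < nindex; i++)
		{
			if (index_cols[i] == distinct_cols[k])
			{
				/* 1-based position of the key within the index */
				if (i + 1 > prefix)
					prefix = i + 1;
				break;
			}
		}
	}

	return prefix;
}
```

For the case described in the comment, an index on (b, a) with DISTINCT on a, and assuming attribute numbers a = 1 and b = 2, the call toy_skip_prefix((int[]){2, 1}, 2, (int[]){1}, 1) gives a prefix of 2 rather than 1, so the skip covers both leading columns.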
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index d82fc5ab8b..f65b299f37 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -271,6 +271,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 			info->amoptionalkey = amroutine->amoptionalkey;
 			info->amsearcharray = amroutine->amsearcharray;
 			info->amsearchnulls = amroutine->amsearchnulls;
+			info->amcanskip = (amroutine->amskip != NULL);
 			info->amcanparallel = amroutine->amcanparallel;
 			info->amhasgettuple = (amroutine->amgettuple != NULL);
 			info->amhasgetbitmap = amroutine->amgetbitmap != NULL &&
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index cacbe904db..7c71ee4499 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -923,6 +923,15 @@ static struct config_bool ConfigureNamesBool[] =
 		true,
 		NULL, NULL, NULL
 	},
+	{
+		{"enable_indexskipscan", PGC_USERSET, QUERY_TUNING_METHOD,
+			gettext_noop("Enables the planner's use of index-skip-scan plans."),
+			NULL
+		},
+		&enable_indexskipscan,
+		true,
+		NULL, NULL, NULL
+	},
 	{
 		{"enable_bitmapscan", PGC_USERSET, QUERY_TUNING_METHOD,
 			gettext_noop("Enables the planner's use of bitmap-scan plans."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index e1048c0047..a002ee2143 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -353,6 +353,7 @@
 #enable_hashjoin = on
 #enable_indexscan = on
 #enable_indexonlyscan = on
+#enable_indexskipscan = on
 #enable_material = on
 #enable_mergejoin = on
 #enable_nestloop = on
diff --git a/src/include/access/amapi.h b/src/include/access/amapi.h
index 3b3e22f73d..3d39cd9d07 100644
--- a/src/include/access/amapi.h
+++ b/src/include/access/amapi.h
@@ -130,6 +130,13 @@ typedef void (*amrescan_function) (IndexScanDesc scan,
 typedef bool (*amgettuple_function) (IndexScanDesc scan,
 									 ScanDirection direction);
 
+/* skip past duplicates in a given prefix */
+typedef bool (*amskip_function) (IndexScanDesc scan,
+								 ScanDirection dir,
+								 ScanDirection indexdir,
+								 bool start,
+								 int prefix);
+
 /* fetch all valid tuples */
 typedef int64 (*amgetbitmap_function) (IndexScanDesc scan,
 									   TIDBitmap *tbm);
@@ -229,6 +236,7 @@ typedef struct IndexAmRoutine
 	amendscan_function amendscan;
 	ammarkpos_function ammarkpos;	/* can be NULL */
 	amrestrpos_function amrestrpos; /* can be NULL */
+	amskip_function amskip;				/* can be NULL */
 
 	/* interface functions to support parallel index scans */
 	amestimateparallelscan_function amestimateparallelscan; /* can be NULL */
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 7e9364a50c..815de4e4dd 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -180,6 +180,8 @@ extern IndexBulkDeleteResult *index_bulk_delete(IndexVacuumInfo *info,
 extern IndexBulkDeleteResult *index_vacuum_cleanup(IndexVacuumInfo *info,
 												   IndexBulkDeleteResult *stats);
 extern bool index_can_return(Relation indexRelation, int attno);
+extern bool index_skip(IndexScanDesc scan, ScanDirection direction,
+					   ScanDirection indexdir, bool start, int prefix);
 extern RegProcedure index_getprocid(Relation irel, AttrNumber attnum,
 									uint16 procnum);
 extern FmgrInfo *index_getprocinfo(Relation irel, AttrNumber attnum,
diff --git a/src/include/access/nbtree.h b/src/include/access/nbtree.h
index 20ace69dab..e098c6a1ab 100644
--- a/src/include/access/nbtree.h
+++ b/src/include/access/nbtree.h
@@ -662,6 +662,9 @@ typedef struct BTScanOpaqueData
 	 */
 	int			markItemIndex;	/* itemIndex, or -1 if not valid */
 
+	/* Work space for _bt_skip */
+	BTScanInsert	skipScanKey;	/* used to control skipping */
+
 	/* keep these last in struct for efficiency */
 	BTScanPosData currPos;		/* current position data */
 	BTScanPosData markPos;		/* marked position, if any */
@@ -793,6 +796,8 @@ extern OffsetNumber _bt_binsrch_insert(Relation rel, BTInsertState insertstate);
 extern int32 _bt_compare(Relation rel, BTScanInsert key, Page page, OffsetNumber offnum);
 extern bool _bt_first(IndexScanDesc scan, ScanDirection dir);
 extern bool _bt_next(IndexScanDesc scan, ScanDirection dir);
+extern bool _bt_skip(IndexScanDesc scan, ScanDirection dir,
+					 ScanDirection indexdir, bool start, int prefix);
 extern Buffer _bt_get_endpoint(Relation rel, uint32 level, bool rightmost,
 							   Snapshot snapshot);
 
@@ -817,6 +822,8 @@ extern void _bt_end_vacuum_callback(int code, Datum arg);
 extern Size BTreeShmemSize(void);
 extern void BTreeShmemInit(void);
 extern bytea *btoptions(Datum reloptions, bool validate);
+extern bool btskip(IndexScanDesc scan, ScanDirection dir,
+				   ScanDirection indexdir, bool start, int prefix);
 extern bool btproperty(Oid index_oid, int attno,
 					   IndexAMProperty prop, const char *propname,
 					   bool *res, bool *isnull);
diff --git a/src/include/access/sdir.h b/src/include/access/sdir.h
index 23feb90986..094a127464 100644
--- a/src/include/access/sdir.h
+++ b/src/include/access/sdir.h
@@ -55,4 +55,11 @@ typedef enum ScanDirection
 #define ScanDirectionIsForward(direction) \
 	((bool) ((direction) == ForwardScanDirection))
 
+/*
+ * ScanDirectionsAreOpposite
+ *		True iff scan directions are backward/forward or forward/backward.
+ */
+#define ScanDirectionsAreOpposite(dirA, dirB) \
+	((bool) ((dirA) != NoMovementScanDirection && (dirA) == -(dirB)))
+
 #endif							/* SDIR_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 1f6f5bbc20..2c6acc160a 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1423,6 +1423,8 @@ typedef struct IndexScanState
 	ExprContext *iss_RuntimeContext;
 	Relation	iss_RelationDesc;
 	struct IndexScanDescData *iss_ScanDesc;
+	int         iss_SkipPrefixSize;
+	bool		iss_FirstTupleEmitted;
 
 	/* These are needed for re-checking ORDER BY expr ordering */
 	pairingheap *iss_ReorderQueue;
@@ -1452,6 +1454,8 @@ typedef struct IndexScanState
  *		TableSlot		   slot for holding tuples fetched from the table
  *		VMBuffer		   buffer in use for visibility map testing, if any
  *		PscanLen		   size of parallel index-only scan descriptor
+ *		SkipPrefixSize	   number of keys for skip-based DISTINCT
+ *		FirstTupleEmitted  has the first tuple been emitted
  * ----------------
  */
 typedef struct IndexOnlyScanState
@@ -1470,6 +1474,8 @@ typedef struct IndexOnlyScanState
 	struct IndexScanDescData *ioss_ScanDesc;
 	TupleTableSlot *ioss_TableSlot;
 	Buffer		ioss_VMBuffer;
+	int         ioss_SkipPrefixSize;
+	bool		ioss_FirstTupleEmitted;
 	Size		ioss_PscanLen;
 } IndexOnlyScanState;
 
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 0de27f0ef3..ce00060ee0 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -840,6 +840,7 @@ struct IndexOptInfo
 	bool		amsearchnulls;	/* can AM search for NULL/NOT NULL entries? */
 	bool		amhasgettuple;	/* does AM have amgettuple interface? */
 	bool		amhasgetbitmap; /* does AM have amgetbitmap interface? */
+	bool		amcanskip;		/* can AM skip duplicate values? */
 	bool		amcanparallel;	/* does AM support parallel scan? */
 	/* Rather than include amapi.h here, we declare amcostestimate like this */
 	void		(*amcostestimate) ();	/* AM's cost estimator */
@@ -1190,6 +1191,9 @@ typedef struct Path
  * we need not recompute them when considering using the same index in a
  * bitmap index/heap scan (see BitmapHeapPath).  The costs of the IndexPath
  * itself represent the costs of an IndexScan or IndexOnlyScan plan type.
+ *
+ * 'indexskipprefix' represents the number of columns to consider for skip
+ * scans.
  *----------
  */
 typedef struct IndexPath
@@ -1202,6 +1206,7 @@ typedef struct IndexPath
 	ScanDirection indexscandir;
 	Cost		indextotalcost;
 	Selectivity indexselectivity;
+	int			indexskipprefix;
 } IndexPath;
 
 /*
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 32c0d87f80..03a00e8e1d 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -409,6 +409,8 @@ typedef struct IndexScan
 	List	   *indexorderbyorig;	/* the same in original form */
 	List	   *indexorderbyops;	/* OIDs of sort ops for ORDER BY exprs */
 	ScanDirection indexorderdir;	/* forward or backward or don't care */
+	int			indexskipprefixsize;	/* the size of the prefix for distinct
+										 * scans */
 } IndexScan;
 
 /* ----------------
@@ -436,6 +438,8 @@ typedef struct IndexOnlyScan
 	List	   *indexorderby;	/* list of index ORDER BY exprs */
 	List	   *indextlist;		/* TargetEntry list describing index's cols */
 	ScanDirection indexorderdir;	/* forward or backward or don't care */
+	int			indexskipprefixsize;	/* the size of the prefix for distinct
+										 * scans */
 } IndexOnlyScan;
 
 /* ----------------
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index cb012ba198..847f34f02b 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -50,6 +50,7 @@ extern PGDLLIMPORT int max_parallel_workers_per_gather;
 extern PGDLLIMPORT bool enable_seqscan;
 extern PGDLLIMPORT bool enable_indexscan;
 extern PGDLLIMPORT bool enable_indexonlyscan;
+extern PGDLLIMPORT bool enable_indexskipscan;
 extern PGDLLIMPORT bool enable_bitmapscan;
 extern PGDLLIMPORT bool enable_tidscan;
 extern PGDLLIMPORT bool enable_sort;
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index fd25997af5..ba3eaffd8a 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -202,6 +202,9 @@ extern UpperUniquePath *create_upper_unique_path(PlannerInfo *root,
 												 Path *subpath,
 												 int numCols,
 												 double numGroups);
+extern IndexPath *create_skipscan_unique_path(PlannerInfo *root,
+											  IndexOptInfo *index,
+											  Path *subpath);
 extern AggPath *create_agg_path(PlannerInfo *root,
 								RelOptInfo *rel,
 								Path *subpath,
diff --git a/src/test/regress/expected/select_distinct.out b/src/test/regress/expected/select_distinct.out
index f3696c6d1d..c50c6d1866 100644
--- a/src/test/regress/expected/select_distinct.out
+++ b/src/test/regress/expected/select_distinct.out
@@ -244,3 +244,624 @@ SELECT null IS NOT DISTINCT FROM null as "yes";
  t
 (1 row)
 
+-- index only skip scan
+CREATE TABLE distinct_a (a int, b int, c int);
+INSERT INTO distinct_a (
+    SELECT five, tenthous, 10 FROM
+    generate_series(1, 5) five,
+    generate_series(1, 10000) tenthous
+);
+CREATE INDEX ON distinct_a (a, b);
+CREATE INDEX ON distinct_a ((a + 1));
+ANALYZE distinct_a;
+SELECT DISTINCT a FROM distinct_a;
+ a 
+---
+ 1
+ 2
+ 3
+ 4
+ 5
+(5 rows)
+
+SELECT DISTINCT a FROM distinct_a WHERE a = 1;
+ a 
+---
+ 1
+(1 row)
+
+SELECT DISTINCT a FROM distinct_a ORDER BY a DESC;
+ a 
+---
+ 5
+ 4
+ 3
+ 2
+ 1
+(5 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT a FROM distinct_a;
+                       QUERY PLAN                       
+--------------------------------------------------------
+ Index Only Scan using distinct_a_a_b_idx on distinct_a
+   Skip scan: true
+(2 rows)
+
+-- test index skip scan with a condition on a non unique field
+SELECT DISTINCT ON (a) a, b FROM distinct_a WHERE b = 2;
+ a | b 
+---+---
+ 1 | 2
+ 2 | 2
+ 3 | 2
+ 4 | 2
+ 5 | 2
+(5 rows)
+
+-- test index skip scan backwards
+SELECT DISTINCT ON (a) a, b FROM distinct_a ORDER BY a DESC, b DESC;
+ a |   b   
+---+-------
+ 5 | 10000
+ 4 | 10000
+ 3 | 10000
+ 2 | 10000
+ 1 | 10000
+(5 rows)
+
+-- test index skip scan for expressions
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT (a + 1) FROM distinct_a;
+                     QUERY PLAN                     
+----------------------------------------------------
+ Index Scan using distinct_a_expr_idx on distinct_a
+   Skip scan: true
+(2 rows)
+
+SELECT DISTINCT (a + 1) FROM distinct_a;
+ ?column? 
+----------
+        2
+        3
+        4
+        5
+        6
+(5 rows)
+
+-- check colums order
+CREATE INDEX distinct_a_b_a on distinct_a (b, a);
+SELECT DISTINCT a FROM distinct_a WHERE b = 2;
+ a 
+---
+ 1
+ 2
+ 3
+ 4
+ 5
+(5 rows)
+
+SELECT DISTINCT on (a, b) a, b FROM distinct_a WHERE b = 2;
+ a | b 
+---+---
+ 1 | 2
+ 2 | 2
+ 3 | 2
+ 4 | 2
+ 5 | 2
+(5 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT a FROM distinct_a WHERE b = 2;
+                     QUERY PLAN                     
+----------------------------------------------------
+ Index Only Scan using distinct_a_b_a on distinct_a
+   Skip scan: true
+   Index Cond: (b = 2)
+(3 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT on (a, b) a, b FROM distinct_a WHERE b = 2;
+                     QUERY PLAN                     
+----------------------------------------------------
+ Index Only Scan using distinct_a_b_a on distinct_a
+   Skip scan: true
+   Index Cond: (b = 2)
+(3 rows)
+
+DROP INDEX distinct_a_b_a;
+-- test opposite scan/index directions inside a cursor
+-- forward/backward
+BEGIN;
+DECLARE c SCROLL CURSOR FOR
+SELECT DISTINCT ON (a) a,b FROM distinct_a ORDER BY a, b;
+FETCH FROM c;
+ a | b 
+---+---
+ 1 | 1
+(1 row)
+
+FETCH BACKWARD FROM c;
+ a | b 
+---+---
+(0 rows)
+
+FETCH 6 FROM c;
+ a | b 
+---+---
+ 1 | 1
+ 2 | 1
+ 3 | 1
+ 4 | 1
+ 5 | 1
+(5 rows)
+
+FETCH BACKWARD 6 FROM c;
+ a | b 
+---+---
+ 5 | 1
+ 4 | 1
+ 3 | 1
+ 2 | 1
+ 1 | 1
+(5 rows)
+
+FETCH 6 FROM c;
+ a | b 
+---+---
+ 1 | 1
+ 2 | 1
+ 3 | 1
+ 4 | 1
+ 5 | 1
+(5 rows)
+
+FETCH BACKWARD 6 FROM c;
+ a | b 
+---+---
+ 5 | 1
+ 4 | 1
+ 3 | 1
+ 2 | 1
+ 1 | 1
+(5 rows)
+
+END;
+-- backward/forward
+BEGIN;
+DECLARE c SCROLL CURSOR FOR
+SELECT DISTINCT ON (a) a,b FROM distinct_a ORDER BY a DESC, b DESC;
+FETCH FROM c;
+ a |   b   
+---+-------
+ 5 | 10000
+(1 row)
+
+FETCH BACKWARD FROM c;
+ a | b 
+---+---
+(0 rows)
+
+FETCH 6 FROM c;
+ a |   b   
+---+-------
+ 5 | 10000
+ 4 | 10000
+ 3 | 10000
+ 2 | 10000
+ 1 | 10000
+(5 rows)
+
+FETCH BACKWARD 6 FROM c;
+ a |   b   
+---+-------
+ 1 | 10000
+ 2 | 10000
+ 3 | 10000
+ 4 | 10000
+ 5 | 10000
+(5 rows)
+
+FETCH 6 FROM c;
+ a |   b   
+---+-------
+ 5 | 10000
+ 4 | 10000
+ 3 | 10000
+ 2 | 10000
+ 1 | 10000
+(5 rows)
+
+FETCH BACKWARD 6 FROM c;
+ a |   b   
+---+-------
+ 1 | 10000
+ 2 | 10000
+ 3 | 10000
+ 4 | 10000
+ 5 | 10000
+(5 rows)
+
+END;
+-- test missing values and skipping from the end
+CREATE TABLE distinct_abc(a int, b int, c int);
+CREATE INDEX ON distinct_abc(a, b, c);
+INSERT INTO distinct_abc
+	VALUES (1, 1, 1),
+		   (1, 1, 2),
+		   (1, 2, 2),
+		   (1, 2, 3),
+		   (2, 2, 1),
+		   (2, 2, 3),
+		   (3, 1, 1),
+		   (3, 1, 2),
+		   (3, 2, 2),
+		   (3, 2, 3);
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a,b,c FROM distinct_abc WHERE c = 2;
+                          QUERY PLAN                          
+--------------------------------------------------------------
+ Index Only Scan using distinct_abc_a_b_c_idx on distinct_abc
+   Skip scan: true
+   Index Cond: (c = 2)
+(3 rows)
+
+BEGIN;
+DECLARE c SCROLL CURSOR FOR
+SELECT DISTINCT ON (a) a,b,c FROM distinct_abc WHERE c = 2;
+FETCH ALL FROM c;
+ a | b | c 
+---+---+---
+ 1 | 1 | 2
+ 3 | 1 | 2
+(2 rows)
+
+FETCH BACKWARD ALL FROM c;
+ a | b | c 
+---+---+---
+ 3 | 1 | 2
+ 1 | 1 | 2
+(2 rows)
+
+END;
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a,b,c FROM distinct_abc WHERE c = 2
+ORDER BY a DESC, b DESC;
+                              QUERY PLAN                               
+-----------------------------------------------------------------------
+ Index Only Scan Backward using distinct_abc_a_b_c_idx on distinct_abc
+   Skip scan: true
+   Index Cond: (c = 2)
+(3 rows)
+
+BEGIN;
+DECLARE c SCROLL CURSOR FOR
+SELECT DISTINCT ON (a) a,b,c FROM distinct_abc WHERE c = 2
+ORDER BY a DESC, b DESC;
+FETCH ALL FROM c;
+ a | b | c 
+---+---+---
+ 3 | 2 | 2
+ 1 | 2 | 2
+(2 rows)
+
+FETCH BACKWARD ALL FROM c;
+ a | b | c 
+---+---+---
+ 1 | 2 | 2
+ 3 | 2 | 2
+(2 rows)
+
+END;
+DROP TABLE distinct_abc;
+-- index skip scan
+SELECT DISTINCT ON (a) a, b, c
+FROM distinct_a ORDER BY a;
+ a | b | c  
+---+---+----
+ 1 | 1 | 10
+ 2 | 1 | 10
+ 3 | 1 | 10
+ 4 | 1 | 10
+ 5 | 1 | 10
+(5 rows)
+
+SELECT DISTINCT ON (a) a, b, c
+FROM distinct_a WHERE a = 1 ORDER BY a;
+ a | b | c  
+---+---+----
+ 1 | 1 | 10
+(1 row)
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a, b, c
+FROM distinct_a ORDER BY a;
+                    QUERY PLAN                     
+---------------------------------------------------
+ Index Scan using distinct_a_a_b_idx on distinct_a
+   Skip scan: true
+(2 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a, b, c
+FROM distinct_a WHERE a = 1 ORDER BY a;
+                     QUERY PLAN                      
+-----------------------------------------------------
+ Unique
+   ->  Bitmap Heap Scan on distinct_a
+         Recheck Cond: (a = 1)
+         ->  Bitmap Index Scan on distinct_a_a_b_idx
+               Index Cond: (a = 1)
+(5 rows)
+
+-- check colums order
+SELECT DISTINCT a FROM distinct_a WHERE b = 2 AND c = 10;
+ a 
+---
+ 1
+ 2
+ 3
+ 4
+ 5
+(5 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT a FROM distinct_a WHERE b = 2 AND c = 10;
+                       QUERY PLAN                        
+---------------------------------------------------------
+ Unique
+   ->  Index Scan using distinct_a_a_b_idx on distinct_a
+         Index Cond: (b = 2)
+         Filter: (c = 10)
+(4 rows)
+
+-- check projection case
+SELECT DISTINCT a, a FROM distinct_a WHERE b = 2;
+ a | a 
+---+---
+ 1 | 1
+ 2 | 2
+ 3 | 3
+ 4 | 4
+ 5 | 5
+(5 rows)
+
+SELECT DISTINCT a, 1 FROM distinct_a WHERE b = 2;
+ a | ?column? 
+---+----------
+ 1 |        1
+ 2 |        1
+ 3 |        1
+ 4 |        1
+ 5 |        1
+(5 rows)
+
+-- test cursor forward/backward movements
+BEGIN;
+DECLARE c SCROLL CURSOR FOR SELECT DISTINCT a FROM distinct_a;
+FETCH FROM c;
+ a 
+---
+ 1
+(1 row)
+
+FETCH BACKWARD FROM c;
+ a 
+---
+(0 rows)
+
+FETCH 6 FROM c;
+ a 
+---
+ 1
+ 2
+ 3
+ 4
+ 5
+(5 rows)
+
+FETCH BACKWARD 6 FROM c;
+ a 
+---
+ 5
+ 4
+ 3
+ 2
+ 1
+(5 rows)
+
+FETCH 6 FROM c;
+ a 
+---
+ 1
+ 2
+ 3
+ 4
+ 5
+(5 rows)
+
+FETCH BACKWARD 6 FROM c;
+ a 
+---
+ 5
+ 4
+ 3
+ 2
+ 1
+(5 rows)
+
+END;
+DROP TABLE distinct_a;
+-- test tuples visibility
+CREATE TABLE distinct_visibility (a int, b int);
+INSERT INTO distinct_visibility (select a, b from generate_series(1,5) a, generate_series(1, 10000) b);
+CREATE INDEX ON distinct_visibility (a, b);
+ANALYZE distinct_visibility;
+SELECT DISTINCT ON (a) a, b FROM distinct_visibility ORDER BY a, b;
+ a | b 
+---+---
+ 1 | 1
+ 2 | 1
+ 3 | 1
+ 4 | 1
+ 5 | 1
+(5 rows)
+
+DELETE FROM distinct_visibility WHERE a = 2 and b = 1;
+SELECT DISTINCT ON (a) a, b FROM distinct_visibility ORDER BY a, b;
+ a | b 
+---+---
+ 1 | 1
+ 2 | 2
+ 3 | 1
+ 4 | 1
+ 5 | 1
+(5 rows)
+
+SELECT DISTINCT ON (a) a, b FROM distinct_visibility ORDER BY a DESC, b DESC;
+ a |   b   
+---+-------
+ 5 | 10000
+ 4 | 10000
+ 3 | 10000
+ 2 | 10000
+ 1 | 10000
+(5 rows)
+
+DELETE FROM distinct_visibility WHERE a = 2 and b = 10000;
+SELECT DISTINCT ON (a) a, b FROM distinct_visibility ORDER BY a DESC, b DESC;
+ a |   b   
+---+-------
+ 5 | 10000
+ 4 | 10000
+ 3 | 10000
+ 2 |  9999
+ 1 | 10000
+(5 rows)
+
+DROP TABLE distinct_visibility;
+-- test page boundaries
+CREATE TABLE distinct_boundaries AS
+    SELECT a, b::int2 b, (b % 2)::int2 c FROM
+        generate_series(1, 5) a,
+        generate_series(1,366) b;
+CREATE INDEX ON distinct_boundaries (a, b, c);
+ANALYZE distinct_boundaries;
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a, b, c from distinct_boundaries
+WHERE b >= 1 and c = 0 ORDER BY a, b;
+                                 QUERY PLAN                                 
+----------------------------------------------------------------------------
+ Index Only Scan using distinct_boundaries_a_b_c_idx on distinct_boundaries
+   Skip scan: true
+   Index Cond: ((b >= 1) AND (c = 0))
+(3 rows)
+
+SELECT DISTINCT ON (a) a, b, c from distinct_boundaries
+WHERE b >= 1 and c = 0 ORDER BY a, b;
+ a | b | c 
+---+---+---
+ 1 | 2 | 0
+ 2 | 2 | 0
+ 3 | 2 | 0
+ 4 | 2 | 0
+ 5 | 2 | 0
+(5 rows)
+
+DROP TABLE distinct_boundaries;
+-- test tuple killing
+-- DESC ordering
+CREATE TABLE distinct_killed AS
+    SELECT a, b, b % 2 AS c, 10 AS d
+        FROM generate_series(1, 5) a,
+             generate_series(1,1000) b;
+CREATE INDEX ON distinct_killed (a, b, c, d);
+DELETE FROM distinct_killed where a = 3;
+BEGIN;
+    DECLARE c SCROLL CURSOR FOR
+    SELECT DISTINCT ON (a) a,b,c,d
+    FROM distinct_killed ORDER BY a DESC, b DESC;
+    FETCH FORWARD ALL FROM c;
+ a |  b   | c | d  
+---+------+---+----
+ 5 | 1000 | 0 | 10
+ 4 | 1000 | 0 | 10
+ 2 | 1000 | 0 | 10
+ 1 | 1000 | 0 | 10
+(4 rows)
+
+    FETCH BACKWARD ALL FROM c;
+ a |  b   | c | d  
+---+------+---+----
+ 1 | 1000 | 0 | 10
+ 2 | 1000 | 0 | 10
+ 4 | 1000 | 0 | 10
+ 5 | 1000 | 0 | 10
+(4 rows)
+
+COMMIT;
+DROP TABLE distinct_killed;
+-- regular ordering
+CREATE TABLE distinct_killed AS
+    SELECT a, b, b % 2 AS c, 10 AS d
+        FROM generate_series(1, 5) a,
+             generate_series(1,1000) b;
+CREATE INDEX ON distinct_killed (a, b, c, d);
+DELETE FROM distinct_killed where a = 3;
+BEGIN;
+    DECLARE c SCROLL CURSOR FOR
+    SELECT DISTINCT ON (a) a,b,c,d
+    FROM distinct_killed ORDER BY a, b;
+    FETCH FORWARD ALL FROM c;
+ a | b | c | d  
+---+---+---+----
+ 1 | 1 | 1 | 10
+ 2 | 1 | 1 | 10
+ 4 | 1 | 1 | 10
+ 5 | 1 | 1 | 10
+(4 rows)
+
+    FETCH BACKWARD ALL FROM c;
+ a | b | c | d  
+---+---+---+----
+ 5 | 1 | 1 | 10
+ 4 | 1 | 1 | 10
+ 2 | 1 | 1 | 10
+ 1 | 1 | 1 | 10
+(4 rows)
+
+COMMIT;
+DROP TABLE distinct_killed;
+-- partial delete
+CREATE TABLE distinct_killed AS
+    SELECT a, b, b % 2 AS c, 10 AS d
+        FROM generate_series(1, 5) a,
+             generate_series(1,1000) b;
+CREATE INDEX ON distinct_killed (a, b, c, d);
+DELETE FROM distinct_killed WHERE a = 3 AND b <= 999;
+BEGIN;
+    DECLARE c SCROLL CURSOR FOR
+    SELECT DISTINCT ON (a) a,b,c,d
+    FROM distinct_killed ORDER BY a DESC, b DESC;
+    FETCH FORWARD ALL FROM c;
+ a |  b   | c | d  
+---+------+---+----
+ 5 | 1000 | 0 | 10
+ 4 | 1000 | 0 | 10
+ 3 | 1000 | 0 | 10
+ 2 | 1000 | 0 | 10
+ 1 | 1000 | 0 | 10
+(5 rows)
+
+    FETCH BACKWARD ALL FROM c;
+ a |  b   | c | d  
+---+------+---+----
+ 1 | 1000 | 0 | 10
+ 2 | 1000 | 0 | 10
+ 3 | 1000 | 0 | 10
+ 4 | 1000 | 0 | 10
+ 5 | 1000 | 0 | 10
+(5 rows)
+
+COMMIT;
+DROP TABLE distinct_killed;
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index a1c90eb905..bd3b373515 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -78,6 +78,7 @@ select name, setting from pg_settings where name like 'enable%';
  enable_hashjoin                | on
  enable_indexonlyscan           | on
  enable_indexscan               | on
+ enable_indexskipscan           | on
  enable_material                | on
  enable_mergejoin               | on
  enable_nestloop                | on
@@ -89,7 +90,7 @@ select name, setting from pg_settings where name like 'enable%';
  enable_seqscan                 | on
  enable_sort                    | on
  enable_tidscan                 | on
-(17 rows)
+(18 rows)
 
 -- Test that the pg_timezone_names and pg_timezone_abbrevs views are
 -- more-or-less working.  We can't test their contents in any great detail
diff --git a/src/test/regress/sql/select_distinct.sql b/src/test/regress/sql/select_distinct.sql
index a605e86449..3441a0efc6 100644
--- a/src/test/regress/sql/select_distinct.sql
+++ b/src/test/regress/sql/select_distinct.sql
@@ -73,3 +73,257 @@ SELECT 1 IS NOT DISTINCT FROM 2 as "no";
 SELECT 2 IS NOT DISTINCT FROM 2 as "yes";
 SELECT 2 IS NOT DISTINCT FROM null as "no";
 SELECT null IS NOT DISTINCT FROM null as "yes";
+
+-- index only skip scan
+CREATE TABLE distinct_a (a int, b int, c int);
+INSERT INTO distinct_a (
+    SELECT five, tenthous, 10 FROM
+    generate_series(1, 5) five,
+    generate_series(1, 10000) tenthous
+);
+CREATE INDEX ON distinct_a (a, b);
+CREATE INDEX ON distinct_a ((a + 1));
+ANALYZE distinct_a;
+
+SELECT DISTINCT a FROM distinct_a;
+SELECT DISTINCT a FROM distinct_a WHERE a = 1;
+SELECT DISTINCT a FROM distinct_a ORDER BY a DESC;
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT a FROM distinct_a;
+
+-- test index skip scan with a condition on a non unique field
+SELECT DISTINCT ON (a) a, b FROM distinct_a WHERE b = 2;
+
+-- test index skip scan backwards
+SELECT DISTINCT ON (a) a, b FROM distinct_a ORDER BY a DESC, b DESC;
+
+-- test index skip scan for expressions
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT (a + 1) FROM distinct_a;
+SELECT DISTINCT (a + 1) FROM distinct_a;
+
+-- check colums order
+CREATE INDEX distinct_a_b_a on distinct_a (b, a);
+
+SELECT DISTINCT a FROM distinct_a WHERE b = 2;
+SELECT DISTINCT on (a, b) a, b FROM distinct_a WHERE b = 2;
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT a FROM distinct_a WHERE b = 2;
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT on (a, b) a, b FROM distinct_a WHERE b = 2;
+
+DROP INDEX distinct_a_b_a;
+
+-- test opposite scan/index directions inside a cursor
+-- forward/backward
+BEGIN;
+DECLARE c SCROLL CURSOR FOR
+SELECT DISTINCT ON (a) a,b FROM distinct_a ORDER BY a, b;
+
+FETCH FROM c;
+FETCH BACKWARD FROM c;
+
+FETCH 6 FROM c;
+FETCH BACKWARD 6 FROM c;
+
+FETCH 6 FROM c;
+FETCH BACKWARD 6 FROM c;
+
+END;
+
+-- backward/forward
+BEGIN;
+DECLARE c SCROLL CURSOR FOR
+SELECT DISTINCT ON (a) a,b FROM distinct_a ORDER BY a DESC, b DESC;
+
+FETCH FROM c;
+FETCH BACKWARD FROM c;
+
+FETCH 6 FROM c;
+FETCH BACKWARD 6 FROM c;
+
+FETCH 6 FROM c;
+FETCH BACKWARD 6 FROM c;
+
+END;
+
+-- test missing values and skipping from the end
+CREATE TABLE distinct_abc(a int, b int, c int);
+CREATE INDEX ON distinct_abc(a, b, c);
+INSERT INTO distinct_abc
+	VALUES (1, 1, 1),
+		   (1, 1, 2),
+		   (1, 2, 2),
+		   (1, 2, 3),
+		   (2, 2, 1),
+		   (2, 2, 3),
+		   (3, 1, 1),
+		   (3, 1, 2),
+		   (3, 2, 2),
+		   (3, 2, 3);
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a,b,c FROM distinct_abc WHERE c = 2;
+
+BEGIN;
+DECLARE c SCROLL CURSOR FOR
+SELECT DISTINCT ON (a) a,b,c FROM distinct_abc WHERE c = 2;
+
+FETCH ALL FROM c;
+FETCH BACKWARD ALL FROM c;
+
+END;
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a,b,c FROM distinct_abc WHERE c = 2
+ORDER BY a DESC, b DESC;
+
+BEGIN;
+DECLARE c SCROLL CURSOR FOR
+SELECT DISTINCT ON (a) a,b,c FROM distinct_abc WHERE c = 2
+ORDER BY a DESC, b DESC;
+
+FETCH ALL FROM c;
+FETCH BACKWARD ALL FROM c;
+
+END;
+
+DROP TABLE distinct_abc;
+
+-- index skip scan
+SELECT DISTINCT ON (a) a, b, c
+FROM distinct_a ORDER BY a;
+SELECT DISTINCT ON (a) a, b, c
+FROM distinct_a WHERE a = 1 ORDER BY a;
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a, b, c
+FROM distinct_a ORDER BY a;
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a, b, c
+FROM distinct_a WHERE a = 1 ORDER BY a;
+
+-- check colums order
+SELECT DISTINCT a FROM distinct_a WHERE b = 2 AND c = 10;
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT a FROM distinct_a WHERE b = 2 AND c = 10;
+
+-- check projection case
+SELECT DISTINCT a, a FROM distinct_a WHERE b = 2;
+SELECT DISTINCT a, 1 FROM distinct_a WHERE b = 2;
+
+-- test cursor forward/backward movements
+BEGIN;
+DECLARE c SCROLL CURSOR FOR SELECT DISTINCT a FROM distinct_a;
+
+FETCH FROM c;
+FETCH BACKWARD FROM c;
+
+FETCH 6 FROM c;
+FETCH BACKWARD 6 FROM c;
+
+FETCH 6 FROM c;
+FETCH BACKWARD 6 FROM c;
+
+END;
+
+DROP TABLE distinct_a;
+
+-- test tuples visibility
+CREATE TABLE distinct_visibility (a int, b int);
+INSERT INTO distinct_visibility (select a, b from generate_series(1,5) a, generate_series(1, 10000) b);
+CREATE INDEX ON distinct_visibility (a, b);
+ANALYZE distinct_visibility;
+
+SELECT DISTINCT ON (a) a, b FROM distinct_visibility ORDER BY a, b;
+DELETE FROM distinct_visibility WHERE a = 2 and b = 1;
+SELECT DISTINCT ON (a) a, b FROM distinct_visibility ORDER BY a, b;
+
+SELECT DISTINCT ON (a) a, b FROM distinct_visibility ORDER BY a DESC, b DESC;
+DELETE FROM distinct_visibility WHERE a = 2 and b = 10000;
+SELECT DISTINCT ON (a) a, b FROM distinct_visibility ORDER BY a DESC, b DESC;
+DROP TABLE distinct_visibility;
+
+-- test page boundaries
+CREATE TABLE distinct_boundaries AS
+    SELECT a, b::int2 b, (b % 2)::int2 c FROM
+        generate_series(1, 5) a,
+        generate_series(1,366) b;
+
+CREATE INDEX ON distinct_boundaries (a, b, c);
+ANALYZE distinct_boundaries;
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a, b, c from distinct_boundaries
+WHERE b >= 1 and c = 0 ORDER BY a, b;
+
+SELECT DISTINCT ON (a) a, b, c from distinct_boundaries
+WHERE b >= 1 and c = 0 ORDER BY a, b;
+
+DROP TABLE distinct_boundaries;
+
+-- test tuple killing
+
+-- DESC ordering
+CREATE TABLE distinct_killed AS
+    SELECT a, b, b % 2 AS c, 10 AS d
+        FROM generate_series(1, 5) a,
+             generate_series(1,1000) b;
+
+CREATE INDEX ON distinct_killed (a, b, c, d);
+
+DELETE FROM distinct_killed where a = 3;
+
+BEGIN;
+    DECLARE c SCROLL CURSOR FOR
+    SELECT DISTINCT ON (a) a,b,c,d
+    FROM distinct_killed ORDER BY a DESC, b DESC;
+    FETCH FORWARD ALL FROM c;
+    FETCH BACKWARD ALL FROM c;
+COMMIT;
+
+DROP TABLE distinct_killed;
+
+-- regular ordering
+CREATE TABLE distinct_killed AS
+    SELECT a, b, b % 2 AS c, 10 AS d
+        FROM generate_series(1, 5) a,
+             generate_series(1,1000) b;
+
+CREATE INDEX ON distinct_killed (a, b, c, d);
+
+DELETE FROM distinct_killed where a = 3;
+
+BEGIN;
+    DECLARE c SCROLL CURSOR FOR
+    SELECT DISTINCT ON (a) a,b,c,d
+    FROM distinct_killed ORDER BY a, b;
+    FETCH FORWARD ALL FROM c;
+    FETCH BACKWARD ALL FROM c;
+COMMIT;
+
+DROP TABLE distinct_killed;
+
+-- partial delete
+CREATE TABLE distinct_killed AS
+    SELECT a, b, b % 2 AS c, 10 AS d
+        FROM generate_series(1, 5) a,
+             generate_series(1,1000) b;
+
+CREATE INDEX ON distinct_killed (a, b, c, d);
+
+DELETE FROM distinct_killed WHERE a = 3 AND b <= 999;
+
+BEGIN;
+    DECLARE c SCROLL CURSOR FOR
+    SELECT DISTINCT ON (a) a,b,c,d
+    FROM distinct_killed ORDER BY a DESC, b DESC;
+    FETCH FORWARD ALL FROM c;
+    FETCH BACKWARD ALL FROM c;
+COMMIT;
+
+DROP TABLE distinct_killed;
-- 
2.21.0
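
For readers following the thread, the `amskip` callback declared in amapi.h above is described as "skip past duplicates in a given prefix". Here is a minimal sketch of that idea over a sorted in-memory array. It is plain C and not PostgreSQL code: the `Entry` type and the functions `skip_past_prefix` and `distinct_on_a` are hypothetical names made up for illustration, and the real `_bt_skip` works on B-tree pages rather than an array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical two-column index entry; not a PostgreSQL type. */
typedef struct
{
    int a;
    int b;
} Entry;

/*
 * Binary-search entries[lo..n) for the first entry whose leading column
 * is greater than `a`. This stands in for "skip past duplicates in a
 * given prefix" with a prefix of one column.
 */
static size_t
skip_past_prefix(const Entry *entries, size_t lo, size_t n, int a)
{
    size_t hi = n;

    while (lo < hi)
    {
        size_t mid = lo + (hi - lo) / 2;

        if (entries[mid].a <= a)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/*
 * Emulate SELECT DISTINCT ON (a) a, b ... ORDER BY a, b over entries
 * already sorted by (a, b): emit the first entry of each group, then
 * jump straight to the next group instead of reading every duplicate.
 * Returns the number of entries written to out (capacity out_cap).
 */
static size_t
distinct_on_a(const Entry *entries, size_t n, Entry *out, size_t out_cap)
{
    size_t pos = 0;
    size_t count = 0;

    assert(entries != NULL || n == 0);
    while (pos < n && count < out_cap)
    {
        out[count++] = entries[pos];
        pos = skip_past_prefix(entries, pos + 1, n, entries[pos].a);
    }
    return count;
}
```

Calling `distinct_on_a` on entries sorted by (a, b) yields one entry per distinct `a`, and the number of comparisons grows with the number of groups times log(n) rather than with the number of rows. That is the reason the skip scan plans in the expected output above do not need a Unique node.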

#59

Dmitry Dolgov

9erthalion6@gmail.com

almost 6 years ago

In reply to: Andy Fan (#52)

Re: Index Skip Scan

On Wed, Mar 11, 2020 at 06:56:09PM +0800, Andy Fan wrote:

There was a dedicated thread [1] where David explained his idea in
great detail, and you can also check the messages around that message
for context. Hope it helps.

Thanks for pointing out this thread! Somehow I missed it, and now it
looks like we need to make some effort to align the patches for index
skip scan and distinctClause elimination.

#60

Andy Fan

zhihui.fan1213@gmail.com

almost 6 years ago

In reply to: Dmitry Dolgov (#59)

Re: Index Skip Scan

On Wed, Mar 25, 2020 at 12:41 AM Dmitry Dolgov <9erthalion6@gmail.com>
wrote:

On Wed, Mar 11, 2020 at 06:56:09PM +0800, Andy Fan wrote:

There was a dedicated thread [1] where David explained his idea in
great detail, and you can also check the messages around that message
for context. Hope it helps.

Thanks for pointing out this thread! Somehow I missed it, and now it
looks like we need to make some effort to align the patches for index
skip scan and distinctClause elimination.

Yes :). It looks like index skip scan is a way to produce a distinct
result without a real distinct node. It happens after the UniqueKeys
check, where I try to see whether the result is already unique, and
before the place where we create a unique node for the distinct clause
(with index skip scan we don't need that node at all). Currently in my
patch the logic here is: 1) check the UniqueKeys to see whether the
result is already unique; if not, go on to 2) after the distinct paths
are created, add the result of the distinct path as a unique key. Will
you add the index skip scan path during create_distinct_paths and add
the UniqueKey to RelOptInfo? If so, I guess my current patch can handle
it, since it cares about the result of the distinct path, not about how
we achieve it.
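
The two-step logic described above can be sketched roughly as follows. This is only a toy model in plain C, not code from either patch: `ColSet`, `UniqueKeyList`, `is_unique_on` and `plan_distinct` are hypothetical names, and column sets are represented as bitmasks purely for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: a set of column numbers 0..31 stored as a bitmask. */
typedef uint32_t ColSet;

#define MAX_KEYS 8

/* Hypothetical stand-in for the unique keys known for a relation. */
typedef struct
{
    ColSet keys[MAX_KEYS];
    size_t nkeys;
} UniqueKeyList;

/* Rows are already distinct on `cols` if some unique key is a subset of it. */
static bool
is_unique_on(const UniqueKeyList *uk, ColSet cols)
{
    for (size_t i = 0; i < uk->nkeys; i++)
    {
        if (uk->keys[i] != 0 && (uk->keys[i] & ~cols) == 0)
            return true;
    }
    return false;
}

/*
 * Step 1: if the input is already unique on the distinct columns, no
 * distinct step is needed.
 * Step 2: otherwise a distinct step is needed, and its output is unique
 * on the distinct columns, so record them as a new unique key.
 * Returns true when a distinct step is needed.
 */
static bool
plan_distinct(UniqueKeyList *uk, ColSet distinct_cols)
{
    assert(distinct_cols != 0);
    if (is_unique_on(uk, distinct_cols))
        return false;
    if (uk->nkeys < MAX_KEYS)
        uk->keys[uk->nkeys++] = distinct_cols;
    return true;
}
```

In this model it does not matter how the distinct result in step 2 is produced (a Unique node, hashing, or a skip scan); only the recorded unique key is visible to later planning steps.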

Best Regards
Andy Fan

#61

Dilip Kumar

dilipbalaut@gmail.com

almost 6 years ago

In reply to: Dmitry Dolgov (#58)

Re: Index Skip Scan

On Tue, Mar 24, 2020 at 10:08 PM Dmitry Dolgov <9erthalion6@gmail.com> wrote:

On Wed, Mar 11, 2020 at 11:17:51AM +1300, David Rowley wrote:

Yes, I was complaining that a ProjectionPath breaks the optimisation
and I don't believe there's any reason that it should.

I believe the way to make that work correctly requires paying
attention to the Path's uniquekeys rather than what type of path it
is.

Thanks for the suggestion. As a result of the discussion I've modified
the patch; does it look similar to what you had in mind?

In this version, if all conditions are met and there are corresponding
unique keys, a new index skip scan path will be added to
unique_pathlists. If the requested distinct clauses match the unique
keys, create_distinct_paths can choose this path without needing to
know what kind of path it is. Also, unique_keys are passed through
ProjectionPath, so the optimization for the example mentioned earlier
in this thread should now work (I've added one test for that).
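
As an illustration of choosing a path by its unique keys alone, here is a toy sketch in plain C. It is not taken from the patch: `ToyPath`, `PathKind`, `ColSet` and `choose_unique_path` are hypothetical names, unique keys are modelled as a single column bitmask, and the cost comparison is simplified to one number.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: a set of column numbers 0..31 stored as a bitmask. */
typedef uint32_t ColSet;

typedef enum
{
    PATH_SEQSCAN,
    PATH_INDEXSCAN,
    PATH_SKIPSCAN,
    PATH_PROJECTION
} PathKind;

typedef struct
{
    PathKind kind;          /* deliberately never read by the chooser */
    ColSet   uniquekeys;    /* 0 means no uniqueness guarantee */
    double   total_cost;
} ToyPath;

/*
 * Return the cheapest path whose unique keys match the requested distinct
 * columns, or NULL if there is none. The decision uses only uniquekeys
 * and cost, so a projection on top of a skip scan qualifies as long as
 * the unique keys were passed through.
 */
static const ToyPath *
choose_unique_path(const ToyPath *paths, size_t npaths, ColSet distinct_cols)
{
    const ToyPath *best = NULL;

    assert(distinct_cols != 0);
    assert(paths != NULL || npaths == 0);
    for (size_t i = 0; i < npaths; i++)
    {
        if (paths[i].uniquekeys != distinct_cols)
            continue;
        if (best == NULL || paths[i].total_cost < best->total_cost)
            best = &paths[i];
    }
    return best;
}
```

When no path carries matching unique keys the function returns NULL, which in this model corresponds to falling back to an explicit Unique or hash-based distinct step.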

I haven't changed anything about the UniqueKey structure itself (one of
the suggestions was to use Expr instead of EquivalenceClass), but I
believe we need to figure out anyway how the two existing
implementations of this idea (in this patch and from [1]) can be
connected.

[1]: /messages/by-id/CAKU4AWrwZMAL=uaFUDMf4WGOVkEL3ONbatqju9nSXTUucpp_pw@mail.gmail.com

---
src/backend/nodes/outfuncs.c | 14 ++++++
src/backend/nodes/print.c | 39 +++++++++++++++
src/backend/optimizer/path/Makefile | 3 +-
src/backend/optimizer/path/allpaths.c | 8 +++
src/backend/optimizer/path/indxpath.c | 41 ++++++++++++++++
src/backend/optimizer/path/pathkeys.c | 71 ++++++++++++++++++++++-----
src/backend/optimizer/plan/planagg.c | 1 +
src/backend/optimizer/plan/planmain.c | 1 +
src/backend/optimizer/plan/planner.c | 37 +++++++++++++-
src/backend/optimizer/util/pathnode.c | 46 +++++++++++++----
src/include/nodes/nodes.h | 1 +
src/include/nodes/pathnodes.h | 19 +++++++
src/include/nodes/print.h | 1 +
src/include/optimizer/pathnode.h | 2 +
src/include/optimizer/paths.h | 11 +++++
15 files changed, 272 insertions(+), 23 deletions(-)

Seems like you forgot to add the uniquekey.c file in the
v33-0001-Unique-key.patch.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#62

Dmitry Dolgov

9erthalion6@gmail.com

almost 6 years ago

In reply to: Dilip Kumar (#61)

2 attachment(s)

Re: Index Skip Scan

On Wed, Mar 25, 2020 at 11:31:56AM +0530, Dilip Kumar wrote:

Seems like you forgot to add the uniquekey.c file in the
v33-0001-Unique-key.patch.

Oh, you're right, thanks. Here is the corrected patch.

Attachments:

v33-0001-Unique-key.patch (text/x-diff; charset=us-ascii)

From 15989c5250214fea8606a56afd1eeaf760b8723e Mon Sep 17 00:00:00 2001
From: Dmitrii Dolgov <9erthalion6@gmail.com>
Date: Tue, 24 Mar 2020 17:04:32 +0100
Subject: [PATCH v33 1/2] Unique key

Design by David Rowley.

Author: Jesper Pedersen
---
 src/backend/nodes/outfuncs.c           |  14 +++
 src/backend/nodes/print.c              |  39 +++++++
 src/backend/optimizer/path/Makefile    |   3 +-
 src/backend/optimizer/path/allpaths.c  |   8 ++
 src/backend/optimizer/path/indxpath.c  |  41 ++++++++
 src/backend/optimizer/path/pathkeys.c  |  71 +++++++++++--
 src/backend/optimizer/path/uniquekey.c | 136 +++++++++++++++++++++++++
 src/backend/optimizer/plan/planagg.c   |   1 +
 src/backend/optimizer/plan/planmain.c  |   1 +
 src/backend/optimizer/plan/planner.c   |  37 ++++++-
 src/backend/optimizer/util/pathnode.c  |  46 +++++++--
 src/include/nodes/nodes.h              |   1 +
 src/include/nodes/pathnodes.h          |  19 ++++
 src/include/nodes/print.h              |   1 +
 src/include/optimizer/pathnode.h       |   2 +
 src/include/optimizer/paths.h          |  11 ++
 16 files changed, 408 insertions(+), 23 deletions(-)
 create mode 100644 src/backend/optimizer/path/uniquekey.c

diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index d76fae44b8..16083e7a7e 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -1723,6 +1723,7 @@ _outPathInfo(StringInfo str, const Path *node)
 	WRITE_FLOAT_FIELD(startup_cost, "%.2f");
 	WRITE_FLOAT_FIELD(total_cost, "%.2f");
 	WRITE_NODE_FIELD(pathkeys);
+	WRITE_NODE_FIELD(uniquekeys);
 }
 
 /*
@@ -2205,6 +2206,7 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
 	WRITE_NODE_FIELD(eq_classes);
 	WRITE_BOOL_FIELD(ec_merging_done);
 	WRITE_NODE_FIELD(canon_pathkeys);
+	WRITE_NODE_FIELD(canon_uniquekeys);
 	WRITE_NODE_FIELD(left_join_clauses);
 	WRITE_NODE_FIELD(right_join_clauses);
 	WRITE_NODE_FIELD(full_join_clauses);
@@ -2214,6 +2216,7 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
 	WRITE_NODE_FIELD(placeholder_list);
 	WRITE_NODE_FIELD(fkey_list);
 	WRITE_NODE_FIELD(query_pathkeys);
+	WRITE_NODE_FIELD(query_uniquekeys);
 	WRITE_NODE_FIELD(group_pathkeys);
 	WRITE_NODE_FIELD(window_pathkeys);
 	WRITE_NODE_FIELD(distinct_pathkeys);
@@ -2401,6 +2404,14 @@ _outPathKey(StringInfo str, const PathKey *node)
 	WRITE_BOOL_FIELD(pk_nulls_first);
 }
 
+static void
+_outUniqueKey(StringInfo str, const UniqueKey *node)
+{
+	WRITE_NODE_TYPE("UNIQUEKEY");
+
+	WRITE_NODE_FIELD(eq_clause);
+}
+
 static void
 _outPathTarget(StringInfo str, const PathTarget *node)
 {
@@ -4092,6 +4103,9 @@ outNode(StringInfo str, const void *obj)
 			case T_PathKey:
 				_outPathKey(str, obj);
 				break;
+			case T_UniqueKey:
+				_outUniqueKey(str, obj);
+				break;
 			case T_PathTarget:
 				_outPathTarget(str, obj);
 				break;
diff --git a/src/backend/nodes/print.c b/src/backend/nodes/print.c
index 42476724d8..d286b34544 100644
--- a/src/backend/nodes/print.c
+++ b/src/backend/nodes/print.c
@@ -459,6 +459,45 @@ print_pathkeys(const List *pathkeys, const List *rtable)
 	printf(")\n");
 }
 
+/*
+ * print_uniquekeys -
+ *	  uniquekeys list of UniqueKeys
+ */
+void
+print_uniquekeys(const List *uniquekeys, const List *rtable)
+{
+	ListCell   *l;
+
+	printf("(");
+	foreach(l, uniquekeys)
+	{
+		UniqueKey *unique_key = (UniqueKey *) lfirst(l);
+		EquivalenceClass *eclass = (EquivalenceClass *) unique_key->eq_clause;
+		ListCell   *k;
+		bool		first = true;
+
+		/* chase up */
+		while (eclass->ec_merged)
+			eclass = eclass->ec_merged;
+
+		printf("(");
+		foreach(k, eclass->ec_members)
+		{
+			EquivalenceMember *mem = (EquivalenceMember *) lfirst(k);
+
+			if (first)
+				first = false;
+			else
+				printf(", ");
+			print_expr((Node *) mem->em_expr, rtable);
+		}
+		printf(")");
+		if (lnext(uniquekeys, l))
+			printf(", ");
+	}
+	printf(")\n");
+}
+
 /*
  * print_tl
  *	  print targetlist in a more legible way.
diff --git a/src/backend/optimizer/path/Makefile b/src/backend/optimizer/path/Makefile
index 1e199ff66f..63cc1505d9 100644
--- a/src/backend/optimizer/path/Makefile
+++ b/src/backend/optimizer/path/Makefile
@@ -21,6 +21,7 @@ OBJS = \
 	joinpath.o \
 	joinrels.o \
 	pathkeys.o \
-	tidpath.o
+	tidpath.o \
+	uniquekey.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 8286d9cf34..bbc13e6141 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -3954,6 +3954,14 @@ print_path(PlannerInfo *root, Path *path, int indent)
 		print_pathkeys(path->pathkeys, root->parse->rtable);
 	}
 
+	if (path->uniquekeys)
+	{
+		for (i = 0; i < indent; i++)
+			printf("\t");
+		printf("  uniquekeys: ");
+		print_uniquekeys(path->uniquekeys, root->parse->rtable);
+	}
+
 	if (join)
 	{
 		JoinPath   *jp = (JoinPath *) path;
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 2a50272da6..363f5349f1 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -189,6 +189,7 @@ static Expr *match_clause_to_ordering_op(IndexOptInfo *index,
 static bool ec_member_matches_indexcol(PlannerInfo *root, RelOptInfo *rel,
 									   EquivalenceClass *ec, EquivalenceMember *em,
 									   void *arg);
+static List *get_uniquekeys_for_index(PlannerInfo *root, List *pathkeys);
 
 
 /*
@@ -874,6 +875,7 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
 	List	   *orderbyclausecols;
 	List	   *index_pathkeys;
 	List	   *useful_pathkeys;
+	List	   *useful_uniquekeys = NIL;
 	bool		found_lower_saop_clause;
 	bool		pathkeys_possibly_useful;
 	bool		index_is_ordered;
@@ -1036,11 +1038,15 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
 	if (index_clauses != NIL || useful_pathkeys != NIL || useful_predicate ||
 		index_only_scan)
 	{
+		if (has_useful_uniquekeys(root))
+			useful_uniquekeys = get_uniquekeys_for_index(root, useful_pathkeys);
+
 		ipath = create_index_path(root, index,
 								  index_clauses,
 								  orderbyclauses,
 								  orderbyclausecols,
 								  useful_pathkeys,
+								  useful_uniquekeys,
 								  index_is_ordered ?
 								  ForwardScanDirection :
 								  NoMovementScanDirection,
@@ -1063,6 +1069,7 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
 									  orderbyclauses,
 									  orderbyclausecols,
 									  useful_pathkeys,
+									  useful_uniquekeys,
 									  index_is_ordered ?
 									  ForwardScanDirection :
 									  NoMovementScanDirection,
@@ -1093,11 +1100,15 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
 													index_pathkeys);
 		if (useful_pathkeys != NIL)
 		{
+			if (has_useful_uniquekeys(root))
+				useful_uniquekeys = get_uniquekeys_for_index(root, useful_pathkeys);
+
 			ipath = create_index_path(root, index,
 									  index_clauses,
 									  NIL,
 									  NIL,
 									  useful_pathkeys,
+									  useful_uniquekeys,
 									  BackwardScanDirection,
 									  index_only_scan,
 									  outer_relids,
@@ -1115,6 +1126,7 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
 										  NIL,
 										  NIL,
 										  useful_pathkeys,
+										  useful_uniquekeys,
 										  BackwardScanDirection,
 										  index_only_scan,
 										  outer_relids,
@@ -3365,6 +3377,35 @@ match_clause_to_ordering_op(IndexOptInfo *index,
 	return clause;
 }
 
+/*
+ * get_uniquekeys_for_index
+ */
+static List *
+get_uniquekeys_for_index(PlannerInfo *root, List *pathkeys)
+{
+	ListCell *lc;
+
+	if (pathkeys)
+	{
+		List *uniquekeys = NIL;
+		foreach(lc, pathkeys)
+		{
+			UniqueKey *unique_key;
+			PathKey *pk = (PathKey *) lfirst(lc);
+			EquivalenceClass *ec = (EquivalenceClass *) pk->pk_eclass;
+
+			unique_key = makeNode(UniqueKey);
+			unique_key->eq_clause = ec;
+
+			uniquekeys = lappend(uniquekeys, unique_key);
+		}
+
+		if (uniquekeys_contained_in(root->canon_uniquekeys, uniquekeys))
+			return uniquekeys;
+	}
+
+	return NIL;
+}
 
 /****************************************************************************
  *				----  ROUTINES TO DO PARTIAL INDEX PREDICATE TESTS	----
diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index 71b9d42c99..054df9a617 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -29,6 +29,7 @@
 #include "utils/lsyscache.h"
 
 
+static bool pathkey_is_unique(PathKey *new_pathkey, List *pathkeys);
 static bool pathkey_is_redundant(PathKey *new_pathkey, List *pathkeys);
 static bool matches_boolean_partition_clause(RestrictInfo *rinfo,
 											 RelOptInfo *partrel,
@@ -96,6 +97,29 @@ make_canonical_pathkey(PlannerInfo *root,
 	return pk;
 }
 
+/*
+ * pathkey_is_unique
+ *	   Returns true if the new pathkey's equivalence class differs from that
+ *	   of every existing member of the pathkey list.
+ */
+static bool
+pathkey_is_unique(PathKey *new_pathkey, List *pathkeys)
+{
+	EquivalenceClass *new_ec = new_pathkey->pk_eclass;
+	ListCell   *lc;
+
+	/* If the same EC is already in the list, then not unique */
+	foreach(lc, pathkeys)
+	{
+		PathKey    *old_pathkey = (PathKey *) lfirst(lc);
+
+		if (new_ec == old_pathkey->pk_eclass)
+			return false;
+	}
+
+	return true;
+}
+
 /*
  * pathkey_is_redundant
  *	   Is a pathkey redundant with one already in the given list?
@@ -135,22 +159,12 @@ static bool
 pathkey_is_redundant(PathKey *new_pathkey, List *pathkeys)
 {
 	EquivalenceClass *new_ec = new_pathkey->pk_eclass;
-	ListCell   *lc;
 
 	/* Check for EC containing a constant --- unconditionally redundant */
 	if (EC_MUST_BE_REDUNDANT(new_ec))
 		return true;
 
-	/* If same EC already used in list, then redundant */
-	foreach(lc, pathkeys)
-	{
-		PathKey    *old_pathkey = (PathKey *) lfirst(lc);
-
-		if (new_ec == old_pathkey->pk_eclass)
-			return true;
-	}
-
-	return false;
+	return !pathkey_is_unique(new_pathkey, pathkeys);
 }
 
 /*
@@ -1098,6 +1112,41 @@ make_pathkeys_for_sortclauses(PlannerInfo *root,
 	return pathkeys;
 }
 
+/*
+ * make_pathkeys_for_uniquekeys
+ *		Generate a pathkeys list to be used for uniquekey clauses
+ */
+List *
+make_pathkeys_for_uniquekeys(PlannerInfo *root,
+							 List *sortclauses,
+							 List *tlist)
+{
+	List	   *pathkeys = NIL;
+	ListCell   *l;
+
+	foreach(l, sortclauses)
+	{
+		SortGroupClause *sortcl = (SortGroupClause *) lfirst(l);
+		Expr	   *sortkey;
+		PathKey    *pathkey;
+
+		sortkey = (Expr *) get_sortgroupclause_expr(sortcl, tlist);
+		Assert(OidIsValid(sortcl->sortop));
+		pathkey = make_pathkey_from_sortop(root,
+										   sortkey,
+										   root->nullable_baserels,
+										   sortcl->sortop,
+										   sortcl->nulls_first,
+										   sortcl->tleSortGroupRef,
+										   true);
+
+		if (pathkey_is_unique(pathkey, pathkeys))
+			pathkeys = lappend(pathkeys, pathkey);
+	}
+
+	return pathkeys;
+}
+
 /****************************************************************************
  *		PATHKEYS AND MERGECLAUSES
  ****************************************************************************/
diff --git a/src/backend/optimizer/path/uniquekey.c b/src/backend/optimizer/path/uniquekey.c
new file mode 100644
index 0000000000..c421401d0f
--- /dev/null
+++ b/src/backend/optimizer/path/uniquekey.c
@@ -0,0 +1,136 @@
+/*-------------------------------------------------------------------------
+ *
+ * uniquekey.c
+ *	  Utilities for matching and building unique keys
+ *
+ * Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ *	  src/backend/optimizer/path/uniquekey.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "optimizer/pathnode.h"
+#include "optimizer/paths.h"
+#include "nodes/pg_list.h"
+
+static UniqueKey *make_canonical_uniquekey(PlannerInfo *root, EquivalenceClass *eclass);
+
+/*
+ * Build a list of unique keys
+ */
+List*
+build_uniquekeys(PlannerInfo *root, List *sortclauses)
+{
+	List *result = NIL;
+	List *sortkeys;
+	ListCell *l;
+
+	sortkeys = make_pathkeys_for_uniquekeys(root,
+											sortclauses,
+											root->processed_tlist);
+
+	/* Create a uniquekey and add it to the list */
+	foreach(l, sortkeys)
+	{
+		PathKey    *pathkey = (PathKey *) lfirst(l);
+		EquivalenceClass *ec = pathkey->pk_eclass;
+		UniqueKey *unique_key = make_canonical_uniquekey(root, ec);
+
+		result = lappend(result, unique_key);
+	}
+
+	return result;
+}
+
+/*
+ * uniquekeys_contained_in
+ *	  Is every key in keys2 also present in keys1?
+ */
+bool
+uniquekeys_contained_in(List *keys1, List *keys2)
+{
+	ListCell   *key1,
+			   *key2;
+
+	foreach(key2, keys2)
+	{
+		bool found = false;
+		UniqueKey  *uniquekey2 = (UniqueKey *) lfirst(key2);
+
+		foreach(key1, keys1)
+		{
+			UniqueKey  *uniquekey1 = (UniqueKey *) lfirst(key1);
+
+			if (uniquekey1->eq_clause == uniquekey2->eq_clause)
+				found = true;
+		}
+
+		if (!found)
+			return false;
+	}
+
+	return true;
+}
+
+/*
+ * has_useful_uniquekeys
+ *		Detect whether the planner could have any uniquekeys that are
+ *		useful.
+ */
+bool
+has_useful_uniquekeys(PlannerInfo *root)
+{
+	if (root->query_uniquekeys != NIL)
+		return true;	/* there are some */
+	return false;		/* definitely useless */
+}
+
+/*
+ * make_canonical_uniquekey
+ *	  Given the parameters for a UniqueKey, find any pre-existing matching
+ *	  uniquekey in the query's list of "canonical" uniquekeys.  Make a new
+ *	  entry if there's not one already.
+ *
+ * Note that this function must not be used until after we have completed
+ * merging EquivalenceClasses.  (We don't try to enforce that here; instead,
+ * equivclass.c will complain if a merge occurs after root->canon_uniquekeys
+ * has become nonempty.)
+ */
+static UniqueKey *
+make_canonical_uniquekey(PlannerInfo *root,
+						 EquivalenceClass *eclass)
+{
+	UniqueKey  *uk;
+	ListCell   *lc;
+	MemoryContext oldcontext;
+
+	/* The passed eclass might be non-canonical, so chase up to the top */
+	while (eclass->ec_merged)
+		eclass = eclass->ec_merged;
+
+	foreach(lc, root->canon_uniquekeys)
+	{
+		uk = (UniqueKey *) lfirst(lc);
+		if (eclass == uk->eq_clause)
+			return uk;
+	}
+
+	/*
+	 * Be sure canonical uniquekeys are allocated in the main planning context.
+	 * Not an issue in normal planning, but it is for GEQO.
+	 */
+	oldcontext = MemoryContextSwitchTo(root->planner_cxt);
+
+	uk = makeNode(UniqueKey);
+	uk->eq_clause = eclass;
+
+	root->canon_uniquekeys = lappend(root->canon_uniquekeys, uk);
+
+	MemoryContextSwitchTo(oldcontext);
+
+	return uk;
+}
diff --git a/src/backend/optimizer/plan/planagg.c b/src/backend/optimizer/plan/planagg.c
index 8634940efc..dd64775d8f 100644
--- a/src/backend/optimizer/plan/planagg.c
+++ b/src/backend/optimizer/plan/planagg.c
@@ -511,6 +511,7 @@ minmax_qp_callback(PlannerInfo *root, void *extra)
 									  root->parse->targetList);
 
 	root->query_pathkeys = root->sort_pathkeys;
+	root->query_uniquekeys = NIL;
 }
 
 /*
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 62dfc6d44a..3a372af91b 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -70,6 +70,7 @@ query_planner(PlannerInfo *root,
 	root->join_rel_level = NULL;
 	root->join_cur_level = 0;
 	root->canon_pathkeys = NIL;
+	root->canon_uniquekeys = NIL;
 	root->left_join_clauses = NIL;
 	root->right_join_clauses = NIL;
 	root->full_join_clauses = NIL;
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index d6f2153593..a7de8476d9 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3657,15 +3657,30 @@ standard_qp_callback(PlannerInfo *root, void *extra)
 	 * much easier, since we know that the parser ensured that one is a
 	 * superset of the other.
 	 */
+	root->query_uniquekeys = NIL;
+
 	if (root->group_pathkeys)
+	{
 		root->query_pathkeys = root->group_pathkeys;
+
+		if (!root->parse->hasAggs)
+			root->query_uniquekeys = build_uniquekeys(root, qp_extra->groupClause);
+	}
 	else if (root->window_pathkeys)
 		root->query_pathkeys = root->window_pathkeys;
 	else if (list_length(root->distinct_pathkeys) >
 			 list_length(root->sort_pathkeys))
+	{
 		root->query_pathkeys = root->distinct_pathkeys;
+		root->query_uniquekeys = build_uniquekeys(root, parse->distinctClause);
+	}
 	else if (root->sort_pathkeys)
+	{
 		root->query_pathkeys = root->sort_pathkeys;
+
+		if (root->distinct_pathkeys)
+			root->query_uniquekeys = build_uniquekeys(root, parse->distinctClause);
+	}
 	else
 		root->query_pathkeys = NIL;
 }
@@ -6222,7 +6237,7 @@ plan_cluster_use_sort(Oid tableOid, Oid indexOid)
 
 	/* Estimate the cost of index scan */
 	indexScanPath = create_index_path(root, indexInfo,
-									  NIL, NIL, NIL, NIL,
+									  NIL, NIL, NIL, NIL, NIL,
 									  ForwardScanDirection, false,
 									  NULL, 1.0, false);
 
@@ -7107,6 +7122,26 @@ apply_scanjoin_target_to_paths(PlannerInfo *root,
 		}
 	}
 
+	foreach(lc, rel->unique_pathlist)
+	{
+		Path	   *subpath = (Path *) lfirst(lc);
+
+		/* Shouldn't have any parameterized paths anymore */
+		Assert(subpath->param_info == NULL);
+
+		if (tlist_same_exprs)
+			subpath->pathtarget->sortgrouprefs =
+				scanjoin_target->sortgrouprefs;
+		else
+		{
+			Path	   *newpath;
+
+			newpath = (Path *) create_projection_path(root, rel, subpath,
+													  scanjoin_target);
+			lfirst(lc) = newpath;
+		}
+	}
+
 	/*
 	 * Now, if final scan/join target contains SRFs, insert ProjectSetPath(s)
 	 * atop each existing path.  (Note that this function doesn't look at the
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index e6d08aede5..a4dfafbb59 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -361,9 +361,9 @@ set_cheapest(RelOptInfo *parent_rel)
 }
 
 /*
- * add_path
+ * add_path_to
  *	  Consider a potential implementation path for the specified parent rel,
- *	  and add it to the rel's pathlist if it is worthy of consideration.
+ *	  and add it to the specified pathlist if it is worthy of consideration.
  *	  A path is worthy if it has a better sort order (better pathkeys) or
  *	  cheaper cost (on either dimension), or generates fewer rows, than any
  *	  existing path that has the same or superset parameterization rels.
@@ -416,10 +416,10 @@ set_cheapest(RelOptInfo *parent_rel)
  * 'parent_rel' is the relation entry to which the path corresponds.
  * 'new_path' is a potential path for parent_rel.
  *
- * Returns nothing, but modifies parent_rel->pathlist.
+ * Returns modified pathlist.
  */
-void
-add_path(RelOptInfo *parent_rel, Path *new_path)
+static List *
+add_path_to(RelOptInfo *parent_rel, List *pathlist, Path *new_path)
 {
 	bool		accept_new = true;	/* unless we find a superior old path */
 	int			insert_at = 0;	/* where to insert new item */
@@ -440,7 +440,7 @@ add_path(RelOptInfo *parent_rel, Path *new_path)
 	 * for more than one old path to be tossed out because new_path dominates
 	 * it.
 	 */
-	foreach(p1, parent_rel->pathlist)
+	foreach(p1, pathlist)
 	{
 		Path	   *old_path = (Path *) lfirst(p1);
 		bool		remove_old = false; /* unless new proves superior */
@@ -584,8 +584,7 @@ add_path(RelOptInfo *parent_rel, Path *new_path)
 		 */
 		if (remove_old)
 		{
-			parent_rel->pathlist = foreach_delete_current(parent_rel->pathlist,
-														  p1);
+			pathlist = foreach_delete_current(pathlist, p1);
 
 			/*
 			 * Delete the data pointed-to by the deleted cell, if possible
@@ -612,8 +611,7 @@ add_path(RelOptInfo *parent_rel, Path *new_path)
 	if (accept_new)
 	{
 		/* Accept the new path: insert it at proper place in pathlist */
-		parent_rel->pathlist =
-			list_insert_nth(parent_rel->pathlist, insert_at, new_path);
+		pathlist = list_insert_nth(pathlist, insert_at, new_path);
 	}
 	else
 	{
@@ -621,6 +619,15 @@ add_path(RelOptInfo *parent_rel, Path *new_path)
 		if (!IsA(new_path, IndexPath))
 			pfree(new_path);
 	}
+
+	return pathlist;
+}
+
+void
+add_path(RelOptInfo *parent_rel, Path *new_path)
+{
+	parent_rel->pathlist = add_path_to(parent_rel,
+									   parent_rel->pathlist, new_path);
 }
 
 /*
@@ -915,6 +922,13 @@ add_partial_path_precheck(RelOptInfo *parent_rel, Cost total_cost,
 	return true;
 }
 
+void
+add_unique_path(RelOptInfo *parent_rel, Path *new_path)
+{
+	parent_rel->unique_pathlist = add_path_to(parent_rel,
+											  parent_rel->unique_pathlist,
+											  new_path);
+}
 
 /*****************************************************************************
  *		PATH NODE CREATION ROUTINES
@@ -940,6 +954,7 @@ create_seqscan_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->parallel_safe = rel->consider_parallel;
 	pathnode->parallel_workers = parallel_workers;
 	pathnode->pathkeys = NIL;	/* seqscan has unordered result */
+	pathnode->uniquekeys = NIL;
 
 	cost_seqscan(pathnode, root, rel, pathnode->param_info);
 
@@ -964,6 +979,7 @@ create_samplescan_path(PlannerInfo *root, RelOptInfo *rel, Relids required_outer
 	pathnode->parallel_safe = rel->consider_parallel;
 	pathnode->parallel_workers = 0;
 	pathnode->pathkeys = NIL;	/* samplescan has unordered result */
+	pathnode->uniquekeys = NIL;
 
 	cost_samplescan(pathnode, root, rel, pathnode->param_info);
 
@@ -1000,6 +1016,7 @@ create_index_path(PlannerInfo *root,
 				  List *indexorderbys,
 				  List *indexorderbycols,
 				  List *pathkeys,
+				  List *uniquekeys,
 				  ScanDirection indexscandir,
 				  bool indexonly,
 				  Relids required_outer,
@@ -1018,6 +1035,7 @@ create_index_path(PlannerInfo *root,
 	pathnode->path.parallel_safe = rel->consider_parallel;
 	pathnode->path.parallel_workers = 0;
 	pathnode->path.pathkeys = pathkeys;
+	pathnode->path.uniquekeys = uniquekeys;
 
 	pathnode->indexinfo = index;
 	pathnode->indexclauses = indexclauses;
@@ -1061,6 +1079,7 @@ create_bitmap_heap_path(PlannerInfo *root,
 	pathnode->path.parallel_safe = rel->consider_parallel;
 	pathnode->path.parallel_workers = parallel_degree;
 	pathnode->path.pathkeys = NIL;	/* always unordered */
+	pathnode->path.uniquekeys = NIL;
 
 	pathnode->bitmapqual = bitmapqual;
 
@@ -1922,6 +1941,7 @@ create_functionscan_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->parallel_safe = rel->consider_parallel;
 	pathnode->parallel_workers = 0;
 	pathnode->pathkeys = pathkeys;
+	pathnode->uniquekeys = NIL;
 
 	cost_functionscan(pathnode, root, rel, pathnode->param_info);
 
@@ -1948,6 +1968,7 @@ create_tablefuncscan_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->parallel_safe = rel->consider_parallel;
 	pathnode->parallel_workers = 0;
 	pathnode->pathkeys = NIL;	/* result is always unordered */
+	pathnode->uniquekeys = NIL;
 
 	cost_tablefuncscan(pathnode, root, rel, pathnode->param_info);
 
@@ -1974,6 +1995,7 @@ create_valuesscan_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->parallel_safe = rel->consider_parallel;
 	pathnode->parallel_workers = 0;
 	pathnode->pathkeys = NIL;	/* result is always unordered */
+	pathnode->uniquekeys = NIL;
 
 	cost_valuesscan(pathnode, root, rel, pathnode->param_info);
 
@@ -1999,6 +2021,7 @@ create_ctescan_path(PlannerInfo *root, RelOptInfo *rel, Relids required_outer)
 	pathnode->parallel_safe = rel->consider_parallel;
 	pathnode->parallel_workers = 0;
 	pathnode->pathkeys = NIL;	/* XXX for now, result is always unordered */
+	pathnode->uniquekeys = NIL;
 
 	cost_ctescan(pathnode, root, rel, pathnode->param_info);
 
@@ -2025,6 +2048,7 @@ create_namedtuplestorescan_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->parallel_safe = rel->consider_parallel;
 	pathnode->parallel_workers = 0;
 	pathnode->pathkeys = NIL;	/* result is always unordered */
+	pathnode->uniquekeys = NIL;
 
 	cost_namedtuplestorescan(pathnode, root, rel, pathnode->param_info);
 
@@ -2051,6 +2075,7 @@ create_resultscan_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->parallel_safe = rel->consider_parallel;
 	pathnode->parallel_workers = 0;
 	pathnode->pathkeys = NIL;	/* result is always unordered */
+	pathnode->uniquekeys = NIL;
 
 	cost_resultscan(pathnode, root, rel, pathnode->param_info);
 
@@ -2077,6 +2102,7 @@ create_worktablescan_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->parallel_safe = rel->consider_parallel;
 	pathnode->parallel_workers = 0;
 	pathnode->pathkeys = NIL;	/* result is always unordered */
+	pathnode->uniquekeys = NIL;
 
 	/* Cost is the same as for a regular CTE scan */
 	cost_ctescan(pathnode, root, rel, pathnode->param_info);
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index baced7eec0..a1511b46ea 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -261,6 +261,7 @@ typedef enum NodeTag
 	T_EquivalenceMember,
 	T_PathKey,
 	T_PathTarget,
+	T_UniqueKey,
 	T_RestrictInfo,
 	T_IndexClause,
 	T_PlaceHolderVar,
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 3d3be197e0..0de27f0ef3 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -269,6 +269,8 @@ struct PlannerInfo
 
 	List	   *canon_pathkeys; /* list of "canonical" PathKeys */
 
+	List	   *canon_uniquekeys; /* list of "canonical" UniqueKeys */
+
 	List	   *left_join_clauses;	/* list of RestrictInfos for mergejoinable
 									 * outer join clauses w/nonnullable var on
 									 * left */
@@ -297,6 +299,8 @@ struct PlannerInfo
 
 	List	   *query_pathkeys; /* desired pathkeys for query_planner() */
 
+	List	   *query_uniquekeys; /* unique keys used for the query */
+
 	List	   *group_pathkeys; /* groupClause pathkeys, if any */
 	List	   *window_pathkeys;	/* pathkeys of bottom window, if any */
 	List	   *distinct_pathkeys;	/* distinctClause pathkeys, if any */
@@ -657,6 +661,7 @@ typedef struct RelOptInfo
 	List	   *pathlist;		/* Path structures */
 	List	   *ppilist;		/* ParamPathInfos used in pathlist */
 	List	   *partial_pathlist;	/* partial Paths */
+	List	   *unique_pathlist;	/* unique Paths */
 	struct Path *cheapest_startup_path;
 	struct Path *cheapest_total_path;
 	struct Path *cheapest_unique_path;
@@ -1077,6 +1082,15 @@ typedef struct ParamPathInfo
 	List	   *ppi_clauses;	/* join clauses available from outer rels */
 } ParamPathInfo;
 
+/*
+ * UniqueKey
+ */
+typedef struct UniqueKey
+{
+	NodeTag		type;
+
+	EquivalenceClass *eq_clause;	/* equivalence class */
+} UniqueKey;
 
 /*
  * Type "Path" is used as-is for sequential-scan paths, as well as some other
@@ -1106,6 +1120,9 @@ typedef struct ParamPathInfo
  *
  * "pathkeys" is a List of PathKey nodes (see above), describing the sort
  * ordering of the path's output rows.
+ *
+ * "uniquekeys", if not NIL, is a list of UniqueKey nodes (see above),
+ * describing the XXX.
  */
 typedef struct Path
 {
@@ -1129,6 +1146,8 @@ typedef struct Path
 
 	List	   *pathkeys;		/* sort ordering of path's output */
 	/* pathkeys is a List of PathKey nodes; see above */
+
+	List	   *uniquekeys;	/* the unique keys, or NIL if none */
 } Path;
 
 /* Macro for extracting a path's parameterization relids; beware double eval */
diff --git a/src/include/nodes/print.h b/src/include/nodes/print.h
index 6126b491bf..006248bfb5 100644
--- a/src/include/nodes/print.h
+++ b/src/include/nodes/print.h
@@ -28,6 +28,7 @@ extern char *pretty_format_node_dump(const char *dump);
 extern void print_rt(const List *rtable);
 extern void print_expr(const Node *expr, const List *rtable);
 extern void print_pathkeys(const List *pathkeys, const List *rtable);
+extern void print_uniquekeys(const List *uniquekeys, const List *rtable);
 extern void print_tl(const List *tlist, const List *rtable);
 extern void print_slot(TupleTableSlot *slot);
 
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index e450fe112a..fd25997af5 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -34,6 +34,7 @@ extern void add_partial_path(RelOptInfo *parent_rel, Path *new_path);
 extern bool add_partial_path_precheck(RelOptInfo *parent_rel,
 									  Cost total_cost, List *pathkeys);
 
+extern void add_unique_path(RelOptInfo *parent_rel, Path *new_path);
 extern Path *create_seqscan_path(PlannerInfo *root, RelOptInfo *rel,
 								 Relids required_outer, int parallel_workers);
 extern Path *create_samplescan_path(PlannerInfo *root, RelOptInfo *rel,
@@ -44,6 +45,7 @@ extern IndexPath *create_index_path(PlannerInfo *root,
 									List *indexorderbys,
 									List *indexorderbycols,
 									List *pathkeys,
+									List *uniquekeys,
 									ScanDirection indexscandir,
 									bool indexonly,
 									Relids required_outer,
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 9ab73bd20c..5b6be383b3 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -214,6 +214,9 @@ extern List *build_join_pathkeys(PlannerInfo *root,
 extern List *make_pathkeys_for_sortclauses(PlannerInfo *root,
 										   List *sortclauses,
 										   List *tlist);
+extern List *make_pathkeys_for_uniquekeys(PlannerInfo *root,
+										  List *sortclauses,
+										  List *tlist);
 extern void initialize_mergeclause_eclasses(PlannerInfo *root,
 											RestrictInfo *restrictinfo);
 extern void update_mergeclause_eclasses(PlannerInfo *root,
@@ -240,4 +243,12 @@ extern PathKey *make_canonical_pathkey(PlannerInfo *root,
 extern void add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
 									List *live_childrels);
 
+/*
+ * uniquekey.c
+ *	  Utilities for matching and building unique keys
+ */
+extern List *build_uniquekeys(PlannerInfo *root, List *sortclauses);
+extern bool uniquekeys_contained_in(List *keys1, List *keys2);
+extern bool has_useful_uniquekeys(PlannerInfo *root);
+
 #endif							/* PATHS_H */
-- 
2.21.0
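
For readers following uniquekeys_contained_in in the patch above: its header comment describes a subset test, in which every key in keys2 must also appear in keys1, with keys compared by EquivalenceClass pointer identity. Below is a minimal standalone sketch of that test, using plain pointer arrays in place of PostgreSQL's List. The names keys_contained_in and FakeEC are hypothetical and are not part of the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for EquivalenceClass; only pointer identity matters. */
typedef struct FakeEC
{
	int			id;
} FakeEC;

/*
 * Return true if every element of keys2 is also present in keys1.
 * An empty keys2 is trivially contained.
 */
static bool
keys_contained_in(FakeEC *const *keys1, size_t n1,
				  FakeEC *const *keys2, size_t n2)
{
	for (size_t j = 0; j < n2; j++)
	{
		bool		found = false;

		for (size_t i = 0; i < n1; i++)
		{
			if (keys1[i] == keys2[j])
			{
				found = true;
				break;
			}
		}

		/* One missing key is enough to reject the whole list. */
		if (!found)
			return false;
	}

	return true;
}
```

The point of the sketch is only the control flow: the inner loop records a match and stops, and the decision to return false is taken once per key of keys2, after the inner loop has finished.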

v33-0002-Index-skip-scan.patch (text/x-diff; charset=us-ascii)

From 666a8095b700365f78de80fa0febc4a0ac24ae7a Mon Sep 17 00:00:00 2001
From: jesperpedersen <jesper.pedersen@redhat.com>
Date: Fri, 15 Nov 2019 09:46:53 -0500
Subject: [PATCH v33 2/2] Index skip scan

Implementation of Index Skip Scan (see Loose Index Scan in the wiki [1])
on top of IndexOnlyScan and IndexScan. To make it suitable both when
there is a small number of distinct values and when there is a
significant number of them, the following approach is taken: instead of
searching from the root for every value, we first search for it on the
current page, and only if it is not found there do we continue
searching from the root.
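
The approach described above can be illustrated outside of the nbtree code. The following is only a sketch under simplifying assumptions: a sorted int array stands in for the index, fixed-size slices of it stand in for leaf pages, and a binary search over the whole array stands in for a descent from the root. PAGE_SIZE and all function names are hypothetical and do not appear in the patch.

```c
#include <stddef.h>

#define PAGE_SIZE 4				/* hypothetical number of items per "page" */

/* Binary search over the whole array: first index with keys[i] > value. */
static size_t
search_from_root(const int *keys, size_t n, int value)
{
	size_t		lo = 0;
	size_t		hi = n;

	while (lo < hi)
	{
		size_t		mid = lo + (hi - lo) / 2;

		if (keys[mid] <= value)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}

/*
 * Return the index of the first key greater than keys[pos], or n if none.
 * The rest of the current page is checked first; the search from the
 * "root" is used only when the page holds no larger key.
 */
static size_t
skip_to_next_distinct(const int *keys, size_t n, size_t pos)
{
	size_t		page_end = (pos / PAGE_SIZE + 1) * PAGE_SIZE;

	if (page_end > n)
		page_end = n;

	for (size_t i = pos + 1; i < page_end; i++)
	{
		if (keys[i] > keys[pos])
			return i;
	}
	return search_from_root(keys, n, keys[pos]);
}

/* Collect the distinct keys of a sorted array; returns how many were found. */
static size_t
distinct_keys(const int *keys, size_t n, int *out)
{
	size_t		count = 0;

	for (size_t pos = 0; pos < n; pos = skip_to_next_distinct(keys, n, pos))
		out[count++] = keys[pos];
	return count;
}
```

With many distinct values the next one is usually found on the same page, so the cheap in-page scan succeeds; with few distinct values the in-page scan fails quickly and the logarithmic search skips the long run of duplicates.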

Original patch and design were proposed by Thomas Munro [2], revived and
improved by Dmitry Dolgov and Jesper Pedersen.

[1] https://wiki.postgresql.org/wiki/Loose_indexscan
[2] https://www.postgresql.org/message-id/flat/CADLWmXXbTSBxP-MzJuPAYSsL_2f0iPm5VWPbCvDbVvfX93FKkw%40mail.gmail.com

Author: Jesper Pedersen, Dmitry Dolgov
Reviewed-by: Thomas Munro, David Rowley, Floris Van Nee, Kyotaro Horiguchi, Tomas Vondra, Peter Geoghegan
---
 contrib/bloom/blutils.c                       |   1 +
 doc/src/sgml/config.sgml                      |  15 +
 doc/src/sgml/indexam.sgml                     |  63 ++
 doc/src/sgml/indices.sgml                     |  23 +
 src/backend/access/brin/brin.c                |   1 +
 src/backend/access/gin/ginutil.c              |   1 +
 src/backend/access/gist/gist.c                |   1 +
 src/backend/access/hash/hash.c                |   1 +
 src/backend/access/index/indexam.c            |  18 +
 src/backend/access/nbtree/nbtree.c            |  13 +
 src/backend/access/nbtree/nbtsearch.c         | 469 ++++++++++++-
 src/backend/access/spgist/spgutils.c          |   1 +
 src/backend/commands/explain.c                |  25 +
 src/backend/executor/nodeIndexonlyscan.c      |  97 ++-
 src/backend/executor/nodeIndexscan.c          |  56 +-
 src/backend/nodes/copyfuncs.c                 |   2 +
 src/backend/nodes/outfuncs.c                  |   2 +
 src/backend/nodes/readfuncs.c                 |   2 +
 src/backend/optimizer/path/costsize.c         |   1 +
 src/backend/optimizer/path/indxpath.c         |  37 ++
 src/backend/optimizer/plan/createplan.c       |  20 +-
 src/backend/optimizer/plan/planner.c          |  10 +-
 src/backend/optimizer/util/pathnode.c         |  68 ++
 src/backend/optimizer/util/plancat.c          |   1 +
 src/backend/utils/misc/guc.c                  |   9 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/access/amapi.h                    |   8 +
 src/include/access/genam.h                    |   2 +
 src/include/access/nbtree.h                   |   7 +
 src/include/access/sdir.h                     |   7 +
 src/include/nodes/execnodes.h                 |   6 +
 src/include/nodes/pathnodes.h                 |   5 +
 src/include/nodes/plannodes.h                 |   4 +
 src/include/optimizer/cost.h                  |   1 +
 src/include/optimizer/pathnode.h              |   3 +
 src/test/regress/expected/select_distinct.out | 621 ++++++++++++++++++
 src/test/regress/expected/sysviews.out        |   3 +-
 src/test/regress/sql/select_distinct.sql      | 254 +++++++
 38 files changed, 1845 insertions(+), 14 deletions(-)

diff --git a/contrib/bloom/blutils.c b/contrib/bloom/blutils.c
index 0104d02f67..a018b7f3d0 100644
--- a/contrib/bloom/blutils.c
+++ b/contrib/bloom/blutils.c
@@ -133,6 +133,7 @@ blhandler(PG_FUNCTION_ARGS)
 	amroutine->ambulkdelete = blbulkdelete;
 	amroutine->amvacuumcleanup = blvacuumcleanup;
 	amroutine->amcanreturn = NULL;
+	amroutine->amskip = NULL;
 	amroutine->amcostestimate = blcostestimate;
 	amroutine->amoptions = bloptions;
 	amroutine->amproperty = NULL;
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index e07dc01e80..36ba75b077 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4517,6 +4517,21 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-enable-indexskipscan" xreflabel="enable_indexskipscan">
+      <term><varname>enable_indexskipscan</varname> (<type>boolean</type>)
+      <indexterm>
+       <primary><varname>enable_indexskipscan</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Enables or disables the query planner's use of index-skip-scan plan
+        types (see <xref linkend="indexes-index-skip-scans"/>). The default is
+        <literal>on</literal>.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-enable-material" xreflabel="enable_material">
       <term><varname>enable_material</varname> (<type>boolean</type>)
       <indexterm>
diff --git a/doc/src/sgml/indexam.sgml b/doc/src/sgml/indexam.sgml
index 37f8d8760a..a726d80878 100644
--- a/doc/src/sgml/indexam.sgml
+++ b/doc/src/sgml/indexam.sgml
@@ -148,6 +148,7 @@ typedef struct IndexAmRoutine
     amendscan_function amendscan;
     ammarkpos_function ammarkpos;       /* can be NULL */
     amrestrpos_function amrestrpos;     /* can be NULL */
+    amskip_function amskip;             /* can be NULL */
 
     /* interface functions to support parallel index scans */
     amestimateparallelscan_function amestimateparallelscan;    /* can be NULL */
@@ -691,6 +692,68 @@ amrestrpos (IndexScanDesc scan);
 
   <para>
 <programlisting>
+bool
+amskip (IndexScanDesc scan,
+        ScanDirection direction,
+        ScanDirection indexdir,
+        bool scanstart,
+        int prefix);
+</programlisting>
+  Skip past all tuples where the first 'prefix' columns have the same value as
+  the last tuple returned in the current scan. The arguments are:
+
+   <variablelist>
+    <varlistentry>
+     <term><parameter>scan</parameter></term>
+     <listitem>
+      <para>
+       Index scan information
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><parameter>direction</parameter></term>
+     <listitem>
+      <para>
+       The direction in which data is advancing.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><parameter>indexdir</parameter></term>
+     <listitem>
+      <para>
+        The index direction, in which data must be read.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><parameter>scanstart</parameter></term>
+     <listitem>
+      <para>
+        Whether or not this is the start of the scan.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><parameter>prefix</parameter></term>
+     <listitem>
+      <para>
+        Distinct prefix size.
+      </para>
+     </listitem>
+    </varlistentry>
+
+   </variablelist>
+
+  </para>
+
+  <para>
+<programlisting>
 Size
 amestimateparallelscan (void);
 </programlisting>
diff --git a/doc/src/sgml/indices.sgml b/doc/src/sgml/indices.sgml
index c54bf0dbbd..c429d98fc7 100644
--- a/doc/src/sgml/indices.sgml
+++ b/doc/src/sgml/indices.sgml
@@ -1254,6 +1254,29 @@ SELECT target FROM tests WHERE subject = 'some-subject' AND success;
    and later will recognize such cases and allow index-only scans to be
    generated, but older versions will not.
   </para>
+
+  <sect2 id="indexes-index-skip-scans">
+    <title>Index Skip Scans</title>
+
+    <indexterm zone="indexes-index-skip-scans">
+      <primary>index</primary>
+      <secondary>index-skip scans</secondary>
+    </indexterm>
+    <indexterm zone="indexes-index-skip-scans">
+      <primary>index-skip scan</primary>
+    </indexterm>
+
+    <para>
+     When the rows retrieved from an index scan are then deduplicated by
+     eliminating rows matching on a prefix of index keys (e.g. when using
+     <literal>SELECT DISTINCT</literal>), the planner will consider
+     skipping groups of rows with a matching key prefix. When a row with
+     a particular prefix is found, remaining rows with the same key prefix
+     are skipped.  The larger the number of rows with the same key prefix
+     (i.e. the lower the number of distinct key prefixes in the index),
+     the more efficient this is.
+    </para>
+  </sect2>
  </sect1>
 
 
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 2e8f67ef10..4db31bb211 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -113,6 +113,7 @@ brinhandler(PG_FUNCTION_ARGS)
 	amroutine->ambulkdelete = brinbulkdelete;
 	amroutine->amvacuumcleanup = brinvacuumcleanup;
 	amroutine->amcanreturn = NULL;
+	amroutine->amskip = NULL;
 	amroutine->amcostestimate = brincostestimate;
 	amroutine->amoptions = brinoptions;
 	amroutine->amproperty = NULL;
diff --git a/src/backend/access/gin/ginutil.c b/src/backend/access/gin/ginutil.c
index a7e55caf28..8dd1d30d2a 100644
--- a/src/backend/access/gin/ginutil.c
+++ b/src/backend/access/gin/ginutil.c
@@ -65,6 +65,7 @@ ginhandler(PG_FUNCTION_ARGS)
 	amroutine->ambulkdelete = ginbulkdelete;
 	amroutine->amvacuumcleanup = ginvacuumcleanup;
 	amroutine->amcanreturn = NULL;
+	amroutine->amskip = NULL;
 	amroutine->amcostestimate = gincostestimate;
 	amroutine->amoptions = ginoptions;
 	amroutine->amproperty = NULL;
diff --git a/src/backend/access/gist/gist.c b/src/backend/access/gist/gist.c
index aefc302ed2..8c692f7fb4 100644
--- a/src/backend/access/gist/gist.c
+++ b/src/backend/access/gist/gist.c
@@ -86,6 +86,7 @@ gisthandler(PG_FUNCTION_ARGS)
 	amroutine->ambulkdelete = gistbulkdelete;
 	amroutine->amvacuumcleanup = gistvacuumcleanup;
 	amroutine->amcanreturn = gistcanreturn;
+	amroutine->amskip = NULL;
 	amroutine->amcostestimate = gistcostestimate;
 	amroutine->amoptions = gistoptions;
 	amroutine->amproperty = gistproperty;
diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c
index 4871b7ff4d..e5fa4c7864 100644
--- a/src/backend/access/hash/hash.c
+++ b/src/backend/access/hash/hash.c
@@ -83,6 +83,7 @@ hashhandler(PG_FUNCTION_ARGS)
 	amroutine->ambulkdelete = hashbulkdelete;
 	amroutine->amvacuumcleanup = hashvacuumcleanup;
 	amroutine->amcanreturn = NULL;
+	amroutine->amskip = NULL;
 	amroutine->amcostestimate = hashcostestimate;
 	amroutine->amoptions = hashoptions;
 	amroutine->amproperty = NULL;
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 01539b6bd6..1047a35ade 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -33,6 +33,7 @@
  *		index_can_return	- does index support index-only scans?
  *		index_getprocid - get a support procedure OID
  *		index_getprocinfo - get a support procedure's lookup info
+ *		index_skip		- advance past duplicate key values in a scan
  *
  * NOTES
  *		This file contains the index_ routines which used
@@ -730,6 +731,23 @@ index_can_return(Relation indexRelation, int attno)
 	return indexRelation->rd_indam->amcanreturn(indexRelation, attno);
 }
 
+/* ----------------
+ *		index_skip
+ *
+ *		Skip past all tuples where the first 'prefix' columns have the
+ *		same value as the last tuple returned in the current scan.
+ * ----------------
+ */
+bool
+index_skip(IndexScanDesc scan, ScanDirection direction,
+		   ScanDirection indexdir, bool scanstart, int prefix)
+{
+	SCAN_CHECKS;
+
+	return scan->indexRelation->rd_indam->amskip(scan, direction,
+												 indexdir, scanstart, prefix);
+}
+
 /* ----------------
  *		index_getprocid
  *
diff --git a/src/backend/access/nbtree/nbtree.c b/src/backend/access/nbtree/nbtree.c
index 5254bc7ef5..8fde56fe60 100644
--- a/src/backend/access/nbtree/nbtree.c
+++ b/src/backend/access/nbtree/nbtree.c
@@ -132,6 +132,7 @@ bthandler(PG_FUNCTION_ARGS)
 	amroutine->ambulkdelete = btbulkdelete;
 	amroutine->amvacuumcleanup = btvacuumcleanup;
 	amroutine->amcanreturn = btcanreturn;
+	amroutine->amskip = btskip;
 	amroutine->amcostestimate = btcostestimate;
 	amroutine->amoptions = btoptions;
 	amroutine->amproperty = btproperty;
@@ -381,6 +382,8 @@ btbeginscan(Relation rel, int nkeys, int norderbys)
 	 */
 	so->currTuples = so->markTuples = NULL;
 
+	so->skipScanKey = NULL;
+
 	scan->xs_itupdesc = RelationGetDescr(rel);
 
 	scan->opaque = so;
@@ -448,6 +451,16 @@ btrescan(IndexScanDesc scan, ScanKey scankey, int nscankeys,
 	_bt_preprocess_array_keys(scan);
 }
 
+/*
+ * btskip() -- skip to the beginning of the next key prefix
+ */
+bool
+btskip(IndexScanDesc scan, ScanDirection direction,
+	   ScanDirection indexdir, bool start, int prefix)
+{
+	return _bt_skip(scan, direction, indexdir, start, prefix);
+}
+
 /*
  *	btendscan() -- close down a scan
  */
diff --git a/src/backend/access/nbtree/nbtsearch.c b/src/backend/access/nbtree/nbtsearch.c
index c573814f01..e2b549355b 100644
--- a/src/backend/access/nbtree/nbtsearch.c
+++ b/src/backend/access/nbtree/nbtsearch.c
@@ -37,7 +37,10 @@ static bool _bt_parallel_readpage(IndexScanDesc scan, BlockNumber blkno,
 static Buffer _bt_walk_left(Relation rel, Buffer buf, Snapshot snapshot);
 static bool _bt_endpoint(IndexScanDesc scan, ScanDirection dir);
 static inline void _bt_initialize_more_data(BTScanOpaque so, ScanDirection dir);
-
+static inline void _bt_update_skip_scankeys(IndexScanDesc scan,
+											Relation indexRel);
+static inline bool _bt_scankey_within_page(IndexScanDesc scan, BTScanInsert key,
+										Buffer buf, ScanDirection dir);
 
 /*
  *	_bt_drop_lock_and_maybe_pin()
@@ -1375,6 +1378,419 @@ _bt_next(IndexScanDesc scan, ScanDirection dir)
 	return true;
 }
 
+/*
+ *  _bt_skip() -- Skip items that have the same prefix as the most recently
+ * 				  fetched index tuple.
+ *
+ * 		The current position is set so that a subsequent call to _bt_next will
+ * 		fetch the first tuple that differs in the leading 'prefix' keys.
+ *
+ * 		There are four different kinds of skipping (depending on dir and
+ * 		indexdir) that are important to distinguish, especially in the
+ * 		presence of an index condition:
+ *
+ * 		* Advancing forward and reading forward
+ * 			simple scan
+ *
+ * 		* Advancing forward and reading backward
+ * 			scan inside a cursor fetching backward, when skipping is necessary
+ * 			right from the start
+ *
+ * 		* Advancing backward and reading forward
+ * 			scan with order by desc inside a cursor fetching forward, when
+ * 			skipping is necessary right from the start
+ *
+ * 		* Advancing backward and reading backward
+ * 			simple scan with order by desc
+ *
+ *      The current page is searched for the next unique value. If none is found
+ *      we will do a scan from the root in order to find the next page with
+ *      a unique value.
+ */
+bool
+_bt_skip(IndexScanDesc scan, ScanDirection dir,
+		 ScanDirection indexdir, bool scanstart, int prefix)
+{
+	BTScanOpaque so = (BTScanOpaque) scan->opaque;
+	BTStack stack;
+	Buffer buf;
+	OffsetNumber offnum;
+	BTScanPosItem *currItem;
+	Relation 	 indexRel = scan->indexRelation;
+
+	/* We want to return tuples, and we need a starting point */
+	Assert(scan->xs_want_itup);
+	Assert(scan->xs_itup);
+
+	if (so->numKilled > 0)
+		_bt_killitems(scan);
+
+	/* If skipScanKey is NULL then we initialize it with _bt_mkscankey */
+	if (so->skipScanKey == NULL)
+	{
+		so->skipScanKey = _bt_mkscankey(indexRel, scan->xs_itup);
+		so->skipScanKey->keysz = prefix;
+		so->skipScanKey->scantid = NULL;
+	}
+	so->skipScanKey->nextkey = ScanDirectionIsForward(dir);
+	_bt_update_skip_scankeys(scan, indexRel);
+
+	/* Check if the next unique key can be found within the current page.
+	 * Since we do not lock the current page between jumps, it's possible
+	 * that it was split since the last time we saw it. This is fine when
+	 * scanning forward, since pages split to the right and we are still
+	 * on the leftmost page. When scanning backwards it's possible to
+	 * lose some pages, so we would need to remember the previous page
+	 * and then follow the right link from the current page until we find
+	 * the original one.
+	 *
+	 * Since the whole point of checking the current page is to protect
+	 * ourselves and improve performance in the statistics mismatch case,
+	 * when there are too many distinct values for jumping, it's not clear
+	 * that the complexity of this solution for backward scans is
+	 * justified, so for now just avoid it.
+	 */
+	if (BufferIsValid(so->currPos.buf) && ScanDirectionIsForward(dir))
+	{
+		LockBuffer(so->currPos.buf, BT_READ);
+
+		if (_bt_scankey_within_page(scan, so->skipScanKey, so->currPos.buf, dir))
+		{
+			bool keyFound = false;
+
+			offnum = _bt_binsrch(scan->indexRelation, so->skipScanKey, so->currPos.buf);
+
+			/* Lock the page for SERIALIZABLE transactions */
+			PredicateLockPage(scan->indexRelation, BufferGetBlockNumber(so->currPos.buf),
+							  scan->xs_snapshot);
+
+			/* We know in which direction to look */
+			_bt_initialize_more_data(so, dir);
+
+			/* Now read the data */
+			keyFound = _bt_readpage(scan, dir, offnum);
+
+			LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+			ReleaseBuffer(so->currPos.buf);
+			so->currPos.buf = InvalidBuffer;
+
+			if (keyFound)
+			{
+				/* set IndexTuple */
+				currItem = &so->currPos.items[so->currPos.itemIndex];
+				scan->xs_heaptid = currItem->heapTid;
+				scan->xs_itup = (IndexTuple) (so->currTuples + currItem->tupleOffset);
+				return true;
+			}
+		}
+		else
+		{
+			LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+		}
+	}
+
+	if (BufferIsValid(so->currPos.buf))
+	{
+		ReleaseBuffer(so->currPos.buf);
+		so->currPos.buf = InvalidBuffer;
+	}
+
+	/*
+	 * We haven't found the scan key within the current page, so let's scan
+	 * from the root. Use _bt_search and _bt_binsrch to get the buffer and
+	 * offset number.
+	 */
+	so->skipScanKey->nextkey = ScanDirectionIsForward(dir);
+	stack = _bt_search(scan->indexRelation, so->skipScanKey,
+					   &buf, BT_READ, scan->xs_snapshot);
+	_bt_freestack(stack);
+	so->currPos.buf = buf;
+	offnum = _bt_binsrch(scan->indexRelation, so->skipScanKey, buf);
+
+	/* Lock the page for SERIALIZABLE transactions */
+	PredicateLockPage(scan->indexRelation, BufferGetBlockNumber(buf),
+					  scan->xs_snapshot);
+
+	/* We know in which direction to look */
+	_bt_initialize_more_data(so, dir);
+
+	/*
+	 * Simplest case is when both directions are forward, when we are already
+	 * at the next distinct key at the beginning of the series (so everything
+	 * else would be done in _bt_readpage)
+	 *
+	 * The case when both directions are backwards is also simple, but we need
+	 * to go one step back, since we need a last element from the previous
+	 * series.
+	 */
+	if (ScanDirectionIsBackward(dir) && ScanDirectionIsBackward(indexdir))
+		 offnum = OffsetNumberPrev(offnum);
+
+	/*
+	 * Advance backward but read forward. At this moment we are at the next
+	 * distinct key at the beginning of the series. If the scan has just
+	 * started, we can read forward without doing anything else. Otherwise
+	 * find the previous distinct key and the beginning of its series and
+	 * read forward from there. To do so, go back one step, perform a binary
+	 * search to find the first item in the series and let _bt_readpage do
+	 * everything else.
+	 */
+	else if (ScanDirectionIsBackward(dir) && ScanDirectionIsForward(indexdir))
+	{
+		if (!scanstart)
+		{
+			/* Reading forward means we expect to see more data on the right */
+			so->currPos.moreRight = true;
+
+			offnum = _bt_binsrch(scan->indexRelation, so->skipScanKey, buf);
+
+			/* One step back to find a previous value */
+			_bt_readpage(scan, dir, offnum);
+
+			LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+			if (_bt_next(scan, dir))
+			{
+				LockBuffer(so->currPos.buf, BT_READ);
+				_bt_update_skip_scankeys(scan, indexRel);
+
+				/*
+				 * Now find the last item in the sequence for the current
+				 * value, with the intention of doing OffsetNumberNext. As a
+				 * result we end up on the first element of the sequence.
+				 */
+				if (_bt_scankey_within_page(scan, so->skipScanKey, so->currPos.buf, dir))
+					offnum = _bt_binsrch(scan->indexRelation, so->skipScanKey, buf);
+				else
+				{
+					if (BufferIsValid(so->currPos.buf))
+					{
+						/* Before leaving current page, deal with any killed items */
+						if (so->numKilled > 0)
+							_bt_killitems(scan);
+
+						LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+						ReleaseBuffer(so->currPos.buf);
+						so->currPos.buf = InvalidBuffer;
+					}
+
+					stack = _bt_search(scan->indexRelation, so->skipScanKey,
+									   &buf, BT_READ, scan->xs_snapshot);
+					_bt_freestack(stack);
+					so->currPos.buf = buf;
+					offnum = _bt_binsrch(scan->indexRelation, so->skipScanKey, buf);
+				}
+			}
+			else
+			{
+				pfree(so->skipScanKey);
+				so->skipScanKey = NULL;
+				return false;
+			}
+		}
+	}
+
+	/*
+	 * Advance forward but read backward. At this moment we are at the next
+	 * distinct key at the beginning of the series. If the scan has just
+	 * started, we can go one step back and read forward without doing
+	 * anything else. Otherwise find the next distinct key and the beginning
+	 * of its series, go one step back and read backward from there.
+	 *
+	 * An interesting situation can happen if one of the distinct keys does
+	 * not pass the corresponding index condition at all. In this case reading
+	 * backward can lead to a previous distinct key being found, creating a
+	 * loop. To avoid that, check the value to be returned, and jump one more
+	 * time if it's the same as at the beginning. Note that we do not check
+	 * visibility here, and dead tuples could also lead to the same situation.
+	 * This has to be checked on the caller side.
+	 */
+	else if (ScanDirectionIsForward(dir) && ScanDirectionIsBackward(indexdir))
+	{
+		if (scanstart)
+			offnum = OffsetNumberPrev(offnum);
+		else
+		{
+			OffsetNumber nextOffset,
+						startOffset,
+						jumpOffset;
+
+			IndexTuple startItup = CopyIndexTuple(scan->xs_itup);
+			Page page = BufferGetPage(so->currPos.buf);
+
+			/* We are at the end and need to return */
+			if ((offnum > PageGetMaxOffsetNumber(page)) &&
+				(so->currPos.nextPage == P_NONE))
+			{
+				LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+
+				BTScanPosUnpinIfPinned(so->currPos);
+				BTScanPosInvalidate(so->currPos);
+
+				pfree(so->skipScanKey);
+				so->skipScanKey = NULL;
+				return false;
+			}
+
+			nextOffset = startOffset = ItemPointerGetOffsetNumber(&scan->xs_itup->t_tid);
+
+			/* Reading backwards means we expect to see more data on the left */
+			so->currPos.moreLeft = true;
+
+			while (nextOffset == startOffset)
+			{
+				IndexTuple itup;
+				CHECK_FOR_INTERRUPTS();
+
+				/*
+				 * Find a next index tuple to update scan key. It could be at
+				 * the end, so check for max offset
+				 */
+				if (!_bt_readpage(scan, ForwardScanDirection, offnum))
+				{
+					/*
+					 * There's no actually-matching data on this page.  Try to
+					 * advance to the next page. Return false if there's no
+					 * matching data at all.
+					 */
+					LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+					if (!_bt_steppage(scan, dir))
+					{
+						pfree(so->skipScanKey);
+						so->skipScanKey = NULL;
+						return false;
+					}
+					LockBuffer(so->currPos.buf, BT_READ);
+				}
+
+				currItem = &so->currPos.items[so->currPos.firstItem];
+				itup = (IndexTuple) (so->currTuples + currItem->tupleOffset);
+
+				scan->xs_itup = itup;
+				so->skipScanKey->nextkey = ScanDirectionIsForward(dir);
+
+				_bt_update_skip_scankeys(scan, indexRel);
+				if (BufferIsValid(so->currPos.buf))
+				{
+					/* Before leaving current page, deal with any killed items */
+					if (so->numKilled > 0)
+						_bt_killitems(scan);
+
+					LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+					ReleaseBuffer(so->currPos.buf);
+					so->currPos.buf = InvalidBuffer;
+				}
+
+				stack = _bt_search(scan->indexRelation, so->skipScanKey,
+								   &buf, BT_READ, scan->xs_snapshot);
+				_bt_freestack(stack);
+				so->currPos.buf = buf;
+				jumpOffset = offnum = _bt_binsrch(scan->indexRelation, so->skipScanKey, buf);
+				offnum = OffsetNumberPrev(offnum);
+
+				if (!_bt_readpage(scan, indexdir, offnum))
+				{
+					/*
+					 * There's no actually-matching data on this page.  Try to
+					 * advance to the next page. Return false if there's no
+					 * matching data at all.
+					 */
+					LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+					if (!_bt_steppage(scan, indexdir))
+					{
+						pfree(so->skipScanKey);
+						so->skipScanKey = NULL;
+						return false;
+					}
+					LockBuffer(so->currPos.buf, BT_READ);
+				}
+
+				currItem = &so->currPos.items[so->currPos.lastItem];
+				itup = (IndexTuple) (so->currTuples + currItem->tupleOffset);
+				nextOffset = ItemPointerGetOffsetNumber(&itup->t_tid);
+
+				/*
+				 * To check if we returned the same tuple, try to find a
+				 * startItup on the current page. For that we need to update
+				 * scankey to match the whole tuple and set nextkey to return
+				 * an exact tuple, not the next one. If the nextOffset is the
+				 * same as before, it means we are in the loop, return offnum
+				 * to the original position and jump further
+				 */
+				scan->xs_itup = startItup;
+				_bt_update_skip_scankeys(scan, indexRel);
+
+				so->skipScanKey->keysz = IndexRelationGetNumberOfKeyAttributes(indexRel);
+				so->skipScanKey->nextkey = false;
+
+				if (_bt_scankey_within_page(scan, so->skipScanKey,
+											so->currPos.buf, dir))
+				{
+					OffsetNumber maxoff;
+					startOffset = _bt_binsrch(scan->indexRelation,
+											  so->skipScanKey,
+											  so->currPos.buf);
+
+					page = BufferGetPage(so->currPos.buf);
+					maxoff = PageGetMaxOffsetNumber(page);
+
+					if (nextOffset <= startOffset)
+					{
+						offnum = jumpOffset;
+						nextOffset = startOffset;
+					}
+
+					if ((offnum > maxoff) && (so->currPos.nextPage == P_NONE))
+					{
+						LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+
+						BTScanPosUnpinIfPinned(so->currPos);
+						BTScanPosInvalidate(so->currPos);
+
+						pfree(so->skipScanKey);
+						so->skipScanKey = NULL;
+						return false;
+					}
+				}
+
+				/* Restore the original scankey options */
+				so->skipScanKey->keysz = prefix;
+				so->skipScanKey->nextkey = ScanDirectionIsForward(dir);
+			}
+		}
+	}
+
+	/* Now read the data */
+	if (!_bt_readpage(scan, indexdir, offnum))
+	{
+		/*
+		 * There's no actually-matching data on this page.  Try to advance to
+		 * the next page.  Return false if there's no matching data at all.
+		 */
+		LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+		if (!_bt_steppage(scan, dir))
+		{
+			pfree(so->skipScanKey);
+			so->skipScanKey = NULL;
+			return false;
+		}
+	}
+	else
+	{
+		/* Drop the lock, and maybe the pin, on the current page */
+		LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+	}
+
+	/* And set IndexTuple */
+	currItem = &so->currPos.items[so->currPos.itemIndex];
+	scan->xs_heaptid = currItem->heapTid;
+	scan->xs_itup = (IndexTuple) (so->currTuples + currItem->tupleOffset);
+
+	so->currPos.moreLeft = true;
+	so->currPos.moreRight = true;
+
+	return true;
+}
+
 /*
  *	_bt_readpage() -- Load data from current index page into so->currPos
  *
@@ -2246,3 +2662,54 @@ _bt_initialize_more_data(BTScanOpaque so, ScanDirection dir)
 	so->numKilled = 0;			/* just paranoia */
 	so->markItemIndex = -1;		/* ditto */
 }
+
+/*
+ * _bt_update_skip_scankeys() -- set up new values for the existing scankeys
+ * 								 based on the current index tuple
+ */
+static inline void
+_bt_update_skip_scankeys(IndexScanDesc scan, Relation indexRel)
+{
+	TupleDesc		itupdesc;
+	int			indnkeyatts,
+				i;
+	BTScanOpaque 	so = (BTScanOpaque) scan->opaque;
+	ScanKey			scankeys = so->skipScanKey->scankeys;
+
+	itupdesc = RelationGetDescr(indexRel);
+	indnkeyatts = IndexRelationGetNumberOfKeyAttributes(indexRel);
+	for (i = 0; i < indnkeyatts; i++)
+	{
+		Datum datum;
+		bool null;
+		int flags;
+
+		datum = index_getattr(scan->xs_itup, i + 1, itupdesc, &null);
+		flags = (null ? SK_ISNULL : 0) |
+				(indexRel->rd_indoption[i] << SK_BT_INDOPTION_SHIFT);
+		scankeys[i].sk_flags = flags;
+		scankeys[i].sk_argument = datum;
+	}
+}
+
+/*
+ * _bt_scankey_within_page() -- check if the provided scankey could be found
+ * 								within a page, specified by the buffer.
+ */
+static inline bool
+_bt_scankey_within_page(IndexScanDesc scan, BTScanInsert key,
+						Buffer buf, ScanDirection dir)
+{
+	OffsetNumber low, high;
+	Page page = BufferGetPage(buf);
+	BTPageOpaque opaque = (BTPageOpaque) PageGetSpecialPointer(page);
+
+	low = P_FIRSTDATAKEY(opaque);
+	high = PageGetMaxOffsetNumber(page);
+
+	if (unlikely(high < low))
+		return false;
+
+	return (_bt_compare(scan->indexRelation, key, page, low) > 0 &&
+			_bt_compare(scan->indexRelation, key, page, high) < 1);
+}
diff --git a/src/backend/access/spgist/spgutils.c b/src/backend/access/spgist/spgutils.c
index 4924ae1c59..fa09a4685e 100644
--- a/src/backend/access/spgist/spgutils.c
+++ b/src/backend/access/spgist/spgutils.c
@@ -68,6 +68,7 @@ spghandler(PG_FUNCTION_ARGS)
 	amroutine->ambulkdelete = spgbulkdelete;
 	amroutine->amvacuumcleanup = spgvacuumcleanup;
 	amroutine->amcanreturn = spgcanreturn;
+	amroutine->amskip = NULL;
 	amroutine->amcostestimate = spgcostestimate;
 	amroutine->amoptions = spgoptions;
 	amroutine->amproperty = spgproperty;
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index c367c750b1..a7dd874531 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -141,6 +141,7 @@ static void ExplainXMLTag(const char *tagname, int flags, ExplainState *es);
 static void ExplainIndentText(ExplainState *es);
 static void ExplainJSONLineEnding(ExplainState *es);
 static void ExplainYAMLLineStarting(ExplainState *es);
+static void ExplainIndexSkipScanKeys(int skipPrefixSize, ExplainState *es);
 static void escape_yaml(StringInfo buf, const char *str);
 
 
@@ -1052,6 +1053,22 @@ ExplainPreScanNode(PlanState *planstate, Bitmapset **rels_used)
 	return planstate_tree_walker(planstate, ExplainPreScanNode, rels_used);
 }
 
+/*
+ * ExplainIndexSkipScanKeys -
+ *	  Append information about index skip scan to es->str.
+ *
+ * Can be used to print the skip prefix size.
+ */
+static void
+ExplainIndexSkipScanKeys(int skipPrefixSize, ExplainState *es)
+{
+	if (skipPrefixSize > 0)
+	{
+		if (es->format != EXPLAIN_FORMAT_TEXT)
+			ExplainPropertyInteger("Distinct Prefix", NULL, skipPrefixSize, es);
+	}
+}
+
 /*
  * ExplainNode -
  *	  Appends a description of a plan tree to es->str
@@ -1386,6 +1403,8 @@ ExplainNode(PlanState *planstate, List *ancestors,
 			{
 				IndexScan  *indexscan = (IndexScan *) plan;
 
+				ExplainIndexSkipScanKeys(indexscan->indexskipprefixsize, es);
+
 				ExplainIndexScanDetails(indexscan->indexid,
 										indexscan->indexorderdir,
 										es);
@@ -1396,6 +1415,8 @@ ExplainNode(PlanState *planstate, List *ancestors,
 			{
 				IndexOnlyScan *indexonlyscan = (IndexOnlyScan *) plan;
 
+				ExplainIndexSkipScanKeys(indexonlyscan->indexskipprefixsize, es);
+
 				ExplainIndexScanDetails(indexonlyscan->indexid,
 										indexonlyscan->indexorderdir,
 										es);
@@ -1655,6 +1676,8 @@ ExplainNode(PlanState *planstate, List *ancestors,
 	switch (nodeTag(plan))
 	{
 		case T_IndexScan:
+			if (((IndexScan *) plan)->indexskipprefixsize > 0)
+				ExplainPropertyBool("Skip scan", true, es);
 			show_scan_qual(((IndexScan *) plan)->indexqualorig,
 						   "Index Cond", planstate, ancestors, es);
 			if (((IndexScan *) plan)->indexqualorig)
@@ -1668,6 +1691,8 @@ ExplainNode(PlanState *planstate, List *ancestors,
 										   planstate, es);
 			break;
 		case T_IndexOnlyScan:
+			if (((IndexOnlyScan *) plan)->indexskipprefixsize > 0)
+				ExplainPropertyBool("Skip scan", true, es);
 			show_scan_qual(((IndexOnlyScan *) plan)->indexqual,
 						   "Index Cond", planstate, ancestors, es);
 			if (((IndexOnlyScan *) plan)->indexqual)
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 5617ac29e7..c4e4b087a7 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -41,6 +41,7 @@
 #include "miscadmin.h"
 #include "storage/bufmgr.h"
 #include "storage/predicate.h"
+#include "storage/itemptr.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
 
@@ -62,9 +63,26 @@ IndexOnlyNext(IndexOnlyScanState *node)
 	EState	   *estate;
 	ExprContext *econtext;
 	ScanDirection direction;
+	ScanDirection readDirection;
 	IndexScanDesc scandesc;
 	TupleTableSlot *slot;
 	ItemPointer tid;
+	ItemPointerData startTid;
+	IndexOnlyScan *indexonlyscan = (IndexOnlyScan *) node->ss.ps.plan;
+
+	/*
+	 * Tells if the current position was reached via skipping. In this case
+	 * there is no need for index_getnext_tid.
+	 */
+	bool skipped = false;
+
+	/*
+	 * An index-only scan must be aware that when skipping we can return to
+	 * the starting point due to visibility checks. In this situation we need
+	 * to jump further, and the number of skipping attempts tells us how far
+	 * we need to jump.
+	 */
+	int skipAttempts = 0;
 
 	/*
 	 * extract necessary information from index scan node
@@ -72,7 +90,7 @@ IndexOnlyNext(IndexOnlyScanState *node)
 	estate = node->ss.ps.state;
 	direction = estate->es_direction;
 	/* flip direction if this is an overall backward scan */
-	if (ScanDirectionIsBackward(((IndexOnlyScan *) node->ss.ps.plan)->indexorderdir))
+	if (ScanDirectionIsBackward(indexonlyscan->indexorderdir))
 	{
 		if (ScanDirectionIsForward(direction))
 			direction = BackwardScanDirection;
@@ -114,16 +132,87 @@ IndexOnlyNext(IndexOnlyScanState *node)
 						 node->ioss_OrderByKeys,
 						 node->ioss_NumOrderByKeys);
 	}
+	else
+	{
+		ItemPointerCopy(&scandesc->xs_heaptid, &startTid);
+	}
+
+	/*
+	 * Check if we need to skip to the next key prefix, because we've been
+	 * asked to implement DISTINCT.
+	 *
+	 * When fetching a cursor in the direction opposite to a general scan
+	 * direction, the result must be what normal fetching should have
+	 * returned, but in reversed order. In other words, return the last or
+	 * first scanned tuple in a DISTINCT set, depending on a cursor direction.
+	 * Due to that we skip also when the first tuple wasn't emitted yet, but
+	 * the directions are opposite.
+	 */
+	if (node->ioss_SkipPrefixSize > 0 &&
+		(node->ioss_FirstTupleEmitted ||
+		 ScanDirectionsAreOpposite(direction, indexonlyscan->indexorderdir)))
+	{
+		if (!index_skip(scandesc, direction, indexonlyscan->indexorderdir,
+						!node->ioss_FirstTupleEmitted, node->ioss_SkipPrefixSize))
+		{
+			/*
+			 * Reached end of index. At this point currPos is invalidated, and
+			 * we need to reset ioss_FirstTupleEmitted, since otherwise after
+			 * going backwards, reaching the end of index, and going forward
+			 * again we apply skip again. It would be incorrect and lead to an
+			 * extra skipped item.
+			 */
+			node->ioss_FirstTupleEmitted = false;
+			return ExecClearTuple(slot);
+		}
+		else
+		{
+			skipAttempts = 1;
+			skipped = true;
+			tid = &scandesc->xs_heaptid;
+		}
+	}
+
+	readDirection = skipped ? indexonlyscan->indexorderdir : direction;
 
 	/*
 	 * OK, now that we have what we need, fetch the next tuple.
 	 */
-	while ((tid = index_getnext_tid(scandesc, direction)) != NULL)
+	while (skipped || (tid = index_getnext_tid(scandesc, readDirection)) != NULL)
 	{
 		bool		tuple_from_heap = false;
 
 		CHECK_FOR_INTERRUPTS();
 
+		/*
+		 * While doing index only skip scan with advancing and reading in
+		 * different directions we can return to the same position where we
+		 * started after visibility check. Recognize such situations and skip
+		 * more.
+		 */
+		if ((readDirection != direction) &&
+			ItemPointerIsValid(&startTid) && ItemPointerEquals(&startTid, tid))
+		{
+			int i;
+			skipAttempts += 1;
+
+			for (i = 0; i < skipAttempts; i++)
+			{
+				if (!index_skip(scandesc, direction,
+								indexonlyscan->indexorderdir,
+								!node->ioss_FirstTupleEmitted,
+								node->ioss_SkipPrefixSize))
+				{
+					node->ioss_FirstTupleEmitted = false;
+					return ExecClearTuple(slot);
+				}
+			}
+
+			tid = &scandesc->xs_heaptid;
+		}
+
+		skipped = false;
+
 		/*
 		 * We can skip the heap fetch if the TID references a heap page on
 		 * which all tuples are known visible to everybody.  In any case,
@@ -250,6 +339,8 @@ IndexOnlyNext(IndexOnlyScanState *node)
 							  ItemPointerGetBlockNumber(tid),
 							  estate->es_snapshot);
 
+		node->ioss_FirstTupleEmitted = true;
+
 		return slot;
 	}
 
@@ -504,6 +595,8 @@ ExecInitIndexOnlyScan(IndexOnlyScan *node, EState *estate, int eflags)
 	indexstate->ss.ps.plan = (Plan *) node;
 	indexstate->ss.ps.state = estate;
 	indexstate->ss.ps.ExecProcNode = ExecIndexOnlyScan;
+	indexstate->ioss_SkipPrefixSize = node->indexskipprefixsize;
+	indexstate->ioss_FirstTupleEmitted = false;
 
 	/*
 	 * Miscellaneous initialization
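
The executor changes above call index_skip() to jump from the current tuple to the first tuple with a different key prefix, rather than reading every duplicate. A minimal sketch of that idea follows. It is a toy model and not the patch's code: a sorted int array stands in for the B-tree, and the names skip_to_next_prefix and distinct_by_skipping are hypothetical. Visibility checks and scan directions are left out.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy stand-in for an index skip: return the first position after `pos`
 * whose key is greater than keys[pos], or n if there is none.  The array
 * must be sorted ascending.  Binary search plays the role of re-descending
 * the tree instead of walking over every duplicate.
 */
static size_t
skip_to_next_prefix(const int *keys, size_t n, size_t pos)
{
	size_t		lo = pos + 1;
	size_t		hi = n;

	assert(pos < n);

	while (lo < hi)
	{
		size_t		mid = lo + (hi - lo) / 2;

		if (keys[mid] > keys[pos])
			hi = mid;
		else
			lo = mid + 1;
	}
	return lo;
}

/*
 * Emit one value per distinct key by emitting the current key and then
 * skipping past its duplicates.  `out` must have room for n values.
 * Returns the number of values written.
 */
static size_t
distinct_by_skipping(const int *keys, size_t n, int *out)
{
	size_t		pos = 0;
	size_t		nout = 0;

	while (pos < n)
	{
		out[nout++] = keys[pos];
		pos = skip_to_next_prefix(keys, n, pos);
	}
	return nout;
}
```

For the sorted input {1, 1, 1, 2, 2, 3, 5, 5}, distinct_by_skipping writes 1, 2, 3, 5 and returns 4. Each step costs a logarithmic search rather than a walk over the duplicates, which is why skipping pays off when there are few distinct prefixes and many rows per prefix.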
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index d0a96a38e0..449aaec3ac 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -85,6 +85,13 @@ IndexNext(IndexScanState *node)
 	ScanDirection direction;
 	IndexScanDesc scandesc;
 	TupleTableSlot *slot;
+	IndexScan *indexscan = (IndexScan *) node->ss.ps.plan;
+
+	/*
+	 * Tells if the current position was reached via skipping. In this case
+	 * there is no need to call index_getnext_slot.
+	 */
+	bool skipped = false;
 
 	/*
 	 * extract necessary information from index scan node
@@ -92,7 +99,7 @@ IndexNext(IndexScanState *node)
 	estate = node->ss.ps.state;
 	direction = estate->es_direction;
 	/* flip direction if this is an overall backward scan */
-	if (ScanDirectionIsBackward(((IndexScan *) node->ss.ps.plan)->indexorderdir))
+	if (ScanDirectionIsBackward(indexscan->indexorderdir))
 	{
 		if (ScanDirectionIsForward(direction))
 			direction = BackwardScanDirection;
@@ -117,6 +124,12 @@ IndexNext(IndexScanState *node)
 
 		node->iss_ScanDesc = scandesc;
 
+		/* Index skip scan assumes xs_want_itup, so set it to true */
+		if (indexscan->indexskipprefixsize > 0)
+			node->iss_ScanDesc->xs_want_itup = true;
+		else
+			node->iss_ScanDesc->xs_want_itup = false;
+
 		/*
 		 * If no run-time keys to calculate or they are ready, go ahead and
 		 * pass the scankeys to the index AM.
@@ -127,12 +140,48 @@ IndexNext(IndexScanState *node)
 						 node->iss_OrderByKeys, node->iss_NumOrderByKeys);
 	}
 
+	/*
+	 * Check if we need to skip to the next key prefix, because we've been
+	 * asked to implement DISTINCT.
+	 *
+	 * When fetching a cursor in the direction opposite to a general scan
+	 * direction, the result must be what normal fetching should have
+	 * returned, but in reversed order. In other words, return the last or
+	 * first scanned tuple in a DISTINCT set, depending on a cursor direction.
+	 * Due to that we skip also when the first tuple wasn't emitted yet, but
+	 * the directions are opposite.
+	 */
+	if (node->iss_SkipPrefixSize > 0 &&
+		(node->iss_FirstTupleEmitted ||
+		 ScanDirectionsAreOpposite(direction, indexscan->indexorderdir)))
+	{
+		if (!index_skip(scandesc, direction, indexscan->indexorderdir,
+					   !node->iss_FirstTupleEmitted, node->iss_SkipPrefixSize))
+		{
+			/*
+			 * Reached end of index. At this point currPos is invalidated, and
+			 * we need to reset iss_FirstTupleEmitted, since otherwise after
+			 * going backwards, reaching the end of index, and going forward
+			 * again we apply skip again. It would be incorrect and lead to an
+			 * extra skipped item.
+			 */
+			node->iss_FirstTupleEmitted = false;
+			return ExecClearTuple(slot);
+		}
+		else
+		{
+			skipped = true;
+			index_fetch_heap(scandesc, slot);
+		}
+	}
+
 	/*
 	 * ok, now that we have what we need, fetch the next tuple.
 	 */
-	while (index_getnext_slot(scandesc, direction, slot))
+	while (skipped || index_getnext_slot(scandesc, direction, slot))
 	{
 		CHECK_FOR_INTERRUPTS();
+		skipped = false;
 
 		/*
 		 * If the index was lossy, we have to recheck the index quals using
@@ -149,6 +198,7 @@ IndexNext(IndexScanState *node)
 			}
 		}
 
+		node->iss_FirstTupleEmitted = true;
 		return slot;
 	}
 
@@ -910,6 +960,8 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
 	indexstate->ss.ps.plan = (Plan *) node;
 	indexstate->ss.ps.state = estate;
 	indexstate->ss.ps.ExecProcNode = ExecIndexScan;
+	indexstate->iss_SkipPrefixSize = node->indexskipprefixsize;
+	indexstate->iss_FirstTupleEmitted = false;
 
 	/*
 	 * Miscellaneous initialization
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 54ad62bb7f..e0cfd710c4 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -493,6 +493,7 @@ _copyIndexScan(const IndexScan *from)
 	COPY_NODE_FIELD(indexorderbyorig);
 	COPY_NODE_FIELD(indexorderbyops);
 	COPY_SCALAR_FIELD(indexorderdir);
+	COPY_SCALAR_FIELD(indexskipprefixsize);
 
 	return newnode;
 }
@@ -518,6 +519,7 @@ _copyIndexOnlyScan(const IndexOnlyScan *from)
 	COPY_NODE_FIELD(indexorderby);
 	COPY_NODE_FIELD(indextlist);
 	COPY_SCALAR_FIELD(indexorderdir);
+	COPY_SCALAR_FIELD(indexskipprefixsize);
 
 	return newnode;
 }
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 16083e7a7e..5f723cda4b 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -562,6 +562,7 @@ _outIndexScan(StringInfo str, const IndexScan *node)
 	WRITE_NODE_FIELD(indexorderbyorig);
 	WRITE_NODE_FIELD(indexorderbyops);
 	WRITE_ENUM_FIELD(indexorderdir, ScanDirection);
+	WRITE_INT_FIELD(indexskipprefixsize);
 }
 
 static void
@@ -576,6 +577,7 @@ _outIndexOnlyScan(StringInfo str, const IndexOnlyScan *node)
 	WRITE_NODE_FIELD(indexorderby);
 	WRITE_NODE_FIELD(indextlist);
 	WRITE_ENUM_FIELD(indexorderdir, ScanDirection);
+	WRITE_INT_FIELD(indexskipprefixsize);
 }
 
 static void
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 551ce6c41c..028d03a56d 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1820,6 +1820,7 @@ _readIndexScan(void)
 	READ_NODE_FIELD(indexorderbyorig);
 	READ_NODE_FIELD(indexorderbyops);
 	READ_ENUM_FIELD(indexorderdir, ScanDirection);
+	READ_INT_FIELD(indexskipprefixsize);
 
 	READ_DONE();
 }
@@ -1839,6 +1840,7 @@ _readIndexOnlyScan(void)
 	READ_NODE_FIELD(indexorderby);
 	READ_NODE_FIELD(indextlist);
 	READ_ENUM_FIELD(indexorderdir, ScanDirection);
+	READ_INT_FIELD(indexskipprefixsize);
 
 	READ_DONE();
 }
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index b5a0033721..710edf160a 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -124,6 +124,7 @@ int			max_parallel_workers_per_gather = 2;
 bool		enable_seqscan = true;
 bool		enable_indexscan = true;
 bool		enable_indexonlyscan = true;
+bool		enable_indexskipscan = true;
 bool		enable_bitmapscan = true;
 bool		enable_tidscan = true;
 bool		enable_sort = true;
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 363f5349f1..fc3ec200d4 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -791,6 +791,16 @@ get_index_paths(PlannerInfo *root, RelOptInfo *rel,
 	{
 		IndexPath  *ipath = (IndexPath *) lfirst(lc);
 
+		/*
+		 * Keep unique paths from index skip scans in a separate pathlist, so
+		 * that they cannot be picked up when uniqueness is not needed.
+		 */
+		if (ipath->indexskipprefix != 0)
+		{
+			add_unique_path(rel, (Path *) ipath);
+			continue;
+		}
+
 		if (index->amhasgettuple)
 			add_path(rel, (Path *) ipath);
 
@@ -880,6 +890,8 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
 	bool		pathkeys_possibly_useful;
 	bool		index_is_ordered;
 	bool		index_only_scan;
+	bool		not_empty_qual;
+	bool		can_skip;
 	int			indexcol;
 
 	/*
@@ -1029,6 +1041,17 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
 	index_only_scan = (scantype != ST_BITMAPSCAN &&
 					   check_index_only(rel, index));
 
+	/* Check if an index skip scan is possible. */
+	can_skip = enable_indexskipscan && index->amcanskip;
+
+	/*
+	 * For a plain index scan (not an index-only scan), skip scan is not
+	 * supported when quals are present. Check whether there are any.
+	 */
+	not_empty_qual = (root->parse->jointree != NULL &&
+					  root->parse->jointree->quals != NULL &&
+					  list_length((List *) root->parse->jointree->quals) != 0);
+
 	/*
 	 * 4. Generate an indexscan path if there are relevant restriction clauses
 	 * in the current clauses, OR the index ordering is potentially useful for
@@ -1056,6 +1079,13 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
 								  false);
 		result = lappend(result, ipath);
 
+		/* Consider index skip scan as well */
+		if (useful_uniquekeys != NULL && can_skip &&
+			(index_only_scan || !not_empty_qual))
+			result = lappend(result,
+							 create_skipscan_unique_path(root, index,
+														 (Path *) ipath));
+
 		/*
 		 * If appropriate, consider parallel index scan.  We don't allow
 		 * parallel index scan for bitmap index scans.
@@ -1116,6 +1146,13 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
 									  false);
 			result = lappend(result, ipath);
 
+			/* Consider index skip scan as well */
+			if (useful_uniquekeys != NULL && can_skip &&
+				(index_only_scan || !not_empty_qual))
+				result = lappend(result,
+								 create_skipscan_unique_path(root, index,
+															 (Path *) ipath));
+
 			/* If appropriate, consider parallel index scan */
 			if (index->amcanparallel &&
 				rel->consider_parallel && outer_relids == NULL &&
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index dff826a828..7b32f2cc7e 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -175,12 +175,14 @@ static IndexScan *make_indexscan(List *qptlist, List *qpqual, Index scanrelid,
 								 Oid indexid, List *indexqual, List *indexqualorig,
 								 List *indexorderby, List *indexorderbyorig,
 								 List *indexorderbyops,
-								 ScanDirection indexscandir);
+								 ScanDirection indexscandir,
+								 int skipPrefixSize);
 static IndexOnlyScan *make_indexonlyscan(List *qptlist, List *qpqual,
 										 Index scanrelid, Oid indexid,
 										 List *indexqual, List *indexorderby,
 										 List *indextlist,
-										 ScanDirection indexscandir);
+										 ScanDirection indexscandir,
+										 int skipPrefixSize);
 static BitmapIndexScan *make_bitmap_indexscan(Index scanrelid, Oid indexid,
 											  List *indexqual,
 											  List *indexqualorig);
@@ -2910,7 +2912,8 @@ create_indexscan_plan(PlannerInfo *root,
 												fixed_indexquals,
 												fixed_indexorderbys,
 												best_path->indexinfo->indextlist,
-												best_path->indexscandir);
+												best_path->indexscandir,
+												best_path->indexskipprefix);
 	else
 		scan_plan = (Scan *) make_indexscan(tlist,
 											qpqual,
@@ -2921,7 +2924,8 @@ create_indexscan_plan(PlannerInfo *root,
 											fixed_indexorderbys,
 											indexorderbys,
 											indexorderbyops,
-											best_path->indexscandir);
+											best_path->indexscandir,
+											best_path->indexskipprefix);
 
 	copy_generic_path_info(&scan_plan->plan, &best_path->path);
 
@@ -5184,7 +5188,8 @@ make_indexscan(List *qptlist,
 			   List *indexorderby,
 			   List *indexorderbyorig,
 			   List *indexorderbyops,
-			   ScanDirection indexscandir)
+			   ScanDirection indexscandir,
+			   int skipPrefixSize)
 {
 	IndexScan  *node = makeNode(IndexScan);
 	Plan	   *plan = &node->scan.plan;
@@ -5201,6 +5206,7 @@ make_indexscan(List *qptlist,
 	node->indexorderbyorig = indexorderbyorig;
 	node->indexorderbyops = indexorderbyops;
 	node->indexorderdir = indexscandir;
+	node->indexskipprefixsize = skipPrefixSize;
 
 	return node;
 }
@@ -5213,7 +5219,8 @@ make_indexonlyscan(List *qptlist,
 				   List *indexqual,
 				   List *indexorderby,
 				   List *indextlist,
-				   ScanDirection indexscandir)
+				   ScanDirection indexscandir,
+				   int skipPrefixSize)
 {
 	IndexOnlyScan *node = makeNode(IndexOnlyScan);
 	Plan	   *plan = &node->scan.plan;
@@ -5228,6 +5235,7 @@ make_indexonlyscan(List *qptlist,
 	node->indexorderby = indexorderby;
 	node->indextlist = indextlist;
 	node->indexorderdir = indexscandir;
+	node->indexskipprefixsize = skipPrefixSize;
 
 	return node;
 }
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index a7de8476d9..88305df5c3 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -4828,13 +4828,19 @@ create_distinct_paths(PlannerInfo *root,
 			Path	   *path = (Path *) lfirst(lc);
 
 			if (pathkeys_contained_in(needed_pathkeys, path->pathkeys))
-			{
 				add_path(distinct_rel, (Path *)
 						 create_upper_unique_path(root, distinct_rel,
 												  path,
 												  list_length(root->distinct_pathkeys),
 												  numDistinctRows));
-			}
+		}
+
+		foreach(lc, input_rel->unique_pathlist)
+		{
+			Path	   *path = (Path *) lfirst(lc);
+
+			if (uniquekeys_contained_in(needed_pathkeys, path->uniquekeys))
+				add_path(distinct_rel, path);
 		}
 
 		/* For explicit-sort case, always use the more rigorous clause */
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index a4dfafbb59..b0ce17b0d6 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -2564,6 +2564,7 @@ create_projection_path(PlannerInfo *root,
 	pathnode->path.pathkeys = subpath->pathkeys;
 
 	pathnode->subpath = subpath;
+	pathnode->path.uniquekeys = subpath->uniquekeys;
 
 	/*
 	 * We might not need a separate Result node.  If the input plan node type
@@ -2929,6 +2930,73 @@ create_upper_unique_path(PlannerInfo *root,
 	return pathnode;
 }
 
+/*
+ * create_skipscan_unique_path
+ *	  Creates a pathnode the same as an existing IndexPath except based on
+ *	  skipping duplicate values.  This may or may not be cheaper than using
+ *	  create_upper_unique_path.
+ *
+ * The input path must be an IndexPath for an index that supports amskip.
+ */
+IndexPath *
+create_skipscan_unique_path(PlannerInfo *root, IndexOptInfo *index,
+							Path *basepath)
+{
+	IndexPath  *pathnode = makeNode(IndexPath);
+	int			numDistinctRows;
+	int			distinctPrefixKeys;
+	ListCell   *lc;
+	List	   *exprs = NIL;
+
+
+	distinctPrefixKeys = list_length(root->query_uniquekeys);
+
+	Assert(IsA(basepath, IndexPath));
+
+	/* We don't want to modify basepath, so make a copy. */
+	memcpy(pathnode, basepath, sizeof(IndexPath));
+
+	/*
+	 * Normally we can think of distinctPrefixKeys as just the
+	 * number of distinct keys. But suppose the distinct key is a,
+	 * and the index contains (b, a) in exactly this order. In that
+	 * situation we need to use the position of a in the index as
+	 * distinctPrefixKeys, otherwise the skip would be done on the
+	 * first column only.
+	 */
+	foreach(lc, root->query_uniquekeys)
+	{
+		UniqueKey *uniquekey = (UniqueKey *) lfirst(lc);
+		EquivalenceMember *em =
+			lfirst_node(EquivalenceMember,
+						list_head(uniquekey->eq_clause->ec_members));
+		Var *var = (Var *) em->em_expr;
+
+		exprs = lappend(exprs, em->em_expr);
+
+		for (int i = 0; i < index->ncolumns; i++)
+		{
+			if (index->indexkeys[i] == var->varattno)
+			{
+				distinctPrefixKeys = Max(i + 1, distinctPrefixKeys);
+				break;
+			}
+		}
+	}
+
+	Assert(distinctPrefixKeys > 0);
+	pathnode->indexskipprefix = distinctPrefixKeys;
+
+	numDistinctRows = estimate_num_groups(root, exprs,
+										  pathnode->path.rows,
+										  NULL);
+
+	pathnode->path.total_cost = pathnode->path.startup_cost * numDistinctRows;
+	pathnode->path.rows = numDistinctRows;
+
+	return pathnode;
+}
+
 /*
  * create_agg_path
  *	  Creates a pathnode that represents performing aggregation/grouping
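
The prefix-size loop in create_skipscan_unique_path can be sketched in isolation. This is a simplified model under stated assumptions and not the planner code: plain int arrays of attribute numbers replace IndexOptInfo and the UniqueKey list, and the function name skip_prefix_size is hypothetical.

```c
#include <assert.h>

/*
 * index_keys[i] is the attribute number stored in index column i, and
 * distinct_attnos[] holds the attribute numbers of the DISTINCT keys.
 * The result starts at the number of distinct keys and is raised to the
 * 1-based index position of any distinct key that sits further right.
 */
static int
skip_prefix_size(const int *index_keys, int ncolumns,
				 const int *distinct_attnos, int ndistinct)
{
	int			prefix = ndistinct;

	assert(ncolumns > 0 && ndistinct > 0);

	for (int k = 0; k < ndistinct; k++)
	{
		for (int i = 0; i < ncolumns; i++)
		{
			if (index_keys[i] == distinct_attnos[k])
			{
				if (i + 1 > prefix)
					prefix = i + 1;
				break;
			}
		}
	}
	return prefix;
}
```

With attribute numbers a = 1 and b = 2, an index on (b, a) and DISTINCT a gives a prefix of 2, so the skip covers both columns. An index on (a, b) with the same DISTINCT a gives a prefix of 1.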
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index d82fc5ab8b..f65b299f37 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -271,6 +271,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 			info->amoptionalkey = amroutine->amoptionalkey;
 			info->amsearcharray = amroutine->amsearcharray;
 			info->amsearchnulls = amroutine->amsearchnulls;
+			info->amcanskip = (amroutine->amskip != NULL);
 			info->amcanparallel = amroutine->amcanparallel;
 			info->amhasgettuple = (amroutine->amgettuple != NULL);
 			info->amhasgetbitmap = amroutine->amgetbitmap != NULL &&
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index cacbe904db..7c71ee4499 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -923,6 +923,15 @@ static struct config_bool ConfigureNamesBool[] =
 		true,
 		NULL, NULL, NULL
 	},
+	{
+		{"enable_indexskipscan", PGC_USERSET, QUERY_TUNING_METHOD,
+			gettext_noop("Enables the planner's use of index-skip-scan plans."),
+			NULL
+		},
+		&enable_indexskipscan,
+		true,
+		NULL, NULL, NULL
+	},
 	{
 		{"enable_bitmapscan", PGC_USERSET, QUERY_TUNING_METHOD,
 			gettext_noop("Enables the planner's use of bitmap-scan plans."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index e1048c0047..a002ee2143 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -353,6 +353,7 @@
 #enable_hashjoin = on
 #enable_indexscan = on
 #enable_indexonlyscan = on
+#enable_indexskipscan = on
 #enable_material = on
 #enable_mergejoin = on
 #enable_nestloop = on
diff --git a/src/include/access/amapi.h b/src/include/access/amapi.h
index 3b3e22f73d..3d39cd9d07 100644
--- a/src/include/access/amapi.h
+++ b/src/include/access/amapi.h
@@ -130,6 +130,13 @@ typedef void (*amrescan_function) (IndexScanDesc scan,
 typedef bool (*amgettuple_function) (IndexScanDesc scan,
 									 ScanDirection direction);
 
+/* skip past duplicates in a given prefix */
+typedef bool (*amskip_function) (IndexScanDesc scan,
+								 ScanDirection dir,
+								 ScanDirection indexdir,
+								 bool start,
+								 int prefix);
+
 /* fetch all valid tuples */
 typedef int64 (*amgetbitmap_function) (IndexScanDesc scan,
 									   TIDBitmap *tbm);
@@ -229,6 +236,7 @@ typedef struct IndexAmRoutine
 	amendscan_function amendscan;
 	ammarkpos_function ammarkpos;	/* can be NULL */
 	amrestrpos_function amrestrpos; /* can be NULL */
+	amskip_function amskip;				/* can be NULL */
 
 	/* interface functions to support parallel index scans */
 	amestimateparallelscan_function amestimateparallelscan; /* can be NULL */
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 7e9364a50c..815de4e4dd 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -180,6 +180,8 @@ extern IndexBulkDeleteResult *index_bulk_delete(IndexVacuumInfo *info,
 extern IndexBulkDeleteResult *index_vacuum_cleanup(IndexVacuumInfo *info,
 												   IndexBulkDeleteResult *stats);
 extern bool index_can_return(Relation indexRelation, int attno);
+extern bool index_skip(IndexScanDesc scan, ScanDirection direction,
+					   ScanDirection indexdir, bool start, int prefix);
 extern RegProcedure index_getprocid(Relation irel, AttrNumber attnum,
 									uint16 procnum);
 extern FmgrInfo *index_getprocinfo(Relation irel, AttrNumber attnum,
diff --git a/src/include/access/nbtree.h b/src/include/access/nbtree.h
index 20ace69dab..e098c6a1ab 100644
--- a/src/include/access/nbtree.h
+++ b/src/include/access/nbtree.h
@@ -662,6 +662,9 @@ typedef struct BTScanOpaqueData
 	 */
 	int			markItemIndex;	/* itemIndex, or -1 if not valid */
 
+	/* Work space for _bt_skip */
+	BTScanInsert	skipScanKey;	/* used to control skipping */
+
 	/* keep these last in struct for efficiency */
 	BTScanPosData currPos;		/* current position data */
 	BTScanPosData markPos;		/* marked position, if any */
@@ -793,6 +796,8 @@ extern OffsetNumber _bt_binsrch_insert(Relation rel, BTInsertState insertstate);
 extern int32 _bt_compare(Relation rel, BTScanInsert key, Page page, OffsetNumber offnum);
 extern bool _bt_first(IndexScanDesc scan, ScanDirection dir);
 extern bool _bt_next(IndexScanDesc scan, ScanDirection dir);
+extern bool _bt_skip(IndexScanDesc scan, ScanDirection dir,
+					 ScanDirection indexdir, bool start, int prefix);
 extern Buffer _bt_get_endpoint(Relation rel, uint32 level, bool rightmost,
 							   Snapshot snapshot);
 
@@ -817,6 +822,8 @@ extern void _bt_end_vacuum_callback(int code, Datum arg);
 extern Size BTreeShmemSize(void);
 extern void BTreeShmemInit(void);
 extern bytea *btoptions(Datum reloptions, bool validate);
+extern bool btskip(IndexScanDesc scan, ScanDirection dir,
+				   ScanDirection indexdir, bool start, int prefix);
 extern bool btproperty(Oid index_oid, int attno,
 					   IndexAMProperty prop, const char *propname,
 					   bool *res, bool *isnull);
diff --git a/src/include/access/sdir.h b/src/include/access/sdir.h
index 23feb90986..094a127464 100644
--- a/src/include/access/sdir.h
+++ b/src/include/access/sdir.h
@@ -55,4 +55,11 @@ typedef enum ScanDirection
 #define ScanDirectionIsForward(direction) \
 	((bool) ((direction) == ForwardScanDirection))
 
+/*
+ * ScanDirectionsAreOpposite
+ *		True iff scan directions are backward/forward or forward/backward.
+ */
+#define ScanDirectionsAreOpposite(dirA, dirB) \
+	((bool) ((dirA) != NoMovementScanDirection && (dirA) == -(dirB)))
+
 #endif							/* SDIR_H */
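
To illustrate what ScanDirectionsAreOpposite evaluates to, here is a standalone sketch. It assumes the usual ScanDirection values (-1, 0, 1) and redeclares the enum locally so it compiles without the PostgreSQL headers. The macro is written with fully parenthesized arguments, and the wrapper function directions_are_opposite is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Local copy of the enum, for illustration only. */
typedef enum ScanDirection
{
	BackwardScanDirection = -1,
	NoMovementScanDirection = 0,
	ForwardScanDirection = 1
} ScanDirection;

/* True iff the directions are backward/forward or forward/backward. */
#define ScanDirectionsAreOpposite(dirA, dirB) \
	((bool) ((dirA) != NoMovementScanDirection && (dirA) == -(dirB)))

/* Function wrapper, so the macro can be exercised with typed arguments. */
static bool
directions_are_opposite(ScanDirection a, ScanDirection b)
{
	return ScanDirectionsAreOpposite(a, b);
}
```

The explicit NoMovementScanDirection check matters because 0 == -0, so without it two no-movement directions would count as opposite.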
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 1f6f5bbc20..2c6acc160a 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1423,6 +1423,8 @@ typedef struct IndexScanState
 	ExprContext *iss_RuntimeContext;
 	Relation	iss_RelationDesc;
 	struct IndexScanDescData *iss_ScanDesc;
+	int			iss_SkipPrefixSize;
+	bool		iss_FirstTupleEmitted;
 
 	/* These are needed for re-checking ORDER BY expr ordering */
 	pairingheap *iss_ReorderQueue;
@@ -1452,6 +1454,8 @@ typedef struct IndexScanState
  *		TableSlot		   slot for holding tuples fetched from the table
  *		VMBuffer		   buffer in use for visibility map testing, if any
  *		PscanLen		   size of parallel index-only scan descriptor
+ *		SkipPrefixSize	   number of keys for skip-based DISTINCT
+ *		FirstTupleEmitted  has the first tuple been emitted
  * ----------------
  */
 typedef struct IndexOnlyScanState
@@ -1470,6 +1474,8 @@ typedef struct IndexOnlyScanState
 	struct IndexScanDescData *ioss_ScanDesc;
 	TupleTableSlot *ioss_TableSlot;
 	Buffer		ioss_VMBuffer;
+	int			ioss_SkipPrefixSize;
+	bool		ioss_FirstTupleEmitted;
 	Size		ioss_PscanLen;
 } IndexOnlyScanState;
 
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 0de27f0ef3..ce00060ee0 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -840,6 +840,7 @@ struct IndexOptInfo
 	bool		amsearchnulls;	/* can AM search for NULL/NOT NULL entries? */
 	bool		amhasgettuple;	/* does AM have amgettuple interface? */
 	bool		amhasgetbitmap; /* does AM have amgetbitmap interface? */
+	bool		amcanskip;		/* can AM skip duplicate values? */
 	bool		amcanparallel;	/* does AM support parallel scan? */
 	/* Rather than include amapi.h here, we declare amcostestimate like this */
 	void		(*amcostestimate) ();	/* AM's cost estimator */
@@ -1190,6 +1191,9 @@ typedef struct Path
  * we need not recompute them when considering using the same index in a
  * bitmap index/heap scan (see BitmapHeapPath).  The costs of the IndexPath
  * itself represent the costs of an IndexScan or IndexOnlyScan plan type.
+ *
+ * 'indexskipprefix' represents the number of columns to consider for skip
+ * scans.
  *----------
  */
 typedef struct IndexPath
@@ -1202,6 +1206,7 @@ typedef struct IndexPath
 	ScanDirection indexscandir;
 	Cost		indextotalcost;
 	Selectivity indexselectivity;
+	int			indexskipprefix;
 } IndexPath;
 
 /*
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 32c0d87f80..03a00e8e1d 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -409,6 +409,8 @@ typedef struct IndexScan
 	List	   *indexorderbyorig;	/* the same in original form */
 	List	   *indexorderbyops;	/* OIDs of sort ops for ORDER BY exprs */
 	ScanDirection indexorderdir;	/* forward or backward or don't care */
+	int			indexskipprefixsize;	/* the size of the prefix for distinct
+										 * scans */
 } IndexScan;
 
 /* ----------------
@@ -436,6 +438,8 @@ typedef struct IndexOnlyScan
 	List	   *indexorderby;	/* list of index ORDER BY exprs */
 	List	   *indextlist;		/* TargetEntry list describing index's cols */
 	ScanDirection indexorderdir;	/* forward or backward or don't care */
+	int			indexskipprefixsize;	/* the size of the prefix for distinct
+										 * scans */
 } IndexOnlyScan;
 
 /* ----------------
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index cb012ba198..847f34f02b 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -50,6 +50,7 @@ extern PGDLLIMPORT int max_parallel_workers_per_gather;
 extern PGDLLIMPORT bool enable_seqscan;
 extern PGDLLIMPORT bool enable_indexscan;
 extern PGDLLIMPORT bool enable_indexonlyscan;
+extern PGDLLIMPORT bool enable_indexskipscan;
 extern PGDLLIMPORT bool enable_bitmapscan;
 extern PGDLLIMPORT bool enable_tidscan;
 extern PGDLLIMPORT bool enable_sort;
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index fd25997af5..ba3eaffd8a 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -202,6 +202,9 @@ extern UpperUniquePath *create_upper_unique_path(PlannerInfo *root,
 												 Path *subpath,
 												 int numCols,
 												 double numGroups);
+extern IndexPath *create_skipscan_unique_path(PlannerInfo *root,
+											  IndexOptInfo *index,
+											  Path *subpath);
 extern AggPath *create_agg_path(PlannerInfo *root,
 								RelOptInfo *rel,
 								Path *subpath,
diff --git a/src/test/regress/expected/select_distinct.out b/src/test/regress/expected/select_distinct.out
index f3696c6d1d..c50c6d1866 100644
--- a/src/test/regress/expected/select_distinct.out
+++ b/src/test/regress/expected/select_distinct.out
@@ -244,3 +244,624 @@ SELECT null IS NOT DISTINCT FROM null as "yes";
  t
 (1 row)
 
+-- index only skip scan
+CREATE TABLE distinct_a (a int, b int, c int);
+INSERT INTO distinct_a (
+    SELECT five, tenthous, 10 FROM
+    generate_series(1, 5) five,
+    generate_series(1, 10000) tenthous
+);
+CREATE INDEX ON distinct_a (a, b);
+CREATE INDEX ON distinct_a ((a + 1));
+ANALYZE distinct_a;
+SELECT DISTINCT a FROM distinct_a;
+ a 
+---
+ 1
+ 2
+ 3
+ 4
+ 5
+(5 rows)
+
+SELECT DISTINCT a FROM distinct_a WHERE a = 1;
+ a 
+---
+ 1
+(1 row)
+
+SELECT DISTINCT a FROM distinct_a ORDER BY a DESC;
+ a 
+---
+ 5
+ 4
+ 3
+ 2
+ 1
+(5 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT a FROM distinct_a;
+                       QUERY PLAN                       
+--------------------------------------------------------
+ Index Only Scan using distinct_a_a_b_idx on distinct_a
+   Skip scan: true
+(2 rows)
+
+-- test index skip scan with a condition on a non unique field
+SELECT DISTINCT ON (a) a, b FROM distinct_a WHERE b = 2;
+ a | b 
+---+---
+ 1 | 2
+ 2 | 2
+ 3 | 2
+ 4 | 2
+ 5 | 2
+(5 rows)
+
+-- test index skip scan backwards
+SELECT DISTINCT ON (a) a, b FROM distinct_a ORDER BY a DESC, b DESC;
+ a |   b   
+---+-------
+ 5 | 10000
+ 4 | 10000
+ 3 | 10000
+ 2 | 10000
+ 1 | 10000
+(5 rows)
+
+-- test index skip scan for expressions
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT (a + 1) FROM distinct_a;
+                     QUERY PLAN                     
+----------------------------------------------------
+ Index Scan using distinct_a_expr_idx on distinct_a
+   Skip scan: true
+(2 rows)
+
+SELECT DISTINCT (a + 1) FROM distinct_a;
+ ?column? 
+----------
+        2
+        3
+        4
+        5
+        6
+(5 rows)
+
+-- check colums order
+CREATE INDEX distinct_a_b_a on distinct_a (b, a);
+SELECT DISTINCT a FROM distinct_a WHERE b = 2;
+ a 
+---
+ 1
+ 2
+ 3
+ 4
+ 5
+(5 rows)
+
+SELECT DISTINCT on (a, b) a, b FROM distinct_a WHERE b = 2;
+ a | b 
+---+---
+ 1 | 2
+ 2 | 2
+ 3 | 2
+ 4 | 2
+ 5 | 2
+(5 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT a FROM distinct_a WHERE b = 2;
+                     QUERY PLAN                     
+----------------------------------------------------
+ Index Only Scan using distinct_a_b_a on distinct_a
+   Skip scan: true
+   Index Cond: (b = 2)
+(3 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT on (a, b) a, b FROM distinct_a WHERE b = 2;
+                     QUERY PLAN                     
+----------------------------------------------------
+ Index Only Scan using distinct_a_b_a on distinct_a
+   Skip scan: true
+   Index Cond: (b = 2)
+(3 rows)
+
+DROP INDEX distinct_a_b_a;
+-- test opposite scan/index directions inside a cursor
+-- forward/backward
+BEGIN;
+DECLARE c SCROLL CURSOR FOR
+SELECT DISTINCT ON (a) a,b FROM distinct_a ORDER BY a, b;
+FETCH FROM c;
+ a | b 
+---+---
+ 1 | 1
+(1 row)
+
+FETCH BACKWARD FROM c;
+ a | b 
+---+---
+(0 rows)
+
+FETCH 6 FROM c;
+ a | b 
+---+---
+ 1 | 1
+ 2 | 1
+ 3 | 1
+ 4 | 1
+ 5 | 1
+(5 rows)
+
+FETCH BACKWARD 6 FROM c;
+ a | b 
+---+---
+ 5 | 1
+ 4 | 1
+ 3 | 1
+ 2 | 1
+ 1 | 1
+(5 rows)
+
+FETCH 6 FROM c;
+ a | b 
+---+---
+ 1 | 1
+ 2 | 1
+ 3 | 1
+ 4 | 1
+ 5 | 1
+(5 rows)
+
+FETCH BACKWARD 6 FROM c;
+ a | b 
+---+---
+ 5 | 1
+ 4 | 1
+ 3 | 1
+ 2 | 1
+ 1 | 1
+(5 rows)
+
+END;
+-- backward/forward
+BEGIN;
+DECLARE c SCROLL CURSOR FOR
+SELECT DISTINCT ON (a) a,b FROM distinct_a ORDER BY a DESC, b DESC;
+FETCH FROM c;
+ a |   b   
+---+-------
+ 5 | 10000
+(1 row)
+
+FETCH BACKWARD FROM c;
+ a | b 
+---+---
+(0 rows)
+
+FETCH 6 FROM c;
+ a |   b   
+---+-------
+ 5 | 10000
+ 4 | 10000
+ 3 | 10000
+ 2 | 10000
+ 1 | 10000
+(5 rows)
+
+FETCH BACKWARD 6 FROM c;
+ a |   b   
+---+-------
+ 1 | 10000
+ 2 | 10000
+ 3 | 10000
+ 4 | 10000
+ 5 | 10000
+(5 rows)
+
+FETCH 6 FROM c;
+ a |   b   
+---+-------
+ 5 | 10000
+ 4 | 10000
+ 3 | 10000
+ 2 | 10000
+ 1 | 10000
+(5 rows)
+
+FETCH BACKWARD 6 FROM c;
+ a |   b   
+---+-------
+ 1 | 10000
+ 2 | 10000
+ 3 | 10000
+ 4 | 10000
+ 5 | 10000
+(5 rows)
+
+END;
+-- test missing values and skipping from the end
+CREATE TABLE distinct_abc(a int, b int, c int);
+CREATE INDEX ON distinct_abc(a, b, c);
+INSERT INTO distinct_abc
+	VALUES (1, 1, 1),
+		   (1, 1, 2),
+		   (1, 2, 2),
+		   (1, 2, 3),
+		   (2, 2, 1),
+		   (2, 2, 3),
+		   (3, 1, 1),
+		   (3, 1, 2),
+		   (3, 2, 2),
+		   (3, 2, 3);
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a,b,c FROM distinct_abc WHERE c = 2;
+                          QUERY PLAN                          
+--------------------------------------------------------------
+ Index Only Scan using distinct_abc_a_b_c_idx on distinct_abc
+   Skip scan: true
+   Index Cond: (c = 2)
+(3 rows)
+
+BEGIN;
+DECLARE c SCROLL CURSOR FOR
+SELECT DISTINCT ON (a) a,b,c FROM distinct_abc WHERE c = 2;
+FETCH ALL FROM c;
+ a | b | c 
+---+---+---
+ 1 | 1 | 2
+ 3 | 1 | 2
+(2 rows)
+
+FETCH BACKWARD ALL FROM c;
+ a | b | c 
+---+---+---
+ 3 | 1 | 2
+ 1 | 1 | 2
+(2 rows)
+
+END;
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a,b,c FROM distinct_abc WHERE c = 2
+ORDER BY a DESC, b DESC;
+                              QUERY PLAN                               
+-----------------------------------------------------------------------
+ Index Only Scan Backward using distinct_abc_a_b_c_idx on distinct_abc
+   Skip scan: true
+   Index Cond: (c = 2)
+(3 rows)
+
+BEGIN;
+DECLARE c SCROLL CURSOR FOR
+SELECT DISTINCT ON (a) a,b,c FROM distinct_abc WHERE c = 2
+ORDER BY a DESC, b DESC;
+FETCH ALL FROM c;
+ a | b | c 
+---+---+---
+ 3 | 2 | 2
+ 1 | 2 | 2
+(2 rows)
+
+FETCH BACKWARD ALL FROM c;
+ a | b | c 
+---+---+---
+ 1 | 2 | 2
+ 3 | 2 | 2
+(2 rows)
+
+END;
+DROP TABLE distinct_abc;
+-- index skip scan
+SELECT DISTINCT ON (a) a, b, c
+FROM distinct_a ORDER BY a;
+ a | b | c  
+---+---+----
+ 1 | 1 | 10
+ 2 | 1 | 10
+ 3 | 1 | 10
+ 4 | 1 | 10
+ 5 | 1 | 10
+(5 rows)
+
+SELECT DISTINCT ON (a) a, b, c
+FROM distinct_a WHERE a = 1 ORDER BY a;
+ a | b | c  
+---+---+----
+ 1 | 1 | 10
+(1 row)
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a, b, c
+FROM distinct_a ORDER BY a;
+                    QUERY PLAN                     
+---------------------------------------------------
+ Index Scan using distinct_a_a_b_idx on distinct_a
+   Skip scan: true
+(2 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a, b, c
+FROM distinct_a WHERE a = 1 ORDER BY a;
+                     QUERY PLAN                      
+-----------------------------------------------------
+ Unique
+   ->  Bitmap Heap Scan on distinct_a
+         Recheck Cond: (a = 1)
+         ->  Bitmap Index Scan on distinct_a_a_b_idx
+               Index Cond: (a = 1)
+(5 rows)
+
+-- check colums order
+SELECT DISTINCT a FROM distinct_a WHERE b = 2 AND c = 10;
+ a 
+---
+ 1
+ 2
+ 3
+ 4
+ 5
+(5 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT a FROM distinct_a WHERE b = 2 AND c = 10;
+                       QUERY PLAN                        
+---------------------------------------------------------
+ Unique
+   ->  Index Scan using distinct_a_a_b_idx on distinct_a
+         Index Cond: (b = 2)
+         Filter: (c = 10)
+(4 rows)
+
+-- check projection case
+SELECT DISTINCT a, a FROM distinct_a WHERE b = 2;
+ a | a 
+---+---
+ 1 | 1
+ 2 | 2
+ 3 | 3
+ 4 | 4
+ 5 | 5
+(5 rows)
+
+SELECT DISTINCT a, 1 FROM distinct_a WHERE b = 2;
+ a | ?column? 
+---+----------
+ 1 |        1
+ 2 |        1
+ 3 |        1
+ 4 |        1
+ 5 |        1
+(5 rows)
+
+-- test cursor forward/backward movements
+BEGIN;
+DECLARE c SCROLL CURSOR FOR SELECT DISTINCT a FROM distinct_a;
+FETCH FROM c;
+ a 
+---
+ 1
+(1 row)
+
+FETCH BACKWARD FROM c;
+ a 
+---
+(0 rows)
+
+FETCH 6 FROM c;
+ a 
+---
+ 1
+ 2
+ 3
+ 4
+ 5
+(5 rows)
+
+FETCH BACKWARD 6 FROM c;
+ a 
+---
+ 5
+ 4
+ 3
+ 2
+ 1
+(5 rows)
+
+FETCH 6 FROM c;
+ a 
+---
+ 1
+ 2
+ 3
+ 4
+ 5
+(5 rows)
+
+FETCH BACKWARD 6 FROM c;
+ a 
+---
+ 5
+ 4
+ 3
+ 2
+ 1
+(5 rows)
+
+END;
+DROP TABLE distinct_a;
+-- test tuples visibility
+CREATE TABLE distinct_visibility (a int, b int);
+INSERT INTO distinct_visibility (select a, b from generate_series(1,5) a, generate_series(1, 10000) b);
+CREATE INDEX ON distinct_visibility (a, b);
+ANALYZE distinct_visibility;
+SELECT DISTINCT ON (a) a, b FROM distinct_visibility ORDER BY a, b;
+ a | b 
+---+---
+ 1 | 1
+ 2 | 1
+ 3 | 1
+ 4 | 1
+ 5 | 1
+(5 rows)
+
+DELETE FROM distinct_visibility WHERE a = 2 and b = 1;
+SELECT DISTINCT ON (a) a, b FROM distinct_visibility ORDER BY a, b;
+ a | b 
+---+---
+ 1 | 1
+ 2 | 2
+ 3 | 1
+ 4 | 1
+ 5 | 1
+(5 rows)
+
+SELECT DISTINCT ON (a) a, b FROM distinct_visibility ORDER BY a DESC, b DESC;
+ a |   b   
+---+-------
+ 5 | 10000
+ 4 | 10000
+ 3 | 10000
+ 2 | 10000
+ 1 | 10000
+(5 rows)
+
+DELETE FROM distinct_visibility WHERE a = 2 and b = 10000;
+SELECT DISTINCT ON (a) a, b FROM distinct_visibility ORDER BY a DESC, b DESC;
+ a |   b   
+---+-------
+ 5 | 10000
+ 4 | 10000
+ 3 | 10000
+ 2 |  9999
+ 1 | 10000
+(5 rows)
+
+DROP TABLE distinct_visibility;
+-- test page boundaries
+CREATE TABLE distinct_boundaries AS
+    SELECT a, b::int2 b, (b % 2)::int2 c FROM
+        generate_series(1, 5) a,
+        generate_series(1,366) b;
+CREATE INDEX ON distinct_boundaries (a, b, c);
+ANALYZE distinct_boundaries;
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a, b, c from distinct_boundaries
+WHERE b >= 1 and c = 0 ORDER BY a, b;
+                                 QUERY PLAN                                 
+----------------------------------------------------------------------------
+ Index Only Scan using distinct_boundaries_a_b_c_idx on distinct_boundaries
+   Skip scan: true
+   Index Cond: ((b >= 1) AND (c = 0))
+(3 rows)
+
+SELECT DISTINCT ON (a) a, b, c from distinct_boundaries
+WHERE b >= 1 and c = 0 ORDER BY a, b;
+ a | b | c 
+---+---+---
+ 1 | 2 | 0
+ 2 | 2 | 0
+ 3 | 2 | 0
+ 4 | 2 | 0
+ 5 | 2 | 0
+(5 rows)
+
+DROP TABLE distinct_boundaries;
+-- test tuple killing
+-- DESC ordering
+CREATE TABLE distinct_killed AS
+    SELECT a, b, b % 2 AS c, 10 AS d
+        FROM generate_series(1, 5) a,
+             generate_series(1,1000) b;
+CREATE INDEX ON distinct_killed (a, b, c, d);
+DELETE FROM distinct_killed where a = 3;
+BEGIN;
+    DECLARE c SCROLL CURSOR FOR
+    SELECT DISTINCT ON (a) a,b,c,d
+    FROM distinct_killed ORDER BY a DESC, b DESC;
+    FETCH FORWARD ALL FROM c;
+ a |  b   | c | d  
+---+------+---+----
+ 5 | 1000 | 0 | 10
+ 4 | 1000 | 0 | 10
+ 2 | 1000 | 0 | 10
+ 1 | 1000 | 0 | 10
+(4 rows)
+
+    FETCH BACKWARD ALL FROM c;
+ a |  b   | c | d  
+---+------+---+----
+ 1 | 1000 | 0 | 10
+ 2 | 1000 | 0 | 10
+ 4 | 1000 | 0 | 10
+ 5 | 1000 | 0 | 10
+(4 rows)
+
+COMMIT;
+DROP TABLE distinct_killed;
+-- regular ordering
+CREATE TABLE distinct_killed AS
+    SELECT a, b, b % 2 AS c, 10 AS d
+        FROM generate_series(1, 5) a,
+             generate_series(1,1000) b;
+CREATE INDEX ON distinct_killed (a, b, c, d);
+DELETE FROM distinct_killed where a = 3;
+BEGIN;
+    DECLARE c SCROLL CURSOR FOR
+    SELECT DISTINCT ON (a) a,b,c,d
+    FROM distinct_killed ORDER BY a, b;
+    FETCH FORWARD ALL FROM c;
+ a | b | c | d  
+---+---+---+----
+ 1 | 1 | 1 | 10
+ 2 | 1 | 1 | 10
+ 4 | 1 | 1 | 10
+ 5 | 1 | 1 | 10
+(4 rows)
+
+    FETCH BACKWARD ALL FROM c;
+ a | b | c | d  
+---+---+---+----
+ 5 | 1 | 1 | 10
+ 4 | 1 | 1 | 10
+ 2 | 1 | 1 | 10
+ 1 | 1 | 1 | 10
+(4 rows)
+
+COMMIT;
+DROP TABLE distinct_killed;
+-- partial delete
+CREATE TABLE distinct_killed AS
+    SELECT a, b, b % 2 AS c, 10 AS d
+        FROM generate_series(1, 5) a,
+             generate_series(1,1000) b;
+CREATE INDEX ON distinct_killed (a, b, c, d);
+DELETE FROM distinct_killed WHERE a = 3 AND b <= 999;
+BEGIN;
+    DECLARE c SCROLL CURSOR FOR
+    SELECT DISTINCT ON (a) a,b,c,d
+    FROM distinct_killed ORDER BY a DESC, b DESC;
+    FETCH FORWARD ALL FROM c;
+ a |  b   | c | d  
+---+------+---+----
+ 5 | 1000 | 0 | 10
+ 4 | 1000 | 0 | 10
+ 3 | 1000 | 0 | 10
+ 2 | 1000 | 0 | 10
+ 1 | 1000 | 0 | 10
+(5 rows)
+
+    FETCH BACKWARD ALL FROM c;
+ a |  b   | c | d  
+---+------+---+----
+ 1 | 1000 | 0 | 10
+ 2 | 1000 | 0 | 10
+ 3 | 1000 | 0 | 10
+ 4 | 1000 | 0 | 10
+ 5 | 1000 | 0 | 10
+(5 rows)
+
+COMMIT;
+DROP TABLE distinct_killed;
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index a1c90eb905..bd3b373515 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -78,6 +78,7 @@ select name, setting from pg_settings where name like 'enable%';
  enable_hashjoin                | on
  enable_indexonlyscan           | on
  enable_indexscan               | on
+ enable_indexskipscan           | on
  enable_material                | on
  enable_mergejoin               | on
  enable_nestloop                | on
@@ -89,7 +90,7 @@ select name, setting from pg_settings where name like 'enable%';
  enable_seqscan                 | on
  enable_sort                    | on
  enable_tidscan                 | on
-(17 rows)
+(18 rows)
 
 -- Test that the pg_timezone_names and pg_timezone_abbrevs views are
 -- more-or-less working.  We can't test their contents in any great detail
diff --git a/src/test/regress/sql/select_distinct.sql b/src/test/regress/sql/select_distinct.sql
index a605e86449..3441a0efc6 100644
--- a/src/test/regress/sql/select_distinct.sql
+++ b/src/test/regress/sql/select_distinct.sql
@@ -73,3 +73,257 @@ SELECT 1 IS NOT DISTINCT FROM 2 as "no";
 SELECT 2 IS NOT DISTINCT FROM 2 as "yes";
 SELECT 2 IS NOT DISTINCT FROM null as "no";
 SELECT null IS NOT DISTINCT FROM null as "yes";
+
+-- index only skip scan
+CREATE TABLE distinct_a (a int, b int, c int);
+INSERT INTO distinct_a (
+    SELECT five, tenthous, 10 FROM
+    generate_series(1, 5) five,
+    generate_series(1, 10000) tenthous
+);
+CREATE INDEX ON distinct_a (a, b);
+CREATE INDEX ON distinct_a ((a + 1));
+ANALYZE distinct_a;
+
+SELECT DISTINCT a FROM distinct_a;
+SELECT DISTINCT a FROM distinct_a WHERE a = 1;
+SELECT DISTINCT a FROM distinct_a ORDER BY a DESC;
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT a FROM distinct_a;
+
+-- test index skip scan with a condition on a non unique field
+SELECT DISTINCT ON (a) a, b FROM distinct_a WHERE b = 2;
+
+-- test index skip scan backwards
+SELECT DISTINCT ON (a) a, b FROM distinct_a ORDER BY a DESC, b DESC;
+
+-- test index skip scan for expressions
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT (a + 1) FROM distinct_a;
+SELECT DISTINCT (a + 1) FROM distinct_a;
+
+-- check colums order
+CREATE INDEX distinct_a_b_a on distinct_a (b, a);
+
+SELECT DISTINCT a FROM distinct_a WHERE b = 2;
+SELECT DISTINCT on (a, b) a, b FROM distinct_a WHERE b = 2;
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT a FROM distinct_a WHERE b = 2;
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT on (a, b) a, b FROM distinct_a WHERE b = 2;
+
+DROP INDEX distinct_a_b_a;
+
+-- test opposite scan/index directions inside a cursor
+-- forward/backward
+BEGIN;
+DECLARE c SCROLL CURSOR FOR
+SELECT DISTINCT ON (a) a,b FROM distinct_a ORDER BY a, b;
+
+FETCH FROM c;
+FETCH BACKWARD FROM c;
+
+FETCH 6 FROM c;
+FETCH BACKWARD 6 FROM c;
+
+FETCH 6 FROM c;
+FETCH BACKWARD 6 FROM c;
+
+END;
+
+-- backward/forward
+BEGIN;
+DECLARE c SCROLL CURSOR FOR
+SELECT DISTINCT ON (a) a,b FROM distinct_a ORDER BY a DESC, b DESC;
+
+FETCH FROM c;
+FETCH BACKWARD FROM c;
+
+FETCH 6 FROM c;
+FETCH BACKWARD 6 FROM c;
+
+FETCH 6 FROM c;
+FETCH BACKWARD 6 FROM c;
+
+END;
+
+-- test missing values and skipping from the end
+CREATE TABLE distinct_abc(a int, b int, c int);
+CREATE INDEX ON distinct_abc(a, b, c);
+INSERT INTO distinct_abc
+	VALUES (1, 1, 1),
+		   (1, 1, 2),
+		   (1, 2, 2),
+		   (1, 2, 3),
+		   (2, 2, 1),
+		   (2, 2, 3),
+		   (3, 1, 1),
+		   (3, 1, 2),
+		   (3, 2, 2),
+		   (3, 2, 3);
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a,b,c FROM distinct_abc WHERE c = 2;
+
+BEGIN;
+DECLARE c SCROLL CURSOR FOR
+SELECT DISTINCT ON (a) a,b,c FROM distinct_abc WHERE c = 2;
+
+FETCH ALL FROM c;
+FETCH BACKWARD ALL FROM c;
+
+END;
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a,b,c FROM distinct_abc WHERE c = 2
+ORDER BY a DESC, b DESC;
+
+BEGIN;
+DECLARE c SCROLL CURSOR FOR
+SELECT DISTINCT ON (a) a,b,c FROM distinct_abc WHERE c = 2
+ORDER BY a DESC, b DESC;
+
+FETCH ALL FROM c;
+FETCH BACKWARD ALL FROM c;
+
+END;
+
+DROP TABLE distinct_abc;
+
+-- index skip scan
+SELECT DISTINCT ON (a) a, b, c
+FROM distinct_a ORDER BY a;
+SELECT DISTINCT ON (a) a, b, c
+FROM distinct_a WHERE a = 1 ORDER BY a;
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a, b, c
+FROM distinct_a ORDER BY a;
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a, b, c
+FROM distinct_a WHERE a = 1 ORDER BY a;
+
+-- check colums order
+SELECT DISTINCT a FROM distinct_a WHERE b = 2 AND c = 10;
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT a FROM distinct_a WHERE b = 2 AND c = 10;
+
+-- check projection case
+SELECT DISTINCT a, a FROM distinct_a WHERE b = 2;
+SELECT DISTINCT a, 1 FROM distinct_a WHERE b = 2;
+
+-- test cursor forward/backward movements
+BEGIN;
+DECLARE c SCROLL CURSOR FOR SELECT DISTINCT a FROM distinct_a;
+
+FETCH FROM c;
+FETCH BACKWARD FROM c;
+
+FETCH 6 FROM c;
+FETCH BACKWARD 6 FROM c;
+
+FETCH 6 FROM c;
+FETCH BACKWARD 6 FROM c;
+
+END;
+
+DROP TABLE distinct_a;
+
+-- test tuples visibility
+CREATE TABLE distinct_visibility (a int, b int);
+INSERT INTO distinct_visibility (select a, b from generate_series(1,5) a, generate_series(1, 10000) b);
+CREATE INDEX ON distinct_visibility (a, b);
+ANALYZE distinct_visibility;
+
+SELECT DISTINCT ON (a) a, b FROM distinct_visibility ORDER BY a, b;
+DELETE FROM distinct_visibility WHERE a = 2 and b = 1;
+SELECT DISTINCT ON (a) a, b FROM distinct_visibility ORDER BY a, b;
+
+SELECT DISTINCT ON (a) a, b FROM distinct_visibility ORDER BY a DESC, b DESC;
+DELETE FROM distinct_visibility WHERE a = 2 and b = 10000;
+SELECT DISTINCT ON (a) a, b FROM distinct_visibility ORDER BY a DESC, b DESC;
+DROP TABLE distinct_visibility;
+
+-- test page boundaries
+CREATE TABLE distinct_boundaries AS
+    SELECT a, b::int2 b, (b % 2)::int2 c FROM
+        generate_series(1, 5) a,
+        generate_series(1,366) b;
+
+CREATE INDEX ON distinct_boundaries (a, b, c);
+ANALYZE distinct_boundaries;
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a, b, c from distinct_boundaries
+WHERE b >= 1 and c = 0 ORDER BY a, b;
+
+SELECT DISTINCT ON (a) a, b, c from distinct_boundaries
+WHERE b >= 1 and c = 0 ORDER BY a, b;
+
+DROP TABLE distinct_boundaries;
+
+-- test tuple killing
+
+-- DESC ordering
+CREATE TABLE distinct_killed AS
+    SELECT a, b, b % 2 AS c, 10 AS d
+        FROM generate_series(1, 5) a,
+             generate_series(1,1000) b;
+
+CREATE INDEX ON distinct_killed (a, b, c, d);
+
+DELETE FROM distinct_killed where a = 3;
+
+BEGIN;
+    DECLARE c SCROLL CURSOR FOR
+    SELECT DISTINCT ON (a) a,b,c,d
+    FROM distinct_killed ORDER BY a DESC, b DESC;
+    FETCH FORWARD ALL FROM c;
+    FETCH BACKWARD ALL FROM c;
+COMMIT;
+
+DROP TABLE distinct_killed;
+
+-- regular ordering
+CREATE TABLE distinct_killed AS
+    SELECT a, b, b % 2 AS c, 10 AS d
+        FROM generate_series(1, 5) a,
+             generate_series(1,1000) b;
+
+CREATE INDEX ON distinct_killed (a, b, c, d);
+
+DELETE FROM distinct_killed where a = 3;
+
+BEGIN;
+    DECLARE c SCROLL CURSOR FOR
+    SELECT DISTINCT ON (a) a,b,c,d
+    FROM distinct_killed ORDER BY a, b;
+    FETCH FORWARD ALL FROM c;
+    FETCH BACKWARD ALL FROM c;
+COMMIT;
+
+DROP TABLE distinct_killed;
+
+-- partial delete
+CREATE TABLE distinct_killed AS
+    SELECT a, b, b % 2 AS c, 10 AS d
+        FROM generate_series(1, 5) a,
+             generate_series(1,1000) b;
+
+CREATE INDEX ON distinct_killed (a, b, c, d);
+
+DELETE FROM distinct_killed WHERE a = 3 AND b <= 999;
+
+BEGIN;
+    DECLARE c SCROLL CURSOR FOR
+    SELECT DISTINCT ON (a) a,b,c,d
+    FROM distinct_killed ORDER BY a DESC, b DESC;
+    FETCH FORWARD ALL FROM c;
+    FETCH BACKWARD ALL FROM c;
+COMMIT;
+
+DROP TABLE distinct_killed;
-- 
2.21.0

#63

Dilip Kumar

dilipbalaut@gmail.com

almost 6 years ago

In reply to: Dmitry Dolgov (#62)

Re: Index Skip Scan

On Wed, Mar 25, 2020 at 2:19 PM Dmitry Dolgov <9erthalion6@gmail.com> wrote:

On Wed, Mar 25, 2020 at 11:31:56AM +0530, Dilip Kumar wrote:

Seems like you forgot to add the uniquekey.c file in the
v33-0001-Unique-key.patch.

Oh, you're right, thanks. Here is the corrected patch.

I was just wondering how the distinct will work with the "skip scan"
if we have some filter? I mean every time we select the unique row
based on the prefix key and that might get rejected by an external
filter right? So I tried an example to check this.

postgres[50006]=# insert into t select 2, i from generate_series(1, 200)i;
INSERT 0 200
postgres[50006]=# insert into t select 1, i from generate_series(1, 200)i;
INSERT 0 200

postgres[50006]=# set enable_indexskipscan =off;
SET
postgres[50006]=# select distinct(a) from t where b%100=0;
a
---
1
2
(2 rows)

postgres[50006]=# set enable_indexskipscan =on;
SET
postgres[50006]=# select distinct(a) from t where b%100=0;
a
---
(0 rows)

postgres[50006]=# explain select distinct(a) from t where b%100=0;
QUERY PLAN
-------------------------------------------------------------------
Index Only Scan using idx on t (cost=0.15..1.55 rows=10 width=4)
Skip scan: true
Filter: ((b % 100) = 0)
(3 rows)

I think in such cases we should not select the skip scan. This should
behave as if we had a filter on a non-index field.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#64

Dmitry Dolgov

9erthalion6@gmail.com

almost 6 years ago

In reply to: Dilip Kumar (#63)

Re: Index Skip Scan

On Sun, Apr 05, 2020 at 04:30:51PM +0530, Dilip Kumar wrote:

I was just wondering how the distinct will work with the "skip scan"
if we have some filter? I mean every time we select the unique row
based on the prefix key and that might get rejected by an external
filter right?

Not exactly. In the case of an index-only scan, we skip to the first
unique position and then use the already existing functionality
(_bt_readpage with stepping to the next pages) to filter out those
records that do not pass the condition. There are even a couple of tests
in the patch for this. In the case of an index scan, when there are some
conditions, the current implementation does not consider skipping.

So I tried an example to check this.

Can you tell me which version of the patch you were testing?

#65

Dilip Kumar

dilipbalaut@gmail.com

almost 6 years ago

In reply to: Dmitry Dolgov (#64)

Re: Index Skip Scan

On Sun, Apr 5, 2020 at 9:39 PM Dmitry Dolgov <9erthalion6@gmail.com> wrote:

On Sun, Apr 05, 2020 at 04:30:51PM +0530, Dilip Kumar wrote:

I was just wondering how the distinct will work with the "skip scan"
if we have some filter? I mean every time we select the unique row
based on the prefix key and that might get rejected by an external
filter right?

Not exactly. In the case of an index-only scan, we skip to the first
unique position and then use the already existing functionality
(_bt_readpage with stepping to the next pages) to filter out those
records that do not pass the condition.

I agree, but that will only work if we have a valid index clause, and the
"b%100=0" condition will not create an index clause, right? However,
if we change the query to
select distinct(a) from t where b=100
then it works fine because this condition will create an index clause.

There are even a couple of tests
in the patch for this. In the case of an index scan, when there are some
conditions, the current implementation does not consider skipping.

So I tried an example to check this.

Can you tell me which version of the patch you were testing?

I have tested on v33.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#66

Floris Van Nee

florisvannee@Optiver.com

almost 6 years ago

In reply to: Dilip Kumar (#65)

On Sun, Apr 05, 2020 at 04:30:51PM +0530, Dilip Kumar wrote:

I was just wondering how the distinct will work with the "skip scan"
if we have some filter? I mean every time we select the unique row
based on the prefix key and that might get rejected by an external
filter right?

Yeah, you're correct. This patch only handles the index conditions and doesn't handle any filters correctly. For example, there's a check in the planner for the IndexScan that only columns that exist in the index are used. However, this check is not sufficient, as your example shows. There are a number of ways we can force a 'filter' rather than an 'index condition' and still choose a skip scan (WHERE b!=0 is another one I think). This leads to incorrect query results.

This patch would need some logic in the planner to never choose the skip scan in these cases. A better long-term solution is to adapt the rest of the executor to work correctly in the cases of external filters (this ties in with the previous visibility discussion as well, as that's basically also an external filter, although a special case).
In the patch I posted a week ago these cases are all handled correctly, as it introduces this extra logic in the Executor.

-Floris

#67

Dilip Kumar

dilipbalaut@gmail.com

almost 6 years ago

In reply to: Floris Van Nee (#66)

Re: Index Skip Scan

On Mon, Apr 6, 2020 at 1:14 PM Floris Van Nee <florisvannee@optiver.com> wrote:

On Sun, Apr 05, 2020 at 04:30:51PM +0530, Dilip Kumar wrote:

I was just wondering how the distinct will work with the "skip scan"
if we have some filter? I mean every time we select the unique row
based on the prefix key and that might get rejected by an external
filter right?

Yeah, you're correct. This patch only handles the index conditions and doesn't handle any filters correctly. For example, there's a check in the planner for the IndexScan that only columns that exist in the index are used. However, this check is not sufficient, as your example shows. There are a number of ways we can force a 'filter' rather than an 'index condition' and still choose a skip scan (WHERE b!=0 is another one I think). This leads to incorrect query results.

Right

This patch would need some logic in the planner to never choose the skip scan in these cases. A better long-term solution is to adapt the rest of the executor to work correctly in the cases of external filters (this ties in with the previous visibility discussion as well, as that's basically also an external filter, although a special case).

I agree

In the patch I posted a week ago these cases are all handled correctly, as it introduces this extra logic in the Executor.

Okay, so I think we can merge those fixes into Dmitry's patch set.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#68

Dmitry Dolgov

9erthalion6@gmail.com

almost 6 years ago

In reply to: Dilip Kumar (#67)

Re: Index Skip Scan

On Mon, Apr 6, 2020 at 1:14 PM Floris Van Nee <florisvannee@optiver.com> wrote:

There are a number of ways we can force a 'filter' rather than an
'index condition'.

Hm, I wasn't aware of this one, thanks for bringing this up. Btw,
Floris, I would appreciate it if in the future you could make it more
visible that the changes you suggest contain some fixes. E.g. it wasn't
clear to me from your previous email that that's the case, and it doesn't
make sense to pull in different directions when we're trying to achieve
the same goal :)

In the patch I posted a week ago these cases are all handled
correctly, as it introduces this extra logic in the Executor.

Okay, so I think we can merge those fixes into Dmitry's patch set.

I'll definitely take a look at the suggested changes in the filtering part.

#69

Floris Van Nee

florisvannee@Optiver.com

almost 6 years ago

In reply to: Dmitry Dolgov (#68)

Hm, I wasn't aware of this one, thanks for bringing this up. Btw, Floris, I
would appreciate it if in the future you could make it more visible that the
changes you suggest contain some fixes. E.g. it wasn't clear to me from your
previous email that that's the case, and it doesn't make sense to pull in
different directions when we're trying to achieve the same goal :)

I wasn't aware that this particular case could be triggered before I saw Dilip's email, otherwise I'd have mentioned it here of course. It's just that because my patch handles filter conditions in general, it works for this case too.

In the patch I posted a week ago these cases are all handled
correctly, as it introduces this extra logic in the Executor.

Okay, so I think we can merge those fixes into Dmitry's patch set.

I'll definitely take a look at the suggested changes in the filtering part.

It may be possible to just merge the filtering part into your patch, but I'm not entirely sure. Basically you have to pull the information about skipping one level up, out of the node, into the generic IndexNext code.

I'm eager to get some form of skip scans into master - any kind of patch that makes this possible is fine by me. Long term I think my version provides a more generic approach, with which we can optimize a much broader range of queries. However, since many more eyes have seen your patch so far, I hope yours can be committed much sooner. My knowledge on this committer process is limited though. That's why I've just posted mine so far in the hope of collecting some feedback, also on how we should continue with the process.

-Floris

#70

Dmitry Dolgov

9erthalion6@gmail.com

almost 6 years ago

In reply to: Floris Van Nee (#69)

2 attachment(s)

Re: Index Skip Scan

On Mon, Apr 06, 2020 at 06:31:08PM +0000, Floris Van Nee wrote:

Hm, I wasn't aware of this one, thanks for bringing this up. Btw, Floris, I
would appreciate it if in the future you could make it more visible that the
changes you suggest contain some fixes. E.g. it wasn't clear to me from your
previous email that that's the case, and it doesn't make sense to pull in
different directions when we're trying to achieve the same goal :)

I wasn't aware that this particular case could be triggered before I saw Dilip's email, otherwise I'd have mentioned it here of course. It's just that because my patch handles filter conditions in general, it works for this case too.

Oh, then fortunately I got the wrong impression; sorry, and thanks for
the clarification :)

In the patch I posted a week ago these cases are all handled
correctly, as it introduces this extra logic in the Executor.

Okay, so I think we can merge those fixes into Dmitry's patch set.

I'll definitely take a look at the suggested changes in the filtering part.

It may be possible to just merge the filtering part into your patch, but I'm not entirely sure. Basically you have to pull the information about skipping one level up, out of the node, into the generic IndexNext code.

I was actually thinking more about just preventing skip scan in this
situation, which, if I'm not mistaken, could be solved by inspecting
qual conditions to figure out if they're covered by the index -
something like in the attachments (this implementation is actually too
restrictive in this sense and will not allow e.g. expressions, that's
why I haven't bumped the patch set version for it - soon I'll post an
extended version).

Other than that, to summarize the current open points for future readers
(this thread somehow became quite big):

* Making UniqueKeys usage more generic to allow using skip scan for more
use cases (hopefully it was covered by the v33, but I still need a
confirmation from David, like blinking twice or something).

* Suspicious performance difference between different types of workload,
mentioned by Tomas (unfortunately I had no chance yet to investigate).

* Thinking about supporting conditions that are not covered by the index,
to make skipping more flexible (one of the potential next steps in the
future, as suggested by Floris).

Attachments:

v33-0001-Unique-key.patch (text/x-diff; charset=us-ascii)

From 15989c5250214fea8606a56afd1eeaf760b8723e Mon Sep 17 00:00:00 2001
From: Dmitrii Dolgov <9erthalion6@gmail.com>
Date: Tue, 24 Mar 2020 17:04:32 +0100
Subject: [PATCH v33 1/2] Unique key

Design by David Rowley.

Author: Jesper Pedersen
---
 src/backend/nodes/outfuncs.c           |  14 +++
 src/backend/nodes/print.c              |  39 +++++++
 src/backend/optimizer/path/Makefile    |   3 +-
 src/backend/optimizer/path/allpaths.c  |   8 ++
 src/backend/optimizer/path/indxpath.c  |  41 ++++++++
 src/backend/optimizer/path/pathkeys.c  |  71 +++++++++++--
 src/backend/optimizer/path/uniquekey.c | 136 +++++++++++++++++++++++++
 src/backend/optimizer/plan/planagg.c   |   1 +
 src/backend/optimizer/plan/planmain.c  |   1 +
 src/backend/optimizer/plan/planner.c   |  37 ++++++-
 src/backend/optimizer/util/pathnode.c  |  46 +++++++--
 src/include/nodes/nodes.h              |   1 +
 src/include/nodes/pathnodes.h          |  19 ++++
 src/include/nodes/print.h              |   1 +
 src/include/optimizer/pathnode.h       |   2 +
 src/include/optimizer/paths.h          |  11 ++
 16 files changed, 408 insertions(+), 23 deletions(-)
 create mode 100644 src/backend/optimizer/path/uniquekey.c

diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index d76fae44b8..16083e7a7e 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -1723,6 +1723,7 @@ _outPathInfo(StringInfo str, const Path *node)
 	WRITE_FLOAT_FIELD(startup_cost, "%.2f");
 	WRITE_FLOAT_FIELD(total_cost, "%.2f");
 	WRITE_NODE_FIELD(pathkeys);
+	WRITE_NODE_FIELD(uniquekeys);
 }
 
 /*
@@ -2205,6 +2206,7 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
 	WRITE_NODE_FIELD(eq_classes);
 	WRITE_BOOL_FIELD(ec_merging_done);
 	WRITE_NODE_FIELD(canon_pathkeys);
+	WRITE_NODE_FIELD(canon_uniquekeys);
 	WRITE_NODE_FIELD(left_join_clauses);
 	WRITE_NODE_FIELD(right_join_clauses);
 	WRITE_NODE_FIELD(full_join_clauses);
@@ -2214,6 +2216,7 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
 	WRITE_NODE_FIELD(placeholder_list);
 	WRITE_NODE_FIELD(fkey_list);
 	WRITE_NODE_FIELD(query_pathkeys);
+	WRITE_NODE_FIELD(query_uniquekeys);
 	WRITE_NODE_FIELD(group_pathkeys);
 	WRITE_NODE_FIELD(window_pathkeys);
 	WRITE_NODE_FIELD(distinct_pathkeys);
@@ -2401,6 +2404,14 @@ _outPathKey(StringInfo str, const PathKey *node)
 	WRITE_BOOL_FIELD(pk_nulls_first);
 }
 
+static void
+_outUniqueKey(StringInfo str, const UniqueKey *node)
+{
+	WRITE_NODE_TYPE("UNIQUEKEY");
+
+	WRITE_NODE_FIELD(eq_clause);
+}
+
 static void
 _outPathTarget(StringInfo str, const PathTarget *node)
 {
@@ -4092,6 +4103,9 @@ outNode(StringInfo str, const void *obj)
 			case T_PathKey:
 				_outPathKey(str, obj);
 				break;
+			case T_UniqueKey:
+				_outUniqueKey(str, obj);
+				break;
 			case T_PathTarget:
 				_outPathTarget(str, obj);
 				break;
diff --git a/src/backend/nodes/print.c b/src/backend/nodes/print.c
index 42476724d8..d286b34544 100644
--- a/src/backend/nodes/print.c
+++ b/src/backend/nodes/print.c
@@ -459,6 +459,45 @@ print_pathkeys(const List *pathkeys, const List *rtable)
 	printf(")\n");
 }
 
+/*
+ * print_uniquekeys -
+ *	  uniquekeys list of UniqueKeys
+ */
+void
+print_uniquekeys(const List *uniquekeys, const List *rtable)
+{
+	ListCell   *l;
+
+	printf("(");
+	foreach(l, uniquekeys)
+	{
+		UniqueKey *unique_key = (UniqueKey *) lfirst(l);
+		EquivalenceClass *eclass = (EquivalenceClass *) unique_key->eq_clause;
+		ListCell   *k;
+		bool		first = true;
+
+		/* chase up */
+		while (eclass->ec_merged)
+			eclass = eclass->ec_merged;
+
+		printf("(");
+		foreach(k, eclass->ec_members)
+		{
+			EquivalenceMember *mem = (EquivalenceMember *) lfirst(k);
+
+			if (first)
+				first = false;
+			else
+				printf(", ");
+			print_expr((Node *) mem->em_expr, rtable);
+		}
+		printf(")");
+		if (lnext(uniquekeys, l))
+			printf(", ");
+	}
+	printf(")\n");
+}
+
 /*
  * print_tl
  *	  print targetlist in a more legible way.
diff --git a/src/backend/optimizer/path/Makefile b/src/backend/optimizer/path/Makefile
index 1e199ff66f..63cc1505d9 100644
--- a/src/backend/optimizer/path/Makefile
+++ b/src/backend/optimizer/path/Makefile
@@ -21,6 +21,7 @@ OBJS = \
 	joinpath.o \
 	joinrels.o \
 	pathkeys.o \
-	tidpath.o
+	tidpath.o \
+	uniquekey.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 8286d9cf34..bbc13e6141 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -3954,6 +3954,14 @@ print_path(PlannerInfo *root, Path *path, int indent)
 		print_pathkeys(path->pathkeys, root->parse->rtable);
 	}
 
+	if (path->uniquekeys)
+	{
+		for (i = 0; i < indent; i++)
+			printf("\t");
+		printf("  uniquekeys: ");
+		print_uniquekeys(path->uniquekeys, root->parse->rtable);
+	}
+
 	if (join)
 	{
 		JoinPath   *jp = (JoinPath *) path;
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 2a50272da6..363f5349f1 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -189,6 +189,7 @@ static Expr *match_clause_to_ordering_op(IndexOptInfo *index,
 static bool ec_member_matches_indexcol(PlannerInfo *root, RelOptInfo *rel,
 									   EquivalenceClass *ec, EquivalenceMember *em,
 									   void *arg);
+static List *get_uniquekeys_for_index(PlannerInfo *root, List *pathkeys);
 
 
 /*
@@ -874,6 +875,7 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
 	List	   *orderbyclausecols;
 	List	   *index_pathkeys;
 	List	   *useful_pathkeys;
+	List	   *useful_uniquekeys = NIL;
 	bool		found_lower_saop_clause;
 	bool		pathkeys_possibly_useful;
 	bool		index_is_ordered;
@@ -1036,11 +1038,15 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
 	if (index_clauses != NIL || useful_pathkeys != NIL || useful_predicate ||
 		index_only_scan)
 	{
+		if (has_useful_uniquekeys(root))
+			useful_uniquekeys = get_uniquekeys_for_index(root, useful_pathkeys);
+
 		ipath = create_index_path(root, index,
 								  index_clauses,
 								  orderbyclauses,
 								  orderbyclausecols,
 								  useful_pathkeys,
+								  useful_uniquekeys,
 								  index_is_ordered ?
 								  ForwardScanDirection :
 								  NoMovementScanDirection,
@@ -1063,6 +1069,7 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
 									  orderbyclauses,
 									  orderbyclausecols,
 									  useful_pathkeys,
+									  useful_uniquekeys,
 									  index_is_ordered ?
 									  ForwardScanDirection :
 									  NoMovementScanDirection,
@@ -1093,11 +1100,15 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
 													index_pathkeys);
 		if (useful_pathkeys != NIL)
 		{
+			if (has_useful_uniquekeys(root))
+				useful_uniquekeys = get_uniquekeys_for_index(root, useful_pathkeys);
+
 			ipath = create_index_path(root, index,
 									  index_clauses,
 									  NIL,
 									  NIL,
 									  useful_pathkeys,
+									  useful_uniquekeys,
 									  BackwardScanDirection,
 									  index_only_scan,
 									  outer_relids,
@@ -1115,6 +1126,7 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
 										  NIL,
 										  NIL,
 										  useful_pathkeys,
+										  useful_uniquekeys,
 										  BackwardScanDirection,
 										  index_only_scan,
 										  outer_relids,
@@ -3365,6 +3377,35 @@ match_clause_to_ordering_op(IndexOptInfo *index,
 	return clause;
 }
 
+/*
+ * get_uniquekeys_for_index
+ */
+static List *
+get_uniquekeys_for_index(PlannerInfo *root, List *pathkeys)
+{
+	ListCell *lc;
+
+	if (pathkeys)
+	{
+		List *uniquekeys = NIL;
+		foreach(lc, pathkeys)
+		{
+			UniqueKey *unique_key;
+			PathKey *pk = (PathKey *) lfirst(lc);
+			EquivalenceClass *ec = (EquivalenceClass *) pk->pk_eclass;
+
+			unique_key = makeNode(UniqueKey);
+			unique_key->eq_clause = ec;
+
+			uniquekeys = lappend(uniquekeys, unique_key);
+		}
+
+		if (uniquekeys_contained_in(root->canon_uniquekeys, uniquekeys))
+			return uniquekeys;
+	}
+
+	return NIL;
+}
 
 /****************************************************************************
  *				----  ROUTINES TO DO PARTIAL INDEX PREDICATE TESTS	----
diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index 71b9d42c99..054df9a617 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -29,6 +29,7 @@
 #include "utils/lsyscache.h"
 
 
+static bool pathkey_is_unique(PathKey *new_pathkey, List *pathkeys);
 static bool pathkey_is_redundant(PathKey *new_pathkey, List *pathkeys);
 static bool matches_boolean_partition_clause(RestrictInfo *rinfo,
 											 RelOptInfo *partrel,
@@ -96,6 +97,29 @@ make_canonical_pathkey(PlannerInfo *root,
 	return pk;
 }
 
+/*
+ * pathkey_is_unique
+ *	   Returns true if the new pathkey's equivalence class differs from that
+ *     of every existing member of the pathkey list.
+ */
+static bool
+pathkey_is_unique(PathKey *new_pathkey, List *pathkeys)
+{
+	EquivalenceClass *new_ec = new_pathkey->pk_eclass;
+	ListCell   *lc;
+
+	/* If the same EC is already in the list, then not unique */
+	foreach(lc, pathkeys)
+	{
+		PathKey    *old_pathkey = (PathKey *) lfirst(lc);
+
+		if (new_ec == old_pathkey->pk_eclass)
+			return false;
+	}
+
+	return true;
+}
+
 /*
  * pathkey_is_redundant
  *	   Is a pathkey redundant with one already in the given list?
@@ -135,22 +159,12 @@ static bool
 pathkey_is_redundant(PathKey *new_pathkey, List *pathkeys)
 {
 	EquivalenceClass *new_ec = new_pathkey->pk_eclass;
-	ListCell   *lc;
 
 	/* Check for EC containing a constant --- unconditionally redundant */
 	if (EC_MUST_BE_REDUNDANT(new_ec))
 		return true;
 
-	/* If same EC already used in list, then redundant */
-	foreach(lc, pathkeys)
-	{
-		PathKey    *old_pathkey = (PathKey *) lfirst(lc);
-
-		if (new_ec == old_pathkey->pk_eclass)
-			return true;
-	}
-
-	return false;
+	return !pathkey_is_unique(new_pathkey, pathkeys);
 }
 
 /*
@@ -1098,6 +1112,41 @@ make_pathkeys_for_sortclauses(PlannerInfo *root,
 	return pathkeys;
 }
 
+/*
+ * make_pathkeys_for_uniquekeys
+ *		Generate a pathkeys list to be used for uniquekey clauses
+ */
+List *
+make_pathkeys_for_uniquekeys(PlannerInfo *root,
+							 List *sortclauses,
+							 List *tlist)
+{
+	List	   *pathkeys = NIL;
+	ListCell   *l;
+
+	foreach(l, sortclauses)
+	{
+		SortGroupClause *sortcl = (SortGroupClause *) lfirst(l);
+		Expr	   *sortkey;
+		PathKey    *pathkey;
+
+		sortkey = (Expr *) get_sortgroupclause_expr(sortcl, tlist);
+		Assert(OidIsValid(sortcl->sortop));
+		pathkey = make_pathkey_from_sortop(root,
+										   sortkey,
+										   root->nullable_baserels,
+										   sortcl->sortop,
+										   sortcl->nulls_first,
+										   sortcl->tleSortGroupRef,
+										   true);
+
+		if (pathkey_is_unique(pathkey, pathkeys))
+			pathkeys = lappend(pathkeys, pathkey);
+	}
+
+	return pathkeys;
+}
+
 /****************************************************************************
  *		PATHKEYS AND MERGECLAUSES
  ****************************************************************************/
diff --git a/src/backend/optimizer/path/uniquekey.c b/src/backend/optimizer/path/uniquekey.c
new file mode 100644
index 0000000000..c421401d0f
--- /dev/null
+++ b/src/backend/optimizer/path/uniquekey.c
@@ -0,0 +1,136 @@
+/*-------------------------------------------------------------------------
+ *
+ * uniquekey.c
+ *	  Utilities for matching and building unique keys
+ *
+ * Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ *	  src/backend/optimizer/path/uniquekey.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "optimizer/pathnode.h"
+#include "optimizer/paths.h"
+#include "nodes/pg_list.h"
+
+static UniqueKey *make_canonical_uniquekey(PlannerInfo *root, EquivalenceClass *eclass);
+
+/*
+ * Build a list of unique keys
+ */
+List*
+build_uniquekeys(PlannerInfo *root, List *sortclauses)
+{
+	List *result = NIL;
+	List *sortkeys;
+	ListCell *l;
+
+	sortkeys = make_pathkeys_for_uniquekeys(root,
+											sortclauses,
+											root->processed_tlist);
+
+	/* Create a uniquekey and add it to the list */
+	foreach(l, sortkeys)
+	{
+		PathKey    *pathkey = (PathKey *) lfirst(l);
+		EquivalenceClass *ec = pathkey->pk_eclass;
+		UniqueKey *unique_key = make_canonical_uniquekey(root, ec);
+
+		result = lappend(result, unique_key);
+	}
+
+	return result;
+}
+
+/*
+ * uniquekeys_contained_in
+ *	  Is every key in keys2 also present in keys1 (the superset)?
+ */
+bool
+uniquekeys_contained_in(List *keys1, List *keys2)
+{
+	ListCell   *key1,
+			   *key2;
+
+	foreach(key2, keys2)
+	{
+		bool found = false;
+		UniqueKey  *uniquekey2 = (UniqueKey *) lfirst(key2);
+
+		foreach(key1, keys1)
+		{
+			UniqueKey  *uniquekey1 = (UniqueKey *) lfirst(key1);
+
+			if (uniquekey1->eq_clause == uniquekey2->eq_clause)
+				found = true;
+		}
+
+		if (!found)
+			return false;
+	}
+
+	return true;
+}
+
+/*
+ * has_useful_uniquekeys
+ *		Detect whether the planner could have any uniquekeys that are
+ *		useful.
+ */
+bool
+has_useful_uniquekeys(PlannerInfo *root)
+{
+	if (root->query_uniquekeys != NIL)
+		return true;	/* there are some */
+	return false;		/* definitely useless */
+}
+
+/*
+ * make_canonical_uniquekey
+ *	  Given the parameters for a UniqueKey, find any pre-existing matching
+ *	  uniquekey in the query's list of "canonical" uniquekeys.  Make a new
+ *	  entry if there's not one already.
+ *
+ * Note that this function must not be used until after we have completed
+ * merging EquivalenceClasses.  (We don't try to enforce that here; instead,
+ * equivclass.c will complain if a merge occurs after root->canon_uniquekeys
+ * has become nonempty.)
+ */
+static UniqueKey *
+make_canonical_uniquekey(PlannerInfo *root,
+						 EquivalenceClass *eclass)
+{
+	UniqueKey  *uk;
+	ListCell   *lc;
+	MemoryContext oldcontext;
+
+	/* The passed eclass might be non-canonical, so chase up to the top */
+	while (eclass->ec_merged)
+		eclass = eclass->ec_merged;
+
+	foreach(lc, root->canon_uniquekeys)
+	{
+		uk = (UniqueKey *) lfirst(lc);
+		if (eclass == uk->eq_clause)
+			return uk;
+	}
+
+	/*
+	 * Be sure canonical uniquekeys are allocated in the main planning context.
+	 * Not an issue in normal planning, but it is for GEQO.
+	 */
+	oldcontext = MemoryContextSwitchTo(root->planner_cxt);
+
+	uk = makeNode(UniqueKey);
+	uk->eq_clause = eclass;
+
+	root->canon_uniquekeys = lappend(root->canon_uniquekeys, uk);
+
+	MemoryContextSwitchTo(oldcontext);
+
+	return uk;
+}
diff --git a/src/backend/optimizer/plan/planagg.c b/src/backend/optimizer/plan/planagg.c
index 8634940efc..dd64775d8f 100644
--- a/src/backend/optimizer/plan/planagg.c
+++ b/src/backend/optimizer/plan/planagg.c
@@ -511,6 +511,7 @@ minmax_qp_callback(PlannerInfo *root, void *extra)
 									  root->parse->targetList);
 
 	root->query_pathkeys = root->sort_pathkeys;
+	root->query_uniquekeys = NIL;
 }
 
 /*
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 62dfc6d44a..3a372af91b 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -70,6 +70,7 @@ query_planner(PlannerInfo *root,
 	root->join_rel_level = NULL;
 	root->join_cur_level = 0;
 	root->canon_pathkeys = NIL;
+	root->canon_uniquekeys = NIL;
 	root->left_join_clauses = NIL;
 	root->right_join_clauses = NIL;
 	root->full_join_clauses = NIL;
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index d6f2153593..a7de8476d9 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3657,15 +3657,30 @@ standard_qp_callback(PlannerInfo *root, void *extra)
 	 * much easier, since we know that the parser ensured that one is a
 	 * superset of the other.
 	 */
+	root->query_uniquekeys = NIL;
+
 	if (root->group_pathkeys)
+	{
 		root->query_pathkeys = root->group_pathkeys;
+
+		if (!root->parse->hasAggs)
+			root->query_uniquekeys = build_uniquekeys(root, qp_extra->groupClause);
+	}
 	else if (root->window_pathkeys)
 		root->query_pathkeys = root->window_pathkeys;
 	else if (list_length(root->distinct_pathkeys) >
 			 list_length(root->sort_pathkeys))
+	{
 		root->query_pathkeys = root->distinct_pathkeys;
+		root->query_uniquekeys = build_uniquekeys(root, parse->distinctClause);
+	}
 	else if (root->sort_pathkeys)
+	{
 		root->query_pathkeys = root->sort_pathkeys;
+
+		if (root->distinct_pathkeys)
+			root->query_uniquekeys = build_uniquekeys(root, parse->distinctClause);
+	}
 	else
 		root->query_pathkeys = NIL;
 }
@@ -6222,7 +6237,7 @@ plan_cluster_use_sort(Oid tableOid, Oid indexOid)
 
 	/* Estimate the cost of index scan */
 	indexScanPath = create_index_path(root, indexInfo,
-									  NIL, NIL, NIL, NIL,
+									  NIL, NIL, NIL, NIL, NIL,
 									  ForwardScanDirection, false,
 									  NULL, 1.0, false);
 
@@ -7107,6 +7122,26 @@ apply_scanjoin_target_to_paths(PlannerInfo *root,
 		}
 	}
 
+	foreach(lc, rel->unique_pathlist)
+	{
+		Path	   *subpath = (Path *) lfirst(lc);
+
+		/* Shouldn't have any parameterized paths anymore */
+		Assert(subpath->param_info == NULL);
+
+		if (tlist_same_exprs)
+			subpath->pathtarget->sortgrouprefs =
+				scanjoin_target->sortgrouprefs;
+		else
+		{
+			Path	   *newpath;
+
+			newpath = (Path *) create_projection_path(root, rel, subpath,
+													  scanjoin_target);
+			lfirst(lc) = newpath;
+		}
+	}
+
 	/*
 	 * Now, if final scan/join target contains SRFs, insert ProjectSetPath(s)
 	 * atop each existing path.  (Note that this function doesn't look at the
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index e6d08aede5..a4dfafbb59 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -361,9 +361,9 @@ set_cheapest(RelOptInfo *parent_rel)
 }
 
 /*
- * add_path
+ * add_path_to
  *	  Consider a potential implementation path for the specified parent rel,
- *	  and add it to the rel's pathlist if it is worthy of consideration.
+ *	  and add it to the specified pathlist if it is worthy of consideration.
  *	  A path is worthy if it has a better sort order (better pathkeys) or
  *	  cheaper cost (on either dimension), or generates fewer rows, than any
  *	  existing path that has the same or superset parameterization rels.
@@ -416,10 +416,10 @@ set_cheapest(RelOptInfo *parent_rel)
  * 'parent_rel' is the relation entry to which the path corresponds.
  * 'new_path' is a potential path for parent_rel.
  *
- * Returns nothing, but modifies parent_rel->pathlist.
+ * Returns modified pathlist.
  */
-void
-add_path(RelOptInfo *parent_rel, Path *new_path)
+static List *
+add_path_to(RelOptInfo *parent_rel, List *pathlist, Path *new_path)
 {
 	bool		accept_new = true;	/* unless we find a superior old path */
 	int			insert_at = 0;	/* where to insert new item */
@@ -440,7 +440,7 @@ add_path(RelOptInfo *parent_rel, Path *new_path)
 	 * for more than one old path to be tossed out because new_path dominates
 	 * it.
 	 */
-	foreach(p1, parent_rel->pathlist)
+	foreach(p1, pathlist)
 	{
 		Path	   *old_path = (Path *) lfirst(p1);
 		bool		remove_old = false; /* unless new proves superior */
@@ -584,8 +584,7 @@ add_path(RelOptInfo *parent_rel, Path *new_path)
 		 */
 		if (remove_old)
 		{
-			parent_rel->pathlist = foreach_delete_current(parent_rel->pathlist,
-														  p1);
+			pathlist = foreach_delete_current(pathlist, p1);
 
 			/*
 			 * Delete the data pointed-to by the deleted cell, if possible
@@ -612,8 +611,7 @@ add_path(RelOptInfo *parent_rel, Path *new_path)
 	if (accept_new)
 	{
 		/* Accept the new path: insert it at proper place in pathlist */
-		parent_rel->pathlist =
-			list_insert_nth(parent_rel->pathlist, insert_at, new_path);
+		pathlist = list_insert_nth(pathlist, insert_at, new_path);
 	}
 	else
 	{
@@ -621,6 +619,15 @@ add_path(RelOptInfo *parent_rel, Path *new_path)
 		if (!IsA(new_path, IndexPath))
 			pfree(new_path);
 	}
+
+	return pathlist;
+}
+
+void
+add_path(RelOptInfo *parent_rel, Path *new_path)
+{
+	parent_rel->pathlist = add_path_to(parent_rel,
+									   parent_rel->pathlist, new_path);
 }
 
 /*
@@ -915,6 +922,13 @@ add_partial_path_precheck(RelOptInfo *parent_rel, Cost total_cost,
 	return true;
 }
 
+void
+add_unique_path(RelOptInfo *parent_rel, Path *new_path)
+{
+	parent_rel->unique_pathlist = add_path_to(parent_rel,
+											  parent_rel->unique_pathlist,
+											  new_path);
+}
 
 /*****************************************************************************
  *		PATH NODE CREATION ROUTINES
@@ -940,6 +954,7 @@ create_seqscan_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->parallel_safe = rel->consider_parallel;
 	pathnode->parallel_workers = parallel_workers;
 	pathnode->pathkeys = NIL;	/* seqscan has unordered result */
+	pathnode->uniquekeys = NIL;
 
 	cost_seqscan(pathnode, root, rel, pathnode->param_info);
 
@@ -964,6 +979,7 @@ create_samplescan_path(PlannerInfo *root, RelOptInfo *rel, Relids required_outer
 	pathnode->parallel_safe = rel->consider_parallel;
 	pathnode->parallel_workers = 0;
 	pathnode->pathkeys = NIL;	/* samplescan has unordered result */
+	pathnode->uniquekeys = NIL;
 
 	cost_samplescan(pathnode, root, rel, pathnode->param_info);
 
@@ -1000,6 +1016,7 @@ create_index_path(PlannerInfo *root,
 				  List *indexorderbys,
 				  List *indexorderbycols,
 				  List *pathkeys,
+				  List *uniquekeys,
 				  ScanDirection indexscandir,
 				  bool indexonly,
 				  Relids required_outer,
@@ -1018,6 +1035,7 @@ create_index_path(PlannerInfo *root,
 	pathnode->path.parallel_safe = rel->consider_parallel;
 	pathnode->path.parallel_workers = 0;
 	pathnode->path.pathkeys = pathkeys;
+	pathnode->path.uniquekeys = uniquekeys;
 
 	pathnode->indexinfo = index;
 	pathnode->indexclauses = indexclauses;
@@ -1061,6 +1079,7 @@ create_bitmap_heap_path(PlannerInfo *root,
 	pathnode->path.parallel_safe = rel->consider_parallel;
 	pathnode->path.parallel_workers = parallel_degree;
 	pathnode->path.pathkeys = NIL;	/* always unordered */
+	pathnode->path.uniquekeys = NIL;
 
 	pathnode->bitmapqual = bitmapqual;
 
@@ -1922,6 +1941,7 @@ create_functionscan_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->parallel_safe = rel->consider_parallel;
 	pathnode->parallel_workers = 0;
 	pathnode->pathkeys = pathkeys;
+	pathnode->uniquekeys = NIL;
 
 	cost_functionscan(pathnode, root, rel, pathnode->param_info);
 
@@ -1948,6 +1968,7 @@ create_tablefuncscan_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->parallel_safe = rel->consider_parallel;
 	pathnode->parallel_workers = 0;
 	pathnode->pathkeys = NIL;	/* result is always unordered */
+	pathnode->uniquekeys = NIL;
 
 	cost_tablefuncscan(pathnode, root, rel, pathnode->param_info);
 
@@ -1974,6 +1995,7 @@ create_valuesscan_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->parallel_safe = rel->consider_parallel;
 	pathnode->parallel_workers = 0;
 	pathnode->pathkeys = NIL;	/* result is always unordered */
+	pathnode->uniquekeys = NIL;
 
 	cost_valuesscan(pathnode, root, rel, pathnode->param_info);
 
@@ -1999,6 +2021,7 @@ create_ctescan_path(PlannerInfo *root, RelOptInfo *rel, Relids required_outer)
 	pathnode->parallel_safe = rel->consider_parallel;
 	pathnode->parallel_workers = 0;
 	pathnode->pathkeys = NIL;	/* XXX for now, result is always unordered */
+	pathnode->uniquekeys = NIL;
 
 	cost_ctescan(pathnode, root, rel, pathnode->param_info);
 
@@ -2025,6 +2048,7 @@ create_namedtuplestorescan_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->parallel_safe = rel->consider_parallel;
 	pathnode->parallel_workers = 0;
 	pathnode->pathkeys = NIL;	/* result is always unordered */
+	pathnode->uniquekeys = NIL;
 
 	cost_namedtuplestorescan(pathnode, root, rel, pathnode->param_info);
 
@@ -2051,6 +2075,7 @@ create_resultscan_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->parallel_safe = rel->consider_parallel;
 	pathnode->parallel_workers = 0;
 	pathnode->pathkeys = NIL;	/* result is always unordered */
+	pathnode->uniquekeys = NIL;
 
 	cost_resultscan(pathnode, root, rel, pathnode->param_info);
 
@@ -2077,6 +2102,7 @@ create_worktablescan_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->parallel_safe = rel->consider_parallel;
 	pathnode->parallel_workers = 0;
 	pathnode->pathkeys = NIL;	/* result is always unordered */
+	pathnode->uniquekeys = NIL;
 
 	/* Cost is the same as for a regular CTE scan */
 	cost_ctescan(pathnode, root, rel, pathnode->param_info);
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index baced7eec0..a1511b46ea 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -261,6 +261,7 @@ typedef enum NodeTag
 	T_EquivalenceMember,
 	T_PathKey,
 	T_PathTarget,
+	T_UniqueKey,
 	T_RestrictInfo,
 	T_IndexClause,
 	T_PlaceHolderVar,
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 3d3be197e0..0de27f0ef3 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -269,6 +269,8 @@ struct PlannerInfo
 
 	List	   *canon_pathkeys; /* list of "canonical" PathKeys */
 
+	List	   *canon_uniquekeys; /* list of "canonical" UniqueKeys */
+
 	List	   *left_join_clauses;	/* list of RestrictInfos for mergejoinable
 									 * outer join clauses w/nonnullable var on
 									 * left */
@@ -297,6 +299,8 @@ struct PlannerInfo
 
 	List	   *query_pathkeys; /* desired pathkeys for query_planner() */
 
+	List	   *query_uniquekeys; /* unique keys used for the query */
+
 	List	   *group_pathkeys; /* groupClause pathkeys, if any */
 	List	   *window_pathkeys;	/* pathkeys of bottom window, if any */
 	List	   *distinct_pathkeys;	/* distinctClause pathkeys, if any */
@@ -657,6 +661,7 @@ typedef struct RelOptInfo
 	List	   *pathlist;		/* Path structures */
 	List	   *ppilist;		/* ParamPathInfos used in pathlist */
 	List	   *partial_pathlist;	/* partial Paths */
+	List	   *unique_pathlist;	/* unique Paths */
 	struct Path *cheapest_startup_path;
 	struct Path *cheapest_total_path;
 	struct Path *cheapest_unique_path;
@@ -1077,6 +1082,15 @@ typedef struct ParamPathInfo
 	List	   *ppi_clauses;	/* join clauses available from outer rels */
 } ParamPathInfo;
 
+/*
+ * UniqueKey
+ */
+typedef struct UniqueKey
+{
+	NodeTag		type;
+
+	EquivalenceClass *eq_clause;	/* equivalence class */
+} UniqueKey;
 
 /*
  * Type "Path" is used as-is for sequential-scan paths, as well as some other
@@ -1106,6 +1120,9 @@ typedef struct ParamPathInfo
  *
  * "pathkeys" is a List of PathKey nodes (see above), describing the sort
  * ordering of the path's output rows.
+ *
+ * "uniquekeys", if not NIL, is a list of UniqueKey nodes (see above),
+ * describing the equivalence classes over which the path's output is unique.
  */
 typedef struct Path
 {
@@ -1129,6 +1146,8 @@ typedef struct Path
 
 	List	   *pathkeys;		/* sort ordering of path's output */
 	/* pathkeys is a List of PathKey nodes; see above */
+
+	List	   *uniquekeys;	/* the unique keys, or NIL if none */
 } Path;
 
 /* Macro for extracting a path's parameterization relids; beware double eval */
diff --git a/src/include/nodes/print.h b/src/include/nodes/print.h
index 6126b491bf..006248bfb5 100644
--- a/src/include/nodes/print.h
+++ b/src/include/nodes/print.h
@@ -28,6 +28,7 @@ extern char *pretty_format_node_dump(const char *dump);
 extern void print_rt(const List *rtable);
 extern void print_expr(const Node *expr, const List *rtable);
 extern void print_pathkeys(const List *pathkeys, const List *rtable);
+extern void print_uniquekeys(const List *uniquekeys, const List *rtable);
 extern void print_tl(const List *tlist, const List *rtable);
 extern void print_slot(TupleTableSlot *slot);
 
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index e450fe112a..fd25997af5 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -34,6 +34,7 @@ extern void add_partial_path(RelOptInfo *parent_rel, Path *new_path);
 extern bool add_partial_path_precheck(RelOptInfo *parent_rel,
 									  Cost total_cost, List *pathkeys);
 
+extern void add_unique_path(RelOptInfo *parent_rel, Path *new_path);
 extern Path *create_seqscan_path(PlannerInfo *root, RelOptInfo *rel,
 								 Relids required_outer, int parallel_workers);
 extern Path *create_samplescan_path(PlannerInfo *root, RelOptInfo *rel,
@@ -44,6 +45,7 @@ extern IndexPath *create_index_path(PlannerInfo *root,
 									List *indexorderbys,
 									List *indexorderbycols,
 									List *pathkeys,
+									List *uniquekeys,
 									ScanDirection indexscandir,
 									bool indexonly,
 									Relids required_outer,
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 9ab73bd20c..5b6be383b3 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -214,6 +214,9 @@ extern List *build_join_pathkeys(PlannerInfo *root,
 extern List *make_pathkeys_for_sortclauses(PlannerInfo *root,
 										   List *sortclauses,
 										   List *tlist);
+extern List *make_pathkeys_for_uniquekeys(PlannerInfo *root,
+										  List *sortclauses,
+										  List *tlist);
 extern void initialize_mergeclause_eclasses(PlannerInfo *root,
 											RestrictInfo *restrictinfo);
 extern void update_mergeclause_eclasses(PlannerInfo *root,
@@ -240,4 +243,12 @@ extern PathKey *make_canonical_pathkey(PlannerInfo *root,
 extern void add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
 									List *live_childrels);
 
+/*
+ * uniquekey.c
+ *	  Utilities for matching and building unique keys
+ */
+extern List *build_uniquekeys(PlannerInfo *root, List *sortclauses);
+extern bool uniquekeys_contained_in(List *keys1, List *keys2);
+extern bool has_useful_uniquekeys(PlannerInfo *root);
+
 #endif							/* PATHS_H */
-- 
2.21.0
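
For readers following the uniquekey.c part of the patch above: the header comment of uniquekeys_contained_in() describes a subset test, where keys are compared by the identity of the EquivalenceClass pointer they carry. Below is a minimal standalone sketch of that test, assuming plain arrays instead of PostgreSQL's List. DemoKey and demo_keys_contained_in are hypothetical names used only for illustration; they are not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for UniqueKey: only the identity of the
 * equivalence-class pointer matters for the comparison.
 */
typedef struct DemoKey
{
    const void *eq_clause;
} DemoKey;

/*
 * Return true iff every key in keys2 has a pointer-equal match in keys1,
 * i.e. keys1 is a superset of keys2.  An empty keys2 is trivially contained.
 */
static bool
demo_keys_contained_in(const DemoKey *keys1, size_t n1,
                       const DemoKey *keys2, size_t n2)
{
    assert(keys1 != NULL || n1 == 0);
    assert(keys2 != NULL || n2 == 0);

    for (size_t j = 0; j < n2; j++)
    {
        bool found = false;

        for (size_t i = 0; i < n1; i++)
        {
            if (keys1[i].eq_clause == keys2[j].eq_clause)
            {
                found = true;
                break;          /* this key is covered, check the next one */
            }
        }

        /* A single uncovered key is enough to reject the whole list. */
        if (!found)
            return false;
    }

    return true;
}
```

The interesting case is a keys2 list whose first key is covered but whose second is not: the answer has to be false, so the check cannot stop at the first match.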

v33-0002-Index-skip-scan-with-filtering.patch (text/x-diff; charset=us-ascii)

From a44f8bf868133f0a342bf8de477d650a455efac7 Mon Sep 17 00:00:00 2001
From: jesperpedersen <jesper.pedersen@redhat.com>
Date: Fri, 15 Nov 2019 09:46:53 -0500
Subject: [PATCH v33 2/2] Index skip scan

Implementation of Index Skip Scan (see Loose Index Scan in the wiki [1])
on top of IndexOnlyScan and IndexScan. To make it suitable both when
there is a small number of distinct values and when there is a large
number of them, the following approach is taken: instead of searching
from the root for every value, we first search on the current page, and
only if the value is not found there do we continue searching from the
root.

Original patch and design were proposed by Thomas Munro [2], revived and
improved by Dmitry Dolgov and Jesper Pedersen.

[1] https://wiki.postgresql.org/wiki/Loose_indexscan
[2] https://www.postgresql.org/message-id/flat/CADLWmXXbTSBxP-MzJuPAYSsL_2f0iPm5VWPbCvDbVvfX93FKkw%40mail.gmail.com

Author: Jesper Pedersen, Dmitry Dolgov
Reviewed-by: Thomas Munro, David Rowley, Floris Van Nee, Kyotaro Horiguchi, Tomas Vondra, Peter Geoghegan
---
 contrib/bloom/blutils.c                       |   1 +
 doc/src/sgml/config.sgml                      |  15 +
 doc/src/sgml/indexam.sgml                     |  63 ++
 doc/src/sgml/indices.sgml                     |  23 +
 src/backend/access/brin/brin.c                |   1 +
 src/backend/access/gin/ginutil.c              |   1 +
 src/backend/access/gist/gist.c                |   1 +
 src/backend/access/hash/hash.c                |   1 +
 src/backend/access/index/indexam.c            |  18 +
 src/backend/access/nbtree/nbtree.c            |  13 +
 src/backend/access/nbtree/nbtsearch.c         | 469 ++++++++++++-
 src/backend/access/spgist/spgutils.c          |   1 +
 src/backend/commands/explain.c                |  25 +
 src/backend/executor/nodeIndexonlyscan.c      |  97 ++-
 src/backend/executor/nodeIndexscan.c          |  56 +-
 src/backend/nodes/copyfuncs.c                 |   2 +
 src/backend/nodes/outfuncs.c                  |   2 +
 src/backend/nodes/readfuncs.c                 |   2 +
 src/backend/optimizer/path/costsize.c         |   1 +
 src/backend/optimizer/path/indxpath.c         |  78 +++
 src/backend/optimizer/plan/createplan.c       |  20 +-
 src/backend/optimizer/plan/planner.c          |  10 +-
 src/backend/optimizer/util/pathnode.c         |  68 ++
 src/backend/optimizer/util/plancat.c          |   1 +
 src/backend/utils/misc/guc.c                  |   9 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/access/amapi.h                    |   8 +
 src/include/access/genam.h                    |   2 +
 src/include/access/nbtree.h                   |   7 +
 src/include/access/sdir.h                     |   7 +
 src/include/nodes/execnodes.h                 |   6 +
 src/include/nodes/pathnodes.h                 |   5 +
 src/include/nodes/plannodes.h                 |   4 +
 src/include/optimizer/cost.h                  |   1 +
 src/include/optimizer/pathnode.h              |   3 +
 src/test/regress/expected/select_distinct.out | 621 ++++++++++++++++++
 src/test/regress/expected/sysviews.out        |   3 +-
 src/test/regress/sql/select_distinct.sql      | 254 +++++++
 38 files changed, 1886 insertions(+), 14 deletions(-)
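
The strategy from the commit message above (look on the current page first, and only fall back to a search from the root when that fails) can be illustrated outside of nbtree. The following is a rough sketch under simplifying assumptions, not the patch's implementation: the "index" is a sorted int array, a "page" is a fixed-size slice of it, and "searching from the root" is modelled as a binary search over everything after the current page. PAGE_SIZE, upper_bound, skip_to_next_distinct and collect_distinct are all hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE 4             /* hypothetical number of keys per "page" */

/* First index in [lo, hi) whose key is greater than key, or hi if none. */
static size_t
upper_bound(const int *keys, size_t lo, size_t hi, int key)
{
    while (lo < hi)
    {
        size_t mid = lo + (hi - lo) / 2;

        if (keys[mid] <= key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/*
 * Return the position of the first key greater than keys[pos], or n if
 * there is none.  *descents counts how often the cheap on-page lookup
 * failed and the full search was needed.
 */
static size_t
skip_to_next_distinct(const int *keys, size_t n, size_t pos, int *descents)
{
    size_t page_end = (pos / PAGE_SIZE + 1) * PAGE_SIZE;
    size_t next;

    assert(pos < n);
    if (page_end > n)
        page_end = n;

    /* Cheap attempt: look only at the rest of the current page. */
    next = upper_bound(keys, pos + 1, page_end, keys[pos]);
    if (next < page_end)
        return next;

    /* Not on this page: search everything after it ("from the root"). */
    (*descents)++;
    return upper_bound(keys, page_end, n, keys[pos]);
}

/* Write one entry per distinct key into out; return how many were written. */
static size_t
collect_distinct(const int *keys, size_t n, int *out, int *descents)
{
    size_t count = 0;
    size_t pos = 0;

    *descents = 0;
    while (pos < n)
    {
        out[count++] = keys[pos];
        pos = skip_to_next_distinct(keys, n, pos, descents);
    }
    return count;
}
```

With many distinct values the next value is usually on the same page and no full search happens; with few distinct values each skip jumps over many pages in a single search. That trade-off is the reason for trying the page first.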

diff --git a/contrib/bloom/blutils.c b/contrib/bloom/blutils.c
index 0104d02f67..a018b7f3d0 100644
--- a/contrib/bloom/blutils.c
+++ b/contrib/bloom/blutils.c
@@ -133,6 +133,7 @@ blhandler(PG_FUNCTION_ARGS)
 	amroutine->ambulkdelete = blbulkdelete;
 	amroutine->amvacuumcleanup = blvacuumcleanup;
 	amroutine->amcanreturn = NULL;
+	amroutine->amskip = NULL;
 	amroutine->amcostestimate = blcostestimate;
 	amroutine->amoptions = bloptions;
 	amroutine->amproperty = NULL;
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index e07dc01e80..36ba75b077 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4517,6 +4517,21 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-enable-indexskipscan" xreflabel="enable_indexskipscan">
+      <term><varname>enable_indexskipscan</varname> (<type>boolean</type>)
+      <indexterm>
+       <primary><varname>enable_indexskipscan</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Enables or disables the query planner's use of index-skip-scan plan
+        types (see <xref linkend="indexes-index-skip-scans"/>). The default is
+        <literal>on</literal>.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-enable-material" xreflabel="enable_material">
       <term><varname>enable_material</varname> (<type>boolean</type>)
       <indexterm>
diff --git a/doc/src/sgml/indexam.sgml b/doc/src/sgml/indexam.sgml
index 37f8d8760a..a726d80878 100644
--- a/doc/src/sgml/indexam.sgml
+++ b/doc/src/sgml/indexam.sgml
@@ -148,6 +148,7 @@ typedef struct IndexAmRoutine
     amendscan_function amendscan;
     ammarkpos_function ammarkpos;       /* can be NULL */
     amrestrpos_function amrestrpos;     /* can be NULL */
+    amskip_function amskip;             /* can be NULL */
 
     /* interface functions to support parallel index scans */
     amestimateparallelscan_function amestimateparallelscan;    /* can be NULL */
@@ -691,6 +692,68 @@ amrestrpos (IndexScanDesc scan);
 
   <para>
 <programlisting>
+bool
+amskip (IndexScanDesc scan,
+        ScanDirection direction,
+        ScanDirection indexdir,
+        bool scanstart,
+        int prefix);
+</programlisting>
+  Skip past all tuples where the first 'prefix' columns have the same value as
+  the last tuple returned in the current scan. The arguments are:
+
+   <variablelist>
+    <varlistentry>
+     <term><parameter>scan</parameter></term>
+     <listitem>
+      <para>
+       Index scan information
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><parameter>direction</parameter></term>
+     <listitem>
+      <para>
+       The direction in which data is advancing.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><parameter>indexdir</parameter></term>
+     <listitem>
+      <para>
+        The direction in which the index itself must be read.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><parameter>scanstart</parameter></term>
+     <listitem>
+      <para>
+        Whether this call is made at the start of the scan.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><parameter>prefix</parameter></term>
+     <listitem>
+      <para>
+        The number of leading index columns that make up the distinct prefix.
+      </para>
+     </listitem>
+    </varlistentry>
+
+   </variablelist>
+
+  </para>
+
+  <para>
+<programlisting>
 Size
 amestimateparallelscan (void);
 </programlisting>
diff --git a/doc/src/sgml/indices.sgml b/doc/src/sgml/indices.sgml
index c54bf0dbbd..c429d98fc7 100644
--- a/doc/src/sgml/indices.sgml
+++ b/doc/src/sgml/indices.sgml
@@ -1254,6 +1254,29 @@ SELECT target FROM tests WHERE subject = 'some-subject' AND success;
    and later will recognize such cases and allow index-only scans to be
    generated, but older versions will not.
   </para>
+
+  <sect2 id="indexes-index-skip-scans">
+    <title>Index Skip Scans</title>
+
+    <indexterm zone="indexes-index-skip-scans">
+      <primary>index</primary>
+      <secondary>index-skip scans</secondary>
+    </indexterm>
+    <indexterm zone="indexes-index-skip-scans">
+      <primary>index-skip scan</primary>
+    </indexterm>
+
+    <para>
+     When the rows retrieved from an index scan are then deduplicated by
+     eliminating rows matching on a prefix of index keys (e.g. when using
+     <literal>SELECT DISTINCT</literal>), the planner will consider
+     skipping groups of rows with a matching key prefix. When a row with
+     a particular prefix is found, remaining rows with the same key prefix
+     are skipped.  The larger the number of rows with the same key prefix
+     (i.e. the lower the number of distinct key prefixes in the index),
+     the more efficient this is.
+    </para>
+  </sect2>
  </sect1>
 
 
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 2e8f67ef10..4db31bb211 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -113,6 +113,7 @@ brinhandler(PG_FUNCTION_ARGS)
 	amroutine->ambulkdelete = brinbulkdelete;
 	amroutine->amvacuumcleanup = brinvacuumcleanup;
 	amroutine->amcanreturn = NULL;
+	amroutine->amskip = NULL;
 	amroutine->amcostestimate = brincostestimate;
 	amroutine->amoptions = brinoptions;
 	amroutine->amproperty = NULL;
diff --git a/src/backend/access/gin/ginutil.c b/src/backend/access/gin/ginutil.c
index a7e55caf28..8dd1d30d2a 100644
--- a/src/backend/access/gin/ginutil.c
+++ b/src/backend/access/gin/ginutil.c
@@ -65,6 +65,7 @@ ginhandler(PG_FUNCTION_ARGS)
 	amroutine->ambulkdelete = ginbulkdelete;
 	amroutine->amvacuumcleanup = ginvacuumcleanup;
 	amroutine->amcanreturn = NULL;
+	amroutine->amskip = NULL;
 	amroutine->amcostestimate = gincostestimate;
 	amroutine->amoptions = ginoptions;
 	amroutine->amproperty = NULL;
diff --git a/src/backend/access/gist/gist.c b/src/backend/access/gist/gist.c
index aefc302ed2..8c692f7fb4 100644
--- a/src/backend/access/gist/gist.c
+++ b/src/backend/access/gist/gist.c
@@ -86,6 +86,7 @@ gisthandler(PG_FUNCTION_ARGS)
 	amroutine->ambulkdelete = gistbulkdelete;
 	amroutine->amvacuumcleanup = gistvacuumcleanup;
 	amroutine->amcanreturn = gistcanreturn;
+	amroutine->amskip = NULL;
 	amroutine->amcostestimate = gistcostestimate;
 	amroutine->amoptions = gistoptions;
 	amroutine->amproperty = gistproperty;
diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c
index 4871b7ff4d..e5fa4c7864 100644
--- a/src/backend/access/hash/hash.c
+++ b/src/backend/access/hash/hash.c
@@ -83,6 +83,7 @@ hashhandler(PG_FUNCTION_ARGS)
 	amroutine->ambulkdelete = hashbulkdelete;
 	amroutine->amvacuumcleanup = hashvacuumcleanup;
 	amroutine->amcanreturn = NULL;
+	amroutine->amskip = NULL;
 	amroutine->amcostestimate = hashcostestimate;
 	amroutine->amoptions = hashoptions;
 	amroutine->amproperty = NULL;
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 01539b6bd6..1047a35ade 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -33,6 +33,7 @@
  *		index_can_return	- does index support index-only scans?
  *		index_getprocid - get a support procedure OID
  *		index_getprocinfo - get a support procedure's lookup info
+ *		index_skip		- advance past duplicate key values in a scan
  *
  * NOTES
  *		This file contains the index_ routines which used
@@ -730,6 +731,23 @@ index_can_return(Relation indexRelation, int attno)
 	return indexRelation->rd_indam->amcanreturn(indexRelation, attno);
 }
 
+/* ----------------
+ *		index_skip
+ *
+ *		Skip past all tuples where the first 'prefix' columns have the
+ *		same value as the last tuple returned in the current scan.
+ * ----------------
+ */
+bool
+index_skip(IndexScanDesc scan, ScanDirection direction,
+		   ScanDirection indexdir, bool scanstart, int prefix)
+{
+	SCAN_CHECKS;
+
+	return scan->indexRelation->rd_indam->amskip(scan, direction,
+												 indexdir, scanstart, prefix);
+}
+
 /* ----------------
  *		index_getprocid
  *
diff --git a/src/backend/access/nbtree/nbtree.c b/src/backend/access/nbtree/nbtree.c
index 5254bc7ef5..8fde56fe60 100644
--- a/src/backend/access/nbtree/nbtree.c
+++ b/src/backend/access/nbtree/nbtree.c
@@ -132,6 +132,7 @@ bthandler(PG_FUNCTION_ARGS)
 	amroutine->ambulkdelete = btbulkdelete;
 	amroutine->amvacuumcleanup = btvacuumcleanup;
 	amroutine->amcanreturn = btcanreturn;
+	amroutine->amskip = btskip;
 	amroutine->amcostestimate = btcostestimate;
 	amroutine->amoptions = btoptions;
 	amroutine->amproperty = btproperty;
@@ -381,6 +382,8 @@ btbeginscan(Relation rel, int nkeys, int norderbys)
 	 */
 	so->currTuples = so->markTuples = NULL;
 
+	so->skipScanKey = NULL;
+
 	scan->xs_itupdesc = RelationGetDescr(rel);
 
 	scan->opaque = so;
@@ -448,6 +451,16 @@ btrescan(IndexScanDesc scan, ScanKey scankey, int nscankeys,
 	_bt_preprocess_array_keys(scan);
 }
 
+/*
+ * btskip() -- skip to the beginning of the next key prefix
+ */
+bool
+btskip(IndexScanDesc scan, ScanDirection direction,
+	   ScanDirection indexdir, bool start, int prefix)
+{
+	return _bt_skip(scan, direction, indexdir, start, prefix);
+}
+
 /*
  *	btendscan() -- close down a scan
  */
diff --git a/src/backend/access/nbtree/nbtsearch.c b/src/backend/access/nbtree/nbtsearch.c
index c573814f01..e2b549355b 100644
--- a/src/backend/access/nbtree/nbtsearch.c
+++ b/src/backend/access/nbtree/nbtsearch.c
@@ -37,7 +37,10 @@ static bool _bt_parallel_readpage(IndexScanDesc scan, BlockNumber blkno,
 static Buffer _bt_walk_left(Relation rel, Buffer buf, Snapshot snapshot);
 static bool _bt_endpoint(IndexScanDesc scan, ScanDirection dir);
 static inline void _bt_initialize_more_data(BTScanOpaque so, ScanDirection dir);
-
+static inline void _bt_update_skip_scankeys(IndexScanDesc scan,
+											Relation indexRel);
+static inline bool _bt_scankey_within_page(IndexScanDesc scan, BTScanInsert key,
+										Buffer buf, ScanDirection dir);
 
 /*
  *	_bt_drop_lock_and_maybe_pin()
@@ -1375,6 +1378,419 @@ _bt_next(IndexScanDesc scan, ScanDirection dir)
 	return true;
 }
 
+/*
+ *  _bt_skip() -- Skip items that have the same prefix as the most recently
+ * 				  fetched index tuple.
+ *
+ * 		The current position is set so that a subsequent call to _bt_next will
+ * 		fetch the first tuple that differs in the leading 'prefix' keys.
+ *
+ * 		There are four different kinds of skipping (depending on dir and
+ * 		indexdir) that are important to distinguish, especially in the presence
+ * 		of an index condition:
+ *
+ * 		* Advancing forward and reading forward
+ * 			simple scan
+ *
+ * 		* Advancing forward and reading backward
+ * 			scan inside a cursor fetching backward, when skipping is necessary
+ * 			right from the start
+ *
+ * 		* Advancing backward and reading forward
+ * 			scan with order by desc inside a cursor fetching forward, when
+ * 			skipping is necessary right from the start
+ *
+ * 		* Advancing backward and reading backward
+ * 			simple scan with order by desc
+ *
+ *      The current page is searched for the next unique value. If none is found
+ *      we will do a scan from the root in order to find the next page with
+ *      a unique value.
+ */
+bool
+_bt_skip(IndexScanDesc scan, ScanDirection dir,
+		 ScanDirection indexdir, bool scanstart, int prefix)
+{
+	BTScanOpaque so = (BTScanOpaque) scan->opaque;
+	BTStack stack;
+	Buffer buf;
+	OffsetNumber offnum;
+	BTScanPosItem *currItem;
+	Relation 	 indexRel = scan->indexRelation;
+
+	/* We want to return tuples, and we need a starting point */
+	Assert(scan->xs_want_itup);
+	Assert(scan->xs_itup);
+
+	if (so->numKilled > 0)
+		_bt_killitems(scan);
+
+	/* If skipScanKey is NULL then we initialize it with _bt_mkscankey */
+	if (so->skipScanKey == NULL)
+	{
+		so->skipScanKey = _bt_mkscankey(indexRel, scan->xs_itup);
+		so->skipScanKey->keysz = prefix;
+		so->skipScanKey->scantid = NULL;
+	}
+	so->skipScanKey->nextkey = ScanDirectionIsForward(dir);
+	_bt_update_skip_scankeys(scan, indexRel);
+
+	/* Check if the next unique key can be found within the current page.
+	 * Since we do not lock the current page between jumps, it's possible
+	 * that it was split since the last time we saw it. This is fine when
+	 * scanning forward, since pages split to the right and we are still
+	 * on the leftmost page. When scanning backwards it's possible to
+	 * lose some pages, so we would need to remember the previous page
+	 * and then follow the right link from the current page until we
+	 * find the original one.
+	 *
+	 * The whole point of checking the current page is to protect
+	 * ourselves against a statistics mismatch, where there are too many
+	 * distinct values for jumping to pay off, and to make that case
+	 * faster. It's not clear whether the complexity of handling backward
+	 * scans this way is justified, so for now just avoid it.
+	 */
+	if (BufferIsValid(so->currPos.buf) && ScanDirectionIsForward(dir))
+	{
+		LockBuffer(so->currPos.buf, BT_READ);
+
+		if (_bt_scankey_within_page(scan, so->skipScanKey, so->currPos.buf, dir))
+		{
+			bool keyFound = false;
+
+			offnum = _bt_binsrch(scan->indexRelation, so->skipScanKey, so->currPos.buf);
+
+			/* Lock the page for SERIALIZABLE transactions */
+			PredicateLockPage(scan->indexRelation, BufferGetBlockNumber(so->currPos.buf),
+							  scan->xs_snapshot);
+
+			/* We know in which direction to look */
+			_bt_initialize_more_data(so, dir);
+
+			/* Now read the data */
+			keyFound = _bt_readpage(scan, dir, offnum);
+
+			LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+			ReleaseBuffer(so->currPos.buf);
+			so->currPos.buf = InvalidBuffer;
+
+			if (keyFound)
+			{
+				/* set IndexTuple */
+				currItem = &so->currPos.items[so->currPos.itemIndex];
+				scan->xs_heaptid = currItem->heapTid;
+				scan->xs_itup = (IndexTuple) (so->currTuples + currItem->tupleOffset);
+				return true;
+			}
+		}
+		else
+		{
+			LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+		}
+	}
+
+	if (BufferIsValid(so->currPos.buf))
+	{
+		ReleaseBuffer(so->currPos.buf);
+		so->currPos.buf = InvalidBuffer;
+	}
+
+	/*
+	 * We haven't found scan key within the current page, so let's scan from
+	 * the root. Use _bt_search and _bt_binsrch to get the buffer and offset
+	 * number
+	 */
+	so->skipScanKey->nextkey = ScanDirectionIsForward(dir);
+	stack = _bt_search(scan->indexRelation, so->skipScanKey,
+					   &buf, BT_READ, scan->xs_snapshot);
+	_bt_freestack(stack);
+	so->currPos.buf = buf;
+	offnum = _bt_binsrch(scan->indexRelation, so->skipScanKey, buf);
+
+	/* Lock the page for SERIALIZABLE transactions */
+	PredicateLockPage(scan->indexRelation, BufferGetBlockNumber(buf),
+					  scan->xs_snapshot);
+
+	/* We know in which direction to look */
+	_bt_initialize_more_data(so, dir);
+
+	/*
+	 * Simplest case is when both directions are forward, when we are already
+	 * at the next distinct key at the beginning of the series (so everything
+	 * else would be done in _bt_readpage)
+	 *
+	 * The case when both directions are backwards is also simple, but we need
+	 * to go one step back, since we need a last element from the previous
+	 * series.
+	 */
+	if (ScanDirectionIsBackward(dir) && ScanDirectionIsBackward(indexdir))
+		 offnum = OffsetNumberPrev(offnum);
+
+	/*
+	 * Advance backward but read forward. At this moment we are at the next
+	 * distinct key at the beginning of the series. If the scan has just
+	 * started, we can read forward without doing anything else. Otherwise
+	 * find the previous distinct key and the beginning of its series and read
+	 * forward from there. To do so, go back one step, perform binary search
+	 * to find the first item in the series and let _bt_readpage do everything
+	 * else.
+	 */
+	else if (ScanDirectionIsBackward(dir) && ScanDirectionIsForward(indexdir))
+	{
+		if (!scanstart)
+		{
+			/* Reading forward means we expect to see more data on the right */
+			so->currPos.moreRight = true;
+
+			offnum = _bt_binsrch(scan->indexRelation, so->skipScanKey, buf);
+
+			/* One step back to find a previous value */
+			_bt_readpage(scan, dir, offnum);
+
+			LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+			if (_bt_next(scan, dir))
+			{
+				LockBuffer(so->currPos.buf, BT_READ);
+				_bt_update_skip_scankeys(scan, indexRel);
+
+				/*
+				 * And now find the last item from the sequence for the
+				 * current value, with the intention of doing OffsetNumberNext.
+				 * As a result we end up on the first element of the sequence.
+				 */
+				if (_bt_scankey_within_page(scan, so->skipScanKey, so->currPos.buf, dir))
+					offnum = _bt_binsrch(scan->indexRelation, so->skipScanKey, buf);
+				else
+				{
+					if (BufferIsValid(so->currPos.buf))
+					{
+						/* Before leaving current page, deal with any killed items */
+						if (so->numKilled > 0)
+							_bt_killitems(scan);
+
+						LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+						ReleaseBuffer(so->currPos.buf);
+						so->currPos.buf = InvalidBuffer;
+					}
+
+					stack = _bt_search(scan->indexRelation, so->skipScanKey,
+									   &buf, BT_READ, scan->xs_snapshot);
+					_bt_freestack(stack);
+					so->currPos.buf = buf;
+					offnum = _bt_binsrch(scan->indexRelation, so->skipScanKey, buf);
+				}
+			}
+			else
+			{
+				pfree(so->skipScanKey);
+				so->skipScanKey = NULL;
+				return false;
+			}
+		}
+	}
+
+	/*
+	 * Advance forward but read backward. At this moment we are at the next
+	 * distinct key at the beginning of the series. If the scan has just
+	 * started, we can go one step back and read forward without doing
+	 * anything else. Otherwise find the next distinct key and the beginning
+	 * of its series, go one step back and read backward from there.
+	 *
+	 * An interesting situation can happen if one of the distinct keys does
+	 * not pass a corresponding index condition at all. In this case reading
+	 * backward can lead to a previous distinct key being found, creating a
+	 * loop. To avoid that, check the value to be returned, and jump one more
+	 * time if it's the same as at the beginning. Note that we do not check
+	 * visibility here, and dead tuples could also lead to the same situation.
+	 * This has to be checked on the caller side.
+	 */
+	else if (ScanDirectionIsForward(dir) && ScanDirectionIsBackward(indexdir))
+	{
+		if (scanstart)
+			offnum = OffsetNumberPrev(offnum);
+		else
+		{
+			OffsetNumber nextOffset,
+						startOffset,
+						jumpOffset;
+
+			IndexTuple startItup = CopyIndexTuple(scan->xs_itup);
+			Page page = BufferGetPage(so->currPos.buf);
+
+			/* We are at the end and need to return */
+			if ((offnum > PageGetMaxOffsetNumber(page)) &&
+				(so->currPos.nextPage == P_NONE))
+			{
+				LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+
+				BTScanPosUnpinIfPinned(so->currPos);
+				BTScanPosInvalidate(so->currPos);
+
+				pfree(so->skipScanKey);
+				so->skipScanKey = NULL;
+				return false;
+			}
+
+			nextOffset = startOffset = ItemPointerGetOffsetNumber(&scan->xs_itup->t_tid);
+
+			/* Reading backwards means we expect to see more data on the left */
+			so->currPos.moreLeft = true;
+
+			while (nextOffset == startOffset)
+			{
+				IndexTuple itup;
+				CHECK_FOR_INTERRUPTS();
+
+				/*
+				 * Find a next index tuple to update scan key. It could be at
+				 * the end, so check for max offset
+				 */
+				if (!_bt_readpage(scan, ForwardScanDirection, offnum))
+				{
+					/*
+					 * There's no actually-matching data on this page.  Try to
+					 * advance to the next page. Return false if there's no
+					 * matching data at all.
+					 */
+					LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+					if (!_bt_steppage(scan, dir))
+					{
+						pfree(so->skipScanKey);
+						so->skipScanKey = NULL;
+						return false;
+					}
+					LockBuffer(so->currPos.buf, BT_READ);
+				}
+
+				currItem = &so->currPos.items[so->currPos.firstItem];
+				itup = (IndexTuple) (so->currTuples + currItem->tupleOffset);
+
+				scan->xs_itup = itup;
+				so->skipScanKey->nextkey = ScanDirectionIsForward(dir);
+
+				_bt_update_skip_scankeys(scan, indexRel);
+				if (BufferIsValid(so->currPos.buf))
+				{
+					/* Before leaving current page, deal with any killed items */
+					if (so->numKilled > 0)
+						_bt_killitems(scan);
+
+					LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+					ReleaseBuffer(so->currPos.buf);
+					so->currPos.buf = InvalidBuffer;
+				}
+
+				stack = _bt_search(scan->indexRelation, so->skipScanKey,
+								   &buf, BT_READ, scan->xs_snapshot);
+				_bt_freestack(stack);
+				so->currPos.buf = buf;
+				jumpOffset = offnum = _bt_binsrch(scan->indexRelation, so->skipScanKey, buf);
+				offnum = OffsetNumberPrev(offnum);
+
+				if (!_bt_readpage(scan, indexdir, offnum))
+				{
+					/*
+					 * There's no actually-matching data on this page.  Try to
+					 * advance to the next page. Return false if there's no
+					 * matching data at all.
+					 */
+					LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+					if (!_bt_steppage(scan, indexdir))
+					{
+						pfree(so->skipScanKey);
+						so->skipScanKey = NULL;
+						return false;
+					}
+					LockBuffer(so->currPos.buf, BT_READ);
+				}
+
+				currItem = &so->currPos.items[so->currPos.lastItem];
+				itup = (IndexTuple) (so->currTuples + currItem->tupleOffset);
+				nextOffset = ItemPointerGetOffsetNumber(&itup->t_tid);
+
+				/*
+				 * To check if we returned the same tuple, try to find a
+				 * startItup on the current page. For that we need to update
+				 * scankey to match the whole tuple and set nextkey to return
+				 * an exact tuple, not the next one. If the nextOffset is the
+				 * same as before, it means we are in the loop, return offnum
+				 * to the original position and jump further
+				 */
+				scan->xs_itup = startItup;
+				_bt_update_skip_scankeys(scan, indexRel);
+
+				so->skipScanKey->keysz = IndexRelationGetNumberOfKeyAttributes(indexRel);
+				so->skipScanKey->nextkey = false;
+
+				if (_bt_scankey_within_page(scan, so->skipScanKey,
+											so->currPos.buf, dir))
+				{
+					OffsetNumber maxoff;
+					startOffset = _bt_binsrch(scan->indexRelation,
+											  so->skipScanKey,
+											  so->currPos.buf);
+
+					page = BufferGetPage(so->currPos.buf);
+					maxoff = PageGetMaxOffsetNumber(page);
+
+					if (nextOffset <= startOffset)
+					{
+						offnum = jumpOffset;
+						nextOffset = startOffset;
+					}
+
+					if ((offnum > maxoff) && (so->currPos.nextPage == P_NONE))
+					{
+						LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+
+						BTScanPosUnpinIfPinned(so->currPos);
+						BTScanPosInvalidate(so->currPos);
+
+						pfree(so->skipScanKey);
+						so->skipScanKey = NULL;
+						return false;
+					}
+				}
+
+				/* Return original scankey options */
+				so->skipScanKey->keysz = prefix;
+				so->skipScanKey->nextkey = ScanDirectionIsForward(dir);
+			}
+		}
+	}
+
+	/* Now read the data */
+	if (!_bt_readpage(scan, indexdir, offnum))
+	{
+		/*
+		 * There's no actually-matching data on this page.  Try to advance to
+		 * the next page.  Return false if there's no matching data at all.
+		 */
+		LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+		if (!_bt_steppage(scan, dir))
+		{
+			pfree(so->skipScanKey);
+			so->skipScanKey = NULL;
+			return false;
+		}
+	}
+	else
+	{
+		/* Drop the lock, and maybe the pin, on the current page */
+		LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+	}
+
+	/* And set IndexTuple */
+	currItem = &so->currPos.items[so->currPos.itemIndex];
+	scan->xs_heaptid = currItem->heapTid;
+	scan->xs_itup = (IndexTuple) (so->currTuples + currItem->tupleOffset);
+
+	so->currPos.moreLeft = true;
+	so->currPos.moreRight = true;
+
+	return true;
+}
+
 /*
  *	_bt_readpage() -- Load data from current index page into so->currPos
  *
@@ -2246,3 +2662,54 @@ _bt_initialize_more_data(BTScanOpaque so, ScanDirection dir)
 	so->numKilled = 0;			/* just paranoia */
 	so->markItemIndex = -1;		/* ditto */
 }
+
+/*
+ * _bt_update_skip_scankeys() -- set up new values for the existing scankeys
+ * 								 based on the current index tuple
+ */
+static inline void
+_bt_update_skip_scankeys(IndexScanDesc scan, Relation indexRel)
+{
+	TupleDesc		itupdesc;
+	int			indnkeyatts,
+				i;
+	BTScanOpaque 	so = (BTScanOpaque) scan->opaque;
+	ScanKey			scankeys = so->skipScanKey->scankeys;
+
+	itupdesc = RelationGetDescr(indexRel);
+	indnkeyatts = IndexRelationGetNumberOfKeyAttributes(indexRel);
+	for (i = 0; i < indnkeyatts; i++)
+	{
+		Datum datum;
+		bool null;
+		int flags;
+
+		datum = index_getattr(scan->xs_itup, i + 1, itupdesc, &null);
+		flags = (null ? SK_ISNULL : 0) |
+				(indexRel->rd_indoption[i] << SK_BT_INDOPTION_SHIFT);
+		scankeys[i].sk_flags = flags;
+		scankeys[i].sk_argument = datum;
+	}
+}
+
+/*
+ * _bt_scankey_within_page() -- check if the provided scankey could be found
+ * 								within a page, specified by the buffer.
+ */
+static inline bool
+_bt_scankey_within_page(IndexScanDesc scan, BTScanInsert key,
+						Buffer buf, ScanDirection dir)
+{
+	OffsetNumber low, high;
+	Page page = BufferGetPage(buf);
+	BTPageOpaque opaque = (BTPageOpaque) PageGetSpecialPointer(page);
+
+	low = P_FIRSTDATAKEY(opaque);
+	high = PageGetMaxOffsetNumber(page);
+
+	if (unlikely(high < low))
+		return false;
+
+	return (_bt_compare(scan->indexRelation, key, page, low) > 0 &&
+			_bt_compare(scan->indexRelation, key, page, high) < 1);
+}
diff --git a/src/backend/access/spgist/spgutils.c b/src/backend/access/spgist/spgutils.c
index 4924ae1c59..fa09a4685e 100644
--- a/src/backend/access/spgist/spgutils.c
+++ b/src/backend/access/spgist/spgutils.c
@@ -68,6 +68,7 @@ spghandler(PG_FUNCTION_ARGS)
 	amroutine->ambulkdelete = spgbulkdelete;
 	amroutine->amvacuumcleanup = spgvacuumcleanup;
 	amroutine->amcanreturn = spgcanreturn;
+	amroutine->amskip = NULL;
 	amroutine->amcostestimate = spgcostestimate;
 	amroutine->amoptions = spgoptions;
 	amroutine->amproperty = spgproperty;
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index c367c750b1..a7dd874531 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -141,6 +141,7 @@ static void ExplainXMLTag(const char *tagname, int flags, ExplainState *es);
 static void ExplainIndentText(ExplainState *es);
 static void ExplainJSONLineEnding(ExplainState *es);
 static void ExplainYAMLLineStarting(ExplainState *es);
+static void ExplainIndexSkipScanKeys(int skipPrefixSize, ExplainState *es);
 static void escape_yaml(StringInfo buf, const char *str);
 
 
@@ -1052,6 +1053,22 @@ ExplainPreScanNode(PlanState *planstate, Bitmapset **rels_used)
 	return planstate_tree_walker(planstate, ExplainPreScanNode, rels_used);
 }
 
+/*
+ * ExplainIndexSkipScanKeys -
+ *	  Append information about index skip scan to es->str.
+ *
+ * Can be used to print the skip prefix size.
+ */
+static void
+ExplainIndexSkipScanKeys(int skipPrefixSize, ExplainState *es)
+{
+	if (skipPrefixSize > 0)
+	{
+		if (es->format != EXPLAIN_FORMAT_TEXT)
+			ExplainPropertyInteger("Distinct Prefix", NULL, skipPrefixSize, es);
+	}
+}
+
 /*
  * ExplainNode -
  *	  Appends a description of a plan tree to es->str
@@ -1386,6 +1403,8 @@ ExplainNode(PlanState *planstate, List *ancestors,
 			{
 				IndexScan  *indexscan = (IndexScan *) plan;
 
+				ExplainIndexSkipScanKeys(indexscan->indexskipprefixsize, es);
+
 				ExplainIndexScanDetails(indexscan->indexid,
 										indexscan->indexorderdir,
 										es);
@@ -1396,6 +1415,8 @@ ExplainNode(PlanState *planstate, List *ancestors,
 			{
 				IndexOnlyScan *indexonlyscan = (IndexOnlyScan *) plan;
 
+				ExplainIndexSkipScanKeys(indexonlyscan->indexskipprefixsize, es);
+
 				ExplainIndexScanDetails(indexonlyscan->indexid,
 										indexonlyscan->indexorderdir,
 										es);
@@ -1655,6 +1676,8 @@ ExplainNode(PlanState *planstate, List *ancestors,
 	switch (nodeTag(plan))
 	{
 		case T_IndexScan:
+			if (((IndexScan *) plan)->indexskipprefixsize > 0)
+				ExplainPropertyBool("Skip scan", true, es);
 			show_scan_qual(((IndexScan *) plan)->indexqualorig,
 						   "Index Cond", planstate, ancestors, es);
 			if (((IndexScan *) plan)->indexqualorig)
@@ -1668,6 +1691,8 @@ ExplainNode(PlanState *planstate, List *ancestors,
 										   planstate, es);
 			break;
 		case T_IndexOnlyScan:
+			if (((IndexOnlyScan *) plan)->indexskipprefixsize > 0)
+				ExplainPropertyBool("Skip scan", true, es);
 			show_scan_qual(((IndexOnlyScan *) plan)->indexqual,
 						   "Index Cond", planstate, ancestors, es);
 			if (((IndexOnlyScan *) plan)->indexqual)
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 5617ac29e7..c4e4b087a7 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -41,6 +41,7 @@
 #include "miscadmin.h"
 #include "storage/bufmgr.h"
 #include "storage/predicate.h"
+#include "storage/itemptr.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
 
@@ -62,9 +63,26 @@ IndexOnlyNext(IndexOnlyScanState *node)
 	EState	   *estate;
 	ExprContext *econtext;
 	ScanDirection direction;
+	ScanDirection readDirection;
 	IndexScanDesc scandesc;
 	TupleTableSlot *slot;
 	ItemPointer tid;
+	ItemPointerData startTid = {{0, 0}, InvalidOffsetNumber};
+	IndexOnlyScan *indexonlyscan = (IndexOnlyScan *) node->ss.ps.plan;
+
+	/*
+	 * Tells if the current position was reached via skipping. In this case
+	 * there is no need for index_getnext_tid
+	 */
+	bool skipped = false;
+
+	/*
+	 * Index only scan must be aware that in case of skipping we can return to
+	 * the starting point due to visibility checks. In this situation we need
+	 * to jump further, and the number of skipping attempts tells us how far
+	 * we need to jump.
+	 */
+	int skipAttempts = 0;
 
 	/*
 	 * extract necessary information from index scan node
@@ -72,7 +90,7 @@ IndexOnlyNext(IndexOnlyScanState *node)
 	estate = node->ss.ps.state;
 	direction = estate->es_direction;
 	/* flip direction if this is an overall backward scan */
-	if (ScanDirectionIsBackward(((IndexOnlyScan *) node->ss.ps.plan)->indexorderdir))
+	if (ScanDirectionIsBackward(indexonlyscan->indexorderdir))
 	{
 		if (ScanDirectionIsForward(direction))
 			direction = BackwardScanDirection;
@@ -114,16 +132,87 @@ IndexOnlyNext(IndexOnlyScanState *node)
 						 node->ioss_OrderByKeys,
 						 node->ioss_NumOrderByKeys);
 	}
+	else
+	{
+		ItemPointerCopy(&scandesc->xs_heaptid, &startTid);
+	}
+
+	/*
+	 * Check if we need to skip to the next key prefix, because we've been
+	 * asked to implement DISTINCT.
+	 *
+	 * When fetching a cursor in the direction opposite to a general scan
+	 * direction, the result must be what normal fetching should have
+	 * returned, but in reversed order. In other words, return the last or
+	 * first scanned tuple in a DISTINCT set, depending on a cursor direction.
+	 * Due to that we skip also when the first tuple wasn't emitted yet, but
+	 * the directions are opposite.
+	 */
+	if (node->ioss_SkipPrefixSize > 0 &&
+		(node->ioss_FirstTupleEmitted ||
+		 ScanDirectionsAreOpposite(direction, indexonlyscan->indexorderdir)))
+	{
+		if (!index_skip(scandesc, direction, indexonlyscan->indexorderdir,
+						!node->ioss_FirstTupleEmitted, node->ioss_SkipPrefixSize))
+		{
+			/*
+			 * Reached end of index. At this point currPos is invalidated, and
+			 * we need to reset ioss_FirstTupleEmitted, since otherwise after
+			 * going backwards, reaching the end of index, and going forward
+			 * again we apply skip again. It would be incorrect and lead to an
+			 * extra skipped item.
+			 */
+			node->ioss_FirstTupleEmitted = false;
+			return ExecClearTuple(slot);
+		}
+		else
+		{
+			skipAttempts = 1;
+			skipped = true;
+			tid = &scandesc->xs_heaptid;
+		}
+	}
+
+	readDirection = skipped ? indexonlyscan->indexorderdir : direction;
 
 	/*
 	 * OK, now that we have what we need, fetch the next tuple.
 	 */
-	while ((tid = index_getnext_tid(scandesc, direction)) != NULL)
+	while (skipped || (tid = index_getnext_tid(scandesc, readDirection)) != NULL)
 	{
 		bool		tuple_from_heap = false;
 
 		CHECK_FOR_INTERRUPTS();
 
+		/*
+		 * While doing index only skip scan with advancing and reading in
+		 * different directions we can return to the same position where we
+		 * started after visibility check. Recognize such situations and skip
+		 * more.
+		 */
+		if ((readDirection != direction) &&
+			ItemPointerIsValid(&startTid) && ItemPointerEquals(&startTid, tid))
+		{
+			int i;
+			skipAttempts += 1;
+
+			for (i = 0; i < skipAttempts; i++)
+			{
+				if (!index_skip(scandesc, direction,
+								indexonlyscan->indexorderdir,
+								!node->ioss_FirstTupleEmitted,
+								node->ioss_SkipPrefixSize))
+				{
+					node->ioss_FirstTupleEmitted = false;
+					return ExecClearTuple(slot);
+				}
+			}
+
+			tid = &scandesc->xs_heaptid;
+		}
+
+		skipped = false;
+
 		/*
 		 * We can skip the heap fetch if the TID references a heap page on
 		 * which all tuples are known visible to everybody.  In any case,
@@ -250,6 +339,8 @@ IndexOnlyNext(IndexOnlyScanState *node)
 							  ItemPointerGetBlockNumber(tid),
 							  estate->es_snapshot);
 
+		node->ioss_FirstTupleEmitted = true;
+
 		return slot;
 	}
 
@@ -504,6 +595,8 @@ ExecInitIndexOnlyScan(IndexOnlyScan *node, EState *estate, int eflags)
 	indexstate->ss.ps.plan = (Plan *) node;
 	indexstate->ss.ps.state = estate;
 	indexstate->ss.ps.ExecProcNode = ExecIndexOnlyScan;
+	indexstate->ioss_SkipPrefixSize = node->indexskipprefixsize;
+	indexstate->ioss_FirstTupleEmitted = false;
 
 	/*
 	 * Miscellaneous initialization
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index d0a96a38e0..449aaec3ac 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -85,6 +85,13 @@ IndexNext(IndexScanState *node)
 	ScanDirection direction;
 	IndexScanDesc scandesc;
 	TupleTableSlot *slot;
+	IndexScan *indexscan = (IndexScan *) node->ss.ps.plan;
+
+	/*
+	 * Tells if the current position was reached via skipping. In this case
+	 * there is no need for index_getnext_tid
+	 */
+	bool skipped = false;
 
 	/*
 	 * extract necessary information from index scan node
@@ -92,7 +99,7 @@ IndexNext(IndexScanState *node)
 	estate = node->ss.ps.state;
 	direction = estate->es_direction;
 	/* flip direction if this is an overall backward scan */
-	if (ScanDirectionIsBackward(((IndexScan *) node->ss.ps.plan)->indexorderdir))
+	if (ScanDirectionIsBackward(indexscan->indexorderdir))
 	{
 		if (ScanDirectionIsForward(direction))
 			direction = BackwardScanDirection;
@@ -117,6 +124,12 @@ IndexNext(IndexScanState *node)
 
 		node->iss_ScanDesc = scandesc;
 
+		/* Index skip scan assumes xs_want_itup, so set it to true */
+		if (indexscan->indexskipprefixsize > 0)
+			node->iss_ScanDesc->xs_want_itup = true;
+		else
+			node->iss_ScanDesc->xs_want_itup = false;
+
 		/*
 		 * If no run-time keys to calculate or they are ready, go ahead and
 		 * pass the scankeys to the index AM.
@@ -127,12 +140,48 @@ IndexNext(IndexScanState *node)
 						 node->iss_OrderByKeys, node->iss_NumOrderByKeys);
 	}
 
+	/*
+	 * Check if we need to skip to the next key prefix, because we've been
+	 * asked to implement DISTINCT.
+	 *
+	 * When fetching from a cursor in the direction opposite to the general
+	 * scan direction, the result must be what normal fetching would have
+	 * returned, but in reversed order. In other words, return the last or
+	 * first scanned tuple in a DISTINCT set, depending on the cursor
+	 * direction. Because of that we also skip when the first tuple has not
+	 * been emitted yet but the directions are opposite.
+	 */
+	if (node->iss_SkipPrefixSize > 0 &&
+		(node->iss_FirstTupleEmitted ||
+		 ScanDirectionsAreOpposite(direction, indexscan->indexorderdir)))
+	{
+		if (!index_skip(scandesc, direction, indexscan->indexorderdir,
+					   !node->iss_FirstTupleEmitted, node->iss_SkipPrefixSize))
+		{
+			/*
+			 * Reached end of index. At this point currPos is invalidated, and
+			 * we need to reset iss_FirstTupleEmitted, since otherwise after
+			 * going backwards, reaching the end of index, and going forward
+			 * again we apply skip again. It would be incorrect and lead to an
+			 * extra skipped item.
+			 */
+			node->iss_FirstTupleEmitted = false;
+			return ExecClearTuple(slot);
+		}
+		else
+		{
+			skipped = true;
+			index_fetch_heap(scandesc, slot);
+		}
+	}
+
 	/*
 	 * ok, now that we have what we need, fetch the next tuple.
 	 */
-	while (index_getnext_slot(scandesc, direction, slot))
+	while (skipped || index_getnext_slot(scandesc, direction, slot))
 	{
 		CHECK_FOR_INTERRUPTS();
+		skipped = false;
 
 		/*
 		 * If the index was lossy, we have to recheck the index quals using
@@ -149,6 +198,7 @@ IndexNext(IndexScanState *node)
 			}
 		}
 
+		node->iss_FirstTupleEmitted = true;
 		return slot;
 	}
 
@@ -910,6 +960,8 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
 	indexstate->ss.ps.plan = (Plan *) node;
 	indexstate->ss.ps.state = estate;
 	indexstate->ss.ps.ExecProcNode = ExecIndexScan;
+	indexstate->iss_SkipPrefixSize = node->indexskipprefixsize;
+	indexstate->iss_FirstTupleEmitted = false;
 
 	/*
 	 * Miscellaneous initialization
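
The IndexNext change above skips to the next key prefix instead of reading every duplicate. Below is a minimal, self-contained sketch of that idea on a sorted in-memory array. It is not the nbtree implementation: `Row`, `next_prefix`, and `distinct_skip` are hypothetical names used only for illustration, and a binary search stands in for re-descending the tree.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical two-column index entry; input is sorted by (a, b). */
typedef struct Row
{
    int a;
    int b;
} Row;

/*
 * Return the position of the first entry after 'pos' whose leading column
 * is greater than rows[pos].a, or n if there is none.  The binary search
 * is an assumption standing in for a fresh descent with an adjusted key.
 */
static size_t
next_prefix(const Row *rows, size_t n, size_t pos)
{
    size_t lo = pos + 1;
    size_t hi = n;

    assert(pos < n);
    while (lo < hi)
    {
        size_t mid = lo + (hi - lo) / 2;

        if (rows[mid].a > rows[pos].a)
            hi = mid;
        else
            lo = mid + 1;
    }
    return lo;
}

/*
 * Emulate DISTINCT ON (a) over sorted input: emit the first entry of each
 * prefix, then skip the remaining duplicates of that prefix.  Returns the
 * number of entries written to 'out'.
 */
static size_t
distinct_skip(const Row *rows, size_t n, Row *out, size_t outcap)
{
    size_t pos = 0;
    size_t emitted = 0;

    while (pos < n && emitted < outcap)
    {
        out[emitted++] = rows[pos];
        pos = next_prefix(rows, n, pos);
    }
    return emitted;
}
```

For the entries (1,1), (1,2), (2,1) this sketch emits (1,1) and (2,1), touching one entry per distinct prefix plus the search steps rather than every row.
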
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 54ad62bb7f..e0cfd710c4 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -493,6 +493,7 @@ _copyIndexScan(const IndexScan *from)
 	COPY_NODE_FIELD(indexorderbyorig);
 	COPY_NODE_FIELD(indexorderbyops);
 	COPY_SCALAR_FIELD(indexorderdir);
+	COPY_SCALAR_FIELD(indexskipprefixsize);
 
 	return newnode;
 }
@@ -518,6 +519,7 @@ _copyIndexOnlyScan(const IndexOnlyScan *from)
 	COPY_NODE_FIELD(indexorderby);
 	COPY_NODE_FIELD(indextlist);
 	COPY_SCALAR_FIELD(indexorderdir);
+	COPY_SCALAR_FIELD(indexskipprefixsize);
 
 	return newnode;
 }
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 16083e7a7e..5f723cda4b 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -562,6 +562,7 @@ _outIndexScan(StringInfo str, const IndexScan *node)
 	WRITE_NODE_FIELD(indexorderbyorig);
 	WRITE_NODE_FIELD(indexorderbyops);
 	WRITE_ENUM_FIELD(indexorderdir, ScanDirection);
+	WRITE_INT_FIELD(indexskipprefixsize);
 }
 
 static void
@@ -576,6 +577,7 @@ _outIndexOnlyScan(StringInfo str, const IndexOnlyScan *node)
 	WRITE_NODE_FIELD(indexorderby);
 	WRITE_NODE_FIELD(indextlist);
 	WRITE_ENUM_FIELD(indexorderdir, ScanDirection);
+	WRITE_INT_FIELD(indexskipprefixsize);
 }
 
 static void
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 551ce6c41c..028d03a56d 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1820,6 +1820,7 @@ _readIndexScan(void)
 	READ_NODE_FIELD(indexorderbyorig);
 	READ_NODE_FIELD(indexorderbyops);
 	READ_ENUM_FIELD(indexorderdir, ScanDirection);
+	READ_INT_FIELD(indexskipprefixsize);
 
 	READ_DONE();
 }
@@ -1839,6 +1840,7 @@ _readIndexOnlyScan(void)
 	READ_NODE_FIELD(indexorderby);
 	READ_NODE_FIELD(indextlist);
 	READ_ENUM_FIELD(indexorderdir, ScanDirection);
+	READ_INT_FIELD(indexskipprefixsize);
 
 	READ_DONE();
 }
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index b5a0033721..710edf160a 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -124,6 +124,7 @@ int			max_parallel_workers_per_gather = 2;
 bool		enable_seqscan = true;
 bool		enable_indexscan = true;
 bool		enable_indexonlyscan = true;
+bool		enable_indexskipscan = true;
 bool		enable_bitmapscan = true;
 bool		enable_tidscan = true;
 bool		enable_sort = true;
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 363f5349f1..196b132568 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -791,6 +791,16 @@ get_index_paths(PlannerInfo *root, RelOptInfo *rel,
 	{
 		IndexPath  *ipath = (IndexPath *) lfirst(lc);
 
+		/*
+		 * To prevent unique paths from index skip scans from being used when
+		 * they are not needed, keep them in a separate pathlist.
+		 */
+		if (ipath->indexskipprefix != 0)
+		{
+			add_unique_path(rel, (Path *) ipath);
+			continue;
+		}
+
 		if (index->amhasgettuple)
 			add_path(rel, (Path *) ipath);
 
@@ -880,6 +890,8 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
 	bool		pathkeys_possibly_useful;
 	bool		index_is_ordered;
 	bool		index_only_scan;
+	bool		not_empty_qual = false;
+	bool		can_skip;
 	int			indexcol;
 
 	/*
@@ -1029,6 +1041,60 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
 	index_only_scan = (scantype != ST_BITMAPSCAN &&
 					   check_index_only(rel, index));
 
+	/* Check if an index skip scan is possible. */
+	can_skip = enable_indexskipscan && index->amcanskip;
+
+	/*
+	 * Skip scan is not supported when there are qual conditions that are not
+	 * covered by the index, because those conditions are evaluated later,
+	 * after skipping has already been applied.
+	 *
+	 * TODO: This implementation is too restrictive, and doesn't allow e.g.
+	 * index expressions. For that we need to examine index_clauses too.
+	 */
+	if (root->parse->jointree != NULL)
+	{
+		ListCell *lc;
+
+		foreach(lc, (List *)root->parse->jointree->quals)
+		{
+			Node *expr, *qual = (Node *) lfirst(lc);
+			Var *var;
+			bool found = false;
+
+			if (!is_opclause(qual))
+			{
+				not_empty_qual = true;
+				break;
+			}
+
+			expr = get_leftop(qual);
+
+			if (!IsA(expr, Var))
+			{
+				not_empty_qual = true;
+				break;
+			}
+
+			var = (Var *) expr;
+
+			for (int i = 0; i < index->ncolumns; i++)
+			{
+				if (index->indexkeys[i] == var->varattno)
+				{
+					found = true;
+					break;
+				}
+			}
+
+			if (!found)
+			{
+				not_empty_qual = true;
+				break;
+			}
+		}
+	}
+
 	/*
 	 * 4. Generate an indexscan path if there are relevant restriction clauses
 	 * in the current clauses, OR the index ordering is potentially useful for
@@ -1056,6 +1122,12 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
 								  false);
 		result = lappend(result, ipath);
 
+		/* Consider index skip scan as well */
+		if (useful_uniquekeys != NULL && can_skip && !not_empty_qual)
+			result = lappend(result,
+							 create_skipscan_unique_path(root, index,
+								 						 (Path *) ipath));
+
 		/*
 		 * If appropriate, consider parallel index scan.  We don't allow
 		 * parallel index scan for bitmap index scans.
@@ -1116,6 +1188,12 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
 									  false);
 			result = lappend(result, ipath);
 
+			/* Consider index skip scan as well */
+			if (useful_uniquekeys != NULL && can_skip && !not_empty_qual)
+				result = lappend(result,
+								 create_skipscan_unique_path(root, index,
+															 (Path *) ipath));
+
 			/* If appropriate, consider parallel index scan */
 			if (index->amcanparallel &&
 				rel->consider_parallel && outer_relids == NULL &&
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index dff826a828..7b32f2cc7e 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -175,12 +175,14 @@ static IndexScan *make_indexscan(List *qptlist, List *qpqual, Index scanrelid,
 								 Oid indexid, List *indexqual, List *indexqualorig,
 								 List *indexorderby, List *indexorderbyorig,
 								 List *indexorderbyops,
-								 ScanDirection indexscandir);
+								 ScanDirection indexscandir,
+								 int skipprefix);
 static IndexOnlyScan *make_indexonlyscan(List *qptlist, List *qpqual,
 										 Index scanrelid, Oid indexid,
 										 List *indexqual, List *indexorderby,
 										 List *indextlist,
-										 ScanDirection indexscandir);
+										 ScanDirection indexscandir,
+										 int skipprefix);
 static BitmapIndexScan *make_bitmap_indexscan(Index scanrelid, Oid indexid,
 											  List *indexqual,
 											  List *indexqualorig);
@@ -2910,7 +2912,8 @@ create_indexscan_plan(PlannerInfo *root,
 												fixed_indexquals,
 												fixed_indexorderbys,
 												best_path->indexinfo->indextlist,
-												best_path->indexscandir);
+												best_path->indexscandir,
+												best_path->indexskipprefix);
 	else
 		scan_plan = (Scan *) make_indexscan(tlist,
 											qpqual,
@@ -2921,7 +2924,8 @@ create_indexscan_plan(PlannerInfo *root,
 											fixed_indexorderbys,
 											indexorderbys,
 											indexorderbyops,
-											best_path->indexscandir);
+											best_path->indexscandir,
+											best_path->indexskipprefix);
 
 	copy_generic_path_info(&scan_plan->plan, &best_path->path);
 
@@ -5184,7 +5188,8 @@ make_indexscan(List *qptlist,
 			   List *indexorderby,
 			   List *indexorderbyorig,
 			   List *indexorderbyops,
-			   ScanDirection indexscandir)
+			   ScanDirection indexscandir,
+			   int skipPrefixSize)
 {
 	IndexScan  *node = makeNode(IndexScan);
 	Plan	   *plan = &node->scan.plan;
@@ -5201,6 +5206,7 @@ make_indexscan(List *qptlist,
 	node->indexorderbyorig = indexorderbyorig;
 	node->indexorderbyops = indexorderbyops;
 	node->indexorderdir = indexscandir;
+	node->indexskipprefixsize = skipPrefixSize;
 
 	return node;
 }
@@ -5213,7 +5219,8 @@ make_indexonlyscan(List *qptlist,
 				   List *indexqual,
 				   List *indexorderby,
 				   List *indextlist,
-				   ScanDirection indexscandir)
+				   ScanDirection indexscandir,
+				   int skipPrefixSize)
 {
 	IndexOnlyScan *node = makeNode(IndexOnlyScan);
 	Plan	   *plan = &node->scan.plan;
@@ -5228,6 +5235,7 @@ make_indexonlyscan(List *qptlist,
 	node->indexorderby = indexorderby;
 	node->indextlist = indextlist;
 	node->indexorderdir = indexscandir;
+	node->indexskipprefixsize = skipPrefixSize;
 
 	return node;
 }
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index a7de8476d9..88305df5c3 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -4828,13 +4828,19 @@ create_distinct_paths(PlannerInfo *root,
 			Path	   *path = (Path *) lfirst(lc);
 
 			if (pathkeys_contained_in(needed_pathkeys, path->pathkeys))
-			{
 				add_path(distinct_rel, (Path *)
 						 create_upper_unique_path(root, distinct_rel,
 												  path,
 												  list_length(root->distinct_pathkeys),
 												  numDistinctRows));
-			}
+		}
+
+		foreach(lc, input_rel->unique_pathlist)
+		{
+			Path	   *path = (Path *) lfirst(lc);
+
+			if (uniquekeys_contained_in(needed_pathkeys, path->uniquekeys))
+				add_path(distinct_rel, path);
 		}
 
 		/* For explicit-sort case, always use the more rigorous clause */
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index a4dfafbb59..b0ce17b0d6 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -2564,6 +2564,7 @@ create_projection_path(PlannerInfo *root,
 	pathnode->path.pathkeys = subpath->pathkeys;
 
 	pathnode->subpath = subpath;
+	pathnode->path.uniquekeys = subpath->uniquekeys;
 
 	/*
 	 * We might not need a separate Result node.  If the input plan node type
@@ -2929,6 +2930,73 @@ create_upper_unique_path(PlannerInfo *root,
 	return pathnode;
 }
 
+/*
+ * create_skipscan_unique_path
+ *	  Creates a pathnode the same as an existing IndexPath except based on
+ *	  skipping duplicate values.  This may or may not be cheaper than using
+ *	  create_upper_unique_path.
+ *
+ * The input path must be an IndexPath for an index that supports amskip.
+ */
+IndexPath *
+create_skipscan_unique_path(PlannerInfo *root, IndexOptInfo *index,
+							Path *basepath)
+{
+	IndexPath 	*pathnode = makeNode(IndexPath);
+	int 		numDistinctRows;
+	int 		distinctPrefixKeys;
+	ListCell 	*lc;
+	List 	   	*exprs = NIL;
+
+
+	distinctPrefixKeys = list_length(root->query_uniquekeys);
+
+	Assert(IsA(basepath, IndexPath));
+
+	/* We don't want to modify basepath, so make a copy. */
+	memcpy(pathnode, basepath, sizeof(IndexPath));
+
+	/*
+	 * Normally we can think of distinctPrefixKeys as just the
+	 * number of distinct keys. But suppose we have a distinct
+	 * key a, and the index contains b, a in exactly this order.
+	 * In that situation we need to use the position of a in the
+	 * index as distinctPrefixKeys, otherwise the skip will happen
+	 * only on the first column.
+	 */
+	foreach(lc, root->query_uniquekeys)
+	{
+		UniqueKey *uniquekey = (UniqueKey *) lfirst(lc);
+		EquivalenceMember *em =
+			lfirst_node(EquivalenceMember,
+						list_head(uniquekey->eq_clause->ec_members));
+		Var *var = (Var *) em->em_expr;
+
+		exprs = lappend(exprs, em->em_expr);
+
+		for (int i = 0; i < index->ncolumns; i++)
+		{
+			if (index->indexkeys[i] == var->varattno)
+			{
+				distinctPrefixKeys = Max(i + 1, distinctPrefixKeys);
+				break;
+			}
+		}
+	}
+
+	Assert(distinctPrefixKeys > 0);
+	pathnode->indexskipprefix = distinctPrefixKeys;
+
+	numDistinctRows = estimate_num_groups(root, exprs,
+										  pathnode->path.rows,
+										  NULL);
+
+	pathnode->path.total_cost = pathnode->path.startup_cost * numDistinctRows;
+	pathnode->path.rows = numDistinctRows;
+
+	return pathnode;
+}
+
 /*
  * create_agg_path
  *	  Creates a pathnode that represents performing aggregation/grouping
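
The prefix-size computation in create_skipscan_unique_path above can be sketched in isolation. This is only an illustration under simplifying assumptions: columns are plain attribute numbers, `skip_prefix_size` and `MAX_OF` are hypothetical names, and no planner structures are involved.

```c
#include <assert.h>

#define MAX_OF(x, y) ((x) > (y) ? (x) : (y))

/*
 * Hypothetical stand-in for the loop in create_skipscan_unique_path.
 * 'indexkeys' holds the attribute number of each index column in index
 * order; 'distinctkeys' holds the attribute numbers of the DISTINCT keys.
 * The result is the number of leading index columns the skip must cover.
 */
static int
skip_prefix_size(const int *indexkeys, int ncolumns,
                 const int *distinctkeys, int ndistinct)
{
    int prefix = ndistinct;

    assert(ndistinct > 0);
    for (int k = 0; k < ndistinct; k++)
    {
        for (int i = 0; i < ncolumns; i++)
        {
            if (indexkeys[i] == distinctkeys[k])
            {
                /* The position in the index wins over the plain key count. */
                prefix = MAX_OF(i + 1, prefix);
                break;
            }
        }
    }
    return prefix;
}
```

With an index on (b, a) and a single distinct key a, the sketch yields a prefix of 2 rather than 1, which is the case the comment in the patch describes.
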
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index d82fc5ab8b..f65b299f37 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -271,6 +271,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 			info->amoptionalkey = amroutine->amoptionalkey;
 			info->amsearcharray = amroutine->amsearcharray;
 			info->amsearchnulls = amroutine->amsearchnulls;
+			info->amcanskip = (amroutine->amskip != NULL);
 			info->amcanparallel = amroutine->amcanparallel;
 			info->amhasgettuple = (amroutine->amgettuple != NULL);
 			info->amhasgetbitmap = amroutine->amgetbitmap != NULL &&
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index cacbe904db..7c71ee4499 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -923,6 +923,15 @@ static struct config_bool ConfigureNamesBool[] =
 		true,
 		NULL, NULL, NULL
 	},
+	{
+		{"enable_indexskipscan", PGC_USERSET, QUERY_TUNING_METHOD,
+			gettext_noop("Enables the planner's use of index-skip-scan plans."),
+			NULL
+		},
+		&enable_indexskipscan,
+		true,
+		NULL, NULL, NULL
+	},
 	{
 		{"enable_bitmapscan", PGC_USERSET, QUERY_TUNING_METHOD,
 			gettext_noop("Enables the planner's use of bitmap-scan plans."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index e1048c0047..a002ee2143 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -353,6 +353,7 @@
 #enable_hashjoin = on
 #enable_indexscan = on
 #enable_indexonlyscan = on
+#enable_indexskipscan = on
 #enable_material = on
 #enable_mergejoin = on
 #enable_nestloop = on
diff --git a/src/include/access/amapi.h b/src/include/access/amapi.h
index 3b3e22f73d..3d39cd9d07 100644
--- a/src/include/access/amapi.h
+++ b/src/include/access/amapi.h
@@ -130,6 +130,13 @@ typedef void (*amrescan_function) (IndexScanDesc scan,
 typedef bool (*amgettuple_function) (IndexScanDesc scan,
 									 ScanDirection direction);
 
+/* skip past duplicates in a given prefix */
+typedef bool (*amskip_function) (IndexScanDesc scan,
+								 ScanDirection dir,
+								 ScanDirection indexdir,
+								 bool start,
+								 int prefix);
+
 /* fetch all valid tuples */
 typedef int64 (*amgetbitmap_function) (IndexScanDesc scan,
 									   TIDBitmap *tbm);
@@ -229,6 +236,7 @@ typedef struct IndexAmRoutine
 	amendscan_function amendscan;
 	ammarkpos_function ammarkpos;	/* can be NULL */
 	amrestrpos_function amrestrpos; /* can be NULL */
+	amskip_function amskip;				/* can be NULL */
 
 	/* interface functions to support parallel index scans */
 	amestimateparallelscan_function amestimateparallelscan; /* can be NULL */
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 7e9364a50c..815de4e4dd 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -180,6 +180,8 @@ extern IndexBulkDeleteResult *index_bulk_delete(IndexVacuumInfo *info,
 extern IndexBulkDeleteResult *index_vacuum_cleanup(IndexVacuumInfo *info,
 												   IndexBulkDeleteResult *stats);
 extern bool index_can_return(Relation indexRelation, int attno);
+extern bool index_skip(IndexScanDesc scan, ScanDirection direction,
+					   ScanDirection indexdir, bool start, int prefix);
 extern RegProcedure index_getprocid(Relation irel, AttrNumber attnum,
 									uint16 procnum);
 extern FmgrInfo *index_getprocinfo(Relation irel, AttrNumber attnum,
diff --git a/src/include/access/nbtree.h b/src/include/access/nbtree.h
index 20ace69dab..e098c6a1ab 100644
--- a/src/include/access/nbtree.h
+++ b/src/include/access/nbtree.h
@@ -662,6 +662,9 @@ typedef struct BTScanOpaqueData
 	 */
 	int			markItemIndex;	/* itemIndex, or -1 if not valid */
 
+	/* Work space for _bt_skip */
+	BTScanInsert	skipScanKey;	/* used to control skipping */
+
 	/* keep these last in struct for efficiency */
 	BTScanPosData currPos;		/* current position data */
 	BTScanPosData markPos;		/* marked position, if any */
@@ -793,6 +796,8 @@ extern OffsetNumber _bt_binsrch_insert(Relation rel, BTInsertState insertstate);
 extern int32 _bt_compare(Relation rel, BTScanInsert key, Page page, OffsetNumber offnum);
 extern bool _bt_first(IndexScanDesc scan, ScanDirection dir);
 extern bool _bt_next(IndexScanDesc scan, ScanDirection dir);
+extern bool _bt_skip(IndexScanDesc scan, ScanDirection dir,
+					 ScanDirection indexdir, bool start, int prefix);
 extern Buffer _bt_get_endpoint(Relation rel, uint32 level, bool rightmost,
 							   Snapshot snapshot);
 
@@ -817,6 +822,8 @@ extern void _bt_end_vacuum_callback(int code, Datum arg);
 extern Size BTreeShmemSize(void);
 extern void BTreeShmemInit(void);
 extern bytea *btoptions(Datum reloptions, bool validate);
+extern bool btskip(IndexScanDesc scan, ScanDirection dir,
+				   ScanDirection indexdir, bool start, int prefix);
 extern bool btproperty(Oid index_oid, int attno,
 					   IndexAMProperty prop, const char *propname,
 					   bool *res, bool *isnull);
diff --git a/src/include/access/sdir.h b/src/include/access/sdir.h
index 23feb90986..094a127464 100644
--- a/src/include/access/sdir.h
+++ b/src/include/access/sdir.h
@@ -55,4 +55,11 @@ typedef enum ScanDirection
 #define ScanDirectionIsForward(direction) \
 	((bool) ((direction) == ForwardScanDirection))
 
+/*
+ * ScanDirectionsAreOpposite
+ *		True iff scan directions are backward/forward or forward/backward.
+ */
+#define ScanDirectionsAreOpposite(dirA, dirB) \
+	((bool) ((dirA) != NoMovementScanDirection && (dirA) == -(dirB)))
+
 #endif							/* SDIR_H */
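
The behaviour of ScanDirectionsAreOpposite can be checked with a small standalone program. The sketch below redeclares the ScanDirection enum with the values used in sdir.h and writes the macro with its arguments parenthesized; `directions_are_opposite` and `check_direction_pairs` are hypothetical helpers added only to exercise it.

```c
#include <assert.h>
#include <stdbool.h>

/* Same values as the enum in src/include/access/sdir.h. */
typedef enum ScanDirection
{
    BackwardScanDirection = -1,
    NoMovementScanDirection = 0,
    ForwardScanDirection = 1
} ScanDirection;

/* True iff the directions are backward/forward or forward/backward. */
#define ScanDirectionsAreOpposite(dirA, dirB) \
    ((bool) ((dirA) != NoMovementScanDirection && (dirA) == -(dirB)))

/* Function wrapper, only so the pairs below are easy to exercise. */
static bool
directions_are_opposite(ScanDirection a, ScanDirection b)
{
    return ScanDirectionsAreOpposite(a, b);
}

/* Walk through the interesting pairs; aborts if any expectation fails. */
static void
check_direction_pairs(void)
{
    assert(directions_are_opposite(ForwardScanDirection, BackwardScanDirection));
    assert(directions_are_opposite(BackwardScanDirection, ForwardScanDirection));
    assert(!directions_are_opposite(ForwardScanDirection, ForwardScanDirection));
    assert(!directions_are_opposite(NoMovementScanDirection, NoMovementScanDirection));
}
```

The explicit NoMovementScanDirection test matters because 0 equals its own negation, so without it two "no movement" directions would be reported as opposite.
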
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 1f6f5bbc20..2c6acc160a 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1423,6 +1423,8 @@ typedef struct IndexScanState
 	ExprContext *iss_RuntimeContext;
 	Relation	iss_RelationDesc;
 	struct IndexScanDescData *iss_ScanDesc;
+	int         iss_SkipPrefixSize;
+	bool		iss_FirstTupleEmitted;
 
 	/* These are needed for re-checking ORDER BY expr ordering */
 	pairingheap *iss_ReorderQueue;
@@ -1452,6 +1454,8 @@ typedef struct IndexScanState
  *		TableSlot		   slot for holding tuples fetched from the table
  *		VMBuffer		   buffer in use for visibility map testing, if any
  *		PscanLen		   size of parallel index-only scan descriptor
+ *		SkipPrefixSize	   number of keys for skip-based DISTINCT
+ *		FirstTupleEmitted  has the first tuple been emitted
  * ----------------
  */
 typedef struct IndexOnlyScanState
@@ -1470,6 +1474,8 @@ typedef struct IndexOnlyScanState
 	struct IndexScanDescData *ioss_ScanDesc;
 	TupleTableSlot *ioss_TableSlot;
 	Buffer		ioss_VMBuffer;
+	int         ioss_SkipPrefixSize;
+	bool		ioss_FirstTupleEmitted;
 	Size		ioss_PscanLen;
 } IndexOnlyScanState;
 
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 0de27f0ef3..ce00060ee0 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -840,6 +840,7 @@ struct IndexOptInfo
 	bool		amsearchnulls;	/* can AM search for NULL/NOT NULL entries? */
 	bool		amhasgettuple;	/* does AM have amgettuple interface? */
 	bool		amhasgetbitmap; /* does AM have amgetbitmap interface? */
+	bool		amcanskip;		/* can AM skip duplicate values? */
 	bool		amcanparallel;	/* does AM support parallel scan? */
 	/* Rather than include amapi.h here, we declare amcostestimate like this */
 	void		(*amcostestimate) ();	/* AM's cost estimator */
@@ -1190,6 +1191,9 @@ typedef struct Path
  * we need not recompute them when considering using the same index in a
  * bitmap index/heap scan (see BitmapHeapPath).  The costs of the IndexPath
  * itself represent the costs of an IndexScan or IndexOnlyScan plan type.
+ *
+ * 'indexskipprefix' represents the number of columns to consider for skip
+ * scans.
  *----------
  */
 typedef struct IndexPath
@@ -1202,6 +1206,7 @@ typedef struct IndexPath
 	ScanDirection indexscandir;
 	Cost		indextotalcost;
 	Selectivity indexselectivity;
+	int			indexskipprefix;
 } IndexPath;
 
 /*
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 32c0d87f80..03a00e8e1d 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -409,6 +409,8 @@ typedef struct IndexScan
 	List	   *indexorderbyorig;	/* the same in original form */
 	List	   *indexorderbyops;	/* OIDs of sort ops for ORDER BY exprs */
 	ScanDirection indexorderdir;	/* forward or backward or don't care */
+	int			indexskipprefixsize;	/* the size of the prefix for distinct
+										 * scans */
 } IndexScan;
 
 /* ----------------
@@ -436,6 +438,8 @@ typedef struct IndexOnlyScan
 	List	   *indexorderby;	/* list of index ORDER BY exprs */
 	List	   *indextlist;		/* TargetEntry list describing index's cols */
 	ScanDirection indexorderdir;	/* forward or backward or don't care */
+	int			indexskipprefixsize;	/* the size of the prefix for distinct
+										 * scans */
 } IndexOnlyScan;
 
 /* ----------------
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index cb012ba198..847f34f02b 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -50,6 +50,7 @@ extern PGDLLIMPORT int max_parallel_workers_per_gather;
 extern PGDLLIMPORT bool enable_seqscan;
 extern PGDLLIMPORT bool enable_indexscan;
 extern PGDLLIMPORT bool enable_indexonlyscan;
+extern PGDLLIMPORT bool enable_indexskipscan;
 extern PGDLLIMPORT bool enable_bitmapscan;
 extern PGDLLIMPORT bool enable_tidscan;
 extern PGDLLIMPORT bool enable_sort;
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index fd25997af5..ba3eaffd8a 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -202,6 +202,9 @@ extern UpperUniquePath *create_upper_unique_path(PlannerInfo *root,
 												 Path *subpath,
 												 int numCols,
 												 double numGroups);
+extern IndexPath *create_skipscan_unique_path(PlannerInfo *root,
+											  IndexOptInfo *index,
+											  Path *subpath);
 extern AggPath *create_agg_path(PlannerInfo *root,
 								RelOptInfo *rel,
 								Path *subpath,
diff --git a/src/test/regress/expected/select_distinct.out b/src/test/regress/expected/select_distinct.out
index f3696c6d1d..c50c6d1866 100644
--- a/src/test/regress/expected/select_distinct.out
+++ b/src/test/regress/expected/select_distinct.out
@@ -244,3 +244,624 @@ SELECT null IS NOT DISTINCT FROM null as "yes";
  t
 (1 row)
 
+-- index only skip scan
+CREATE TABLE distinct_a (a int, b int, c int);
+INSERT INTO distinct_a (
+    SELECT five, tenthous, 10 FROM
+    generate_series(1, 5) five,
+    generate_series(1, 10000) tenthous
+);
+CREATE INDEX ON distinct_a (a, b);
+CREATE INDEX ON distinct_a ((a + 1));
+ANALYZE distinct_a;
+SELECT DISTINCT a FROM distinct_a;
+ a 
+---
+ 1
+ 2
+ 3
+ 4
+ 5
+(5 rows)
+
+SELECT DISTINCT a FROM distinct_a WHERE a = 1;
+ a 
+---
+ 1
+(1 row)
+
+SELECT DISTINCT a FROM distinct_a ORDER BY a DESC;
+ a 
+---
+ 5
+ 4
+ 3
+ 2
+ 1
+(5 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT a FROM distinct_a;
+                       QUERY PLAN                       
+--------------------------------------------------------
+ Index Only Scan using distinct_a_a_b_idx on distinct_a
+   Skip scan: true
+(2 rows)
+
+-- test index skip scan with a condition on a non unique field
+SELECT DISTINCT ON (a) a, b FROM distinct_a WHERE b = 2;
+ a | b 
+---+---
+ 1 | 2
+ 2 | 2
+ 3 | 2
+ 4 | 2
+ 5 | 2
+(5 rows)
+
+-- test index skip scan backwards
+SELECT DISTINCT ON (a) a, b FROM distinct_a ORDER BY a DESC, b DESC;
+ a |   b   
+---+-------
+ 5 | 10000
+ 4 | 10000
+ 3 | 10000
+ 2 | 10000
+ 1 | 10000
+(5 rows)
+
+-- test index skip scan for expressions
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT (a + 1) FROM distinct_a;
+                     QUERY PLAN                     
+----------------------------------------------------
+ Index Scan using distinct_a_expr_idx on distinct_a
+   Skip scan: true
+(2 rows)
+
+SELECT DISTINCT (a + 1) FROM distinct_a;
+ ?column? 
+----------
+        2
+        3
+        4
+        5
+        6
+(5 rows)
+
+-- check colums order
+CREATE INDEX distinct_a_b_a on distinct_a (b, a);
+SELECT DISTINCT a FROM distinct_a WHERE b = 2;
+ a 
+---
+ 1
+ 2
+ 3
+ 4
+ 5
+(5 rows)
+
+SELECT DISTINCT on (a, b) a, b FROM distinct_a WHERE b = 2;
+ a | b 
+---+---
+ 1 | 2
+ 2 | 2
+ 3 | 2
+ 4 | 2
+ 5 | 2
+(5 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT a FROM distinct_a WHERE b = 2;
+                     QUERY PLAN                     
+----------------------------------------------------
+ Index Only Scan using distinct_a_b_a on distinct_a
+   Skip scan: true
+   Index Cond: (b = 2)
+(3 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT on (a, b) a, b FROM distinct_a WHERE b = 2;
+                     QUERY PLAN                     
+----------------------------------------------------
+ Index Only Scan using distinct_a_b_a on distinct_a
+   Skip scan: true
+   Index Cond: (b = 2)
+(3 rows)
+
+DROP INDEX distinct_a_b_a;
+-- test opposite scan/index directions inside a cursor
+-- forward/backward
+BEGIN;
+DECLARE c SCROLL CURSOR FOR
+SELECT DISTINCT ON (a) a,b FROM distinct_a ORDER BY a, b;
+FETCH FROM c;
+ a | b 
+---+---
+ 1 | 1
+(1 row)
+
+FETCH BACKWARD FROM c;
+ a | b 
+---+---
+(0 rows)
+
+FETCH 6 FROM c;
+ a | b 
+---+---
+ 1 | 1
+ 2 | 1
+ 3 | 1
+ 4 | 1
+ 5 | 1
+(5 rows)
+
+FETCH BACKWARD 6 FROM c;
+ a | b 
+---+---
+ 5 | 1
+ 4 | 1
+ 3 | 1
+ 2 | 1
+ 1 | 1
+(5 rows)
+
+FETCH 6 FROM c;
+ a | b 
+---+---
+ 1 | 1
+ 2 | 1
+ 3 | 1
+ 4 | 1
+ 5 | 1
+(5 rows)
+
+FETCH BACKWARD 6 FROM c;
+ a | b 
+---+---
+ 5 | 1
+ 4 | 1
+ 3 | 1
+ 2 | 1
+ 1 | 1
+(5 rows)
+
+END;
+-- backward/forward
+BEGIN;
+DECLARE c SCROLL CURSOR FOR
+SELECT DISTINCT ON (a) a,b FROM distinct_a ORDER BY a DESC, b DESC;
+FETCH FROM c;
+ a |   b   
+---+-------
+ 5 | 10000
+(1 row)
+
+FETCH BACKWARD FROM c;
+ a | b 
+---+---
+(0 rows)
+
+FETCH 6 FROM c;
+ a |   b   
+---+-------
+ 5 | 10000
+ 4 | 10000
+ 3 | 10000
+ 2 | 10000
+ 1 | 10000
+(5 rows)
+
+FETCH BACKWARD 6 FROM c;
+ a |   b   
+---+-------
+ 1 | 10000
+ 2 | 10000
+ 3 | 10000
+ 4 | 10000
+ 5 | 10000
+(5 rows)
+
+FETCH 6 FROM c;
+ a |   b   
+---+-------
+ 5 | 10000
+ 4 | 10000
+ 3 | 10000
+ 2 | 10000
+ 1 | 10000
+(5 rows)
+
+FETCH BACKWARD 6 FROM c;
+ a |   b   
+---+-------
+ 1 | 10000
+ 2 | 10000
+ 3 | 10000
+ 4 | 10000
+ 5 | 10000
+(5 rows)
+
+END;
+-- test missing values and skipping from the end
+CREATE TABLE distinct_abc(a int, b int, c int);
+CREATE INDEX ON distinct_abc(a, b, c);
+INSERT INTO distinct_abc
+	VALUES (1, 1, 1),
+		   (1, 1, 2),
+		   (1, 2, 2),
+		   (1, 2, 3),
+		   (2, 2, 1),
+		   (2, 2, 3),
+		   (3, 1, 1),
+		   (3, 1, 2),
+		   (3, 2, 2),
+		   (3, 2, 3);
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a,b,c FROM distinct_abc WHERE c = 2;
+                          QUERY PLAN                          
+--------------------------------------------------------------
+ Index Only Scan using distinct_abc_a_b_c_idx on distinct_abc
+   Skip scan: true
+   Index Cond: (c = 2)
+(3 rows)
+
+BEGIN;
+DECLARE c SCROLL CURSOR FOR
+SELECT DISTINCT ON (a) a,b,c FROM distinct_abc WHERE c = 2;
+FETCH ALL FROM c;
+ a | b | c 
+---+---+---
+ 1 | 1 | 2
+ 3 | 1 | 2
+(2 rows)
+
+FETCH BACKWARD ALL FROM c;
+ a | b | c 
+---+---+---
+ 3 | 1 | 2
+ 1 | 1 | 2
+(2 rows)
+
+END;
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a,b,c FROM distinct_abc WHERE c = 2
+ORDER BY a DESC, b DESC;
+                              QUERY PLAN                               
+-----------------------------------------------------------------------
+ Index Only Scan Backward using distinct_abc_a_b_c_idx on distinct_abc
+   Skip scan: true
+   Index Cond: (c = 2)
+(3 rows)
+
+BEGIN;
+DECLARE c SCROLL CURSOR FOR
+SELECT DISTINCT ON (a) a,b,c FROM distinct_abc WHERE c = 2
+ORDER BY a DESC, b DESC;
+FETCH ALL FROM c;
+ a | b | c 
+---+---+---
+ 3 | 2 | 2
+ 1 | 2 | 2
+(2 rows)
+
+FETCH BACKWARD ALL FROM c;
+ a | b | c 
+---+---+---
+ 1 | 2 | 2
+ 3 | 2 | 2
+(2 rows)
+
+END;
+DROP TABLE distinct_abc;
+-- index skip scan
+SELECT DISTINCT ON (a) a, b, c
+FROM distinct_a ORDER BY a;
+ a | b | c  
+---+---+----
+ 1 | 1 | 10
+ 2 | 1 | 10
+ 3 | 1 | 10
+ 4 | 1 | 10
+ 5 | 1 | 10
+(5 rows)
+
+SELECT DISTINCT ON (a) a, b, c
+FROM distinct_a WHERE a = 1 ORDER BY a;
+ a | b | c  
+---+---+----
+ 1 | 1 | 10
+(1 row)
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a, b, c
+FROM distinct_a ORDER BY a;
+                    QUERY PLAN                     
+---------------------------------------------------
+ Index Scan using distinct_a_a_b_idx on distinct_a
+   Skip scan: true
+(2 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a, b, c
+FROM distinct_a WHERE a = 1 ORDER BY a;
+                     QUERY PLAN                      
+-----------------------------------------------------
+ Unique
+   ->  Bitmap Heap Scan on distinct_a
+         Recheck Cond: (a = 1)
+         ->  Bitmap Index Scan on distinct_a_a_b_idx
+               Index Cond: (a = 1)
+(5 rows)
+
+-- check colums order
+SELECT DISTINCT a FROM distinct_a WHERE b = 2 AND c = 10;
+ a 
+---
+ 1
+ 2
+ 3
+ 4
+ 5
+(5 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT a FROM distinct_a WHERE b = 2 AND c = 10;
+                       QUERY PLAN                        
+---------------------------------------------------------
+ Unique
+   ->  Index Scan using distinct_a_a_b_idx on distinct_a
+         Index Cond: (b = 2)
+         Filter: (c = 10)
+(4 rows)
+
+-- check projection case
+SELECT DISTINCT a, a FROM distinct_a WHERE b = 2;
+ a | a 
+---+---
+ 1 | 1
+ 2 | 2
+ 3 | 3
+ 4 | 4
+ 5 | 5
+(5 rows)
+
+SELECT DISTINCT a, 1 FROM distinct_a WHERE b = 2;
+ a | ?column? 
+---+----------
+ 1 |        1
+ 2 |        1
+ 3 |        1
+ 4 |        1
+ 5 |        1
+(5 rows)
+
+-- test cursor forward/backward movements
+BEGIN;
+DECLARE c SCROLL CURSOR FOR SELECT DISTINCT a FROM distinct_a;
+FETCH FROM c;
+ a 
+---
+ 1
+(1 row)
+
+FETCH BACKWARD FROM c;
+ a 
+---
+(0 rows)
+
+FETCH 6 FROM c;
+ a 
+---
+ 1
+ 2
+ 3
+ 4
+ 5
+(5 rows)
+
+FETCH BACKWARD 6 FROM c;
+ a 
+---
+ 5
+ 4
+ 3
+ 2
+ 1
+(5 rows)
+
+FETCH 6 FROM c;
+ a 
+---
+ 1
+ 2
+ 3
+ 4
+ 5
+(5 rows)
+
+FETCH BACKWARD 6 FROM c;
+ a 
+---
+ 5
+ 4
+ 3
+ 2
+ 1
+(5 rows)
+
+END;
+DROP TABLE distinct_a;
+-- test tuples visibility
+CREATE TABLE distinct_visibility (a int, b int);
+INSERT INTO distinct_visibility (select a, b from generate_series(1,5) a, generate_series(1, 10000) b);
+CREATE INDEX ON distinct_visibility (a, b);
+ANALYZE distinct_visibility;
+SELECT DISTINCT ON (a) a, b FROM distinct_visibility ORDER BY a, b;
+ a | b 
+---+---
+ 1 | 1
+ 2 | 1
+ 3 | 1
+ 4 | 1
+ 5 | 1
+(5 rows)
+
+DELETE FROM distinct_visibility WHERE a = 2 and b = 1;
+SELECT DISTINCT ON (a) a, b FROM distinct_visibility ORDER BY a, b;
+ a | b 
+---+---
+ 1 | 1
+ 2 | 2
+ 3 | 1
+ 4 | 1
+ 5 | 1
+(5 rows)
+
+SELECT DISTINCT ON (a) a, b FROM distinct_visibility ORDER BY a DESC, b DESC;
+ a |   b   
+---+-------
+ 5 | 10000
+ 4 | 10000
+ 3 | 10000
+ 2 | 10000
+ 1 | 10000
+(5 rows)
+
+DELETE FROM distinct_visibility WHERE a = 2 and b = 10000;
+SELECT DISTINCT ON (a) a, b FROM distinct_visibility ORDER BY a DESC, b DESC;
+ a |   b   
+---+-------
+ 5 | 10000
+ 4 | 10000
+ 3 | 10000
+ 2 |  9999
+ 1 | 10000
+(5 rows)
+
+DROP TABLE distinct_visibility;
+-- test page boundaries
+CREATE TABLE distinct_boundaries AS
+    SELECT a, b::int2 b, (b % 2)::int2 c FROM
+        generate_series(1, 5) a,
+        generate_series(1,366) b;
+CREATE INDEX ON distinct_boundaries (a, b, c);
+ANALYZE distinct_boundaries;
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a, b, c from distinct_boundaries
+WHERE b >= 1 and c = 0 ORDER BY a, b;
+                                 QUERY PLAN                                 
+----------------------------------------------------------------------------
+ Index Only Scan using distinct_boundaries_a_b_c_idx on distinct_boundaries
+   Skip scan: true
+   Index Cond: ((b >= 1) AND (c = 0))
+(3 rows)
+
+SELECT DISTINCT ON (a) a, b, c from distinct_boundaries
+WHERE b >= 1 and c = 0 ORDER BY a, b;
+ a | b | c 
+---+---+---
+ 1 | 2 | 0
+ 2 | 2 | 0
+ 3 | 2 | 0
+ 4 | 2 | 0
+ 5 | 2 | 0
+(5 rows)
+
+DROP TABLE distinct_boundaries;
+-- test tuple killing
+-- DESC ordering
+CREATE TABLE distinct_killed AS
+    SELECT a, b, b % 2 AS c, 10 AS d
+        FROM generate_series(1, 5) a,
+             generate_series(1,1000) b;
+CREATE INDEX ON distinct_killed (a, b, c, d);
+DELETE FROM distinct_killed where a = 3;
+BEGIN;
+    DECLARE c SCROLL CURSOR FOR
+    SELECT DISTINCT ON (a) a,b,c,d
+    FROM distinct_killed ORDER BY a DESC, b DESC;
+    FETCH FORWARD ALL FROM c;
+ a |  b   | c | d  
+---+------+---+----
+ 5 | 1000 | 0 | 10
+ 4 | 1000 | 0 | 10
+ 2 | 1000 | 0 | 10
+ 1 | 1000 | 0 | 10
+(4 rows)
+
+    FETCH BACKWARD ALL FROM c;
+ a |  b   | c | d  
+---+------+---+----
+ 1 | 1000 | 0 | 10
+ 2 | 1000 | 0 | 10
+ 4 | 1000 | 0 | 10
+ 5 | 1000 | 0 | 10
+(4 rows)
+
+COMMIT;
+DROP TABLE distinct_killed;
+-- regular ordering
+CREATE TABLE distinct_killed AS
+    SELECT a, b, b % 2 AS c, 10 AS d
+        FROM generate_series(1, 5) a,
+             generate_series(1,1000) b;
+CREATE INDEX ON distinct_killed (a, b, c, d);
+DELETE FROM distinct_killed where a = 3;
+BEGIN;
+    DECLARE c SCROLL CURSOR FOR
+    SELECT DISTINCT ON (a) a,b,c,d
+    FROM distinct_killed ORDER BY a, b;
+    FETCH FORWARD ALL FROM c;
+ a | b | c | d  
+---+---+---+----
+ 1 | 1 | 1 | 10
+ 2 | 1 | 1 | 10
+ 4 | 1 | 1 | 10
+ 5 | 1 | 1 | 10
+(4 rows)
+
+    FETCH BACKWARD ALL FROM c;
+ a | b | c | d  
+---+---+---+----
+ 5 | 1 | 1 | 10
+ 4 | 1 | 1 | 10
+ 2 | 1 | 1 | 10
+ 1 | 1 | 1 | 10
+(4 rows)
+
+COMMIT;
+DROP TABLE distinct_killed;
+-- partial delete
+CREATE TABLE distinct_killed AS
+    SELECT a, b, b % 2 AS c, 10 AS d
+        FROM generate_series(1, 5) a,
+             generate_series(1,1000) b;
+CREATE INDEX ON distinct_killed (a, b, c, d);
+DELETE FROM distinct_killed WHERE a = 3 AND b <= 999;
+BEGIN;
+    DECLARE c SCROLL CURSOR FOR
+    SELECT DISTINCT ON (a) a,b,c,d
+    FROM distinct_killed ORDER BY a DESC, b DESC;
+    FETCH FORWARD ALL FROM c;
+ a |  b   | c | d  
+---+------+---+----
+ 5 | 1000 | 0 | 10
+ 4 | 1000 | 0 | 10
+ 3 | 1000 | 0 | 10
+ 2 | 1000 | 0 | 10
+ 1 | 1000 | 0 | 10
+(5 rows)
+
+    FETCH BACKWARD ALL FROM c;
+ a |  b   | c | d  
+---+------+---+----
+ 1 | 1000 | 0 | 10
+ 2 | 1000 | 0 | 10
+ 3 | 1000 | 0 | 10
+ 4 | 1000 | 0 | 10
+ 5 | 1000 | 0 | 10
+(5 rows)
+
+COMMIT;
+DROP TABLE distinct_killed;
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index a1c90eb905..bd3b373515 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -78,6 +78,7 @@ select name, setting from pg_settings where name like 'enable%';
  enable_hashjoin                | on
  enable_indexonlyscan           | on
  enable_indexscan               | on
+ enable_indexskipscan           | on
  enable_material                | on
  enable_mergejoin               | on
  enable_nestloop                | on
@@ -89,7 +90,7 @@ select name, setting from pg_settings where name like 'enable%';
  enable_seqscan                 | on
  enable_sort                    | on
  enable_tidscan                 | on
-(17 rows)
+(18 rows)
 
 -- Test that the pg_timezone_names and pg_timezone_abbrevs views are
 -- more-or-less working.  We can't test their contents in any great detail
diff --git a/src/test/regress/sql/select_distinct.sql b/src/test/regress/sql/select_distinct.sql
index a605e86449..3441a0efc6 100644
--- a/src/test/regress/sql/select_distinct.sql
+++ b/src/test/regress/sql/select_distinct.sql
@@ -73,3 +73,257 @@ SELECT 1 IS NOT DISTINCT FROM 2 as "no";
 SELECT 2 IS NOT DISTINCT FROM 2 as "yes";
 SELECT 2 IS NOT DISTINCT FROM null as "no";
 SELECT null IS NOT DISTINCT FROM null as "yes";
+
+-- index only skip scan
+CREATE TABLE distinct_a (a int, b int, c int);
+INSERT INTO distinct_a (
+    SELECT five, tenthous, 10 FROM
+    generate_series(1, 5) five,
+    generate_series(1, 10000) tenthous
+);
+CREATE INDEX ON distinct_a (a, b);
+CREATE INDEX ON distinct_a ((a + 1));
+ANALYZE distinct_a;
+
+SELECT DISTINCT a FROM distinct_a;
+SELECT DISTINCT a FROM distinct_a WHERE a = 1;
+SELECT DISTINCT a FROM distinct_a ORDER BY a DESC;
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT a FROM distinct_a;
+
+-- test index skip scan with a condition on a non unique field
+SELECT DISTINCT ON (a) a, b FROM distinct_a WHERE b = 2;
+
+-- test index skip scan backwards
+SELECT DISTINCT ON (a) a, b FROM distinct_a ORDER BY a DESC, b DESC;
+
+-- test index skip scan for expressions
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT (a + 1) FROM distinct_a;
+SELECT DISTINCT (a + 1) FROM distinct_a;
+
+-- check colums order
+CREATE INDEX distinct_a_b_a on distinct_a (b, a);
+
+SELECT DISTINCT a FROM distinct_a WHERE b = 2;
+SELECT DISTINCT on (a, b) a, b FROM distinct_a WHERE b = 2;
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT a FROM distinct_a WHERE b = 2;
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT on (a, b) a, b FROM distinct_a WHERE b = 2;
+
+DROP INDEX distinct_a_b_a;
+
+-- test opposite scan/index directions inside a cursor
+-- forward/backward
+BEGIN;
+DECLARE c SCROLL CURSOR FOR
+SELECT DISTINCT ON (a) a,b FROM distinct_a ORDER BY a, b;
+
+FETCH FROM c;
+FETCH BACKWARD FROM c;
+
+FETCH 6 FROM c;
+FETCH BACKWARD 6 FROM c;
+
+FETCH 6 FROM c;
+FETCH BACKWARD 6 FROM c;
+
+END;
+
+-- backward/forward
+BEGIN;
+DECLARE c SCROLL CURSOR FOR
+SELECT DISTINCT ON (a) a,b FROM distinct_a ORDER BY a DESC, b DESC;
+
+FETCH FROM c;
+FETCH BACKWARD FROM c;
+
+FETCH 6 FROM c;
+FETCH BACKWARD 6 FROM c;
+
+FETCH 6 FROM c;
+FETCH BACKWARD 6 FROM c;
+
+END;
+
+-- test missing values and skipping from the end
+CREATE TABLE distinct_abc(a int, b int, c int);
+CREATE INDEX ON distinct_abc(a, b, c);
+INSERT INTO distinct_abc
+	VALUES (1, 1, 1),
+		   (1, 1, 2),
+		   (1, 2, 2),
+		   (1, 2, 3),
+		   (2, 2, 1),
+		   (2, 2, 3),
+		   (3, 1, 1),
+		   (3, 1, 2),
+		   (3, 2, 2),
+		   (3, 2, 3);
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a,b,c FROM distinct_abc WHERE c = 2;
+
+BEGIN;
+DECLARE c SCROLL CURSOR FOR
+SELECT DISTINCT ON (a) a,b,c FROM distinct_abc WHERE c = 2;
+
+FETCH ALL FROM c;
+FETCH BACKWARD ALL FROM c;
+
+END;
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a,b,c FROM distinct_abc WHERE c = 2
+ORDER BY a DESC, b DESC;
+
+BEGIN;
+DECLARE c SCROLL CURSOR FOR
+SELECT DISTINCT ON (a) a,b,c FROM distinct_abc WHERE c = 2
+ORDER BY a DESC, b DESC;
+
+FETCH ALL FROM c;
+FETCH BACKWARD ALL FROM c;
+
+END;
+
+DROP TABLE distinct_abc;
+
+-- index skip scan
+SELECT DISTINCT ON (a) a, b, c
+FROM distinct_a ORDER BY a;
+SELECT DISTINCT ON (a) a, b, c
+FROM distinct_a WHERE a = 1 ORDER BY a;
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a, b, c
+FROM distinct_a ORDER BY a;
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a, b, c
+FROM distinct_a WHERE a = 1 ORDER BY a;
+
+-- check colums order
+SELECT DISTINCT a FROM distinct_a WHERE b = 2 AND c = 10;
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT a FROM distinct_a WHERE b = 2 AND c = 10;
+
+-- check projection case
+SELECT DISTINCT a, a FROM distinct_a WHERE b = 2;
+SELECT DISTINCT a, 1 FROM distinct_a WHERE b = 2;
+
+-- test cursor forward/backward movements
+BEGIN;
+DECLARE c SCROLL CURSOR FOR SELECT DISTINCT a FROM distinct_a;
+
+FETCH FROM c;
+FETCH BACKWARD FROM c;
+
+FETCH 6 FROM c;
+FETCH BACKWARD 6 FROM c;
+
+FETCH 6 FROM c;
+FETCH BACKWARD 6 FROM c;
+
+END;
+
+DROP TABLE distinct_a;
+
+-- test tuples visibility
+CREATE TABLE distinct_visibility (a int, b int);
+INSERT INTO distinct_visibility (select a, b from generate_series(1,5) a, generate_series(1, 10000) b);
+CREATE INDEX ON distinct_visibility (a, b);
+ANALYZE distinct_visibility;
+
+SELECT DISTINCT ON (a) a, b FROM distinct_visibility ORDER BY a, b;
+DELETE FROM distinct_visibility WHERE a = 2 and b = 1;
+SELECT DISTINCT ON (a) a, b FROM distinct_visibility ORDER BY a, b;
+
+SELECT DISTINCT ON (a) a, b FROM distinct_visibility ORDER BY a DESC, b DESC;
+DELETE FROM distinct_visibility WHERE a = 2 and b = 10000;
+SELECT DISTINCT ON (a) a, b FROM distinct_visibility ORDER BY a DESC, b DESC;
+DROP TABLE distinct_visibility;
+
+-- test page boundaries
+CREATE TABLE distinct_boundaries AS
+    SELECT a, b::int2 b, (b % 2)::int2 c FROM
+        generate_series(1, 5) a,
+        generate_series(1,366) b;
+
+CREATE INDEX ON distinct_boundaries (a, b, c);
+ANALYZE distinct_boundaries;
+
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT ON (a) a, b, c from distinct_boundaries
+WHERE b >= 1 and c = 0 ORDER BY a, b;
+
+SELECT DISTINCT ON (a) a, b, c from distinct_boundaries
+WHERE b >= 1 and c = 0 ORDER BY a, b;
+
+DROP TABLE distinct_boundaries;
+
+-- test tuple killing
+
+-- DESC ordering
+CREATE TABLE distinct_killed AS
+    SELECT a, b, b % 2 AS c, 10 AS d
+        FROM generate_series(1, 5) a,
+             generate_series(1,1000) b;
+
+CREATE INDEX ON distinct_killed (a, b, c, d);
+
+DELETE FROM distinct_killed where a = 3;
+
+BEGIN;
+    DECLARE c SCROLL CURSOR FOR
+    SELECT DISTINCT ON (a) a,b,c,d
+    FROM distinct_killed ORDER BY a DESC, b DESC;
+    FETCH FORWARD ALL FROM c;
+    FETCH BACKWARD ALL FROM c;
+COMMIT;
+
+DROP TABLE distinct_killed;
+
+-- regular ordering
+CREATE TABLE distinct_killed AS
+    SELECT a, b, b % 2 AS c, 10 AS d
+        FROM generate_series(1, 5) a,
+             generate_series(1,1000) b;
+
+CREATE INDEX ON distinct_killed (a, b, c, d);
+
+DELETE FROM distinct_killed where a = 3;
+
+BEGIN;
+    DECLARE c SCROLL CURSOR FOR
+    SELECT DISTINCT ON (a) a,b,c,d
+    FROM distinct_killed ORDER BY a, b;
+    FETCH FORWARD ALL FROM c;
+    FETCH BACKWARD ALL FROM c;
+COMMIT;
+
+DROP TABLE distinct_killed;
+
+-- partial delete
+CREATE TABLE distinct_killed AS
+    SELECT a, b, b % 2 AS c, 10 AS d
+        FROM generate_series(1, 5) a,
+             generate_series(1,1000) b;
+
+CREATE INDEX ON distinct_killed (a, b, c, d);
+
+DELETE FROM distinct_killed WHERE a = 3 AND b <= 999;
+
+BEGIN;
+    DECLARE c SCROLL CURSOR FOR
+    SELECT DISTINCT ON (a) a,b,c,d
+    FROM distinct_killed ORDER BY a DESC, b DESC;
+    FETCH FORWARD ALL FROM c;
+    FETCH BACKWARD ALL FROM c;
+COMMIT;
+
+DROP TABLE distinct_killed;
-- 
2.21.0

#71

Floris Van Nee

florisvannee@Optiver.com

almost 6 years ago

In reply to: Dmitry Dolgov (#70)

* Suspicious performance difference between different types of workload,
mentioned by Tomas (unfortunately I have had no chance yet to investigate).

His benchmark results indeed most likely point to the same comparisons being done multiple times. Since the most likely place where these occur is _bt_readpage, I suspect it is called multiple times for the same page. Looking at your patch, I think that's indeed the case. For example, suppose a page contains [1,2,3,4,5] and the planner makes a complete misestimation and chooses a skip scan here. The first call to _bt_readpage already compares every tuple on the page and stores everything in the workspace, which now contains [1,2,3,4,5]. However, when a skip is done, the elements on the page (not the workspace) are compared to find the next one. Then another _bt_readpage is done, starting at the new offnum, so we compare every tuple (except 1) on the page again. The workspace now contains [2,3,4,5]. After the next skip we end up with [3,4,5], and so on. So tuple 5 actually gets compared 5 times in _bt_readpage alone.
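
To make the arithmetic of this example concrete, here is a minimal sketch in plain C. It is a hypothetical model, not PostgreSQL code: the function name and the idea of counting one comparison per tuple per page read are assumptions made for illustration, and it only models the worst case where every skip lands on the very next tuple.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model, not PostgreSQL code: a leaf page holds ntuples
 * distinct keys, and every skip lands on the very next tuple (the worst
 * case described above). Each modelled "read page" call compares every
 * tuple from its starting offset to the end of the page.
 *
 * per_tuple[i] receives the number of times tuple i was compared;
 * the return value is the total number of comparisons.
 */
static long
model_readpage_comparisons(size_t ntuples, long *per_tuple)
{
    long total = 0;

    assert(per_tuple != NULL || ntuples == 0);

    for (size_t i = 0; i < ntuples; i++)
        per_tuple[i] = 0;

    /* one modelled page read per distinct value, each starting one tuple later */
    for (size_t start = 0; start < ntuples; start++)
    {
        for (size_t off = start; off < ntuples; off++)
        {
            per_tuple[off]++;
            total++;
        }
    }

    return total;
}
```

Under this model a page of n distinct tuples costs n(n+1)/2 comparisons instead of n, so the five-tuple page above costs 15 comparisons, with the last tuple compared five times.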

-Floris

#72

Dilip Kumar

dilipbalaut@gmail.com

almost 6 years ago

In reply to: Dmitry Dolgov (#70)

Re: Index Skip Scan

On Wed, Apr 8, 2020 at 1:10 AM Dmitry Dolgov <9erthalion6@gmail.com> wrote:

On Mon, Apr 06, 2020 at 06:31:08PM +0000, Floris Van Nee wrote:

Hm, I wasn't aware of this one, thanks for bringing this up. Btw, Floris, I
would appreciate it if in the future you could make it more visible that the
changes you suggest contain some fixes. E.g. it wasn't clear to me from your
previous email that that's the case, and it doesn't make sense to pull in
different directions when we're trying to achieve the same goal :)

I wasn't aware that this particular case could be triggered before I saw Dilip's email, otherwise I'd have mentioned it here of course. It's just that because my patch handles filter conditions in general, it works for this case too.

Oh, then fortunately I got the wrong impression, sorry, and thanks for
the clarification :)

In the patch I posted a week ago these cases are all handled
correctly, as it introduces this extra logic in the Executor.

Okay, so I think we can merge those fixes into Dmitry's patch set.

I'll definitely take a look at suggested changes in filtering part.

It may be possible to just merge the filtering part into your patch, but I'm not entirely sure. Basically you have to pull the information about skipping one level up, out of the node, into the generic IndexNext code.

I was actually thinking more about just preventing skip scan in this
situation, which, if I'm not mistaken, could be solved by inspecting
qual conditions to figure out whether they're covered by the index -
something like in the attachments (this implementation is actually too
restrictive in this sense and will not allow e.g. expressions, that's
why I haven't bumped the patch set version for it - I'll post an
extended version soon).

Some more comments...

+ so->skipScanKey->nextkey = ScanDirectionIsForward(dir);
+ _bt_update_skip_scankeys(scan, indexRel);
+
.......
+ /*
+ * We haven't found scan key within the current page, so let's scan from
+ * the root. Use _bt_search and _bt_binsrch to get the buffer and offset
+ * number
+ */
+ so->skipScanKey->nextkey = ScanDirectionIsForward(dir);
+ stack = _bt_search(scan->indexRelation, so->skipScanKey,
+    &buf, BT_READ, scan->xs_snapshot);

Why do we need to set so->skipScanKey->nextkey =
ScanDirectionIsForward(dir); multiple times? I think it is fine to
just set it once?

+static inline bool
+_bt_scankey_within_page(IndexScanDesc scan, BTScanInsert key,
+ Buffer buf, ScanDirection dir)
+{
+ OffsetNumber low, high;
+ Page page = BufferGetPage(buf);
+ BTPageOpaque opaque = (BTPageOpaque) PageGetSpecialPointer(page);
+
+ low = P_FIRSTDATAKEY(opaque);
+ high = PageGetMaxOffsetNumber(page);
+
+ if (unlikely(high < low))
+ return false;
+
+ return (_bt_compare(scan->indexRelation, key, page, low) > 0 &&
+ _bt_compare(scan->indexRelation, key, page, high) < 1);
+}

I think the high key condition should be changed to
_bt_compare(scan->indexRelation, key, page, high) < 0? If the prefix
qual is equal to the high key, there is also no point in searching the
current page, so we can skip directly.

+ /* Check if an index skip scan is possible. */
+ can_skip = enable_indexskipscan & index->amcanskip;
+
+ /*
+ * Skip scan is not supported when there are qual conditions, which are not
+ * covered by index. The reason for that is that those conditions are
+ * evaluated later, already after skipping was applied.
+ *
+ * TODO: This implementation is too restrictive, and doesn't allow e.g.
+ * index expressions. For that we need to examine index_clauses too.
+ */
+ if (root->parse->jointree != NULL)
+ {
+ ListCell *lc;
+
+ foreach(lc, (List *)root->parse->jointree->quals)
+ {
+ Node *expr, *qual = (Node *) lfirst(lc);
+ Var *var;

I think we can avoid checking for expression if can_skip is already false.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#73

David Rowley

dgrowleyml@gmail.com

over 5 years ago

In reply to: Dmitry Dolgov (#70)

Re: Index Skip Scan

On Wed, 8 Apr 2020 at 07:40, Dmitry Dolgov <9erthalion6@gmail.com> wrote:

Other than that to summarize current open points for future readers
(this thread somehow became quite big):

* Making UniqueKeys usage more generic to allow using skip scan for more
use cases (hopefully it was covered by the v33, but I still need a
confirmation from David, like blinking twice or something).

I've not yet looked at the latest patch, but I did put some thoughts
into an email on the other thread that's been discussing UniqueKeys
[1].

I'm keen to hear thoughts on the plan I mentioned over there. Likely
it would be best to discuss the specifics of what additional features
we need to add to UniqueKeys for skip scans over here, but discuss any
changes which affect both patches over there. We certainly can't have
two separate implementations of UniqueKeys, so I believe the skip
scans UniqueKeys patch should most likely be based on the one in [1]
or some descendant of it.

[1]: /messages/by-id/CAApHDvpx1qED1uLqubcKJ=oHatCMd7pTUKkdq0B72_08nbR3Hw@mail.gmail.com

#74

Dmitry Dolgov

9erthalion6@gmail.com

over 5 years ago

In reply to: David Rowley (#73)

Re: Index Skip Scan

Sorry for the late reply.

On Tue, Apr 14, 2020 at 09:19:22PM +1200, David Rowley wrote:

I've not yet looked at the latest patch, but I did put some thoughts
into an email on the other thread that's been discussing UniqueKeys
[1].

I'm keen to hear thoughts on the plan I mentioned over there. Likely
it would be best to discuss the specifics of what additional features
we need to add to UniqueKeys for skip scans over here, but discuss any
changes which affect both patches over there. We certainly can't have
two separate implementations of UniqueKeys, so I believe the skip
scans UniqueKeys patch should most likely be based on the one in [1]
or some descendant of it.

[1] /messages/by-id/CAApHDvpx1qED1uLqubcKJ=oHatCMd7pTUKkdq0B72_08nbR3Hw@mail.gmail.com

Yes, I've come to the same conclusion, although I have my concerns about
having such a dependency between patches. Will look at the suggested
patches soon.

#75

Dmitry Dolgov

9erthalion6@gmail.com

over 5 years ago

In reply to: Dilip Kumar (#72)

Re: Index Skip Scan

On Sat, Apr 11, 2020 at 03:17:25PM +0530, Dilip Kumar wrote:

Some more comments...

Thanks for reviewing. Since this patch took much longer than I expected,
it's useful to have someone look at it with "fresh eyes".

+ so->skipScanKey->nextkey = ScanDirectionIsForward(dir);
+ _bt_update_skip_scankeys(scan, indexRel);
+
.......
+ /*
+ * We haven't found scan key within the current page, so let's scan from
+ * the root. Use _bt_search and _bt_binsrch to get the buffer and offset
+ * number
+ */
+ so->skipScanKey->nextkey = ScanDirectionIsForward(dir);
+ stack = _bt_search(scan->indexRelation, so->skipScanKey,
+    &buf, BT_READ, scan->xs_snapshot);

Why do we need to set so->skipScanKey->nextkey =
ScanDirectionIsForward(dir); multiple times? I think it is fine to
just set it once?

I believe it was necessary for previous implementations, but in the
current version we can avoid this, you're right.

+static inline bool
+_bt_scankey_within_page(IndexScanDesc scan, BTScanInsert key,
+ Buffer buf, ScanDirection dir)
+{
+ OffsetNumber low, high;
+ Page page = BufferGetPage(buf);
+ BTPageOpaque opaque = (BTPageOpaque) PageGetSpecialPointer(page);
+
+ low = P_FIRSTDATAKEY(opaque);
+ high = PageGetMaxOffsetNumber(page);
+
+ if (unlikely(high < low))
+ return false;
+
+ return (_bt_compare(scan->indexRelation, key, page, low) > 0 &&
+ _bt_compare(scan->indexRelation, key, page, high) < 1);
+}
I think the high key condition should be changed to
_bt_compare(scan->indexRelation, key, page, high) < 0 ? Because if
prefix qual is equal to the high key then also there is no point in
searching on the current page so we can directly skip.

From nbtree/README and the comments on functions like _bt_split I got the
impression that the high key could be equal to the last item on the leaf
page, so there is a point in searching. Is that incorrect?

+ /* Check if an index skip scan is possible. */
+ can_skip = enable_indexskipscan & index->amcanskip;
+
+ /*
+ * Skip scan is not supported when there are qual conditions, which are not
+ * covered by index. The reason for that is that those conditions are
+ * evaluated later, already after skipping was applied.
+ *
+ * TODO: This implementation is too restrictive, and doesn't allow e.g.
+ * index expressions. For that we need to examine index_clauses too.
+ */
+ if (root->parse->jointree != NULL)
+ {
+ ListCell *lc;
+
+ foreach(lc, (List *)root->parse->jointree->quals)
+ {
+ Node *expr, *qual = (Node *) lfirst(lc);
+ Var *var;

I think we can avoid checking for expression if can_skip is already false.

Yes, that makes sense. I'll include your suggestions in the next
rebased version I'm preparing.

#76

Dilip Kumar

dilipbalaut@gmail.com

over 5 years ago

In reply to: Dmitry Dolgov (#75)

Re: Index Skip Scan

On Sun, May 10, 2020 at 11:17 PM Dmitry Dolgov <9erthalion6@gmail.com> wrote:

On Sat, Apr 11, 2020 at 03:17:25PM +0530, Dilip Kumar wrote:

Some more comments...

Thanks for reviewing. Since this patch took much longer than I expected,
it's useful to have someone look at it with "fresh eyes".
+ so->skipScanKey->nextkey = ScanDirectionIsForward(dir);
+ _bt_update_skip_scankeys(scan, indexRel);
+
.......
+ /*
+ * We haven't found scan key within the current page, so let's scan from
+ * the root. Use _bt_search and _bt_binsrch to get the buffer and offset
+ * number
+ */
+ so->skipScanKey->nextkey = ScanDirectionIsForward(dir);
+ stack = _bt_search(scan->indexRelation, so->skipScanKey,
+    &buf, BT_READ, scan->xs_snapshot);
Why do we need to set so->skipScanKey->nextkey =
ScanDirectionIsForward(dir); multiple times? I think it is fine to
just set it once?
I believe it was necessary for previous implementations, but in the
current version we can avoid this, you're right.
+static inline bool
+_bt_scankey_within_page(IndexScanDesc scan, BTScanInsert key,
+ Buffer buf, ScanDirection dir)
+{
+ OffsetNumber low, high;
+ Page page = BufferGetPage(buf);
+ BTPageOpaque opaque = (BTPageOpaque) PageGetSpecialPointer(page);
+
+ low = P_FIRSTDATAKEY(opaque);
+ high = PageGetMaxOffsetNumber(page);
+
+ if (unlikely(high < low))
+ return false;
+
+ return (_bt_compare(scan->indexRelation, key, page, low) > 0 &&
+ _bt_compare(scan->indexRelation, key, page, high) < 1);
+}
I think the high key condition should be changed to
_bt_compare(scan->indexRelation, key, page, high) < 0 ? Because if
prefix qual is equal to the high key then also there is no point in
searching on the current page so we can directly skip.
From nbtree/README and comments to functions like _bt_split I've got an
impression that the high key could be equal to the last item on the leaf
page, so there is a point in searching. Is that incorrect?

But IIUC, here we want to decide whether we will get the next key in
the current page or not. Is my understanding correct? If our key (the
last tuple key) is equal to the high key, then the max key on this page
is the same as what we already got in the last tuple, so why would we
want to search this page? It will not give us a new key. So ideally we
should only look into this page if our last tuple key is smaller than
the high key. Am I missing something?

+ /* Check if an index skip scan is possible. */
+ can_skip = enable_indexskipscan & index->amcanskip;
+
+ /*
+ * Skip scan is not supported when there are qual conditions, which are not
+ * covered by index. The reason for that is that those conditions are
+ * evaluated later, already after skipping was applied.
+ *
+ * TODO: This implementation is too restrictive, and doesn't allow e.g.
+ * index expressions. For that we need to examine index_clauses too.
+ */
+ if (root->parse->jointree != NULL)
+ {
+ ListCell *lc;
+
+ foreach(lc, (List *)root->parse->jointree->quals)
+ {
+ Node *expr, *qual = (Node *) lfirst(lc);
+ Var *var;

I think we can avoid checking for expression if can_skip is already false.

Yes, that makes sense. I'll include your suggestions in the next
rebased version I'm preparing.

Ok.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#77

Dmitry Dolgov

9erthalion6@gmail.com

over 5 years ago

In reply to: Dilip Kumar (#76)

Re: Index Skip Scan

On Mon, May 11, 2020 at 04:04:00PM +0530, Dilip Kumar wrote:
+static inline bool
+_bt_scankey_within_page(IndexScanDesc scan, BTScanInsert key,
+ Buffer buf, ScanDirection dir)
+{
+ OffsetNumber low, high;
+ Page page = BufferGetPage(buf);
+ BTPageOpaque opaque = (BTPageOpaque) PageGetSpecialPointer(page);
+
+ low = P_FIRSTDATAKEY(opaque);
+ high = PageGetMaxOffsetNumber(page);
+
+ if (unlikely(high < low))
+ return false;
+
+ return (_bt_compare(scan->indexRelation, key, page, low) > 0 &&
+ _bt_compare(scan->indexRelation, key, page, high) < 1);
+}
I think the high key condition should be changed to
_bt_compare(scan->indexRelation, key, page, high) < 0 ? Because if
prefix qual is equal to the high key then also there is no point in
searching on the current page so we can directly skip.
From nbtree/README and comments to functions like _bt_split I've got an
impression that the high key could be equal to the last item on the leaf
page, so there is a point in searching. Is that incorrect?
But IIUC, here we want to decide whether we will get the next key in
the current page or not?

In general this function does what it says: it checks whether or not the
provided scankey could be found within the page. All the logic about
finding the proper next key to fetch is implemented at the call site, and
within this function we just test whatever was passed in. Does that
answer the question?

#78

Dilip Kumar

dilipbalaut@gmail.com

over 5 years ago

In reply to: Dmitry Dolgov (#77)

Re: Index Skip Scan

On Mon, May 11, 2020 at 4:55 PM Dmitry Dolgov <9erthalion6@gmail.com> wrote:

On Mon, May 11, 2020 at 04:04:00PM +0530, Dilip Kumar wrote:
+static inline bool
+_bt_scankey_within_page(IndexScanDesc scan, BTScanInsert key,
+ Buffer buf, ScanDirection dir)
+{
+ OffsetNumber low, high;
+ Page page = BufferGetPage(buf);
+ BTPageOpaque opaque = (BTPageOpaque) PageGetSpecialPointer(page);
+
+ low = P_FIRSTDATAKEY(opaque);
+ high = PageGetMaxOffsetNumber(page);
+
+ if (unlikely(high < low))
+ return false;
+
+ return (_bt_compare(scan->indexRelation, key, page, low) > 0 &&
+ _bt_compare(scan->indexRelation, key, page, high) < 1);
+}
I think the high key condition should be changed to
_bt_compare(scan->indexRelation, key, page, high) < 0 ? Because if
prefix qual is equal to the high key then also there is no point in
searching on the current page so we can directly skip.
From nbtree/README and comments to functions like _bt_split I've got an
impression that the high key could be equal to the last item on the leaf
page, so there is a point in searching. Is that incorrect?
But IIUC, here we want to decide whether we will get the next key in
the current page or not?
In general this function does what it says, it checks whether or not the
provided scankey could be found within the page. All the logic about
finding a proper next key to fetch is implemented on the call site, and
within this function we want to test whatever was passed in. Does it
answer the question?

Ok, I agree that the function is doing what it is expected to do.
But, then I have a problem with this call site.

+ /* Check if the next unique key can be found within the current page.
+ * Since we do not lock the current page between jumps, it's possible
+ * that it was splitted since the last time we saw it. This is fine in
+ * case of scanning forward, since page split to the right and we are
+ * still on the left most page. In case of scanning backwards it's
+ * possible to loose some pages and we need to remember the previous
+ * page, and then follow the right link from the current page until we
+ * find the original one.
+ *
+ * Since the whole idea of checking the current page is to protect
+ * ourselves and make more performant statistic mismatch case when
+ * there are too many distinct values for jumping, it's not clear if
+ * the complexity of this solution in case of backward scan is
+ * justified, so for now just avoid it.
+ */
+ if (BufferIsValid(so->currPos.buf) && ScanDirectionIsForward(dir))
+ {
+ LockBuffer(so->currPos.buf, BT_READ);
+
+ if (_bt_scankey_within_page(scan, so->skipScanKey, so->currPos.buf, dir))
+ {

Here we want to know whether the "next" unique key can be found on this
page, but we are using a function that tells us whether the "current"
key can be found on this page. I think in boundary cases where the high
key is equal to the current key, this function will return true (which
is expected from this function), and based on that we will simply scan
the current page. IMHO that cost could be avoided, no?

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#79

Peter Geoghegan

pg@bowt.ie

over 5 years ago

In reply to: Peter Geoghegan (#4)

Re: Index Skip Scan

On Mon, Jan 20, 2020 at 5:05 PM Peter Geoghegan <pg@bowt.ie> wrote:

You can add another assertion that calls a new utility function in
bufmgr.c. That can use the same logic as this existing assertion in
FlushOneBuffer():

Assert(LWLockHeldByMe(BufferDescriptorGetContentLock(bufHdr)));

We haven't needed assertions like this so far because it's usually
clear whether or not a buffer lock is held (plus the bufmgr.c
assertions help on their own).

Just in case anybody missed it, I am working on a patch that makes
nbtree use Valgrind instrumentation to detect pages accessed without a
buffer content lock held:

/messages/by-id/CAH2-WzkLgyN3zBvRZ1pkNJThC=xi_0gpWRUb_45eexLH1+k2_Q@mail.gmail.com

There is also one component that detects when any buffer is accessed
without a buffer pin.

--
Peter Geoghegan

#80

Dmitry Dolgov

9erthalion6@gmail.com

over 5 years ago

In reply to: Dilip Kumar (#78)

Re: Index Skip Scan

On Wed, May 13, 2020 at 02:37:21PM +0530, Dilip Kumar wrote:
+ if (_bt_scankey_within_page(scan, so->skipScanKey, so->currPos.buf, dir))
+ {
Here we want to know whether the "next" unique key can be found on this
page, but we are using a function that tells us whether the "current"
key can be found on this page. I think in boundary cases where the high
key is equal to the current key, this function will return true (which
is the expected behavior for this function), and based on that we will
simply scan the current page. IMHO that cost could be avoided, no?

Yes, looks like you're right, there is indeed an unnecessary extra scan
happening. To avoid that we can look at key->nextkey and adjust the
upper boundary accordingly. I will also add this to the next rebased
patch, thanks!
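
To illustrate the boundary case, here is a minimal standalone sketch in
C. It is not code from the patch: ToyPage and toy_key_within_page are
hypothetical names, and the page is modeled only by its high key, on the
assumption that every key on the page is <= the high key and that the
scan moves forward. With nextkey set we are looking for the first key
strictly greater than the skip key, so a skip key equal to the high key
cannot have its successor on this page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical toy model of a leaf page (an assumption for illustration,
 * not a PostgreSQL structure).  Every key stored on the page is assumed
 * to be <= high_key; the rightmost page has no high key at all.
 */
typedef struct ToyPage
{
    int  high_key;
    bool rightmost;
} ToyPage;

/*
 * Can the first key satisfying the search live on this page?
 *
 *   nextkey == false: we want the first key >= skip_key, so the page
 *                     qualifies when skip_key <= high_key.
 *   nextkey == true:  we want the first key >  skip_key, so the page
 *                     qualifies only when skip_key < high_key.
 *
 * A "true" result only means the key may be here; the page still has to
 * be searched.  A "false" result means the search can be skipped.
 */
static bool
toy_key_within_page(const ToyPage *page, int skip_key, bool nextkey)
{
    assert(page != NULL);

    if (page->rightmost)
        return true;            /* no upper bound to compare against */

    if (nextkey)
        return skip_key < page->high_key;

    return skip_key <= page->high_key;
}
```

With a high key of 10 and a skip key of 10, the non-strict check reports
that the page may contain the key, while the nextkey-aware check reports
that it cannot, which is the extra page scan being avoided.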

#81

Dilip Kumar

dilipbalaut@gmail.com

over 5 years ago

In reply to: Dmitry Dolgov (#80)

Re: Index Skip Scan

On Fri, 15 May 2020 at 6:06 PM, Dmitry Dolgov <9erthalion6@gmail.com> wrote:

On Wed, May 13, 2020 at 02:37:21PM +0530, Dilip Kumar wrote:

+ if (_bt_scankey_within_page(scan, so->skipScanKey, so->currPos.buf, dir))
+ {

Here we want to know whether the "next" unique key can be found on this
page, but we are using a function that tells us whether the "current"
key can be found on this page. I think in boundary cases where the high
key is equal to the current key, this function will return true (which
is the expected behavior for this function), and based on that we will
simply scan the current page. IMHO that cost could be avoided, no?

Yes, looks like you're right, there is indeed an unnecessary extra scan
happening. To avoid that we can look at key->nextkey and adjust the
upper boundary accordingly. I will also add this to the next rebased
patch, thanks!

Great thanks!

--

Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#82

Andy Fan

zhihui.fan1213@gmail.com

over 5 years ago

In reply to: Dmitry Dolgov (#70)

Re: Index Skip Scan

On Wed, Apr 8, 2020 at 3:41 AM Dmitry Dolgov <9erthalion6@gmail.com> wrote:

On Mon, Apr 06, 2020 at 06:31:08PM +0000, Floris Van Nee wrote:

Hm, I wasn't aware of this one, thanks for bringing this up. Btw,
Floris, I would appreciate it if in the future you could make it more
visible that the changes you suggest contain some fixes. E.g. it wasn't
clear to me from your previous email that that's the case, and it
doesn't make sense to pull in different directions when we're trying to
achieve the same goal :)

I wasn't aware that this particular case could be triggered before I saw
Dilip's email, otherwise I'd have mentioned it here of course. It's just
that because my patch handles filter conditions in general, it works for
this case too.

Oh, then fortunately I got the wrong impression, sorry, and thanks for
the clarification :)

In the patch I posted a week ago these cases are all handled
correctly, as it introduces this extra logic in the Executor.

Okay, so I think we can merge those fixes into Dmitry's patch set.

I'll definitely take a look at suggested changes in filtering part.

It may be possible to just merge the filtering part into your patch, but
I'm not entirely sure. Basically you have to pull the information about
skipping one level up, out of the node, into the generic IndexNext code.

I was actually thinking more about just preventing skip scan in this
situation, which, if I'm not mistaken, could be solved by inspecting
qual conditions to figure out if they're covered by the index,
something like in the attachments (this implementation is actually too
restrictive in this sense and will not allow e.g. expressions, which is
why I haven't bumped the patch set version for it; I'll post an
extended version soon).

Other than that to summarize current open points for future readers
(this thread somehow became quite big):

* Making UniqueKeys usage more generic to allow using skip scan for more
use cases (hopefully it was covered by the v33, but I still need a
confirmation from David, like blinking twice or something).

* Suspicious performance difference between different types of workload,
mentioned by Tomas (unfortunately I have had no chance to investigate yet).

* Thinking about supporting conditions, that are not covered by the index,
to make skipping more flexible (one of the potential next steps in the
future, as suggested by Floris).

Looks like this is the latest patch; which commit is it based on? Thanks

--
Best Regards
Andy Fan

#83

Dmitry Dolgov

9erthalion6@gmail.com

over 5 years ago

In reply to: Andy Fan (#82)

Re: Index Skip Scan

On Tue, Jun 02, 2020 at 08:36:31PM +0800, Andy Fan wrote:

Other than that to summarize current open points for future readers
(this thread somehow became quite big):

* Making UniqueKeys usage more generic to allow using skip scan for more
use cases (hopefully it was covered by the v33, but I still need a
confirmation from David, like blinking twice or something).

* Suspicious performance difference between different types of workload,
mentioned by Tomas (unfortunately I have had no chance to investigate yet).

* Thinking about supporting conditions, that are not covered by the index,
to make skipping more flexible (one of the potential next steps in the
future, as suggested by Floris).

Looks like this is the latest patch; which commit is it based on? Thanks

I have a rebased version, if you're asking about it. I haven't posted it
yet, mostly since I'm in the middle of adapting it to the UniqueKeys from
the other thread. Would it be ok for you to wait a bit until I post the
finished version?

#84

Andy Fan

zhihui.fan1213@gmail.com

over 5 years ago

In reply to: Dmitry Dolgov (#83)

Re: Index Skip Scan

On Tue, Jun 2, 2020 at 9:38 PM Dmitry Dolgov <9erthalion6@gmail.com> wrote:

On Tue, Jun 02, 2020 at 08:36:31PM +0800, Andy Fan wrote:

Other than that to summarize current open points for future readers
(this thread somehow became quite big):

* Making UniqueKeys usage more generic to allow using skip scan for more
use cases (hopefully it was covered by the v33, but I still need a
confirmation from David, like blinking twice or something).

* Suspicious performance difference between different types of workload,
mentioned by Tomas (unfortunately I have had no chance to investigate yet).

* Thinking about supporting conditions, that are not covered by the index,
to make skipping more flexible (one of the potential next steps in the
future, as suggested by Floris).

Looks like this is the latest patch; which commit is it based on? Thanks

I have a rebased version, if you're asking about it. I haven't posted it
yet, mostly since I'm in the middle of adapting it to the UniqueKeys from
the other thread. Would it be ok for you to wait a bit until I post the
finished version?

Sure, that's OK. The discussion on the UniqueKey thread looks more complex
than I expected, which is why I wanted to check the code here, but that's
fine, you can work on your own schedule.

--
Best Regards
Andy Fan