Tid scan improvements

Started by Edmund Horner over 7 years ago, 97 messages

ejrh00@gmail.com

over 7 years ago

2 attachment(s)

Hello,

To scratch an itch, I have been working on teaching TidScan how to do
range queries, i.e. those using >=, <, BETWEEN, etc. This means we
can write, for instance,

SELECT * FROM t WHERE ctid >= '(1000,0)' AND ctid < '(2000,0)';

instead of resorting to the old trick:

SELECT * FROM t WHERE ctid = ANY (ARRAY(SELECT format('(%s,%s)', i, j)::tid
FROM generate_series(1000,1999) AS gs(i), generate_series(1,200)
AS gs2(j)));

where "200" is some guess at how many tuples can fit on a page for that table.

There's some previous discussion about this at
/messages/by-id/CAHyXU0zJhg_5RtxKnNbAK=4ZzQEFUFi+52RjpLrxtkRTD6CDFw@mail.gmail.com

Since range scan execution is rather different from the existing
TidScan execution, I ended up making a new plan type, TidRangeScan.
There is still only one TidPath, but it has an additional member that
describes which method to use.
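
A minimal standalone sketch of how a range qual such as ctid >= '(1000,0)' AND ctid < '(2000,0)' can be reduced to an inclusive TID range plus a block count, which is roughly what the range scan needs before it starts reading pages. The Tid and TidRange structs and normalize_range() are hypothetical stand-ins for illustration only; they are not the types or functions used in the attached patches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for a ctid: (block, offset). Not ItemPointerData. */
typedef struct { uint32_t block; uint16_t offset; } Tid;

/* Inclusive range of TIDs plus the number of heap blocks it touches. */
typedef struct { Tid first; Tid last; uint32_t blocks_to_scan; } TidRange;

/*
 * Turn "ctid >[=] lo AND ctid <[=] hi" into an inclusive range, clamped to a
 * table of nblocks blocks.  Returns false if the range is empty.
 */
static bool
normalize_range(Tid lo, bool lo_strict, Tid hi, bool hi_strict,
                uint32_t nblocks, TidRange *out)
{
    if (lo_strict)
    {
        /* ctid > lo: step to the next TID, carrying into the block number. */
        lo.offset++;
        if (lo.offset == 0)
        {
            if (lo.block == UINT32_MAX)
                return false;
            lo.block++;
        }
    }

    if (hi_strict)
    {
        /* ctid < hi: step to the previous TID, borrowing from the block. */
        if (hi.block == 0 && hi.offset == 0)
            return false;       /* nothing sorts before (0,0) */
        if (hi.offset == 0)
            hi.block--;
        hi.offset--;
    }

    /* A lower bound past the end of the table matches nothing. */
    if (nblocks == 0 || lo.block >= nblocks)
        return false;

    /* Clamp an upper bound past the end of the table to the last block. */
    if (hi.block >= nblocks)
    {
        hi.block = nblocks - 1;
        hi.offset = UINT16_MAX;
    }

    if (hi.block < lo.block ||
        (hi.block == lo.block && hi.offset < lo.offset))
        return false;

    out->first = lo;
    out->last = hi;
    out->blocks_to_scan = hi.block - lo.block + 1;
    return true;
}
```

With a 5000-block table, the example query above would come out as blocks 1000 through 1999, i.e. 1000 blocks to scan; tuples on the first and last block still have to be checked individually against the offset bounds.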

As part of the work I also taught TidScan that its results are ordered
by ctid, i.e. to set a pathkey on a TidPath. The benefit of this is
that queries such as

SELECT MAX(ctid) FROM t;
SELECT * FROM t WHERE ctid IN (...) ORDER BY ctid;

are now planned a bit more efficiently. Execution was already
returning tuples in ascending ctid order; I just had to add support
for descending order.
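
The descending case comes down to sorting the TID list with a comparator whose arguments are swapped. A small standalone sketch of that technique follows; the Tid struct and sort_tids() are hypothetical simplifications, not the ItemPointerData handling in nodeTidscan.c.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified stand-in for a ctid: (block, offset). */
typedef struct { uint32_t block; uint16_t offset; } Tid;

/* Ascending order: compare block numbers first, then offsets. */
static int
tid_cmp(const void *a, const void *b)
{
    const Tid *ta = a;
    const Tid *tb = b;

    if (ta->block != tb->block)
        return ta->block < tb->block ? -1 : 1;
    if (ta->offset != tb->offset)
        return ta->offset < tb->offset ? -1 : 1;
    return 0;
}

/* Descending order: the same comparison with the arguments swapped. */
static int
tid_cmp_reverse(const void *a, const void *b)
{
    return tid_cmp(b, a);
}

/* Sort the list in the order the scan direction calls for. */
static void
sort_tids(Tid *tids, size_t n, int backward)
{
    qsort(tids, n, sizeof(Tid), backward ? tid_cmp_reverse : tid_cmp);
}
```

Sorting once in the requested direction means the scan can simply walk the array from start to end, whichever way the ORDER BY goes.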

Attached are a couple of patches:
- 01_tid_scan_ordering.patch
- 02_tid_range_scan.patch, to be applied on top of 01.

Can I add this to the next CommitFest?

Obviously the whole thing needs thorough review, and I expect there to
be numerous problems. (I had to make this prototype to demonstrate to
myself that it wasn't completely beyond me. I know from experience
how easy it is to enthusiastically volunteer something for an open
source project, discover that one does not have the time or skill
required, and be too embarrassed to show one's face again!)

As well as actual correctness, some aspects that I am particularly
unsure about include:

- Is it messy to use TidPath for both types of scan?
- What is the planning cost for plans that don't end up being a
TidScan or TidRangeScan?
- Have I put the various helper functions in the right files?
- Is there a less brittle way to create tables of a specific number
of blocks/tuples in the regression tests?
- Have I got the ScanDirection right during execution?
- Are my changes to heapam ok?
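
For context on the heapam question, the change in 02 alters how a backward scan picks its starting page once scan limits have been set. A standalone sketch of that choice is below, with plain integers standing in for the rs_startblock, rs_numblocks and rs_nblocks fields; backward_start_page() and INVALID_BLOCK are hypothetical names, and the sketch assumes a non-empty table and a non-zero block limit.

```c
#include <assert.h>
#include <stdint.h>

#define INVALID_BLOCK UINT32_MAX    /* stand-in for InvalidBlockNumber */

/*
 * Page on which a backward scan should start.
 * Assumes nblocks > 0, and numblocks > 0 when a limit is set.
 */
static uint32_t
backward_start_page(uint32_t startblock, uint32_t numblocks, uint32_t nblocks)
{
    if (numblocks == INVALID_BLOCK)
    {
        /* No limit: begin just before the start block, wrapping around. */
        return startblock > 0 ? startblock - 1 : nblocks - 1;
    }

    /* Limited scan: begin on the last block inside the limits. */
    return startblock + numblocks - 1;
}
```

For a scan limited to 1000 blocks starting at block 1000, this starts the backward scan at block 1999 rather than at block 999.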

Cheers,
Edmund

Attachments:

01_tid_scan_ordering.patch (application/octet-stream)

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 16a80a0..d9deb72 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -111,6 +111,7 @@ static void show_foreignscan_info(ForeignScanState *fsstate, ExplainState *es);
 static void show_eval_params(Bitmapset *bms_params, ExplainState *es);
 static const char *explain_get_index_name(Oid indexId);
 static void show_buffer_usage(ExplainState *es, const BufferUsage *usage);
+static void show_scan_direction(ExplainState *es, ScanDirection direction);
 static void ExplainIndexScanDetails(Oid indexid, ScanDirection indexorderdir,
 						ExplainState *es);
 static void ExplainScanTarget(Scan *plan, ExplainState *es);
@@ -1194,7 +1195,6 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_SeqScan:
 		case T_SampleScan:
 		case T_BitmapHeapScan:
-		case T_TidScan:
 		case T_SubqueryScan:
 		case T_FunctionScan:
 		case T_TableFuncScan:
@@ -1203,6 +1203,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_WorkTableScan:
 			ExplainScanTarget((Scan *) plan, es);
 			break;
+		case T_TidScan:
+			show_scan_direction(es, ((TidScan *) plan)->direction);
+			ExplainScanTarget((Scan *) plan, es);
+			break;
 		case T_ForeignScan:
 		case T_CustomScan:
 			if (((Scan *) plan)->scanrelid > 0)
@@ -2797,25 +2801,21 @@ show_buffer_usage(ExplainState *es, const BufferUsage *usage)
 }
 
 /*
- * Add some additional details about an IndexScan or IndexOnlyScan
+ * Show the direction of a scan.
  */
 static void
-ExplainIndexScanDetails(Oid indexid, ScanDirection indexorderdir,
-						ExplainState *es)
+show_scan_direction(ExplainState *es, ScanDirection direction)
 {
-	const char *indexname = explain_get_index_name(indexid);
-
 	if (es->format == EXPLAIN_FORMAT_TEXT)
 	{
-		if (ScanDirectionIsBackward(indexorderdir))
+		if (ScanDirectionIsBackward(direction))
 			appendStringInfoString(es->str, " Backward");
-		appendStringInfo(es->str, " using %s", indexname);
 	}
 	else
 	{
 		const char *scandir;
 
-		switch (indexorderdir)
+		switch (direction)
 		{
 			case BackwardScanDirection:
 				scandir = "Backward";
@@ -2831,11 +2831,27 @@ ExplainIndexScanDetails(Oid indexid, ScanDirection indexorderdir,
 				break;
 		}
 		ExplainPropertyText("Scan Direction", scandir, es);
-		ExplainPropertyText("Index Name", indexname, es);
 	}
 }
 
 /*
+ * Add some additional details about an IndexScan or IndexOnlyScan
+ */
+static void
+ExplainIndexScanDetails(Oid indexid, ScanDirection indexorderdir,
+						ExplainState *es)
+{
+	const char *indexname = explain_get_index_name(indexid);
+
+	show_scan_direction(es, indexorderdir);
+
+	if (es->format == EXPLAIN_FORMAT_TEXT)
+		appendStringInfo(es->str, " using %s", indexname);
+	else
+		ExplainPropertyText("Index Name", indexname, es);
+}
+
+/*
  * Show the target of a Scan node
  */
 static void
diff --git a/src/backend/executor/nodeTidscan.c b/src/backend/executor/nodeTidscan.c
index e207b1f..df11d92 100644
--- a/src/backend/executor/nodeTidscan.c
+++ b/src/backend/executor/nodeTidscan.c
@@ -50,6 +50,7 @@ typedef struct TidExpr
 static void TidExprListCreate(TidScanState *tidstate);
 static void TidListEval(TidScanState *tidstate);
 static int	itemptr_comparator(const void *a, const void *b);
+static int	itemptr_comparator_reverse(const void *a, const void *b);
 static TupleTableSlot *TidNext(TidScanState *node);
 
 
@@ -225,13 +226,12 @@ TidListEval(TidScanState *tidstate)
 							  RelationGetRelid(tidstate->ss.ss_currentRelation),
 							  &cursor_tid))
 			{
-				if (numTids >= numAllocTids)
-				{
-					numAllocTids *= 2;
-					tidList = (ItemPointerData *)
-						repalloc(tidList,
-								 numAllocTids * sizeof(ItemPointerData));
-				}
+				/*
+				 * A current-of TidExpr only exists by itself, and we should
+				 * already have allocated a tidList entry for it.  We don't
+				 * need to check whether the tidList array needs to be resized.
+				 */
+				Assert(numTids < numAllocTids);
 				tidList[numTids++] = cursor_tid;
 			}
 		}
@@ -247,12 +247,16 @@ TidListEval(TidScanState *tidstate)
 	{
 		int			lastTid;
 		int			i;
+		int (* cmp) (const void *, const void *);
+
+		/* Choose the sort order based on the scan direction. */
+		cmp = ScanDirectionIsBackward(((TidScan *) tidstate->ss.ps.plan)->direction) ? itemptr_comparator_reverse : itemptr_comparator;
 
 		/* CurrentOfExpr could never appear OR'd with something else */
 		Assert(!tidstate->tss_isCurrentOf);
 
 		qsort((void *) tidList, numTids, sizeof(ItemPointerData),
-			  itemptr_comparator);
+			  cmp);
 		lastTid = 0;
 		for (i = 1; i < numTids; i++)
 		{
@@ -291,6 +295,15 @@ itemptr_comparator(const void *a, const void *b)
 	return 0;
 }
 
+/*
+ * qsort comparator for ItemPointerData items, in reverse order
+ */
+static int
+itemptr_comparator_reverse(const void *a, const void *b)
+{
+	return itemptr_comparator(b,a);
+}
+
 /* ----------------------------------------------------------------
  *		TidNext
  *
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 7c8220c..5f84984 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -583,6 +583,7 @@ _copyTidScan(const TidScan *from)
 	 * copy remainder of node
 	 */
 	COPY_NODE_FIELD(tidquals);
+	COPY_SCALAR_FIELD(direction);
 
 	return newnode;
 }
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 6269f47..870dd2e 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -615,6 +615,7 @@ _outTidScan(StringInfo str, const TidScan *node)
 	_outScanInfo(str, (const Scan *) node);
 
 	WRITE_NODE_FIELD(tidquals);
+	WRITE_ENUM_FIELD(direction, ScanDirection);
 }
 
 static void
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 3254524..2317f58 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1828,6 +1828,7 @@ _readTidScan(void)
 	ReadCommonScan(&local_node->scan);
 
 	READ_NODE_FIELD(tidquals);
+	READ_ENUM_FIELD(direction, ScanDirection);
 
 	READ_DONE();
 }
diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index ec66cb9..729b207 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -18,6 +18,9 @@
 #include "postgres.h"
 
 #include "access/stratnum.h"
+#include "access/sysattr.h"
+#include "catalog/pg_operator.h"
+#include "catalog/pg_type.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
 #include "nodes/plannodes.h"
@@ -848,6 +851,22 @@ build_join_pathkeys(PlannerInfo *root,
 	return truncate_useless_pathkeys(root, joinrel, outer_pathkeys);
 }
 
+/*
+ * build_tidscan_pathkeys
+ *	  Build the path keys corresponding to ORDER BY ctid ASC|DESC.
+ */
+List *
+build_tidscan_pathkeys(PlannerInfo *root,
+					   RelOptInfo *rel,
+					   ScanDirection direction)
+{
+	int opno = (direction == ForwardScanDirection) ? TIDLessOperator : TIDGreaterOperator;
+	Var *varexpr = makeVar(rel->relid, SelfItemPointerAttributeNumber, TIDOID, -1, InvalidOid, 0);
+	List *pathkeys = build_expression_pathkey(root, (Expr *) varexpr, NULL, opno, rel->relids, true);
+
+	return pathkeys;
+}
+
 /****************************************************************************
  *		PATHKEYS AND SORT CLAUSES
  ****************************************************************************/
diff --git a/src/backend/optimizer/path/tidpath.c b/src/backend/optimizer/path/tidpath.c
index 3bb5b8d..7a40700 100644
--- a/src/backend/optimizer/path/tidpath.c
+++ b/src/backend/optimizer/path/tidpath.c
@@ -247,6 +247,8 @@ TidQualFromBaseRestrictinfo(RelOptInfo *rel)
  * create_tidscan_paths
  *	  Create paths corresponding to direct TID scans of the given rel.
  *
+ *	  Path keys and direction will be set on the scans if it looks useful.
+ *
  *	  Candidate paths are added to the rel's pathlist (using add_path).
  */
 void
@@ -265,6 +267,30 @@ create_tidscan_paths(PlannerInfo *root, RelOptInfo *rel)
 	tidquals = TidQualFromBaseRestrictinfo(rel);
 
 	if (tidquals)
-		add_path(rel, (Path *) create_tidscan_path(root, rel, tidquals,
+	{
+		List			*pathkeys = NULL;
+		ScanDirection	 direction = ForwardScanDirection;
+
+		if (has_useful_pathkeys(root, rel)) {
+			/*
+			 * Build path keys corresponding to ORDER BY ctid ASC, and check
+			 * whether they will be useful for this scan.  If not, build
+			 * path keys for DESC, and try that; set the direction to
+			 * BackwardScanDirection if so.  If neither of them will be
+			 * useful, no path keys will be set.
+			 */
+			pathkeys = build_tidscan_pathkeys(root, rel, ForwardScanDirection);
+			if (!pathkeys_contained_in(pathkeys, root->query_pathkeys))
+			{
+				pathkeys = build_tidscan_pathkeys(root, rel, BackwardScanDirection);
+				if (pathkeys_contained_in(pathkeys, root->query_pathkeys))
+					direction = BackwardScanDirection;
+				else
+					pathkeys = NULL;
+			}
+		}
+
+		add_path(rel, (Path *) create_tidscan_path(root, rel, tidquals, pathkeys, direction,
 												   required_outer));
+	}
 }
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index ae41c9e..4e1faa6 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -185,7 +185,7 @@ static BitmapHeapScan *make_bitmap_heapscan(List *qptlist,
 					 List *bitmapqualorig,
 					 Index scanrelid);
 static TidScan *make_tidscan(List *qptlist, List *qpqual, Index scanrelid,
-			 List *tidquals);
+			 List *tidquals, ScanDirection direction);
 static SubqueryScan *make_subqueryscan(List *qptlist,
 				  List *qpqual,
 				  Index scanrelid,
@@ -3097,7 +3097,9 @@ create_tidscan_plan(PlannerInfo *root, TidPath *best_path,
 	scan_plan = make_tidscan(tlist,
 							 scan_clauses,
 							 scan_relid,
-							 tidquals);
+							 tidquals,
+							 best_path->direction
+							);
 
 	copy_generic_path_info(&scan_plan->scan.plan, &best_path->path);
 
@@ -5179,7 +5181,8 @@ static TidScan *
 make_tidscan(List *qptlist,
 			 List *qpqual,
 			 Index scanrelid,
-			 List *tidquals)
+			 List *tidquals,
+			 ScanDirection direction)
 {
 	TidScan    *node = makeNode(TidScan);
 	Plan	   *plan = &node->scan.plan;
@@ -5190,6 +5193,7 @@ make_tidscan(List *qptlist,
 	plan->righttree = NULL;
 	node->scan.scanrelid = scanrelid;
 	node->tidquals = tidquals;
+	node->direction = direction;
 
 	return node;
 }
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index c5aaaf5..e2d51a9 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1186,6 +1186,7 @@ create_bitmap_or_path(PlannerInfo *root,
  */
 TidPath *
 create_tidscan_path(PlannerInfo *root, RelOptInfo *rel, List *tidquals,
+					List *pathkeys, ScanDirection direction,
 					Relids required_outer)
 {
 	TidPath    *pathnode = makeNode(TidPath);
@@ -1198,9 +1199,10 @@ create_tidscan_path(PlannerInfo *root, RelOptInfo *rel, List *tidquals,
 	pathnode->path.parallel_aware = false;
 	pathnode->path.parallel_safe = rel->consider_parallel;
 	pathnode->path.parallel_workers = 0;
-	pathnode->path.pathkeys = NIL;	/* always unordered */
+	pathnode->path.pathkeys = pathkeys;
 
 	pathnode->tidquals = tidquals;
+	pathnode->direction = direction;
 
 	cost_tidscan(&pathnode->path, root, rel, tidquals,
 				 pathnode->path.param_info);
diff --git a/src/include/catalog/pg_operator.dat b/src/include/catalog/pg_operator.dat
index d9b6bad..31e7d61 100644
--- a/src/include/catalog/pg_operator.dat
+++ b/src/include/catalog/pg_operator.dat
@@ -156,7 +156,7 @@
   oprname => '<', oprleft => 'tid', oprright => 'tid', oprresult => 'bool',
   oprcom => '>(tid,tid)', oprnegate => '>=(tid,tid)', oprcode => 'tidlt',
   oprrest => 'scalarltsel', oprjoin => 'scalarltjoinsel' },
-{ oid => '2800', descr => 'greater than',
+{ oid => '2800', oid_symbol => 'TIDGreaterOperator', descr => 'greater than',
   oprname => '>', oprleft => 'tid', oprright => 'tid', oprresult => 'bool',
   oprcom => '<(tid,tid)', oprnegate => '<=(tid,tid)', oprcode => 'tidgt',
   oprrest => 'scalargtsel', oprjoin => 'scalargtjoinsel' },
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 7c2abbd..96d30aa 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -492,6 +492,7 @@ typedef struct TidScan
 {
 	Scan		scan;
 	List	   *tidquals;		/* qual(s) involving CTID = something */
+	ScanDirection direction;
 } TidScan;
 
 /* ----------------
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 41caf87..cf4839d 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -1233,6 +1233,7 @@ typedef struct TidPath
 {
 	Path		path;
 	List	   *tidquals;		/* qual(s) involving CTID = something */
+	ScanDirection direction;
 } TidPath;
 
 /*
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 7c5ff22..a0a88a5 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -63,7 +63,8 @@ extern BitmapOrPath *create_bitmap_or_path(PlannerInfo *root,
 					  RelOptInfo *rel,
 					  List *bitmapquals);
 extern TidPath *create_tidscan_path(PlannerInfo *root, RelOptInfo *rel,
-					List *tidquals, Relids required_outer);
+					List *tidquals, List *pathkeys, ScanDirection direction,
+					Relids required_outer);
 extern AppendPath *create_append_path(PlannerInfo *root, RelOptInfo *rel,
 				   List *subpaths, List *partial_subpaths,
 				   Relids required_outer,
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index cafde30..3b915b5 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -211,6 +211,9 @@ extern List *build_join_pathkeys(PlannerInfo *root,
 					RelOptInfo *joinrel,
 					JoinType jointype,
 					List *outer_pathkeys);
+extern List *build_tidscan_pathkeys(PlannerInfo *root,
+									RelOptInfo *rel,
+									ScanDirection direction);
 extern List *make_pathkeys_for_sortclauses(PlannerInfo *root,
 							  List *sortclauses,
 							  List *tlist);
diff --git a/src/test/regress/expected/tidscan.out b/src/test/regress/expected/tidscan.out
index 521ed1b..7eebe77 100644
--- a/src/test/regress/expected/tidscan.out
+++ b/src/test/regress/expected/tidscan.out
@@ -116,6 +116,39 @@ FETCH FIRST FROM c;
 (1 row)
 
 ROLLBACK;
+-- check that ordering on a tidscan doesn't require a sort
+EXPLAIN (COSTS OFF)
+SELECT ctid, * FROM tidscan WHERE ctid = ANY(ARRAY['(0,2)', '(0,1)', '(0,3)']::tid[]) ORDER BY ctid;
+                          QUERY PLAN                           
+---------------------------------------------------------------
+ Tid Scan on tidscan
+   TID Cond: (ctid = ANY ('{"(0,2)","(0,1)","(0,3)"}'::tid[]))
+(2 rows)
+
+SELECT ctid, * FROM tidscan WHERE ctid = ANY(ARRAY['(0,2)', '(0,1)', '(0,3)']::tid[]) ORDER BY ctid;
+ ctid  | id 
+-------+----
+ (0,1) |  1
+ (0,2) |  2
+ (0,3) |  3
+(3 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid, * FROM tidscan WHERE ctid = ANY(ARRAY['(0,2)', '(0,1)', '(0,3)']::tid[]) ORDER BY ctid DESC;
+                          QUERY PLAN                           
+---------------------------------------------------------------
+ Tid Scan Backward on tidscan
+   TID Cond: (ctid = ANY ('{"(0,2)","(0,1)","(0,3)"}'::tid[]))
+(2 rows)
+
+SELECT ctid, * FROM tidscan WHERE ctid = ANY(ARRAY['(0,2)', '(0,1)', '(0,3)']::tid[]) ORDER BY ctid DESC;
+ ctid  | id 
+-------+----
+ (0,3) |  3
+ (0,2) |  2
+ (0,1) |  1
+(3 rows)
+
 -- tidscan via CURRENT OF
 BEGIN;
 DECLARE c CURSOR FOR SELECT ctid, * FROM tidscan;
diff --git a/src/test/regress/sql/tidscan.sql b/src/test/regress/sql/tidscan.sql
index a8472e0..5237f06 100644
--- a/src/test/regress/sql/tidscan.sql
+++ b/src/test/regress/sql/tidscan.sql
@@ -43,6 +43,15 @@ FETCH BACKWARD 1 FROM c;
 FETCH FIRST FROM c;
 ROLLBACK;
 
+-- check that ordering on a tidscan doesn't require a sort
+EXPLAIN (COSTS OFF)
+SELECT ctid, * FROM tidscan WHERE ctid = ANY(ARRAY['(0,2)', '(0,1)', '(0,3)']::tid[]) ORDER BY ctid;
+SELECT ctid, * FROM tidscan WHERE ctid = ANY(ARRAY['(0,2)', '(0,1)', '(0,3)']::tid[]) ORDER BY ctid;
+
+EXPLAIN (COSTS OFF)
+SELECT ctid, * FROM tidscan WHERE ctid = ANY(ARRAY['(0,2)', '(0,1)', '(0,3)']::tid[]) ORDER BY ctid DESC;
+SELECT ctid, * FROM tidscan WHERE ctid = ANY(ARRAY['(0,2)', '(0,1)', '(0,3)']::tid[]) ORDER BY ctid DESC;
+
 -- tidscan via CURRENT OF
 BEGIN;
 DECLARE c CURSOR FOR SELECT ctid, * FROM tidscan;

02_tid_range_scan.patch (application/octet-stream)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 72395a5..2e50d83 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -575,11 +575,20 @@ heapgettup(HeapScanDesc scan,
 			 * forward scanners.
 			 */
 			scan->rs_syncscan = false;
-			/* start from last page of the scan */
-			if (scan->rs_startblock > 0)
-				page = scan->rs_startblock - 1;
+			
+			/* Start from last page of the scan. */
+			if (scan->rs_numblocks == InvalidBlockNumber)
+			{
+				if (scan->rs_startblock > 0)
+					page = scan->rs_startblock - 1;
+				else
+					page = scan->rs_nblocks - 1;
+			}
 			else
-				page = scan->rs_nblocks - 1;
+			{
+				page = scan->rs_startblock + scan->rs_numblocks - 1;
+			}
+
 			heapgetpage(scan, page);
 		}
 		else
@@ -876,11 +885,18 @@ heapgettup_pagemode(HeapScanDesc scan,
 			 * forward scanners.
 			 */
 			scan->rs_syncscan = false;
+
 			/* start from last page of the scan */
-			if (scan->rs_startblock > 0)
-				page = scan->rs_startblock - 1;
-			else
-				page = scan->rs_nblocks - 1;
+			if (scan->rs_numblocks == InvalidBlockNumber) {
+				if (scan->rs_startblock > 0)
+					page = scan->rs_startblock - 1;
+				else
+					page = scan->rs_nblocks - 1;
+			}
+			else {
+				page = scan->rs_startblock + scan->rs_numblocks - 1;
+			}
+
 			heapgetpage(scan, page);
 		}
 		else
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index d9deb72..b9472c0 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -859,6 +859,7 @@ ExplainPreScanNode(PlanState *planstate, Bitmapset **rels_used)
 		case T_IndexOnlyScan:
 		case T_BitmapHeapScan:
 		case T_TidScan:
+		case T_TidRangeScan:
 		case T_SubqueryScan:
 		case T_FunctionScan:
 		case T_TableFuncScan:
@@ -1005,6 +1006,9 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_TidScan:
 			pname = sname = "Tid Scan";
 			break;
+		case T_TidRangeScan:
+			pname = sname = "Tid Range Scan";
+			break;
 		case T_SubqueryScan:
 			pname = sname = "Subquery Scan";
 			break;
@@ -1190,22 +1194,25 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		ExplainPropertyBool("Parallel Aware", plan->parallel_aware, es);
 	}
 
-	switch (nodeTag(plan))
-	{
-		case T_SeqScan:
-		case T_SampleScan:
-		case T_BitmapHeapScan:
-		case T_SubqueryScan:
-		case T_FunctionScan:
-		case T_TableFuncScan:
-		case T_ValuesScan:
-		case T_CteScan:
-		case T_WorkTableScan:
-			ExplainScanTarget((Scan *) plan, es);
-			break;
-		case T_TidScan:
-			show_scan_direction(es, ((TidScan *) plan)->direction);
-			ExplainScanTarget((Scan *) plan, es);
+	switch (nodeTag(plan)) {
+        case T_SeqScan:
+        case T_SampleScan:
+        case T_BitmapHeapScan:
+        case T_SubqueryScan:
+        case T_FunctionScan:
+        case T_TableFuncScan:
+        case T_ValuesScan:
+        case T_CteScan:
+        case T_WorkTableScan:
+            ExplainScanTarget((Scan *) plan, es);
+            break;
+        case T_TidScan:
+        case T_TidRangeScan:
+			{
+				ScanDirection dir = IsA(plan, TidScan) ? ((TidScan *) plan)->direction : ((TidRangeScan *) plan)->direction;
+				show_scan_direction(es, dir);
+				ExplainScanTarget((Scan *) plan, es);
+			}
 			break;
 		case T_ForeignScan:
 		case T_CustomScan:
@@ -1601,6 +1608,16 @@ ExplainNode(PlanState *planstate, List *ancestors,
 											   planstate, es);
 			}
 			break;
+		case T_TidRangeScan:
+			{
+				List	   *tidquals = ((TidRangeScan *) plan)->tidquals;
+				show_scan_qual(tidquals, "TID Cond", planstate, ancestors, es);
+				show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
+				if (plan->qual)
+					show_instrumentation_count("Rows Removed by Filter", 1,
+											   planstate, es);
+			}
+			break;
 		case T_ForeignScan:
 			show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
 			if (plan->qual)
@@ -2898,6 +2915,7 @@ ExplainTargetRel(Plan *plan, Index rti, ExplainState *es)
 		case T_IndexOnlyScan:
 		case T_BitmapHeapScan:
 		case T_TidScan:
+		case T_TidRangeScan:
 		case T_ForeignScan:
 		case T_CustomScan:
 		case T_ModifyTable:
diff --git a/src/backend/executor/Makefile b/src/backend/executor/Makefile
index cc09895..0152e31 100644
--- a/src/backend/executor/Makefile
+++ b/src/backend/executor/Makefile
@@ -28,6 +28,7 @@ OBJS = execAmi.o execCurrent.o execExpr.o execExprInterp.o \
        nodeValuesscan.o \
        nodeCtescan.o nodeNamedtuplestorescan.o nodeWorktablescan.o \
        nodeGroup.o nodeSubplan.o nodeSubqueryscan.o nodeTidscan.o \
+       nodeTidrangescan.o \
        nodeForeignscan.o nodeWindowAgg.o tstoreReceiver.o tqueue.o spi.o \
        nodeTableFuncscan.o
 
diff --git a/src/backend/executor/execAmi.c b/src/backend/executor/execAmi.c
index 9e78421..48ab2db 100644
--- a/src/backend/executor/execAmi.c
+++ b/src/backend/executor/execAmi.c
@@ -52,6 +52,7 @@
 #include "executor/nodeSubqueryscan.h"
 #include "executor/nodeTableFuncscan.h"
 #include "executor/nodeTidscan.h"
+#include "executor/nodeTidrangescan.h"
 #include "executor/nodeUnique.h"
 #include "executor/nodeValuesscan.h"
 #include "executor/nodeWindowAgg.h"
@@ -197,6 +198,10 @@ ExecReScan(PlanState *node)
 			ExecReScanTidScan((TidScanState *) node);
 			break;
 
+		case T_TidRangeScanState:
+			ExecReScanTidRangeScan((TidRangeScanState *) node);
+			break;
+
 		case T_SubqueryScanState:
 			ExecReScanSubqueryScan((SubqueryScanState *) node);
 			break;
@@ -520,6 +525,7 @@ ExecSupportsBackwardScan(Plan *node)
 
 		case T_SeqScan:
 		case T_TidScan:
+		case T_TidRangeScan:
 		case T_FunctionScan:
 		case T_ValuesScan:
 		case T_CteScan:
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index eaed9fb..dec4dac 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -109,6 +109,7 @@
 #include "executor/nodeSubqueryscan.h"
 #include "executor/nodeTableFuncscan.h"
 #include "executor/nodeTidscan.h"
+#include "executor/nodeTidrangescan.h"
 #include "executor/nodeUnique.h"
 #include "executor/nodeValuesscan.h"
 #include "executor/nodeWindowAgg.h"
@@ -238,6 +239,11 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
 												   estate, eflags);
 			break;
 
+		case T_TidRangeScan:
+			result = (PlanState *) ExecInitTidRangeScan((TidRangeScan *) node,
+														estate, eflags);
+			break;
+
 		case T_SubqueryScan:
 			result = (PlanState *) ExecInitSubqueryScan((SubqueryScan *) node,
 														estate, eflags);
@@ -632,6 +638,10 @@ ExecEndNode(PlanState *node)
 			ExecEndTidScan((TidScanState *) node);
 			break;
 
+		case T_TidRangeScanState:
+			ExecEndTidRangeScan((TidRangeScanState *) node);
+			break;
+
 		case T_SubqueryScanState:
 			ExecEndSubqueryScan((SubqueryScanState *) node);
 			break;
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
new file mode 100644
index 0000000..03e62d8
--- /dev/null
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -0,0 +1,380 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeTidrangescan.c
+ *	  Routines to support scanning a range of tids
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/executor/nodeTidrangescan.c
+ *
+ *-------------------------------------------------------------------------
+ */
+/*
+ * INTERFACE ROUTINES
+ *
+ *		ExecTidRangeScan			scans a relation using a range of tids.
+ *		ExecInitTidRangeScan		creates and initializes state info.
+ *		ExecReScanTidRangeScan		rescans the tid relation.
+ *		ExecEndTidRangeScan			releases all storage.
+ */
+#include "postgres.h"
+
+#include "access/relscan.h"
+#include "access/sysattr.h"
+#include "catalog/pg_type.h"
+#include "executor/execdebug.h"
+#include "executor/nodeTidrangescan.h"
+#include "miscadmin.h"
+#include "optimizer/clauses.h"
+#include "storage/bufmgr.h"
+#include "utils/rel.h"
+
+static void TidRangeEvalBounds(TidRangeScanState *tidstate, BlockNumber rs_nblocks);
+static TupleTableSlot *TidRangeNext(TidRangeScanState *node);
+static bool TidRangeRecheck(TidRangeScanState *node, TupleTableSlot *slot);
+
+
+static void
+TidRangeEvalBounds(TidRangeScanState *tidstate, BlockNumber rs_nblocks)
+{
+	ExprContext *econtext = tidstate->ss.ps.ps_ExprContext;
+	ItemPointer itemptr;
+	bool        isNull;
+
+	if (tidstate->lower_expr)
+		itemptr = (ItemPointer)
+			DatumGetPointer(ExecEvalExprSwitchContext(tidstate->lower_expr,
+														econtext,
+														&isNull));
+	else
+		isNull = true;
+
+	if (!isNull)
+	{
+		tidstate->first_block = ItemPointerGetBlockNumberNoCheck(itemptr);
+		tidstate->first_tuple = ItemPointerGetOffsetNumberNoCheck(itemptr);
+
+		if (((TidRangeScan *) (tidstate->ss.ps.plan))->lower_strict)
+		{
+			tidstate->first_tuple++;
+			if (tidstate->first_tuple == 0)
+				tidstate->first_block++;
+		}
+
+		if (tidstate->first_block > 0 && tidstate->first_block >= rs_nblocks)
+		{
+			tidstate->first_block = 0;
+			tidstate->blocks_to_scan = 0;
+			return;
+		}
+	}
+	else
+	{
+		tidstate->first_block = 0;
+		tidstate->first_tuple = 0;
+	}
+	
+	Assert(tidstate->first_block == 0 || tidstate->first_block < rs_nblocks);
+
+	if (tidstate->upper_expr)
+		itemptr = (ItemPointer)
+			DatumGetPointer(ExecEvalExprSwitchContext(tidstate->upper_expr,
+														econtext,
+														&isNull));
+	else
+		isNull = true;
+
+	if (!isNull)
+	{
+		tidstate->last_block = ItemPointerGetBlockNumberNoCheck(itemptr);
+		tidstate->last_tuple = ItemPointerGetOffsetNumberNoCheck(itemptr);
+
+		if (((TidRangeScan *) (tidstate->ss.ps.plan))->upper_strict)
+		{
+			/* If decrementing the last_tuple would cause last_block to underflow, don't do it. */
+			if (tidstate->last_block == 0 && tidstate->last_tuple == 0)
+			{
+				tidstate->first_block = 0;
+				tidstate->blocks_to_scan = 0;
+				return;
+			}
+			else
+			{
+				if (tidstate->last_tuple == 0)
+					tidstate->last_block--;
+				tidstate->last_tuple--;
+			}
+		}
+	}
+	else
+	{
+		tidstate->last_block = InvalidBlockNumber;
+		tidstate->last_tuple = MaxOffsetNumber;
+	}
+	
+	tidstate->blocks_to_scan = BlockNumberIsValid(tidstate->last_block) ? (tidstate->last_block - tidstate->first_block + 1) : (rs_nblocks - tidstate->first_block);
+}
+
+/* ----------------------------------------------------------------
+ *		TidRangeNext
+ *
+ *		Retrieve a tuple from the TidRangeScan node's currentRelation
+ *		using a heap scan between the bounds in the TidRangeScanState.
+ *
+ * ----------------------------------------------------------------
+ */
+static TupleTableSlot *
+TidRangeNext(TidRangeScanState *node)
+{
+	HeapTuple       tuple;
+	HeapScanDesc    scandesc;
+	EState         *estate;
+	ScanDirection   direction;
+	TupleTableSlot *slot;
+
+	/*
+	* get information from the estate and scan state
+	*/
+	scandesc = node->ss.ss_currentScanDesc;
+	estate = node->ss.ps.state;
+	direction = estate->es_direction;
+	if (ScanDirectionIsBackward(((TidRangeScan *) node->ss.ps.plan)->direction))
+	{
+		if (ScanDirectionIsForward(direction))
+			direction = BackwardScanDirection;
+		else if (ScanDirectionIsBackward(direction))
+			direction = ForwardScanDirection;
+	}
+	slot = node->ss.ss_ScanTupleSlot;
+
+	/* compute bounds and start a new scan, if necessary */
+	if (node->first_block == InvalidBlockNumber)
+	{
+		if (scandesc == NULL)
+		{
+			scandesc = heap_beginscan_strat(node->ss.ss_currentRelation,
+											estate->es_snapshot,
+											0, NULL,
+											false, false);
+			node->ss.ss_currentScanDesc = scandesc;
+		}
+
+		TidRangeEvalBounds(node, scandesc->rs_nblocks);
+
+		heap_setscanlimits(scandesc, node->first_block, node->blocks_to_scan);
+		elog(DEBUG1, "set scan limits to %u, %u", node->first_block, node->blocks_to_scan);
+	}
+
+	/*
+	* get the next tuple from the table
+	*/
+	for (;;)
+	{
+		BlockNumber block;
+		OffsetNumber offset;
+
+		tuple = heap_getnext(scandesc, direction);
+		if (!tuple)
+			break;
+
+		block = ItemPointerGetBlockNumber(&tuple->t_self);
+		offset = ItemPointerGetOffsetNumber(&tuple->t_self);
+
+		if (block == node->first_block && offset < node->first_tuple)
+			continue;
+
+		if (block == node->last_block && offset > node->last_tuple)
+			continue;
+
+		break;
+	}
+
+	/*
+	 * save the tuple and the buffer returned to us by the access methods in
+	 * our scan tuple slot and return the slot.  Note: we pass 'false' because
+	 * tuples returned by heap_getnext() are pointers onto disk pages and were
+	 * not created with palloc() and so should not be pfree()'d.  Note also
+	 * that ExecStoreTuple will increment the refcount of the buffer; the
+	 * refcount will not be dropped until the tuple table slot is cleared.
+	 */
+	if (tuple)
+		ExecStoreTuple(tuple,	/* tuple to store */
+					   slot,	/* slot to store in */
+					   scandesc->rs_cbuf,	/* buffer associated with this
+											 * tuple */
+					   false);	/* don't pfree this pointer */
+	else
+		ExecClearTuple(slot);
+
+	return slot;
+}
+
+/*
+ * TidRangeRecheck -- access method routine to recheck a tuple in EvalPlanQual
+ */
+static bool
+TidRangeRecheck(TidRangeScanState *node, TupleTableSlot *slot)
+{
+	return true;
+}
+
+
+/* ----------------------------------------------------------------
+ *		ExecTidRangeScan(node)
+ *
+ *		Scans the relation using tids and returns
+ *		   the next qualifying tuple in the direction specified.
+ *		We call the ExecScan() routine and pass it the appropriate
+ *		access method functions.
+ *
+ *		Conditions:
+ *		  -- the "cursor" maintained by the AMI is positioned at the tuple
+ *			 returned previously.
+ *
+ *		Initial States:
+ *		  -- the relation indicated is opened for scanning so that the
+ *			 "cursor" is positioned before the first qualifying tuple.
+ *		  -- first_block is InvalidBlockNumber (range not yet computed).
+ * ----------------------------------------------------------------
+ */
+static TupleTableSlot *
+ExecTidRangeScan(PlanState *pstate)
+{
+	TidRangeScanState *node = castNode(TidRangeScanState, pstate);
+
+	return ExecScan(&node->ss,
+					(ExecScanAccessMtd) TidRangeNext,
+					(ExecScanRecheckMtd) TidRangeRecheck);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecReScanTidRangeScan(node)
+ * ----------------------------------------------------------------
+ */
+void
+ExecReScanTidRangeScan(TidRangeScanState *node)
+{
+	HeapScanDesc scan = node->ss.ss_currentScanDesc;
+
+	if (scan != NULL)
+		heap_rescan(scan,       /* scan desc */
+					NULL);      /* new scan keys */
+
+	/* mark tid range as not computed yet */
+	node->first_block = InvalidBlockNumber;
+
+	ExecScanReScan(&node->ss);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecEndTidRangeScan
+ *
+ *		Releases any storage allocated through C routines.
+ *		Returns nothing.
+ * ----------------------------------------------------------------
+ */
+void
+ExecEndTidRangeScan(TidRangeScanState *node)
+{
+	HeapScanDesc scan = node->ss.ss_currentScanDesc;
+
+	/*
+	 * Free the exprcontext
+	 */
+	ExecFreeExprContext(&node->ss.ps);
+
+	/*
+	 * clear out tuple table slots
+	 */
+	ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
+	ExecClearTuple(node->ss.ss_ScanTupleSlot);
+
+	/* close heap scan */
+	if (scan != NULL)
+		heap_endscan(scan);
+
+	/*
+	 * close the heap relation.
+	 */
+	ExecCloseScanRelation(node->ss.ss_currentRelation);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecInitTidRangeScan
+ *
+ *		Initializes the tid range scan's state information, initializes
+ *		the bound expressions, and opens the base relation.
+ *
+ *		Parameters:
+ *		  node: TidRangeScan node produced by the planner.
+ *		  estate: the execution state initialized in InitPlan.
+ * ----------------------------------------------------------------
+ */
+TidRangeScanState *
+ExecInitTidRangeScan(TidRangeScan *node, EState *estate, int eflags)
+{
+	TidRangeScanState *tidstate;
+	Relation	currentRelation;
+
+	/*
+	 * create state structure
+	 */
+	tidstate = makeNode(TidRangeScanState);
+	tidstate->ss.ps.plan = (Plan *) node;
+	tidstate->ss.ps.state = estate;
+	tidstate->ss.ps.ExecProcNode = ExecTidRangeScan;
+
+	/*
+	 * Miscellaneous initialization
+	 *
+	 * create expression context for node
+	 */
+	ExecAssignExprContext(estate, &tidstate->ss.ps);
+
+	/*
+	 * mark tid range as not computed yet (note that only
+	 * first_block == InvalidBlockNumber is necessary; the
+	 * others are just for consistency)
+	 */
+	tidstate->first_block = InvalidBlockNumber;
+	tidstate->first_tuple = InvalidOffsetNumber;
+	tidstate->last_block = InvalidBlockNumber;
+	tidstate->last_tuple = InvalidOffsetNumber;
+
+	/*
+	 * open the base relation and acquire appropriate lock on it.
+	 */
+	currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+
+	tidstate->ss.ss_currentRelation = currentRelation;
+	tidstate->ss.ss_currentScanDesc = NULL; /* heap scan is started lazily in TidRangeNext */
+
+	/*
+	 * get the scan type from the relation descriptor.
+	 */
+	ExecInitScanTupleSlot(estate, &tidstate->ss,
+						  RelationGetDescr(currentRelation));
+
+	/*
+	 * Initialize result slot, type and projection.
+	 */
+	ExecInitResultTupleSlotTL(estate, &tidstate->ss.ps);
+	ExecAssignScanProjectionInfo(&tidstate->ss);
+
+	/*
+	 * initialize child expressions
+	 */
+	tidstate->ss.ps.qual =
+		ExecInitQual(node->scan.plan.qual, (PlanState *) tidstate);
+
+	tidstate->lower_expr = ExecInitExpr((Expr *) node->lower_bound, &tidstate->ss.ps);
+	tidstate->upper_expr = ExecInitExpr((Expr *) node->upper_bound, &tidstate->ss.ps);
+
+	/*
+	 * all done.
+	 */
+	return tidstate;
+}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 5f84984..c438058 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -588,6 +588,30 @@ _copyTidScan(const TidScan *from)
 	return newnode;
 }
 
+static TidRangeScan *
+_copyTidRangeScan(const TidRangeScan *from)
+{
+	TidRangeScan    *newnode = makeNode(TidRangeScan);
+
+	/*
+	 * copy node superclass fields
+	 */
+	CopyScanFields((const Scan *) from, (Scan *) newnode);
+
+	/*
+	 * copy remainder of node
+	 */
+	COPY_NODE_FIELD(tidquals);
+	COPY_NODE_FIELD(lower_bound);
+	COPY_NODE_FIELD(upper_bound);
+	COPY_SCALAR_FIELD(lower_strict);
+	COPY_SCALAR_FIELD(upper_strict);
+	COPY_SCALAR_FIELD(direction);
+
+	return newnode;
+}
+
+
 /*
  * _copySubqueryScan
  */
@@ -4842,6 +4866,9 @@ copyObjectImpl(const void *from)
 		case T_TidScan:
 			retval = _copyTidScan(from);
 			break;
+		case T_TidRangeScan:
+			retval = _copyTidRangeScan(from);
+			break;
 		case T_SubqueryScan:
 			retval = _copySubqueryScan(from);
 			break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 870dd2e..2cab724 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -619,6 +619,21 @@ _outTidScan(StringInfo str, const TidScan *node)
 }
 
 static void
+_outTidRangeScan(StringInfo str, const TidRangeScan *node)
+{
+	WRITE_NODE_TYPE("TIDRANGESCAN");
+
+	_outScanInfo(str, (const Scan *) node);
+
+	WRITE_NODE_FIELD(tidquals);
+	WRITE_NODE_FIELD(lower_bound);
+	WRITE_NODE_FIELD(upper_bound);
+	WRITE_BOOL_FIELD(lower_strict);
+	WRITE_BOOL_FIELD(upper_strict);
+	WRITE_ENUM_FIELD(direction, ScanDirection);
+}
+
+static void
 _outSubqueryScan(StringInfo str, const SubqueryScan *node)
 {
 	WRITE_NODE_TYPE("SUBQUERYSCAN");
@@ -1892,6 +1907,12 @@ _outTidPath(StringInfo str, const TidPath *node)
 	_outPathInfo(str, (const Path *) node);
 
 	WRITE_NODE_FIELD(tidquals);
+	WRITE_ENUM_FIELD(method, TidPathMethod);
+	WRITE_NODE_FIELD(lower_bound);
+	WRITE_NODE_FIELD(upper_bound);
+	WRITE_BOOL_FIELD(lower_strict);
+	WRITE_BOOL_FIELD(upper_strict);
+	WRITE_ENUM_FIELD(direction, ScanDirection);
 }
 
 static void
@@ -3763,6 +3784,9 @@ outNode(StringInfo str, const void *obj)
 			case T_TidScan:
 				_outTidScan(str, obj);
 				break;
+			case T_TidRangeScan:
+				_outTidRangeScan(str, obj);
+				break;
 			case T_SubqueryScan:
 				_outSubqueryScan(str, obj);
 				break;
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 2317f58..76c65a0 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1834,6 +1834,26 @@ _readTidScan(void)
 }
 
 /*
+ * _readTidRangeScan
+ */
+static TidRangeScan *
+_readTidRangeScan(void)
+{
+	READ_LOCALS(TidRangeScan);
+
+	ReadCommonScan(&local_node->scan);
+
+	READ_NODE_FIELD(tidquals);
+	READ_NODE_FIELD(lower_bound);
+	READ_NODE_FIELD(upper_bound);
+	READ_BOOL_FIELD(lower_strict);
+	READ_BOOL_FIELD(upper_strict);
+	READ_ENUM_FIELD(direction, ScanDirection);
+
+	READ_DONE();
+}
+
+/*
  * _readSubqueryScan
  */
 static SubqueryScan *
@@ -2684,6 +2704,8 @@ parseNodeString(void)
 		return_value = _readBitmapHeapScan();
 	else if (MATCH("TIDSCAN", 7))
 		return_value = _readTidScan();
+	else if (MATCH("TIDRANGESCAN", 12))
+		return_value = _readTidRangeScan();
 	else if (MATCH("SUBQUERYSCAN", 12))
 		return_value = _readSubqueryScan();
 	else if (MATCH("FUNCTIONSCAN", 12))
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 7bf67a0..4a26187 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -76,6 +76,7 @@
 #include "access/amapi.h"
 #include "access/htup_details.h"
 #include "access/tsmapi.h"
+#include "catalog/pg_operator.h"
 #include "executor/executor.h"
 #include "executor/nodeHash.h"
 #include "miscadmin.h"
@@ -93,6 +94,7 @@
 #include "utils/selfuncs.h"
 #include "utils/spccache.h"
 #include "utils/tuplesort.h"
+#include "nodes/print.h"
 
 
 #define LOG2(x)  (log(x) / 0.693147180559945)
@@ -1166,6 +1168,54 @@ cost_bitmap_or_node(BitmapOrPath *path, PlannerInfo *root)
 	path->path.total_cost = totalCost;
 }
 
+static void
+estimate_tidscan_tuples_and_pages(RelOptInfo *baserel, List *tidquals, TidPathMethod method, Expr *lower_bound, Expr *upper_bound,
+								  int *ntuples_out, int *npages_out, int *nrandom_pages_out, bool *isCurrentOf) {
+	ListCell   *l;
+	int ntuples = 0;
+	int npages = 0;
+	int nrandom_pages = 0;
+
+	if (method == TID_PATH_LIST)
+	{
+		foreach(l, tidquals)
+		{
+			if (IsA(lfirst(l), ScalarArrayOpExpr))
+			{
+				/* Each element of the array yields 1 tuple, assumed to be on its own page */
+				ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) lfirst(l);
+				Node	   *arraynode = (Node *) lsecond(saop->args);
+
+				ntuples += estimate_array_length(arraynode);
+				nrandom_pages += estimate_array_length(arraynode);
+			}
+			else if (IsA(lfirst(l), CurrentOfExpr))
+			{
+				/* CURRENT OF yields 1 tuple */
+				*isCurrentOf = true;
+				ntuples++;
+				nrandom_pages++;
+			}
+			else
+			{
+				/* It's just CTID = something, count 1 tuple */
+				ntuples++;
+				nrandom_pages++;
+			}
+		}
+	}
+	else
+	{
+		double selectivity = tid_range_selectivity(baserel, lower_bound, upper_bound);
+		ntuples += selectivity * baserel->tuples;
+		npages += selectivity * baserel->pages;
+	}
+
+	*ntuples_out = ntuples;
+	*npages_out = npages;
+	*nrandom_pages_out = nrandom_pages;
+}
+
 /*
  * cost_tidscan
  *	  Determines and returns the cost of scanning a relation using TIDs.
@@ -1176,7 +1226,8 @@ cost_bitmap_or_node(BitmapOrPath *path, PlannerInfo *root)
  */
 void
 cost_tidscan(Path *path, PlannerInfo *root,
-			 RelOptInfo *baserel, List *tidquals, ParamPathInfo *param_info)
+			 RelOptInfo *baserel, List *tidquals, TidPathMethod method, Expr *lower_bound, Expr *upper_bound,
+			 ParamPathInfo *param_info)
 {
 	Cost		startup_cost = 0;
 	Cost		run_cost = 0;
@@ -1185,8 +1236,10 @@ cost_tidscan(Path *path, PlannerInfo *root,
 	Cost		cpu_per_tuple;
 	QualCost	tid_qual_cost;
 	int			ntuples;
-	ListCell   *l;
+	int			npages;
+	int			nrandom_pages;
 	double		spc_random_page_cost;
+	double		spc_seq_page_cost;
 
 	/* Should only be applied to base relations */
 	Assert(baserel->relid > 0);
@@ -1199,29 +1252,8 @@ cost_tidscan(Path *path, PlannerInfo *root,
 		path->rows = baserel->rows;
 
 	/* Count how many tuples we expect to retrieve */
-	ntuples = 0;
-	foreach(l, tidquals)
-	{
-		if (IsA(lfirst(l), ScalarArrayOpExpr))
-		{
-			/* Each element of the array yields 1 tuple */
-			ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) lfirst(l);
-			Node	   *arraynode = (Node *) lsecond(saop->args);
-
-			ntuples += estimate_array_length(arraynode);
-		}
-		else if (IsA(lfirst(l), CurrentOfExpr))
-		{
-			/* CURRENT OF yields 1 tuple */
-			isCurrentOf = true;
-			ntuples++;
-		}
-		else
-		{
-			/* It's just CTID = something, count 1 tuple */
-			ntuples++;
-		}
-	}
+	estimate_tidscan_tuples_and_pages(baserel, tidquals, method, lower_bound, upper_bound,
+									  &ntuples, &npages, &nrandom_pages, &isCurrentOf);
 
 	/*
 	 * We must force TID scan for WHERE CURRENT OF, because only nodeTidscan.c
@@ -1248,10 +1280,11 @@ cost_tidscan(Path *path, PlannerInfo *root,
 	/* fetch estimated page cost for tablespace containing table */
 	get_tablespace_page_costs(baserel->reltablespace,
 							  &spc_random_page_cost,
-							  NULL);
+							  &spc_seq_page_cost);
 
-	/* disk costs --- assume each tuple on a different page */
-	run_cost += spc_random_page_cost * ntuples;
+	/* disk costs */
+	run_cost += spc_random_page_cost * nrandom_pages;
+	run_cost += spc_seq_page_cost * npages;
 
 	/* Add scanning CPU costs */
 	get_restriction_qual_cost(root, baserel, param_info, &qpqual_cost);
diff --git a/src/backend/optimizer/path/tidpath.c b/src/backend/optimizer/path/tidpath.c
index 7a40700..28f455b 100644
--- a/src/backend/optimizer/path/tidpath.c
+++ b/src/backend/optimizer/path/tidpath.c
@@ -44,34 +44,43 @@
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 #include "optimizer/restrictinfo.h"
+#include "nodes/makefuncs.h"
 
 
-static bool IsTidEqualClause(OpExpr *node, int varno);
 static bool IsTidEqualAnyClause(ScalarArrayOpExpr *node, int varno);
 static List *TidQualFromExpr(Node *expr, int varno);
-static List *TidQualFromBaseRestrictinfo(RelOptInfo *rel);
+static List *TidQualFromBaseRestrictinfo(RelOptInfo *rel, TidPathMethod *method, Expr **lower_bound, Expr **upper_bound, bool *lower_strict, bool *upper_strict);
 
 
+static bool IsTidVar(Var *var, int varno)
+{
+	return (var->varattno == SelfItemPointerAttributeNumber &&
+			var->vartype == TIDOID &&
+			var->varno == varno &&
+			var->varlevelsup == 0);
+}
+
 /*
  * Check to see if an opclause is of the form
- *		CTID = pseudoconstant
+ *		CTID OP pseudoconstant
  * or
- *		pseudoconstant = CTID
+ *		pseudoconstant OP CTID
+ * where OP is the expected comparison operator.
  *
  * We check that the CTID Var belongs to relation "varno".  That is probably
  * redundant considering this is only applied to restriction clauses, but
  * let's be safe.
  */
 static bool
-IsTidEqualClause(OpExpr *node, int varno)
+IsTidComparison(OpExpr *node, int varno, Oid expected_comparison_operator)
 {
 	Node	   *arg1,
 			   *arg2,
 			   *other;
 	Var		   *var;
 
-	/* Operator must be tideq */
-	if (node->opno != TIDEqualOperator)
+	/* Operator must be the expected one */
+	if (node->opno != expected_comparison_operator)
 		return false;
 	if (list_length(node->args) != 2)
 		return false;
@@ -110,6 +119,14 @@ IsTidEqualClause(OpExpr *node, int varno)
 	return true;				/* success */
 }
 
+
+#define IsTidEqualClause(node, varno)	IsTidComparison(node, varno, TIDEqualOperator)
+#define IsTidLTClause(node, varno)	IsTidComparison(node, varno, TIDLessOperator)
+#define IsTidLEClause(node, varno)	IsTidComparison(node, varno, TIDLessEqOperator)
+#define IsTidGTClause(node, varno)	IsTidComparison(node, varno, TIDGreaterOperator)
+#define IsTidGEClause(node, varno)	IsTidComparison(node, varno, TIDGreaterEqOperator)
+
+
 /*
  * Check to see if a clause is of the form
  *		CTID = ANY (pseudoconstant_array)
@@ -216,14 +233,60 @@ TidQualFromExpr(Node *expr, int varno)
 	return rlst;
 }
 
+static Node *
+TidRangeQualFromExpr(Node *expr, int varno, bool want_lower_bound, Expr **bound, bool *strict)
+{
+	if (is_opclause(expr))
+	{
+		if (IsTidLTClause((OpExpr *) expr, varno) || IsTidLEClause((OpExpr *) expr, varno) ||
+			(IsTidGTClause((OpExpr *) expr, varno) || IsTidGEClause((OpExpr *) expr, varno)))
+		{
+			bool is_lower_bound = IsTidGTClause((OpExpr *) expr, varno) || IsTidGEClause((OpExpr *) expr, varno);
+
+			Node *rightop = get_rightop((Expr *) expr);
+			Node *leftop = get_leftop((Expr *) expr);
+			Node *value = rightop;
+
+			if (!IsA(leftop, Var) || !IsTidVar((Var *) leftop, varno))
+			{
+				is_lower_bound = !is_lower_bound;
+				value = leftop;
+			}
+
+			if (is_lower_bound == want_lower_bound)
+			{
+				*strict = IsTidGTClause((OpExpr *) expr, varno) || IsTidLTClause((OpExpr *) expr, varno);
+				*bound = (Expr *) value;
+				return expr;
+			}
+		}
+	}
+
+	return NULL;
+}
+
+static List *
+MakeTidRangeQuals(Node *lower_bound_expr, Node *upper_bound_expr)
+{
+	if (lower_bound_expr && !upper_bound_expr)
+		return list_make1(lower_bound_expr);
+	else if (!lower_bound_expr && upper_bound_expr)
+		return list_make1(upper_bound_expr);
+	else
+		return list_make2(lower_bound_expr, upper_bound_expr);
+}
+
 /*
  *	Extract a set of CTID conditions from the rel's baserestrictinfo list
  */
 static List *
-TidQualFromBaseRestrictinfo(RelOptInfo *rel)
+TidQualFromBaseRestrictinfo(RelOptInfo *rel, TidPathMethod *method,
+							Expr **lower_bound, Expr **upper_bound, bool *lower_strict, bool *upper_strict)
 {
 	List	   *rlst = NIL;
 	ListCell   *l;
+	Node *lower_bound_expr = NULL;
+	Node *upper_bound_expr = NULL;
 
 	foreach(l, rel->baserestrictinfo)
 	{
@@ -236,13 +299,37 @@ TidQualFromBaseRestrictinfo(RelOptInfo *rel)
 		if (!restriction_is_securely_promotable(rinfo, rel))
 			continue;
 
+		/*
+		 * Check if this clause contains a range qual
+		 */
+		if (!lower_bound_expr)
+			lower_bound_expr = TidRangeQualFromExpr((Node *) rinfo->clause, rel->relid, true, lower_bound, lower_strict);
+
+		if (!upper_bound_expr)
+			upper_bound_expr = TidRangeQualFromExpr((Node *) rinfo->clause, rel->relid, false, upper_bound, upper_strict);
+
 		rlst = TidQualFromExpr((Node *) rinfo->clause, rel->relid);
 		if (rlst)
 			break;
 	}
+
+	/*
+	 * If one or both range quals were specified and there were no equality/IN/CURRENT OF quals, use them.
+	 */
+	if (!rlst && (lower_bound_expr || upper_bound_expr))
+	{
+		rlst = MakeTidRangeQuals(lower_bound_expr, upper_bound_expr);
+		*method = TID_PATH_RANGE;
+	}
+	else if (rlst)
+	{
+		*method = TID_PATH_LIST;
+	}
+
 	return rlst;
 }
 
+
 /*
  * create_tidscan_paths
  *	  Create paths corresponding to direct TID scans of the given rel.
@@ -254,8 +341,15 @@ TidQualFromBaseRestrictinfo(RelOptInfo *rel)
 void
 create_tidscan_paths(PlannerInfo *root, RelOptInfo *rel)
 {
-	Relids		required_outer;
-	List	   *tidquals;
+	Relids		   required_outer;
+	List		  *tidquals;
+	TidPathMethod  method = TID_PATH_RANGE;
+	Expr		  *lower_bound = NULL;
+	Expr		  *upper_bound = NULL;
+	bool		   lower_strict = false;
+	bool		   upper_strict = false;
+	List			*pathkeys = NIL;
+	ScanDirection	 direction = ForwardScanDirection;
 
 	/*
 	 * We don't support pushing join clauses into the quals of a tidscan, but
@@ -264,33 +358,42 @@ create_tidscan_paths(PlannerInfo *root, RelOptInfo *rel)
 	 */
 	required_outer = rel->lateral_relids;
 
-	tidquals = TidQualFromBaseRestrictinfo(rel);
+	tidquals = TidQualFromBaseRestrictinfo(rel, &method, &lower_bound, &upper_bound, &lower_strict, &upper_strict);
 
-	if (tidquals)
+	/*
+	 * Try to determine the best scan direction and create some useful pathkeys.
+	 */
+	if (has_useful_pathkeys(root, rel))
 	{
-		List			*pathkeys = NULL;
-		ScanDirection	 direction = ForwardScanDirection;
-
-		if (has_useful_pathkeys(root, rel)) {
-			/*
-			 * Build path keys corresponding to ORDER BY ctid ASC, and check
-			 * whether they will be useful for this scan.  If not, build
-			 * path keys for DESC, and try that; set the direction to
-			 * BackwardScanDirection if so.  If neither of them will be
-			 * useful, no path keys will be set.
-			 */
-			pathkeys = build_tidscan_pathkeys(root, rel, ForwardScanDirection);
-			if (!pathkeys_contained_in(pathkeys, root->query_pathkeys))
-			{
-				pathkeys = build_tidscan_pathkeys(root, rel, BackwardScanDirection);
-				if (pathkeys_contained_in(pathkeys, root->query_pathkeys))
-					direction = BackwardScanDirection;
-				else
-					pathkeys = NULL;
-			}
+		/*
+		 * Build path keys corresponding to ORDER BY ctid ASC, and check
+		 * whether they will be useful for this scan.  If not, build
+		 * path keys for DESC, and try that; set the direction to
+		 * BackwardScanDirection if so.  If neither of them will be
+		 * useful, no path keys will be set.
+		 */
+		pathkeys = build_tidscan_pathkeys(root, rel, ForwardScanDirection);
+		if (!pathkeys_contained_in(pathkeys, root->query_pathkeys))
+		{
+			pathkeys = build_tidscan_pathkeys(root, rel, BackwardScanDirection);
+			if (pathkeys_contained_in(pathkeys, root->query_pathkeys))
+				direction = BackwardScanDirection;
+			else
+				pathkeys = NIL;
 		}
+	}
+
+	/*
+	 * If there are tidquals or some useful pathkeys were found, then it's
+	 * worth generating a tidscan path.
+	 */
+	if (tidquals || pathkeys)
+	{
+		/* If we don't have any tidquals, then we MUST create a tid range scan path. */
+		Assert(tidquals || method == TID_PATH_RANGE);
 
-		add_path(rel, (Path *) create_tidscan_path(root, rel, tidquals, pathkeys, direction,
-												   required_outer));
+		add_path(rel, (Path *) create_tidscan_path(root, rel, tidquals,
+												   method, lower_bound, upper_bound, lower_strict, upper_strict,
+												   required_outer, direction, pathkeys));
 	}
 }
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 4e1faa6..e479a0a 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -125,8 +125,8 @@ static Plan *create_bitmap_subplan(PlannerInfo *root, Path *bitmapqual,
 					  List **qual, List **indexqual, List **indexECs);
 static void bitmap_subplan_mark_shared(Plan *plan);
 static List *flatten_partitioned_rels(List *partitioned_rels);
-static TidScan *create_tidscan_plan(PlannerInfo *root, TidPath *best_path,
-					List *tlist, List *scan_clauses);
+static Scan *create_tidscan_plan(PlannerInfo *root, TidPath *best_path,
+						 List *tlist, List *scan_clauses);
 static SubqueryScan *create_subqueryscan_plan(PlannerInfo *root,
 						 SubqueryScanPath *best_path,
 						 List *tlist, List *scan_clauses);
@@ -186,6 +186,8 @@ static BitmapHeapScan *make_bitmap_heapscan(List *qptlist,
 					 Index scanrelid);
 static TidScan *make_tidscan(List *qptlist, List *qpqual, Index scanrelid,
 			 List *tidquals, ScanDirection direction);
+static TidRangeScan *make_tidrangescan(List *qptlist, List *qpqual, Index scanrelid,
+			 List *tidquals, Expr *lower_bound, Expr *upper_bound, bool lower_strict, bool upper_strict, ScanDirection direction);
 static SubqueryScan *make_subqueryscan(List *qptlist,
 				  List *qpqual,
 				  Index scanrelid,
@@ -371,6 +373,7 @@ create_plan_recurse(PlannerInfo *root, Path *best_path, int flags)
 		case T_IndexOnlyScan:
 		case T_BitmapHeapScan:
 		case T_TidScan:
+		case T_TidRangeScan:
 		case T_SubqueryScan:
 		case T_FunctionScan:
 		case T_TableFuncScan:
@@ -648,6 +651,7 @@ create_scan_plan(PlannerInfo *root, Path *best_path, int flags)
 			break;
 
 		case T_TidScan:
+		case T_TidRangeScan:
 			plan = (Plan *) create_tidscan_plan(root,
 												(TidPath *) best_path,
 												tlist,
@@ -3057,11 +3061,11 @@ create_bitmap_subplan(PlannerInfo *root, Path *bitmapqual,
  *	 Returns a tidscan plan for the base relation scanned by 'best_path'
  *	 with restriction clauses 'scan_clauses' and targetlist 'tlist'.
  */
-static TidScan *
+static Scan *
 create_tidscan_plan(PlannerInfo *root, TidPath *best_path,
 					List *tlist, List *scan_clauses)
 {
-	TidScan    *scan_plan;
+	Scan	   *scan_plan;
 	Index		scan_relid = best_path->path.parent->relid;
 	List	   *tidquals = best_path->tidquals;
 	List	   *ortidquals;
@@ -3090,18 +3094,34 @@ create_tidscan_plan(PlannerInfo *root, TidPath *best_path,
 	 * tidquals list has implicit OR semantics.
 	 */
 	ortidquals = tidquals;
-	if (list_length(ortidquals) > 1)
+	if (best_path->method == TID_PATH_LIST && list_length(ortidquals) > 1)
 		ortidquals = list_make1(make_orclause(ortidquals));
 	scan_clauses = list_difference(scan_clauses, ortidquals);
 
-	scan_plan = make_tidscan(tlist,
-							 scan_clauses,
-							 scan_relid,
-							 tidquals,
-							 best_path->direction
-							);
+	if (best_path->method == TID_PATH_LIST)
+	{
+		scan_plan = (Scan *) make_tidscan(tlist,
+										  scan_clauses,
+										  scan_relid,
+										  tidquals,
+										  best_path->direction
+										 );
+	}
+	else
+	{
+		scan_plan = (Scan *) make_tidrangescan(tlist,
+											   scan_clauses,
+											   scan_relid,
+											   tidquals,
+											   best_path->lower_bound,
+											   best_path->upper_bound,
+											   best_path->lower_strict,
+											   best_path->upper_strict,
+											   best_path->direction
+											  );
+	}
 
-	copy_generic_path_info(&scan_plan->scan.plan, &best_path->path);
+	copy_generic_path_info(&scan_plan->plan, &best_path->path);
 
 	return scan_plan;
 }
@@ -5198,6 +5218,36 @@ make_tidscan(List *qptlist,
 	return node;
 }
 
+static TidRangeScan *
+make_tidrangescan(List *qptlist,
+				  List *qpqual,
+				  Index scanrelid,
+				  List *tidquals,
+				  Expr *lower_bound,
+				  Expr *upper_bound,
+				  bool lower_strict,
+				  bool upper_strict,
+				  ScanDirection direction
+				 )
+{
+	TidRangeScan    *node = makeNode(TidRangeScan);
+	Plan	   *plan = &node->scan.plan;
+
+	plan->targetlist = qptlist;
+	plan->qual = qpqual;
+	plan->lefttree = NULL;
+	plan->righttree = NULL;
+	node->scan.scanrelid = scanrelid;
+	node->tidquals = tidquals;
+	node->lower_bound = lower_bound;
+	node->upper_bound = upper_bound;
+	node->lower_strict = lower_strict;
+	node->upper_strict = upper_strict;
+	node->direction = direction;
+
+	return node;
+}
+
 static SubqueryScan *
 make_subqueryscan(List *qptlist,
 				  List *qpqual,
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 69dd327..854d2d6 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -541,6 +541,21 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 					fix_scan_list(root, splan->tidquals, rtoffset);
 			}
 			break;
+		case T_TidRangeScan:
+			{
+				TidRangeScan    *splan = (TidRangeScan *) plan;
+
+				splan->scan.scanrelid += rtoffset;
+				splan->scan.plan.targetlist =
+					fix_scan_list(root, splan->scan.plan.targetlist, rtoffset);
+				splan->scan.plan.qual =
+					fix_scan_list(root, splan->scan.plan.qual, rtoffset);
+				splan->lower_bound =
+					(Expr *) fix_scan_expr(root, (Node *) splan->lower_bound, rtoffset);
+				splan->upper_bound =
+					(Expr *) fix_scan_expr(root, (Node *) splan->upper_bound, rtoffset);
+			}
+			break;
 		case T_SubqueryScan:
 			/* Needs special treatment, see comments below */
 			return set_subqueryscan_references(root,
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 83008d7..2b917694 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -2410,6 +2410,14 @@ finalize_plan(PlannerInfo *root, Plan *plan,
 			context.paramids = bms_add_members(context.paramids, scan_params);
 			break;
 
+		case T_TidRangeScan:
+			finalize_primnode((Node *) ((TidRangeScan *) plan)->lower_bound,
+							  &context);
+			finalize_primnode((Node *) ((TidRangeScan *) plan)->upper_bound,
+							  &context);
+			context.paramids = bms_add_members(context.paramids, scan_params);
+			break;
+
 		case T_SubqueryScan:
 			{
 				SubqueryScan *sscan = (SubqueryScan *) plan;
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index e2d51a9..fb8b81f 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1186,12 +1186,12 @@ create_bitmap_or_path(PlannerInfo *root,
  */
 TidPath *
 create_tidscan_path(PlannerInfo *root, RelOptInfo *rel, List *tidquals,
-					List *pathkeys, ScanDirection direction,
-					Relids required_outer)
+					TidPathMethod method, Expr *lower_bound, Expr *upper_bound, bool lower_strict, bool upper_strict,
+					Relids required_outer, ScanDirection direction, List *pathkeys)
 {
 	TidPath    *pathnode = makeNode(TidPath);
 
-	pathnode->path.pathtype = T_TidScan;
+	pathnode->path.pathtype = (method == TID_PATH_LIST) ? T_TidScan : T_TidRangeScan;
 	pathnode->path.parent = rel;
 	pathnode->path.pathtarget = rel->reltarget;
 	pathnode->path.param_info = get_baserel_parampathinfo(root, rel,
@@ -1202,9 +1202,14 @@ create_tidscan_path(PlannerInfo *root, RelOptInfo *rel, List *tidquals,
 	pathnode->path.pathkeys = pathkeys;
 
 	pathnode->tidquals = tidquals;
+	pathnode->method = method;
+	pathnode->lower_bound = lower_bound;
+	pathnode->upper_bound = upper_bound;
+	pathnode->lower_strict = lower_strict;
+	pathnode->upper_strict = upper_strict;
 	pathnode->direction = direction;
 
-	cost_tidscan(&pathnode->path, root, rel, tidquals,
+	cost_tidscan(&pathnode->path, root, rel, tidquals, method, lower_bound, upper_bound,
 				 pathnode->path.param_info);
 
 	return pathnode;
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index f1c78ff..8c87d28 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -8219,3 +8219,27 @@ brincostestimate(PlannerInfo *root, IndexPath *path, double loop_count,
 
 	*indexPages = index->pages;
 }
+
+static BlockNumber
+get_block_number_from_tid_qual(Expr *qual, BlockNumber default_when_missing)
+{
+	if (qual && IsA(qual, Const)) {
+		Const *con = (Const *) qual;
+		ItemPointer itemptr = (ItemPointer) DatumGetPointer(con->constvalue);
+		return ItemPointerGetBlockNumberNoCheck(itemptr);
+	}
+	else
+	{
+		return default_when_missing;
+	}
+}
+
+double
+tid_range_selectivity(RelOptInfo *rel, Expr *lower_qual, Expr *upper_qual)
+{
+	BlockNumber lower_block = get_block_number_from_tid_qual(lower_qual, 0);
+	BlockNumber upper_block = get_block_number_from_tid_qual(upper_qual, rel->pages);
+
+	double selectivity = ((double) upper_block - (double) lower_block) / ((double) rel->pages + 1);
+	return Max(0.0, Min(1.0, selectivity));
+}
diff --git a/src/include/catalog/pg_operator.dat b/src/include/catalog/pg_operator.dat
index 31e7d61..cdd2cd3 100644
--- a/src/include/catalog/pg_operator.dat
+++ b/src/include/catalog/pg_operator.dat
@@ -160,11 +160,11 @@
   oprname => '>', oprleft => 'tid', oprright => 'tid', oprresult => 'bool',
   oprcom => '<(tid,tid)', oprnegate => '<=(tid,tid)', oprcode => 'tidgt',
   oprrest => 'scalargtsel', oprjoin => 'scalargtjoinsel' },
-{ oid => '2801', descr => 'less than or equal',
+{ oid => '2801', oid_symbol => 'TIDLessEqOperator', descr => 'less than or equal',
   oprname => '<=', oprleft => 'tid', oprright => 'tid', oprresult => 'bool',
   oprcom => '>=(tid,tid)', oprnegate => '>(tid,tid)', oprcode => 'tidle',
   oprrest => 'scalarlesel', oprjoin => 'scalarlejoinsel' },
-{ oid => '2802', descr => 'greater than or equal',
+{ oid => '2802', oid_symbol => 'TIDGreaterEqOperator', descr => 'greater than or equal',
   oprname => '>=', oprleft => 'tid', oprright => 'tid', oprresult => 'bool',
   oprcom => '<=(tid,tid)', oprnegate => '<(tid,tid)', oprcode => 'tidge',
   oprrest => 'scalargesel', oprjoin => 'scalargejoinsel' },
diff --git a/src/include/executor/nodeTidrangescan.h b/src/include/executor/nodeTidrangescan.h
new file mode 100644
index 0000000..d5ad2e1
--- /dev/null
+++ b/src/include/executor/nodeTidrangescan.h
@@ -0,0 +1,23 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeTidrangescan.h
+ *
+ *
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/executor/nodeTidrangescan.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef NODETIDRANGESCAN_H
+#define NODETIDRANGESCAN_H
+
+#include "nodes/execnodes.h"
+
+extern TidRangeScanState *ExecInitTidRangeScan(TidRangeScan *node, EState *estate, int eflags);
+extern void ExecEndTidRangeScan(TidRangeScanState *node);
+extern void ExecReScanTidRangeScan(TidRangeScanState *node);
+
+#endif							/* NODETIDRANGESCAN_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 018f50b..a998bfb 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1490,6 +1490,18 @@ typedef struct TidScanState
 	HeapTupleData tss_htup;
 } TidScanState;
 
+typedef struct TidRangeScanState
+{
+	ScanState		 ss;				/* its first field is NodeTag */
+	ExprState		*lower_expr;
+	ExprState		*upper_expr;
+	BlockNumber		 first_block;
+	OffsetNumber	 first_tuple;
+	BlockNumber		 last_block;
+	OffsetNumber	 last_tuple;
+	BlockNumber		 blocks_to_scan;
+} TidRangeScanState;
+
 /* ----------------
  *	 SubqueryScanState information
  *
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 697d3d7..e983fb8 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -59,6 +59,7 @@ typedef enum NodeTag
 	T_BitmapIndexScan,
 	T_BitmapHeapScan,
 	T_TidScan,
+	T_TidRangeScan,
 	T_SubqueryScan,
 	T_FunctionScan,
 	T_ValuesScan,
@@ -115,6 +116,7 @@ typedef enum NodeTag
 	T_BitmapIndexScanState,
 	T_BitmapHeapScanState,
 	T_TidScanState,
+	T_TidRangeScanState,
 	T_SubqueryScanState,
 	T_FunctionScanState,
 	T_TableFuncScanState,
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 96d30aa..a0165e4 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -496,6 +496,21 @@ typedef struct TidScan
 } TidScan;
 
 /* ----------------
+ *		tid range scan node
+ * ----------------
+ */
+typedef struct TidRangeScan
+{
+	Scan		scan;
+	List	   *tidquals;
+	Expr		*lower_bound;
+	Expr		*upper_bound;
+	bool		lower_strict;
+	bool		upper_strict;
+	ScanDirection direction;
+} TidRangeScan;
+
+/* ----------------
  *		subquery scan node
  *
  * SubqueryScan is for scanning the output of a sub-query in the range table.
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index cf4839d..f348c38 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -1229,10 +1229,21 @@ typedef struct BitmapOrPath
  * "CTID = pseudoconstant" or "CTID = ANY(pseudoconstant_array)".
  * Note they are bare expressions, not RestrictInfos.
  */
+typedef enum
+{
+	TID_PATH_LIST,			/* tidquals is a list of CTID = ?, CTID IN (?), with OR-semantics */
+	TID_PATH_RANGE			/* tidquals is a list of CTID > ?, CTID < ?, with AND-semantics */
+} TidPathMethod;
+
 typedef struct TidPath
 {
-	Path		path;
-	List	   *tidquals;		/* qual(s) involving CTID = something */
+	Path		  path;
+	List		 *tidquals;
+	TidPathMethod method;
+	Expr		 *lower_bound;
+	Expr		 *upper_bound;
+	bool		  lower_strict;
+	bool		  upper_strict;
 	ScanDirection direction;
 } TidPath;
 
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 77ca7ff..d8f7825 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -90,7 +90,8 @@ extern void cost_bitmap_and_node(BitmapAndPath *path, PlannerInfo *root);
 extern void cost_bitmap_or_node(BitmapOrPath *path, PlannerInfo *root);
 extern void cost_bitmap_tree_node(Path *path, Cost *cost, Selectivity *selec);
 extern void cost_tidscan(Path *path, PlannerInfo *root,
-			 RelOptInfo *baserel, List *tidquals, ParamPathInfo *param_info);
+			 RelOptInfo *baserel, List *tidquals, TidPathMethod method, Expr *lower_bound, Expr *upper_bound,
+			 ParamPathInfo *param_info);
 extern void cost_subqueryscan(SubqueryScanPath *path, PlannerInfo *root,
 				  RelOptInfo *baserel, ParamPathInfo *param_info);
 extern void cost_functionscan(Path *path, PlannerInfo *root,
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index a0a88a5..32867e7 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -63,8 +63,9 @@ extern BitmapOrPath *create_bitmap_or_path(PlannerInfo *root,
 					  RelOptInfo *rel,
 					  List *bitmapquals);
 extern TidPath *create_tidscan_path(PlannerInfo *root, RelOptInfo *rel,
-					List *tidquals, List *pathkeys, ScanDirection direction,
-					Relids required_outer);
+					List *tidquals, TidPathMethod method,
+					Expr *lower_bound, Expr *upper_bound, bool lower_strict, bool upper_strict,
+					Relids required_outer, ScanDirection direction, List *pathkeys);
 extern AppendPath *create_append_path(PlannerInfo *root, RelOptInfo *rel,
 				   List *subpaths, List *partial_subpaths,
 				   Relids required_outer,
diff --git a/src/include/utils/selfuncs.h b/src/include/utils/selfuncs.h
index 95e4428..cf5c090 100644
--- a/src/include/utils/selfuncs.h
+++ b/src/include/utils/selfuncs.h
@@ -227,4 +227,7 @@ extern Selectivity scalararraysel_containment(PlannerInfo *root,
 						   Oid elemtype, bool isEquality, bool useOr,
 						   int varRelid);
 
+
+extern double tid_range_selectivity(RelOptInfo *rel, Expr *lower_qual, Expr *upper_qual);
+
 #endif							/* SELFUNCS_H */
diff --git a/src/test/regress/expected/tidrangescan.out b/src/test/regress/expected/tidrangescan.out
new file mode 100644
index 0000000..573e769
--- /dev/null
+++ b/src/test/regress/expected/tidrangescan.out
@@ -0,0 +1,229 @@
+-- tests for tidrangescans
+CREATE TABLE tidrangescan(id integer, data text);
+INSERT INTO tidrangescan SELECT i,'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' FROM generate_series(1,1000) AS s(i);
+DELETE FROM tidrangescan WHERE substring(ctid::text from ',(\d+)\)')::integer > 10 OR substring(ctid::text from '\((\d+),')::integer >= 10;;
+VACUUM tidrangescan;
+-- range scans with upper bound
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan WHERE ctid < '(1, 0)';
+            QUERY PLAN             
+-----------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: (ctid < '(1,0)'::tid)
+(2 rows)
+
+SELECT ctid, data FROM tidrangescan WHERE ctid < '(1, 0)';
+  ctid  |                                       data                                       
+--------+----------------------------------------------------------------------------------
+ (0,1)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (0,2)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (0,3)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (0,4)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (0,5)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (0,6)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (0,7)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (0,8)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (0,9)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (0,10) | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+(10 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan WHERE ctid <= '(1, 5)';
+             QUERY PLAN             
+------------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: (ctid <= '(1,5)'::tid)
+(2 rows)
+
+SELECT ctid, data FROM tidrangescan WHERE ctid <= '(1, 5)';
+  ctid  |                                       data                                       
+--------+----------------------------------------------------------------------------------
+ (0,1)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (0,2)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (0,3)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (0,4)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (0,5)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (0,6)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (0,7)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (0,8)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (0,9)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (0,10) | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (1,1)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (1,2)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (1,3)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (1,4)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (1,5)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+(15 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan WHERE ctid < '(0, 0)';
+            QUERY PLAN             
+-----------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: (ctid < '(0,0)'::tid)
+(2 rows)
+
+SELECT ctid, data FROM tidrangescan WHERE ctid < '(0, 0)';
+ ctid | data 
+------+------
+(0 rows)
+
+-- range scans with lower bound
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan WHERE ctid > '(9, 8)';
+            QUERY PLAN             
+-----------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: (ctid > '(9,8)'::tid)
+(2 rows)
+
+SELECT ctid, data FROM tidrangescan WHERE ctid > '(9, 8)';
+  ctid  |                                       data                                       
+--------+----------------------------------------------------------------------------------
+ (9,9)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (9,10) | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+(2 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan WHERE ctid >= '(9, 8)';
+             QUERY PLAN             
+------------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: (ctid >= '(9,8)'::tid)
+(2 rows)
+
+SELECT ctid, data FROM tidrangescan WHERE ctid >= '(9, 8)';
+  ctid  |                                       data                                       
+--------+----------------------------------------------------------------------------------
+ (9,8)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (9,9)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (9,10) | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+(3 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan WHERE ctid >= '(100, 0)';
+              QUERY PLAN              
+--------------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: (ctid >= '(100,0)'::tid)
+(2 rows)
+
+SELECT ctid, data FROM tidrangescan WHERE ctid >= '(100, 0)';
+ ctid | data 
+------+------
+(0 rows)
+
+-- ordering with no quals should use tid range scan
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan ORDER BY ctid ASC;
+           QUERY PLAN           
+--------------------------------
+ Tid Range Scan on tidrangescan
+(1 row)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan ORDER BY ctid DESC;
+               QUERY PLAN                
+-----------------------------------------
+ Tid Range Scan Backward on tidrangescan
+(1 row)
+
+-- min/max
+EXPLAIN (COSTS OFF)
+SELECT MIN(ctid) FROM tidrangescan;
+                  QUERY PLAN                  
+----------------------------------------------
+ Result
+   InitPlan 1 (returns $0)
+     ->  Limit
+           ->  Tid Range Scan on tidrangescan
+                 Filter: (ctid IS NOT NULL)
+(5 rows)
+
+SELECT MIN(ctid) FROM tidrangescan;
+  min  
+-------
+ (0,1)
+(1 row)
+
+EXPLAIN (COSTS OFF)
+SELECT MAX(ctid) FROM tidrangescan;
+                      QUERY PLAN                       
+-------------------------------------------------------
+ Result
+   InitPlan 1 (returns $0)
+     ->  Limit
+           ->  Tid Range Scan Backward on tidrangescan
+                 Filter: (ctid IS NOT NULL)
+(5 rows)
+
+SELECT MAX(ctid) FROM tidrangescan;
+  max   
+--------
+ (9,10)
+(1 row)
+
+EXPLAIN (COSTS OFF)
+SELECT MIN(ctid) FROM tidrangescan WHERE ctid > '(5,0)';
+                   QUERY PLAN                    
+-------------------------------------------------
+ Result
+   InitPlan 1 (returns $0)
+     ->  Limit
+           ->  Tid Range Scan on tidrangescan
+                 TID Cond: (ctid > '(5,0)'::tid)
+                 Filter: (ctid IS NOT NULL)
+(6 rows)
+
+SELECT MIN(ctid) FROM tidrangescan WHERE ctid > '(5,0)';
+  min  
+-------
+ (5,1)
+(1 row)
+
+EXPLAIN (COSTS OFF)
+SELECT MAX(ctid) FROM tidrangescan WHERE ctid < '(5,0)';
+                      QUERY PLAN                       
+-------------------------------------------------------
+ Result
+   InitPlan 1 (returns $0)
+     ->  Limit
+           ->  Tid Range Scan Backward on tidrangescan
+                 TID Cond: (ctid < '(5,0)'::tid)
+                 Filter: (ctid IS NOT NULL)
+(6 rows)
+
+SELECT MAX(ctid) FROM tidrangescan WHERE ctid < '(5,0)';
+  max   
+--------
+ (4,10)
+(1 row)
+
+-- empty table
+CREATE TABLE tidrangescan_empty(id integer, data text);
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan_empty WHERE ctid < '(1, 0)';
+              QUERY PLAN              
+--------------------------------------
+ Tid Range Scan on tidrangescan_empty
+   TID Cond: (ctid < '(1,0)'::tid)
+(2 rows)
+
+SELECT ctid, data FROM tidrangescan_empty WHERE ctid < '(1, 0)';
+ ctid | data 
+------+------
+(0 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan_empty WHERE ctid > '(9, 0)';
+              QUERY PLAN              
+--------------------------------------
+ Tid Range Scan on tidrangescan_empty
+   TID Cond: (ctid > '(9,0)'::tid)
+(2 rows)
+
+SELECT ctid, data FROM tidrangescan_empty WHERE ctid > '(9, 0)';
+ ctid | data 
+------+------
+(0 rows)
+
diff --git a/src/test/regress/expected/tidscan.out b/src/test/regress/expected/tidscan.out
index 7eebe77..e0ec664 100644
--- a/src/test/regress/expected/tidscan.out
+++ b/src/test/regress/expected/tidscan.out
@@ -116,6 +116,25 @@ FETCH FIRST FROM c;
 (1 row)
 
 ROLLBACK;
+-- make sure that tid scan is chosen rather than tid range scan, if there are equality/in quals as well as range quals
+EXPLAIN (COSTS OFF)
+SELECT ctid, * FROM tidscan WHERE ctid = '(0,1)' AND ctid < '(3,0)';
+            QUERY PLAN             
+-----------------------------------
+ Tid Scan on tidscan
+   TID Cond: (ctid = '(0,1)'::tid)
+   Filter: (ctid < '(3,0)'::tid)
+(3 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid, * FROM tidscan WHERE ctid = ANY(ARRAY['(0,1)', '(0,2)']::tid[]) AND ctid < '(3,0)';
+                      QUERY PLAN                       
+-------------------------------------------------------
+ Tid Scan on tidscan
+   TID Cond: (ctid = ANY ('{"(0,1)","(0,2)"}'::tid[]))
+   Filter: (ctid < '(3,0)'::tid)
+(3 rows)
+
 -- check that ordering on a tidscan doesn't require a sort
 EXPLAIN (COSTS OFF)
 SELECT ctid, * FROM tidscan WHERE ctid = ANY(ARRAY['(0,2)', '(0,1)', '(0,3)']::tid[]) ORDER BY ctid;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 16f979c..517a469 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -89,7 +89,7 @@ test: brin gin gist spgist privileges init_privs security_label collate matview
 # ----------
 # Another group of parallel tests
 # ----------
-test: alter_generic alter_operator misc psql async dbsize misc_functions sysviews tsrf tidscan stats_ext
+test: alter_generic alter_operator misc psql async dbsize misc_functions sysviews tsrf tidscan tidrangescan stats_ext
 
 # rules cannot run concurrently with any test that creates a view
 test: rules psql_crosstab amutils
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 42632be..c97f1c6 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -134,6 +134,7 @@ test: misc_functions
 test: sysviews
 test: tsrf
 test: tidscan
+test: tidrangescan
 test: stats_ext
 test: rules
 test: psql_crosstab
diff --git a/src/test/regress/sql/tidrangescan.sql b/src/test/regress/sql/tidrangescan.sql
new file mode 100644
index 0000000..de132b8
--- /dev/null
+++ b/src/test/regress/sql/tidrangescan.sql
@@ -0,0 +1,68 @@
+-- tests for tidrangescans
+
+CREATE TABLE tidrangescan(id integer, data text);
+
+INSERT INTO tidrangescan SELECT i,'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' FROM generate_series(1,1000) AS s(i);
+DELETE FROM tidrangescan WHERE substring(ctid::text from ',(\d+)\)')::integer > 10 OR substring(ctid::text from '\((\d+),')::integer >= 10;;
+VACUUM tidrangescan;
+
+-- range scans with upper bound
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan WHERE ctid < '(1, 0)';
+SELECT ctid, data FROM tidrangescan WHERE ctid < '(1, 0)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan WHERE ctid <= '(1, 5)';
+SELECT ctid, data FROM tidrangescan WHERE ctid <= '(1, 5)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan WHERE ctid < '(0, 0)';
+SELECT ctid, data FROM tidrangescan WHERE ctid < '(0, 0)';
+
+-- range scans with lower bound
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan WHERE ctid > '(9, 8)';
+SELECT ctid, data FROM tidrangescan WHERE ctid > '(9, 8)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan WHERE ctid >= '(9, 8)';
+SELECT ctid, data FROM tidrangescan WHERE ctid >= '(9, 8)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan WHERE ctid >= '(100, 0)';
+SELECT ctid, data FROM tidrangescan WHERE ctid >= '(100, 0)';
+
+-- ordering with no quals should use tid range scan
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan ORDER BY ctid ASC;
+
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan ORDER BY ctid DESC;
+
+-- min/max
+EXPLAIN (COSTS OFF)
+SELECT MIN(ctid) FROM tidrangescan;
+SELECT MIN(ctid) FROM tidrangescan;
+
+EXPLAIN (COSTS OFF)
+SELECT MAX(ctid) FROM tidrangescan;
+SELECT MAX(ctid) FROM tidrangescan;
+
+EXPLAIN (COSTS OFF)
+SELECT MIN(ctid) FROM tidrangescan WHERE ctid > '(5,0)';
+SELECT MIN(ctid) FROM tidrangescan WHERE ctid > '(5,0)';
+
+EXPLAIN (COSTS OFF)
+SELECT MAX(ctid) FROM tidrangescan WHERE ctid < '(5,0)';
+SELECT MAX(ctid) FROM tidrangescan WHERE ctid < '(5,0)';
+
+-- empty table
+CREATE TABLE tidrangescan_empty(id integer, data text);
+
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan_empty WHERE ctid < '(1, 0)';
+SELECT ctid, data FROM tidrangescan_empty WHERE ctid < '(1, 0)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan_empty WHERE ctid > '(9, 0)';
+SELECT ctid, data FROM tidrangescan_empty WHERE ctid > '(9, 0)';
diff --git a/src/test/regress/sql/tidscan.sql b/src/test/regress/sql/tidscan.sql
index 5237f06..05fef3c 100644
--- a/src/test/regress/sql/tidscan.sql
+++ b/src/test/regress/sql/tidscan.sql
@@ -43,6 +43,12 @@ FETCH BACKWARD 1 FROM c;
 FETCH FIRST FROM c;
 ROLLBACK;
 
+-- make sure that tid scan is chosen rather than tid range scan, if there are equality/in quals as well as range quals
+EXPLAIN (COSTS OFF)
+SELECT ctid, * FROM tidscan WHERE ctid = '(0,1)' AND ctid < '(3,0)';
+EXPLAIN (COSTS OFF)
+SELECT ctid, * FROM tidscan WHERE ctid = ANY(ARRAY['(0,1)', '(0,2)']::tid[]) AND ctid < '(3,0)';
+
 -- check that ordering on a tidscan doesn't require a sort
 EXPLAIN (COSTS OFF)
 SELECT ctid, * FROM tidscan WHERE ctid = ANY(ARRAY['(0,2)', '(0,1)', '(0,3)']::tid[]) ORDER BY ctid;

David Rowley

david.rowley@2ndquadrant.com

over 7 years ago

In reply to: Edmund Horner (#1)

Re: Tid scan improvements

On 12 August 2018 at 14:29, Edmund Horner <ejrh00@gmail.com> wrote:

To scratch an itch, I have been working on teaching TidScan how to do
range queries, i.e. those using >=, <, BETWEEN, etc. This means we
can write, for instance,

SELECT * FROM t WHERE ctid >= '(1000,0)' AND ctid < '(2000,0)';

I think this will be useful to UPDATE records at the end of a bloated
table to move them into space that's been freed up by vacuum to allow
the table to be trimmed back to size again.

Since range scan execution is rather different from the existing
TidScan execution, I ended up making a new plan type, TidRangeScan.
There is still only one TidPath, but it has an additional member that
describes which method to use.

I always thought that this would be implemented by overloading
TidScan. I thought that TidListEval() could be modified to remove
duplicates accounting for range scans. For example:

SELECT * FROM t WHERE ctid BETWEEN '(0,1)' AND '(0,10)' OR ctid
IN('(0,5)','(0,30)');

would first sort all the tids along with their operator and then make
a pass over the sorted array to remove any equality ctids that are
redundant because they're covered in a range.
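
As a rough illustration of that last step (not code from the patch), the
following C sketch drops equality ctids that an inclusive range already
covers. DemoTid, DemoTidRange and the demo_* functions are hypothetical
stand-ins for ItemPointerData and for whatever TidListEval() would really
use, and only a single range is handled to keep it short.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for PostgreSQL's ItemPointerData. */
typedef struct DemoTid
{
    uint32_t    block;
    uint16_t    offset;
} DemoTid;

/* Hypothetical inclusive range, e.g. from "ctid BETWEEN lo AND hi". */
typedef struct DemoTidRange
{
    DemoTid     lo;
    DemoTid     hi;
} DemoTidRange;

/* Order TIDs by block number, then by offset within the block. */
int
demo_tid_cmp(DemoTid a, DemoTid b)
{
    if (a.block != b.block)
        return (a.block < b.block) ? -1 : 1;
    if (a.offset != b.offset)
        return (a.offset < b.offset) ? -1 : 1;
    return 0;
}

/* True if tid lies within the range, both ends included. */
bool
demo_tid_in_range(DemoTid tid, DemoTidRange range)
{
    return demo_tid_cmp(tid, range.lo) >= 0 &&
        demo_tid_cmp(tid, range.hi) <= 0;
}

/*
 * Compact tids[] in place, dropping every entry covered by the range.
 * Returns the number of TIDs that still need an individual fetch.
 */
size_t
demo_drop_covered_tids(DemoTid *tids, size_t ntids, DemoTidRange range)
{
    size_t      kept = 0;

    for (size_t i = 0; i < ntids; i++)
    {
        if (!demo_tid_in_range(tids[i], range))
            tids[kept++] = tids[i];
    }
    return kept;
}
```

With the range (0,1)..(0,10) and the list (0,5), (0,30) from the query
above, only (0,30) would be left as a separate fetch.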

As part of the work I also taught TidScan that its results are ordered
by ctid, i.e. to set a pathkey on a TidPath. The benefit of this is
that queries such as

SELECT MAX(ctid) FROM t;
SELECT * FROM t WHERE ctid IN (...) ORDER BY ctid;

I think that can be done as I see you're passing allow_sync as false
in heap_beginscan_strat(), so the scan will start at the beginning of
the heap.

Attached are a couple of patches:
- 01_tid_scan_ordering.patch
- 02_tid_range_scan.patch, to be applied on top of 01.

Can I add this to the next CommitFest?

Please do.

As well as actual correctness, some aspects that I am particularly
unsure about include:

- Is it messy to use TidPath for both types of scan?

I wonder if there is a good reason to have a separate node type at
all? I've not looked, but if you've managed to overload the TidPath
struct without it getting out of control, then perhaps the same can be
done with the node type.

- What is the planning cost for plans that don't end up being a
TidScan or TidRangeScan?

I suppose that wouldn't matter if there was just 1 path for a single node type.

- Is there a less brittle way to create tables of a specific number
of blocks/tuples in the regression tests?

Perhaps you could just populate a table with some number of records
then DELETE the ones above ctid (x,100) on each page, where 100 is
whatever you can be certain will fit on a page on any platform. I'm
not quite sure if our regress test would pass with a very small block
size anyway, but probably worth verifying that before you write the
first test that will break it.

I'll try to look in a bit more detail during the commitfest.

It's perhaps a minor detail at this stage, but generally, we don't
have code lines over 80 chars in length. There are some exceptions,
e.g. not breaking error message strings so that they're easily
greppable. src/tools/pgindent has a tool that you can run to fix the
whitespace so it's in line with project standard.

Thanks for working on this. It will be great to see improvements made in this area.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Robert Haas

robertmhaas@gmail.com

over 7 years ago

In reply to: David Rowley (#2)

Re: Tid scan improvements

On Sun, Aug 12, 2018 at 8:07 AM, David Rowley
<david.rowley@2ndquadrant.com> wrote:

Thanks for working on this. It will be great to see improvements made in this area.

+1.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Edmund Horner

ejrh00@gmail.com

over 7 years ago

In reply to: David Rowley (#2)

Re: Tid scan improvements

On 12 August 2018 at 20:07, David Rowley <david.rowley@2ndquadrant.com> wrote:

Since range scan execution is rather different from the existing
TidScan execution, I ended up making a new plan type, TidRangeScan.
There is still only one TidPath, but it has an additional member that
describes which method to use.

I always thought that this would be implemented by overloading
TidScan. I thought that TidListEval() could be modified to remove
duplicates accounting for range scans. For example:

SELECT * FROM t WHERE ctid BETWEEN '(0,1)' AND '(0,10)' OR ctid
IN('(0,5)','(0,30)');

would first sort all the tids along with their operator and then make
a pass over the sorted array to remove any equality ctids that are
redundant because they're covered in a range.

Initially, I figured that 99% of the time, the user either wants to
filter by a specific set of ctids (such as those returned by a
subquery), or wants to process a table in blocks. Picking up an
OR-set of ctid-conditions and determining which parts should be picked
up row-wise (as in the existing code) versus which parts are blocks
that should be scanned -- and ensuring that any overlaps were removed
-- seemed more complicated than it was worth.

Having thought about it, I think what you propose might be worth it;
at least it limits us to a single TidScan plan to maintain.

The existing code:
- Looks for a qual that's an OR-list of (ctid = ?) or (ctid IN (?))
- Costs it by assuming each matching tuple is a separate page.
- When beginning the scan, evaluates all the ?s and builds an array
of tids to fetch.
- Sorts and removes duplicates.
- Iterates over the array, fetching tuples.

So we'd extend that to:
- Include in the OR-list "range" subquals of the form (ctid > ? AND
ctid < ?) (either side could be optional, and we have to deal with >=
and <= and having ctid on the rhs, etc.).
- Cost the range subquals by assuming they don't overlap, and
estimating how many blocks and tuples they span.
- When beginning the scan, evaluate all the ?s and build an array of
"tid ranges" to fetch. A tid range is a struct with a starting tid,
and an ending tid, and might just be a single tid item.
- Sort and remove duplicates.
- Iterate over the array, using a single fetch for single-item tid
ranges, and starting/ending a heap scan for multi-item tid ranges.
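
To make the "array of tid ranges" idea concrete, here is a minimal C
sketch of the sort-and-deduplicate step. It is only an illustration under
assumptions: DemoTid, DemoTidRange and the demo_* functions are made-up
names, a single tid is stored as a range whose two ends are equal, and
only overlapping (not merely adjacent) ranges are merged.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for ItemPointerData. */
typedef struct DemoTid
{
    uint32_t    block;
    uint16_t    offset;
} DemoTid;

/* Hypothetical inclusive tid range; first == last means a single tid. */
typedef struct DemoTidRange
{
    DemoTid     first;
    DemoTid     last;
} DemoTidRange;

int
demo_tid_cmp(DemoTid a, DemoTid b)
{
    if (a.block != b.block)
        return (a.block < b.block) ? -1 : 1;
    if (a.offset != b.offset)
        return (a.offset < b.offset) ? -1 : 1;
    return 0;
}

/* qsort comparator: order by starting tid, then by ending tid. */
int
demo_range_cmp(const void *a, const void *b)
{
    const DemoTidRange *ra = a;
    const DemoTidRange *rb = b;
    int         c = demo_tid_cmp(ra->first, rb->first);

    return (c != 0) ? c : demo_tid_cmp(ra->last, rb->last);
}

/* A single-item range would use one fetch instead of a heap scan. */
bool
demo_range_is_single(DemoTidRange r)
{
    return demo_tid_cmp(r.first, r.last) == 0;
}

/*
 * Sort ranges[] and merge overlapping entries in place, so duplicates
 * and covered single tids disappear.  Returns the new array length.
 */
size_t
demo_sort_and_merge(DemoTidRange *ranges, size_t n)
{
    size_t      out = 0;

    if (n == 0)
        return 0;
    qsort(ranges, n, sizeof(DemoTidRange), demo_range_cmp);
    for (size_t i = 1; i < n; i++)
    {
        if (demo_tid_cmp(ranges[i].first, ranges[out].last) <= 0)
        {
            /* Overlaps the current range: extend its end if needed. */
            if (demo_tid_cmp(ranges[i].last, ranges[out].last) > 0)
                ranges[out].last = ranges[i].last;
        }
        else
            ranges[++out] = ranges[i];
    }
    return out + 1;
}
```

The executor would then walk the merged array in order, which also keeps
the output sorted by ctid.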

I think I'll try implementing this.

As part of the work I also taught TidScan that its results are ordered
by ctid, i.e. to set a pathkey on a TidPath. The benefit of this is
that queries such as

SELECT MAX(ctid) FROM t;
SELECT * FROM t WHERE ctid IN (...) ORDER BY ctid;

I think that can be done as I see you're passing allow_sync as false
in heap_beginscan_strat(), so the scan will start at the beginning of
the heap.

I found that heap scan caters to parallel scans, synchronised scans,
and block range indexing; but it didn't quite work for my case of
specifying a subset of a table and scanning backward or forward over
it. Hence my changes. I'm not overly familiar with the heap scan
code though.

- Is there a less brittle way to create tables of a specific number
of blocks/tuples in the regression tests?

Perhaps you could just populate a table with some number of records
then DELETE the ones above ctid (x,100) on each page, where 100 is
whatever you can be certain will fit on a page on any platform. I'm
not quite sure if our regress test would pass with a very small block
size anyway, but probably worth verifying that before you write the
first test that will break it.

I don't think I've tested with extreme block sizes.

I'll try to look in a bit more detail during the commitfest.

It's perhaps a minor detail at this stage, but generally, we don't
have code lines over 80 chars in length. There are some exceptions,
e.g not breaking error message strings so that they're easily
greppable. src/tools/pgindent has a tool that you can run to fix the
whitespace so it's in line with project standard.

I'll try to get pgindent running before my next patch.

Thanks for the comments!

David Rowley

david.rowley@2ndquadrant.com

over 7 years ago

In reply to: Edmund Horner (#4)

Re: Tid scan improvements

On 15 August 2018 at 11:11, Edmund Horner <ejrh00@gmail.com> wrote:

So we'd extend that to:
- Include in the OR-list "range" subquals of the form (ctid > ? AND
ctid < ?) (either side could be optional, and we have to deal with >=
and <= and having ctid on the rhs, etc.).
- Cost the range subquals by assuming they don't overlap, and
estimating how many blocks and tuples they span.
- When beginning the scan, evaluate all the ?s and build an array of
"tid ranges" to fetch. A tid range is a struct with a starting tid,
and an ending tid, and might just be a single tid item.
- Sort and remove duplicates.
- Iterate over the array, using a single fetch for single-item tid
ranges, and starting/ending a heap scan for multi-item tid ranges.

I think I'll try implementing this.

I've set this patch as waiting on author in the commitfest app.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Edmund Horner

ejrh00@gmail.com

over 7 years ago

In reply to: David Rowley (#5)

Re: Tid scan improvements

On Mon, 17 Sep 2018 at 23:21, David Rowley <david.rowley@2ndquadrant.com> wrote:

On 15 August 2018 at 11:11, Edmund Horner <ejrh00@gmail.com> wrote:

So we'd extend that to:
- Include in the OR-list "range" subquals of the form (ctid > ? AND
ctid < ?) (either side could be optional, and we have to deal with >=
and <= and having ctid on the rhs, etc.).
- Cost the range subquals by assuming they don't overlap, and
estimating how many blocks and tuples they span.
- When beginning the scan, evaluate all the ?s and build an array of
"tid ranges" to fetch. A tid range is a struct with a starting tid,
and an ending tid, and might just be a single tid item.
- Sort and remove duplicates.
- Iterate over the array, using a single fetch for single-item tid
ranges, and starting/ending a heap scan for multi-item tid ranges.

I think I'll try implementing this.

I've set this patch as waiting on author in the commitfest app.

Thanks David.

I have found time here and there between work to keep going on it, but
making a path type that handles all of the above is turning out to be
surprisingly hard, much harder than my tid range scan was.

In the earlier discussion from 2012, Tom Lane said:

Bruce Momjian <bruce(at)momjian(dot)us> writes:

On Wed, Jun 13, 2012 at 03:21:17PM -0500, Merlin Moncure wrote:

IMNSHO, it's a no-brainer for the todo (but I think it's more
complicated than adding some comparisons -- which are working now):

I see. Seems we have to add index smarts to those comparisons. That
might be complicated.

Uh, the whole point of a TID scan is to *not* need an index.

What would be needed is for tidpath.c to let through more kinds of TID
comparison quals than it does now, and then for nodeTidscan.c to know
what to do with them. The latter logic might well look something like
btree indexscan qual preparation, but it wouldn't be the same code.

I have been generally following this approach (handling more kinds of
TID comparisons), and have found myself doing things like pairing up >
with <, estimating how much of a table is covered by some set of >, <,
or "> AND <" quals, etc. Things that I'm sure are handled in an
advanced way by index paths; unfortunately I didn't see any easily
reusable code in the index path code. So I've ended up writing
special-case code for TID scans. Hopefully it will be worth it.
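
For what it's worth, the kind of estimate I mean can be sketched in a few
lines of C. This is a simplified illustration, not the code in the patch:
DemoBound and demo_range_fraction are hypothetical names, and it works at
block granularity only, ignoring offsets within a block.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical optional bound: the block number of a ctid constant. */
typedef struct DemoBound
{
    bool        present;
    uint32_t    block;
} DemoBound;

/*
 * Estimate the fraction of a table's blocks covered by a paired lower
 * bound (from > or >=) and upper bound (from < or <=).  A missing bound
 * means the range is open on that side.
 */
double
demo_range_fraction(DemoBound lower, DemoBound upper, uint32_t nblocks)
{
    uint32_t    first;
    uint32_t    last;

    if (nblocks == 0)
        return 0.0;             /* empty table: nothing to scan */

    first = lower.present ? lower.block : 0;
    last = upper.present ? upper.block : nblocks - 1;
    if (last > nblocks - 1)
        last = nblocks - 1;     /* clamp to the end of the table */
    if (first > last)
        return 0.0;             /* range lies past the end, or is empty */

    return (double) (last - first + 1) / (double) nblocks;
}
```

On a 10-block table, a lone lower bound in block 5 gives 0.5, and a lower
bound in block 100 gives 0, which matches the ctid >= '(100, 0)' case in
the regression test.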

Edmund

David Rowley

david.rowley@2ndquadrant.com

over 7 years ago

In reply to: Edmund Horner (#6)

Re: Tid scan improvements

On 19 September 2018 at 18:04, Edmund Horner <ejrh00@gmail.com> wrote:

I have been generally following this approach (handling more kinds of
TID comparisons), and have found myself doing things like pairing up >
with <, estimating how much of a table is covered by some set of >, <,
or "> AND <" quals, etc. Things that I'm sure are handled in an
advanced way by index paths; unfortunately I didn't see any easily
reusable code in the index path code. So I've ended up writing
special-case code for TID scans. Hopefully it will be worth it.

I don't think it would need to be as complex as the index matching
code. Just looping over the quals and gathering up all compatible ctid
quals should be fine. I imagine the complex handling of sorting the
quals by ctid and removal of redundant quals that are covered by some
range would be done in the executor.
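
A minimal sketch of that sort-and-merge step, written as standalone C
rather than executor code. The Tid and TidRange types and the function
names are hypothetical stand-ins (real code would use ItemPointerData),
and ranges are assumed to be inclusive at both ends:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified stand-in for ItemPointerData: (block, offset). */
typedef struct Tid
{
	uint32_t	block;
	uint16_t	offset;
} Tid;

/* Inclusive range [first, last]; a single TID has first == last. */
typedef struct TidRange
{
	Tid			first;
	Tid			last;
} TidRange;

static int
tid_cmp(const Tid *a, const Tid *b)
{
	if (a->block != b->block)
		return (a->block < b->block) ? -1 : 1;
	if (a->offset != b->offset)
		return (a->offset < b->offset) ? -1 : 1;
	return 0;
}

/* qsort comparator: order by first, then by last. */
static int
range_cmp(const void *a, const void *b)
{
	const TidRange *ra = (const TidRange *) a;
	const TidRange *rb = (const TidRange *) b;
	int			c = tid_cmp(&ra->first, &rb->first);

	return (c != 0) ? c : tid_cmp(&ra->last, &rb->last);
}

/*
 * Sort the ranges and collapse any that overlap, in place.
 * Returns the number of ranges remaining.
 */
static int
sort_and_merge_ranges(TidRange *ranges, int n)
{
	int			last = 0;

	assert(n >= 0);
	if (n < 2)
		return n;

	qsort(ranges, (size_t) n, sizeof(TidRange), range_cmp);

	for (int i = 1; i < n; i++)
	{
		if (tid_cmp(&ranges[i].first, &ranges[last].last) <= 0)
		{
			/* Overlap: extend the kept range only if ranges[i] ends later. */
			if (tid_cmp(&ranges[i].last, &ranges[last].last) > 0)
				ranges[last].last = ranges[i].last;
		}
		else
			ranges[++last] = ranges[i];
	}
	return last + 1;
}
```

Treating an equality qual as a range with first == last means duplicate
TIDs collapse through the same path as overlapping ranges.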

Probably the costing will get more complex. At the moment it seems we
add a random_page_cost per ctid, but you'd probably need to make that
better and loop over the quals in each implicitly ANDed set and find
the max ctid for the > / >= quals and the min ctid for the < / <= quals,
then get the page number from each and assume (max - min) pages at
seq_page_cost each, then
add random_page_cost for any remaining equality quals. The costs from
other OR branches can likely just be added on. This would double
count if someone did WHERE ctid BETWEEN '(0,0)' AND '(100,300)' OR
ctid BETWEEN '(0,0)' AND '(100,300)'; The current code seems to
double count now for duplicate ctids anyway. It even double counts if
the ctid being compared to is on the same page as another ctid, so I
don't think that would be unacceptable.
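
One way to read that costing scheme, as a rough standalone sketch.
CtidQual, QualKind and and_set_page_cost are hypothetical names, quals
are reduced to the block number of their comparison value, and counting
the covered pages inclusively (hi - lo + 1) is an assumption:

```c
#include <assert.h>
#include <stdint.h>

/* QUAL_EQ is "=", QUAL_LOWER is ">" or ">=", QUAL_UPPER is "<" or "<=". */
typedef enum
{
	QUAL_EQ,
	QUAL_LOWER,
	QUAL_UPPER
} QualKind;

typedef struct CtidQual
{
	QualKind	kind;
	uint32_t	block;			/* block number of the comparison value */
} CtidQual;

/*
 * Estimate the page-fetch cost of one implicitly ANDed set of ctid quals
 * on a table of nblocks blocks.
 */
static double
and_set_page_cost(const CtidQual *quals, int nquals, uint32_t nblocks,
				  double seq_page_cost, double random_page_cost)
{
	uint32_t	lo = 0;
	uint32_t	hi = (nblocks > 0) ? nblocks - 1 : 0;
	int			have_range = 0;
	double		cost = 0.0;

	assert(nquals >= 0);
	for (int i = 0; i < nquals; i++)
	{
		switch (quals[i].kind)
		{
			case QUAL_EQ:
				/* each equality qual is charged one random page fetch */
				cost += random_page_cost;
				break;
			case QUAL_LOWER:
				/* the tightest lower bound is the largest one */
				if (quals[i].block > lo)
					lo = quals[i].block;
				have_range = 1;
				break;
			case QUAL_UPPER:
				/* the tightest upper bound is the smallest one */
				if (quals[i].block < hi)
					hi = quals[i].block;
				have_range = 1;
				break;
		}
	}

	/* pages between the two bounds are charged as sequential reads */
	if (have_range && nblocks > 0 && hi >= lo)
		cost += (double) (hi - lo + 1) * seq_page_cost;

	return cost;
}
```

Costs for OR branches would then be summed over one call per branch,
which double counts overlapping branches in the way described above.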

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Edmund Horner

ejrh00@gmail.com

over 7 years ago

In reply to: David Rowley (#7)

1 attachment(s)

Re: Tid scan improvements

On Wed, 19 Sep 2018 at 18:56, David Rowley <david.rowley@2ndquadrant.com> wrote:

On 19 September 2018 at 18:04, Edmund Horner <ejrh00@gmail.com> wrote:

I have been generally following this approach (handling more kinds of
TID comparisons), and have found myself doing things like pairing up >
with <, estimating how much of a table is covered by some set of >, <,
or "> AND <" quals, etc. Things that I'm sure are handled in an
advanced way by index paths; unfortunately I didn't see any easily
reusable code in the index path code. So I've ended up writing
special-case code for TID scans. Hopefully it will be worth it.

I don't think it would need to be as complex as the index matching
code. Just looping over the quals and gathering up all compatible ctid
quals should be fine. I imagine the complex handling of sorting the
quals by ctid and removal of redundant quals that are covered by some
range would be done in the executor.

I've got the path creation and execution pretty much working, though
with some inefficiencies:
- Each individual TID is treated as a range of size 1 (but CURRENT
  OF is handled as a single fetch).
- Range scans have to scan whole blocks, and skip over the tuples
  that are out of range.
But it's enough to get the tests passing.

Right now I'm looking at costing:

Probably the costing will get more complex. At the moment it seems we
add a random_page_cost per ctid, but you'd probably need to make that
better and loop over the quals in each implicitly ANDed set and find
the max ctid for the > / >= quals and the min ctid for the < / <= quals,
then get the page number from each and assume (max - min) pages at
seq_page_cost each, then
add random_page_cost for any remaining equality quals. The costs from
other OR branches can likely just be added on. This would double
count if someone did WHERE ctid BETWEEN '(0,0)' AND '(100,300)' OR
ctid BETWEEN '(0,0)' AND '(100,300)'; The current code seems to
double count now for duplicate ctids anyway. It even double counts if
the ctid being compared to is on the same page as another ctid, so I
don't think that would be unacceptable.

There are two stages of costing:
1. Estimating the number of rows that the relation will return. This
happens before path generation.
2. Estimating the cost of the path.

In the existing code, (1) goes through the normal clausesel.c
machinery, eventually getting to the restriction function defined in
pg_operator. For range quals, e.g. >, it looks for a stats entry for
the variable, but since it's a system variable with no stats, it
returns DEFAULT_INEQ_SEL (in function scalarineqsel). For equality
quals, it does have some special-case code (in function
get_variable_numdistinct) to use stadistinct=-1 for the CTID variable,
resulting in a selectivity estimate of 1/ntuples.

(2), on the other hand, has special-case code in costsize.c (function
cost_tidscan), which estimates each TID as being a separate tuple
fetch from a different page. (The existing code only has to support
=, IN, and CURRENT OF as quals for a TID path.)

In my work, I have been adding support for range quals to (2), which
includes estimating the selectivity of expressions like (CTID > a AND
CTID < b). I got tired of handling all the various ways of ordering
the quals, so I thought I would try re-using the clausesel.c
machinery. In selfuncs.c, I've added special case code for
scalarineqsel and nulltestsel to handle CTID variables. (This also
improves the row count estimates.)
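
For illustration, a rough sketch of how a block-based selectivity
estimate for a CTID range might look. This is not the code in the
attached patch; the function names are hypothetical and offsets within
a block are ignored:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: fraction of a table's blocks lying strictly below
 * the given block number, clamped to [0, 1].
 */
static double
ctid_lt_selectivity(uint32_t block, uint32_t nblocks)
{
	if (nblocks == 0)
		return 0.0;
	if (block >= nblocks)
		return 1.0;
	return (double) block / (double) nblocks;
}

/* Selectivity of (ctid >= '(lo_block,0)' AND ctid < '(hi_block,0)'). */
static double
ctid_range_selectivity(uint32_t lo_block, uint32_t hi_block, uint32_t nblocks)
{
	double		sel = ctid_lt_selectivity(hi_block, nblocks)
		- ctid_lt_selectivity(lo_block, nblocks);

	/* an empty or inverted range selects nothing */
	if (sel < 0.0)
		sel = 0.0;
	assert(sel <= 1.0);
	return sel;
}
```

An estimate of this kind depends only on the table size in blocks, so
it needs no stats entry for the CTID variable.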

I'm not 100% sure what the costs of each range should be. I think the
first block should incur random_page_cost, with subsequent blocks
being seq_page_cost. Simple "CTID = ?" quals are still estimated as 1
tuple + 1 random block.
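
A rough sketch of that per-range page cost, again standalone and with a
hypothetical function name. It assumes the range is given as inclusive
first and last block numbers:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Page cost of scanning blocks first_block..last_block inclusive: the
 * first block is a random fetch, each subsequent block a sequential one.
 * A single-TID qual is the case first_block == last_block.
 */
static double
tid_range_page_cost(uint32_t first_block, uint32_t last_block,
					double seq_page_cost, double random_page_cost)
{
	assert(last_block >= first_block);
	return random_page_cost
		+ (double) (last_block - first_block) * seq_page_cost;
}
```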

Have a look at the attached WIP if you like and tell me if you think
it's going in the right direction. I'm sorry for the size of the
patch; I couldn't find a nice way to cut it up. I did run pgindent
over it though. :)

Cheers,
Edmund

Attachments:

tid_scan_improvements-v1.patch (application/octet-stream)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 3395445..b5c48f0 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -575,11 +575,18 @@ heapgettup(HeapScanDesc scan,
 			 * forward scanners.
 			 */
 			scan->rs_syncscan = false;
-			/* start from last page of the scan */
-			if (scan->rs_startblock > 0)
-				page = scan->rs_startblock - 1;
+
+			/* Start from last page of the scan. */
+			if (scan->rs_numblocks == InvalidBlockNumber)
+			{
+				if (scan->rs_startblock > 0)
+					page = scan->rs_startblock - 1;
+				else
+					page = scan->rs_nblocks - 1;
+			}
 			else
-				page = scan->rs_nblocks - 1;
+				page = scan->rs_startblock + scan->rs_numblocks - 1;
+
 			heapgetpage(scan, page);
 		}
 		else
@@ -876,11 +883,18 @@ heapgettup_pagemode(HeapScanDesc scan,
 			 * forward scanners.
 			 */
 			scan->rs_syncscan = false;
+
 			/* start from last page of the scan */
-			if (scan->rs_startblock > 0)
-				page = scan->rs_startblock - 1;
+			if (scan->rs_numblocks == InvalidBlockNumber)
+			{
+				if (scan->rs_startblock > 0)
+					page = scan->rs_startblock - 1;
+				else
+					page = scan->rs_nblocks - 1;
+			}
 			else
-				page = scan->rs_nblocks - 1;
+				page = scan->rs_startblock + scan->rs_numblocks - 1;
+
 			heapgetpage(scan, page);
 		}
 		else
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index ece0c19..a0d11f3 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -111,6 +111,7 @@ static void show_foreignscan_info(ForeignScanState *fsstate, ExplainState *es);
 static void show_eval_params(Bitmapset *bms_params, ExplainState *es);
 static const char *explain_get_index_name(Oid indexId);
 static void show_buffer_usage(ExplainState *es, const BufferUsage *usage);
+static void show_scan_direction(ExplainState *es, ScanDirection direction);
 static void ExplainIndexScanDetails(Oid indexid, ScanDirection indexorderdir,
 						ExplainState *es);
 static void ExplainScanTarget(Scan *plan, ExplainState *es);
@@ -1245,7 +1246,6 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_SeqScan:
 		case T_SampleScan:
 		case T_BitmapHeapScan:
-		case T_TidScan:
 		case T_SubqueryScan:
 		case T_FunctionScan:
 		case T_TableFuncScan:
@@ -1254,6 +1254,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_WorkTableScan:
 			ExplainScanTarget((Scan *) plan, es);
 			break;
+		case T_TidScan:
+			show_scan_direction(es, ((TidScan *) plan)->direction);
+			ExplainScanTarget((Scan *) plan, es);
+			break;
 		case T_ForeignScan:
 		case T_CustomScan:
 			if (((Scan *) plan)->scanrelid > 0)
@@ -2867,25 +2871,21 @@ show_buffer_usage(ExplainState *es, const BufferUsage *usage)
 }
 
 /*
- * Add some additional details about an IndexScan or IndexOnlyScan
+ * Show the direction of a scan.
  */
 static void
-ExplainIndexScanDetails(Oid indexid, ScanDirection indexorderdir,
-						ExplainState *es)
+show_scan_direction(ExplainState *es, ScanDirection direction)
 {
-	const char *indexname = explain_get_index_name(indexid);
-
 	if (es->format == EXPLAIN_FORMAT_TEXT)
 	{
-		if (ScanDirectionIsBackward(indexorderdir))
+		if (ScanDirectionIsBackward(direction))
 			appendStringInfoString(es->str, " Backward");
-		appendStringInfo(es->str, " using %s", indexname);
 	}
 	else
 	{
 		const char *scandir;
 
-		switch (indexorderdir)
+		switch (direction)
 		{
 			case BackwardScanDirection:
 				scandir = "Backward";
@@ -2901,11 +2901,27 @@ ExplainIndexScanDetails(Oid indexid, ScanDirection indexorderdir,
 				break;
 		}
 		ExplainPropertyText("Scan Direction", scandir, es);
-		ExplainPropertyText("Index Name", indexname, es);
 	}
 }
 
 /*
+ * Add some additional details about an IndexScan or IndexOnlyScan
+ */
+static void
+ExplainIndexScanDetails(Oid indexid, ScanDirection indexorderdir,
+						ExplainState *es)
+{
+	const char *indexname = explain_get_index_name(indexid);
+
+	show_scan_direction(es, indexorderdir);
+
+	if (es->format == EXPLAIN_FORMAT_TEXT)
+		appendStringInfo(es->str, " using %s", indexname);
+	else
+		ExplainPropertyText("Index Name", indexname, es);
+}
+
+/*
  * Show the target of a Scan node
  */
 static void
diff --git a/src/backend/executor/nodeTidscan.c b/src/backend/executor/nodeTidscan.c
index 82d3c70..9b455d8 100644
--- a/src/backend/executor/nodeTidscan.c
+++ b/src/backend/executor/nodeTidscan.c
@@ -22,7 +22,9 @@
  */
 #include "postgres.h"
 
+#include "access/relscan.h"
 #include "access/sysattr.h"
+#include "catalog/pg_operator.h"
 #include "catalog/pg_type.h"
 #include "executor/execdebug.h"
 #include "executor/nodeTidscan.h"
@@ -39,21 +41,78 @@
 	 ((Var *) (node))->varattno == SelfItemPointerAttributeNumber && \
 	 ((Var *) (node))->varlevelsup == 0)
 
+typedef enum
+{
+	TIDEXPR_CURRENT_OF,
+	TIDEXPR_IN_ARRAY,
+	TIDEXPR_EQ,
+	TIDEXPR_LT,
+	TIDEXPR_GT,
+	TIDEXPR_BETWEEN,
+	TIDEXPR_ANY
+}			TidExprType;
+
 /* one element in tss_tidexprs */
 typedef struct TidExpr
 {
+	TidExprType type;
 	ExprState  *exprstate;		/* ExprState for a TID-yielding subexpr */
-	bool		isarray;		/* if true, it yields tid[] not just tid */
-	CurrentOfExpr *cexpr;		/* alternatively, we can have CURRENT OF */
+	ExprState  *exprstate2;		/* For TIDEXPR_BETWEEN */
+	CurrentOfExpr *cexpr;		/* For TIDEXPR_CURRENT_OF */
+	bool		strict;			/* Indicates < rather than <=, or > rather */
+	bool		strict2;		/* than >= */
 } TidExpr;
 
+typedef struct TidRange
+{
+	ItemPointerData first;
+	ItemPointerData last;
+}			TidRange;
+
+static ExprState *MakeTidOpExprState(OpExpr *expr, TidScanState *tidstate, bool *strict, bool *invert);
 static void TidExprListCreate(TidScanState *tidstate);
+static TidRange * EnlargeTidRangeArray(TidRange * tidRanges, int numRanges, int *numAllocRanges);
+static bool SetTidLowerBound(ItemPointer tid, bool strict, int nblocks, ItemPointer lowerBound);
+static bool SetTidUpperBound(ItemPointer tid, bool strict, int nblocks, ItemPointer upperBound);
 static void TidListEval(TidScanState *tidstate);
+static bool MergeTidRanges(TidRange * a, TidRange * b);
 static int	itemptr_comparator(const void *a, const void *b);
+static int	tidrange_comparator(const void *a, const void *b);
+static HeapScanDesc BeginTidRangeScan(TidScanState *node, TidRange * range);
+static HeapTuple NextInTidRange(HeapScanDesc scandesc, ScanDirection direction, TidRange * range);
 static TupleTableSlot *TidNext(TidScanState *node);
 
 
 /*
+ * Create an ExprState corresponding to the value part of a TID comparison.
+ * If the comparison operator is > or <, strict is set.
+ * If the comparison is of the form VALUE op CTID, then invert is set.
+ */
+static ExprState *
+MakeTidOpExprState(OpExpr *expr, TidScanState *tidstate, bool *strict, bool *invert)
+{
+	Node	   *arg1 = get_leftop((Expr *) expr);
+	Node	   *arg2 = get_rightop((Expr *) expr);
+	ExprState  *exprstate = NULL;
+
+	*invert = false;
+
+	if (IsCTIDVar(arg1))
+		exprstate = ExecInitExpr((Expr *) arg2, &tidstate->ss.ps);
+	else if (IsCTIDVar(arg2))
+	{
+		exprstate = ExecInitExpr((Expr *) arg1, &tidstate->ss.ps);
+		*invert = true;
+	}
+	else
+		elog(ERROR, "could not identify CTID variable");
+
+	*strict = expr->opno == TIDLessOperator || expr->opno == TIDGreaterOperator;
+
+	return exprstate;
+}
+
+/*
  * Extract the qual subexpressions that yield TIDs to search for,
  * and compile them into ExprStates if they're ordinary expressions.
  *
@@ -69,6 +128,14 @@ TidExprListCreate(TidScanState *tidstate)
 	tidstate->tss_tidexprs = NIL;
 	tidstate->tss_isCurrentOf = false;
 
+	if (!node->tidquals)
+	{
+		TidExpr    *tidexpr = (TidExpr *) palloc0(sizeof(TidExpr));
+
+		tidexpr->type = TIDEXPR_ANY;
+		tidstate->tss_tidexprs = lappend(tidstate->tss_tidexprs, tidexpr);
+	}
+
 	foreach(l, node->tidquals)
 	{
 		Expr	   *expr = (Expr *) lfirst(l);
@@ -76,20 +143,16 @@ TidExprListCreate(TidScanState *tidstate)
 
 		if (is_opclause(expr))
 		{
-			Node	   *arg1;
-			Node	   *arg2;
-
-			arg1 = get_leftop(expr);
-			arg2 = get_rightop(expr);
-			if (IsCTIDVar(arg1))
-				tidexpr->exprstate = ExecInitExpr((Expr *) arg2,
-												  &tidstate->ss.ps);
-			else if (IsCTIDVar(arg2))
-				tidexpr->exprstate = ExecInitExpr((Expr *) arg1,
-												  &tidstate->ss.ps);
+			OpExpr	   *opexpr = (OpExpr *) expr;
+			bool		invert;
+
+			tidexpr->exprstate = MakeTidOpExprState(opexpr, tidstate, &tidexpr->strict, &invert);
+			if (opexpr->opno == TIDLessOperator || opexpr->opno == TIDLessEqOperator)
+				tidexpr->type = invert ? TIDEXPR_GT : TIDEXPR_LT;
+			else if (opexpr->opno == TIDGreaterOperator || opexpr->opno == TIDGreaterEqOperator)
+				tidexpr->type = invert ? TIDEXPR_LT : TIDEXPR_GT;
 			else
-				elog(ERROR, "could not identify CTID variable");
-			tidexpr->isarray = false;
+				tidexpr->type = TIDEXPR_EQ;
 		}
 		else if (expr && IsA(expr, ScalarArrayOpExpr))
 		{
@@ -98,15 +161,46 @@ TidExprListCreate(TidScanState *tidstate)
 			Assert(IsCTIDVar(linitial(saex->args)));
 			tidexpr->exprstate = ExecInitExpr(lsecond(saex->args),
 											  &tidstate->ss.ps);
-			tidexpr->isarray = true;
+			tidexpr->type = TIDEXPR_IN_ARRAY;
 		}
 		else if (expr && IsA(expr, CurrentOfExpr))
 		{
 			CurrentOfExpr *cexpr = (CurrentOfExpr *) expr;
 
 			tidexpr->cexpr = cexpr;
+			tidexpr->type = TIDEXPR_CURRENT_OF;
 			tidstate->tss_isCurrentOf = true;
 		}
+		else if (and_clause((Node *) expr))
+		{
+			OpExpr	   *arg1;
+			OpExpr	   *arg2;
+			bool		invert;
+			bool		invert2;
+
+			Assert(list_length(((BoolExpr *) expr)->args) == 2);
+			arg1 = (OpExpr *) linitial(((BoolExpr *) expr)->args);
+			arg2 = (OpExpr *) lsecond(((BoolExpr *) expr)->args);
+			tidexpr->exprstate = MakeTidOpExprState(arg1, tidstate, &tidexpr->strict, &invert);
+			tidexpr->exprstate2 = MakeTidOpExprState(arg2, tidstate, &tidexpr->strict2, &invert2);
+
+			/* If the LHS is not the lower bound, swap them. */
+			if (invert == (arg1->opno == TIDGreaterOperator || arg1->opno == TIDGreaterEqOperator))
+			{
+				bool		temp_strict;
+				ExprState  *temp_es;
+
+				temp_es = tidexpr->exprstate;
+				tidexpr->exprstate = tidexpr->exprstate2;
+				tidexpr->exprstate2 = temp_es;
+
+				temp_strict = tidexpr->strict;
+				tidexpr->strict = tidexpr->strict2;
+				tidexpr->strict2 = temp_strict;
+			}
+
+			tidexpr->type = TIDEXPR_BETWEEN;
+		}
 		else
 			elog(ERROR, "could not identify CTID expression");
 
@@ -118,6 +212,113 @@ TidExprListCreate(TidScanState *tidstate)
 		   !tidstate->tss_isCurrentOf);
 }
 
+static TidRange *
+EnlargeTidRangeArray(TidRange * tidRanges, int numRanges, int *numAllocRanges)
+{
+	if (numRanges >= *numAllocRanges)
+	{
+		*numAllocRanges *= 2;
+		tidRanges = (TidRange *)
+			repalloc(tidRanges,
+					 *numAllocRanges * sizeof(TidRange));
+	}
+	return tidRanges;
+}
+
+/*
+ * Set a lower bound tid, taking into account the strictness of the bound.
+ * Return false if the lower bound is outside the size of the table.
+ */
+static bool
+SetTidLowerBound(ItemPointer tid, bool strict, int nblocks, ItemPointer lowerBound)
+{
+	OffsetNumber offset;
+
+	if (tid == NULL)
+	{
+		ItemPointerSetBlockNumber(lowerBound, 0);
+		ItemPointerSetOffsetNumber(lowerBound, 1);
+		return true;
+	}
+
+	if (ItemPointerGetBlockNumberNoCheck(tid) >= nblocks)
+		return false;
+
+	*lowerBound = *tid;
+	offset = ItemPointerGetOffsetNumberNoCheck(tid);
+
+	if (strict)
+		ItemPointerSetOffsetNumber(lowerBound, OffsetNumberNext(offset));
+	else if (offset == 0)
+		ItemPointerSetOffsetNumber(lowerBound, 1);
+
+	return true;
+}
+
+/*
+ * Set an upper bound tid, taking into account the strictness of the bound.
+ * Return false if the bound excludes anything from the table.
+ */
+static bool
+SetTidUpperBound(ItemPointer tid, bool strict, int nblocks, ItemPointer upperBound)
+{
+	OffsetNumber offset;
+
+	/* If the table is empty, the range must be empty. */
+	if (nblocks == 0)
+		return false;
+
+	if (tid == NULL)
+	{
+		ItemPointerSetBlockNumber(upperBound, nblocks - 1);
+		ItemPointerSetOffsetNumber(upperBound, MaxOffsetNumber);
+		return true;
+	}
+
+	*upperBound = *tid;
+	offset = ItemPointerGetOffsetNumberNoCheck(tid);
+
+	/*
+	 * If the expression was non-strict (<=) and the offset is 0, then just
+	 * pretend it was strict, because offset 0 doesn't exist and we may as
+	 * well exclude that block.
+	 */
+	if (!strict && offset == 0)
+		strict = true;
+
+	if (strict)
+	{
+		if (offset == 0)
+		{
+			BlockNumber block = ItemPointerGetBlockNumberNoCheck(upperBound);
+
+			/*
+			 * If the upper bound was already block 0, then there is no valid
+			 * range.
+			 */
+			if (block == 0)
+				return false;
+
+			ItemPointerSetBlockNumber(upperBound, block - 1);
+			ItemPointerSetOffsetNumber(upperBound, MaxOffsetNumber);
+		}
+		else
+			ItemPointerSetOffsetNumber(upperBound, OffsetNumberPrev(offset));
+	}
+
+	/*
+	 * If the upper bound is beyond the last block of the table, truncate it
+	 * to the last TID of the last block.
+	 */
+	if (ItemPointerGetBlockNumberNoCheck(upperBound) >= nblocks)
+	{
+		ItemPointerSetBlockNumber(upperBound, nblocks - 1);
+		ItemPointerSetOffsetNumber(upperBound, MaxOffsetNumber);
+	}
+
+	return true;
+}
+
 /*
  * Compute the list of TIDs to be visited, by evaluating the expressions
  * for them.
@@ -129,9 +330,9 @@ TidListEval(TidScanState *tidstate)
 {
 	ExprContext *econtext = tidstate->ss.ps.ps_ExprContext;
 	BlockNumber nblocks;
-	ItemPointerData *tidList;
-	int			numAllocTids;
-	int			numTids;
+	TidRange   *tidRanges;
+	int			numAllocRanges;
+	int			numRanges;
 	ListCell   *l;
 
 	/*
@@ -147,10 +348,9 @@ TidListEval(TidScanState *tidstate)
 	 * are simple OpExprs or CurrentOfExprs.  If there are any
 	 * ScalarArrayOpExprs, we may have to enlarge the array.
 	 */
-	numAllocTids = list_length(tidstate->tss_tidexprs);
-	tidList = (ItemPointerData *)
-		palloc(numAllocTids * sizeof(ItemPointerData));
-	numTids = 0;
+	numAllocRanges = list_length(tidstate->tss_tidexprs);
+	tidRanges = (TidRange *) palloc0(numAllocRanges * sizeof(TidRange));
+	numRanges = 0;
 
 	foreach(l, tidstate->tss_tidexprs)
 	{
@@ -158,7 +358,7 @@ TidListEval(TidScanState *tidstate)
 		ItemPointer itemptr;
 		bool		isNull;
 
-		if (tidexpr->exprstate && !tidexpr->isarray)
+		if (tidexpr->exprstate && tidexpr->type == TIDEXPR_EQ)
 		{
 			itemptr = (ItemPointer)
 				DatumGetPointer(ExecEvalExprSwitchContext(tidexpr->exprstate,
@@ -168,17 +368,76 @@ TidListEval(TidScanState *tidstate)
 				ItemPointerIsValid(itemptr) &&
 				ItemPointerGetBlockNumber(itemptr) < nblocks)
 			{
-				if (numTids >= numAllocTids)
-				{
-					numAllocTids *= 2;
-					tidList = (ItemPointerData *)
-						repalloc(tidList,
-								 numAllocTids * sizeof(ItemPointerData));
-				}
-				tidList[numTids++] = *itemptr;
+				tidRanges = EnlargeTidRangeArray(tidRanges, numRanges, &numAllocRanges);
+				tidRanges[numRanges].first = *itemptr;
+				tidRanges[numRanges].last = *itemptr;
+				numRanges++;
 			}
 		}
-		else if (tidexpr->exprstate && tidexpr->isarray)
+		else if (tidexpr->exprstate && tidexpr->type == TIDEXPR_LT)
+		{
+			bool		upper_isNull;
+			ItemPointer upper_itemptr = (ItemPointer)
+			DatumGetPointer(ExecEvalExprSwitchContext(tidexpr->exprstate,
+													  econtext,
+													  &upper_isNull));
+
+			if (upper_isNull)
+				continue;
+
+			tidRanges = EnlargeTidRangeArray(tidRanges, numRanges, &numAllocRanges);
+
+			SetTidLowerBound(NULL, false, nblocks, &tidRanges[numRanges].first);
+			if (SetTidUpperBound(upper_itemptr, tidexpr->strict, nblocks, &tidRanges[numRanges].last))
+				numRanges++;
+		}
+		else if (tidexpr->exprstate && tidexpr->type == TIDEXPR_GT)
+		{
+			bool		lower_isNull;
+			ItemPointer lower_itemptr = (ItemPointer)
+			DatumGetPointer(ExecEvalExprSwitchContext(tidexpr->exprstate,
+													  econtext,
+													  &lower_isNull));
+
+			if (lower_isNull)
+				continue;
+
+			tidRanges = EnlargeTidRangeArray(tidRanges, numRanges, &numAllocRanges);
+
+			if (SetTidLowerBound(lower_itemptr, tidexpr->strict, nblocks, &tidRanges[numRanges].first) &&
+				SetTidUpperBound(NULL, false, nblocks, &tidRanges[numRanges].last))
+				numRanges++;
+		}
+		else if (tidexpr->exprstate && tidexpr->type == TIDEXPR_BETWEEN)
+		{
+			bool		lower_isNull,
+						upper_isNull;
+			ItemPointer lower_itemptr = (ItemPointer)
+			DatumGetPointer(ExecEvalExprSwitchContext(tidexpr->exprstate,
+													  econtext,
+													  &lower_isNull));
+			ItemPointer upper_itemptr = (ItemPointer)
+			DatumGetPointer(ExecEvalExprSwitchContext(tidexpr->exprstate2,
+													  econtext,
+													  &upper_isNull));
+
+			if (lower_isNull || upper_isNull)
+				continue;
+
+			tidRanges = EnlargeTidRangeArray(tidRanges, numRanges, &numAllocRanges);
+
+			if (SetTidLowerBound(lower_itemptr, tidexpr->strict, nblocks, &tidRanges[numRanges].first) &&
+				SetTidUpperBound(upper_itemptr, tidexpr->strict2, nblocks, &tidRanges[numRanges].last))
+				numRanges++;
+		}
+		else if (tidexpr->type == TIDEXPR_ANY)
+		{
+			tidRanges = EnlargeTidRangeArray(tidRanges, numRanges, &numAllocRanges);
+			SetTidLowerBound(NULL, false, nblocks, &tidRanges[numRanges].first);
+			SetTidUpperBound(NULL, false, nblocks, &tidRanges[numRanges].last);
+			numRanges++;
+		}
+		else if (tidexpr->exprstate && tidexpr->type == TIDEXPR_IN_ARRAY)
 		{
 			Datum		arraydatum;
 			ArrayType  *itemarray;
@@ -196,12 +455,12 @@ TidListEval(TidScanState *tidstate)
 			deconstruct_array(itemarray,
 							  TIDOID, sizeof(ItemPointerData), false, 's',
 							  &ipdatums, &ipnulls, &ndatums);
-			if (numTids + ndatums > numAllocTids)
+			if (numRanges + ndatums > numAllocRanges)
 			{
-				numAllocTids = numTids + ndatums;
-				tidList = (ItemPointerData *)
-					repalloc(tidList,
-							 numAllocTids * sizeof(ItemPointerData));
+				numAllocRanges = numRanges + ndatums;
+				tidRanges = (TidRange *)
+					repalloc(tidRanges,
+							 numAllocRanges * sizeof(TidRange));
 			}
 			for (i = 0; i < ndatums; i++)
 			{
@@ -210,13 +469,17 @@ TidListEval(TidScanState *tidstate)
 					itemptr = (ItemPointer) DatumGetPointer(ipdatums[i]);
 					if (ItemPointerIsValid(itemptr) &&
 						ItemPointerGetBlockNumber(itemptr) < nblocks)
-						tidList[numTids++] = *itemptr;
+					{
+						tidRanges[numRanges].first = *itemptr;
+						tidRanges[numRanges].last = *itemptr;
+						numRanges++;
+					}
 				}
 			}
 			pfree(ipdatums);
 			pfree(ipnulls);
 		}
-		else
+		else if (tidexpr->type == TIDEXPR_CURRENT_OF)
 		{
 			ItemPointerData cursor_tid;
 
@@ -225,16 +486,20 @@ TidListEval(TidScanState *tidstate)
 							  RelationGetRelid(tidstate->ss.ss_currentRelation),
 							  &cursor_tid))
 			{
-				if (numTids >= numAllocTids)
-				{
-					numAllocTids *= 2;
-					tidList = (ItemPointerData *)
-						repalloc(tidList,
-								 numAllocTids * sizeof(ItemPointerData));
-				}
-				tidList[numTids++] = cursor_tid;
+				/*
+				 * A current-of TidExpr only exists by itself, and we should
+				 * already have allocated a tidRanges entry for it.  We don't
+				 * need to check whether the tidRanges array needs to be
+				 * resized.
+				 */
+				Assert(numRanges < numAllocRanges);
+				tidRanges[numRanges].first = cursor_tid;
+				tidRanges[numRanges].last = cursor_tid;
+				numRanges++;
 			}
 		}
+		else
+			Assert(false);
 	}
 
 	/*
@@ -243,31 +508,55 @@ TidListEval(TidScanState *tidstate)
 	 * the list.  Sorting makes it easier to detect duplicates, and as a bonus
 	 * ensures that we will visit the heap in the most efficient way.
 	 */
-	if (numTids > 1)
+	if (numRanges > 1)
 	{
-		int			lastTid;
+		int			lastRange;
 		int			i;
 
 		/* CurrentOfExpr could never appear OR'd with something else */
 		Assert(!tidstate->tss_isCurrentOf);
 
-		qsort((void *) tidList, numTids, sizeof(ItemPointerData),
-			  itemptr_comparator);
-		lastTid = 0;
-		for (i = 1; i < numTids; i++)
+		qsort((void *) tidRanges, numRanges, sizeof(TidRange), tidrange_comparator);
+		lastRange = 0;
+		for (i = 1; i < numRanges; i++)
 		{
-			if (!ItemPointerEquals(&tidList[lastTid], &tidList[i]))
-				tidList[++lastTid] = tidList[i];
+			if (!MergeTidRanges(&tidRanges[lastRange], &tidRanges[i]))
+				tidRanges[++lastRange] = tidRanges[i];
 		}
-		numTids = lastTid + 1;
+		numRanges = lastRange + 1;
 	}
 
-	tidstate->tss_TidList = tidList;
-	tidstate->tss_NumTids = numTids;
+	tidstate->tss_TidRanges = tidRanges;
+	tidstate->tss_NumRanges = numRanges;
 	tidstate->tss_TidPtr = -1;
 }
 
 /*
+ * If two ranges overlap, merge them into one.
+ * Assumes the two ranges are already ordered by (first, last).
+ * Returns true if they were merged.
+ */
+static bool
+MergeTidRanges(TidRange * a, TidRange * b)
+{
+	ItemPointerData a_last = a->last;
+	ItemPointerData b_last;
+
+	if (!ItemPointerIsValid(&a_last))
+		a_last = a->first;
+
+	if (itemptr_comparator(&a_last, &b->first) < 0)
+		return false;
+
+	b_last = b->last;
+	if (!ItemPointerIsValid(&b_last))
+		b_last = b->first;
+
+	a->last = (itemptr_comparator(&a_last, &b_last) < 0) ? b_last : a_last;
+	return true;
+}
+
+/*
  * qsort comparator for ItemPointerData items
  */
 static int
@@ -291,6 +580,86 @@ itemptr_comparator(const void *a, const void *b)
 	return 0;
 }
 
+/*
+ * qsort comparator for TidRange items
+ */
+static int
+tidrange_comparator(const void *a, const void *b)
+{
+	const		TidRange *tra = (const TidRange *) a;
+	const		TidRange *trb = (const TidRange *) b;
+	int			cmp_first = itemptr_comparator(&tra->first, &trb->first);
+
+	if (cmp_first != 0)
+		return cmp_first;
+	else
+		return itemptr_comparator(&tra->last, &trb->last);
+}
+
+static HeapScanDesc
+BeginTidRangeScan(TidScanState *node, TidRange * range)
+{
+	HeapScanDesc scandesc = node->ss.ss_currentScanDesc;
+	BlockNumber first_block = ItemPointerGetBlockNumberNoCheck(&range->first);
+	BlockNumber last_block = ItemPointerGetBlockNumberNoCheck(&range->last);
+
+	if (!scandesc)
+	{
+		EState	   *estate = node->ss.ps.state;
+
+		scandesc = heap_beginscan_strat(node->ss.ss_currentRelation,
+										estate->es_snapshot,
+										0, NULL,
+										false, false);
+		node->ss.ss_currentScanDesc = scandesc;
+	}
+	else
+		heap_rescan(scandesc, NULL);
+
+	heap_setscanlimits(scandesc, first_block, last_block - first_block + 1);
+	node->tss_inScan = true;
+	return scandesc;
+}
+
+static HeapTuple
+NextInTidRange(HeapScanDesc scandesc, ScanDirection direction, TidRange * range)
+{
+	BlockNumber first_block = ItemPointerGetBlockNumber(&range->first);
+	OffsetNumber first_offset = ItemPointerGetOffsetNumber(&range->first);
+	BlockNumber last_block = ItemPointerGetBlockNumber(&range->last);
+	OffsetNumber last_offset = ItemPointerGetOffsetNumber(&range->last);
+	HeapTuple	tuple;
+
+	for (;;)
+	{
+		BlockNumber block;
+		OffsetNumber offset;
+
+		tuple = heap_getnext(scandesc, direction);
+		if (!tuple)
+			break;
+
+		/* Check that the tuple is within the required range. */
+		block = ItemPointerGetBlockNumber(&tuple->t_self);
+		offset = ItemPointerGetOffsetNumber(&tuple->t_self);
+
+		/*
+		 * TODO if scanning forward, can stop as soon as we see a tuple
+		 * greater than last_offset
+		 */
+		/* similarly with backward, less than, first_offset */
+		if (block == first_block && offset < first_offset)
+			continue;
+
+		if (block == last_block && offset > last_offset)
+			continue;
+
+		break;
+	}
+
+	return tuple;
+}
+
 /* ----------------------------------------------------------------
  *		TidNext
  *
@@ -302,6 +671,7 @@ itemptr_comparator(const void *a, const void *b)
 static TupleTableSlot *
 TidNext(TidScanState *node)
 {
+	HeapScanDesc scandesc;
 	EState	   *estate;
 	ScanDirection direction;
 	Snapshot	snapshot;
@@ -309,105 +679,149 @@ TidNext(TidScanState *node)
 	HeapTuple	tuple;
 	TupleTableSlot *slot;
 	Buffer		buffer = InvalidBuffer;
-	ItemPointerData *tidList;
-	int			numTids;
+	int			numRanges;
 	bool		bBackward;
 
 	/*
 	 * extract necessary information from tid scan node
 	 */
+	scandesc = node->ss.ss_currentScanDesc;
 	estate = node->ss.ps.state;
 	direction = estate->es_direction;
 	snapshot = estate->es_snapshot;
 	heapRelation = node->ss.ss_currentRelation;
 	slot = node->ss.ss_ScanTupleSlot;
 
-	/*
-	 * First time through, compute the list of TIDs to be visited
-	 */
-	if (node->tss_TidList == NULL)
+	/* First time through, compute the list of TID ranges to be visited */
+	if (node->tss_TidRanges == NULL)
+	{
 		TidListEval(node);
 
-	tidList = node->tss_TidList;
-	numTids = node->tss_NumTids;
+		node->tss_TidPtr = -1;
+	}
 
-	/*
-	 * We use node->tss_htup as the tuple pointer; note this can't just be a
-	 * local variable here, as the scan tuple slot will keep a pointer to it.
-	 */
-	tuple = &(node->tss_htup);
+	numRanges = node->tss_NumRanges;
 
-	/*
-	 * Initialize or advance scan position, depending on direction.
-	 */
-	bBackward = ScanDirectionIsBackward(direction);
-	if (bBackward)
+	/* If the plan direction is backward, invert the direction. */
+	if (ScanDirectionIsBackward(((TidScan *) node->ss.ps.plan)->direction))
 	{
-		if (node->tss_TidPtr < 0)
-		{
-			/* initialize for backward scan */
-			node->tss_TidPtr = numTids - 1;
-		}
-		else
-			node->tss_TidPtr--;
+		if (ScanDirectionIsForward(direction))
+			direction = BackwardScanDirection;
+		else if (ScanDirectionIsBackward(direction))
+			direction = ForwardScanDirection;
 	}
-	else
+
+	tuple = NULL;
+	for (;;)
 	{
-		if (node->tss_TidPtr < 0)
+		TidRange   *currentRange;
+
+		if (!node->tss_inScan)
 		{
-			/* initialize for forward scan */
-			node->tss_TidPtr = 0;
+			/* Initialize or advance scan position, depending on direction. */
+			bBackward = ScanDirectionIsBackward(direction);
+			if (bBackward)
+			{
+				if (node->tss_TidPtr < 0)
+				{
+					/* initialize for backward scan */
+					node->tss_TidPtr = numRanges - 1;
+				}
+				else
+					node->tss_TidPtr--;
+			}
+			else
+			{
+				if (node->tss_TidPtr < 0)
+				{
+					/* initialize for forward scan */
+					node->tss_TidPtr = 0;
+				}
+				else
+					node->tss_TidPtr++;
+			}
 		}
-		else
-			node->tss_TidPtr++;
-	}
 
-	while (node->tss_TidPtr >= 0 && node->tss_TidPtr < numTids)
-	{
-		tuple->t_self = tidList[node->tss_TidPtr];
+		if (node->tss_TidPtr >= numRanges || node->tss_TidPtr < 0)
+			break;
 
-		/*
-		 * For WHERE CURRENT OF, the tuple retrieved from the cursor might
-		 * since have been updated; if so, we should fetch the version that is
-		 * current according to our snapshot.
-		 */
-		if (node->tss_isCurrentOf)
-			heap_get_latest_tid(heapRelation, snapshot, &tuple->t_self);
+		currentRange = &node->tss_TidRanges[node->tss_TidPtr];
 
-		if (heap_fetch(heapRelation, snapshot, tuple, &buffer, false, NULL))
+		/* TODO ranges of size 1 should also use a simple tuple fetch */
+		if (node->tss_isCurrentOf)
 		{
 			/*
-			 * Store the scanned tuple in the scan tuple slot of the scan
-			 * state.  Eventually we will only do this and not return a tuple.
+			 * We use node->tss_htup as the tuple pointer; note this can't
+			 * just be a local variable here, as the scan tuple slot will keep
+			 * a pointer to it.
 			 */
-			ExecStoreBufferHeapTuple(tuple, /* tuple to store */
-									 slot,	/* slot to store in */
-									 buffer);	/* buffer associated with
-												 * tuple */
+			tuple = &(node->tss_htup);
+			tuple->t_self = currentRange->first;
 
 			/*
-			 * At this point we have an extra pin on the buffer, because
-			 * ExecStoreHeapTuple incremented the pin count. Drop our local
-			 * pin.
+			 * For WHERE CURRENT OF, the tuple retrieved from the cursor might
+			 * since have been updated; if so, we should fetch the version
+			 * that is current according to our snapshot.
 			 */
-			ReleaseBuffer(buffer);
+			if (node->tss_isCurrentOf)
+				heap_get_latest_tid(heapRelation, snapshot, &tuple->t_self);
 
-			return slot;
+			if (heap_fetch(heapRelation, snapshot, tuple, &buffer, false, NULL))
+			{
+				/*
+				 * Store the scanned tuple in the scan tuple slot of the scan
+				 * state.  Eventually we will only do this and not return a
+				 * tuple.
+				 */
+				ExecStoreBufferHeapTuple(tuple, /* tuple to store */
+										 slot,	/* slot to store in */
+										 buffer);	/* buffer associated with
+													 * tuple */
+
+				/*
+				 * At this point we have an extra pin on the buffer, because
+				 * ExecStoreBufferHeapTuple incremented the pin count. Drop our
+				 * local pin.
+				 */
+				ReleaseBuffer(buffer);
+
+				return slot;
+			}
+			else
+			{
+				tuple = NULL;
+			}
 		}
-		/* Bad TID or failed snapshot qual; try next */
-		if (bBackward)
-			node->tss_TidPtr--;
 		else
-			node->tss_TidPtr++;
+		{
+			if (!node->tss_inScan)
+				scandesc = BeginTidRangeScan(node, currentRange);
+
+			tuple = NextInTidRange(scandesc, direction, currentRange);
+			if (tuple)
+				break;
 
-		CHECK_FOR_INTERRUPTS();
+			node->tss_inScan = false;
+		}
 	}
 
 	/*
-	 * if we get here it means the tid scan failed so we are at the end of the
-	 * scan..
+	 * save the tuple and the buffer returned to us by the access methods in
+	 * our scan tuple slot and return the slot.  Note: tuples returned by the
+	 * heap scan are pointers onto disk pages and were not created with
+	 * palloc(), so they should not be pfree()'d.  Note also that
+	 * ExecStoreBufferHeapTuple will increment the refcount of the buffer; the
+	 * refcount will not be dropped until the tuple table slot is cleared.
 	 */
-	return ExecClearTuple(slot);
+	if (tuple)
+		ExecStoreBufferHeapTuple(tuple, /* tuple to store */
+								 slot,	/* slot to store in */
+								 scandesc->rs_cbuf);	/* buffer associated
+														 * with this tuple */
+	else
+		ExecClearTuple(slot);
+
+	return slot;
 }
 
 /*
@@ -460,11 +874,13 @@ ExecTidScan(PlanState *pstate)
 void
 ExecReScanTidScan(TidScanState *node)
 {
-	if (node->tss_TidList)
-		pfree(node->tss_TidList);
-	node->tss_TidList = NULL;
-	node->tss_NumTids = 0;
+	if (node->tss_TidRanges)
+		pfree(node->tss_TidRanges);
+
+	node->tss_TidRanges = NULL;
+	node->tss_NumRanges = 0;
 	node->tss_TidPtr = -1;
+	node->tss_inScan = false;
 
 	ExecScanReScan(&node->ss);
 }
@@ -479,6 +895,8 @@ ExecReScanTidScan(TidScanState *node)
 void
 ExecEndTidScan(TidScanState *node)
 {
+	HeapScanDesc scan = node->ss.ss_currentScanDesc;
+
 	/*
 	 * Free the exprcontext
 	 */
@@ -490,6 +908,10 @@ ExecEndTidScan(TidScanState *node)
 	ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
 	ExecClearTuple(node->ss.ss_ScanTupleSlot);
 
+	/* close heap scan */
+	if (scan != NULL)
+		heap_endscan(scan);
+
 	/*
 	 * close the heap relation.
 	 */
@@ -529,11 +951,12 @@ ExecInitTidScan(TidScan *node, EState *estate, int eflags)
 	ExecAssignExprContext(estate, &tidstate->ss.ps);
 
 	/*
-	 * mark tid list as not computed yet
+	 * mark tid range list as not computed yet
 	 */
-	tidstate->tss_TidList = NULL;
-	tidstate->tss_NumTids = 0;
+	tidstate->tss_TidRanges = NULL;
+	tidstate->tss_NumRanges = 0;
 	tidstate->tss_TidPtr = -1;
+	tidstate->tss_inScan = false;
 
 	/*
 	 * open the base relation and acquire appropriate lock on it.
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 90703a6..02f096d 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -583,6 +583,7 @@ _copyTidScan(const TidScan *from)
 	 * copy remainder of node
 	 */
 	COPY_NODE_FIELD(tidquals);
+	COPY_SCALAR_FIELD(direction);
 
 	return newnode;
 }
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 6f5a4cb..937a93e 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -619,6 +619,7 @@ _outTidScan(StringInfo str, const TidScan *node)
 	_outScanInfo(str, (const Scan *) node);
 
 	WRITE_NODE_FIELD(tidquals);
+	WRITE_ENUM_FIELD(direction, ScanDirection);
 }
 
 static void
@@ -1895,6 +1896,7 @@ _outTidPath(StringInfo str, const TidPath *node)
 	_outPathInfo(str, (const Path *) node);
 
 	WRITE_NODE_FIELD(tidquals);
+	WRITE_ENUM_FIELD(direction, ScanDirection);
 }
 
 static void
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 519deab..79de340 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1848,6 +1848,7 @@ _readTidScan(void)
 	ReadCommonScan(&local_node->scan);
 
 	READ_NODE_FIELD(tidquals);
+	READ_ENUM_FIELD(direction, ScanDirection);
 
 	READ_DONE();
 }
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 7bf67a0..72b4fc6 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -1185,8 +1185,11 @@ cost_tidscan(Path *path, PlannerInfo *root,
 	Cost		cpu_per_tuple;
 	QualCost	tid_qual_cost;
 	int			ntuples;
+	int			nrandompages;
+	int			nseqpages;
 	ListCell   *l;
 	double		spc_random_page_cost;
+	double		spc_seq_page_cost;
 
 	/* Should only be applied to base relations */
 	Assert(baserel->relid > 0);
@@ -1200,6 +1203,8 @@ cost_tidscan(Path *path, PlannerInfo *root,
 
 	/* Count how many tuples we expect to retrieve */
 	ntuples = 0;
+	nrandompages = 0;
+	nseqpages = 0;
 	foreach(l, tidquals)
 	{
 		if (IsA(lfirst(l), ScalarArrayOpExpr))
@@ -1207,19 +1212,37 @@ cost_tidscan(Path *path, PlannerInfo *root,
 			/* Each element of the array yields 1 tuple */
 			ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) lfirst(l);
 			Node	   *arraynode = (Node *) lsecond(saop->args);
+			int			array_len = estimate_array_length(arraynode);
 
-			ntuples += estimate_array_length(arraynode);
+			ntuples += array_len;
+			nrandompages += array_len;
 		}
 		else if (IsA(lfirst(l), CurrentOfExpr))
 		{
 			/* CURRENT OF yields 1 tuple */
 			isCurrentOf = true;
 			ntuples++;
+			nrandompages++;
 		}
 		else
 		{
-			/* It's just CTID = something, count 1 tuple */
-			ntuples++;
+			/*
+			 * For anything else, we'll use the normal selectivity estimate.
+			 * Count the first page as a random page, the rest as sequential.
+			 */
+			Selectivity selectivity = clause_selectivity(root, lfirst(l),
+														 baserel->relid,
+														 JOIN_INNER,
+														 NULL);
+			BlockNumber pages = selectivity * baserel->pages;
+
+			if (pages <= 0)
+				pages = 1;
+
+			/* TODO decide what the costs should be */
+			ntuples += selectivity * baserel->tuples;
+			nseqpages += pages - 1;
+			nrandompages++;
 		}
 	}
 
@@ -1248,10 +1271,10 @@ cost_tidscan(Path *path, PlannerInfo *root,
 	/* fetch estimated page cost for tablespace containing table */
 	get_tablespace_page_costs(baserel->reltablespace,
 							  &spc_random_page_cost,
-							  NULL);
+							  &spc_seq_page_cost);
 
-	/* disk costs --- assume each tuple on a different page */
-	run_cost += spc_random_page_cost * ntuples;
+	/* disk costs */
+	run_cost += spc_random_page_cost * nrandompages + spc_seq_page_cost * nseqpages;
 
 	/* Add scanning CPU costs */
 	get_restriction_qual_cost(root, baserel, param_info, &qpqual_cost);
diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index ec66cb9..b847151 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -18,6 +18,9 @@
 #include "postgres.h"
 
 #include "access/stratnum.h"
+#include "access/sysattr.h"
+#include "catalog/pg_operator.h"
+#include "catalog/pg_type.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
 #include "nodes/plannodes.h"
@@ -848,6 +851,22 @@ build_join_pathkeys(PlannerInfo *root,
 	return truncate_useless_pathkeys(root, joinrel, outer_pathkeys);
 }
 
+/*
+ * build_tidscan_pathkeys
+ *	  Build the path keys corresponding to ORDER BY ctid ASC|DESC.
+ */
+List *
+build_tidscan_pathkeys(PlannerInfo *root,
+					   RelOptInfo *rel,
+					   ScanDirection direction)
+{
+	Oid			opno = (direction == ForwardScanDirection) ? TIDLessOperator : TIDGreaterOperator;
+	Var		   *varexpr = makeVar(rel->relid, SelfItemPointerAttributeNumber, TIDOID, -1, InvalidOid, 0);
+	List	   *pathkeys = build_expression_pathkey(root, (Expr *) varexpr, NULL, opno, rel->relids, true);
+
+	return pathkeys;
+}
+
 /****************************************************************************
  *		PATHKEYS AND SORT CLAUSES
  ****************************************************************************/
diff --git a/src/backend/optimizer/path/tidpath.c b/src/backend/optimizer/path/tidpath.c
index 3bb5b8d..f0e5949 100644
--- a/src/backend/optimizer/path/tidpath.c
+++ b/src/backend/optimizer/path/tidpath.c
@@ -4,13 +4,16 @@
  *	  Routines to determine which TID conditions are usable for scanning
  *	  a given relation, and create TidPaths accordingly.
  *
- * What we are looking for here is WHERE conditions of the form
- * "CTID = pseudoconstant", which can be implemented by just fetching
- * the tuple directly via heap_fetch().  We can also handle OR'd conditions
- * such as (CTID = const1) OR (CTID = const2), as well as ScalarArrayOpExpr
- * conditions of the form CTID = ANY(pseudoconstant_array).  In particular
- * this allows
- *		WHERE ctid IN (tid1, tid2, ...)
+ * What we are looking for here is WHERE conditions of the forms:
+ * - "CTID = pseudoconstant", which can be implemented by just fetching
+ *   the tuple directly via heap_fetch().
+ * - "CTID IN (pseudoconstant, ...)" or "CTID = ANY(pseudoconstant_array)"
+ * - "CTID > pseudoconstant", etc. for >, >=, <, and <=.
+ * - "CTID > pseudoconstant AND CTID < pseudoconstant", etc., with up to one
+ *   lower bound and one upper bound.
+ *
+ * We can also handle OR'd conditions of the above forms, such as
+ * "(CTID = const1) OR (CTID >= const2) OR CTID IN (...)".
  *
  * We also support "WHERE CURRENT OF cursor" conditions (CurrentOfExpr),
  * which amount to "CTID = run-time-determined-TID".  These could in
@@ -46,32 +49,46 @@
 #include "optimizer/restrictinfo.h"
 
 
-static bool IsTidEqualClause(OpExpr *node, int varno);
+static bool IsTidVar(Var *var, int varno);
+static bool IsTidComparison(OpExpr *node, int varno, Oid expected_comparison_operator);
 static bool IsTidEqualAnyClause(ScalarArrayOpExpr *node, int varno);
+static bool IsUsableRangeQual(Node *expr, int varno, bool want_lower_bound);
+static List *MakeTidRangeQuals(List *quals);
+static List *TidCompoundRangeQualFromExpr(Node *expr, int varno);
 static List *TidQualFromExpr(Node *expr, int varno);
 static List *TidQualFromBaseRestrictinfo(RelOptInfo *rel);
 
 
+static bool
+IsTidVar(Var *var, int varno)
+{
+	return (var->varattno == SelfItemPointerAttributeNumber &&
+			var->vartype == TIDOID &&
+			var->varno == varno &&
+			var->varlevelsup == 0);
+}
+
 /*
  * Check to see if an opclause is of the form
- *		CTID = pseudoconstant
+ *		CTID OP pseudoconstant
  * or
- *		pseudoconstant = CTID
+ *		pseudoconstant OP CTID
+ * where OP is the expected comparison operator.
  *
  * We check that the CTID Var belongs to relation "varno".  That is probably
  * redundant considering this is only applied to restriction clauses, but
  * let's be safe.
  */
 static bool
-IsTidEqualClause(OpExpr *node, int varno)
+IsTidComparison(OpExpr *node, int varno, Oid expected_comparison_operator)
 {
 	Node	   *arg1,
 			   *arg2,
 			   *other;
 	Var		   *var;
 
-	/* Operator must be tideq */
-	if (node->opno != TIDEqualOperator)
+	/* Operator must be the expected one */
+	if (node->opno != expected_comparison_operator)
 		return false;
 	if (list_length(node->args) != 2)
 		return false;
@@ -83,19 +100,13 @@ IsTidEqualClause(OpExpr *node, int varno)
 	if (arg1 && IsA(arg1, Var))
 	{
 		var = (Var *) arg1;
-		if (var->varattno == SelfItemPointerAttributeNumber &&
-			var->vartype == TIDOID &&
-			var->varno == varno &&
-			var->varlevelsup == 0)
+		if (IsTidVar(var, varno))
 			other = arg2;
 	}
 	if (!other && arg2 && IsA(arg2, Var))
 	{
 		var = (Var *) arg2;
-		if (var->varattno == SelfItemPointerAttributeNumber &&
-			var->vartype == TIDOID &&
-			var->varno == varno &&
-			var->varlevelsup == 0)
+		if (IsTidVar(var, varno))
 			other = arg1;
 	}
 	if (!other)
@@ -110,6 +121,17 @@ IsTidEqualClause(OpExpr *node, int varno)
 	return true;				/* success */
 }
 
+#define IsTidEqualClause(node, varno)	IsTidComparison(node, varno, TIDEqualOperator)
+#define IsTidLTClause(node, varno)		IsTidComparison(node, varno, TIDLessOperator)
+#define IsTidLEClause(node, varno)		IsTidComparison(node, varno, TIDLessEqOperator)
+#define IsTidGTClause(node, varno)		IsTidComparison(node, varno, TIDGreaterOperator)
+#define IsTidGEClause(node, varno)		IsTidComparison(node, varno, TIDGreaterEqOperator)
+
+#define IsTidRangeClause(node, varno)	(IsTidLTClause(node, varno) || \
+										 IsTidLEClause(node, varno) || \
+										 IsTidGTClause(node, varno) || \
+										 IsTidGEClause(node, varno))
+
 /*
  * Check to see if a clause is of the form
  *		CTID = ANY (pseudoconstant_array)
@@ -134,10 +156,7 @@ IsTidEqualAnyClause(ScalarArrayOpExpr *node, int varno)
 	{
 		Var		   *var = (Var *) arg1;
 
-		if (var->varattno == SelfItemPointerAttributeNumber &&
-			var->vartype == TIDOID &&
-			var->varno == varno &&
-			var->varlevelsup == 0)
+		if (IsTidVar(var, varno))
 		{
 			/* The other argument must be a pseudoconstant */
 			if (is_pseudo_constant_clause(arg2))
@@ -149,6 +168,76 @@ IsTidEqualAnyClause(ScalarArrayOpExpr *node, int varno)
 }
 
 /*
+ * IsUsableRangeQual
+ *		Check if the expr is a range qual of the expected type.
+ */
+static bool
+IsUsableRangeQual(Node *expr, int varno, bool want_lower_bound)
+{
+	if (is_opclause(expr) && IsTidRangeClause((OpExpr *) expr, varno))
+	{
+		bool		is_lower_bound = IsTidGTClause((OpExpr *) expr, varno) || IsTidGEClause((OpExpr *) expr, varno);
+		Node	   *leftop = get_leftop((Expr *) expr);
+
+		if (!IsA(leftop, Var) || !IsTidVar((Var *) leftop, varno))
+			is_lower_bound = !is_lower_bound;
+
+		if (is_lower_bound == want_lower_bound)
+			return true;
+	}
+
+	return false;
+}
+
+static List *
+MakeTidRangeQuals(List *quals)
+{
+	if (list_length(quals) == 1)
+		return quals;
+	else
+		return list_make1(make_andclause(quals));
+}
+
+/*
+ * TidCompoundRangeQualFromExpr
+ *
+ * 		Extract a compound CTID range condition from the given qual expression
+ */
+static List *
+TidCompoundRangeQualFromExpr(Node *expr, int varno)
+{
+	List	   *rlst = NIL;
+	ListCell   *l;
+	bool		found_lower = false;
+	bool		found_upper = false;
+	List	   *found_quals = NIL;
+
+	foreach(l, ((BoolExpr *) expr)->args)
+	{
+		Node	   *clause = (Node *) lfirst(l);
+
+		/* Check if this clause contains a range qual */
+		if (!found_lower && IsUsableRangeQual(clause, varno, true))
+		{
+			found_lower = true;
+			found_quals = lappend(found_quals, clause);
+		}
+
+		if (!found_upper && IsUsableRangeQual(clause, varno, false))
+		{
+			found_upper = true;
+			found_quals = lappend(found_quals, clause);
+		}
+	}
+
+	/* If one or both range quals were specified, use them. */
+	if (found_quals)
+		rlst = MakeTidRangeQuals(found_quals);
+
+	return rlst;
+}
+
+/*
  *	Extract a set of CTID conditions from the given qual expression
  *
  *	Returns a List of CTID qual expressions (with implicit OR semantics
@@ -174,6 +263,8 @@ TidQualFromExpr(Node *expr, int varno)
 		/* base case: check for tideq opclause */
 		if (IsTidEqualClause((OpExpr *) expr, varno))
 			rlst = list_make1(expr);
+		else if (IsTidRangeClause((OpExpr *) expr, varno))
+			rlst = list_make1(expr);
 	}
 	else if (expr && IsA(expr, ScalarArrayOpExpr))
 	{
@@ -189,11 +280,18 @@ TidQualFromExpr(Node *expr, int varno)
 	}
 	else if (and_clause(expr))
 	{
-		foreach(l, ((BoolExpr *) expr)->args)
+		/* look for a range qual in the clause */
+		rlst = TidCompoundRangeQualFromExpr(expr, varno);
+
+		/* if no range qual was found, look for any other TID qual */
+		if (!rlst)
 		{
-			rlst = TidQualFromExpr((Node *) lfirst(l), varno);
-			if (rlst)
-				break;
+			foreach(l, ((BoolExpr *) expr)->args)
+			{
+				rlst = TidQualFromExpr((Node *) lfirst(l), varno);
+				if (rlst)
+					break;
+			}
 		}
 	}
 	else if (or_clause(expr))
@@ -217,17 +315,28 @@ TidQualFromExpr(Node *expr, int varno)
 }
 
 /*
- *	Extract a set of CTID conditions from the rel's baserestrictinfo list
+ * Extract a set of CTID conditions from the rel's baserestrictinfo list
+ *
+ * Normally we just use the first RestrictInfo item with some usable quals,
+ * but it's also possible for a good compound range qual, such as
+ * "CTID > ? AND CTID < ?", to be split across two items.  So we look for
+ * lower/upper bound range quals in all items and use them if any were found.
+ * In principle there might be more than one lower or upper bound, but we
+ * just use the first one found of each type.
  */
 static List *
 TidQualFromBaseRestrictinfo(RelOptInfo *rel)
 {
 	List	   *rlst = NIL;
 	ListCell   *l;
+	bool		found_lower = false;
+	bool		found_upper = false;
+	List	   *found_quals = NIL;
 
 	foreach(l, rel->baserestrictinfo)
 	{
 		RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
+		Node	   *clause = (Node *) rinfo->clause;
 
 		/*
 		 * If clause must wait till after some lower-security-level
@@ -236,10 +345,31 @@ TidQualFromBaseRestrictinfo(RelOptInfo *rel)
 		if (!restriction_is_securely_promotable(rinfo, rel))
 			continue;
 
-		rlst = TidQualFromExpr((Node *) rinfo->clause, rel->relid);
+		/* Look for lower and upper bound range quals. */
+		if (!found_lower && IsUsableRangeQual((Node *) clause, rel->relid, true))
+		{
+			found_lower = true;
+			found_quals = lappend(found_quals, clause);
+			continue;
+		}
+
+		if (!found_upper && IsUsableRangeQual((Node *) clause, rel->relid, false))
+		{
+			found_upper = true;
+			found_quals = lappend(found_quals, clause);
+			continue;
+		}
+
+		/* Look for other TID quals. */
+		rlst = TidQualFromExpr((Node *) clause, rel->relid);
 		if (rlst)
 			break;
 	}
+
+	/* Use a range qual if any were found. */
+	if (found_quals)
+		rlst = MakeTidRangeQuals(found_quals);
+
 	return rlst;
 }
 
@@ -247,12 +377,16 @@ TidQualFromBaseRestrictinfo(RelOptInfo *rel)
  * create_tidscan_paths
  *	  Create paths corresponding to direct TID scans of the given rel.
  *
+ *	  Path keys and direction will be set on the scans if it looks useful.
+ *
  *	  Candidate paths are added to the rel's pathlist (using add_path).
  */
 void
 create_tidscan_paths(PlannerInfo *root, RelOptInfo *rel)
 {
 	Relids		required_outer;
+	List	   *pathkeys = NIL;
+	ScanDirection direction = ForwardScanDirection;
 	List	   *tidquals;
 
 	/*
@@ -262,9 +396,39 @@ create_tidscan_paths(PlannerInfo *root, RelOptInfo *rel)
 	 */
 	required_outer = rel->lateral_relids;
 
+	/*
+	 * Try to determine the best scan direction and create some useful
+	 * pathkeys.
+	 */
+	if (has_useful_pathkeys(root, rel))
+	{
+		/*
+		 * Build path keys corresponding to ORDER BY ctid ASC, and check
+		 * whether they will be useful for this scan.  If not, build path keys
+		 * for DESC, and try that; set the direction to BackwardScanDirection
+		 * if so.  If neither of them will be useful, no path keys will be
+		 * set.
+		 */
+		pathkeys = build_tidscan_pathkeys(root, rel, ForwardScanDirection);
+		if (!pathkeys_contained_in(pathkeys, root->query_pathkeys))
+		{
+			pathkeys = build_tidscan_pathkeys(root, rel, BackwardScanDirection);
+			if (pathkeys_contained_in(pathkeys, root->query_pathkeys))
+				direction = BackwardScanDirection;
+			else
+				pathkeys = NIL;
+		}
+	}
+
 	tidquals = TidQualFromBaseRestrictinfo(rel);
 
-	if (tidquals)
-		add_path(rel, (Path *) create_tidscan_path(root, rel, tidquals,
-												   required_outer));
+	/*
+	 * If there are tidquals or some useful pathkeys were found, then it's
+	 * worth generating a tidscan path.
+	 */
+	if (tidquals || pathkeys)
+	{
+		add_path(rel, (Path *) create_tidscan_path(root, rel, tidquals, pathkeys,
+												   direction, required_outer));
+	}
 }
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index ae41c9e..5452730 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -185,7 +185,7 @@ static BitmapHeapScan *make_bitmap_heapscan(List *qptlist,
 					 List *bitmapqualorig,
 					 Index scanrelid);
 static TidScan *make_tidscan(List *qptlist, List *qpqual, Index scanrelid,
-			 List *tidquals);
+			 List *tidquals, ScanDirection direction);
 static SubqueryScan *make_subqueryscan(List *qptlist,
 				  List *qpqual,
 				  Index scanrelid,
@@ -3086,6 +3086,21 @@ create_tidscan_plan(PlannerInfo *root, TidPath *best_path,
 	}
 
 	/*
+	 * In the case of a compound range qual, the two parts may have come
+	 * from different RestrictInfos.  So remove each part separately.
+	 */
+	if (list_length(tidquals) == 1)
+	{
+		Node	   *qual = linitial(tidquals);
+
+		if (and_clause(qual))
+		{
+			BoolExpr   *and_qual = ((BoolExpr *) qual);
+			scan_clauses = list_difference(scan_clauses, and_qual->args);
+		}
+	}
+
+	/*
 	 * Remove any clauses that are TID quals.  This is a bit tricky since the
 	 * tidquals list has implicit OR semantics.
 	 */
@@ -3097,7 +3112,9 @@ create_tidscan_plan(PlannerInfo *root, TidPath *best_path,
 	scan_plan = make_tidscan(tlist,
 							 scan_clauses,
 							 scan_relid,
-							 tidquals);
+							 tidquals,
+							 best_path->direction
+		);
 
 	copy_generic_path_info(&scan_plan->scan.plan, &best_path->path);
 
@@ -5179,7 +5196,8 @@ static TidScan *
 make_tidscan(List *qptlist,
 			 List *qpqual,
 			 Index scanrelid,
-			 List *tidquals)
+			 List *tidquals,
+			 ScanDirection direction)
 {
 	TidScan    *node = makeNode(TidScan);
 	Plan	   *plan = &node->scan.plan;
@@ -5190,6 +5208,7 @@ make_tidscan(List *qptlist,
 	plan->righttree = NULL;
 	node->scan.scanrelid = scanrelid;
 	node->tidquals = tidquals;
+	node->direction = direction;
 
 	return node;
 }
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index c5aaaf5..e2d51a9 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1186,6 +1186,7 @@ create_bitmap_or_path(PlannerInfo *root,
  */
 TidPath *
 create_tidscan_path(PlannerInfo *root, RelOptInfo *rel, List *tidquals,
+					List *pathkeys, ScanDirection direction,
 					Relids required_outer)
 {
 	TidPath    *pathnode = makeNode(TidPath);
@@ -1198,9 +1199,10 @@ create_tidscan_path(PlannerInfo *root, RelOptInfo *rel, List *tidquals,
 	pathnode->path.parallel_aware = false;
 	pathnode->path.parallel_safe = rel->consider_parallel;
 	pathnode->path.parallel_workers = 0;
-	pathnode->path.pathkeys = NIL;	/* always unordered */
+	pathnode->path.pathkeys = pathkeys;
 
 	pathnode->tidquals = tidquals;
+	pathnode->direction = direction;
 
 	cost_tidscan(&pathnode->path, root, rel, tidquals,
 				 pathnode->path.param_info);
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index b8c0e03..eaacab7 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -572,6 +572,30 @@ scalarineqsel(PlannerInfo *root, Oid operator, bool isgt, bool iseq,
 
 	if (!HeapTupleIsValid(vardata->statsTuple))
 	{
+		/*
+		 * There are no stats for system columns, but for CTID we can estimate
+		 * based on table size.
+		 */
+		if (vardata->var && IsA(vardata->var, Var) &&
+			((Var *) vardata->var)->varattno == SelfItemPointerAttributeNumber)
+		{
+			ItemPointer itemptr;
+			BlockNumber block;
+
+			/* If the relation's empty, we're going to read all of it. */
+			if (vardata->rel->pages == 0)
+				return 1.0;
+
+			itemptr = (ItemPointer) DatumGetPointer(constval);
+			block = ItemPointerGetBlockNumberNoCheck(itemptr);
+			selec = block / (double) vardata->rel->pages;
+			if (isgt)
+				selec = 1.0 - selec;
+
+			CLAMP_PROBABILITY(selec);
+			return selec;
+		}
+
 		/* no stats available, so default result */
 		return DEFAULT_INEQ_SEL;
 	}
@@ -1786,6 +1810,15 @@ nulltestsel(PlannerInfo *root, NullTestType nulltesttype, Node *arg,
 				return (Selectivity) 0; /* keep compiler quiet */
 		}
 	}
+	else if (vardata.var && IsA(vardata.var, Var) &&
+			 ((Var *) vardata.var)->varattno == SelfItemPointerAttributeNumber)
+	{
+		/*
+		 * There are no stats for system columns, but we know CTID is never
+		 * NULL.
+		 */
+		selec = (nulltesttype == IS_NULL) ? 0.0 : 1.0;
+	}
 	else
 	{
 		/*
diff --git a/src/include/catalog/pg_operator.dat b/src/include/catalog/pg_operator.dat
index d9b6bad..cdd2cd3 100644
--- a/src/include/catalog/pg_operator.dat
+++ b/src/include/catalog/pg_operator.dat
@@ -156,15 +156,15 @@
   oprname => '<', oprleft => 'tid', oprright => 'tid', oprresult => 'bool',
   oprcom => '>(tid,tid)', oprnegate => '>=(tid,tid)', oprcode => 'tidlt',
   oprrest => 'scalarltsel', oprjoin => 'scalarltjoinsel' },
-{ oid => '2800', descr => 'greater than',
+{ oid => '2800', oid_symbol => 'TIDGreaterOperator', descr => 'greater than',
   oprname => '>', oprleft => 'tid', oprright => 'tid', oprresult => 'bool',
   oprcom => '<(tid,tid)', oprnegate => '<=(tid,tid)', oprcode => 'tidgt',
   oprrest => 'scalargtsel', oprjoin => 'scalargtjoinsel' },
-{ oid => '2801', descr => 'less than or equal',
+{ oid => '2801', oid_symbol => 'TIDLessEqOperator', descr => 'less than or equal',
   oprname => '<=', oprleft => 'tid', oprright => 'tid', oprresult => 'bool',
   oprcom => '>=(tid,tid)', oprnegate => '>(tid,tid)', oprcode => 'tidle',
   oprrest => 'scalarlesel', oprjoin => 'scalarlejoinsel' },
-{ oid => '2802', descr => 'greater than or equal',
+{ oid => '2802', oid_symbol => 'TIDGreaterEqOperator', descr => 'greater than or equal',
   oprname => '>=', oprleft => 'tid', oprright => 'tid', oprresult => 'bool',
   oprcom => '<=(tid,tid)', oprnegate => '<(tid,tid)', oprcode => 'tidge',
   oprrest => 'scalargesel', oprjoin => 'scalargejoinsel' },
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 03ad516..ee6a04d 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1477,6 +1477,8 @@ typedef struct BitmapHeapScanState
 	ParallelBitmapHeapState *pstate;
 } BitmapHeapScanState;
 
+typedef struct TidRange TidRange;
+
 /* ----------------
  *	 TidScanState information
  *
@@ -1493,10 +1495,11 @@ typedef struct TidScanState
 	ScanState	ss;				/* its first field is NodeTag */
 	List	   *tss_tidexprs;
 	bool		tss_isCurrentOf;
-	int			tss_NumTids;
+	int			tss_NumRanges;
 	int			tss_TidPtr;
-	ItemPointerData *tss_TidList;
-	HeapTupleData tss_htup;
+	TidRange   *tss_TidRanges;
+	bool		tss_inScan;
+	HeapTupleData tss_htup;		/* for current-of and single TID fetches */
 } TidScanState;
 
 /* ----------------
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 4d9b016..2db2c07 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -492,6 +492,7 @@ typedef struct TidScan
 {
 	Scan		scan;
 	List	   *tidquals;		/* qual(s) involving CTID = something */
+	ScanDirection direction;
 } TidScan;
 
 /* ----------------
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 118f993..8ff23f2 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -1228,14 +1228,24 @@ typedef struct BitmapOrPath
 /*
  * TidPath represents a scan by TID
  *
- * tidquals is an implicitly OR'ed list of qual expressions of the form
- * "CTID = pseudoconstant" or "CTID = ANY(pseudoconstant_array)".
+ * tidquals is an implicitly OR'ed list of qual expressions of the forms:
+ *   - "CTID = pseudoconstant"
+ *   - "CTID = ANY(pseudoconstant_array)"
+ *   - "CURRENT OF cursor"
+ *   - "CTID relop pseudoconstant"
+ *   - "(CTID relop pseudoconstant) AND (CTID relop pseudoconstant)"
+ *
+ * It is permissible for the CTID variable to be the LHS or RHS of operator
+ * expressions; in the last case, there is always a lower bound and upper bound,
+ * in any order.  If tidquals is empty, all CTIDs will match.
+ *
  * Note they are bare expressions, not RestrictInfos.
  */
 typedef struct TidPath
 {
 	Path		path;
-	List	   *tidquals;		/* qual(s) involving CTID = something */
+	List	   *tidquals;
+	ScanDirection direction;
 } TidPath;
 
 /*
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 7c5ff22..a0a88a5 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -63,7 +63,8 @@ extern BitmapOrPath *create_bitmap_or_path(PlannerInfo *root,
 					  RelOptInfo *rel,
 					  List *bitmapquals);
 extern TidPath *create_tidscan_path(PlannerInfo *root, RelOptInfo *rel,
-					List *tidquals, Relids required_outer);
+					List *tidquals, List *pathkeys, ScanDirection direction,
+					Relids required_outer);
 extern AppendPath *create_append_path(PlannerInfo *root, RelOptInfo *rel,
 				   List *subpaths, List *partial_subpaths,
 				   Relids required_outer,
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index cafde30..9d0699e 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -211,6 +211,9 @@ extern List *build_join_pathkeys(PlannerInfo *root,
 					RelOptInfo *joinrel,
 					JoinType jointype,
 					List *outer_pathkeys);
+extern List *build_tidscan_pathkeys(PlannerInfo *root,
+					   RelOptInfo *rel,
+					   ScanDirection direction);
 extern List *make_pathkeys_for_sortclauses(PlannerInfo *root,
 							  List *sortclauses,
 							  List *tlist);
diff --git a/src/test/regress/expected/tidscan.out b/src/test/regress/expected/tidscan.out
index 521ed1b..4b9564b 100644
--- a/src/test/regress/expected/tidscan.out
+++ b/src/test/regress/expected/tidscan.out
@@ -116,6 +116,39 @@ FETCH FIRST FROM c;
 (1 row)
 
 ROLLBACK;
+-- check that ordering on a tidscan doesn't require a sort
+EXPLAIN (COSTS OFF)
+SELECT ctid, * FROM tidscan WHERE ctid = ANY(ARRAY['(0,2)', '(0,1)', '(0,3)']::tid[]) ORDER BY ctid;
+                          QUERY PLAN                           
+---------------------------------------------------------------
+ Tid Scan on tidscan
+   TID Cond: (ctid = ANY ('{"(0,2)","(0,1)","(0,3)"}'::tid[]))
+(2 rows)
+
+SELECT ctid, * FROM tidscan WHERE ctid = ANY(ARRAY['(0,2)', '(0,1)', '(0,3)']::tid[]) ORDER BY ctid;
+ ctid  | id 
+-------+----
+ (0,1) |  1
+ (0,2) |  2
+ (0,3) |  3
+(3 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid, * FROM tidscan WHERE ctid = ANY(ARRAY['(0,2)', '(0,1)', '(0,3)']::tid[]) ORDER BY ctid DESC;
+                          QUERY PLAN                           
+---------------------------------------------------------------
+ Tid Scan Backward on tidscan
+   TID Cond: (ctid = ANY ('{"(0,2)","(0,1)","(0,3)"}'::tid[]))
+(2 rows)
+
+SELECT ctid, * FROM tidscan WHERE ctid = ANY(ARRAY['(0,2)', '(0,1)', '(0,3)']::tid[]) ORDER BY ctid DESC;
+ ctid  | id 
+-------+----
+ (0,3) |  3
+ (0,2) |  2
+ (0,1) |  1
+(3 rows)
+
 -- tidscan via CURRENT OF
 BEGIN;
 DECLARE c CURSOR FOR SELECT ctid, * FROM tidscan;
@@ -177,3 +210,315 @@ UPDATE tidscan SET id = -id WHERE CURRENT OF c RETURNING *;
 ERROR:  cursor "c" is not positioned on a row
 ROLLBACK;
 DROP TABLE tidscan;
+-- tests for tidrangescans
+CREATE TABLE tidrangescan(id integer, data text);
+INSERT INTO tidrangescan SELECT i,'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' FROM generate_series(1,1000) AS s(i);
+DELETE FROM tidrangescan WHERE substring(ctid::text from ',(\d+)\)')::integer > 10 OR substring(ctid::text from '\((\d+),')::integer >= 10;;
+VACUUM tidrangescan;
+-- range scans with upper bound
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan WHERE ctid < '(1,0)';
+            QUERY PLAN             
+-----------------------------------
+ Tid Scan on tidrangescan
+   TID Cond: (ctid < '(1,0)'::tid)
+(2 rows)
+
+SELECT ctid, data FROM tidrangescan WHERE ctid < '(1,0)';
+  ctid  |                                       data                                       
+--------+----------------------------------------------------------------------------------
+ (0,1)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (0,2)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (0,3)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (0,4)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (0,5)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (0,6)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (0,7)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (0,8)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (0,9)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (0,10) | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+(10 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan WHERE ctid <= '(1,5)';
+             QUERY PLAN             
+------------------------------------
+ Tid Scan on tidrangescan
+   TID Cond: (ctid <= '(1,5)'::tid)
+(2 rows)
+
+SELECT ctid, data FROM tidrangescan WHERE ctid <= '(1,5)';
+  ctid  |                                       data                                       
+--------+----------------------------------------------------------------------------------
+ (0,1)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (0,2)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (0,3)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (0,4)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (0,5)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (0,6)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (0,7)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (0,8)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (0,9)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (0,10) | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (1,1)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (1,2)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (1,3)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (1,4)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (1,5)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+(15 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan WHERE ctid < '(0,0)';
+            QUERY PLAN             
+-----------------------------------
+ Tid Scan on tidrangescan
+   TID Cond: (ctid < '(0,0)'::tid)
+(2 rows)
+
+SELECT ctid, data FROM tidrangescan WHERE ctid < '(0,0)';
+ ctid | data 
+------+------
+(0 rows)
+
+-- range scans with lower bound
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan WHERE ctid > '(9,8)';
+            QUERY PLAN             
+-----------------------------------
+ Tid Scan on tidrangescan
+   TID Cond: (ctid > '(9,8)'::tid)
+(2 rows)
+
+SELECT ctid, data FROM tidrangescan WHERE ctid > '(9,8)';
+  ctid  |                                       data                                       
+--------+----------------------------------------------------------------------------------
+ (9,9)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (9,10) | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+(2 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan WHERE '(9,8)' < ctid;
+            QUERY PLAN             
+-----------------------------------
+ Tid Scan on tidrangescan
+   TID Cond: ('(9,8)'::tid < ctid)
+(2 rows)
+
+SELECT ctid, data FROM tidrangescan WHERE '(9,8)' < ctid;
+  ctid  |                                       data                                       
+--------+----------------------------------------------------------------------------------
+ (9,9)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (9,10) | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+(2 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan WHERE ctid >= '(9,8)';
+             QUERY PLAN             
+------------------------------------
+ Tid Scan on tidrangescan
+   TID Cond: (ctid >= '(9,8)'::tid)
+(2 rows)
+
+SELECT ctid, data FROM tidrangescan WHERE ctid >= '(9,8)';
+  ctid  |                                       data                                       
+--------+----------------------------------------------------------------------------------
+ (9,8)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (9,9)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (9,10) | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+(3 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan WHERE ctid >= '(100,0)';
+              QUERY PLAN              
+--------------------------------------
+ Tid Scan on tidrangescan
+   TID Cond: (ctid >= '(100,0)'::tid)
+(2 rows)
+
+SELECT ctid, data FROM tidrangescan WHERE ctid >= '(100,0)';
+ ctid | data 
+------+------
+(0 rows)
+
+-- range scans with both bounds
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan WHERE ctid > '(4,4)' AND '(4,7)' >= ctid;
+                           QUERY PLAN                           
+----------------------------------------------------------------
+ Tid Scan on tidrangescan
+   TID Cond: ((ctid > '(4,4)'::tid) AND ('(4,7)'::tid >= ctid))
+(2 rows)
+
+SELECT ctid, data FROM tidrangescan WHERE ctid > '(4,4)' AND '(4,7)' >= ctid;
+ ctid  |                                       data                                       
+-------+----------------------------------------------------------------------------------
+ (4,5) | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (4,6) | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (4,7) | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+(3 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan WHERE '(4,7)' >= ctid AND ctid > '(4,4)';
+                           QUERY PLAN                           
+----------------------------------------------------------------
+ Tid Scan on tidrangescan
+   TID Cond: (('(4,7)'::tid >= ctid) AND (ctid > '(4,4)'::tid))
+(2 rows)
+
+SELECT ctid, data FROM tidrangescan WHERE '(4,7)' >= ctid AND ctid > '(4,4)';
+ ctid  |                                       data                                       
+-------+----------------------------------------------------------------------------------
+ (4,5) | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (4,6) | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (4,7) | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+(3 rows)
+
+-- combinations
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan WHERE '(4,7)' >= ctid AND ctid > '(4,4)' OR ctid = '(2,2)';
+                                        QUERY PLAN                                         
+-------------------------------------------------------------------------------------------
+ Tid Scan on tidrangescan
+   TID Cond: ((('(4,7)'::tid >= ctid) AND (ctid > '(4,4)'::tid)) OR (ctid = '(2,2)'::tid))
+(2 rows)
+
+SELECT ctid, data FROM tidrangescan WHERE '(4,7)' >= ctid AND ctid > '(4,4)' OR ctid = '(2,2)';
+ ctid  |                                       data                                       
+-------+----------------------------------------------------------------------------------
+ (2,2) | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (4,5) | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (4,6) | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (4,7) | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+(4 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan WHERE '(4,7)' >= ctid AND ctid > '(4,4)' OR ctid = '(2,2)' AND data = 'foo';
+                                                     QUERY PLAN                                                     
+--------------------------------------------------------------------------------------------------------------------
+ Tid Scan on tidrangescan
+   TID Cond: ((('(4,7)'::tid >= ctid) AND (ctid > '(4,4)'::tid)) OR (ctid = '(2,2)'::tid))
+   Filter: ((('(4,7)'::tid >= ctid) AND (ctid > '(4,4)'::tid)) OR ((ctid = '(2,2)'::tid) AND (data = 'foo'::text)))
+(3 rows)
+
+SELECT ctid, data FROM tidrangescan WHERE '(4,7)' >= ctid AND ctid > '(4,4)' OR ctid = '(2,2)' AND data = 'foo';
+ ctid  |                                       data                                       
+-------+----------------------------------------------------------------------------------
+ (4,5) | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (4,6) | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (4,7) | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+(3 rows)
+
+-- ordering with no quals should use tid range scan
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan ORDER BY ctid ASC;
+        QUERY PLAN        
+--------------------------
+ Tid Scan on tidrangescan
+(1 row)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan ORDER BY ctid DESC;
+            QUERY PLAN             
+-----------------------------------
+ Tid Scan Backward on tidrangescan
+(1 row)
+
+-- min/max
+EXPLAIN (COSTS OFF)
+SELECT MIN(ctid) FROM tidrangescan;
+                 QUERY PLAN                 
+--------------------------------------------
+ Result
+   InitPlan 1 (returns $0)
+     ->  Limit
+           ->  Tid Scan on tidrangescan
+                 Filter: (ctid IS NOT NULL)
+(5 rows)
+
+SELECT MIN(ctid) FROM tidrangescan;
+  min  
+-------
+ (0,1)
+(1 row)
+
+EXPLAIN (COSTS OFF)
+SELECT MAX(ctid) FROM tidrangescan;
+                   QUERY PLAN                    
+-------------------------------------------------
+ Result
+   InitPlan 1 (returns $0)
+     ->  Limit
+           ->  Tid Scan Backward on tidrangescan
+                 Filter: (ctid IS NOT NULL)
+(5 rows)
+
+SELECT MAX(ctid) FROM tidrangescan;
+  max   
+--------
+ (9,10)
+(1 row)
+
+EXPLAIN (COSTS OFF)
+SELECT MIN(ctid) FROM tidrangescan WHERE ctid > '(5,0)';
+                   QUERY PLAN                    
+-------------------------------------------------
+ Result
+   InitPlan 1 (returns $0)
+     ->  Limit
+           ->  Tid Scan on tidrangescan
+                 TID Cond: (ctid > '(5,0)'::tid)
+                 Filter: (ctid IS NOT NULL)
+(6 rows)
+
+SELECT MIN(ctid) FROM tidrangescan WHERE ctid > '(5,0)';
+  min  
+-------
+ (5,1)
+(1 row)
+
+EXPLAIN (COSTS OFF)
+SELECT MAX(ctid) FROM tidrangescan WHERE ctid < '(5,0)';
+                   QUERY PLAN                    
+-------------------------------------------------
+ Result
+   InitPlan 1 (returns $0)
+     ->  Limit
+           ->  Tid Scan Backward on tidrangescan
+                 TID Cond: (ctid < '(5,0)'::tid)
+                 Filter: (ctid IS NOT NULL)
+(6 rows)
+
+SELECT MAX(ctid) FROM tidrangescan WHERE ctid < '(5,0)';
+  max   
+--------
+ (4,10)
+(1 row)
+
+-- empty table
+CREATE TABLE tidrangescan_empty(id integer, data text);
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan_empty WHERE ctid < '(1, 0)';
+            QUERY PLAN             
+-----------------------------------
+ Tid Scan on tidrangescan_empty
+   TID Cond: (ctid < '(1,0)'::tid)
+(2 rows)
+
+SELECT ctid, data FROM tidrangescan_empty WHERE ctid < '(1, 0)';
+ ctid | data 
+------+------
+(0 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan_empty WHERE ctid > '(9, 0)';
+            QUERY PLAN             
+-----------------------------------
+ Tid Scan on tidrangescan_empty
+   TID Cond: (ctid > '(9,0)'::tid)
+(2 rows)
+
+SELECT ctid, data FROM tidrangescan_empty WHERE ctid > '(9, 0)';
+ ctid | data 
+------+------
+(0 rows)
+
diff --git a/src/test/regress/sql/tidscan.sql b/src/test/regress/sql/tidscan.sql
index a8472e0..e9519ee 100644
--- a/src/test/regress/sql/tidscan.sql
+++ b/src/test/regress/sql/tidscan.sql
@@ -43,6 +43,15 @@ FETCH BACKWARD 1 FROM c;
 FETCH FIRST FROM c;
 ROLLBACK;
 
+-- check that ordering on a tidscan doesn't require a sort
+EXPLAIN (COSTS OFF)
+SELECT ctid, * FROM tidscan WHERE ctid = ANY(ARRAY['(0,2)', '(0,1)', '(0,3)']::tid[]) ORDER BY ctid;
+SELECT ctid, * FROM tidscan WHERE ctid = ANY(ARRAY['(0,2)', '(0,1)', '(0,3)']::tid[]) ORDER BY ctid;
+
+EXPLAIN (COSTS OFF)
+SELECT ctid, * FROM tidscan WHERE ctid = ANY(ARRAY['(0,2)', '(0,1)', '(0,3)']::tid[]) ORDER BY ctid DESC;
+SELECT ctid, * FROM tidscan WHERE ctid = ANY(ARRAY['(0,2)', '(0,1)', '(0,3)']::tid[]) ORDER BY ctid DESC;
+
 -- tidscan via CURRENT OF
 BEGIN;
 DECLARE c CURSOR FOR SELECT ctid, * FROM tidscan;
@@ -64,3 +73,94 @@ UPDATE tidscan SET id = -id WHERE CURRENT OF c RETURNING *;
 ROLLBACK;
 
 DROP TABLE tidscan;
+
+-- tests for tidrangescans
+
+CREATE TABLE tidrangescan(id integer, data text);
+
+INSERT INTO tidrangescan SELECT i,'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' FROM generate_series(1,1000) AS s(i);
+DELETE FROM tidrangescan WHERE substring(ctid::text from ',(\d+)\)')::integer > 10 OR substring(ctid::text from '\((\d+),')::integer >= 10;;
+VACUUM tidrangescan;
+
+-- range scans with upper bound
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan WHERE ctid < '(1,0)';
+SELECT ctid, data FROM tidrangescan WHERE ctid < '(1,0)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan WHERE ctid <= '(1,5)';
+SELECT ctid, data FROM tidrangescan WHERE ctid <= '(1,5)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan WHERE ctid < '(0,0)';
+SELECT ctid, data FROM tidrangescan WHERE ctid < '(0,0)';
+
+-- range scans with lower bound
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan WHERE ctid > '(9,8)';
+SELECT ctid, data FROM tidrangescan WHERE ctid > '(9,8)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan WHERE '(9,8)' < ctid;
+SELECT ctid, data FROM tidrangescan WHERE '(9,8)' < ctid;
+
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan WHERE ctid >= '(9,8)';
+SELECT ctid, data FROM tidrangescan WHERE ctid >= '(9,8)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan WHERE ctid >= '(100,0)';
+SELECT ctid, data FROM tidrangescan WHERE ctid >= '(100,0)';
+
+-- range scans with both bounds
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan WHERE ctid > '(4,4)' AND '(4,7)' >= ctid;
+SELECT ctid, data FROM tidrangescan WHERE ctid > '(4,4)' AND '(4,7)' >= ctid;
+
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan WHERE '(4,7)' >= ctid AND ctid > '(4,4)';
+SELECT ctid, data FROM tidrangescan WHERE '(4,7)' >= ctid AND ctid > '(4,4)';
+
+-- combinations
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan WHERE '(4,7)' >= ctid AND ctid > '(4,4)' OR ctid = '(2,2)';
+SELECT ctid, data FROM tidrangescan WHERE '(4,7)' >= ctid AND ctid > '(4,4)' OR ctid = '(2,2)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan WHERE '(4,7)' >= ctid AND ctid > '(4,4)' OR ctid = '(2,2)' AND data = 'foo';
+SELECT ctid, data FROM tidrangescan WHERE '(4,7)' >= ctid AND ctid > '(4,4)' OR ctid = '(2,2)' AND data = 'foo';
+
+-- ordering with no quals should use tid range scan
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan ORDER BY ctid ASC;
+
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan ORDER BY ctid DESC;
+
+-- min/max
+EXPLAIN (COSTS OFF)
+SELECT MIN(ctid) FROM tidrangescan;
+SELECT MIN(ctid) FROM tidrangescan;
+
+EXPLAIN (COSTS OFF)
+SELECT MAX(ctid) FROM tidrangescan;
+SELECT MAX(ctid) FROM tidrangescan;
+
+EXPLAIN (COSTS OFF)
+SELECT MIN(ctid) FROM tidrangescan WHERE ctid > '(5,0)';
+SELECT MIN(ctid) FROM tidrangescan WHERE ctid > '(5,0)';
+
+EXPLAIN (COSTS OFF)
+SELECT MAX(ctid) FROM tidrangescan WHERE ctid < '(5,0)';
+SELECT MAX(ctid) FROM tidrangescan WHERE ctid < '(5,0)';
+
+-- empty table
+CREATE TABLE tidrangescan_empty(id integer, data text);
+
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan_empty WHERE ctid < '(1, 0)';
+SELECT ctid, data FROM tidrangescan_empty WHERE ctid < '(1, 0)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan_empty WHERE ctid > '(9, 0)';
+SELECT ctid, data FROM tidrangescan_empty WHERE ctid > '(9, 0)';
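
The upper-bound cases in the expected output above (for example, ctid < '(1,0)' returning all of block 0, and ctid < '(0,0)' returning no rows) can be read as turning each bound into an inclusive "last TID". The following is a minimal standalone sketch of that idea, not code from the patch: SketchTid, SKETCH_MAX_OFFSET and sketch_upper_bound are hypothetical names, the value 2048 assumes the default 8 kB block size, and the sketch clamps any block at or beyond nblocks to the last block, which is a choice made here for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a heap tuple identifier (block, offset). */
typedef struct SketchTid
{
	uint32_t	block;
	uint16_t	offset;
} SketchTid;

/* Assumed largest line pointer offset on a page (default 8 kB blocks). */
#define SKETCH_MAX_OFFSET 2048

/*
 * Convert "ctid < bound" (strict) or "ctid <= bound" (non-strict) into an
 * inclusive last TID.  Returns false when no TID can satisfy the bound.
 */
static bool
sketch_upper_bound(SketchTid bound, bool strict, uint32_t nblocks,
				   SketchTid *last)
{
	/* An empty table has no TIDs at all. */
	if (nblocks == 0)
		return false;

	*last = bound;

	/* Offset 0 never exists, so "<= (b,0)" selects the same rows as "< (b,0)". */
	if (!strict && bound.offset == 0)
		strict = true;

	if (strict)
	{
		if (bound.offset == 0)
		{
			/* Nothing sorts before the start of block 0. */
			if (bound.block == 0)
				return false;
			/* Step back to the last possible TID of the previous block. */
			last->block = bound.block - 1;
			last->offset = SKETCH_MAX_OFFSET;
		}
		else
			last->offset = bound.offset - 1;
	}

	/* Clamp bounds past the end of the table to the last block. */
	if (last->block >= nblocks)
	{
		last->block = nblocks - 1;
		last->offset = SKETCH_MAX_OFFSET;
	}

	return true;
}
```

With the ten-block test table, a strict bound of (1,0) becomes the last possible TID of block 0, a non-strict bound of (1,5) stays at (1,5), and a strict bound of (0,0) yields an empty range.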

Edmund Horner

ejrh00@gmail.com

over 7 years ago

In reply to: Edmund Horner (#8)

1 attachment(s)

Re: Tid scan improvements

On Fri, 28 Sep 2018 at 17:02, Edmund Horner <ejrh00@gmail.com> wrote:

> I did run pgindent over it though. :)

But I didn't check if it still applied to master. Sigh. Here's one that does.

Attachments:

tid_scan_improvements-v2.patch (application/x-patch)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 3395445..e89343f 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -575,11 +575,18 @@ heapgettup(HeapScanDesc scan,
 			 * forward scanners.
 			 */
 			scan->rs_syncscan = false;
+
 			/* start from last page of the scan */
-			if (scan->rs_startblock > 0)
-				page = scan->rs_startblock - 1;
+			if (scan->rs_numblocks == InvalidBlockNumber)
+			{
+				if (scan->rs_startblock > 0)
+					page = scan->rs_startblock - 1;
+				else
+					page = scan->rs_nblocks - 1;
+			}
 			else
-				page = scan->rs_nblocks - 1;
+				page = scan->rs_startblock + scan->rs_numblocks - 1;
+
 			heapgetpage(scan, page);
 		}
 		else
@@ -876,11 +883,18 @@ heapgettup_pagemode(HeapScanDesc scan,
 			 * forward scanners.
 			 */
 			scan->rs_syncscan = false;
+
 			/* start from last page of the scan */
-			if (scan->rs_startblock > 0)
-				page = scan->rs_startblock - 1;
+			if (scan->rs_numblocks == InvalidBlockNumber)
+			{
+				if (scan->rs_startblock > 0)
+					page = scan->rs_startblock - 1;
+				else
+					page = scan->rs_nblocks - 1;
+			}
 			else
-				page = scan->rs_nblocks - 1;
+				page = scan->rs_startblock + scan->rs_numblocks - 1;
+
 			heapgetpage(scan, page);
 		}
 		else
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index ed6afe7..aed7016 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -111,6 +111,7 @@ static void show_foreignscan_info(ForeignScanState *fsstate, ExplainState *es);
 static void show_eval_params(Bitmapset *bms_params, ExplainState *es);
 static const char *explain_get_index_name(Oid indexId);
 static void show_buffer_usage(ExplainState *es, const BufferUsage *usage);
+static void show_scan_direction(ExplainState *es, ScanDirection direction);
 static void ExplainIndexScanDetails(Oid indexid, ScanDirection indexorderdir,
 						ExplainState *es);
 static void ExplainScanTarget(Scan *plan, ExplainState *es);
@@ -1245,7 +1246,6 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_SeqScan:
 		case T_SampleScan:
 		case T_BitmapHeapScan:
-		case T_TidScan:
 		case T_SubqueryScan:
 		case T_FunctionScan:
 		case T_TableFuncScan:
@@ -1254,6 +1254,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_WorkTableScan:
 			ExplainScanTarget((Scan *) plan, es);
 			break;
+		case T_TidScan:
+			show_scan_direction(es, ((TidScan *) plan)->direction);
+			ExplainScanTarget((Scan *) plan, es);
+			break;
 		case T_ForeignScan:
 		case T_CustomScan:
 			if (((Scan *) plan)->scanrelid > 0)
@@ -2867,25 +2871,21 @@ show_buffer_usage(ExplainState *es, const BufferUsage *usage)
 }
 
 /*
- * Add some additional details about an IndexScan or IndexOnlyScan
+ * Show the direction of a scan.
  */
 static void
-ExplainIndexScanDetails(Oid indexid, ScanDirection indexorderdir,
-						ExplainState *es)
+show_scan_direction(ExplainState *es, ScanDirection direction)
 {
-	const char *indexname = explain_get_index_name(indexid);
-
 	if (es->format == EXPLAIN_FORMAT_TEXT)
 	{
-		if (ScanDirectionIsBackward(indexorderdir))
+		if (ScanDirectionIsBackward(direction))
 			appendStringInfoString(es->str, " Backward");
-		appendStringInfo(es->str, " using %s", indexname);
 	}
 	else
 	{
 		const char *scandir;
 
-		switch (indexorderdir)
+		switch (direction)
 		{
 			case BackwardScanDirection:
 				scandir = "Backward";
@@ -2901,8 +2901,24 @@ ExplainIndexScanDetails(Oid indexid, ScanDirection indexorderdir,
 				break;
 		}
 		ExplainPropertyText("Scan Direction", scandir, es);
+	}
+}
+
+/*
+ * Add some additional details about an IndexScan or IndexOnlyScan
+ */
+static void
+ExplainIndexScanDetails(Oid indexid, ScanDirection indexorderdir,
+						ExplainState *es)
+{
+	const char *indexname = explain_get_index_name(indexid);
+
+	show_scan_direction(es, indexorderdir);
+
+	if (es->format == EXPLAIN_FORMAT_TEXT)
+		appendStringInfo(es->str, " using %s", indexname);
+	else
 		ExplainPropertyText("Index Name", indexname, es);
-	}
 }
 
 /*
diff --git a/src/backend/executor/nodeTidscan.c b/src/backend/executor/nodeTidscan.c
index 0cb1946..9b455d8 100644
--- a/src/backend/executor/nodeTidscan.c
+++ b/src/backend/executor/nodeTidscan.c
@@ -22,7 +22,9 @@
  */
 #include "postgres.h"
 
+#include "access/relscan.h"
 #include "access/sysattr.h"
+#include "catalog/pg_operator.h"
 #include "catalog/pg_type.h"
 #include "executor/execdebug.h"
 #include "executor/nodeTidscan.h"
@@ -39,21 +41,78 @@
 	 ((Var *) (node))->varattno == SelfItemPointerAttributeNumber && \
 	 ((Var *) (node))->varlevelsup == 0)
 
+typedef enum
+{
+	TIDEXPR_CURRENT_OF,
+	TIDEXPR_IN_ARRAY,
+	TIDEXPR_EQ,
+	TIDEXPR_LT,
+	TIDEXPR_GT,
+	TIDEXPR_BETWEEN,
+	TIDEXPR_ANY
+}			TidExprType;
+
 /* one element in tss_tidexprs */
 typedef struct TidExpr
 {
+	TidExprType type;
 	ExprState  *exprstate;		/* ExprState for a TID-yielding subexpr */
-	bool		isarray;		/* if true, it yields tid[] not just tid */
-	CurrentOfExpr *cexpr;		/* alternatively, we can have CURRENT OF */
+	ExprState  *exprstate2;		/* For TIDEXPR_BETWEEN */
+	CurrentOfExpr *cexpr;		/* For TIDEXPR_CURRENT_OF */
+	bool		strict;			/* Indicates < rather than <=, or > rather */
+	bool		strict2;		/* than >= */
 } TidExpr;
 
+typedef struct TidRange
+{
+	ItemPointerData first;
+	ItemPointerData last;
+}			TidRange;
+
+static ExprState *MakeTidOpExprState(OpExpr *expr, TidScanState *tidstate, bool *strict, bool *invert);
 static void TidExprListCreate(TidScanState *tidstate);
+static TidRange * EnlargeTidRangeArray(TidRange * tidRanges, int numRanges, int *numAllocRanges);
+static bool SetTidLowerBound(ItemPointer tid, bool strict, int nblocks, ItemPointer lowerBound);
+static bool SetTidUpperBound(ItemPointer tid, bool strict, int nblocks, ItemPointer upperBound);
 static void TidListEval(TidScanState *tidstate);
+static bool MergeTidRanges(TidRange * a, TidRange * b);
 static int	itemptr_comparator(const void *a, const void *b);
+static int	tidrange_comparator(const void *a, const void *b);
+static HeapScanDesc BeginTidRangeScan(TidScanState *node, TidRange * range);
+static HeapTuple NextInTidRange(HeapScanDesc scandesc, ScanDirection direction, TidRange * range);
 static TupleTableSlot *TidNext(TidScanState *node);
 
 
 /*
+ * Create an ExprState corresponding to the value part of a TID comparison.
+ * If the comparison operator is > or <, strict is set.
+ * If the comparison is of the form VALUE op CTID, then invert is set.
+ */
+static ExprState *
+MakeTidOpExprState(OpExpr *expr, TidScanState *tidstate, bool *strict, bool *invert)
+{
+	Node	   *arg1 = get_leftop((Expr *) expr);
+	Node	   *arg2 = get_rightop((Expr *) expr);
+	ExprState  *exprstate = NULL;
+
+	*invert = false;
+
+	if (IsCTIDVar(arg1))
+		exprstate = ExecInitExpr((Expr *) arg2, &tidstate->ss.ps);
+	else if (IsCTIDVar(arg2))
+	{
+		exprstate = ExecInitExpr((Expr *) arg1, &tidstate->ss.ps);
+		*invert = true;
+	}
+	else
+		elog(ERROR, "could not identify CTID variable");
+
+	*strict = expr->opno == TIDLessOperator || expr->opno == TIDGreaterOperator;
+
+	return exprstate;
+}
+
+/*
  * Extract the qual subexpressions that yield TIDs to search for,
  * and compile them into ExprStates if they're ordinary expressions.
  *
@@ -69,6 +128,14 @@ TidExprListCreate(TidScanState *tidstate)
 	tidstate->tss_tidexprs = NIL;
 	tidstate->tss_isCurrentOf = false;
 
+	if (!node->tidquals)
+	{
+		TidExpr    *tidexpr = (TidExpr *) palloc0(sizeof(TidExpr));
+
+		tidexpr->type = TIDEXPR_ANY;
+		tidstate->tss_tidexprs = lappend(tidstate->tss_tidexprs, tidexpr);
+	}
+
 	foreach(l, node->tidquals)
 	{
 		Expr	   *expr = (Expr *) lfirst(l);
@@ -76,20 +143,16 @@ TidExprListCreate(TidScanState *tidstate)
 
 		if (is_opclause(expr))
 		{
-			Node	   *arg1;
-			Node	   *arg2;
+			OpExpr	   *opexpr = (OpExpr *) expr;
+			bool		invert;
 
-			arg1 = get_leftop(expr);
-			arg2 = get_rightop(expr);
-			if (IsCTIDVar(arg1))
-				tidexpr->exprstate = ExecInitExpr((Expr *) arg2,
-												  &tidstate->ss.ps);
-			else if (IsCTIDVar(arg2))
-				tidexpr->exprstate = ExecInitExpr((Expr *) arg1,
-												  &tidstate->ss.ps);
+			tidexpr->exprstate = MakeTidOpExprState(opexpr, tidstate, &tidexpr->strict, &invert);
+			if (opexpr->opno == TIDLessOperator || opexpr->opno == TIDLessEqOperator)
+				tidexpr->type = invert ? TIDEXPR_GT : TIDEXPR_LT;
+			else if (opexpr->opno == TIDGreaterOperator || opexpr->opno == TIDGreaterEqOperator)
+				tidexpr->type = invert ? TIDEXPR_LT : TIDEXPR_GT;
 			else
-				elog(ERROR, "could not identify CTID variable");
-			tidexpr->isarray = false;
+				tidexpr->type = TIDEXPR_EQ;
 		}
 		else if (expr && IsA(expr, ScalarArrayOpExpr))
 		{
@@ -98,15 +161,46 @@ TidExprListCreate(TidScanState *tidstate)
 			Assert(IsCTIDVar(linitial(saex->args)));
 			tidexpr->exprstate = ExecInitExpr(lsecond(saex->args),
 											  &tidstate->ss.ps);
-			tidexpr->isarray = true;
+			tidexpr->type = TIDEXPR_IN_ARRAY;
 		}
 		else if (expr && IsA(expr, CurrentOfExpr))
 		{
 			CurrentOfExpr *cexpr = (CurrentOfExpr *) expr;
 
 			tidexpr->cexpr = cexpr;
+			tidexpr->type = TIDEXPR_CURRENT_OF;
 			tidstate->tss_isCurrentOf = true;
 		}
+		else if (and_clause((Node *) expr))
+		{
+			OpExpr	   *arg1;
+			OpExpr	   *arg2;
+			bool		invert;
+			bool		invert2;
+
+			Assert(list_length(((BoolExpr *) expr)->args) == 2);
+			arg1 = (OpExpr *) linitial(((BoolExpr *) expr)->args);
+			arg2 = (OpExpr *) lsecond(((BoolExpr *) expr)->args);
+			tidexpr->exprstate = MakeTidOpExprState(arg1, tidstate, &tidexpr->strict, &invert);
+			tidexpr->exprstate2 = MakeTidOpExprState(arg2, tidstate, &tidexpr->strict2, &invert2);
+
+			/* If the LHS is not the lower bound, swap them. */
+			if (invert == (arg1->opno == TIDGreaterOperator || arg1->opno == TIDGreaterEqOperator))
+			{
+				bool		temp_strict;
+				ExprState  *temp_es;
+
+				temp_es = tidexpr->exprstate;
+				tidexpr->exprstate = tidexpr->exprstate2;
+				tidexpr->exprstate2 = temp_es;
+
+				temp_strict = tidexpr->strict;
+				tidexpr->strict = tidexpr->strict2;
+				tidexpr->strict2 = temp_strict;
+			}
+
+			tidexpr->type = TIDEXPR_BETWEEN;
+		}
 		else
 			elog(ERROR, "could not identify CTID expression");
 
@@ -118,6 +212,113 @@ TidExprListCreate(TidScanState *tidstate)
 		   !tidstate->tss_isCurrentOf);
 }
 
+static TidRange *
+EnlargeTidRangeArray(TidRange * tidRanges, int numRanges, int *numAllocRanges)
+{
+	if (numRanges >= *numAllocRanges)
+	{
+		*numAllocRanges *= 2;
+		tidRanges = (TidRange *)
+			repalloc(tidRanges,
+					 *numAllocRanges * sizeof(TidRange));
+	}
+	return tidRanges;
+}
+
+/*
+ * Set a lower bound tid, taking into account the strictness of the bound.
+ * Return false if the lower bound is outside the size of the table.
+ */
+static bool
+SetTidLowerBound(ItemPointer tid, bool strict, int nblocks, ItemPointer lowerBound)
+{
+	OffsetNumber offset;
+
+	if (tid == NULL)
+	{
+		ItemPointerSetBlockNumber(lowerBound, 0);
+		ItemPointerSetOffsetNumber(lowerBound, 1);
+		return true;
+	}
+
+	if (ItemPointerGetBlockNumberNoCheck(tid) > nblocks)
+		return false;
+
+	*lowerBound = *tid;
+	offset = ItemPointerGetOffsetNumberNoCheck(tid);
+
+	if (strict)
+		ItemPointerSetOffsetNumber(lowerBound, OffsetNumberNext(offset));
+	else if (offset == 0)
+		ItemPointerSetOffsetNumber(lowerBound, 1);
+
+	return true;
+}
+
+/*
+ * Set an upper bound tid, taking into account the strictness of the bound.
+ * Return false if the bound excludes anything from the table.
+ */
+static bool
+SetTidUpperBound(ItemPointer tid, bool strict, int nblocks, ItemPointer upperBound)
+{
+	OffsetNumber offset;
+
+	/* If the table is empty, the range must be empty. */
+	if (nblocks == 0)
+		return false;
+
+	if (tid == NULL)
+	{
+		ItemPointerSetBlockNumber(upperBound, nblocks - 1);
+		ItemPointerSetOffsetNumber(upperBound, MaxOffsetNumber);
+		return true;
+	}
+
+	*upperBound = *tid;
+	offset = ItemPointerGetOffsetNumberNoCheck(tid);
+
+	/*
+	 * If the expression was non-strict (<=) and the offset is 0, then just
+	 * pretend it was strict, because offset 0 doesn't exist and we may as
+	 * well exclude that block.
+	 */
+	if (!strict && offset == 0)
+		strict = true;
+
+	if (strict)
+	{
+		if (offset == 0)
+		{
+			BlockNumber block = ItemPointerGetBlockNumberNoCheck(upperBound);
+
+			/*
+			 * If the upper bound was already block 0, then there is no valid
+			 * range.
+			 */
+			if (block == 0)
+				return false;
+
+			ItemPointerSetBlockNumber(upperBound, block - 1);
+			ItemPointerSetOffsetNumber(upperBound, MaxOffsetNumber);
+		}
+		else
+			ItemPointerSetOffsetNumber(upperBound, OffsetNumberPrev(offset));
+	}
+
+	/*
+	 * If the upper bound is beyond the last block of the table, truncate it
+	 * to the last TID of the last block.
+	 */
+	if (ItemPointerGetBlockNumberNoCheck(upperBound) > nblocks)
+	{
+		ItemPointerSetBlockNumber(upperBound, nblocks - 1);
+		ItemPointerSetOffsetNumber(upperBound, MaxOffsetNumber);
+	}
+
+	return true;
+}
+
 /*
  * Compute the list of TIDs to be visited, by evaluating the expressions
  * for them.
@@ -129,9 +330,9 @@ TidListEval(TidScanState *tidstate)
 {
 	ExprContext *econtext = tidstate->ss.ps.ps_ExprContext;
 	BlockNumber nblocks;
-	ItemPointerData *tidList;
-	int			numAllocTids;
-	int			numTids;
+	TidRange   *tidRanges;
+	int			numAllocRanges;
+	int			numRanges;
 	ListCell   *l;
 
 	/*
@@ -147,10 +348,9 @@ TidListEval(TidScanState *tidstate)
 	 * are simple OpExprs or CurrentOfExprs.  If there are any
 	 * ScalarArrayOpExprs, we may have to enlarge the array.
 	 */
-	numAllocTids = list_length(tidstate->tss_tidexprs);
-	tidList = (ItemPointerData *)
-		palloc(numAllocTids * sizeof(ItemPointerData));
-	numTids = 0;
+	numAllocRanges = list_length(tidstate->tss_tidexprs);
+	tidRanges = (TidRange *) palloc0(numAllocRanges * sizeof(TidRange));
+	numRanges = 0;
 
 	foreach(l, tidstate->tss_tidexprs)
 	{
@@ -158,7 +358,7 @@ TidListEval(TidScanState *tidstate)
 		ItemPointer itemptr;
 		bool		isNull;
 
-		if (tidexpr->exprstate && !tidexpr->isarray)
+		if (tidexpr->exprstate && tidexpr->type == TIDEXPR_EQ)
 		{
 			itemptr = (ItemPointer)
 				DatumGetPointer(ExecEvalExprSwitchContext(tidexpr->exprstate,
@@ -168,17 +368,76 @@ TidListEval(TidScanState *tidstate)
 				ItemPointerIsValid(itemptr) &&
 				ItemPointerGetBlockNumber(itemptr) < nblocks)
 			{
-				if (numTids >= numAllocTids)
-				{
-					numAllocTids *= 2;
-					tidList = (ItemPointerData *)
-						repalloc(tidList,
-								 numAllocTids * sizeof(ItemPointerData));
-				}
-				tidList[numTids++] = *itemptr;
+				tidRanges = EnlargeTidRangeArray(tidRanges, numRanges, &numAllocRanges);
+				tidRanges[numRanges].first = *itemptr;
+				tidRanges[numRanges].last = *itemptr;
+				numRanges++;
 			}
 		}
-		else if (tidexpr->exprstate && tidexpr->isarray)
+		else if (tidexpr->exprstate && tidexpr->type == TIDEXPR_LT)
+		{
+			bool		upper_isNull;
+			ItemPointer upper_itemptr = (ItemPointer)
+			DatumGetPointer(ExecEvalExprSwitchContext(tidexpr->exprstate,
+													  econtext,
+													  &upper_isNull));
+
+			if (upper_isNull)
+				continue;
+
+			tidRanges = EnlargeTidRangeArray(tidRanges, numRanges, &numAllocRanges);
+
+			SetTidLowerBound(NULL, false, nblocks, &tidRanges[numRanges].first);
+			if (SetTidUpperBound(upper_itemptr, tidexpr->strict, nblocks, &tidRanges[numRanges].last))
+				numRanges++;
+		}
+		else if (tidexpr->exprstate && tidexpr->type == TIDEXPR_GT)
+		{
+			bool		lower_isNull;
+			ItemPointer lower_itemptr = (ItemPointer)
+			DatumGetPointer(ExecEvalExprSwitchContext(tidexpr->exprstate,
+													  econtext,
+													  &lower_isNull));
+
+			if (lower_isNull)
+				continue;
+
+			tidRanges = EnlargeTidRangeArray(tidRanges, numRanges, &numAllocRanges);
+
+			if (SetTidLowerBound(lower_itemptr, tidexpr->strict, nblocks, &tidRanges[numRanges].first) &&
+				SetTidUpperBound(NULL, false, nblocks, &tidRanges[numRanges].last))
+				numRanges++;
+		}
+		else if (tidexpr->exprstate && tidexpr->type == TIDEXPR_BETWEEN)
+		{
+			bool		lower_isNull,
+						upper_isNull;
+			ItemPointer lower_itemptr = (ItemPointer)
+			DatumGetPointer(ExecEvalExprSwitchContext(tidexpr->exprstate,
+													  econtext,
+													  &lower_isNull));
+			ItemPointer upper_itemptr = (ItemPointer)
+			DatumGetPointer(ExecEvalExprSwitchContext(tidexpr->exprstate2,
+													  econtext,
+													  &upper_isNull));
+
+			if (lower_isNull || upper_isNull)
+				continue;
+
+			tidRanges = EnlargeTidRangeArray(tidRanges, numRanges, &numAllocRanges);
+
+			if (SetTidLowerBound(lower_itemptr, tidexpr->strict, nblocks, &tidRanges[numRanges].first) &&
+				SetTidUpperBound(upper_itemptr, tidexpr->strict2, nblocks, &tidRanges[numRanges].last))
+				numRanges++;
+		}
+		else if (tidexpr->type == TIDEXPR_ANY)
+		{
+			tidRanges = EnlargeTidRangeArray(tidRanges, numRanges, &numAllocRanges);
+			if (SetTidLowerBound(NULL, false, nblocks, &tidRanges[numRanges].first) &&
+				SetTidUpperBound(NULL, false, nblocks, &tidRanges[numRanges].last))
+				numRanges++;
+		}
+		else if (tidexpr->exprstate && tidexpr->type == TIDEXPR_IN_ARRAY)
 		{
 			Datum		arraydatum;
 			ArrayType  *itemarray;
@@ -196,12 +455,12 @@ TidListEval(TidScanState *tidstate)
 			deconstruct_array(itemarray,
 							  TIDOID, sizeof(ItemPointerData), false, 's',
 							  &ipdatums, &ipnulls, &ndatums);
-			if (numTids + ndatums > numAllocTids)
+			if (numRanges + ndatums > numAllocRanges)
 			{
-				numAllocTids = numTids + ndatums;
-				tidList = (ItemPointerData *)
-					repalloc(tidList,
-							 numAllocTids * sizeof(ItemPointerData));
+				numAllocRanges = numRanges + ndatums;
+				tidRanges = (TidRange *)
+					repalloc(tidRanges,
+							 numAllocRanges * sizeof(TidRange));
 			}
 			for (i = 0; i < ndatums; i++)
 			{
@@ -210,13 +469,17 @@ TidListEval(TidScanState *tidstate)
 					itemptr = (ItemPointer) DatumGetPointer(ipdatums[i]);
 					if (ItemPointerIsValid(itemptr) &&
 						ItemPointerGetBlockNumber(itemptr) < nblocks)
-						tidList[numTids++] = *itemptr;
+					{
+						tidRanges[numRanges].first = *itemptr;
+						tidRanges[numRanges].last = *itemptr;
+						numRanges++;
+					}
 				}
 			}
 			pfree(ipdatums);
 			pfree(ipnulls);
 		}
-		else
+		else if (tidexpr->type == TIDEXPR_CURRENT_OF)
 		{
 			ItemPointerData cursor_tid;
 
@@ -225,16 +486,20 @@ TidListEval(TidScanState *tidstate)
 							  RelationGetRelid(tidstate->ss.ss_currentRelation),
 							  &cursor_tid))
 			{
-				if (numTids >= numAllocTids)
-				{
-					numAllocTids *= 2;
-					tidList = (ItemPointerData *)
-						repalloc(tidList,
-								 numAllocTids * sizeof(ItemPointerData));
-				}
-				tidList[numTids++] = cursor_tid;
+				/*
+				 * A current-of TidExpr only exists by itself, and we should
+				 * already have allocated a tidRanges entry for it.  We don't
+				 * need to check whether the tidRanges array needs to be
+				 * resized.
+				 */
+				Assert(numRanges < numAllocRanges);
+				tidRanges[numRanges].first = cursor_tid;
+				tidRanges[numRanges].last = cursor_tid;
+				numRanges++;
 			}
 		}
+		else
+			Assert(false);
 	}
 
 	/*
@@ -243,31 +508,55 @@ TidListEval(TidScanState *tidstate)
 	 * the list.  Sorting makes it easier to detect duplicates, and as a bonus
 	 * ensures that we will visit the heap in the most efficient way.
 	 */
-	if (numTids > 1)
+	if (numRanges > 1)
 	{
-		int			lastTid;
+		int			lastRange;
 		int			i;
 
 		/* CurrentOfExpr could never appear OR'd with something else */
 		Assert(!tidstate->tss_isCurrentOf);
 
-		qsort((void *) tidList, numTids, sizeof(ItemPointerData),
-			  itemptr_comparator);
-		lastTid = 0;
-		for (i = 1; i < numTids; i++)
+		qsort((void *) tidRanges, numRanges, sizeof(TidRange), tidrange_comparator);
+		lastRange = 0;
+		for (i = 1; i < numRanges; i++)
 		{
-			if (!ItemPointerEquals(&tidList[lastTid], &tidList[i]))
-				tidList[++lastTid] = tidList[i];
+			if (!MergeTidRanges(&tidRanges[lastRange], &tidRanges[i]))
+				tidRanges[++lastRange] = tidRanges[i];
 		}
-		numTids = lastTid + 1;
+		numRanges = lastRange + 1;
 	}
 
-	tidstate->tss_TidList = tidList;
-	tidstate->tss_NumTids = numTids;
+	tidstate->tss_TidRanges = tidRanges;
+	tidstate->tss_NumRanges = numRanges;
 	tidstate->tss_TidPtr = -1;
 }
 
 /*
+ * If two ranges overlap, merge them into one.
+ * Assumes the two ranges are already ordered by (first, last).
+ * Returns true if they were merged.
+ */
+static bool
+MergeTidRanges(TidRange * a, TidRange * b)
+{
+	ItemPointerData a_last = a->last;
+	ItemPointerData b_last;
+
+	if (!ItemPointerIsValid(&a_last))
+		a_last = a->first;
+
+	if (itemptr_comparator(&a_last, &b->first) < 0)
+		return false;
+
+	b_last = b->last;
+	if (!ItemPointerIsValid(&b_last))
+		b_last = b->first;
+	if (itemptr_comparator(&a_last, &b_last) < 0)
+		a->last = b_last;
+	return true;
+}
+
+/*
  * qsort comparator for ItemPointerData items
  */
 static int
@@ -291,6 +580,86 @@ itemptr_comparator(const void *a, const void *b)
 	return 0;
 }
 
+/*
+ * qsort comparator for TidRange items
+ */
+static int
+tidrange_comparator(const void *a, const void *b)
+{
+	const		TidRange *tra = (const TidRange *) a;
+	const		TidRange *trb = (const TidRange *) b;
+	int			cmp_first = itemptr_comparator(&tra->first, &trb->first);
+
+	if (cmp_first != 0)
+		return cmp_first;
+	else
+		return itemptr_comparator(&tra->last, &trb->last);
+}
+
+static HeapScanDesc
+BeginTidRangeScan(TidScanState *node, TidRange * range)
+{
+	HeapScanDesc scandesc = node->ss.ss_currentScanDesc;
+	BlockNumber first_block = ItemPointerGetBlockNumberNoCheck(&range->first);
+	BlockNumber last_block = ItemPointerGetBlockNumberNoCheck(&range->last);
+
+	if (!scandesc)
+	{
+		EState	   *estate = node->ss.ps.state;
+
+		scandesc = heap_beginscan_strat(node->ss.ss_currentRelation,
+										estate->es_snapshot,
+										0, NULL,
+										false, false);
+		node->ss.ss_currentScanDesc = scandesc;
+	}
+	else
+		heap_rescan(scandesc, NULL);
+
+	heap_setscanlimits(scandesc, first_block, last_block - first_block + 1);
+	node->tss_inScan = true;
+	return scandesc;
+}
+
+static HeapTuple
+NextInTidRange(HeapScanDesc scandesc, ScanDirection direction, TidRange * range)
+{
+	BlockNumber first_block = ItemPointerGetBlockNumber(&range->first);
+	OffsetNumber first_offset = ItemPointerGetOffsetNumber(&range->first);
+	BlockNumber last_block = ItemPointerGetBlockNumber(&range->last);
+	OffsetNumber last_offset = ItemPointerGetOffsetNumber(&range->last);
+	HeapTuple	tuple;
+
+	for (;;)
+	{
+		BlockNumber block;
+		OffsetNumber offset;
+
+		tuple = heap_getnext(scandesc, direction);
+		if (!tuple)
+			break;
+
+		/* Check that the tuple is within the required range. */
+		block = ItemPointerGetBlockNumber(&tuple->t_self);
+		offset = ItemPointerGetOffsetNumber(&tuple->t_self);
+
+		/*
+		 * TODO: if scanning forward, we can stop as soon as we see a tuple
+		 * beyond range->last; similarly, if scanning backward, as soon as we
+		 * see a tuple before range->first.
+		 */
+		if (block == first_block && offset < first_offset)
+			continue;
+
+		if (block == last_block && offset > last_offset)
+			continue;
+
+		break;
+	}
+
+	return tuple;
+}
+
 /* ----------------------------------------------------------------
  *		TidNext
  *
@@ -302,6 +671,7 @@ itemptr_comparator(const void *a, const void *b)
 static TupleTableSlot *
 TidNext(TidScanState *node)
 {
+	HeapScanDesc scandesc;
 	EState	   *estate;
 	ScanDirection direction;
 	Snapshot	snapshot;
@@ -309,105 +679,149 @@ TidNext(TidScanState *node)
 	HeapTuple	tuple;
 	TupleTableSlot *slot;
 	Buffer		buffer = InvalidBuffer;
-	ItemPointerData *tidList;
-	int			numTids;
+	int			numRanges;
 	bool		bBackward;
 
 	/*
 	 * extract necessary information from tid scan node
 	 */
+	scandesc = node->ss.ss_currentScanDesc;
 	estate = node->ss.ps.state;
 	direction = estate->es_direction;
 	snapshot = estate->es_snapshot;
 	heapRelation = node->ss.ss_currentRelation;
 	slot = node->ss.ss_ScanTupleSlot;
 
-	/*
-	 * First time through, compute the list of TIDs to be visited
-	 */
-	if (node->tss_TidList == NULL)
+	/* First time through, compute the list of TID ranges to be visited */
+	if (node->tss_TidRanges == NULL)
+	{
 		TidListEval(node);
 
-	tidList = node->tss_TidList;
-	numTids = node->tss_NumTids;
+		node->tss_TidPtr = -1;
+	}
 
-	/*
-	 * We use node->tss_htup as the tuple pointer; note this can't just be a
-	 * local variable here, as the scan tuple slot will keep a pointer to it.
-	 */
-	tuple = &(node->tss_htup);
+	numRanges = node->tss_NumRanges;
 
-	/*
-	 * Initialize or advance scan position, depending on direction.
-	 */
-	bBackward = ScanDirectionIsBackward(direction);
-	if (bBackward)
+	/* If the plan direction is backward, invert the direction. */
+	if (ScanDirectionIsBackward(((TidScan *) node->ss.ps.plan)->direction))
 	{
-		if (node->tss_TidPtr < 0)
-		{
-			/* initialize for backward scan */
-			node->tss_TidPtr = numTids - 1;
-		}
-		else
-			node->tss_TidPtr--;
+		if (ScanDirectionIsForward(direction))
+			direction = BackwardScanDirection;
+		else if (ScanDirectionIsBackward(direction))
+			direction = ForwardScanDirection;
 	}
-	else
+
+	tuple = NULL;
+	for (;;)
 	{
-		if (node->tss_TidPtr < 0)
+		TidRange   *currentRange;
+
+		if (!node->tss_inScan)
 		{
-			/* initialize for forward scan */
-			node->tss_TidPtr = 0;
+			/* Initialize or advance scan position, depending on direction. */
+			bBackward = ScanDirectionIsBackward(direction);
+			if (bBackward)
+			{
+				if (node->tss_TidPtr < 0)
+				{
+					/* initialize for backward scan */
+					node->tss_TidPtr = numRanges - 1;
+				}
+				else
+					node->tss_TidPtr--;
+			}
+			else
+			{
+				if (node->tss_TidPtr < 0)
+				{
+					/* initialize for forward scan */
+					node->tss_TidPtr = 0;
+				}
+				else
+					node->tss_TidPtr++;
+			}
 		}
-		else
-			node->tss_TidPtr++;
-	}
 
-	while (node->tss_TidPtr >= 0 && node->tss_TidPtr < numTids)
-	{
-		tuple->t_self = tidList[node->tss_TidPtr];
+		if (node->tss_TidPtr >= numRanges || node->tss_TidPtr < 0)
+			break;
+
+		currentRange = &node->tss_TidRanges[node->tss_TidPtr];
 
-		/*
-		 * For WHERE CURRENT OF, the tuple retrieved from the cursor might
-		 * since have been updated; if so, we should fetch the version that is
-		 * current according to our snapshot.
-		 */
+		/* TODO ranges of size 1 should also use a simple tuple fetch */
 		if (node->tss_isCurrentOf)
-			heap_get_latest_tid(heapRelation, snapshot, &tuple->t_self);
-
-		if (heap_fetch(heapRelation, snapshot, tuple, &buffer, false, NULL))
 		{
 			/*
-			 * Store the scanned tuple in the scan tuple slot of the scan
-			 * state.  Eventually we will only do this and not return a tuple.
+			 * We use node->tss_htup as the tuple pointer; note this can't
+			 * just be a local variable here, as the scan tuple slot will keep
+			 * a pointer to it.
 			 */
-			ExecStoreBufferHeapTuple(tuple,	/* tuple to store */
-									 slot,	/* slot to store in */
-									 buffer);	/* buffer associated with
-												 * tuple */
+			tuple = &(node->tss_htup);
+			tuple->t_self = currentRange->first;
 
 			/*
-			 * At this point we have an extra pin on the buffer, because
-			 * ExecStoreHeapTuple incremented the pin count. Drop our local
-			 * pin.
+			 * For WHERE CURRENT OF, the tuple retrieved from the cursor might
+			 * since have been updated; if so, we should fetch the version
+			 * that is current according to our snapshot.
 			 */
-			ReleaseBuffer(buffer);
+			if (node->tss_isCurrentOf)
+				heap_get_latest_tid(heapRelation, snapshot, &tuple->t_self);
 
-			return slot;
+			if (heap_fetch(heapRelation, snapshot, tuple, &buffer, false, NULL))
+			{
+				/*
+				 * Store the scanned tuple in the scan tuple slot of the scan
+				 * state.  Eventually we will only do this and not return a
+				 * tuple.
+				 */
+				ExecStoreBufferHeapTuple(tuple, /* tuple to store */
+										 slot,	/* slot to store in */
+										 buffer);	/* buffer associated with
+													 * tuple */
+
+				/*
+				 * At this point we have an extra pin on the buffer, because
+				 * ExecStoreBufferHeapTuple incremented the pin count. Drop
+				 * our local pin.
+				 */
+				ReleaseBuffer(buffer);
+
+				return slot;
+			}
+			else
+			{
+				tuple = NULL;
+			}
 		}
-		/* Bad TID or failed snapshot qual; try next */
-		if (bBackward)
-			node->tss_TidPtr--;
 		else
-			node->tss_TidPtr++;
+		{
+			if (!node->tss_inScan)
+				scandesc = BeginTidRangeScan(node, currentRange);
 
-		CHECK_FOR_INTERRUPTS();
+			tuple = NextInTidRange(scandesc, direction, currentRange);
+			if (tuple)
+				break;
+
+			node->tss_inScan = false;
+		}
 	}
 
 	/*
-	 * if we get here it means the tid scan failed so we are at the end of the
-	 * scan..
+	 * save the tuple and the buffer returned to us by the access methods in
+	 * our scan tuple slot and return the slot.  Note: tuples returned by
+	 * heap_getnext() are pointers onto disk pages and were not created with
+	 * palloc(), so they must not be pfree()'d.  Note also that
+	 * ExecStoreBufferHeapTuple will increment the refcount of the buffer; the
+	 * refcount will not be dropped until the tuple table slot is cleared.
 	 */
-	return ExecClearTuple(slot);
+	if (tuple)
+		ExecStoreBufferHeapTuple(tuple, /* tuple to store */
+								 slot,	/* slot to store in */
+								 scandesc->rs_cbuf);	/* buffer associated
+														 * with this tuple */
+	else
+		ExecClearTuple(slot);
+
+	return slot;
 }
 
 /*
@@ -460,11 +874,13 @@ ExecTidScan(PlanState *pstate)
 void
 ExecReScanTidScan(TidScanState *node)
 {
-	if (node->tss_TidList)
-		pfree(node->tss_TidList);
-	node->tss_TidList = NULL;
-	node->tss_NumTids = 0;
+	if (node->tss_TidRanges)
+		pfree(node->tss_TidRanges);
+
+	node->tss_TidRanges = NULL;
+	node->tss_NumRanges = 0;
 	node->tss_TidPtr = -1;
+	node->tss_inScan = false;
 
 	ExecScanReScan(&node->ss);
 }
@@ -479,6 +895,8 @@ ExecReScanTidScan(TidScanState *node)
 void
 ExecEndTidScan(TidScanState *node)
 {
+	HeapScanDesc scan = node->ss.ss_currentScanDesc;
+
 	/*
 	 * Free the exprcontext
 	 */
@@ -490,6 +908,10 @@ ExecEndTidScan(TidScanState *node)
 	ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
 	ExecClearTuple(node->ss.ss_ScanTupleSlot);
 
+	/* close heap scan */
+	if (scan != NULL)
+		heap_endscan(scan);
+
 	/*
 	 * close the heap relation.
 	 */
@@ -529,11 +951,12 @@ ExecInitTidScan(TidScan *node, EState *estate, int eflags)
 	ExecAssignExprContext(estate, &tidstate->ss.ps);
 
 	/*
-	 * mark tid list as not computed yet
+	 * mark tid range list as not computed yet
 	 */
-	tidstate->tss_TidList = NULL;
-	tidstate->tss_NumTids = 0;
+	tidstate->tss_TidRanges = NULL;
+	tidstate->tss_NumRanges = 0;
 	tidstate->tss_TidPtr = -1;
+	tidstate->tss_inScan = false;
 
 	/*
 	 * open the base relation and acquire appropriate lock on it.
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 7c8220c..5f84984 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -583,6 +583,7 @@ _copyTidScan(const TidScan *from)
 	 * copy remainder of node
 	 */
 	COPY_NODE_FIELD(tidquals);
+	COPY_SCALAR_FIELD(direction);
 
 	return newnode;
 }
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 93f1e2c..e20ef0e 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -619,6 +619,7 @@ _outTidScan(StringInfo str, const TidScan *node)
 	_outScanInfo(str, (const Scan *) node);
 
 	WRITE_NODE_FIELD(tidquals);
+	WRITE_ENUM_FIELD(direction, ScanDirection);
 }
 
 static void
@@ -1895,6 +1896,7 @@ _outTidPath(StringInfo str, const TidPath *node)
 	_outPathInfo(str, (const Path *) node);
 
 	WRITE_NODE_FIELD(tidquals);
+	WRITE_ENUM_FIELD(direction, ScanDirection);
 }
 
 static void
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 519deab..79de340 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1848,6 +1848,7 @@ _readTidScan(void)
 	ReadCommonScan(&local_node->scan);
 
 	READ_NODE_FIELD(tidquals);
+	READ_ENUM_FIELD(direction, ScanDirection);
 
 	READ_DONE();
 }
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 7bf67a0..72b4fc6 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -1185,8 +1185,11 @@ cost_tidscan(Path *path, PlannerInfo *root,
 	Cost		cpu_per_tuple;
 	QualCost	tid_qual_cost;
 	int			ntuples;
+	int			nrandompages;
+	int			nseqpages;
 	ListCell   *l;
 	double		spc_random_page_cost;
+	double		spc_seq_page_cost;
 
 	/* Should only be applied to base relations */
 	Assert(baserel->relid > 0);
@@ -1200,6 +1203,8 @@ cost_tidscan(Path *path, PlannerInfo *root,
 
 	/* Count how many tuples we expect to retrieve */
 	ntuples = 0;
+	nrandompages = 0;
+	nseqpages = 0;
 	foreach(l, tidquals)
 	{
 		if (IsA(lfirst(l), ScalarArrayOpExpr))
@@ -1207,19 +1212,37 @@ cost_tidscan(Path *path, PlannerInfo *root,
 			/* Each element of the array yields 1 tuple */
 			ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) lfirst(l);
 			Node	   *arraynode = (Node *) lsecond(saop->args);
+			int			array_len = estimate_array_length(arraynode);
 
-			ntuples += estimate_array_length(arraynode);
+			ntuples += array_len;
+			nrandompages += array_len;
 		}
 		else if (IsA(lfirst(l), CurrentOfExpr))
 		{
 			/* CURRENT OF yields 1 tuple */
 			isCurrentOf = true;
 			ntuples++;
+			nrandompages++;
 		}
 		else
 		{
-			/* It's just CTID = something, count 1 tuple */
-			ntuples++;
+			/*
+			 * For anything else, we'll use the normal selectivity estimate.
+			 * Count the first page as a random page, the rest as sequential.
+			 */
+			Selectivity selectivity = clause_selectivity(root, lfirst(l),
+														 baserel->relid,
+														 JOIN_INNER,
+														 NULL);
+			BlockNumber pages = selectivity * baserel->pages;
+
+			if (pages <= 0)
+				pages = 1;
+
+			/* TODO decide what the costs should be */
+			ntuples += selectivity * baserel->tuples;
+			nseqpages += pages - 1;
+			nrandompages++;
 		}
 	}
 
@@ -1248,10 +1271,10 @@ cost_tidscan(Path *path, PlannerInfo *root,
 	/* fetch estimated page cost for tablespace containing table */
 	get_tablespace_page_costs(baserel->reltablespace,
 							  &spc_random_page_cost,
-							  NULL);
+							  &spc_seq_page_cost);
 
-	/* disk costs --- assume each tuple on a different page */
-	run_cost += spc_random_page_cost * ntuples;
+	/* disk costs */
+	run_cost += spc_random_page_cost * nrandompages + spc_seq_page_cost * nseqpages;
 
 	/* Add scanning CPU costs */
 	get_restriction_qual_cost(root, baserel, param_info, &qpqual_cost);
diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index ec66cb9..b847151 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -18,6 +18,9 @@
 #include "postgres.h"
 
 #include "access/stratnum.h"
+#include "access/sysattr.h"
+#include "catalog/pg_operator.h"
+#include "catalog/pg_type.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
 #include "nodes/plannodes.h"
@@ -848,6 +851,22 @@ build_join_pathkeys(PlannerInfo *root,
 	return truncate_useless_pathkeys(root, joinrel, outer_pathkeys);
 }
 
+/*
+ * build_tidscan_pathkeys
+ *	  Build the path keys corresponding to ORDER BY ctid ASC|DESC.
+ */
+List *
+build_tidscan_pathkeys(PlannerInfo *root,
+					   RelOptInfo *rel,
+					   ScanDirection direction)
+{
+	Oid			opno = (direction == ForwardScanDirection) ? TIDLessOperator : TIDGreaterOperator;
+	Var		   *varexpr = makeVar(rel->relid, SelfItemPointerAttributeNumber, TIDOID, -1, InvalidOid, 0);
+	List	   *pathkeys = build_expression_pathkey(root, (Expr *) varexpr, NULL, opno, rel->relids, true);
+
+	return pathkeys;
+}
+
 /****************************************************************************
  *		PATHKEYS AND SORT CLAUSES
  ****************************************************************************/
diff --git a/src/backend/optimizer/path/tidpath.c b/src/backend/optimizer/path/tidpath.c
index 3bb5b8d..8839aed 100644
--- a/src/backend/optimizer/path/tidpath.c
+++ b/src/backend/optimizer/path/tidpath.c
@@ -4,13 +4,16 @@
  *	  Routines to determine which TID conditions are usable for scanning
  *	  a given relation, and create TidPaths accordingly.
  *
- * What we are looking for here is WHERE conditions of the form
- * "CTID = pseudoconstant", which can be implemented by just fetching
- * the tuple directly via heap_fetch().  We can also handle OR'd conditions
- * such as (CTID = const1) OR (CTID = const2), as well as ScalarArrayOpExpr
- * conditions of the form CTID = ANY(pseudoconstant_array).  In particular
- * this allows
- *		WHERE ctid IN (tid1, tid2, ...)
+ * What we are looking for here is WHERE conditions of the forms:
+ * - "CTID = pseudoconstant", which can be implemented by just fetching
+ *    the tuple directly via heap_fetch().
+ * - "CTID IN (pseudoconstant, ...)" or "CTID = ANY(pseudoconstant_array)"
+ * - "CTID > pseudoconstant", etc. for >, >=, <, and <=.
+ * - "CTID > pseudoconstant AND CTID < pseudoconstant", etc., with up to one
+ *   lower bound and one upper bound.
+ *
+ * We can also handle OR'd conditions of the above forms, such as
+ * "(CTID = const1) OR (CTID >= const2) OR CTID IN (...)".
  *
  * We also support "WHERE CURRENT OF cursor" conditions (CurrentOfExpr),
  * which amount to "CTID = run-time-determined-TID".  These could in
@@ -46,32 +49,46 @@
 #include "optimizer/restrictinfo.h"
 
 
-static bool IsTidEqualClause(OpExpr *node, int varno);
+static bool IsTidVar(Var *var, int varno);
+static bool IsTidComparison(OpExpr *node, int varno, Oid expected_comparison_operator);
 static bool IsTidEqualAnyClause(ScalarArrayOpExpr *node, int varno);
+static bool IsUsableRangeQual(Node *expr, int varno, bool want_lower_bound);
+static List *MakeTidRangeQuals(List *quals);
+static List *TidCompoundRangeQualFromExpr(Node *expr, int varno);
 static List *TidQualFromExpr(Node *expr, int varno);
 static List *TidQualFromBaseRestrictinfo(RelOptInfo *rel);
 
 
+static bool
+IsTidVar(Var *var, int varno)
+{
+	return (var->varattno == SelfItemPointerAttributeNumber &&
+			var->vartype == TIDOID &&
+			var->varno == varno &&
+			var->varlevelsup == 0);
+}
+
 /*
  * Check to see if an opclause is of the form
- *		CTID = pseudoconstant
+ *		CTID OP pseudoconstant
  * or
- *		pseudoconstant = CTID
+ *		pseudoconstant OP CTID
+ * where OP is the expected comparison operator.
  *
  * We check that the CTID Var belongs to relation "varno".  That is probably
  * redundant considering this is only applied to restriction clauses, but
  * let's be safe.
  */
 static bool
-IsTidEqualClause(OpExpr *node, int varno)
+IsTidComparison(OpExpr *node, int varno, Oid expected_comparison_operator)
 {
 	Node	   *arg1,
 			   *arg2,
 			   *other;
 	Var		   *var;
 
-	/* Operator must be tideq */
-	if (node->opno != TIDEqualOperator)
+	/* Operator must be the expected one */
+	if (node->opno != expected_comparison_operator)
 		return false;
 	if (list_length(node->args) != 2)
 		return false;
@@ -83,19 +100,13 @@ IsTidEqualClause(OpExpr *node, int varno)
 	if (arg1 && IsA(arg1, Var))
 	{
 		var = (Var *) arg1;
-		if (var->varattno == SelfItemPointerAttributeNumber &&
-			var->vartype == TIDOID &&
-			var->varno == varno &&
-			var->varlevelsup == 0)
+		if (IsTidVar(var, varno))
 			other = arg2;
 	}
 	if (!other && arg2 && IsA(arg2, Var))
 	{
 		var = (Var *) arg2;
-		if (var->varattno == SelfItemPointerAttributeNumber &&
-			var->vartype == TIDOID &&
-			var->varno == varno &&
-			var->varlevelsup == 0)
+		if (IsTidVar(var, varno))
 			other = arg1;
 	}
 	if (!other)
@@ -110,6 +121,17 @@ IsTidEqualClause(OpExpr *node, int varno)
 	return true;				/* success */
 }
 
+#define IsTidEqualClause(node, varno)	IsTidComparison(node, varno, TIDEqualOperator)
+#define IsTidLTClause(node, varno)		IsTidComparison(node, varno, TIDLessOperator)
+#define IsTidLEClause(node, varno)		IsTidComparison(node, varno, TIDLessEqOperator)
+#define IsTidGTClause(node, varno)		IsTidComparison(node, varno, TIDGreaterOperator)
+#define IsTidGEClause(node, varno)		IsTidComparison(node, varno, TIDGreaterEqOperator)
+
+#define IsTidRangeClause(node, varno)	(IsTidLTClause(node, varno) || \
+										 IsTidLEClause(node, varno) || \
+										 IsTidGTClause(node, varno) || \
+										 IsTidGEClause(node, varno))
+
 /*
  * Check to see if a clause is of the form
  *		CTID = ANY (pseudoconstant_array)
@@ -134,10 +156,7 @@ IsTidEqualAnyClause(ScalarArrayOpExpr *node, int varno)
 	{
 		Var		   *var = (Var *) arg1;
 
-		if (var->varattno == SelfItemPointerAttributeNumber &&
-			var->vartype == TIDOID &&
-			var->varno == varno &&
-			var->varlevelsup == 0)
+		if (IsTidVar(var, varno))
 		{
 			/* The other argument must be a pseudoconstant */
 			if (is_pseudo_constant_clause(arg2))
@@ -149,6 +168,76 @@ IsTidEqualAnyClause(ScalarArrayOpExpr *node, int varno)
 }
 
 /*
+ * IsUsableRangeQual
+ *		Check if the expr is a range qual of the expected type.
+ */
+static bool
+IsUsableRangeQual(Node *expr, int varno, bool want_lower_bound)
+{
+	if (is_opclause(expr) && IsTidRangeClause((OpExpr *) expr, varno))
+	{
+		bool		is_lower_bound = IsTidGTClause((OpExpr *) expr, varno) || IsTidGEClause((OpExpr *) expr, varno);
+		Node	   *leftop = get_leftop((Expr *) expr);
+
+		if (!IsA(leftop, Var) || !IsTidVar((Var *) leftop, varno))
+			is_lower_bound = !is_lower_bound;
+
+		if (is_lower_bound == want_lower_bound)
+			return true;
+	}
+
+	return false;
+}
+
+static List *
+MakeTidRangeQuals(List *quals)
+{
+	if (list_length(quals) == 1)
+		return quals;
+	else
+		return list_make1(make_andclause(quals));
+}
+
+/*
+ * TidCompoundRangeQualFromExpr
+ *
+ * 		Extract a compound CTID range condition from the given qual expression
+ */
+static List *
+TidCompoundRangeQualFromExpr(Node *expr, int varno)
+{
+	List	   *rlst = NIL;
+	ListCell   *l;
+	bool		found_lower = false;
+	bool		found_upper = false;
+	List	   *found_quals = NIL;
+
+	foreach(l, ((BoolExpr *) expr)->args)
+	{
+		Node	   *clause = (Node *) lfirst(l);
+
+		/* Check if this clause contains a range qual */
+		if (!found_lower && IsUsableRangeQual(clause, varno, true))
+		{
+			found_lower = true;
+			found_quals = lappend(found_quals, clause);
+		}
+
+		if (!found_upper && IsUsableRangeQual(clause, varno, false))
+		{
+			found_upper = true;
+			found_quals = lappend(found_quals, clause);
+		}
+	}
+
+	/* If one or both range quals were found, use them. */
+	if (found_quals)
+		rlst = MakeTidRangeQuals(found_quals);
+
+	return rlst;
+}
+
+/*
  *	Extract a set of CTID conditions from the given qual expression
  *
  *	Returns a List of CTID qual expressions (with implicit OR semantics
@@ -174,6 +263,8 @@ TidQualFromExpr(Node *expr, int varno)
 		/* base case: check for tideq opclause */
 		if (IsTidEqualClause((OpExpr *) expr, varno))
 			rlst = list_make1(expr);
+		else if (IsTidRangeClause((OpExpr *) expr, varno))
+			rlst = list_make1(expr);
 	}
 	else if (expr && IsA(expr, ScalarArrayOpExpr))
 	{
@@ -189,11 +280,18 @@ TidQualFromExpr(Node *expr, int varno)
 	}
 	else if (and_clause(expr))
 	{
-		foreach(l, ((BoolExpr *) expr)->args)
+		/* look for a range qual in the clause */
+		rlst = TidCompoundRangeQualFromExpr(expr, varno);
+
+		/* if no range qual was found, look for any other TID qual */
+		if (!rlst)
 		{
-			rlst = TidQualFromExpr((Node *) lfirst(l), varno);
-			if (rlst)
-				break;
+			foreach(l, ((BoolExpr *) expr)->args)
+			{
+				rlst = TidQualFromExpr((Node *) lfirst(l), varno);
+				if (rlst)
+					break;
+			}
 		}
 	}
 	else if (or_clause(expr))
@@ -217,17 +315,28 @@ TidQualFromExpr(Node *expr, int varno)
 }
 
 /*
- *	Extract a set of CTID conditions from the rel's baserestrictinfo list
+ * Extract a set of CTID conditions from the rel's baserestrictinfo list
+ *
+ * Normally we just use the first RestrictInfo item with some usable quals,
+ * but it's also possible for a good compound range qual, such as
+ * "CTID > ? AND CTID < ?", to be split across two items.  So we look for
+ * lower/upper bound range quals in all items and use them if any were found.
+ * In principle there might be more than one lower or upper bound, but we
+ * just use the first one found of each type.
  */
 static List *
 TidQualFromBaseRestrictinfo(RelOptInfo *rel)
 {
 	List	   *rlst = NIL;
 	ListCell   *l;
+	bool		found_lower = false;
+	bool		found_upper = false;
+	List	   *found_quals = NIL;
 
 	foreach(l, rel->baserestrictinfo)
 	{
 		RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
+		Node	   *clause = (Node *) rinfo->clause;
 
 		/*
 		 * If clause must wait till after some lower-security-level
@@ -236,10 +345,31 @@ TidQualFromBaseRestrictinfo(RelOptInfo *rel)
 		if (!restriction_is_securely_promotable(rinfo, rel))
 			continue;
 
-		rlst = TidQualFromExpr((Node *) rinfo->clause, rel->relid);
+		/* Look for lower and upper bound range quals. */
+		if (!found_lower && IsUsableRangeQual((Node *) clause, rel->relid, true))
+		{
+			found_lower = true;
+			found_quals = lappend(found_quals, clause);
+			continue;
+		}
+
+		if (!found_upper && IsUsableRangeQual((Node *) clause, rel->relid, false))
+		{
+			found_upper = true;
+			found_quals = lappend(found_quals, clause);
+			continue;
+		}
+
+		/* Look for other TID quals. */
+		rlst = TidQualFromExpr((Node *) clause, rel->relid);
 		if (rlst)
 			break;
 	}
+
+	/* Use a range qual if any were found. */
+	if (found_quals)
+		rlst = MakeTidRangeQuals(found_quals);
+
 	return rlst;
 }
 
@@ -247,12 +377,16 @@ TidQualFromBaseRestrictinfo(RelOptInfo *rel)
  * create_tidscan_paths
  *	  Create paths corresponding to direct TID scans of the given rel.
  *
+ *	  Path keys and direction will be set on the scans if it looks useful.
+ *
  *	  Candidate paths are added to the rel's pathlist (using add_path).
  */
 void
 create_tidscan_paths(PlannerInfo *root, RelOptInfo *rel)
 {
 	Relids		required_outer;
+	List	   *pathkeys = NULL;
+	ScanDirection direction = ForwardScanDirection;
 	List	   *tidquals;
 
 	/*
@@ -262,9 +396,37 @@ create_tidscan_paths(PlannerInfo *root, RelOptInfo *rel)
 	 */
 	required_outer = rel->lateral_relids;
 
+	/*
+	 * Try to determine the best scan direction and create some useful
+	 * pathkeys.
+	 */
+	if (has_useful_pathkeys(root, rel))
+	{
+		/*
+		 * Build path keys corresponding to ORDER BY ctid ASC, and check
+		 * whether they will be useful for this scan.  If not, build path keys
+		 * for DESC, and try that; set the direction to BackwardScanDirection
+		 * if so.  If neither of them will be useful, no path keys will be
+		 * set.
+		 */
+		pathkeys = build_tidscan_pathkeys(root, rel, ForwardScanDirection);
+		if (!pathkeys_contained_in(pathkeys, root->query_pathkeys))
+		{
+			pathkeys = build_tidscan_pathkeys(root, rel, BackwardScanDirection);
+			if (pathkeys_contained_in(pathkeys, root->query_pathkeys))
+				direction = BackwardScanDirection;
+			else
+				pathkeys = NULL;
+		}
+	}
+
 	tidquals = TidQualFromBaseRestrictinfo(rel);
 
-	if (tidquals)
-		add_path(rel, (Path *) create_tidscan_path(root, rel, tidquals,
-												   required_outer));
+	/*
+	 * If there are tidquals or some useful pathkeys were found, then it's
+	 * worth generating a tidscan path.
+	 */
+	if (tidquals || pathkeys)
+		add_path(rel, (Path *) create_tidscan_path(root, rel, tidquals, pathkeys,
+												   direction, required_outer));
 }
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index ae41c9e..5452730 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -185,7 +185,7 @@ static BitmapHeapScan *make_bitmap_heapscan(List *qptlist,
 					 List *bitmapqualorig,
 					 Index scanrelid);
 static TidScan *make_tidscan(List *qptlist, List *qpqual, Index scanrelid,
-			 List *tidquals);
+			 List *tidquals, ScanDirection direction);
 static SubqueryScan *make_subqueryscan(List *qptlist,
 				  List *qpqual,
 				  Index scanrelid,
@@ -3086,6 +3086,21 @@ create_tidscan_plan(PlannerInfo *root, TidPath *best_path,
 	}
 
 	/*
+	 * In the case of a compound range qual, the two parts may have come
+	 * from different RestrictInfos.  So remove each part separately.
+	 */
+	if (list_length(tidquals) == 1)
+	{
+		Node	   *qual = linitial(tidquals);
+
+		if (and_clause(qual))
+		{
+			BoolExpr   *and_qual = ((BoolExpr *) qual);
+			scan_clauses = list_difference(scan_clauses, and_qual->args);
+		}
+	}
+
+	/*
 	 * Remove any clauses that are TID quals.  This is a bit tricky since the
 	 * tidquals list has implicit OR semantics.
 	 */
@@ -3097,7 +3112,9 @@ create_tidscan_plan(PlannerInfo *root, TidPath *best_path,
 	scan_plan = make_tidscan(tlist,
 							 scan_clauses,
 							 scan_relid,
-							 tidquals);
+							 tidquals,
+							 best_path->direction
+		);
 
 	copy_generic_path_info(&scan_plan->scan.plan, &best_path->path);
 
@@ -5179,7 +5196,8 @@ static TidScan *
 make_tidscan(List *qptlist,
 			 List *qpqual,
 			 Index scanrelid,
-			 List *tidquals)
+			 List *tidquals,
+			 ScanDirection direction)
 {
 	TidScan    *node = makeNode(TidScan);
 	Plan	   *plan = &node->scan.plan;
@@ -5190,6 +5208,7 @@ make_tidscan(List *qptlist,
 	plan->righttree = NULL;
 	node->scan.scanrelid = scanrelid;
 	node->tidquals = tidquals;
+	node->direction = direction;
 
 	return node;
 }
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index c5aaaf5..e2d51a9 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1186,6 +1186,7 @@ create_bitmap_or_path(PlannerInfo *root,
  */
 TidPath *
 create_tidscan_path(PlannerInfo *root, RelOptInfo *rel, List *tidquals,
+					List *pathkeys, ScanDirection direction,
 					Relids required_outer)
 {
 	TidPath    *pathnode = makeNode(TidPath);
@@ -1198,9 +1199,10 @@ create_tidscan_path(PlannerInfo *root, RelOptInfo *rel, List *tidquals,
 	pathnode->path.parallel_aware = false;
 	pathnode->path.parallel_safe = rel->consider_parallel;
 	pathnode->path.parallel_workers = 0;
-	pathnode->path.pathkeys = NIL;	/* always unordered */
+	pathnode->path.pathkeys = pathkeys;
 
 	pathnode->tidquals = tidquals;
+	pathnode->direction = direction;
 
 	cost_tidscan(&pathnode->path, root, rel, tidquals,
 				 pathnode->path.param_info);
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index b8c0e03..eaacab7 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -572,6 +572,30 @@ scalarineqsel(PlannerInfo *root, Oid operator, bool isgt, bool iseq,
 
 	if (!HeapTupleIsValid(vardata->statsTuple))
 	{
+		/*
+		 * There are no stats for system columns, but for CTID we can estimate
+		 * based on table size.
+		 */
+		if (vardata->var && IsA(vardata->var, Var) &&
+			((Var *) vardata->var)->varattno == SelfItemPointerAttributeNumber)
+		{
+			ItemPointer itemptr;
+			BlockNumber block;
+
+			/* If the relation's empty, we're going to read all of it. */
+			if (vardata->rel->pages == 0)
+				return 1.0;
+
+			itemptr = (ItemPointer) DatumGetPointer(constval);
+			block = ItemPointerGetBlockNumberNoCheck(itemptr);
+			selec = block / (double) vardata->rel->pages;
+			if (isgt)
+				selec = 1.0 - selec;
+
+			CLAMP_PROBABILITY(selec);
+			return selec;
+		}
+
 		/* no stats available, so default result */
 		return DEFAULT_INEQ_SEL;
 	}
@@ -1786,6 +1810,15 @@ nulltestsel(PlannerInfo *root, NullTestType nulltesttype, Node *arg,
 				return (Selectivity) 0; /* keep compiler quiet */
 		}
 	}
+	else if (vardata.var && IsA(vardata.var, Var) &&
+			 ((Var *) vardata.var)->varattno == SelfItemPointerAttributeNumber)
+	{
+		/*
+		 * There are no stats for system columns, but we know CTID is never
+		 * NULL.
+		 */
+		selec = (nulltesttype == IS_NULL) ? 0.0 : 1.0;
+	}
 	else
 	{
 		/*
diff --git a/src/include/catalog/pg_operator.dat b/src/include/catalog/pg_operator.dat
index d9b6bad..cdd2cd3 100644
--- a/src/include/catalog/pg_operator.dat
+++ b/src/include/catalog/pg_operator.dat
@@ -156,15 +156,15 @@
   oprname => '<', oprleft => 'tid', oprright => 'tid', oprresult => 'bool',
   oprcom => '>(tid,tid)', oprnegate => '>=(tid,tid)', oprcode => 'tidlt',
   oprrest => 'scalarltsel', oprjoin => 'scalarltjoinsel' },
-{ oid => '2800', descr => 'greater than',
+{ oid => '2800', oid_symbol => 'TIDGreaterOperator', descr => 'greater than',
   oprname => '>', oprleft => 'tid', oprright => 'tid', oprresult => 'bool',
   oprcom => '<(tid,tid)', oprnegate => '<=(tid,tid)', oprcode => 'tidgt',
   oprrest => 'scalargtsel', oprjoin => 'scalargtjoinsel' },
-{ oid => '2801', descr => 'less than or equal',
+{ oid => '2801', oid_symbol => 'TIDLessEqOperator', descr => 'less than or equal',
   oprname => '<=', oprleft => 'tid', oprright => 'tid', oprresult => 'bool',
   oprcom => '>=(tid,tid)', oprnegate => '>(tid,tid)', oprcode => 'tidle',
   oprrest => 'scalarlesel', oprjoin => 'scalarlejoinsel' },
-{ oid => '2802', descr => 'greater than or equal',
+{ oid => '2802', oid_symbol => 'TIDGreaterEqOperator', descr => 'greater than or equal',
   oprname => '>=', oprleft => 'tid', oprright => 'tid', oprresult => 'bool',
   oprcom => '<=(tid,tid)', oprnegate => '<(tid,tid)', oprcode => 'tidge',
   oprrest => 'scalargesel', oprjoin => 'scalargejoinsel' },
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 03ad516..ee6a04d 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1477,6 +1477,8 @@ typedef struct BitmapHeapScanState
 	ParallelBitmapHeapState *pstate;
 } BitmapHeapScanState;
 
+typedef struct TidRange TidRange;
+
 /* ----------------
  *	 TidScanState information
  *
@@ -1493,10 +1495,11 @@ typedef struct TidScanState
 	ScanState	ss;				/* its first field is NodeTag */
 	List	   *tss_tidexprs;
 	bool		tss_isCurrentOf;
-	int			tss_NumTids;
+	int			tss_NumRanges;
 	int			tss_TidPtr;
-	ItemPointerData *tss_TidList;
-	HeapTupleData tss_htup;
+	TidRange   *tss_TidRanges;
+	bool		tss_inScan;
+	HeapTupleData tss_htup;		/* for current-of and single TID fetches */
 } TidScanState;
 
 /* ----------------
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 7c2abbd..96d30aa 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -492,6 +492,7 @@ typedef struct TidScan
 {
 	Scan		scan;
 	List	   *tidquals;		/* qual(s) involving CTID = something */
+	ScanDirection direction;
 } TidScan;
 
 /* ----------------
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index adb4265..2fee1e1 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -1229,14 +1229,24 @@ typedef struct BitmapOrPath
 /*
  * TidPath represents a scan by TID
  *
- * tidquals is an implicitly OR'ed list of qual expressions of the form
- * "CTID = pseudoconstant" or "CTID = ANY(pseudoconstant_array)".
+ * tidquals is an implicitly OR'ed list of qual expressions of the forms:
+ *   - "CTID = pseudoconstant"
+ *   - "CTID = ANY(pseudoconstant_array)"
+ *   - "CURRENT OF cursor"
+ *   - "CTID relop pseudoconstant"
+ *   - "(CTID relop pseudoconstant) AND (CTID relop pseudoconstant)"
+ *
+ * It is permissible for the CTID variable to be the LHS or RHS of operator
+ * expressions; in the last case, there is always a lower bound and upper bound,
+ * in any order.  If tidquals is empty, all CTIDs will match.
+ *
  * Note they are bare expressions, not RestrictInfos.
  */
 typedef struct TidPath
 {
 	Path		path;
-	List	   *tidquals;		/* qual(s) involving CTID = something */
+	List	   *tidquals;
+	ScanDirection direction;
 } TidPath;
 
 /*
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 7c5ff22..a0a88a5 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -63,7 +63,8 @@ extern BitmapOrPath *create_bitmap_or_path(PlannerInfo *root,
 					  RelOptInfo *rel,
 					  List *bitmapquals);
 extern TidPath *create_tidscan_path(PlannerInfo *root, RelOptInfo *rel,
-					List *tidquals, Relids required_outer);
+					List *tidquals, List *pathkeys, ScanDirection direction,
+					Relids required_outer);
 extern AppendPath *create_append_path(PlannerInfo *root, RelOptInfo *rel,
 				   List *subpaths, List *partial_subpaths,
 				   Relids required_outer,
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index cafde30..9d0699e 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -211,6 +211,9 @@ extern List *build_join_pathkeys(PlannerInfo *root,
 					RelOptInfo *joinrel,
 					JoinType jointype,
 					List *outer_pathkeys);
+extern List *build_tidscan_pathkeys(PlannerInfo *root,
+					   RelOptInfo *rel,
+					   ScanDirection direction);
 extern List *make_pathkeys_for_sortclauses(PlannerInfo *root,
 							  List *sortclauses,
 							  List *tlist);
diff --git a/src/test/regress/expected/tidscan.out b/src/test/regress/expected/tidscan.out
index 521ed1b..4b9564b 100644
--- a/src/test/regress/expected/tidscan.out
+++ b/src/test/regress/expected/tidscan.out
@@ -116,6 +116,39 @@ FETCH FIRST FROM c;
 (1 row)
 
 ROLLBACK;
+-- check that ordering on a tidscan doesn't require a sort
+EXPLAIN (COSTS OFF)
+SELECT ctid, * FROM tidscan WHERE ctid = ANY(ARRAY['(0,2)', '(0,1)', '(0,3)']::tid[]) ORDER BY ctid;
+                          QUERY PLAN                           
+---------------------------------------------------------------
+ Tid Scan on tidscan
+   TID Cond: (ctid = ANY ('{"(0,2)","(0,1)","(0,3)"}'::tid[]))
+(2 rows)
+
+SELECT ctid, * FROM tidscan WHERE ctid = ANY(ARRAY['(0,2)', '(0,1)', '(0,3)']::tid[]) ORDER BY ctid;
+ ctid  | id 
+-------+----
+ (0,1) |  1
+ (0,2) |  2
+ (0,3) |  3
+(3 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid, * FROM tidscan WHERE ctid = ANY(ARRAY['(0,2)', '(0,1)', '(0,3)']::tid[]) ORDER BY ctid DESC;
+                          QUERY PLAN                           
+---------------------------------------------------------------
+ Tid Scan Backward on tidscan
+   TID Cond: (ctid = ANY ('{"(0,2)","(0,1)","(0,3)"}'::tid[]))
+(2 rows)
+
+SELECT ctid, * FROM tidscan WHERE ctid = ANY(ARRAY['(0,2)', '(0,1)', '(0,3)']::tid[]) ORDER BY ctid DESC;
+ ctid  | id 
+-------+----
+ (0,3) |  3
+ (0,2) |  2
+ (0,1) |  1
+(3 rows)
+
 -- tidscan via CURRENT OF
 BEGIN;
 DECLARE c CURSOR FOR SELECT ctid, * FROM tidscan;
@@ -177,3 +210,315 @@ UPDATE tidscan SET id = -id WHERE CURRENT OF c RETURNING *;
 ERROR:  cursor "c" is not positioned on a row
 ROLLBACK;
 DROP TABLE tidscan;
+-- tests for tidrangescans
+CREATE TABLE tidrangescan(id integer, data text);
+INSERT INTO tidrangescan SELECT i,'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' FROM generate_series(1,1000) AS s(i);
+DELETE FROM tidrangescan WHERE substring(ctid::text from ',(\d+)\)')::integer > 10 OR substring(ctid::text from '\((\d+),')::integer >= 10;;
+VACUUM tidrangescan;
+-- range scans with upper bound
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan WHERE ctid < '(1,0)';
+            QUERY PLAN             
+-----------------------------------
+ Tid Scan on tidrangescan
+   TID Cond: (ctid < '(1,0)'::tid)
+(2 rows)
+
+SELECT ctid, data FROM tidrangescan WHERE ctid < '(1,0)';
+  ctid  |                                       data                                       
+--------+----------------------------------------------------------------------------------
+ (0,1)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (0,2)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (0,3)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (0,4)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (0,5)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (0,6)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (0,7)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (0,8)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (0,9)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (0,10) | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+(10 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan WHERE ctid <= '(1,5)';
+             QUERY PLAN             
+------------------------------------
+ Tid Scan on tidrangescan
+   TID Cond: (ctid <= '(1,5)'::tid)
+(2 rows)
+
+SELECT ctid, data FROM tidrangescan WHERE ctid <= '(1,5)';
+  ctid  |                                       data                                       
+--------+----------------------------------------------------------------------------------
+ (0,1)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (0,2)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (0,3)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (0,4)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (0,5)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (0,6)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (0,7)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (0,8)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (0,9)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (0,10) | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (1,1)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (1,2)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (1,3)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (1,4)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (1,5)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+(15 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan WHERE ctid < '(0,0)';
+            QUERY PLAN             
+-----------------------------------
+ Tid Scan on tidrangescan
+   TID Cond: (ctid < '(0,0)'::tid)
+(2 rows)
+
+SELECT ctid, data FROM tidrangescan WHERE ctid < '(0,0)';
+ ctid | data 
+------+------
+(0 rows)
+
+-- range scans with lower bound
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan WHERE ctid > '(9,8)';
+            QUERY PLAN             
+-----------------------------------
+ Tid Scan on tidrangescan
+   TID Cond: (ctid > '(9,8)'::tid)
+(2 rows)
+
+SELECT ctid, data FROM tidrangescan WHERE ctid > '(9,8)';
+  ctid  |                                       data                                       
+--------+----------------------------------------------------------------------------------
+ (9,9)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (9,10) | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+(2 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan WHERE '(9,8)' < ctid;
+            QUERY PLAN             
+-----------------------------------
+ Tid Scan on tidrangescan
+   TID Cond: ('(9,8)'::tid < ctid)
+(2 rows)
+
+SELECT ctid, data FROM tidrangescan WHERE '(9,8)' < ctid;
+  ctid  |                                       data                                       
+--------+----------------------------------------------------------------------------------
+ (9,9)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (9,10) | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+(2 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan WHERE ctid >= '(9,8)';
+             QUERY PLAN             
+------------------------------------
+ Tid Scan on tidrangescan
+   TID Cond: (ctid >= '(9,8)'::tid)
+(2 rows)
+
+SELECT ctid, data FROM tidrangescan WHERE ctid >= '(9,8)';
+  ctid  |                                       data                                       
+--------+----------------------------------------------------------------------------------
+ (9,8)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (9,9)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (9,10) | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+(3 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan WHERE ctid >= '(100,0)';
+              QUERY PLAN              
+--------------------------------------
+ Tid Scan on tidrangescan
+   TID Cond: (ctid >= '(100,0)'::tid)
+(2 rows)
+
+SELECT ctid, data FROM tidrangescan WHERE ctid >= '(100,0)';
+ ctid | data 
+------+------
+(0 rows)
+
+-- range scans with both bounds
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan WHERE ctid > '(4,4)' AND '(4,7)' >= ctid;
+                           QUERY PLAN                           
+----------------------------------------------------------------
+ Tid Scan on tidrangescan
+   TID Cond: ((ctid > '(4,4)'::tid) AND ('(4,7)'::tid >= ctid))
+(2 rows)
+
+SELECT ctid, data FROM tidrangescan WHERE ctid > '(4,4)' AND '(4,7)' >= ctid;
+ ctid  |                                       data                                       
+-------+----------------------------------------------------------------------------------
+ (4,5) | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (4,6) | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (4,7) | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+(3 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan WHERE '(4,7)' >= ctid AND ctid > '(4,4)';
+                           QUERY PLAN                           
+----------------------------------------------------------------
+ Tid Scan on tidrangescan
+   TID Cond: (('(4,7)'::tid >= ctid) AND (ctid > '(4,4)'::tid))
+(2 rows)
+
+SELECT ctid, data FROM tidrangescan WHERE '(4,7)' >= ctid AND ctid > '(4,4)';
+ ctid  |                                       data                                       
+-------+----------------------------------------------------------------------------------
+ (4,5) | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (4,6) | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (4,7) | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+(3 rows)
+
+-- combinations
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan WHERE '(4,7)' >= ctid AND ctid > '(4,4)' OR ctid = '(2,2)';
+                                        QUERY PLAN                                         
+-------------------------------------------------------------------------------------------
+ Tid Scan on tidrangescan
+   TID Cond: ((('(4,7)'::tid >= ctid) AND (ctid > '(4,4)'::tid)) OR (ctid = '(2,2)'::tid))
+(2 rows)
+
+SELECT ctid, data FROM tidrangescan WHERE '(4,7)' >= ctid AND ctid > '(4,4)' OR ctid = '(2,2)';
+ ctid  |                                       data                                       
+-------+----------------------------------------------------------------------------------
+ (2,2) | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (4,5) | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (4,6) | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (4,7) | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+(4 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan WHERE '(4,7)' >= ctid AND ctid > '(4,4)' OR ctid = '(2,2)' AND data = 'foo';
+                                                     QUERY PLAN                                                     
+--------------------------------------------------------------------------------------------------------------------
+ Tid Scan on tidrangescan
+   TID Cond: ((('(4,7)'::tid >= ctid) AND (ctid > '(4,4)'::tid)) OR (ctid = '(2,2)'::tid))
+   Filter: ((('(4,7)'::tid >= ctid) AND (ctid > '(4,4)'::tid)) OR ((ctid = '(2,2)'::tid) AND (data = 'foo'::text)))
+(3 rows)
+
+SELECT ctid, data FROM tidrangescan WHERE '(4,7)' >= ctid AND ctid > '(4,4)' OR ctid = '(2,2)' AND data = 'foo';
+ ctid  |                                       data                                       
+-------+----------------------------------------------------------------------------------
+ (4,5) | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (4,6) | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (4,7) | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+(3 rows)
+
+-- ordering with no quals should use tid range scan
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan ORDER BY ctid ASC;
+        QUERY PLAN        
+--------------------------
+ Tid Scan on tidrangescan
+(1 row)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan ORDER BY ctid DESC;
+            QUERY PLAN             
+-----------------------------------
+ Tid Scan Backward on tidrangescan
+(1 row)
+
+-- min/max
+EXPLAIN (COSTS OFF)
+SELECT MIN(ctid) FROM tidrangescan;
+                 QUERY PLAN                 
+--------------------------------------------
+ Result
+   InitPlan 1 (returns $0)
+     ->  Limit
+           ->  Tid Scan on tidrangescan
+                 Filter: (ctid IS NOT NULL)
+(5 rows)
+
+SELECT MIN(ctid) FROM tidrangescan;
+  min  
+-------
+ (0,1)
+(1 row)
+
+EXPLAIN (COSTS OFF)
+SELECT MAX(ctid) FROM tidrangescan;
+                   QUERY PLAN                    
+-------------------------------------------------
+ Result
+   InitPlan 1 (returns $0)
+     ->  Limit
+           ->  Tid Scan Backward on tidrangescan
+                 Filter: (ctid IS NOT NULL)
+(5 rows)
+
+SELECT MAX(ctid) FROM tidrangescan;
+  max   
+--------
+ (9,10)
+(1 row)
+
+EXPLAIN (COSTS OFF)
+SELECT MIN(ctid) FROM tidrangescan WHERE ctid > '(5,0)';
+                   QUERY PLAN                    
+-------------------------------------------------
+ Result
+   InitPlan 1 (returns $0)
+     ->  Limit
+           ->  Tid Scan on tidrangescan
+                 TID Cond: (ctid > '(5,0)'::tid)
+                 Filter: (ctid IS NOT NULL)
+(6 rows)
+
+SELECT MIN(ctid) FROM tidrangescan WHERE ctid > '(5,0)';
+  min  
+-------
+ (5,1)
+(1 row)
+
+EXPLAIN (COSTS OFF)
+SELECT MAX(ctid) FROM tidrangescan WHERE ctid < '(5,0)';
+                   QUERY PLAN                    
+-------------------------------------------------
+ Result
+   InitPlan 1 (returns $0)
+     ->  Limit
+           ->  Tid Scan Backward on tidrangescan
+                 TID Cond: (ctid < '(5,0)'::tid)
+                 Filter: (ctid IS NOT NULL)
+(6 rows)
+
+SELECT MAX(ctid) FROM tidrangescan WHERE ctid < '(5,0)';
+  max   
+--------
+ (4,10)
+(1 row)
+
+-- empty table
+CREATE TABLE tidrangescan_empty(id integer, data text);
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan_empty WHERE ctid < '(1, 0)';
+            QUERY PLAN             
+-----------------------------------
+ Tid Scan on tidrangescan_empty
+   TID Cond: (ctid < '(1,0)'::tid)
+(2 rows)
+
+SELECT ctid, data FROM tidrangescan_empty WHERE ctid < '(1, 0)';
+ ctid | data 
+------+------
+(0 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan_empty WHERE ctid > '(9, 0)';
+            QUERY PLAN             
+-----------------------------------
+ Tid Scan on tidrangescan_empty
+   TID Cond: (ctid > '(9,0)'::tid)
+(2 rows)
+
+SELECT ctid, data FROM tidrangescan_empty WHERE ctid > '(9, 0)';
+ ctid | data 
+------+------
+(0 rows)
+
diff --git a/src/test/regress/sql/tidscan.sql b/src/test/regress/sql/tidscan.sql
index a8472e0..e9519ee 100644
--- a/src/test/regress/sql/tidscan.sql
+++ b/src/test/regress/sql/tidscan.sql
@@ -43,6 +43,15 @@ FETCH BACKWARD 1 FROM c;
 FETCH FIRST FROM c;
 ROLLBACK;
 
+-- check that ordering on a tidscan doesn't require a sort
+EXPLAIN (COSTS OFF)
+SELECT ctid, * FROM tidscan WHERE ctid = ANY(ARRAY['(0,2)', '(0,1)', '(0,3)']::tid[]) ORDER BY ctid;
+SELECT ctid, * FROM tidscan WHERE ctid = ANY(ARRAY['(0,2)', '(0,1)', '(0,3)']::tid[]) ORDER BY ctid;
+
+EXPLAIN (COSTS OFF)
+SELECT ctid, * FROM tidscan WHERE ctid = ANY(ARRAY['(0,2)', '(0,1)', '(0,3)']::tid[]) ORDER BY ctid DESC;
+SELECT ctid, * FROM tidscan WHERE ctid = ANY(ARRAY['(0,2)', '(0,1)', '(0,3)']::tid[]) ORDER BY ctid DESC;
+
 -- tidscan via CURRENT OF
 BEGIN;
 DECLARE c CURSOR FOR SELECT ctid, * FROM tidscan;
@@ -64,3 +73,94 @@ UPDATE tidscan SET id = -id WHERE CURRENT OF c RETURNING *;
 ROLLBACK;
 
 DROP TABLE tidscan;
+
+-- tests for tidrangescans
+
+CREATE TABLE tidrangescan(id integer, data text);
+
+INSERT INTO tidrangescan SELECT i,'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' FROM generate_series(1,1000) AS s(i);
+DELETE FROM tidrangescan WHERE substring(ctid::text from ',(\d+)\)')::integer > 10 OR substring(ctid::text from '\((\d+),')::integer >= 10;;
+VACUUM tidrangescan;
+
+-- range scans with upper bound
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan WHERE ctid < '(1,0)';
+SELECT ctid, data FROM tidrangescan WHERE ctid < '(1,0)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan WHERE ctid <= '(1,5)';
+SELECT ctid, data FROM tidrangescan WHERE ctid <= '(1,5)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan WHERE ctid < '(0,0)';
+SELECT ctid, data FROM tidrangescan WHERE ctid < '(0,0)';
+
+-- range scans with lower bound
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan WHERE ctid > '(9,8)';
+SELECT ctid, data FROM tidrangescan WHERE ctid > '(9,8)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan WHERE '(9,8)' < ctid;
+SELECT ctid, data FROM tidrangescan WHERE '(9,8)' < ctid;
+
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan WHERE ctid >= '(9,8)';
+SELECT ctid, data FROM tidrangescan WHERE ctid >= '(9,8)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan WHERE ctid >= '(100,0)';
+SELECT ctid, data FROM tidrangescan WHERE ctid >= '(100,0)';
+
+-- range scans with both bounds
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan WHERE ctid > '(4,4)' AND '(4,7)' >= ctid;
+SELECT ctid, data FROM tidrangescan WHERE ctid > '(4,4)' AND '(4,7)' >= ctid;
+
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan WHERE '(4,7)' >= ctid AND ctid > '(4,4)';
+SELECT ctid, data FROM tidrangescan WHERE '(4,7)' >= ctid AND ctid > '(4,4)';
+
+-- combinations
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan WHERE '(4,7)' >= ctid AND ctid > '(4,4)' OR ctid = '(2,2)';
+SELECT ctid, data FROM tidrangescan WHERE '(4,7)' >= ctid AND ctid > '(4,4)' OR ctid = '(2,2)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan WHERE '(4,7)' >= ctid AND ctid > '(4,4)' OR ctid = '(2,2)' AND data = 'foo';
+SELECT ctid, data FROM tidrangescan WHERE '(4,7)' >= ctid AND ctid > '(4,4)' OR ctid = '(2,2)' AND data = 'foo';
+
+-- ordering with no quals should use tid range scan
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan ORDER BY ctid ASC;
+
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan ORDER BY ctid DESC;
+
+-- min/max
+EXPLAIN (COSTS OFF)
+SELECT MIN(ctid) FROM tidrangescan;
+SELECT MIN(ctid) FROM tidrangescan;
+
+EXPLAIN (COSTS OFF)
+SELECT MAX(ctid) FROM tidrangescan;
+SELECT MAX(ctid) FROM tidrangescan;
+
+EXPLAIN (COSTS OFF)
+SELECT MIN(ctid) FROM tidrangescan WHERE ctid > '(5,0)';
+SELECT MIN(ctid) FROM tidrangescan WHERE ctid > '(5,0)';
+
+EXPLAIN (COSTS OFF)
+SELECT MAX(ctid) FROM tidrangescan WHERE ctid < '(5,0)';
+SELECT MAX(ctid) FROM tidrangescan WHERE ctid < '(5,0)';
+
+-- empty table
+CREATE TABLE tidrangescan_empty(id integer, data text);
+
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan_empty WHERE ctid < '(1, 0)';
+SELECT ctid, data FROM tidrangescan_empty WHERE ctid < '(1, 0)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan_empty WHERE ctid > '(9, 0)';
+SELECT ctid, data FROM tidrangescan_empty WHERE ctid > '(9, 0)';

#10

David Rowley

david.rowley@2ndquadrant.com

over 7 years ago

In reply to: Edmund Horner (#9)

Re: Tid scan improvements

On 28 September 2018 at 18:13, Edmund Horner <ejrh00@gmail.com> wrote:

On Fri, 28 Sep 2018 at 17:02, Edmund Horner <ejrh00@gmail.com> wrote:

I did run pgindent over it though. :)

But I didn't check if it still applied to master. Sigh. Here's one that does.

I know the commitfest is over, but I made a pass over this to hopefully
provide a bit of guidance so that it's closer for the November 'fest.

I've only done light testing on the patch and it does seem to work,
but there are a few things that I think should be changed. Most
importantly, I think #11 below needs to be done. That might supersede
some of the items that come before it in the list, as you will likely
have to pull out some of the code I mention when changing #11.
I've kept them anyway, just in case some of it remains.

1. These could wrap for tables > 16TB. Please use double; see index_pages_fetched().

int nrandompages;
int nseqpages;

2. Should multiply by nseqpages, not add.

run_cost += spc_random_page_cost * nrandompages + spc_seq_page_cost + nseqpages;

3. Should be double:

BlockNumber pages = selectivity * baserel->pages;
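
Points 1 to 3 can be illustrated with a small sketch. It assumes the default 8 kB block size; the function names and parameters below are hypothetical stand-ins, not code from the patch. With 8 kB pages, a 16 TB table has 2^31 pages, which is one more than a signed 32-bit int can hold, hence the request for double.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the patch's cost arithmetic.  Each page count
 * is multiplied by its per-page cost (point 2), and all counts are kept
 * as double so they cannot wrap (points 1 and 3).
 */
double
tid_range_run_cost(double spc_random_page_cost, double spc_seq_page_cost,
                   double nrandompages, double nseqpages)
{
    assert(nrandompages >= 0 && nseqpages >= 0);
    return spc_random_page_cost * nrandompages +
           spc_seq_page_cost * nseqpages;
}

/* Number of pages in a table of the given size, assuming 8 kB blocks. */
double
pages_in_table(double table_bytes)
{
    assert(table_bytes >= 0);
    return table_bytes / 8192.0;
}
```

For example, one random page at cost 4.0 plus 99 sequential pages at cost 1.0 gives a run cost of 103.0, whereas the "+ nseqpages" form would mix a cost with a page count.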

4. Comment needs updated to mention what the new code does in
heapgettup() and heapgettup_pagemode()

+
  /* start from last page of the scan */
- if (scan->rs_startblock > 0)
- page = scan->rs_startblock - 1;
+ if (scan->rs_numblocks == InvalidBlockNumber)
+ {
+ if (scan->rs_startblock > 0)
+ page = scan->rs_startblock - 1;
+ else
+ page = scan->rs_nblocks - 1;
+ }
  else
- page = scan->rs_nblocks - 1;
+ page = scan->rs_startblock + scan->rs_numblocks - 1;
+

5. Variables should be called "inclusive". We use "strict" to indicate
an operator comparison cannot match NULL values.

+ bool strict; /* Indicates < rather than <=, or > rather */
+ bool strict2; /* than >= */

Don't break the comment like that. If you need more space don't end
the comment and use a new line and tab the next line out to match the
* of the first line.

6. Why not pass the TidExpr into MakeTidOpExprState() and have it set
the type instead of repeating code

7. It's not very obvious why the following Assert() can't fail.

+ bool invert;
+ bool invert2;
+
+ Assert(list_length(((BoolExpr *) expr)->args) == 2);

I had to hunt around quite a bit to see that
TidQualFromBaseRestrictinfo could only ever make the list have 2
elements, and we'd not form a BoolExpr with just 1. (but see #11)

8. Many instances of the word "strict" are used to mean "inclusive".
Can you please change all of them.

9. Confusing comment:

+ * If the expression was non-strict (<=) and the offset is 0, then just
+ * pretend it was strict, because offset 0 doesn't exist and we may as
+ * well exclude that block.

Shouldn't this be, "If the operator is non-inclusive, then since TID
offsets are 1-based, for simplicity, we can just class the expression
as inclusive.", or something along those lines.

10. Comment talks about LHS, but the first OpExpr in a list of two
OpExprs has nothing to do with left hand side. You could use LHS if
you were talking about the first arg in an OpExpr, but this is not the
case here.

/* If the LHS is not the lower bound, swap them. */

You could probably just ensure that the >=, > ops is the first in the
list inside TidQualFromBaseRestrictinfo(), but you'd need to clearly
comment that this is the case in both locations. Perhaps use lcons()
for the lower end and lappend() for the upper end, but see #11.

11. I think the qual matching code needs an overhaul. Really you
should attempt to find the smallest and largest ctid for your
implicitly ANDed ranges. This would require you getting rid of the
BETWEEN type clause you're trying to build in
TidQualFromBaseRestrictinfo
and instead just include all quals, don't ignore other quals when
you've already found your complete range bounds.

The problem with doing it the way that you're doing it now is in cases like:

create table t1(a int);
insert into t1 select generate_Series(1,10000000);
create index on t1 (a);
select ctid,a from t1 order by a desc limit 1; -- find the max ctid.
ctid | a
-------------+----------
(44247,178) | 10000000
(1 row)

set max_parallel_workers_per_gather=0;
explain analyze select ctid,* from t1 where ctid > '(0,0)' and ctid <=
'(44247,178)' and ctid <= '(0,1)';
QUERY PLAN
-----------------------------------------------------------------------------------------------------
Tid Scan on t1 (cost=0.01..169248.78 rows=1 width=10) (actual
time=0.042..2123.432 rows=1 loops=1)
TID Cond: ((ctid > '(0,0)'::tid) AND (ctid <= '(44247,178)'::tid))
Filter: (ctid <= '(0,1)'::tid)
Rows Removed by Filter: 9999999
Planning Time: 4.049 ms
Execution Time: 2123.464 ms
(6 rows)

Due to how you've coded TidQualFromBaseRestrictinfo(), the ctid <=
'(0,1)' qual does not make it into the range. It's left as a filter in
the Tid Scan.

I think I'm going to stop here as changing this is going to cause quite a
bit of churn.

but one more...

12. I think the changes to selfuncs.c to get the selectivity estimate
is a fairly credible idea, but I think it also needs to account for
offsets. You should be able to work out the average number of items
per page with rel->tuples / rel->pages and factor that in to get a
better estimate for cases like:

postgres=# explain analyze select ctid,* from t1 where ctid <= '(0,200)';
QUERY PLAN
-----------------------------------------------------------------------------------------------
Tid Scan on t1 (cost=0.00..5.00 rows=1 width=10) (actual
time=0.025..0.065 rows=200 loops=1)
TID Cond: (ctid <= '(0,200)'::tid)
Planning Time: 0.081 ms
Execution Time: 0.088 ms
(4 rows)

You can likely add on "(offset / avg_tuples_per_page) / rel->pages" to
the selectivity and get a fairly accurate estimate... at least when
there are no dead tuples in the heap
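
As a rough sketch of that arithmetic for "ctid <= (block, offset)": the function name and signature are hypothetical, the empty-table case is deliberately left out, and dead tuples are ignored as noted above.

```c
#include <assert.h>

/*
 * Estimated fraction of the table with ctid <= (block, offset).
 * Whole blocks before 'block' contribute block / pages; the offset
 * contributes (offset / avg_tuples_per_page) / pages.
 */
double
tid_le_selectivity(double block, double offset, double tuples, double pages)
{
    double      avg_tuples_per_page;
    double      frac_of_page;
    double      sel;

    assert(tuples > 0 && pages > 0);    /* empty tables not handled here */

    avg_tuples_per_page = tuples / pages;
    frac_of_page = offset / avg_tuples_per_page;
    if (frac_of_page > 1.0)
        frac_of_page = 1.0;

    sel = (block + frac_of_page) / pages;
    if (sel > 1.0)
        sel = 1.0;
    return sel;
}
```

With the t1 table above (10,000,000 tuples, last block 44247 and therefore 44248 pages), the estimate for ctid <= '(0,200)' works out to about 200 rows rather than 1.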

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#11

Edmund Horner

ejrh00@gmail.com

over 7 years ago

In reply to: David Rowley (#10)

Re: Tid scan improvements

On Wed, 3 Oct 2018 at 17:36, David Rowley <david.rowley@2ndquadrant.com> wrote:

I know commit fest is over, but I made a pass of this to hopefully
provide a bit of guidance so that it's closer for the November 'fest.

Hi David. Thanks for the review. It's fairly thorough and you must
have put some time into it -- I really appreciate it.

I've only done light testing on the patch and it does seem to work,
but there are a few things that I think should be changed. Most
importantly #11 below I think needs to be done. That might overwrite
some of the items that come before it in the list as you likely will
have to pull out some of the code which I mention due to changing #11.
I've kept them around anyway just in case some of it remains.

1. Could wrap for tables > 16TB. Please use double. See index_pages_fetched()
2. Should multiply by nseqpages, not add.
3. Should be double.

I agree with these three.

4. Comment needs updated to mention what the new code does in
heapgettup() and heapgettup_pagemode()

+
/* start from last page of the scan */
- if (scan->rs_startblock > 0)
- page = scan->rs_startblock - 1;
+ if (scan->rs_numblocks == InvalidBlockNumber)
+ {
+ if (scan->rs_startblock > 0)
+ page = scan->rs_startblock - 1;
+ else
+ page = scan->rs_nblocks - 1;
+ }
else
- page = scan->rs_nblocks - 1;
+ page = scan->rs_startblock + scan->rs_numblocks - 1;
+

I'm thinking that, as they don't depend on the others, the heapam.c
changes should be a separate preparatory patch?

The heap scan code has to support things like synchronised scans and
parallel scans, but as far as I know, its support for scanning
subranges is currently used only for building BRIN indexes. I found
that although I could specify a subrange with heap_setscanlimits, I
could not scan backward over it, because the original version of the
above code would start at the end of the whole table.
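
The starting-page choice in the hunk above can be restated as a standalone function. This is just a transcription for illustration: the function name is made up, INVALID_BLOCK stands in for InvalidBlockNumber, and the parameters mirror rs_startblock, rs_numblocks and rs_nblocks.

```c
#include <assert.h>
#include <stdint.h>

#define INVALID_BLOCK UINT32_MAX    /* stand-in for InvalidBlockNumber */

/*
 * Pick the first page to read for a backward scan.
 *
 * With no limit set (numblocks == INVALID_BLOCK), behave as before: start
 * just before startblock, or at the end of the table if startblock is 0.
 * With a limit set, start at the last page of the restricted range.
 */
uint32_t
backward_start_page(uint32_t startblock, uint32_t numblocks, uint32_t nblocks)
{
    assert(nblocks > 0);

    if (numblocks == INVALID_BLOCK)
        return (startblock > 0) ? startblock - 1 : nblocks - 1;

    assert(numblocks > 0);
    return startblock + numblocks - 1;
}
```

For a 100-page table limited to 5 pages starting at page 10, this starts at page 14, where the old code would have started at page 9.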

I'm not especially comfortable with this understanding of heapam, so
close review would be appreciated.

I note that there's a lot of common code in heapgettup and
heapgettup_pagemode, which my changes add to. It might be worth
trying to factor out somehow.

5. Variables should be called "inclusive". We use "strict" to indicate
an operator comparison cannot match NULL values.
+ bool strict; /* Indicates < rather than <=, or > rather */
+ bool strict2; /* than >= */
Don't break the comment like that. If you need more space don't end
the comment and use a new line and tab the next line out to match the
* of the first line.

8. Many instances of the word "strict" are used to mean "inclusive".
Can you please change all of them.

I don't mind renaming it. I took "strict" from "strictly greater/less
than" but I knew it was confusable with the other usages of "strict".

9. Confusing comment:
+ * If the expression was non-strict (<=) and the offset is 0, then just
+ * pretend it was strict, because offset 0 doesn't exist and we may as
+ * well exclude that block.
Shouldn't this be, "If the operator is non-inclusive, then since TID
offsets are 1-based, for simplicity, we can just class the expression
as inclusive.", or something along those lines.

Ok, I'll try to reword it along those lines.

I think I'm going to stop here as changing this is going to cause quite a
bit of churn.

but one more...

12. I think the changes to selfuncs.c to get the selectivity estimate
is a fairly credible idea, but I think it also needs to account for
offsets. You should be able to work out the average number of items
per page with rel->tuples / rel->pages and factor that in to get a
better estimate for cases like:

postgres=# explain analyze select ctid,* from t1 where ctid <= '(0,200)';
QUERY PLAN
-----------------------------------------------------------------------------------------------
Tid Scan on t1 (cost=0.00..5.00 rows=1 width=10) (actual
time=0.025..0.065 rows=200 loops=1)
TID Cond: (ctid <= '(0,200)'::tid)
Planning Time: 0.081 ms
Execution Time: 0.088 ms
(4 rows)

You can likely add on "(offset / avg_tuples_per_page) / rel->pages" to
the selectivity and get a fairly accurate estimate... at least when
there are no dead tuples in the heap

I think the changes to selfuncs.c could also be a separate patch?

I'll try to include the offset in the selectivity too.

Related -- what should the selectivity be on an empty table? My code has:

/* If the relation's empty, we're going to read all of it. */
if (vardata->rel->pages == 0)
return 1.0;

(which needs rewording, since selectivity isn't always about reading).
Is 1.0 the right thing to return?

6. Why not pass the TidExpr into MakeTidOpExprState() and have it set
the type instead of repeating code
7. It's not very obvious why the following Assert() can't fail. [...]
I had to hunt around quite a bit to see that
TidQualFromBaseRestrictinfo could only ever make the list have 2
elements, and we'd not form a BoolExpr with just 1. (but see #11)
10. Comment talks about LHS, but the first OpExpr in a list of two
OpExprs has nothing to do with left hand side. You could use LHS if
you were talking about the first arg in an OpExpr, but this is not the
case here.

These three might become non-issues if we change it along the lines of #11:

11. I think the qual matching code needs an overhaul. Really you
should attempt to find the smallest and largest ctid for your
implicitly ANDed ranges. This would require you getting rid of the
BETWEEN type clause you're trying to build in
TidQualFromBaseRestrictinfo
and instead just include all quals, don't ignore other quals when
you've already found your complete range bounds.

The problem with doing it the way that you're doing it now is in cases like:

create table t1(a int);
insert into t1 select generate_Series(1,10000000);
create index on t1 (a);
select ctid,a from t1 order by a desc limit 1; -- find the max ctid.
ctid | a
-------------+----------
(44247,178) | 10000000
(1 row)

set max_parallel_workers_per_gather=0;
explain analyze select ctid,* from t1 where ctid > '(0,0)' and ctid <=
'(44247,178)' and ctid <= '(0,1)';
QUERY PLAN
-----------------------------------------------------------------------------------------------------
Tid Scan on t1 (cost=0.01..169248.78 rows=1 width=10) (actual
time=0.042..2123.432 rows=1 loops=1)
TID Cond: ((ctid > '(0,0)'::tid) AND (ctid <= '(44247,178)'::tid))
Filter: (ctid <= '(0,1)'::tid)
Rows Removed by Filter: 9999999
Planning Time: 4.049 ms
Execution Time: 2123.464 ms
(6 rows)

Due to how you've coded TidQualFromBaseRestrictinfo(), the ctid <=
'(0,1)' qual does not make it into the range. It's left as a filter in
the Tid Scan.

My first thought was to support the fairly-common use case of the
two-bound range "ctid >= ? AND ctid < ?" (or single-bound variations);
hence my original patch for a "TID Range Scan".

Following the comments made earlier, I tried incorporating this into
the existing TID Scan; but I still had the same use case in mind, so
only the first lower and upper bounds were used. My thoughts were
that, while we need to *correctly* handle more complicated cases like
"ctid > '(0,0)' AND ctid <= '(44247,178)' AND ctid <= '(0,1)'", such
queries will not come up in practice and hence it's OK if those extra
bounds are applied in the filter. For the same reason, I did not
consider it worthwhile trying to pick which bound to use in the scan.

I've since realised that such queries aren't always redundant. At
query time we might not know which of the bounds is the "best", but we
will after evaluating them in the executor. So I quite like the idea
of keeping all of them.

This means a TID path's quals are an OR-list of:
- "ctid = ?"
- "ctid = ANY (?)" / "ctid IN (?)"
- "(ctid op ?) AND ..." (where op is one of >,>=,<,<=)
- "CURRENT OF"

I still don't think the scan needs to support quals like "ctid = ? AND
ctid > ?", or "ctid IN (?) AND ctid IN (?)" -- the executor *could*
try to form the intersection but I don't think it's worth the code.
In these cases, picking a simple qual is usually enough for an
efficient scan; the full qual can go into the filter.

I'm part way through implementing this. It looks like it might
actually be less code than what I had before.

#12

Edmund Horner

ejrh00@gmail.com

about 7 years ago

In reply to: Edmund Horner (#11)

4 attachment(s)

Re: Tid scan improvements

Hi all,

I have managed to split my changes into 4 patches:

v3-0001-Add-selectivity-and-nullness-estimates-for-the-ItemP.patch
v3-0002-Support-range-quals-in-Tid-Scan.patch
v3-0003-Support-backward-scans-over-restricted-ranges-in-hea.patch
v3-0004-Tid-Scan-results-are-ordered.patch

(1) is basically independent, and usefully improves estimates for ctid quals.
(2) is the main patch, adding basic range scan support to TidPath and TidScan.
(3) is a small change to properly support backward scans over a
restricted range in heapam.c, and is needed for (4).
(4) adds Backward Tid Scans, and adds path keys to Tid Paths so that
the planner doesn't have to add a sort for certain queries.

I have tried to apply David's suggestions.

In (1), I've included the offset part of a CTID constant in the
selectivity calculation. I've not included "allvisfrac" in the
calculation; I'm not sure it's worth it as it would only affect the
offset part.
I have tried to use iseq to differentiate between <=,>= versus <,>,
but I'm not sure I've got this right. I am also not entirely sure
it's worth it; the changes are already an improvement over the current
behaviour of using hardcoded selectivity constants.

In (2), the planner now picks up a greater variety of TID quals,
including AND-clauses with arbitrary children instead of the original
lower bound/upper bound pair. These are resolved in the executor into
a list of ranges to scan.
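
As a simplified picture of what "a list of ranges" involves once several OR'd quals have each produced a range, the sketch below sorts the ranges and merges any that overlap. The struct layout and function names are assumptions made for the example; the patch's own helpers may differ in detail.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct
{
    uint32_t    block;
    uint16_t    offset;
} Tid;

typedef struct
{
    Tid         first;          /* inclusive */
    Tid         last;           /* inclusive */
} TidRange;

/* Order TIDs by block, then by offset. */
int
tid_cmp(const Tid *a, const Tid *b)
{
    if (a->block != b->block)
        return (a->block < b->block) ? -1 : 1;
    if (a->offset != b->offset)
        return (a->offset < b->offset) ? -1 : 1;
    return 0;
}

/* qsort comparator: order ranges by (first, last). */
int
range_cmp(const void *pa, const void *pb)
{
    const TidRange *a = (const TidRange *) pa;
    const TidRange *b = (const TidRange *) pb;
    int         c = tid_cmp(&a->first, &b->first);

    return (c != 0) ? c : tid_cmp(&a->last, &b->last);
}

/* Sort the ranges, merge overlapping ones in place, return the new count. */
int
sort_and_merge_ranges(TidRange *ranges, int nranges)
{
    int         lastkept = 0;
    int         i;

    assert(nranges >= 0);
    if (nranges <= 1)
        return nranges;

    qsort(ranges, (size_t) nranges, sizeof(TidRange), range_cmp);
    for (i = 1; i < nranges; i++)
    {
        if (tid_cmp(&ranges[i].first, &ranges[lastkept].last) <= 0)
        {
            /* Overlap: extend the kept range if this one reaches further. */
            if (tid_cmp(&ranges[i].last, &ranges[lastkept].last) > 0)
                ranges[lastkept].last = ranges[i].last;
        }
        else
            ranges[++lastkept] = ranges[i];
    }
    return lastkept + 1;
}
```

A single "ctid = ?" qual fits the same shape as a range whose first and last TID are equal, so duplicates fall out of the merge step as well.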

(3) is the same code, but I've added a couple of comments to explain the change.

(4) is basically the same pathkey/direction code as before (but as a
separate patch).

I hope the separation will make it easier to review. Item (2) is
still quite big, but a third of it is tests.

Cheers.
Edmund

Attachments:

v3-0002-Support-range-quals-in-Tid-Scan.patch (application/octet-stream)

From dafc14ae1ca7d5dcfb454619aa3ca1d2c282a550 Mon Sep 17 00:00:00 2001
From: Edmund Horner <ejrh00@gmail.com>
Date: Fri, 12 Oct 2018 16:28:19 +1300
Subject: [PATCH 2/4] Support range quals in Tid Scan

This means queries with expressions such as "ctid >= ? AND ctid < ?" can be
answered by scanning over that part of a table, rather than falling back to a
full SeqScan.
---
 src/backend/executor/nodeTidscan.c      | 821 ++++++++++++++++++++++++--------
 src/backend/optimizer/path/costsize.c   |  43 +-
 src/backend/optimizer/path/tidpath.c    | 148 ++++--
 src/backend/optimizer/plan/createplan.c |  20 +-
 src/include/catalog/pg_operator.dat     |   6 +-
 src/include/nodes/execnodes.h           |  13 +-
 src/include/nodes/relation.h            |  13 +-
 src/test/regress/expected/tidscan.out   | 239 ++++++++++
 src/test/regress/sql/tidscan.sql        |  72 +++
 9 files changed, 1123 insertions(+), 252 deletions(-)

diff --git a/src/backend/executor/nodeTidscan.c b/src/backend/executor/nodeTidscan.c
index d21d655..992ad48 100644
--- a/src/backend/executor/nodeTidscan.c
+++ b/src/backend/executor/nodeTidscan.c
@@ -22,7 +22,9 @@
  */
 #include "postgres.h"
 
+#include "access/relscan.h"
 #include "access/sysattr.h"
+#include "catalog/pg_operator.h"
 #include "catalog/pg_type.h"
 #include "executor/execdebug.h"
 #include "executor/nodeTidscan.h"
@@ -39,21 +41,124 @@
 	 ((Var *) (node))->varattno == SelfItemPointerAttributeNumber && \
 	 ((Var *) (node))->varlevelsup == 0)
 
+typedef enum
+{
+	TIDEXPR_IN_ARRAY,
+	TIDEXPR_EQ,
+	TIDEXPR_UPPER_BOUND,
+	TIDEXPR_LOWER_BOUND
+}			TidExprType;
+
+/* one element in TidExpr's opexprs */
+typedef struct TidOpExpr
+{
+	TidExprType type;			/* type of op */
+	ExprState  *exprstate;		/* ExprState for a TID-yielding subexpr */
+	bool		inclusive;		/* whether op is inclusive */
+}			TidOpExpr;
+
 /* one element in tss_tidexprs */
 typedef struct TidExpr
 {
-	ExprState  *exprstate;		/* ExprState for a TID-yielding subexpr */
-	bool		isarray;		/* if true, it yields tid[] not just tid */
-	CurrentOfExpr *cexpr;		/* alternatively, we can have CURRENT OF */
+	List	   *opexprs;		/* list of individual op exprs */
+	CurrentOfExpr *cexpr;		/* For TIDEXPR_CURRENT_OF */
 } TidExpr;
 
+typedef struct TidRange
+{
+	ItemPointerData first;
+	ItemPointerData last;
+}			TidRange;
+
+static TidOpExpr * MakeTidOpExpr(OpExpr *expr, TidScanState *tidstate);
+static TidOpExpr * MakeTidScalarArrayOpExpr(ScalarArrayOpExpr *saex, TidScanState *tidstate);
+static List *MakeTidOpExprList(List *exprs, TidScanState *tidstate);
 static void TidExprListCreate(TidScanState *tidstate);
+static TidRange * EnsureTidRangeSpace(TidRange * tidRanges, int numRanges, int *numAllocRanges,
+									  int numNewItems);
+static void SetTidLowerBound(ItemPointer tid, bool inclusive, ItemPointer lowerBound);
+static void SetTidUpperBound(ItemPointer tid, bool inclusive, ItemPointer upperBound);
 static void TidListEval(TidScanState *tidstate);
-static int	itemptr_comparator(const void *a, const void *b);
+static bool MergeTidRanges(TidRange * a, TidRange * b);
+static int	tidrange_comparator(const void *a, const void *b);
+static HeapScanDesc BeginTidRangeScan(TidScanState *node, TidRange * range);
+static HeapTuple NextInTidRange(HeapScanDesc scandesc, ScanDirection direction, TidRange * range);
 static TupleTableSlot *TidNext(TidScanState *node);
 
 
 /*
+ * Create an ExprState corresponding to the value part of a TID comparison,
+ * and wrap it in a TidOpExpr.  Set the type and inclusivity of the TidOpExpr
+ * appropriately, depending on the operator and position of its arguments.
+ */
+static TidOpExpr *
+MakeTidOpExpr(OpExpr *expr, TidScanState *tidstate)
+{
+	Node	   *arg1 = get_leftop((Expr *) expr);
+	Node	   *arg2 = get_rightop((Expr *) expr);
+	ExprState  *exprstate = NULL;
+	bool		invert = false;
+	TidOpExpr  *tidopexpr;
+
+	if (IsCTIDVar(arg1))
+		exprstate = ExecInitExpr((Expr *) arg2, &tidstate->ss.ps);
+	else if (IsCTIDVar(arg2))
+	{
+		exprstate = ExecInitExpr((Expr *) arg1, &tidstate->ss.ps);
+		invert = true;
+	}
+	else
+		elog(ERROR, "could not identify CTID variable");
+
+	tidopexpr = (TidOpExpr *) palloc0(sizeof(TidOpExpr));
+
+	if (expr->opno == TIDLessOperator || expr->opno == TIDLessEqOperator)
+		tidopexpr->type = invert ? TIDEXPR_LOWER_BOUND : TIDEXPR_UPPER_BOUND;
+	else if (expr->opno == TIDGreaterOperator || expr->opno == TIDGreaterEqOperator)
+		tidopexpr->type = invert ? TIDEXPR_UPPER_BOUND : TIDEXPR_LOWER_BOUND;
+	else
+		tidopexpr->type = TIDEXPR_EQ;
+
+	tidopexpr->exprstate = exprstate;
+
+	tidopexpr->inclusive = expr->opno == TIDLessEqOperator || expr->opno == TIDGreaterEqOperator;
+
+	return tidopexpr;
+}
+
+static TidOpExpr *
+MakeTidScalarArrayOpExpr(ScalarArrayOpExpr *saex, TidScanState *tidstate)
+{
+	TidOpExpr  *tidopexpr;
+
+	Assert(IsCTIDVar(linitial(saex->args)));
+
+	tidopexpr = (TidOpExpr *) palloc0(sizeof(TidOpExpr));
+	tidopexpr->exprstate = ExecInitExpr(lsecond(saex->args),
+										&tidstate->ss.ps);
+	tidopexpr->type = TIDEXPR_IN_ARRAY;
+
+	return tidopexpr;
+}
+
+static List *
+MakeTidOpExprList(List *exprs, TidScanState *tidstate)
+{
+	ListCell   *l;
+	List	   *tidopexprs = NIL;
+
+	foreach(l, exprs)
+	{
+		OpExpr	   *opexpr = lfirst(l);
+		TidOpExpr  *tidopexpr = MakeTidOpExpr(opexpr, tidstate);
+
+		tidopexprs = lappend(tidopexprs, tidopexpr);
+	}
+
+	return tidopexprs;
+}
+
+/*
  * Extract the qual subexpressions that yield TIDs to search for,
  * and compile them into ExprStates if they're ordinary expressions.
  *
@@ -69,6 +174,17 @@ TidExprListCreate(TidScanState *tidstate)
 	tidstate->tss_tidexprs = NIL;
 	tidstate->tss_isCurrentOf = false;
 
+	/*
+	 * If no quals were specified, then a complete scan is assumed.  Make a
+	 * TidExpr with an empty list of TidOpExprs.
+	 */
+	if (!node->tidquals)
+	{
+		TidExpr    *tidexpr = (TidExpr *) palloc0(sizeof(TidExpr));
+
+		tidstate->tss_tidexprs = lappend(tidstate->tss_tidexprs, tidexpr);
+	}
+
 	foreach(l, node->tidquals)
 	{
 		Expr	   *expr = (Expr *) lfirst(l);
@@ -76,37 +192,30 @@ TidExprListCreate(TidScanState *tidstate)
 
 		if (is_opclause(expr))
 		{
-			Node	   *arg1;
-			Node	   *arg2;
-
-			arg1 = get_leftop(expr);
-			arg2 = get_rightop(expr);
-			if (IsCTIDVar(arg1))
-				tidexpr->exprstate = ExecInitExpr((Expr *) arg2,
-												  &tidstate->ss.ps);
-			else if (IsCTIDVar(arg2))
-				tidexpr->exprstate = ExecInitExpr((Expr *) arg1,
-												  &tidstate->ss.ps);
-			else
-				elog(ERROR, "could not identify CTID variable");
-			tidexpr->isarray = false;
+			OpExpr	   *opexpr = (OpExpr *) expr;
+			TidOpExpr  *tidopexpr = MakeTidOpExpr(opexpr, tidstate);
+
+			tidexpr->opexprs = list_make1(tidopexpr);
 		}
 		else if (expr && IsA(expr, ScalarArrayOpExpr))
 		{
 			ScalarArrayOpExpr *saex = (ScalarArrayOpExpr *) expr;
+			TidOpExpr  *tidopexpr = MakeTidScalarArrayOpExpr(saex, tidstate);
 
-			Assert(IsCTIDVar(linitial(saex->args)));
-			tidexpr->exprstate = ExecInitExpr(lsecond(saex->args),
-											  &tidstate->ss.ps);
-			tidexpr->isarray = true;
+			tidexpr->opexprs = list_make1(tidopexpr);
 		}
 		else if (expr && IsA(expr, CurrentOfExpr))
 		{
 			CurrentOfExpr *cexpr = (CurrentOfExpr *) expr;
 
+			/* For CURRENT OF, save the expression in the TidExpr. */
 			tidexpr->cexpr = cexpr;
 			tidstate->tss_isCurrentOf = true;
 		}
+		else if (and_clause((Node *) expr))
+		{
+			tidexpr->opexprs = MakeTidOpExprList(((BoolExpr *) expr)->args, tidstate);
+		}
 		else
 			elog(ERROR, "could not identify CTID expression");
 
@@ -119,7 +228,227 @@ TidExprListCreate(TidScanState *tidstate)
 }
 
 /*
- * Compute the list of TIDs to be visited, by evaluating the expressions
+ * Ensure the array of TidRange objects has enough space for new items.
+ * May reallocate array.
+ */
+static TidRange *
+EnsureTidRangeSpace(TidRange * tidRanges, int numRanges, int *numAllocRanges,
+					int numNewItems)
+{
+	if (numRanges + numNewItems > *numAllocRanges)
+	{
+		/* If growing by one, grow exponentially; otherwise, grow just enough. */
+		if (numNewItems == 1)
+			*numAllocRanges *= 2;
+		else
+			*numAllocRanges = numRanges + numNewItems;
+
+		tidRanges = (TidRange *)
+			repalloc(tidRanges,
+					 *numAllocRanges * sizeof(TidRange));
+	}
+	return tidRanges;
+}
+
+/*
+ * Set a lower bound tid, taking into account the inclusivity of the bound.
+ */
+static void
+SetTidLowerBound(ItemPointer tid, bool inclusive, ItemPointer lowerBound)
+{
+	OffsetNumber offset;
+
+	*lowerBound = *tid;
+	offset = ItemPointerGetOffsetNumberNoCheck(tid);
+
+	if (!inclusive)
+		ItemPointerSetOffsetNumber(lowerBound, OffsetNumberNext(offset));
+	else if (offset == 0)
+		ItemPointerSetOffsetNumber(lowerBound, 1);
+}
+
+/*
+ * Set an upper bound tid, taking into account the inclusivity of the bound.
+ */
+static void
+SetTidUpperBound(ItemPointer tid, bool inclusive, ItemPointer upperBound)
+{
+	OffsetNumber offset;
+
+	*upperBound = *tid;
+	offset = ItemPointerGetOffsetNumberNoCheck(tid);
+
+	/*
+	 * Since TID offsets start at 1, an inclusive upper bound with offset 0
+	 * can be treated as an exclusive bound.  This has the benefit of
+	 * eliminating that block from the scan range.
+	 */
+	if (inclusive && offset == 0)
+		inclusive = false;
+
+	if (!inclusive)
+	{
+		if (offset == 0)
+		{
+			BlockNumber block = ItemPointerGetBlockNumberNoCheck(upperBound);
+
+			/*
+			 * If the upper bound was already block 0, then there is no valid
+			 * range.
+			 */
+			if (block == 0)
+				return;
+
+			ItemPointerSetBlockNumber(upperBound, block - 1);
+			ItemPointerSetOffsetNumber(upperBound, MaxOffsetNumber);
+		}
+		else
+			ItemPointerSetOffsetNumber(upperBound, OffsetNumberPrev(offset));
+	}
+}
+
+static void
+TidInArrayExprEval(TidOpExpr * tidopexpr, BlockNumber nblocks, TidScanState *tidstate,
+				   TidRange * *tidRanges, int *numRanges, int *numAllocRanges)
+{
+	ExprContext *econtext = tidstate->ss.ps.ps_ExprContext;
+	bool		isNull;
+	Datum		arraydatum;
+	ArrayType  *itemarray;
+	Datum	   *ipdatums;
+	bool	   *ipnulls;
+	int			ndatums;
+	int			i;
+
+	arraydatum = ExecEvalExprSwitchContext(tidopexpr->exprstate,
+										   econtext,
+										   &isNull);
+	if (isNull)
+		return;
+
+	itemarray = DatumGetArrayTypeP(arraydatum);
+	deconstruct_array(itemarray,
+					  TIDOID, sizeof(ItemPointerData), false, 's',
+					  &ipdatums, &ipnulls, &ndatums);
+
+	*tidRanges = EnsureTidRangeSpace(*tidRanges, *numRanges, numAllocRanges, ndatums);
+
+	for (i = 0; i < ndatums; i++)
+	{
+		if (!ipnulls[i])
+		{
+			ItemPointer itemptr = (ItemPointer) DatumGetPointer(ipdatums[i]);
+
+			if (ItemPointerIsValid(itemptr) &&
+				ItemPointerGetBlockNumber(itemptr) < nblocks)
+			{
+				(*tidRanges)[*numRanges].first = *itemptr;
+				(*tidRanges)[*numRanges].last = *itemptr;
+				(*numRanges)++;
+			}
+		}
+	}
+	pfree(ipdatums);
+	pfree(ipnulls);
+}
+
+static void
+TidExprEval(TidExpr *expr, BlockNumber nblocks, TidScanState *tidstate,
+			TidRange * *tidRanges, int *numRanges, int *numAllocRanges)
+{
+	ExprContext *econtext = tidstate->ss.ps.ps_ExprContext;
+	ListCell   *l;
+	ItemPointerData lowerBound;
+	ItemPointerData upperBound;
+
+	/* The biggest range on an empty table is empty; just skip it. */
+	if (nblocks == 0)
+		return;
+
+	/* Set the lower and upper bound to scan the whole table. */
+	ItemPointerSetBlockNumber(&lowerBound, 0);
+	ItemPointerSetOffsetNumber(&lowerBound, 1);
+	ItemPointerSetBlockNumber(&upperBound, nblocks - 1);
+	ItemPointerSetOffsetNumber(&upperBound, MaxOffsetNumber);
+
+	foreach(l, expr->opexprs)
+	{
+		TidOpExpr  *tidopexpr = (TidOpExpr *) lfirst(l);
+
+		if (tidopexpr->type == TIDEXPR_IN_ARRAY)
+		{
+			TidInArrayExprEval(tidopexpr, nblocks, tidstate,
+							   tidRanges, numRanges, numAllocRanges);
+
+			/*
+			 * A CTID = ANY expression only exists by itself; there shouldn't
+			 * be any other quals alongside it.  TidInArrayExprEval has
+			 * already added the ranges, so just return here.
+			 */
+			Assert(list_length(expr->opexprs) == 1);
+			return;
+		}
+		else
+		{
+			ItemPointer itemptr;
+			bool		isNull;
+
+			/* Evaluate this bound. */
+			itemptr = (ItemPointer)
+				DatumGetPointer(ExecEvalExprSwitchContext(tidopexpr->exprstate,
+														  econtext,
+														  &isNull));
+
+			/* If the bound is NULL, *nothing* matches the qual. */
+			if (isNull)
+				return;
+
+			if (tidopexpr->type == TIDEXPR_EQ && ItemPointerIsValid(itemptr))
+			{
+				lowerBound = *itemptr;
+				upperBound = *itemptr;
+
+				/*
+				 * A CTID = ? expression only exists by itself, so set the
+				 * range to this single TID, and exit the loop (the remainder
+				 * of this function will add the range).
+				 */
+				Assert(list_length(expr->opexprs) == 1);
+				break;
+			}
+
+			if (tidopexpr->type == TIDEXPR_LOWER_BOUND)
+			{
+				ItemPointerData lb;
+
+				SetTidLowerBound(itemptr, tidopexpr->inclusive, &lb);
+				if (ItemPointerCompare(&lb, &lowerBound) > 0)
+					lowerBound = lb;
+			}
+
+			if (tidopexpr->type == TIDEXPR_UPPER_BOUND)
+			{
+				ItemPointerData ub;
+
+				SetTidUpperBound(itemptr, tidopexpr->inclusive, &ub);
+				if (ItemPointerCompare(&ub, &upperBound) < 0)
+					upperBound = ub;
+			}
+		}
+	}
+
+	/* If the resulting range is not empty, add it to the array. */
+	if (ItemPointerCompare(&lowerBound, &upperBound) <= 0)
+	{
+		*tidRanges = EnsureTidRangeSpace(*tidRanges, *numRanges, numAllocRanges, 1);
+		(*tidRanges)[*numRanges].first = lowerBound;
+		(*tidRanges)[*numRanges].last = upperBound;
+		(*numRanges)++;
+	}
+}
+
+/*
+ * Compute the list of TID ranges to be visited, by evaluating the expressions
  * for them.
  *
  * (The result is actually an array, not a list.)
@@ -129,9 +458,9 @@ TidListEval(TidScanState *tidstate)
 {
 	ExprContext *econtext = tidstate->ss.ps.ps_ExprContext;
 	BlockNumber nblocks;
-	ItemPointerData *tidList;
-	int			numAllocTids;
-	int			numTids;
+	TidRange   *tidRanges;
+	int			numAllocRanges;
+	int			numRanges;
 	ListCell   *l;
 
 	/*
@@ -147,76 +476,15 @@ TidListEval(TidScanState *tidstate)
 	 * are simple OpExprs or CurrentOfExprs.  If there are any
 	 * ScalarArrayOpExprs, we may have to enlarge the array.
 	 */
-	numAllocTids = list_length(tidstate->tss_tidexprs);
-	tidList = (ItemPointerData *)
-		palloc(numAllocTids * sizeof(ItemPointerData));
-	numTids = 0;
+	numAllocRanges = list_length(tidstate->tss_tidexprs);
+	tidRanges = (TidRange *) palloc0(numAllocRanges * sizeof(TidRange));
+	numRanges = 0;
 
 	foreach(l, tidstate->tss_tidexprs)
 	{
 		TidExpr    *tidexpr = (TidExpr *) lfirst(l);
-		ItemPointer itemptr;
-		bool		isNull;
 
-		if (tidexpr->exprstate && !tidexpr->isarray)
-		{
-			itemptr = (ItemPointer)
-				DatumGetPointer(ExecEvalExprSwitchContext(tidexpr->exprstate,
-														  econtext,
-														  &isNull));
-			if (!isNull &&
-				ItemPointerIsValid(itemptr) &&
-				ItemPointerGetBlockNumber(itemptr) < nblocks)
-			{
-				if (numTids >= numAllocTids)
-				{
-					numAllocTids *= 2;
-					tidList = (ItemPointerData *)
-						repalloc(tidList,
-								 numAllocTids * sizeof(ItemPointerData));
-				}
-				tidList[numTids++] = *itemptr;
-			}
-		}
-		else if (tidexpr->exprstate && tidexpr->isarray)
-		{
-			Datum		arraydatum;
-			ArrayType  *itemarray;
-			Datum	   *ipdatums;
-			bool	   *ipnulls;
-			int			ndatums;
-			int			i;
-
-			arraydatum = ExecEvalExprSwitchContext(tidexpr->exprstate,
-												   econtext,
-												   &isNull);
-			if (isNull)
-				continue;
-			itemarray = DatumGetArrayTypeP(arraydatum);
-			deconstruct_array(itemarray,
-							  TIDOID, sizeof(ItemPointerData), false, 's',
-							  &ipdatums, &ipnulls, &ndatums);
-			if (numTids + ndatums > numAllocTids)
-			{
-				numAllocTids = numTids + ndatums;
-				tidList = (ItemPointerData *)
-					repalloc(tidList,
-							 numAllocTids * sizeof(ItemPointerData));
-			}
-			for (i = 0; i < ndatums; i++)
-			{
-				if (!ipnulls[i])
-				{
-					itemptr = (ItemPointer) DatumGetPointer(ipdatums[i]);
-					if (ItemPointerIsValid(itemptr) &&
-						ItemPointerGetBlockNumber(itemptr) < nblocks)
-						tidList[numTids++] = *itemptr;
-				}
-			}
-			pfree(ipdatums);
-			pfree(ipnulls);
-		}
-		else
+		if (tidexpr->cexpr)
 		{
 			ItemPointerData cursor_tid;
 
@@ -225,16 +493,23 @@ TidListEval(TidScanState *tidstate)
 							  RelationGetRelid(tidstate->ss.ss_currentRelation),
 							  &cursor_tid))
 			{
-				if (numTids >= numAllocTids)
-				{
-					numAllocTids *= 2;
-					tidList = (ItemPointerData *)
-						repalloc(tidList,
-								 numAllocTids * sizeof(ItemPointerData));
-				}
-				tidList[numTids++] = cursor_tid;
+				/*
+				 * A current-of TidExpr only exists by itself, and we should
+				 * already have allocated a tidRanges entry for it.  We don't
+				 * need to check whether the tidRanges array needs to be
+				 * resized.
+				 */
+				Assert(numRanges < numAllocRanges);
+				tidRanges[numRanges].first = cursor_tid;
+				tidRanges[numRanges].last = cursor_tid;
+				numRanges++;
 			}
 		}
+		else
+		{
+			TidExprEval(tidexpr, nblocks, tidstate,
+						&tidRanges, &numRanges, &numAllocRanges);
+		}
 	}
 
 	/*
@@ -243,52 +518,152 @@ TidListEval(TidScanState *tidstate)
 	 * the list.  Sorting makes it easier to detect duplicates, and as a bonus
 	 * ensures that we will visit the heap in the most efficient way.
 	 */
-	if (numTids > 1)
+	if (numRanges > 1)
 	{
-		int			lastTid;
+		int			lastRange;
 		int			i;
 
 		/* CurrentOfExpr could never appear OR'd with something else */
 		Assert(!tidstate->tss_isCurrentOf);
 
-		qsort((void *) tidList, numTids, sizeof(ItemPointerData),
-			  itemptr_comparator);
-		lastTid = 0;
-		for (i = 1; i < numTids; i++)
+		qsort((void *) tidRanges, numRanges, sizeof(TidRange), tidrange_comparator);
+		lastRange = 0;
+		for (i = 1; i < numRanges; i++)
 		{
-			if (!ItemPointerEquals(&tidList[lastTid], &tidList[i]))
-				tidList[++lastTid] = tidList[i];
+			if (!MergeTidRanges(&tidRanges[lastRange], &tidRanges[i]))
+				tidRanges[++lastRange] = tidRanges[i];
 		}
-		numTids = lastTid + 1;
+		numRanges = lastRange + 1;
 	}
 
-	tidstate->tss_TidList = tidList;
-	tidstate->tss_NumTids = numTids;
+	tidstate->tss_TidRanges = tidRanges;
+	tidstate->tss_NumRanges = numRanges;
 	tidstate->tss_TidPtr = -1;
 }
 
 /*
- * qsort comparator for ItemPointerData items
+ * If two ranges overlap, merge them into one.
+ * Assumes the two ranges are already ordered by (first, last).
+ * Returns true if they were merged.
+ */
+static bool
+MergeTidRanges(TidRange * a, TidRange * b)
+{
+	ItemPointerData a_last = a->last;
+	ItemPointerData b_last;
+
+	if (!ItemPointerIsValid(&a_last))
+		a_last = a->first;
+
+	/*
+	 * If the first range ends before the second one begins, they don't
+	 * overlap.
+	 */
+	if (ItemPointerCompare(&a_last, &b->first) < 0)
+		return false;
+
+	b_last = b->last;
+	if (!ItemPointerIsValid(&b_last))
+		b_last = b->first;
+
+	/*
+	 * Since they overlap, the end of the new range should be the maximum of
+	 * the original two range ends.
+	 */
+	if (ItemPointerCompare(&a_last, &b_last) < 0)
+		a->last = b_last;
+	return true;
+}
+
+/*
+ * qsort comparator for TidRange items
  */
 static int
-itemptr_comparator(const void *a, const void *b)
+tidrange_comparator(const void *a, const void *b)
+{
+	TidRange   *tra = (TidRange *) a;
+	TidRange   *trb = (TidRange *) b;
+	int			cmp_first = ItemPointerCompare(&tra->first, &trb->first);
+
+	if (cmp_first != 0)
+		return cmp_first;
+	else
+		return ItemPointerCompare(&tra->last, &trb->last);
+}
+
+static HeapScanDesc
+BeginTidRangeScan(TidScanState *node, TidRange * range)
+{
+	HeapScanDesc scandesc = node->ss.ss_currentScanDesc;
+	BlockNumber first_block = ItemPointerGetBlockNumberNoCheck(&range->first);
+	BlockNumber last_block = ItemPointerGetBlockNumberNoCheck(&range->last);
+
+	if (!scandesc)
+	{
+		EState	   *estate = node->ss.ps.state;
+
+		scandesc = heap_beginscan_strat(node->ss.ss_currentRelation,
+										estate->es_snapshot,
+										0, NULL,
+										false, false);
+		node->ss.ss_currentScanDesc = scandesc;
+	}
+	else
+		heap_rescan(scandesc, NULL);
+
+	heap_setscanlimits(scandesc, first_block, last_block - first_block + 1);
+	node->tss_inScan = true;
+	return scandesc;
+}
+
+static HeapTuple
+NextInTidRange(HeapScanDesc scandesc, ScanDirection direction, TidRange * range)
 {
-	const ItemPointerData *ipa = (const ItemPointerData *) a;
-	const ItemPointerData *ipb = (const ItemPointerData *) b;
-	BlockNumber ba = ItemPointerGetBlockNumber(ipa);
-	BlockNumber bb = ItemPointerGetBlockNumber(ipb);
-	OffsetNumber oa = ItemPointerGetOffsetNumber(ipa);
-	OffsetNumber ob = ItemPointerGetOffsetNumber(ipb);
-
-	if (ba < bb)
-		return -1;
-	if (ba > bb)
-		return 1;
-	if (oa < ob)
-		return -1;
-	if (oa > ob)
-		return 1;
-	return 0;
+	BlockNumber first_block = ItemPointerGetBlockNumber(&range->first);
+	OffsetNumber first_offset = ItemPointerGetOffsetNumber(&range->first);
+	BlockNumber last_block = ItemPointerGetBlockNumber(&range->last);
+	OffsetNumber last_offset = ItemPointerGetOffsetNumber(&range->last);
+	HeapTuple	tuple;
+
+	for (;;)
+	{
+		BlockNumber block;
+		OffsetNumber offset;
+
+		tuple = heap_getnext(scandesc, direction);
+		if (!tuple)
+			break;
+
+		/* Check that the tuple is within the required range. */
+		block = ItemPointerGetBlockNumber(&tuple->t_self);
+		offset = ItemPointerGetOffsetNumber(&tuple->t_self);
+
+		/*
+		 * If the tuple is in the first block of the range and before the first
+		 * requested offset, then we can either skip it (if scanning forward),
+		 * or end the scan (if scanning backward).
+		 */
+		if (block == first_block && offset < first_offset)
+		{
+			if (ScanDirectionIsForward(direction))
+				continue;
+			else
+				return NULL;
+		}
+
+		/* Similarly for the last block, after the last requested offset. */
+		if (block == last_block && offset > last_offset)
+		{
+			if (ScanDirectionIsBackward(direction))
+				continue;
+			else
+				return NULL;
+		}
+
+		break;
+	}
+
+	return tuple;
 }
 
 /* ----------------------------------------------------------------
@@ -302,6 +677,7 @@ itemptr_comparator(const void *a, const void *b)
 static TupleTableSlot *
 TidNext(TidScanState *node)
 {
+	HeapScanDesc scandesc;
 	EState	   *estate;
 	ScanDirection direction;
 	Snapshot	snapshot;
@@ -309,105 +685,143 @@ TidNext(TidScanState *node)
 	HeapTuple	tuple;
 	TupleTableSlot *slot;
 	Buffer		buffer = InvalidBuffer;
-	ItemPointerData *tidList;
-	int			numTids;
-	bool		bBackward;
+	int			numRanges;
 
 	/*
 	 * extract necessary information from tid scan node
 	 */
+	scandesc = node->ss.ss_currentScanDesc;
 	estate = node->ss.ps.state;
 	direction = estate->es_direction;
 	snapshot = estate->es_snapshot;
 	heapRelation = node->ss.ss_currentRelation;
 	slot = node->ss.ss_ScanTupleSlot;
 
-	/*
-	 * First time through, compute the list of TIDs to be visited
-	 */
-	if (node->tss_TidList == NULL)
+	/* First time through, compute the list of TID ranges to be visited */
+	if (node->tss_TidRanges == NULL)
+	{
 		TidListEval(node);
 
-	tidList = node->tss_TidList;
-	numTids = node->tss_NumTids;
+		node->tss_TidPtr = -1;
+	}
 
-	/*
-	 * We use node->tss_htup as the tuple pointer; note this can't just be a
-	 * local variable here, as the scan tuple slot will keep a pointer to it.
-	 */
-	tuple = &(node->tss_htup);
+	numRanges = node->tss_NumRanges;
 
-	/*
-	 * Initialize or advance scan position, depending on direction.
-	 */
-	bBackward = ScanDirectionIsBackward(direction);
-	if (bBackward)
-	{
-		if (node->tss_TidPtr < 0)
-		{
-			/* initialize for backward scan */
-			node->tss_TidPtr = numTids - 1;
-		}
-		else
-			node->tss_TidPtr--;
-	}
-	else
+	tuple = NULL;
+	for (;;)
 	{
-		if (node->tss_TidPtr < 0)
+		TidRange   *currentRange;
+
+		if (!node->tss_inScan)
 		{
-			/* initialize for forward scan */
-			node->tss_TidPtr = 0;
+			/* Initialize or advance scan position, depending on direction. */
+			bool		bBackward = ScanDirectionIsBackward(direction);
+
+			if (bBackward)
+			{
+				if (node->tss_TidPtr < 0)
+				{
+					/* initialize for backward scan */
+					node->tss_TidPtr = numRanges - 1;
+				}
+				else
+					node->tss_TidPtr--;
+			}
+			else
+			{
+				if (node->tss_TidPtr < 0)
+				{
+					/* initialize for forward scan */
+					node->tss_TidPtr = 0;
+				}
+				else
+					node->tss_TidPtr++;
+			}
 		}
-		else
-			node->tss_TidPtr++;
-	}
 
-	while (node->tss_TidPtr >= 0 && node->tss_TidPtr < numTids)
-	{
-		tuple->t_self = tidList[node->tss_TidPtr];
+		if (node->tss_TidPtr >= numRanges || node->tss_TidPtr < 0)
+			break;
+
+		currentRange = &node->tss_TidRanges[node->tss_TidPtr];
 
 		/*
-		 * For WHERE CURRENT OF, the tuple retrieved from the cursor might
-		 * since have been updated; if so, we should fetch the version that is
-		 * current according to our snapshot.
+		 * Ranges with only one item -- including one resulting from a
+		 * CURRENT-OF qual -- are handled by looking up the item directly.
 		 */
-		if (node->tss_isCurrentOf)
-			heap_get_latest_tid(heapRelation, snapshot, &tuple->t_self);
-
-		if (heap_fetch(heapRelation, snapshot, tuple, &buffer, false, NULL))
+		if (ItemPointerEquals(&currentRange->first, &currentRange->last))
 		{
 			/*
-			 * Store the scanned tuple in the scan tuple slot of the scan
-			 * state.  Eventually we will only do this and not return a tuple.
+			 * We use node->tss_htup as the tuple pointer; note this can't
+			 * just be a local variable here, as the scan tuple slot will keep
+			 * a pointer to it.
 			 */
-			ExecStoreBufferHeapTuple(tuple, /* tuple to store */
-									 slot,	/* slot to store in */
-									 buffer);	/* buffer associated with
-												 * tuple */
+			tuple = &(node->tss_htup);
+			tuple->t_self = currentRange->first;
 
 			/*
-			 * At this point we have an extra pin on the buffer, because
-			 * ExecStoreHeapTuple incremented the pin count. Drop our local
-			 * pin.
+			 * For WHERE CURRENT OF, the tuple retrieved from the cursor might
+			 * since have been updated; if so, we should fetch the version
+			 * that is current according to our snapshot.
 			 */
-			ReleaseBuffer(buffer);
+			if (node->tss_isCurrentOf)
+				heap_get_latest_tid(heapRelation, snapshot, &tuple->t_self);
 
-			return slot;
+			if (heap_fetch(heapRelation, snapshot, tuple, &buffer, false, NULL))
+			{
+				/*
+				 * Store the scanned tuple in the scan tuple slot of the scan
+				 * state.  Eventually we will only do this and not return a
+				 * tuple.
+				 */
+				ExecStoreBufferHeapTuple(tuple, /* tuple to store */
+										 slot,	/* slot to store in */
+										 buffer);	/* buffer associated with
+													 * tuple */
+
+				/*
+				 * At this point we have an extra pin on the buffer, because
+				 * ExecStoreBufferHeapTuple incremented the pin count. Drop our
+				 * local pin.
+				 */
+				ReleaseBuffer(buffer);
+
+				return slot;
+			}
+			else
+			{
+				tuple = NULL;
+			}
 		}
-		/* Bad TID or failed snapshot qual; try next */
-		if (bBackward)
-			node->tss_TidPtr--;
 		else
-			node->tss_TidPtr++;
+		{
+			if (!node->tss_inScan)
+				scandesc = BeginTidRangeScan(node, currentRange);
+
+			tuple = NextInTidRange(scandesc, direction, currentRange);
+			if (tuple)
+				break;
 
-		CHECK_FOR_INTERRUPTS();
+			node->tss_inScan = false;
+		}
 	}
 
 	/*
-	 * if we get here it means the tid scan failed so we are at the end of the
-	 * scan..
+	 * save the tuple and the buffer returned to us by the access methods in
+	 * our scan tuple slot and return the slot.  Tuples returned by
+	 * heap_getnext() are pointers onto disk pages and were not created with
+	 * palloc(), so they must not be pfree()'d.  Note also that
+	 * ExecStoreBufferHeapTuple will increment the refcount of the buffer; the
+	 * refcount will not be dropped until the tuple table slot is cleared.
 	 */
-	return ExecClearTuple(slot);
+	if (tuple)
+		ExecStoreBufferHeapTuple(tuple, /* tuple to store */
+								 slot,	/* slot to store in */
+								 scandesc->rs_cbuf);	/* buffer associated
+														 * with this tuple */
+	else
+		ExecClearTuple(slot);
+
+	return slot;
 }
 
 /*
@@ -460,11 +874,13 @@ ExecTidScan(PlanState *pstate)
 void
 ExecReScanTidScan(TidScanState *node)
 {
-	if (node->tss_TidList)
-		pfree(node->tss_TidList);
-	node->tss_TidList = NULL;
-	node->tss_NumTids = 0;
+	if (node->tss_TidRanges)
+		pfree(node->tss_TidRanges);
+
+	node->tss_TidRanges = NULL;
+	node->tss_NumRanges = 0;
 	node->tss_TidPtr = -1;
+	node->tss_inScan = false;
 
 	ExecScanReScan(&node->ss);
 }
@@ -479,6 +895,8 @@ ExecReScanTidScan(TidScanState *node)
 void
 ExecEndTidScan(TidScanState *node)
 {
+	HeapScanDesc scan = node->ss.ss_currentScanDesc;
+
 	/*
 	 * Free the exprcontext
 	 */
@@ -489,6 +907,10 @@ ExecEndTidScan(TidScanState *node)
 	 */
 	ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
 	ExecClearTuple(node->ss.ss_ScanTupleSlot);
+
+	/* close heap scan */
+	if (scan != NULL)
+		heap_endscan(scan);
 }
 
 /* ----------------------------------------------------------------
@@ -524,11 +946,12 @@ ExecInitTidScan(TidScan *node, EState *estate, int eflags)
 	ExecAssignExprContext(estate, &tidstate->ss.ps);
 
 	/*
-	 * mark tid list as not computed yet
+	 * mark tid range list as not computed yet
 	 */
-	tidstate->tss_TidList = NULL;
-	tidstate->tss_NumTids = 0;
+	tidstate->tss_TidRanges = NULL;
+	tidstate->tss_NumRanges = 0;
 	tidstate->tss_TidPtr = -1;
+	tidstate->tss_inScan = false;
 
 	/*
 	 * open the scan relation
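
Reviewer's note on the sort-and-merge step in TidListEval above: the
following is a minimal standalone sketch of the same idea, not the patch's
code. DemoTid, DemoRange and the demo_* functions are hypothetical stand-ins
for ItemPointerData, TidRange, tidrange_comparator and MergeTidRanges, and
the sketch assumes every range has a valid last item with first <= last.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a (block, offset) item pointer. */
typedef struct { unsigned block; unsigned offset; } DemoTid;

/* Hypothetical stand-in for a closed range [first, last] of item pointers. */
typedef struct { DemoTid first; DemoTid last; } DemoRange;

/* Order item pointers by block number, then by offset within the block. */
static int
demo_tid_cmp(const DemoTid *a, const DemoTid *b)
{
	if (a->block != b->block)
		return a->block < b->block ? -1 : 1;
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}

/* qsort comparator: order ranges by (first, last). */
static int
demo_range_cmp(const void *a, const void *b)
{
	const DemoRange *ra = (const DemoRange *) a;
	const DemoRange *rb = (const DemoRange *) b;
	int			cmp_first = demo_tid_cmp(&ra->first, &rb->first);

	return cmp_first != 0 ? cmp_first : demo_tid_cmp(&ra->last, &rb->last);
}

/*
 * Sort the ranges, then fold each overlapping range into its predecessor.
 * Works in place and returns the number of ranges that remain.
 */
static size_t
demo_merge_ranges(DemoRange *ranges, size_t n)
{
	size_t		last = 0;
	size_t		i;

	/* Assumption of this sketch: no range is empty or inverted. */
	for (i = 0; i < n; i++)
		assert(demo_tid_cmp(&ranges[i].first, &ranges[i].last) <= 0);

	if (n < 2)
		return n;

	qsort(ranges, n, sizeof(DemoRange), demo_range_cmp);

	for (i = 1; i < n; i++)
	{
		if (demo_tid_cmp(&ranges[last].last, &ranges[i].first) < 0)
			ranges[++last] = ranges[i];		/* disjoint: keep as new range */
		else if (demo_tid_cmp(&ranges[last].last, &ranges[i].last) < 0)
			ranges[last].last = ranges[i].last; /* overlap: extend the end */
	}
	return last + 1;
}
```

As in the patch, only overlapping ranges are merged; two ranges that are
merely adjacent, such as one ending at (0,5) and the next starting at (0,6),
stay separate.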
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 7bf67a0..bffd2c0 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -1184,9 +1184,12 @@ cost_tidscan(Path *path, PlannerInfo *root,
 	QualCost	qpqual_cost;
 	Cost		cpu_per_tuple;
 	QualCost	tid_qual_cost;
-	int			ntuples;
+	double		ntuples;
+	double		nrandompages;
+	double		nseqpages;
 	ListCell   *l;
 	double		spc_random_page_cost;
+	double		spc_seq_page_cost;
 
 	/* Should only be applied to base relations */
 	Assert(baserel->relid > 0);
@@ -1198,8 +1201,10 @@ cost_tidscan(Path *path, PlannerInfo *root,
 	else
 		path->rows = baserel->rows;
 
-	/* Count how many tuples we expect to retrieve */
-	ntuples = 0;
+	/* Count how many tuples and pages we expect to retrieve */
+	ntuples = 0.0;
+	nrandompages = 0.0;
+	nseqpages = 0.0;
 	foreach(l, tidquals)
 	{
 		if (IsA(lfirst(l), ScalarArrayOpExpr))
@@ -1207,19 +1212,37 @@ cost_tidscan(Path *path, PlannerInfo *root,
 			/* Each element of the array yields 1 tuple */
 			ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) lfirst(l);
 			Node	   *arraynode = (Node *) lsecond(saop->args);
+			int			array_len = estimate_array_length(arraynode);
 
-			ntuples += estimate_array_length(arraynode);
+			ntuples += array_len;
+			nrandompages += array_len;
 		}
 		else if (IsA(lfirst(l), CurrentOfExpr))
 		{
 			/* CURRENT OF yields 1 tuple */
 			isCurrentOf = true;
-			ntuples++;
+			ntuples += 1.0;
+			nrandompages += 1.0;
 		}
 		else
 		{
-			/* It's just CTID = something, count 1 tuple */
-			ntuples++;
+			/*
+			 * For anything else, we'll use the normal selectivity estimate.
+			 * Count the first page as a random page, the rest as sequential.
+			 */
+			Selectivity selectivity = clause_selectivity(root, lfirst(l),
+														 baserel->relid,
+														 JOIN_INNER,
+														 NULL);
+			double		pages = selectivity * baserel->pages;
+
+			if (pages < 1.0)
+				pages = 1.0;
+
+			/* TODO decide what the costs should be */
+			ntuples += selectivity * baserel->tuples;
+			nseqpages += pages - 1.0;
+			nrandompages += 1.0;
 		}
 	}
 
@@ -1248,10 +1271,10 @@ cost_tidscan(Path *path, PlannerInfo *root,
 	/* fetch estimated page cost for tablespace containing table */
 	get_tablespace_page_costs(baserel->reltablespace,
 							  &spc_random_page_cost,
-							  NULL);
+							  &spc_seq_page_cost);
 
-	/* disk costs --- assume each tuple on a different page */
-	run_cost += spc_random_page_cost * ntuples;
+	/* disk costs */
+	run_cost += spc_random_page_cost * nrandompages + spc_seq_page_cost * nseqpages;
 
 	/* Add scanning CPU costs */
 	get_restriction_qual_cost(root, baserel, param_info, &qpqual_cost);
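
Reviewer's note on the disk-cost arithmetic in cost_tidscan above: the
sketch below restates the random/sequential page split outside the planner
so the numbers are easy to check by hand. It is only an illustration;
DemoTidCost and the demo_* functions are hypothetical names, and the sketch
clamps the page estimate to at least one page so the sequential page count
can never go negative.

```c
#include <assert.h>

/* Hypothetical accumulator mirroring ntuples/nrandompages/nseqpages. */
typedef struct
{
	double		ntuples;
	double		nrandompages;
	double		nseqpages;
} DemoTidCost;

/* "ctid = x", CURRENT OF, or each array element: one tuple, one random page. */
static void
demo_add_point_quals(DemoTidCost *cost, double count)
{
	cost->ntuples += count;
	cost->nrandompages += count;
}

/* Range qual: first page is a random fetch, the rest are read sequentially. */
static void
demo_add_range_qual(DemoTidCost *cost, double selectivity,
					double rel_pages, double rel_tuples)
{
	double		pages = selectivity * rel_pages;

	assert(selectivity >= 0.0 && selectivity <= 1.0);

	if (pages < 1.0)
		pages = 1.0;

	cost->ntuples += selectivity * rel_tuples;
	cost->nrandompages += 1.0;
	cost->nseqpages += pages - 1.0;
}

/* Disk cost: each kind of page weighted by its per-page cost. */
static double
demo_disk_cost(const DemoTidCost *cost,
			   double random_page_cost, double seq_page_cost)
{
	return random_page_cost * cost->nrandompages +
		seq_page_cost * cost->nseqpages;
}
```

For example, a range qual with selectivity 0.25 on a 1000-page, 40000-tuple
table counts 10000 tuples, 1 random page and 249 sequential pages; with page
costs of 4.0 (random) and 1.0 (sequential) the disk cost is 4 + 249 = 253.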
diff --git a/src/backend/optimizer/path/tidpath.c b/src/backend/optimizer/path/tidpath.c
index 3bb5b8d..da7a6ff 100644
--- a/src/backend/optimizer/path/tidpath.c
+++ b/src/backend/optimizer/path/tidpath.c
@@ -4,13 +4,15 @@
  *	  Routines to determine which TID conditions are usable for scanning
  *	  a given relation, and create TidPaths accordingly.
  *
- * What we are looking for here is WHERE conditions of the form
- * "CTID = pseudoconstant", which can be implemented by just fetching
- * the tuple directly via heap_fetch().  We can also handle OR'd conditions
- * such as (CTID = const1) OR (CTID = const2), as well as ScalarArrayOpExpr
- * conditions of the form CTID = ANY(pseudoconstant_array).  In particular
- * this allows
- *		WHERE ctid IN (tid1, tid2, ...)
+ * What we are looking for here is WHERE conditions of the forms:
+ * - "CTID = pseudoconstant", which can be implemented by just fetching
+ *    the tuple directly via heap_fetch().
+ * - "CTID IN (pseudoconstant, ...)" or "CTID = ANY(pseudoconstant_array)"
+ * - "CTID > pseudoconstant", etc. for >, >=, <, and <=.
+ * - "CTID > pseudoconstant AND CTID < pseudoconstant AND ...", etc.
+ *
+ * We can also handle OR'd conditions of the above forms, such as
+ * "(CTID = const1) OR (CTID >= const2) OR CTID IN (...)".
  *
  * We also support "WHERE CURRENT OF cursor" conditions (CurrentOfExpr),
  * which amount to "CTID = run-time-determined-TID".  These could in
@@ -46,32 +48,45 @@
 #include "optimizer/restrictinfo.h"
 
 
-static bool IsTidEqualClause(OpExpr *node, int varno);
+static bool IsTidVar(Var *var, int varno);
+static bool IsTidComparison(OpExpr *node, int varno, Oid expected_comparison_operator);
 static bool IsTidEqualAnyClause(ScalarArrayOpExpr *node, int varno);
+static List *MakeTidRangeQuals(List *quals);
+static List *TidCompoundRangeQualFromExpr(Node *expr, int varno);
 static List *TidQualFromExpr(Node *expr, int varno);
 static List *TidQualFromBaseRestrictinfo(RelOptInfo *rel);
 
 
+static bool
+IsTidVar(Var *var, int varno)
+{
+	return (var->varattno == SelfItemPointerAttributeNumber &&
+			var->vartype == TIDOID &&
+			var->varno == varno &&
+			var->varlevelsup == 0);
+}
+
 /*
  * Check to see if an opclause is of the form
- *		CTID = pseudoconstant
+ *		CTID OP pseudoconstant
  * or
- *		pseudoconstant = CTID
+ *		pseudoconstant OP CTID
+ * where OP is the expected comparison operator.
  *
  * We check that the CTID Var belongs to relation "varno".  That is probably
  * redundant considering this is only applied to restriction clauses, but
  * let's be safe.
  */
 static bool
-IsTidEqualClause(OpExpr *node, int varno)
+IsTidComparison(OpExpr *node, int varno, Oid expected_comparison_operator)
 {
 	Node	   *arg1,
 			   *arg2,
 			   *other;
 	Var		   *var;
 
-	/* Operator must be tideq */
-	if (node->opno != TIDEqualOperator)
+	/* Operator must be the expected one */
+	if (node->opno != expected_comparison_operator)
 		return false;
 	if (list_length(node->args) != 2)
 		return false;
@@ -83,19 +98,13 @@ IsTidEqualClause(OpExpr *node, int varno)
 	if (arg1 && IsA(arg1, Var))
 	{
 		var = (Var *) arg1;
-		if (var->varattno == SelfItemPointerAttributeNumber &&
-			var->vartype == TIDOID &&
-			var->varno == varno &&
-			var->varlevelsup == 0)
+		if (IsTidVar(var, varno))
 			other = arg2;
 	}
 	if (!other && arg2 && IsA(arg2, Var))
 	{
 		var = (Var *) arg2;
-		if (var->varattno == SelfItemPointerAttributeNumber &&
-			var->vartype == TIDOID &&
-			var->varno == varno &&
-			var->varlevelsup == 0)
+		if (IsTidVar(var, varno))
 			other = arg1;
 	}
 	if (!other)
@@ -110,6 +119,17 @@ IsTidEqualClause(OpExpr *node, int varno)
 	return true;				/* success */
 }
 
+#define IsTidEqualClause(node, varno)	IsTidComparison(node, varno, TIDEqualOperator)
+#define IsTidLTClause(node, varno)		IsTidComparison(node, varno, TIDLessOperator)
+#define IsTidLEClause(node, varno)		IsTidComparison(node, varno, TIDLessEqOperator)
+#define IsTidGTClause(node, varno)		IsTidComparison(node, varno, TIDGreaterOperator)
+#define IsTidGEClause(node, varno)		IsTidComparison(node, varno, TIDGreaterEqOperator)
+
+#define IsTidRangeClause(node, varno)	(IsTidLTClause(node, varno) || \
+										 IsTidLEClause(node, varno) || \
+										 IsTidGTClause(node, varno) || \
+										 IsTidGEClause(node, varno))
+
 /*
  * Check to see if a clause is of the form
  *		CTID = ANY (pseudoconstant_array)
@@ -134,10 +154,7 @@ IsTidEqualAnyClause(ScalarArrayOpExpr *node, int varno)
 	{
 		Var		   *var = (Var *) arg1;
 
-		if (var->varattno == SelfItemPointerAttributeNumber &&
-			var->vartype == TIDOID &&
-			var->varno == varno &&
-			var->varlevelsup == 0)
+		if (IsTidVar(var, varno))
 		{
 			/* The other argument must be a pseudoconstant */
 			if (is_pseudo_constant_clause(arg2))
@@ -148,6 +165,43 @@ IsTidEqualAnyClause(ScalarArrayOpExpr *node, int varno)
 	return false;
 }
 
+static List *
+MakeTidRangeQuals(List *quals)
+{
+	if (list_length(quals) == 1)
+		return quals;
+	else
+		return list_make1(make_andclause(quals));
+}
+
+/*
+ * TidCompoundRangeQualFromExpr
+ *
+ * 		Extract a compound CTID range condition from the given qual expression
+ */
+static List *
+TidCompoundRangeQualFromExpr(Node *expr, int varno)
+{
+	List	   *rlst = NIL;
+	ListCell   *l;
+	List	   *found_quals = NIL;
+
+	foreach(l, ((BoolExpr *) expr)->args)
+	{
+		Node	   *clause = (Node *) lfirst(l);
+
+		/* If this clause contains a range qual, add it to the list. */
+		if (is_opclause(clause) && IsTidRangeClause((OpExpr *) clause, varno))
+			found_quals = lappend(found_quals, clause);
+	}
+
+	/* If we found any, make an AND clause out of them. */
+	if (found_quals)
+		rlst = MakeTidRangeQuals(found_quals);
+
+	return rlst;
+}
+
 /*
  *	Extract a set of CTID conditions from the given qual expression
  *
@@ -174,6 +228,8 @@ TidQualFromExpr(Node *expr, int varno)
 		/* base case: check for tideq opclause */
 		if (IsTidEqualClause((OpExpr *) expr, varno))
 			rlst = list_make1(expr);
+		else if (IsTidRangeClause((OpExpr *) expr, varno))
+			rlst = list_make1(expr);
 	}
 	else if (expr && IsA(expr, ScalarArrayOpExpr))
 	{
@@ -189,11 +245,18 @@ TidQualFromExpr(Node *expr, int varno)
 	}
 	else if (and_clause(expr))
 	{
-		foreach(l, ((BoolExpr *) expr)->args)
+		/* look for a range qual in the clause */
+		rlst = TidCompoundRangeQualFromExpr(expr, varno);
+
+		/* if no range qual was found, look for any other TID qual */
+		if (!rlst)
 		{
-			rlst = TidQualFromExpr((Node *) lfirst(l), varno);
-			if (rlst)
-				break;
+			foreach(l, ((BoolExpr *) expr)->args)
+			{
+				rlst = TidQualFromExpr((Node *) lfirst(l), varno);
+				if (rlst)
+					break;
+			}
 		}
 	}
 	else if (or_clause(expr))
@@ -217,17 +280,26 @@ TidQualFromExpr(Node *expr, int varno)
 }
 
 /*
- *	Extract a set of CTID conditions from the rel's baserestrictinfo list
+ * Extract a set of CTID conditions from the rel's baserestrictinfo list
+ *
+ * Normally we just use the first RestrictInfo item with some usable quals,
+ * but it's also possible for a good compound range qual, such as
+ * "CTID > ? AND CTID < ?", to be split across two items.  So we look for
+ * lower/upper bound range quals in all items and use them if any were found.
+ * In principle there might be more than one lower or upper bound, but we
+ * just use the first one found of each type.
  */
 static List *
 TidQualFromBaseRestrictinfo(RelOptInfo *rel)
 {
 	List	   *rlst = NIL;
 	ListCell   *l;
+	List	   *found_quals = NIL;
 
 	foreach(l, rel->baserestrictinfo)
 	{
 		RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
+		Node	   *clause = (Node *) rinfo->clause;
 
 		/*
 		 * If clause must wait till after some lower-security-level
@@ -236,10 +308,23 @@ TidQualFromBaseRestrictinfo(RelOptInfo *rel)
 		if (!restriction_is_securely_promotable(rinfo, rel))
 			continue;
 
-		rlst = TidQualFromExpr((Node *) rinfo->clause, rel->relid);
+		/* If this clause contains a range qual, add it to the list. */
+		if (is_opclause(clause) && IsTidRangeClause((OpExpr *) clause, rel->relid))
+		{
+			found_quals = lappend(found_quals, clause);
+			continue;
+		}
+
+		/* Look for other TID quals. */
+		rlst = TidQualFromExpr((Node *) clause, rel->relid);
 		if (rlst)
 			break;
 	}
+
+	/* Use a range qual if any were found. */
+	if (found_quals)
+		rlst = MakeTidRangeQuals(found_quals);
+
 	return rlst;
 }
 
@@ -264,6 +349,7 @@ create_tidscan_paths(PlannerInfo *root, RelOptInfo *rel)
 
 	tidquals = TidQualFromBaseRestrictinfo(rel);
 
+	/* If there are tidquals, then it's worth generating a tidscan path. */
 	if (tidquals)
 		add_path(rel, (Path *) create_tidscan_path(root, rel, tidquals,
 												   required_outer));
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index da7a920..ab6e08a 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -3081,6 +3081,23 @@ create_tidscan_plan(PlannerInfo *root, TidPath *best_path,
 	}
 
 	/*
+	 * In the case of a compound qual such as "ctid > ? AND ctid < ? AND ...",
+	 * the various parts will have come from different RestrictInfos.  So
+	 * remove each part separately.
+	 */
+	if (list_length(tidquals) == 1)
+	{
+		Node	   *qual = linitial(tidquals);
+
+		if (and_clause(qual))
+		{
+			BoolExpr   *and_qual = ((BoolExpr *) qual);
+
+			scan_clauses = list_difference(scan_clauses, and_qual->args);
+		}
+	}
+
+	/*
 	 * Remove any clauses that are TID quals.  This is a bit tricky since the
 	 * tidquals list has implicit OR semantics.
 	 */
@@ -3092,7 +3109,8 @@ create_tidscan_plan(PlannerInfo *root, TidPath *best_path,
 	scan_plan = make_tidscan(tlist,
 							 scan_clauses,
 							 scan_relid,
-							 tidquals);
+							 tidquals
+		);
 
 	copy_generic_path_info(&scan_plan->scan.plan, &best_path->path);
 
diff --git a/src/include/catalog/pg_operator.dat b/src/include/catalog/pg_operator.dat
index ce23c2f..7476916 100644
--- a/src/include/catalog/pg_operator.dat
+++ b/src/include/catalog/pg_operator.dat
@@ -156,15 +156,15 @@
   oprname => '<', oprleft => 'tid', oprright => 'tid', oprresult => 'bool',
   oprcom => '>(tid,tid)', oprnegate => '>=(tid,tid)', oprcode => 'tidlt',
   oprrest => 'scalarltsel', oprjoin => 'scalarltjoinsel' },
-{ oid => '2800', descr => 'greater than',
+{ oid => '2800', oid_symbol => 'TIDGreaterOperator', descr => 'greater than',
   oprname => '>', oprleft => 'tid', oprright => 'tid', oprresult => 'bool',
   oprcom => '<(tid,tid)', oprnegate => '<=(tid,tid)', oprcode => 'tidgt',
   oprrest => 'scalargtsel', oprjoin => 'scalargtjoinsel' },
-{ oid => '2801', descr => 'less than or equal',
+{ oid => '2801', oid_symbol => 'TIDLessEqOperator', descr => 'less than or equal',
   oprname => '<=', oprleft => 'tid', oprright => 'tid', oprresult => 'bool',
   oprcom => '>=(tid,tid)', oprnegate => '>(tid,tid)', oprcode => 'tidle',
   oprrest => 'scalarlesel', oprjoin => 'scalarlejoinsel' },
-{ oid => '2802', descr => 'greater than or equal',
+{ oid => '2802', oid_symbol => 'TIDGreaterEqOperator', descr => 'greater than or equal',
   oprname => '>=', oprleft => 'tid', oprright => 'tid', oprresult => 'bool',
   oprcom => '<=(tid,tid)', oprnegate => '<(tid,tid)', oprcode => 'tidge',
   oprrest => 'scalargesel', oprjoin => 'scalargejoinsel' },
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 880a03e..34465df 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1491,14 +1491,16 @@ typedef struct BitmapHeapScanState
 	ParallelBitmapHeapState *pstate;
 } BitmapHeapScanState;
 
+typedef struct TidRange TidRange;
+
 /* ----------------
  *	 TidScanState information
  *
  *		tidexprs	   list of TidExpr structs (see nodeTidscan.c)
  *		isCurrentOf    scan has a CurrentOfExpr qual
- *		NumTids		   number of tids in this scan
+ *		NumRanges	   number of tid ranges in this scan
  *		TidPtr		   index of currently fetched tid
- *		TidList		   evaluated item pointers (array of size NumTids)
+ *		TidRanges	   evaluated item pointer ranges (array of size NumRanges)
  *		htup		   currently-fetched tuple, if any
  * ----------------
  */
@@ -1507,10 +1509,11 @@ typedef struct TidScanState
 	ScanState	ss;				/* its first field is NodeTag */
 	List	   *tss_tidexprs;
 	bool		tss_isCurrentOf;
-	int			tss_NumTids;
+	int			tss_NumRanges;
 	int			tss_TidPtr;
-	ItemPointerData *tss_TidList;
-	HeapTupleData tss_htup;
+	TidRange   *tss_TidRanges;
+	bool		tss_inScan;
+	HeapTupleData tss_htup;		/* for current-of and single TID fetches */
 } TidScanState;
 
 /* ----------------
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 55211ce..20a5edb 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -1227,14 +1227,21 @@ typedef struct BitmapOrPath
 /*
  * TidPath represents a scan by TID
  *
- * tidquals is an implicitly OR'ed list of qual expressions of the form
- * "CTID = pseudoconstant" or "CTID = ANY(pseudoconstant_array)".
+ * tidquals is an implicitly OR'ed list of qual expressions of the forms:
+ *   - "CTID = pseudoconstant"
+ *   - "CTID = ANY(pseudoconstant_array)"
+ *   - "CURRENT OF cursor"
+ *   - "(CTID relop pseudoconstant AND ...)"
+ *
+ * If tidquals is empty, all CTIDs will match (contrary to the usual meaning
+ * of an empty disjunction).
+ *
  * Note they are bare expressions, not RestrictInfos.
  */
 typedef struct TidPath
 {
 	Path		path;
-	List	   *tidquals;		/* qual(s) involving CTID = something */
+	List	   *tidquals;
 } TidPath;
 
 /*
diff --git a/src/test/regress/expected/tidscan.out b/src/test/regress/expected/tidscan.out
index 521ed1b..8c137ff 100644
--- a/src/test/regress/expected/tidscan.out
+++ b/src/test/regress/expected/tidscan.out
@@ -177,3 +177,242 @@ UPDATE tidscan SET id = -id WHERE CURRENT OF c RETURNING *;
 ERROR:  cursor "c" is not positioned on a row
 ROLLBACK;
 DROP TABLE tidscan;
+-- tests for tidrangescans
+CREATE TABLE tidrangescan(id integer, data text);
+INSERT INTO tidrangescan SELECT i,'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' FROM generate_series(1,1000) AS s(i);
+DELETE FROM tidrangescan WHERE substring(ctid::text from ',(\d+)\)')::integer > 10 OR substring(ctid::text from '\((\d+),')::integer >= 10;;
+VACUUM tidrangescan;
+-- range scans with upper bound
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan WHERE ctid < '(1,0)';
+            QUERY PLAN             
+-----------------------------------
+ Tid Scan on tidrangescan
+   TID Cond: (ctid < '(1,0)'::tid)
+(2 rows)
+
+SELECT ctid, data FROM tidrangescan WHERE ctid < '(1,0)';
+  ctid  |                                       data                                       
+--------+----------------------------------------------------------------------------------
+ (0,1)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (0,2)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (0,3)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (0,4)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (0,5)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (0,6)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (0,7)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (0,8)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (0,9)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (0,10) | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+(10 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan WHERE ctid <= '(1,5)';
+             QUERY PLAN             
+------------------------------------
+ Tid Scan on tidrangescan
+   TID Cond: (ctid <= '(1,5)'::tid)
+(2 rows)
+
+SELECT ctid, data FROM tidrangescan WHERE ctid <= '(1,5)';
+  ctid  |                                       data                                       
+--------+----------------------------------------------------------------------------------
+ (0,1)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (0,2)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (0,3)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (0,4)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (0,5)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (0,6)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (0,7)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (0,8)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (0,9)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (0,10) | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (1,1)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (1,2)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (1,3)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (1,4)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (1,5)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+(15 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan WHERE ctid < '(0,0)';
+            QUERY PLAN             
+-----------------------------------
+ Tid Scan on tidrangescan
+   TID Cond: (ctid < '(0,0)'::tid)
+(2 rows)
+
+SELECT ctid, data FROM tidrangescan WHERE ctid < '(0,0)';
+ ctid | data 
+------+------
+(0 rows)
+
+-- range scans with lower bound
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan WHERE ctid > '(9,8)';
+            QUERY PLAN             
+-----------------------------------
+ Tid Scan on tidrangescan
+   TID Cond: (ctid > '(9,8)'::tid)
+(2 rows)
+
+SELECT ctid, data FROM tidrangescan WHERE ctid > '(9,8)';
+  ctid  |                                       data                                       
+--------+----------------------------------------------------------------------------------
+ (9,9)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (9,10) | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+(2 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan WHERE '(9,8)' < ctid;
+            QUERY PLAN             
+-----------------------------------
+ Tid Scan on tidrangescan
+   TID Cond: ('(9,8)'::tid < ctid)
+(2 rows)
+
+SELECT ctid, data FROM tidrangescan WHERE '(9,8)' < ctid;
+  ctid  |                                       data                                       
+--------+----------------------------------------------------------------------------------
+ (9,9)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (9,10) | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+(2 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan WHERE ctid >= '(9,8)';
+             QUERY PLAN             
+------------------------------------
+ Tid Scan on tidrangescan
+   TID Cond: (ctid >= '(9,8)'::tid)
+(2 rows)
+
+SELECT ctid, data FROM tidrangescan WHERE ctid >= '(9,8)';
+  ctid  |                                       data                                       
+--------+----------------------------------------------------------------------------------
+ (9,8)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (9,9)  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (9,10) | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+(3 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan WHERE ctid >= '(100,0)';
+              QUERY PLAN              
+--------------------------------------
+ Tid Scan on tidrangescan
+   TID Cond: (ctid >= '(100,0)'::tid)
+(2 rows)
+
+SELECT ctid, data FROM tidrangescan WHERE ctid >= '(100,0)';
+ ctid | data 
+------+------
+(0 rows)
+
+-- range scans with both bounds
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan WHERE ctid > '(4,4)' AND '(4,7)' >= ctid;
+                           QUERY PLAN                           
+----------------------------------------------------------------
+ Tid Scan on tidrangescan
+   TID Cond: ((ctid > '(4,4)'::tid) AND ('(4,7)'::tid >= ctid))
+(2 rows)
+
+SELECT ctid, data FROM tidrangescan WHERE ctid > '(4,4)' AND '(4,7)' >= ctid;
+ ctid  |                                       data                                       
+-------+----------------------------------------------------------------------------------
+ (4,5) | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (4,6) | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (4,7) | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+(3 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan WHERE '(4,7)' >= ctid AND ctid > '(4,4)';
+                           QUERY PLAN                           
+----------------------------------------------------------------
+ Tid Scan on tidrangescan
+   TID Cond: (('(4,7)'::tid >= ctid) AND (ctid > '(4,4)'::tid))
+(2 rows)
+
+SELECT ctid, data FROM tidrangescan WHERE '(4,7)' >= ctid AND ctid > '(4,4)';
+ ctid  |                                       data                                       
+-------+----------------------------------------------------------------------------------
+ (4,5) | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (4,6) | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (4,7) | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+(3 rows)
+
+-- combinations
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan WHERE '(4,7)' >= ctid AND ctid > '(4,4)' OR ctid = '(2,2)';
+                                        QUERY PLAN                                         
+-------------------------------------------------------------------------------------------
+ Tid Scan on tidrangescan
+   TID Cond: ((('(4,7)'::tid >= ctid) AND (ctid > '(4,4)'::tid)) OR (ctid = '(2,2)'::tid))
+(2 rows)
+
+SELECT ctid, data FROM tidrangescan WHERE '(4,7)' >= ctid AND ctid > '(4,4)' OR ctid = '(2,2)';
+ ctid  |                                       data                                       
+-------+----------------------------------------------------------------------------------
+ (2,2) | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (4,5) | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (4,6) | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (4,7) | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+(4 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan WHERE '(4,7)' >= ctid AND ctid > '(4,4)' OR ctid = '(2,2)' AND data = 'foo';
+                                                     QUERY PLAN                                                     
+--------------------------------------------------------------------------------------------------------------------
+ Tid Scan on tidrangescan
+   TID Cond: ((('(4,7)'::tid >= ctid) AND (ctid > '(4,4)'::tid)) OR (ctid = '(2,2)'::tid))
+   Filter: ((('(4,7)'::tid >= ctid) AND (ctid > '(4,4)'::tid)) OR ((ctid = '(2,2)'::tid) AND (data = 'foo'::text)))
+(3 rows)
+
+SELECT ctid, data FROM tidrangescan WHERE '(4,7)' >= ctid AND ctid > '(4,4)' OR ctid = '(2,2)' AND data = 'foo';
+ ctid  |                                       data                                       
+-------+----------------------------------------------------------------------------------
+ (4,5) | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (4,6) | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ (4,7) | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+(3 rows)
+
+-- make sure ranges are combined correctly
+SELECT COUNT(*) FROM tidrangescan WHERE ctid < '(0,3)' OR ctid >= '(0,2)' AND ctid <= '(0,5)';
+ count 
+-------
+     5
+(1 row)
+
+SELECT COUNT(*) FROM tidrangescan WHERE ctid <= '(0,10)' OR ctid >= '(0,2)' AND ctid <= '(0,5)';
+ count 
+-------
+    10
+(1 row)
+
+-- empty table
+CREATE TABLE tidrangescan_empty(id integer, data text);
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan_empty WHERE ctid < '(1, 0)';
+            QUERY PLAN             
+-----------------------------------
+ Tid Scan on tidrangescan_empty
+   TID Cond: (ctid < '(1,0)'::tid)
+(2 rows)
+
+SELECT ctid, data FROM tidrangescan_empty WHERE ctid < '(1, 0)';
+ ctid | data 
+------+------
+(0 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan_empty WHERE ctid > '(9, 0)';
+            QUERY PLAN             
+-----------------------------------
+ Tid Scan on tidrangescan_empty
+   TID Cond: (ctid > '(9,0)'::tid)
+(2 rows)
+
+SELECT ctid, data FROM tidrangescan_empty WHERE ctid > '(9, 0)';
+ ctid | data 
+------+------
+(0 rows)
+
diff --git a/src/test/regress/sql/tidscan.sql b/src/test/regress/sql/tidscan.sql
index a8472e0..e8d266b 100644
--- a/src/test/regress/sql/tidscan.sql
+++ b/src/test/regress/sql/tidscan.sql
@@ -64,3 +64,75 @@ UPDATE tidscan SET id = -id WHERE CURRENT OF c RETURNING *;
 ROLLBACK;
 
 DROP TABLE tidscan;
+
+-- tests for tidrangescans
+
+CREATE TABLE tidrangescan(id integer, data text);
+
+INSERT INTO tidrangescan SELECT i,'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' FROM generate_series(1,1000) AS s(i);
+DELETE FROM tidrangescan WHERE substring(ctid::text from ',(\d+)\)')::integer > 10 OR substring(ctid::text from '\((\d+),')::integer >= 10;;
+VACUUM tidrangescan;
+
+-- range scans with upper bound
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan WHERE ctid < '(1,0)';
+SELECT ctid, data FROM tidrangescan WHERE ctid < '(1,0)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan WHERE ctid <= '(1,5)';
+SELECT ctid, data FROM tidrangescan WHERE ctid <= '(1,5)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan WHERE ctid < '(0,0)';
+SELECT ctid, data FROM tidrangescan WHERE ctid < '(0,0)';
+
+-- range scans with lower bound
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan WHERE ctid > '(9,8)';
+SELECT ctid, data FROM tidrangescan WHERE ctid > '(9,8)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan WHERE '(9,8)' < ctid;
+SELECT ctid, data FROM tidrangescan WHERE '(9,8)' < ctid;
+
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan WHERE ctid >= '(9,8)';
+SELECT ctid, data FROM tidrangescan WHERE ctid >= '(9,8)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan WHERE ctid >= '(100,0)';
+SELECT ctid, data FROM tidrangescan WHERE ctid >= '(100,0)';
+
+-- range scans with both bounds
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan WHERE ctid > '(4,4)' AND '(4,7)' >= ctid;
+SELECT ctid, data FROM tidrangescan WHERE ctid > '(4,4)' AND '(4,7)' >= ctid;
+
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan WHERE '(4,7)' >= ctid AND ctid > '(4,4)';
+SELECT ctid, data FROM tidrangescan WHERE '(4,7)' >= ctid AND ctid > '(4,4)';
+
+-- combinations
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan WHERE '(4,7)' >= ctid AND ctid > '(4,4)' OR ctid = '(2,2)';
+SELECT ctid, data FROM tidrangescan WHERE '(4,7)' >= ctid AND ctid > '(4,4)' OR ctid = '(2,2)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan WHERE '(4,7)' >= ctid AND ctid > '(4,4)' OR ctid = '(2,2)' AND data = 'foo';
+SELECT ctid, data FROM tidrangescan WHERE '(4,7)' >= ctid AND ctid > '(4,4)' OR ctid = '(2,2)' AND data = 'foo';
+
+-- make sure ranges are combined correctly
+SELECT COUNT(*) FROM tidrangescan WHERE ctid < '(0,3)' OR ctid >= '(0,2)' AND ctid <= '(0,5)';
+
+SELECT COUNT(*) FROM tidrangescan WHERE ctid <= '(0,10)' OR ctid >= '(0,2)' AND ctid <= '(0,5)';
+
+-- empty table
+CREATE TABLE tidrangescan_empty(id integer, data text);
+
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan_empty WHERE ctid < '(1, 0)';
+SELECT ctid, data FROM tidrangescan_empty WHERE ctid < '(1, 0)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan_empty WHERE ctid > '(9, 0)';
+SELECT ctid, data FROM tidrangescan_empty WHERE ctid > '(9, 0)';
-- 
2.7.4
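
The expected output in the regression tests above relies on TIDs being ordered by block number first and offset second. A minimal sketch of that comparison follows, assuming the test table ends up with 10 pages of 10 tuples each (as the DELETE and VACUUM in the test arrange). `Tid`, `tid_compare` and `count_between` are names made up for this illustration; they are not the backend's ItemPointer API.

```c
#include <stdint.h>

/* Illustrative stand-in for a TID: (block number, offset within block). */
typedef struct Tid
{
	uint32_t	block;
	uint16_t	offset;
} Tid;

/* Compare two TIDs: block number first, then offset. Returns -1, 0 or 1. */
static int
tid_compare(Tid a, Tid b)
{
	if (a.block != b.block)
		return (a.block < b.block) ? -1 : 1;
	if (a.offset != b.offset)
		return (a.offset < b.offset) ? -1 : 1;
	return 0;
}

/*
 * Count the tuples t with lo < t <= hi in a hypothetical table holding
 * offsets 1..10 on each of blocks 0..9 (the shape the test table is
 * assumed to have after the DELETE).
 */
static int
count_between(Tid lo_exclusive, Tid hi_inclusive)
{
	int			count = 0;

	for (uint32_t b = 0; b < 10; b++)
	{
		for (uint16_t o = 1; o <= 10; o++)
		{
			Tid			t = {b, o};

			if (tid_compare(t, lo_exclusive) > 0 &&
				tid_compare(t, hi_inclusive) <= 0)
				count++;
		}
	}
	return count;
}
```

Under those assumptions, count_between((4,4), (4,7)) is 3, which matches the "ctid > '(4,4)' AND '(4,7)' >= ctid" case, and (0,10) sorts before (1,0), which is why "ctid < '(1,0)'" returns all of block 0.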

Attachment: v3-0001-Add-selectivity-and-nullness-estimates-for-the-ItemP.patch (application/octet-stream)

From ae3e1bd313475a4bf3933870939cdb43de7bdf50 Mon Sep 17 00:00:00 2001
From: Edmund Horner <ejrh00@gmail.com>
Date: Fri, 12 Oct 2018 13:36:24 +1300
Subject: [PATCH 1/4] Add selectivity and nullness estimates for CTID system
 variables

Previously, estimates for ItemPointer range quals, such as "ctid <= '(5,7)'",
resorted to the default values of 0.33 for range selectivity, and 0.005 for
nullness, although there was special-case handling for equality quals like
"ctid = '(5,7)'", which used the appropriate selectivity for distinct items.

This change uses the relation size to estimate the selectivity of a range qual,
and also uses a nullness estimate of 0 for ctid, since it is never NULL.
---
 src/backend/utils/adt/selfuncs.c | 52 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 52 insertions(+)

diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index e0ece74..f430a2b 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -571,6 +571,49 @@ scalarineqsel(PlannerInfo *root, Oid operator, bool isgt, bool iseq,
 
 	if (!HeapTupleIsValid(vardata->statsTuple))
 	{
+		/*
+		 * There are no stats for system columns, but for CTID we can estimate
+		 * based on table size.
+		 */
+		if (vardata->var && IsA(vardata->var, Var) &&
+			((Var *) vardata->var)->varattno == SelfItemPointerAttributeNumber)
+		{
+			ItemPointer itemptr;
+			double		block;
+			double		density;
+
+			/* If the relation's empty, we're going to include all of it. */
+			if (vardata->rel->pages == 0)
+				return 1.0;
+
+			itemptr = (ItemPointer) DatumGetPointer(constval);
+			block = ItemPointerGetBlockNumberNoCheck(itemptr);
+
+			/*
+			 * If there's a useable density (tuples per page) estimate, take
+			 * into account the fraction of a block with a lower TID offset.
+			 */
+			density = vardata->rel->tuples / vardata->rel->pages;
+			if (density > 0.0)
+			{
+				OffsetNumber offset = ItemPointerGetOffsetNumberNoCheck(itemptr);
+
+				block += Min(offset / density, 1.0);
+			}
+
+			selec = block / (double) vardata->rel->pages;
+
+			/* For <= and >=, one extra item is included. */
+			if (iseq && vardata->rel->tuples >= 1.0)
+				selec += (1 / vardata->rel->tuples);
+
+			if (isgt)
+				selec = 1.0 - selec;
+
+			CLAMP_PROBABILITY(selec);
+			return selec;
+		}
+
 		/* no stats available, so default result */
 		return DEFAULT_INEQ_SEL;
 	}
@@ -1785,6 +1828,15 @@ nulltestsel(PlannerInfo *root, NullTestType nulltesttype, Node *arg,
 				return (Selectivity) 0; /* keep compiler quiet */
 		}
 	}
+	else if (vardata.var && IsA(vardata.var, Var) &&
+			 ((Var *) vardata.var)->varattno == SelfItemPointerAttributeNumber)
+	{
+		/*
+		 * There are no stats for system columns, but we know CTID is never
+		 * NULL.
+		 */
+		selec = (nulltesttype == IS_NULL) ? 0.0 : 1.0;
+	}
 	else
 	{
 		/*
-- 
2.7.4
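
For anyone following the arithmetic in the selfuncs.c hunk above, here is a stand-alone sketch of the same estimate. It is not the planner code: `ctid_range_selectivity` and its plain numeric parameters are hypothetical stand-ins for the `vardata->rel` fields and the ItemPointer constant, and the clamp is written out by hand instead of using CLAMP_PROBABILITY.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Estimate the fraction of a table selected by a ctid inequality,
 * mirroring the steps in the patch: position of the TID within the
 * table (in blocks), plus one tuple for <= / >=, inverted for > / >=.
 */
static double
ctid_range_selectivity(double rel_pages, double rel_tuples,
					   unsigned block, unsigned offset,
					   bool isgt, bool iseq)
{
	double		pos = (double) block;
	double		density;
	double		selec;

	assert(rel_pages >= 0.0 && rel_tuples >= 0.0);

	/* An empty relation: assume the qual includes all of it. */
	if (rel_pages == 0.0)
		return 1.0;

	/* Add the fraction of the block that lies below the TID's offset. */
	density = rel_tuples / rel_pages;
	if (density > 0.0)
	{
		double		frac = offset / density;

		pos += (frac < 1.0) ? frac : 1.0;
	}

	selec = pos / rel_pages;

	/* For <= and >=, one extra item is included. */
	if (iseq && rel_tuples >= 1.0)
		selec += 1.0 / rel_tuples;

	if (isgt)
		selec = 1.0 - selec;

	/* Clamp to [0, 1]. */
	if (selec < 0.0)
		selec = 0.0;
	if (selec > 1.0)
		selec = 1.0;
	return selec;
}
```

As a worked case, with 10 pages and 100 tuples (density 10), "ctid <= '(5,7)'" comes out at (5 + 7/10) / 10 + 1/100 = 0.58, and "ctid > '(5,7)'" at 1 - 0.57 = 0.43, rather than the default 0.33 for both.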

Attachment: v3-0003-Support-backward-scans-over-restricted-ranges-in-hea.patch (application/octet-stream)

From cf6fcb76356cbfe0cbe7ddb2faacb8f7cab6c336 Mon Sep 17 00:00:00 2001
From: Edmund Horner <ejrh00@gmail.com>
Date: Fri, 12 Oct 2018 16:28:58 +1300
Subject: [PATCH 3/4] Support backward scans over restricted ranges in heap
 access method

This is required for backward Tid scans.
---
 src/backend/access/heap/heapam.c | 34 ++++++++++++++++++++++++++++------
 1 file changed, 28 insertions(+), 6 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index fb63471..0d736f2 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -575,11 +575,22 @@ heapgettup(HeapScanDesc scan,
 			 * forward scanners.
 			 */
 			scan->rs_syncscan = false;
+
 			/* start from last page of the scan */
-			if (scan->rs_startblock > 0)
-				page = scan->rs_startblock - 1;
+			if (scan->rs_numblocks == InvalidBlockNumber)
+			{
+				/* Scanning the full relation: start just before start block. */
+				if (scan->rs_startblock > 0)
+					page = scan->rs_startblock - 1;
+				else
+					page = scan->rs_nblocks - 1;
+			}
 			else
-				page = scan->rs_nblocks - 1;
+			{
+				/* Scanning a restricted range: start at end of range. */
+				page = scan->rs_startblock + scan->rs_numblocks - 1;
+			}
+
 			heapgetpage(scan, page);
 		}
 		else
@@ -876,11 +887,22 @@ heapgettup_pagemode(HeapScanDesc scan,
 			 * forward scanners.
 			 */
 			scan->rs_syncscan = false;
+
 			/* start from last page of the scan */
-			if (scan->rs_startblock > 0)
-				page = scan->rs_startblock - 1;
+			if (scan->rs_numblocks == InvalidBlockNumber)
+			{
+				/* Scanning the full relation: start just before start block. */
+				if (scan->rs_startblock > 0)
+					page = scan->rs_startblock - 1;
+				else
+					page = scan->rs_nblocks - 1;
+			}
 			else
-				page = scan->rs_nblocks - 1;
+			{
+				/* Scanning a restricted range: start at end of range. */
+				page = scan->rs_startblock + scan->rs_numblocks - 1;
+			}
+
 			heapgetpage(scan, page);
 		}
 		else
-- 
2.7.4
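
The choice of first page for a backward scan in the heapam.c hunks above can be sketched in isolation as follows. This is only an illustration: `backward_scan_first_page` is a hypothetical helper, its three arguments stand in for `rs_startblock`, `rs_numblocks` and `rs_nblocks`, and the typedef and macro are redefined locally so the fragment compiles on its own. It assumes a non-empty relation and, for a restricted scan, a non-empty range.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t BlockNumber;
#define InvalidBlockNumber ((BlockNumber) 0xFFFFFFFF)

/*
 * Pick the first page to read when scanning backward.
 *
 * numblocks == InvalidBlockNumber means no range restriction was set,
 * so the scan covers the whole relation of nblocks pages.
 */
static BlockNumber
backward_scan_first_page(BlockNumber startblock,
						 BlockNumber numblocks,
						 BlockNumber nblocks)
{
	assert(nblocks > 0);

	if (numblocks == InvalidBlockNumber)
	{
		/* Scanning the full relation: start just before start block. */
		if (startblock > 0)
			return startblock - 1;
		return nblocks - 1;
	}

	/* Scanning a restricted range: start at end of range. */
	assert(numblocks > 0);
	return startblock + numblocks - 1;
}
```

For a 10-page relation, an unrestricted backward scan starting at block 0 begins on page 9, while a restricted scan of 3 blocks starting at block 4 begins on page 6, the last page of the range.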

Attachment: v3-0004-Tid-Scan-results-are-ordered.patch (application/octet-stream)

From 149892e33897d036f16b697019617d172bbc5b29 Mon Sep 17 00:00:00 2001
From: Edmund Horner <ejrh00@gmail.com>
Date: Fri, 12 Oct 2018 16:29:44 +1300
Subject: [PATCH 4/4] Tid Scan results are ordered

The planner now knows that the results of a Tid path are ordered by ctid, so
queries that rely on that order no longer need a separate sort.  This improves
cases such as "ORDER BY ctid ASC/DESC", as well as "SELECT MIN(ctid)/MAX(ctid)".
Tid Scans can now be Backward.
---
 src/backend/commands/explain.c          |  36 +++++++---
 src/backend/executor/nodeTidscan.c      |   9 +++
 src/backend/nodes/copyfuncs.c           |   1 +
 src/backend/nodes/outfuncs.c            |   2 +
 src/backend/nodes/readfuncs.c           |   1 +
 src/backend/optimizer/path/pathkeys.c   |  19 +++++
 src/backend/optimizer/path/tidpath.c    |  39 ++++++++--
 src/backend/optimizer/plan/createplan.c |   9 ++-
 src/backend/optimizer/util/pathnode.c   |   4 +-
 src/include/nodes/plannodes.h           |   1 +
 src/include/nodes/relation.h            |   1 +
 src/include/optimizer/pathnode.h        |   3 +-
 src/include/optimizer/paths.h           |   3 +
 src/test/regress/expected/tidscan.out   | 123 +++++++++++++++++++++++++++++++-
 src/test/regress/sql/tidscan.sql        |  39 +++++++++-
 15 files changed, 268 insertions(+), 22 deletions(-)

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 888d994..5a4305d 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -111,6 +111,7 @@ static void show_foreignscan_info(ForeignScanState *fsstate, ExplainState *es);
 static void show_eval_params(Bitmapset *bms_params, ExplainState *es);
 static const char *explain_get_index_name(Oid indexId);
 static void show_buffer_usage(ExplainState *es, const BufferUsage *usage);
+static void show_scan_direction(ExplainState *es, ScanDirection direction);
 static void ExplainIndexScanDetails(Oid indexid, ScanDirection indexorderdir,
 						ExplainState *es);
 static void ExplainScanTarget(Scan *plan, ExplainState *es);
@@ -1270,7 +1271,6 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_SeqScan:
 		case T_SampleScan:
 		case T_BitmapHeapScan:
-		case T_TidScan:
 		case T_SubqueryScan:
 		case T_FunctionScan:
 		case T_TableFuncScan:
@@ -1279,6 +1279,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_WorkTableScan:
 			ExplainScanTarget((Scan *) plan, es);
 			break;
+		case T_TidScan:
+			show_scan_direction(es, ((TidScan *) plan)->direction);
+			ExplainScanTarget((Scan *) plan, es);
+			break;
 		case T_ForeignScan:
 		case T_CustomScan:
 			if (((Scan *) plan)->scanrelid > 0)
@@ -2892,25 +2896,21 @@ show_buffer_usage(ExplainState *es, const BufferUsage *usage)
 }
 
 /*
- * Add some additional details about an IndexScan or IndexOnlyScan
+ * Show the direction of a scan.
  */
 static void
-ExplainIndexScanDetails(Oid indexid, ScanDirection indexorderdir,
-						ExplainState *es)
+show_scan_direction(ExplainState *es, ScanDirection direction)
 {
-	const char *indexname = explain_get_index_name(indexid);
-
 	if (es->format == EXPLAIN_FORMAT_TEXT)
 	{
-		if (ScanDirectionIsBackward(indexorderdir))
+		if (ScanDirectionIsBackward(direction))
 			appendStringInfoString(es->str, " Backward");
-		appendStringInfo(es->str, " using %s", indexname);
 	}
 	else
 	{
 		const char *scandir;
 
-		switch (indexorderdir)
+		switch (direction)
 		{
 			case BackwardScanDirection:
 				scandir = "Backward";
@@ -2926,11 +2926,27 @@ ExplainIndexScanDetails(Oid indexid, ScanDirection indexorderdir,
 				break;
 		}
 		ExplainPropertyText("Scan Direction", scandir, es);
-		ExplainPropertyText("Index Name", indexname, es);
 	}
 }
 
 /*
+ * Add some additional details about an IndexScan or IndexOnlyScan
+ */
+static void
+ExplainIndexScanDetails(Oid indexid, ScanDirection indexorderdir,
+						ExplainState *es)
+{
+	const char *indexname = explain_get_index_name(indexid);
+
+	show_scan_direction(es, indexorderdir);
+
+	if (es->format == EXPLAIN_FORMAT_TEXT)
+		appendStringInfo(es->str, " using %s", indexname);
+	else
+		ExplainPropertyText("Index Name", indexname, es);
+}
+
+/*
  * Show the target of a Scan node
  */
 static void
diff --git a/src/backend/executor/nodeTidscan.c b/src/backend/executor/nodeTidscan.c
index 992ad48..27b6e84 100644
--- a/src/backend/executor/nodeTidscan.c
+++ b/src/backend/executor/nodeTidscan.c
@@ -707,6 +707,15 @@ TidNext(TidScanState *node)
 
 	numRanges = node->tss_NumRanges;
 
+	/* If the plan direction is backward, invert the direction. */
+	if (ScanDirectionIsBackward(((TidScan *) node->ss.ps.plan)->direction))
+	{
+		if (ScanDirectionIsForward(direction))
+			direction = BackwardScanDirection;
+		else if (ScanDirectionIsBackward(direction))
+			direction = ForwardScanDirection;
+	}
+
 	tuple = NULL;
 	for (;;)
 	{
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 4fd5a67..2a929c9 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -580,6 +580,7 @@ _copyTidScan(const TidScan *from)
 	 * copy remainder of node
 	 */
 	COPY_NODE_FIELD(tidquals);
+	COPY_SCALAR_FIELD(direction);
 
 	return newnode;
 }
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index db5e030..9345b5d 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -616,6 +616,7 @@ _outTidScan(StringInfo str, const TidScan *node)
 	_outScanInfo(str, (const Scan *) node);
 
 	WRITE_NODE_FIELD(tidquals);
+	WRITE_ENUM_FIELD(direction, ScanDirection);
 }
 
 static void
@@ -1892,6 +1893,7 @@ _outTidPath(StringInfo str, const TidPath *node)
 	_outPathInfo(str, (const Path *) node);
 
 	WRITE_NODE_FIELD(tidquals);
+	WRITE_ENUM_FIELD(direction, ScanDirection);
 }
 
 static void
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index e117867..f936ef1 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1846,6 +1846,7 @@ _readTidScan(void)
 	ReadCommonScan(&local_node->scan);
 
 	READ_NODE_FIELD(tidquals);
+	READ_ENUM_FIELD(direction, ScanDirection);
 
 	READ_DONE();
 }
diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index ec66cb9..b847151 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -18,6 +18,9 @@
 #include "postgres.h"
 
 #include "access/stratnum.h"
+#include "access/sysattr.h"
+#include "catalog/pg_operator.h"
+#include "catalog/pg_type.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
 #include "nodes/plannodes.h"
@@ -848,6 +851,22 @@ build_join_pathkeys(PlannerInfo *root,
 	return truncate_useless_pathkeys(root, joinrel, outer_pathkeys);
 }
 
+/*
+ * build_tidscan_pathkeys
+ *	  Build the path keys corresponding to ORDER BY ctid ASC|DESC.
+ */
+List *
+build_tidscan_pathkeys(PlannerInfo *root,
+					   RelOptInfo *rel,
+					   ScanDirection direction)
+{
+	int			opno = (direction == ForwardScanDirection) ? TIDLessOperator : TIDGreaterOperator;
+	Var		   *varexpr = makeVar(rel->relid, SelfItemPointerAttributeNumber, TIDOID, -1, InvalidOid, 0);
+	List	   *pathkeys = build_expression_pathkey(root, (Expr *) varexpr, NULL, opno, rel->relids, true);
+
+	return pathkeys;
+}
+
 /****************************************************************************
  *		PATHKEYS AND SORT CLAUSES
  ****************************************************************************/
diff --git a/src/backend/optimizer/path/tidpath.c b/src/backend/optimizer/path/tidpath.c
index da7a6ff..ab86934 100644
--- a/src/backend/optimizer/path/tidpath.c
+++ b/src/backend/optimizer/path/tidpath.c
@@ -332,12 +332,16 @@ TidQualFromBaseRestrictinfo(RelOptInfo *rel)
  * create_tidscan_paths
  *	  Create paths corresponding to direct TID scans of the given rel.
  *
+ *	  Path keys and direction will be set on the scans if it looks useful.
+ *
  *	  Candidate paths are added to the rel's pathlist (using add_path).
  */
 void
 create_tidscan_paths(PlannerInfo *root, RelOptInfo *rel)
 {
 	Relids		required_outer;
+	List	   *pathkeys = NULL;
+	ScanDirection direction = ForwardScanDirection;
 	List	   *tidquals;
 
 	/*
@@ -347,10 +351,37 @@ create_tidscan_paths(PlannerInfo *root, RelOptInfo *rel)
 	 */
 	required_outer = rel->lateral_relids;
 
+	/*
+	 * Try to determine the best scan direction and create some useful
+	 * pathkeys.
+	 */
+	if (has_useful_pathkeys(root, rel))
+	{
+		/*
+		 * Build path keys corresponding to ORDER BY ctid ASC, and check
+		 * whether they will be useful for this scan.  If not, build path keys
+		 * for DESC, and try that; set the direction to BackwardScanDirection
+		 * if so.  If neither of them will be useful, no path keys will be
+		 * set.
+		 */
+		pathkeys = build_tidscan_pathkeys(root, rel, ForwardScanDirection);
+		if (!pathkeys_contained_in(pathkeys, root->query_pathkeys))
+		{
+			pathkeys = build_tidscan_pathkeys(root, rel, BackwardScanDirection);
+			if (pathkeys_contained_in(pathkeys, root->query_pathkeys))
+				direction = BackwardScanDirection;
+			else
+				pathkeys = NULL;
+		}
+	}
+
 	tidquals = TidQualFromBaseRestrictinfo(rel);
 
-	/* If there are tidquals, then it's worth generating a tidscan path. */
-	if (tidquals)
-		add_path(rel, (Path *) create_tidscan_path(root, rel, tidquals,
-												   required_outer));
+	/*
+	 * If there are tidquals or some useful pathkeys were found, then it's
+	 * worth generating a tidscan path.
+	 */
+	if (tidquals || pathkeys)
+		add_path(rel, (Path *) create_tidscan_path(root, rel, tidquals, pathkeys,
+												   direction, required_outer));
 }
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index ab6e08a..e4e3ec5 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -184,7 +184,7 @@ static BitmapHeapScan *make_bitmap_heapscan(List *qptlist,
 					 List *bitmapqualorig,
 					 Index scanrelid);
 static TidScan *make_tidscan(List *qptlist, List *qpqual, Index scanrelid,
-			 List *tidquals);
+			 List *tidquals, ScanDirection direction);
 static SubqueryScan *make_subqueryscan(List *qptlist,
 				  List *qpqual,
 				  Index scanrelid,
@@ -3109,7 +3109,8 @@ create_tidscan_plan(PlannerInfo *root, TidPath *best_path,
 	scan_plan = make_tidscan(tlist,
 							 scan_clauses,
 							 scan_relid,
-							 tidquals
+							 tidquals,
+							 best_path->direction
 		);
 
 	copy_generic_path_info(&scan_plan->scan.plan, &best_path->path);
@@ -5171,7 +5172,8 @@ static TidScan *
 make_tidscan(List *qptlist,
 			 List *qpqual,
 			 Index scanrelid,
-			 List *tidquals)
+			 List *tidquals,
+			 ScanDirection direction)
 {
 	TidScan    *node = makeNode(TidScan);
 	Plan	   *plan = &node->scan.plan;
@@ -5182,6 +5184,7 @@ make_tidscan(List *qptlist,
 	plan->righttree = NULL;
 	node->scan.scanrelid = scanrelid;
 	node->tidquals = tidquals;
+	node->direction = direction;
 
 	return node;
 }
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index d50d86b..31645c4 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1186,6 +1186,7 @@ create_bitmap_or_path(PlannerInfo *root,
  */
 TidPath *
 create_tidscan_path(PlannerInfo *root, RelOptInfo *rel, List *tidquals,
+					List *pathkeys, ScanDirection direction,
 					Relids required_outer)
 {
 	TidPath    *pathnode = makeNode(TidPath);
@@ -1198,9 +1199,10 @@ create_tidscan_path(PlannerInfo *root, RelOptInfo *rel, List *tidquals,
 	pathnode->path.parallel_aware = false;
 	pathnode->path.parallel_safe = rel->consider_parallel;
 	pathnode->path.parallel_workers = 0;
-	pathnode->path.pathkeys = NIL;	/* always unordered */
+	pathnode->path.pathkeys = pathkeys;
 
 	pathnode->tidquals = tidquals;
+	pathnode->direction = direction;
 
 	cost_tidscan(&pathnode->path, root, rel, tidquals,
 				 pathnode->path.param_info);
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 26e1c40..201d315 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -485,6 +485,7 @@ typedef struct TidScan
 {
 	Scan		scan;
 	List	   *tidquals;		/* qual(s) involving CTID = something */
+	ScanDirection direction;
 } TidScan;
 
 /* ----------------
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 20a5edb..484eb63 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -1242,6 +1242,7 @@ typedef struct TidPath
 {
 	Path		path;
 	List	   *tidquals;
+	ScanDirection direction;
 } TidPath;
 
 /*
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 81abcf5..d6dda47 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -63,7 +63,8 @@ extern BitmapOrPath *create_bitmap_or_path(PlannerInfo *root,
 					  RelOptInfo *rel,
 					  List *bitmapquals);
 extern TidPath *create_tidscan_path(PlannerInfo *root, RelOptInfo *rel,
-					List *tidquals, Relids required_outer);
+					List *tidquals, List *pathkeys, ScanDirection direction,
+					Relids required_outer);
 extern AppendPath *create_append_path(PlannerInfo *root, RelOptInfo *rel,
 				   List *subpaths, List *partial_subpaths,
 				   Relids required_outer,
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index cafde30..9d0699e 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -211,6 +211,9 @@ extern List *build_join_pathkeys(PlannerInfo *root,
 					RelOptInfo *joinrel,
 					JoinType jointype,
 					List *outer_pathkeys);
+extern List *build_tidscan_pathkeys(PlannerInfo *root,
+					   RelOptInfo *rel,
+					   ScanDirection direction);
 extern List *make_pathkeys_for_sortclauses(PlannerInfo *root,
 							  List *sortclauses,
 							  List *tlist);
diff --git a/src/test/regress/expected/tidscan.out b/src/test/regress/expected/tidscan.out
index 8c137ff..ff9f6f6 100644
--- a/src/test/regress/expected/tidscan.out
+++ b/src/test/regress/expected/tidscan.out
@@ -176,7 +176,6 @@ EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF)
 UPDATE tidscan SET id = -id WHERE CURRENT OF c RETURNING *;
 ERROR:  cursor "c" is not positioned on a row
 ROLLBACK;
-DROP TABLE tidscan;
 -- tests for tidrangescans
 CREATE TABLE tidrangescan(id integer, data text);
 INSERT INTO tidrangescan SELECT i,'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' FROM generate_series(1,1000) AS s(i);
@@ -416,3 +415,125 @@ SELECT ctid, data FROM tidrangescan_empty WHERE ctid > '(9, 0)';
 ------+------
 (0 rows)
 
+-- check that ordering on a tidscan doesn't require a sort
+EXPLAIN (COSTS OFF)
+SELECT ctid, * FROM tidscan WHERE ctid = ANY(ARRAY['(0,2)', '(0,1)', '(0,3)']::tid[]) ORDER BY ctid;
+                          QUERY PLAN                           
+---------------------------------------------------------------
+ Tid Scan on tidscan
+   TID Cond: (ctid = ANY ('{"(0,2)","(0,1)","(0,3)"}'::tid[]))
+(2 rows)
+
+SELECT ctid, * FROM tidscan WHERE ctid = ANY(ARRAY['(0,2)', '(0,1)', '(0,3)']::tid[]) ORDER BY ctid;
+ ctid  | id 
+-------+----
+ (0,1) |  1
+ (0,2) |  2
+ (0,3) |  3
+(3 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid, * FROM tidscan WHERE ctid = ANY(ARRAY['(0,2)', '(0,1)', '(0,3)']::tid[]) ORDER BY ctid DESC;
+                          QUERY PLAN                           
+---------------------------------------------------------------
+ Tid Scan Backward on tidscan
+   TID Cond: (ctid = ANY ('{"(0,2)","(0,1)","(0,3)"}'::tid[]))
+(2 rows)
+
+SELECT ctid, * FROM tidscan WHERE ctid = ANY(ARRAY['(0,2)', '(0,1)', '(0,3)']::tid[]) ORDER BY ctid DESC;
+ ctid  | id 
+-------+----
+ (0,3) |  3
+ (0,2) |  2
+ (0,1) |  1
+(3 rows)
+
+-- ordering with no quals should use tid range scan
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan ORDER BY ctid ASC;
+        QUERY PLAN        
+--------------------------
+ Tid Scan on tidrangescan
+(1 row)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan ORDER BY ctid DESC;
+            QUERY PLAN             
+-----------------------------------
+ Tid Scan Backward on tidrangescan
+(1 row)
+
+-- min/max
+EXPLAIN (COSTS OFF)
+SELECT MIN(ctid) FROM tidrangescan;
+                 QUERY PLAN                 
+--------------------------------------------
+ Result
+   InitPlan 1 (returns $0)
+     ->  Limit
+           ->  Tid Scan on tidrangescan
+                 Filter: (ctid IS NOT NULL)
+(5 rows)
+
+SELECT MIN(ctid) FROM tidrangescan;
+  min  
+-------
+ (0,1)
+(1 row)
+
+EXPLAIN (COSTS OFF)
+SELECT MAX(ctid) FROM tidrangescan;
+                   QUERY PLAN                    
+-------------------------------------------------
+ Result
+   InitPlan 1 (returns $0)
+     ->  Limit
+           ->  Tid Scan Backward on tidrangescan
+                 Filter: (ctid IS NOT NULL)
+(5 rows)
+
+SELECT MAX(ctid) FROM tidrangescan;
+  max   
+--------
+ (9,10)
+(1 row)
+
+EXPLAIN (COSTS OFF)
+SELECT MIN(ctid) FROM tidrangescan WHERE ctid > '(5,0)';
+                   QUERY PLAN                    
+-------------------------------------------------
+ Result
+   InitPlan 1 (returns $0)
+     ->  Limit
+           ->  Tid Scan on tidrangescan
+                 TID Cond: (ctid > '(5,0)'::tid)
+                 Filter: (ctid IS NOT NULL)
+(6 rows)
+
+SELECT MIN(ctid) FROM tidrangescan WHERE ctid > '(5,0)';
+  min  
+-------
+ (5,1)
+(1 row)
+
+EXPLAIN (COSTS OFF)
+SELECT MAX(ctid) FROM tidrangescan WHERE ctid < '(5,0)';
+                   QUERY PLAN                    
+-------------------------------------------------
+ Result
+   InitPlan 1 (returns $0)
+     ->  Limit
+           ->  Tid Scan Backward on tidrangescan
+                 TID Cond: (ctid < '(5,0)'::tid)
+                 Filter: (ctid IS NOT NULL)
+(6 rows)
+
+SELECT MAX(ctid) FROM tidrangescan WHERE ctid < '(5,0)';
+  max   
+--------
+ (4,10)
+(1 row)
+
+-- clean up
+DROP TABLE tidscan;
+DROP TABLE tidrangescan;
diff --git a/src/test/regress/sql/tidscan.sql b/src/test/regress/sql/tidscan.sql
index e8d266b..1eacca3 100644
--- a/src/test/regress/sql/tidscan.sql
+++ b/src/test/regress/sql/tidscan.sql
@@ -63,8 +63,6 @@ EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF)
 UPDATE tidscan SET id = -id WHERE CURRENT OF c RETURNING *;
 ROLLBACK;
 
-DROP TABLE tidscan;
-
 -- tests for tidrangescans
 
 CREATE TABLE tidrangescan(id integer, data text);
@@ -136,3 +134,40 @@ SELECT ctid, data FROM tidrangescan_empty WHERE ctid < '(1, 0)';
 EXPLAIN (COSTS OFF)
 SELECT ctid, data FROM tidrangescan_empty WHERE ctid > '(9, 0)';
 SELECT ctid, data FROM tidrangescan_empty WHERE ctid > '(9, 0)';
+
+-- check that ordering on a tidscan doesn't require a sort
+EXPLAIN (COSTS OFF)
+SELECT ctid, * FROM tidscan WHERE ctid = ANY(ARRAY['(0,2)', '(0,1)', '(0,3)']::tid[]) ORDER BY ctid;
+SELECT ctid, * FROM tidscan WHERE ctid = ANY(ARRAY['(0,2)', '(0,1)', '(0,3)']::tid[]) ORDER BY ctid;
+
+EXPLAIN (COSTS OFF)
+SELECT ctid, * FROM tidscan WHERE ctid = ANY(ARRAY['(0,2)', '(0,1)', '(0,3)']::tid[]) ORDER BY ctid DESC;
+SELECT ctid, * FROM tidscan WHERE ctid = ANY(ARRAY['(0,2)', '(0,1)', '(0,3)']::tid[]) ORDER BY ctid DESC;
+
+-- ordering with no quals should use tid range scan
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan ORDER BY ctid ASC;
+
+EXPLAIN (COSTS OFF)
+SELECT ctid, data FROM tidrangescan ORDER BY ctid DESC;
+
+-- min/max
+EXPLAIN (COSTS OFF)
+SELECT MIN(ctid) FROM tidrangescan;
+SELECT MIN(ctid) FROM tidrangescan;
+
+EXPLAIN (COSTS OFF)
+SELECT MAX(ctid) FROM tidrangescan;
+SELECT MAX(ctid) FROM tidrangescan;
+
+EXPLAIN (COSTS OFF)
+SELECT MIN(ctid) FROM tidrangescan WHERE ctid > '(5,0)';
+SELECT MIN(ctid) FROM tidrangescan WHERE ctid > '(5,0)';
+
+EXPLAIN (COSTS OFF)
+SELECT MAX(ctid) FROM tidrangescan WHERE ctid < '(5,0)';
+SELECT MAX(ctid) FROM tidrangescan WHERE ctid < '(5,0)';
+
+-- clean up
+DROP TABLE tidscan;
+DROP TABLE tidrangescan;
-- 
2.7.4
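
The ordering tests in the patch above rely on a Tid Scan visiting its TIDs in ctid order, either forward or backward. As a rough standalone illustration (not the patch's executor code), the sketch below sorts a TID list and then emits it in the requested direction. `Tid`, `Direction`, `tid_cmp` and `tids_in_scan_order` are hypothetical names invented for this example; they only stand in for PostgreSQL's ItemPointerData and ScanDirection.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for ItemPointerData and ScanDirection. */
typedef struct
{
    unsigned block;     /* heap block number */
    unsigned offset;    /* line pointer number within the block */
} Tid;

typedef enum { DIR_BACKWARD = -1, DIR_FORWARD = 1 } Direction;

/* qsort comparator: order by block number, then by offset. */
static int
tid_cmp(const void *a, const void *b)
{
    const Tid *x = a;
    const Tid *y = b;

    if (x->block != y->block)
        return (x->block < y->block) ? -1 : 1;
    if (x->offset != y->offset)
        return (x->offset < y->offset) ? -1 : 1;
    return 0;
}

/*
 * Sort 'tids' in place into ascending ctid order, then copy them into 'out'
 * in the order a scan in direction 'dir' would return them.
 */
static void
tids_in_scan_order(Tid *tids, size_t n, Direction dir, Tid *out)
{
    assert(tids != NULL && out != NULL);

    qsort(tids, n, sizeof(Tid), tid_cmp);
    for (size_t i = 0; i < n; i++)
        out[i] = (dir == DIR_FORWARD) ? tids[i] : tids[n - 1 - i];
}
```

Fed the three TIDs from the regression test, (0,2), (0,1) and (0,3), a backward pass yields (0,3), (0,2), (0,1), which matches the expected output for ORDER BY ctid DESC.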

#13

David Rowley

david.rowley@2ndquadrant.com

about 7 years ago

In reply to: Edmund Horner (#12)

Re: Tid scan improvements

On 4 November 2018 at 17:20, Edmund Horner <ejrh00@gmail.com> wrote:

I have managed to split my changes into 4 patches:

v3-0001-Add-selectivity-and-nullness-estimates-for-the-ItemP.patch
v3-0002-Support-range-quals-in-Tid-Scan.patch
v3-0003-Support-backward-scans-over-restricted-ranges-in-hea.patch
v3-0004-Tid-Scan-results-are-ordered.patch

Hi,

I've been looking over 0001 to 0003. I ran out of steam before 0004.

I like the design of the new patch. From what I've thrown at the
selectivity estimation code so far, it seems pretty good. I also quite like
the design in nodeTidscan.c for range scans.

I didn't quite manage to wrap my head around the code that removes
redundant quals from the tidquals. For example, with:

postgres=# explain select * from t1 where ctid <= '(0,10)' and a = 0;
                    QUERY PLAN
--------------------------------------------------
 Tid Scan on t1  (cost=0.00..3.19 rows=1 width=4)
   TID Cond: (ctid <= '(0,10)'::tid)
   Filter: (a = 0)
(3 rows)

and:

postgres=# explain select * from t1 where ctid <= '(0,10)' or a = 20
and ctid >= '(0,0)';
                                  QUERY PLAN
------------------------------------------------------------------------------
 Tid Scan on t1  (cost=0.01..176.18 rows=12 width=4)
   TID Cond: ((ctid <= '(0,10)'::tid) OR (ctid >= '(0,0)'::tid))
   Filter: ((ctid <= '(0,10)'::tid) OR ((a = 20) AND (ctid >= '(0,0)'::tid)))
(3 rows)

I understand why the 2nd query didn't remove the ctid quals from the
filter, and I understand why the first query could. I just didn't
manage to convince myself that the code behaves correctly for all
cases.

During my pass through 0001, 0002 and 0003 I noted the following:

0001:

1. I see a few instances of:

#define DatumGetItemPointer(X) ((ItemPointer) DatumGetPointer(X))
#define ItemPointerGetDatum(X) PointerGetDatum(X)

in both tid.c and ginfuncs.c, and I see you have:

+ itemptr = (ItemPointer) DatumGetPointer(constval);

Do you think it would be worth moving the macros out of tid.c and
ginfuncs.c into postgres.h and using them here instead?

(I see the existing code in this file already does the same, so this
might not matter.)

0002:

2. In TidCompoundRangeQualFromExpr() rlst is not really needed. You
can just return MakeTidRangeQuals(found_quals); or return NIL.

3. Can you explain why this only needs to take place when list_length() == 1?

    /*
     * In the case of a compound qual such as "ctid > ? AND ctid < ? AND ...",
     * the various parts will have come from different RestrictInfos. So
     * remove each part separately.
     */
    if (list_length(tidquals) == 1)
    {
        Node *qual = linitial(tidquals);

        if (and_clause(qual))
        {
            BoolExpr *and_qual = ((BoolExpr *) qual);

            scan_clauses = list_difference(scan_clauses, and_qual->args);
        }
    }

4. Accidental change?

- tidquals);
+ tidquals
+ );

5. Shouldn't this comment get changed?

- * NumTids    number of tids in this scan
+ * NumRanges    number of tids in this scan

6. There's no longer a field named NumTids

- * TidList    evaluated item pointers (array of size NumTids)
+ * TidRanges    evaluated item pointers (array of size NumTids)

7. The following field is not documented in TidScanState:

+ bool tss_inScan;

8. Can you name this exprtype instead?

+ TidExprType type; /* type of op */

"type" is used by Node types to indicate their type.

9. It would be neater to write this:

    if (expr->opno == TIDLessOperator || expr->opno == TIDLessEqOperator)
        tidopexpr->type = invert ? TIDEXPR_LOWER_BOUND : TIDEXPR_UPPER_BOUND;
    else if (expr->opno == TIDGreaterOperator || expr->opno == TIDGreaterEqOperator)
        tidopexpr->type = invert ? TIDEXPR_UPPER_BOUND : TIDEXPR_LOWER_BOUND;
    else
        tidopexpr->type = TIDEXPR_EQ;

    tidopexpr->exprstate = exprstate;

    tidopexpr->inclusive = expr->opno == TIDLessEqOperator ||
        expr->opno == TIDGreaterEqOperator;

as a switch:

    switch (expr->opno)
    {
        case TIDLessEqOperator:
            tidopexpr->inclusive = true;
            /* fall through */
        case TIDLessOperator:
            tidopexpr->type = invert ? TIDEXPR_LOWER_BOUND : TIDEXPR_UPPER_BOUND;
            break;
        case TIDGreaterEqOperator:
            tidopexpr->inclusive = true;
            /* fall through */
        case TIDGreaterOperator:
            tidopexpr->type = invert ? TIDEXPR_UPPER_BOUND : TIDEXPR_LOWER_BOUND;
            break;
        default:
            tidopexpr->type = TIDEXPR_EQ;
    }
    tidopexpr->exprstate = exprstate;
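
For anyone who wants to try the switch form in isolation, here is a minimal self-contained sketch. The enum `TidOp`, the struct `TidOpExprSketch` and the function `classify_tid_op` are hypothetical stand-ins for the operator OIDs and the TidOpExpr in the patch. One assumption is worth calling out: the switch only ever sets `inclusive` to true, so the field has to start out false; the sketch initialises it explicitly.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the TID operator OIDs. */
typedef enum { OP_TID_EQ, OP_TID_LT, OP_TID_LE, OP_TID_GT, OP_TID_GE } TidOp;

typedef enum { TIDEXPR_EQ, TIDEXPR_LOWER_BOUND, TIDEXPR_UPPER_BOUND } TidExprType;

/* Hypothetical, reduced version of the patch's TidOpExpr. */
typedef struct
{
    TidExprType exprtype;
    bool        inclusive;
} TidOpExprSketch;

/*
 * Classify a TID comparison.  'invert' is true when the ctid column is on
 * the right-hand side of the operator, which swaps lower and upper bounds.
 */
static TidOpExprSketch
classify_tid_op(TidOp op, bool invert)
{
    /* inclusive must start false: the switch below never clears it */
    TidOpExprSketch result = {TIDEXPR_EQ, false};

    assert(op >= OP_TID_EQ && op <= OP_TID_GE);

    switch (op)
    {
        case OP_TID_LE:
            result.inclusive = true;
            /* fall through */
        case OP_TID_LT:
            result.exprtype = invert ? TIDEXPR_LOWER_BOUND : TIDEXPR_UPPER_BOUND;
            break;
        case OP_TID_GE:
            result.inclusive = true;
            /* fall through */
        case OP_TID_GT:
            result.exprtype = invert ? TIDEXPR_UPPER_BOUND : TIDEXPR_LOWER_BOUND;
            break;
        default:
            result.exprtype = TIDEXPR_EQ;
            break;
    }
    return result;
}
```

For example, `ctid <= x` classifies as an inclusive upper bound, while `x < ctid` (inverted) classifies as an exclusive lower bound.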

10. I don't quite understand this comment:

+ * Create an ExprState corresponding to the value part of a TID comparison,
+ * and wrap it in a TidOpExpr.  Set the type and inclusivity of the TidOpExpr
+ * appropriately, depending on the operator and position of the its arguments.

I don't quite see how the code sets the inclusivity depending on the
position of the arguments.

Maybe the comment should be:

+ * For the given 'expr' build and return an appropriate TidOpExpr taking into
+ * account the expr's operator and operand order.

11. ScalarArrayOpExpr are commonly named "saop":

+static TidOpExpr *
+MakeTidScalarArrayOpExpr(ScalarArrayOpExpr *saex, TidScanState *tidstate)

(Though I see it's saex in other places in that file, so might not matter...)

12. You need to code SetTidLowerBound() with wraparound logic similar to
what you have in SetTidUpperBound().

It's perhaps unlikely, but the following shows incorrect results.

postgres=# select ctid from t1 where ctid > '(0,65535)' limit 1;
ctid
-------
(0,1)
(1 row)

-- the following is fine.

Time: 1.652 ms
postgres=# select ctid from t1 where ctid >= '(0,65535)' limit 1;
ctid
-------
(1,1)
(1 row)

Likely you can just upgrade to the next block when the offset is >
MaxOffsetNumber.
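
A rough sketch of that idea, outside of PostgreSQL: turn the exclusive bound "ctid > tid" into an inclusive one, and move to the next block instead of incrementing an offset that is already at its maximum. `Tid`, `exclusive_lower_to_inclusive` and the `max_offset` parameter (which plays the role of MaxOffsetNumber) are assumptions made for this illustration; this is not the patch's SetTidLowerBound().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for ItemPointerData. */
typedef struct
{
    uint32_t block;     /* block number */
    uint16_t offset;    /* offset number within the block */
} Tid;

/*
 * Convert the exclusive lower bound "ctid > tid" into an inclusive bound
 * "ctid >= result".  No tuple can follow an offset at or beyond max_offset
 * on the same block, so in that case the bound moves to the start of the
 * next block rather than incrementing a 16-bit offset that could wrap to 0.
 */
static Tid
exclusive_lower_to_inclusive(Tid tid, uint16_t max_offset)
{
    Tid result = tid;

    if (tid.offset >= max_offset)
    {
        /* the sketch does not handle overflow of the block number itself */
        assert(tid.block < UINT32_MAX);
        result.block = tid.block + 1;
        result.offset = 0;
    }
    else
        result.offset = (uint16_t) (tid.offset + 1);

    return result;
}
```

With this, "ctid > '(0,65535)'" becomes "ctid >= '(1,0)'" instead of wrapping back to the start of block 0, which is what the incorrect result above suggests is happening.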

13. It looks like the previous code didn't make the assumption you're making in:

+ * A current-of TidExpr only exists by itself, and we should
+ * already have allocated a tidList entry for it.  We don't
+ * need to check whether the tidList array needs to be
+ * resized.

I'm not sure if it's a good idea to lock the executor code into what
the grammar currently says is possible. The previous code didn't
assume that.

14. we pass 'false' to what?

+ * save the tuple and the buffer returned to us by the access methods in
+ * our scan tuple slot and return the slot.  Note: we pass 'false' because
+ * tuples returned by heap_getnext() are pointers onto disk pages and were
+ * not created with palloc() and so should not be pfree()'d.  Note also
+ * that ExecStoreHeapTuple will increment the refcount of the buffer; the
+ * refcount will not be dropped until the tuple table slot is cleared.
  */
- return ExecClearTuple(slot);
+ if (tuple)
+ ExecStoreBufferHeapTuple(tuple, /* tuple to store */
+ slot, /* slot to store in */
+ scandesc->rs_cbuf); /* buffer associated
+ * with this tuple */
+ else
+ ExecClearTuple(slot);
+
+ return slot;

0003:

Saw nothing wrong.

0004:

Not yet reviewed.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#14

Alvaro Herrera

alvherre@2ndquadrant.com

about 7 years ago

In reply to: David Rowley (#13)

Re: Tid scan improvements

On 2018-Nov-06, David Rowley wrote:

14. we pass 'false' to what?

+ * save the tuple and the buffer returned to us by the access methods in
+ * our scan tuple slot and return the slot.  Note: we pass 'false' because
+ * tuples returned by heap_getnext() are pointers onto disk pages and were
+ * not created with palloc() and so should not be pfree()'d.  Note also
+ * that ExecStoreHeapTuple will increment the refcount of the buffer; the
+ * refcount will not be dropped until the tuple table slot is cleared.
*/

Seems a mistake stemming from 29c94e03c7d0 ...

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#15

Edmund Horner

ejrh00@gmail.com

about 7 years ago

In reply to: Alvaro Herrera (#14)

1 attachment(s)

Re: Tid scan improvements

On Tue, 6 Nov 2018 at 16:52, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

On 2018-Nov-06, David Rowley wrote:

14. we pass 'false' to what?

+ * save the tuple and the buffer returned to us by the access methods in
+ * our scan tuple slot and return the slot.  Note: we pass 'false' because
+ * tuples returned by heap_getnext() are pointers onto disk pages and were
+ * not created with palloc() and so should not be pfree()'d.  Note also
+ * that ExecStoreHeapTuple will increment the refcount of the buffer; the
+ * refcount will not be dropped until the tuple table slot is cleared.
*/

Seems a mistake stemming from 29c94e03c7d0 ...

Yep -- I copied that bit from nodeSeqscan.c. Some of the notes were
removed in that change, but nodeSeqscan.c and nodeIndexscan.c still
have them.

I made a little patch to remove them.

Attachments:

remove-obsolete-ExecStoreTuple-notes.patch (application/octet-stream)

diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index ba7821b..d955200 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -137,8 +137,6 @@ IndexNext(IndexScanState *node)
 
 		/*
 		 * Store the scanned tuple in the scan tuple slot of the scan state.
-		 * Note: we pass 'false' because tuples returned by amgetnext are
-		 * pointers onto disk pages and must not be pfree()'d.
 		 */
 		ExecStoreBufferHeapTuple(tuple, /* tuple to store */
 								 slot,	/* slot to store in */
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 79729db..e3506eb 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -81,10 +81,8 @@ SeqNext(SeqScanState *node)
 
 	/*
 	 * save the tuple and the buffer returned to us by the access methods in
-	 * our scan tuple slot and return the slot.  Note: we pass 'false' because
-	 * tuples returned by heap_getnext() are pointers onto disk pages and were
-	 * not created with palloc() and so should not be pfree()'d.  Note also
-	 * that ExecStoreHeapTuple will increment the refcount of the buffer; the
+	 * our scan tuple slot and return the slot.  Note also that
+	 * ExecStoreHeapTuple will increment the refcount of the buffer; the
 	 * refcount will not be dropped until the tuple table slot is cleared.
 	 */
 	if (tuple)

#16

Edmund Horner

ejrh00@gmail.com

about 7 years ago

In reply to: David Rowley (#13)

Re: Tid scan improvements

On Tue, 6 Nov 2018 at 16:40, David Rowley <david.rowley@2ndquadrant.com> wrote:

I've been looking over 0001 to 0003. I ran out of steam before 0004.

Hi David, thanks for another big review with lots of improvements.

I like the design of the new patch. From what I've thrown at the
selectivity estimation code so far, it seems pretty good. I also quite like
the design in nodeTidscan.c for range scans.

I didn't quite manage to wrap my head around the code that removes
redundant quals from the tidquals. For example, with:

postgres=# explain select * from t1 where ctid <= '(0,10)' and a = 0;
                    QUERY PLAN
--------------------------------------------------
 Tid Scan on t1  (cost=0.00..3.19 rows=1 width=4)
   TID Cond: (ctid <= '(0,10)'::tid)
   Filter: (a = 0)
(3 rows)

and:

postgres=# explain select * from t1 where ctid <= '(0,10)' or a = 20
and ctid >= '(0,0)';
                                  QUERY PLAN
------------------------------------------------------------------------------
 Tid Scan on t1  (cost=0.01..176.18 rows=12 width=4)
   TID Cond: ((ctid <= '(0,10)'::tid) OR (ctid >= '(0,0)'::tid))
   Filter: ((ctid <= '(0,10)'::tid) OR ((a = 20) AND (ctid >= '(0,0)'::tid)))
(3 rows)

I understand why the 2nd query didn't remove the ctid quals from the
filter, and I understand why the first query could. I just didn't
manage to convince myself that the code behaves correctly for all
cases.

I agree it's not obvious.

1. We extract a set of tidquals that can be directly implemented by
the Tid scan. This set is of the form: "(CTID op ? AND ...) OR
(...)" (with some limitations).
2. If they happened to come verbatim from the original RestrictInfos,
then they will be found in scan_clauses, and we can remove them.
3. If they're not verbatim, i.e. the original RestrictInfos have
additional criteria that the Tid scan can't use, then tidquals won't
match anything in scan_clauses, and hence scan_clauses will be
unchanged.
4. We do a bit of extra work for the common and useful case of "(CTID
op ? AND ...)". Since the top-level operation of the input quals is
an AND, it will typically be split into multiple RestrictInfo items.
We remove each part from scan_clauses.
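
Steps 2 to 4 can be illustrated with a toy model in which each clause is a plain string and "verbatim" means string equality. In the planner the comparison is between expression trees via list_difference(); `clause_difference` below is a hypothetical stand-in written only for this example.

```c
#include <assert.h>
#include <string.h>

/*
 * Remove from 'clauses' (length *nclauses) every entry that is equal to one
 * of the 'remove' entries, preserving the order of the survivors.  Entries
 * that match nothing are left alone, so a tidqual that was extracted from
 * inside a larger OR clause removes nothing.
 */
static void
clause_difference(const char **clauses, int *nclauses,
                  const char **remove, int nremove)
{
    int kept = 0;

    assert(clauses != NULL && nclauses != NULL);

    for (int i = 0; i < *nclauses; i++)
    {
        int found = 0;

        for (int j = 0; j < nremove; j++)
        {
            if (strcmp(clauses[i], remove[j]) == 0)
                found = 1;
        }
        if (!found)
            clauses[kept++] = clauses[i];
    }
    *nclauses = kept;
}
```

In David's first query, the scan clauses are "ctid <= '(0,10)'" and "a = 0"; the tidqual matches the first one verbatim, so only "a = 0" is left as a filter. In the second query the single scan clause is the whole OR expression, which does not match the extracted tidquals, so it stays in the filter unchanged.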

1. I see a few instances of:

#define DatumGetItemPointer(X) ((ItemPointer) DatumGetPointer(X))
#define ItemPointerGetDatum(X) PointerGetDatum(X)

in both tid.c and ginfuncs.c, and I see you have:

+ itemptr = (ItemPointer) DatumGetPointer(constval);

Do you think it would be worth moving the macros out of tid.c and
ginfuncs.c into postgres.h and using them here instead?

(I see the existing code in this file already does the same, so this
might not matter.)

I'm not sure about this one. I think it's better as a separate
patch, since we'd also change ginfuncs.c. I have left it alone for
now.

2. In TidCompoundRangeQualFromExpr() rlst is not really needed. You
can just return MakeTidRangeQuals(found_quals); or return NIL.

Yup, gone.

3. Can you explain why this only needs to take place when list_length() == 1?

/*
* In the case of a compound qual such as "ctid > ? AND ctid < ? AND ...",
* the various parts will have come from different RestrictInfos. So
* remove each part separately.
*/
...

I've tried to improve the comment.

4. Accidental change?

- tidquals);
+ tidquals
+ );

5. Shouldn't this comment get changed?

- * NumTids    number of tids in this scan
+ * NumRanges    number of tids in this scan

6. There's no longer a field named NumTids

- * TidList    evaluated item pointers (array of size NumTids)
+ * TidRanges    evaluated item pointers (array of size NumTids)

7. The following field is not documented in TidScanState:

+ bool tss_inScan;

8. Can you name this exprtype instead?

+ TidExprType type; /* type of op */

"type" is used by Node types to indicate their type.

Yup, yup, yup, yup, yup.

9. It would be neater this:

if (expr->opno == TIDLessOperator || expr->opno == TIDLessEqOperator)
tidopexpr->type = invert ? TIDEXPR_LOWER_BOUND : TIDEXPR_UPPER_BOUND;
else if (expr->opno == TIDGreaterOperator || expr->opno == TIDGreaterEqOperator)
tidopexpr->type = invert ? TIDEXPR_UPPER_BOUND : TIDEXPR_LOWER_BOUND;
else
tidopexpr->type = TIDEXPR_EQ;

tidopexpr->exprstate = exprstate;

tidopexpr->inclusive = expr->opno == TIDLessEqOperator || expr->opno
== TIDGreaterEqOperator;

as a switch: ...

Yup, I think the switch is a bit nicer.

10. I don't quite understand this comment:

+ * Create an ExprState corresponding to the value part of a TID comparison,
+ * and wrap it in a TidOpExpr.  Set the type and inclusivity of the TidOpExpr
+ * appropriately, depending on the operator and position of the its arguments.

I don't quite see how the code sets the inclusivity depending on the
position of the arguments.

Maybe the comment should be:

+ * For the given 'expr' build and return an appropriate TidOpExpr taking into
+ * account the expr's operator and operand order.

I'll go with your wording.

11. ScalarArrayOpExpr are commonly named "saop": ...

Yup.

12. You need to code SetTidLowerBound() with wraparound logic similar to
what you have in SetTidUpperBound().

It's perhaps unlikely, but the following shows incorrect results.

postgres=# select ctid from t1 where ctid > '(0,65535)' limit 1;
ctid
-------
(0,1)
(1 row)

-- the following is fine.

Time: 1.652 ms
postgres=# select ctid from t1 where ctid >= '(0,65535)' limit 1;
ctid
-------
(1,1)
(1 row)

Likely you can just upgrade to the next block when the offset is >
MaxOffsetNumber.

This is important, thanks for spotting it.

I've tried to add some code to handle this case (and also that of
"ctid < '(0,0)'") with a couple of tests too.

13. It looks like the previous code didn't make the assumption you're making in:

+ * A current-of TidExpr only exists by itself, and we should
+ * already have allocated a tidList entry for it.  We don't
+ * need to check whether the tidList array needs to be
+ * resized.

I'm not sure if it's a good idea to lock the executor code into what
the grammar currently says is possible. The previous code didn't
assume that.

Fair enough, I've restored the previous code without the assumption.

14. we pass 'false' to what?

Obsolete comment (see reply to Alvaro).

I've applied most of these, and I'll post a new patch soon.

#17

Edmund Horner

ejrh00@gmail.com

about 7 years ago

In reply to: Edmund Horner (#16)

4 attachment(s)

Re: Tid scan improvements

Hi, here are the new patches.

Mostly the same, but trying to address your comments from earlier as
well as clean up a few other things I noticed.

Cheers,
Edmund

Attachments:

v4-0001-Add-selectivity-and-nullness-estimates-for-CTID-syst.patch (application/octet-stream)

From ec62324d7071d1a61d9fd0c5ad64f88eda67fb04 Mon Sep 17 00:00:00 2001
From: Edmund Horner <ejrh00@gmail.com>
Date: Fri, 12 Oct 2018 13:36:24 +1300
Subject: [PATCH 1/4] Add selectivity and nullness estimates for CTID system
 variables

Previously, estimates for ItemPointer range quals, such as "ctid <= '(5,7)'",
resorted to the default values of 0.33 for range selectivity, and 0.005 for
nullness, although there was special-case handling for equality quals like
"ctid = (5,7)", which used the appropriate selectivity for distinct items.

This change uses the relation size to estimate the selectivity of a range qual,
and also uses a nullness estimate of 0 for ctid, since it is never NULL.
---
 src/backend/utils/adt/selfuncs.c | 52 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 52 insertions(+)

diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index e0ece74..f430a2b 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -571,6 +571,49 @@ scalarineqsel(PlannerInfo *root, Oid operator, bool isgt, bool iseq,
 
 	if (!HeapTupleIsValid(vardata->statsTuple))
 	{
+		/*
+		 * There are no stats for system columns, but for CTID we can estimate
+		 * based on table size.
+		 */
+		if (vardata->var && IsA(vardata->var, Var) &&
+			((Var *) vardata->var)->varattno == SelfItemPointerAttributeNumber)
+		{
+			ItemPointer itemptr;
+			double		block;
+			double		density;
+
+			/* If the relation's empty, we're going to include all of it. */
+			if (vardata->rel->pages == 0)
+				return 1.0;
+
+			itemptr = (ItemPointer) DatumGetPointer(constval);
+			block = ItemPointerGetBlockNumberNoCheck(itemptr);
+
+			/*
+			 * If there's a useable density (tuples per page) estimate, take
+			 * into account the fraction of a block with a lower TID offset.
+			 */
+			density = vardata->rel->tuples / vardata->rel->pages;
+			if (density > 0.0)
+			{
+				OffsetNumber offset = ItemPointerGetOffsetNumberNoCheck(itemptr);
+
+				block += Min(offset / density, 1.0);
+			}
+
+			selec = block / (double) vardata->rel->pages;
+
+			/* For <= and >=, one extra item is included. */
+			if (iseq && vardata->rel->tuples >= 1.0)
+				selec += (1 / vardata->rel->tuples);
+
+			if (isgt)
+				selec = 1.0 - selec;
+
+			CLAMP_PROBABILITY(selec);
+			return selec;
+		}
+
 		/* no stats available, so default result */
 		return DEFAULT_INEQ_SEL;
 	}
@@ -1785,6 +1828,15 @@ nulltestsel(PlannerInfo *root, NullTestType nulltesttype, Node *arg,
 				return (Selectivity) 0; /* keep compiler quiet */
 		}
 	}
+	else if (vardata.var && IsA(vardata.var, Var) &&
+			 ((Var *) vardata.var)->varattno == SelfItemPointerAttributeNumber)
+	{
+		/*
+		 * There are no stats for system columns, but we know CTID is never
+		 * NULL.
+		 */
+		selec = (nulltesttype == IS_NULL) ? 0.0 : 1.0;
+	}
 	else
 	{
 		/*
-- 
2.7.4
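
A note on the arithmetic in the selfuncs.c hunk above: the estimate places the constant TID at a fractional position in the heap (block number plus the offset scaled by tuples per page) and divides that by the page count. The standalone sketch below reproduces the calculation outside the planner. RelSize, clamp_probability and ctid_ineq_selectivity are hypothetical names used only for illustration, and the sketch assumes the positional fraction counts rows strictly below the constant, so the boundary row is added for <= and >. It is a sketch under those assumptions, not the patch's code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the planner's per-relation size estimates. */
typedef struct RelSize
{
    double      pages;          /* estimated number of heap pages */
    double      tuples;         /* estimated number of heap tuples */
} RelSize;

/* Clamp a probability into [0, 1]. */
static double
clamp_probability(double p)
{
    if (p < 0.0)
        return 0.0;
    if (p > 1.0)
        return 1.0;
    return p;
}

/*
 * Estimate the fraction of rows satisfying "ctid OP (block,offset)".
 * isgt selects > and >=; iseq selects <= and >=.
 */
static double
ctid_ineq_selectivity(RelSize rel, unsigned block, unsigned offset,
                      bool isgt, bool iseq)
{
    double      pos = (double) block;
    double      density;
    double      below;

    assert(rel.pages >= 0.0 && rel.tuples >= 0.0);

    /* An empty relation: treat everything as matching. */
    if (rel.pages == 0.0)
        return 1.0;

    /* Add the fraction of the block that lies below the given offset. */
    density = rel.tuples / rel.pages;
    if (density > 0.0)
    {
        double      frac = offset / density;

        pos += (frac < 1.0) ? frac : 1.0;
    }

    /* Assumption of this sketch: fraction of rows strictly below the TID. */
    below = pos / rel.pages;

    /* For <= and >, the boundary row belongs on the "below" side. */
    if (iseq != isgt && rel.tuples >= 1.0)
        below += 1.0 / rel.tuples;

    return clamp_probability(isgt ? 1.0 - below : below);
}
```

For a table estimated at 100 pages and 10000 tuples, both ctid < '(50,0)' and ctid >= '(50,0)' come out at 0.5 under these assumptions, and the two inclusive/exclusive variants differ from that by one tuple's worth (1/10000).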

Attachment: v4-0003-Support-backward-scans-over-restricted-ranges-in-hea.patch (application/octet-stream)

From 5b29a4b6353d35d30cabbc67fe9e0b66b24d93a3 Mon Sep 17 00:00:00 2001
From: Edmund Horner <ejrh00@gmail.com>
Date: Fri, 12 Oct 2018 16:28:58 +1300
Subject: [PATCH 3/4] Support backward scans over restricted ranges in heap
 access method

This is required for backward Tid scans.
---
 src/backend/access/heap/heapam.c | 34 ++++++++++++++++++++++++++++------
 1 file changed, 28 insertions(+), 6 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index fb63471..0d736f2 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -575,11 +575,22 @@ heapgettup(HeapScanDesc scan,
 			 * forward scanners.
 			 */
 			scan->rs_syncscan = false;
+
 			/* start from last page of the scan */
-			if (scan->rs_startblock > 0)
-				page = scan->rs_startblock - 1;
+			if (scan->rs_numblocks == InvalidBlockNumber)
+			{
+				/* Scanning the full relation: start just before start block. */
+				if (scan->rs_startblock > 0)
+					page = scan->rs_startblock - 1;
+				else
+					page = scan->rs_nblocks - 1;
+			}
 			else
-				page = scan->rs_nblocks - 1;
+			{
+				/* Scanning a restricted range: start at end of range. */
+				page = scan->rs_startblock + scan->rs_numblocks - 1;
+			}
+
 			heapgetpage(scan, page);
 		}
 		else
@@ -876,11 +887,22 @@ heapgettup_pagemode(HeapScanDesc scan,
 			 * forward scanners.
 			 */
 			scan->rs_syncscan = false;
+
 			/* start from last page of the scan */
-			if (scan->rs_startblock > 0)
-				page = scan->rs_startblock - 1;
+			if (scan->rs_numblocks == InvalidBlockNumber)
+			{
+				/* Scanning the full relation: start just before start block. */
+				if (scan->rs_startblock > 0)
+					page = scan->rs_startblock - 1;
+				else
+					page = scan->rs_nblocks - 1;
+			}
 			else
-				page = scan->rs_nblocks - 1;
+			{
+				/* Scanning a restricted range: start at end of range. */
+				page = scan->rs_startblock + scan->rs_numblocks - 1;
+			}
+
 			heapgetpage(scan, page);
 		}
 		else
-- 
2.7.4
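
To make the change in the heapam.c patch above easier to follow: a backward scan has to pick its first page differently depending on whether heap_setscanlimits restricted the scan. The sketch below isolates that decision in standalone C. ScanLimits and backward_start_page are hypothetical names; the three fields stand in for rs_nblocks, rs_startblock and rs_numblocks, and the sketch assumes a restricted range does not run past the end of the relation.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t BlockNumber;
#define InvalidBlockNumber ((BlockNumber) 0xFFFFFFFF)

/* Hypothetical, simplified subset of the scan descriptor fields. */
typedef struct ScanLimits
{
    BlockNumber nblocks;        /* total blocks in the relation */
    BlockNumber startblock;     /* block the scan starts from */
    BlockNumber numblocks;      /* range length, or InvalidBlockNumber */
} ScanLimits;

/* Return the first page a backward scan should read. */
static BlockNumber
backward_start_page(const ScanLimits *scan)
{
    assert(scan->nblocks > 0);

    if (scan->numblocks == InvalidBlockNumber)
    {
        /* Full relation: start just before the start block, wrapping. */
        if (scan->startblock > 0)
            return scan->startblock - 1;
        return scan->nblocks - 1;
    }

    /* Restricted range: start at the last block of the range. */
    return scan->startblock + scan->numblocks - 1;
}
```

With a 100-block relation, an unrestricted scan starting at block 0 begins at block 99, while a range of 5 blocks starting at block 10 begins at block 14.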

Attachment: v4-0002-Support-range-quals-in-Tid-Scan.patch (application/octet-stream)

From a9d74e3bc53df673f9f9198e6432b03130f420af Mon Sep 17 00:00:00 2001
From: Edmund Horner <ejrh00@gmail.com>
Date: Fri, 12 Oct 2018 16:28:19 +1300
Subject: [PATCH 2/4] Support range quals in Tid Scan

This means queries with expressions such as "ctid >= ? AND ctid < ?" can be
answered by scanning over that part of a table, rather than falling back to a
full SeqScan.
---
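
As an illustration of how a qual such as "ctid >= ? AND ctid < ?" becomes a scan range: each bound is first converted into an inclusive first or last TID, stepping into the neighbouring block when the offset runs off either end of a page. The sketch below shows that normalization in standalone C. Tid, lower_bound and upper_bound are hypothetical names; MAX_BLOCK mirrors MaxBlockNumber, and MAX_OFFSET is an illustrative stand-in for MaxOffsetNumber assuming the default 8kB block size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical simplified TID: (block, offset), offsets start at 1. */
typedef struct Tid
{
    uint32_t    block;
    uint16_t    offset;
} Tid;

#define MAX_BLOCK   0xFFFFFFFEu     /* mirrors MaxBlockNumber */
#define MAX_OFFSET  2048u           /* assumed value for 8kB blocks */

/* Smallest TID matching "ctid > t" or "ctid >= t"; false if none exists. */
static bool
lower_bound(Tid t, bool inclusive, Tid *out)
{
    *out = t;
    if (!inclusive)
    {
        if (t.offset >= MAX_OFFSET)
        {
            /* The bound moves to the start of the next block. */
            if (t.block >= MAX_BLOCK)
                return false;
            out->block = t.block + 1;
            out->offset = 1;
        }
        else
            out->offset = (uint16_t) (t.offset + 1);
    }
    else if (t.offset == 0)
        out->offset = 1;        /* offset 0 never holds a tuple */
    return true;
}

/* Largest TID matching "ctid < t" or "ctid <= t"; false if none exists. */
static bool
upper_bound(Tid t, bool inclusive, Tid *out)
{
    *out = t;

    /* "<= (b,0)" matches the same tuples as "< (b,0)". */
    if (inclusive && t.offset == 0)
        inclusive = false;

    if (!inclusive)
    {
        if (t.offset == 0)
        {
            /* The bound moves to the end of the previous block. */
            if (t.block == 0)
                return false;
            out->block = t.block - 1;
            out->offset = MAX_OFFSET;
        }
        else
            out->offset = (uint16_t) (t.offset - 1);
    }
    return true;
}
```

Under these assumptions, ctid >= '(1000,0)' AND ctid < '(2000,0)' normalizes to the inclusive range (1000,1) through (1999,2048), so the scan can be limited to blocks 1000 to 1999.
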
 src/backend/executor/nodeTidscan.c      | 858 ++++++++++++++++++++++++--------
 src/backend/optimizer/path/costsize.c   |  43 +-
 src/backend/optimizer/path/tidpath.c    | 145 ++++--
 src/backend/optimizer/plan/createplan.c |  27 +-
 src/include/catalog/pg_operator.dat     |   6 +-
 src/include/nodes/execnodes.h           |  24 +-
 src/include/nodes/relation.h            |  13 +-
 src/test/regress/expected/tidscan.out   | 250 ++++++++++
 src/test/regress/sql/tidscan.sql        |  76 +++
 9 files changed, 1180 insertions(+), 262 deletions(-)

diff --git a/src/backend/executor/nodeTidscan.c b/src/backend/executor/nodeTidscan.c
index bc859e3..3897b97 100644
--- a/src/backend/executor/nodeTidscan.c
+++ b/src/backend/executor/nodeTidscan.c
@@ -22,7 +22,9 @@
  */
 #include "postgres.h"
 
+#include "access/relscan.h"
 #include "access/sysattr.h"
+#include "catalog/pg_operator.h"
 #include "catalog/pg_type.h"
 #include "executor/execdebug.h"
 #include "executor/nodeTidscan.h"
@@ -39,21 +41,132 @@
 	 ((Var *) (node))->varattno == SelfItemPointerAttributeNumber && \
 	 ((Var *) (node))->varlevelsup == 0)
 
+typedef enum
+{
+	TIDEXPR_IN_ARRAY,
+	TIDEXPR_EQ,
+	TIDEXPR_UPPER_BOUND,
+	TIDEXPR_LOWER_BOUND
+}			TidExprType;
+
+/* one element in TidExpr's opexprs */
+typedef struct TidOpExpr
+{
+	TidExprType exprtype;		/* type of op */
+	ExprState  *exprstate;		/* ExprState for a TID-yielding subexpr */
+	bool		inclusive;		/* whether op is inclusive */
+}			TidOpExpr;
+
 /* one element in tss_tidexprs */
 typedef struct TidExpr
 {
-	ExprState  *exprstate;		/* ExprState for a TID-yielding subexpr */
-	bool		isarray;		/* if true, it yields tid[] not just tid */
-	CurrentOfExpr *cexpr;		/* alternatively, we can have CURRENT OF */
+	List	   *opexprs;		/* list of individual op exprs */
+	CurrentOfExpr *cexpr;		/* alternatively, we can have CURRENT OF */
 } TidExpr;
 
+typedef struct TidRange
+{
+	ItemPointerData first;
+	ItemPointerData last;
+}			TidRange;
+
+static TidOpExpr * MakeTidOpExpr(OpExpr *expr, TidScanState *tidstate);
+static TidOpExpr * MakeTidScalarArrayOpExpr(ScalarArrayOpExpr *saop, TidScanState *tidstate);
+static List *MakeTidOpExprList(List *exprs, TidScanState *tidstate);
 static void TidExprListCreate(TidScanState *tidstate);
+static TidRange * EnsureTidRangeSpace(TidRange * tidRanges, int numRanges, int *numAllocRanges,
+									  int numNewItems);
+static bool SetTidLowerBound(ItemPointer tid, bool inclusive, ItemPointer lowerBound);
+static bool SetTidUpperBound(ItemPointer tid, bool inclusive, ItemPointer upperBound);
 static void TidListEval(TidScanState *tidstate);
-static int	itemptr_comparator(const void *a, const void *b);
+static bool MergeTidRanges(TidRange * a, TidRange * b);
+static int	tidrange_comparator(const void *a, const void *b);
+static HeapScanDesc BeginTidRangeScan(TidScanState *node, TidRange * range);
+static HeapTuple NextInTidRange(HeapScanDesc scandesc, ScanDirection direction, TidRange * range);
 static TupleTableSlot *TidNext(TidScanState *node);
 
 
 /*
+ * For the given 'expr', build and return an appropriate TidOpExpr taking into
+ * account the expr's operator and operand order.
+ */
+static TidOpExpr *
+MakeTidOpExpr(OpExpr *expr, TidScanState *tidstate)
+{
+	Node	   *arg1 = get_leftop((Expr *) expr);
+	Node	   *arg2 = get_rightop((Expr *) expr);
+	ExprState  *exprstate = NULL;
+	bool		invert = false;
+	TidOpExpr  *tidopexpr;
+
+	if (IsCTIDVar(arg1))
+		exprstate = ExecInitExpr((Expr *) arg2, &tidstate->ss.ps);
+	else if (IsCTIDVar(arg2))
+	{
+		exprstate = ExecInitExpr((Expr *) arg1, &tidstate->ss.ps);
+		invert = true;
+	}
+	else
+		elog(ERROR, "could not identify CTID variable");
+
+	tidopexpr = (TidOpExpr *) palloc0(sizeof(TidOpExpr));
+
+	switch (expr->opno)
+	{
+		case TIDLessEqOperator:
+			tidopexpr->inclusive = true;
+			/* fall through */
+		case TIDLessOperator:
+			tidopexpr->exprtype = invert ? TIDEXPR_LOWER_BOUND : TIDEXPR_UPPER_BOUND;
+			break;
+		case TIDGreaterEqOperator:
+			tidopexpr->inclusive = true;
+			/* fall through */
+		case TIDGreaterOperator:
+			tidopexpr->exprtype = invert ? TIDEXPR_UPPER_BOUND : TIDEXPR_LOWER_BOUND;
+			break;
+		default:
+			tidopexpr->exprtype = TIDEXPR_EQ;
+	}
+
+	tidopexpr->exprstate = exprstate;
+
+	return tidopexpr;
+}
+
+static TidOpExpr *
+MakeTidScalarArrayOpExpr(ScalarArrayOpExpr *saop, TidScanState *tidstate)
+{
+	TidOpExpr  *tidopexpr;
+
+	Assert(IsCTIDVar(linitial(saop->args)));
+
+	tidopexpr = (TidOpExpr *) palloc0(sizeof(TidOpExpr));
+	tidopexpr->exprstate = ExecInitExpr(lsecond(saop->args),
+										&tidstate->ss.ps);
+	tidopexpr->exprtype = TIDEXPR_IN_ARRAY;
+
+	return tidopexpr;
+}
+
+static List *
+MakeTidOpExprList(List *exprs, TidScanState *tidstate)
+{
+	ListCell   *l;
+	List	   *tidopexprs = NIL;
+
+	foreach(l, exprs)
+	{
+		OpExpr	   *opexpr = lfirst(l);
+		TidOpExpr  *tidopexpr = MakeTidOpExpr(opexpr, tidstate);
+
+		tidopexprs = lappend(tidopexprs, tidopexpr);
+	}
+
+	return tidopexprs;
+}
+
+/*
  * Extract the qual subexpressions that yield TIDs to search for,
  * and compile them into ExprStates if they're ordinary expressions.
  *
@@ -69,6 +182,17 @@ TidExprListCreate(TidScanState *tidstate)
 	tidstate->tss_tidexprs = NIL;
 	tidstate->tss_isCurrentOf = false;
 
+	/*
+	 * If no quals were specified, then a complete scan is assumed.  Make a
+	 * TidExpr with an empty list of TidOpExprs.
+	 */
+	if (!node->tidquals)
+	{
+		TidExpr    *tidexpr = (TidExpr *) palloc0(sizeof(TidExpr));
+
+		tidstate->tss_tidexprs = lappend(tidstate->tss_tidexprs, tidexpr);
+	}
+
 	foreach(l, node->tidquals)
 	{
 		Expr	   *expr = (Expr *) lfirst(l);
@@ -76,37 +200,30 @@ TidExprListCreate(TidScanState *tidstate)
 
 		if (is_opclause(expr))
 		{
-			Node	   *arg1;
-			Node	   *arg2;
-
-			arg1 = get_leftop(expr);
-			arg2 = get_rightop(expr);
-			if (IsCTIDVar(arg1))
-				tidexpr->exprstate = ExecInitExpr((Expr *) arg2,
-												  &tidstate->ss.ps);
-			else if (IsCTIDVar(arg2))
-				tidexpr->exprstate = ExecInitExpr((Expr *) arg1,
-												  &tidstate->ss.ps);
-			else
-				elog(ERROR, "could not identify CTID variable");
-			tidexpr->isarray = false;
+			OpExpr	   *opexpr = (OpExpr *) expr;
+			TidOpExpr  *tidopexpr = MakeTidOpExpr(opexpr, tidstate);
+
+			tidexpr->opexprs = list_make1(tidopexpr);
 		}
 		else if (expr && IsA(expr, ScalarArrayOpExpr))
 		{
-			ScalarArrayOpExpr *saex = (ScalarArrayOpExpr *) expr;
+			ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) expr;
+			TidOpExpr  *tidopexpr = MakeTidScalarArrayOpExpr(saop, tidstate);
 
-			Assert(IsCTIDVar(linitial(saex->args)));
-			tidexpr->exprstate = ExecInitExpr(lsecond(saex->args),
-											  &tidstate->ss.ps);
-			tidexpr->isarray = true;
+			tidexpr->opexprs = list_make1(tidopexpr);
 		}
 		else if (expr && IsA(expr, CurrentOfExpr))
 		{
 			CurrentOfExpr *cexpr = (CurrentOfExpr *) expr;
 
+			/* For CURRENT OF, save the expression in the TidExpr. */
 			tidexpr->cexpr = cexpr;
 			tidstate->tss_isCurrentOf = true;
 		}
+		else if (and_clause((Node *) expr))
+		{
+			tidexpr->opexprs = MakeTidOpExprList(((BoolExpr *) expr)->args, tidstate);
+		}
 		else
 			elog(ERROR, "could not identify CTID expression");
 
@@ -119,7 +236,256 @@ TidExprListCreate(TidScanState *tidstate)
 }
 
 /*
- * Compute the list of TIDs to be visited, by evaluating the expressions
+ * Ensure the array of TidRange objects has enough space for new items.
+ * May reallocate array.
+ */
+static TidRange *
+EnsureTidRangeSpace(TidRange * tidRanges, int numRanges, int *numAllocRanges,
+					int numNewItems)
+{
+	if (numRanges + numNewItems > *numAllocRanges)
+	{
+		/* If growing by one, grow exponentially; otherwise, grow just enough. */
+		if (numNewItems == 1)
+			*numAllocRanges *= 2;
+		else
+			*numAllocRanges = numRanges + numNewItems;
+
+		tidRanges = (TidRange *)
+			repalloc(tidRanges,
+					 *numAllocRanges * sizeof(TidRange));
+	}
+	return tidRanges;
+}
+
+/*
+ * Set a lower bound tid, taking into account the inclusivity of the bound.
+ * Return true if the bound is valid.
+ */
+static bool
+SetTidLowerBound(ItemPointer tid, bool inclusive, ItemPointer lowerBound)
+{
+	OffsetNumber offset;
+
+	*lowerBound = *tid;
+	offset = ItemPointerGetOffsetNumberNoCheck(tid);
+
+	if (!inclusive)
+	{
+		/* Check if the lower bound is actually in the next block. */
+		if (offset >= MaxOffsetNumber)
+		{
+			BlockNumber block = ItemPointerGetBlockNumberNoCheck(lowerBound);
+
+			/*
+			 * If the lower bound was already at or above the maximum block
+			 * number, then there is no valid range.
+			 */
+			if (block >= MaxBlockNumber)
+				return false;
+
+			ItemPointerSetBlockNumber(lowerBound, block + 1);
+			ItemPointerSetOffsetNumber(lowerBound, 1);
+		}
+		else
+			ItemPointerSetOffsetNumber(lowerBound, OffsetNumberNext(offset));
+	}
+	else if (offset == 0)
+		ItemPointerSetOffsetNumber(lowerBound, 1);
+
+	return true;
+}
+
+/*
+ * Set an upper bound tid, taking into account the inclusivity of the bound.
+ * Return true if the bound is valid.
+ */
+static bool
+SetTidUpperBound(ItemPointer tid, bool inclusive, ItemPointer upperBound)
+{
+	OffsetNumber offset;
+
+	*upperBound = *tid;
+	offset = ItemPointerGetOffsetNumberNoCheck(tid);
+
+	/*
+	 * Since TID offsets start at 1, an inclusive upper bound with offset 0
+	 * can be treated as an exclusive bound.  This has the benefit of
+	 * eliminating that block from the scan range.
+	 */
+	if (inclusive && offset == 0)
+		inclusive = false;
+
+	if (!inclusive)
+	{
+		/* Check if the upper bound is actually in the previous block. */
+		if (offset == 0)
+		{
+			BlockNumber block = ItemPointerGetBlockNumberNoCheck(upperBound);
+
+			/*
+			 * If the upper bound was already in block 0, then there is no
+			 * valid range.
+			 */
+			if (block == 0)
+				return false;
+
+			ItemPointerSetBlockNumber(upperBound, block - 1);
+			ItemPointerSetOffsetNumber(upperBound, MaxOffsetNumber);
+		}
+		else
+			ItemPointerSetOffsetNumber(upperBound, OffsetNumberPrev(offset));
+	}
+
+	return true;
+}
+
+static void
+TidInArrayExprEval(TidOpExpr * tidopexpr, BlockNumber nblocks, TidScanState *tidstate,
+				   TidRange * *tidRanges, int *numRanges, int *numAllocRanges)
+{
+	ExprContext *econtext = tidstate->ss.ps.ps_ExprContext;
+	bool		isNull;
+	Datum		arraydatum;
+	ArrayType  *itemarray;
+	Datum	   *ipdatums;
+	bool	   *ipnulls;
+	int			ndatums;
+	int			i;
+
+	arraydatum = ExecEvalExprSwitchContext(tidopexpr->exprstate,
+										   econtext,
+										   &isNull);
+	if (isNull)
+		return;
+
+	itemarray = DatumGetArrayTypeP(arraydatum);
+	deconstruct_array(itemarray,
+					  TIDOID, sizeof(ItemPointerData), false, 's',
+					  &ipdatums, &ipnulls, &ndatums);
+
+	*tidRanges = EnsureTidRangeSpace(*tidRanges, *numRanges, numAllocRanges, ndatums);
+
+	for (i = 0; i < ndatums; i++)
+	{
+		if (!ipnulls[i])
+		{
+			ItemPointer itemptr = (ItemPointer) DatumGetPointer(ipdatums[i]);
+
+			if (ItemPointerIsValid(itemptr) &&
+				ItemPointerGetBlockNumber(itemptr) < nblocks)
+			{
+				(*tidRanges)[*numRanges].first = *itemptr;
+				(*tidRanges)[*numRanges].last = *itemptr;
+				(*numRanges)++;
+			}
+		}
+	}
+	pfree(ipdatums);
+	pfree(ipnulls);
+}
+
+static void
+TidExprEval(TidExpr *expr, BlockNumber nblocks, TidScanState *tidstate,
+			TidRange * *tidRanges, int *numRanges, int *numAllocRanges)
+{
+	ExprContext *econtext = tidstate->ss.ps.ps_ExprContext;
+	ListCell   *l;
+	ItemPointerData lowerBound;
+	ItemPointerData upperBound;
+
+	/* The biggest range on an empty table is empty; just skip it. */
+	if (nblocks == 0)
+		return;
+
+	/* Set the lower and upper bound to scan the whole table. */
+	ItemPointerSetBlockNumber(&lowerBound, 0);
+	ItemPointerSetOffsetNumber(&lowerBound, 1);
+	ItemPointerSetBlockNumber(&upperBound, nblocks - 1);
+	ItemPointerSetOffsetNumber(&upperBound, MaxOffsetNumber);
+
+	foreach(l, expr->opexprs)
+	{
+		TidOpExpr  *tidopexpr = (TidOpExpr *) lfirst(l);
+
+		if (tidopexpr->exprtype == TIDEXPR_IN_ARRAY)
+		{
+			TidInArrayExprEval(tidopexpr, nblocks, tidstate,
+							   tidRanges, numRanges, numAllocRanges);
+
+			/*
+			 * A CTID = ANY expression only exists by itself; there shouldn't
+			 * be any other quals alongside it.  TidInArrayExprEval has
+			 * already added the ranges, so just return here.
+			 */
+			Assert(list_length(expr->opexprs) == 1);
+			return;
+		}
+		else
+		{
+			ItemPointer itemptr;
+			bool		isNull;
+
+			/* Evaluate this bound. */
+			itemptr = (ItemPointer)
+				DatumGetPointer(ExecEvalExprSwitchContext(tidopexpr->exprstate,
+														  econtext,
+														  &isNull));
+
+			/* If the bound is NULL, *nothing* matches the qual. */
+			if (isNull)
+				return;
+
+			if (tidopexpr->exprtype == TIDEXPR_EQ && ItemPointerIsValid(itemptr))
+			{
+				lowerBound = *itemptr;
+				upperBound = *itemptr;
+
+				/*
+				 * A CTID = ? expression only exists by itself, so set the
+				 * range to this single TID, and exit the loop (the remainder
+				 * of this function will add the range).
+				 */
+				Assert(list_length(expr->opexprs) == 1);
+				break;
+			}
+
+			if (tidopexpr->exprtype == TIDEXPR_LOWER_BOUND)
+			{
+				ItemPointerData lb;
+
+				if (!SetTidLowerBound(itemptr, tidopexpr->inclusive, &lb))
+					return;
+
+				if (ItemPointerCompare(&lb, &lowerBound) > 0)
+					lowerBound = lb;
+			}
+
+			if (tidopexpr->exprtype == TIDEXPR_UPPER_BOUND)
+			{
+				ItemPointerData ub;
+
+				if (!SetTidUpperBound(itemptr, tidopexpr->inclusive, &ub))
+					return;
+
+				if (ItemPointerCompare(&ub, &upperBound) < 0)
+					upperBound = ub;
+			}
+		}
+	}
+
+	/* If the resulting range is not empty, add it to the array. */
+	if (ItemPointerCompare(&lowerBound, &upperBound) <= 0)
+	{
+		*tidRanges = EnsureTidRangeSpace(*tidRanges, *numRanges, numAllocRanges, 1);
+		(*tidRanges)[*numRanges].first = lowerBound;
+		(*tidRanges)[*numRanges].last = upperBound;
+		(*numRanges)++;
+	}
+}
+
+/*
+ * Compute the list of TID ranges to be visited, by evaluating the expressions
  * for them.
  *
  * (The result is actually an array, not a list.)
@@ -129,9 +495,9 @@ TidListEval(TidScanState *tidstate)
 {
 	ExprContext *econtext = tidstate->ss.ps.ps_ExprContext;
 	BlockNumber nblocks;
-	ItemPointerData *tidList;
-	int			numAllocTids;
-	int			numTids;
+	TidRange   *tidRanges;
+	int			numAllocRanges;
+	int			numRanges;
 	ListCell   *l;
 
 	/*
@@ -147,76 +513,15 @@ TidListEval(TidScanState *tidstate)
 	 * are simple OpExprs or CurrentOfExprs.  If there are any
 	 * ScalarArrayOpExprs, we may have to enlarge the array.
 	 */
-	numAllocTids = list_length(tidstate->tss_tidexprs);
-	tidList = (ItemPointerData *)
-		palloc(numAllocTids * sizeof(ItemPointerData));
-	numTids = 0;
+	numAllocRanges = list_length(tidstate->tss_tidexprs);
+	tidRanges = (TidRange *) palloc0(numAllocRanges * sizeof(TidRange));
+	numRanges = 0;
 
 	foreach(l, tidstate->tss_tidexprs)
 	{
 		TidExpr    *tidexpr = (TidExpr *) lfirst(l);
-		ItemPointer itemptr;
-		bool		isNull;
 
-		if (tidexpr->exprstate && !tidexpr->isarray)
-		{
-			itemptr = (ItemPointer)
-				DatumGetPointer(ExecEvalExprSwitchContext(tidexpr->exprstate,
-														  econtext,
-														  &isNull));
-			if (!isNull &&
-				ItemPointerIsValid(itemptr) &&
-				ItemPointerGetBlockNumber(itemptr) < nblocks)
-			{
-				if (numTids >= numAllocTids)
-				{
-					numAllocTids *= 2;
-					tidList = (ItemPointerData *)
-						repalloc(tidList,
-								 numAllocTids * sizeof(ItemPointerData));
-				}
-				tidList[numTids++] = *itemptr;
-			}
-		}
-		else if (tidexpr->exprstate && tidexpr->isarray)
-		{
-			Datum		arraydatum;
-			ArrayType  *itemarray;
-			Datum	   *ipdatums;
-			bool	   *ipnulls;
-			int			ndatums;
-			int			i;
-
-			arraydatum = ExecEvalExprSwitchContext(tidexpr->exprstate,
-												   econtext,
-												   &isNull);
-			if (isNull)
-				continue;
-			itemarray = DatumGetArrayTypeP(arraydatum);
-			deconstruct_array(itemarray,
-							  TIDOID, sizeof(ItemPointerData), false, 's',
-							  &ipdatums, &ipnulls, &ndatums);
-			if (numTids + ndatums > numAllocTids)
-			{
-				numAllocTids = numTids + ndatums;
-				tidList = (ItemPointerData *)
-					repalloc(tidList,
-							 numAllocTids * sizeof(ItemPointerData));
-			}
-			for (i = 0; i < ndatums; i++)
-			{
-				if (!ipnulls[i])
-				{
-					itemptr = (ItemPointer) DatumGetPointer(ipdatums[i]);
-					if (ItemPointerIsValid(itemptr) &&
-						ItemPointerGetBlockNumber(itemptr) < nblocks)
-						tidList[numTids++] = *itemptr;
-				}
-			}
-			pfree(ipdatums);
-			pfree(ipnulls);
-		}
-		else
+		if (tidexpr->cexpr)
 		{
 			ItemPointerData cursor_tid;
 
@@ -225,16 +530,17 @@ TidListEval(TidScanState *tidstate)
 							  RelationGetRelid(tidstate->ss.ss_currentRelation),
 							  &cursor_tid))
 			{
-				if (numTids >= numAllocTids)
-				{
-					numAllocTids *= 2;
-					tidList = (ItemPointerData *)
-						repalloc(tidList,
-								 numAllocTids * sizeof(ItemPointerData));
-				}
-				tidList[numTids++] = cursor_tid;
+				tidRanges = EnsureTidRangeSpace(tidRanges, numRanges, &numAllocRanges, 1);
+				tidRanges[numRanges].first = cursor_tid;
+				tidRanges[numRanges].last = cursor_tid;
+				numRanges++;
 			}
 		}
+		else
+		{
+			TidExprEval(tidexpr, nblocks, tidstate,
+						&tidRanges, &numRanges, &numAllocRanges);
+		}
 	}
 
 	/*
@@ -243,52 +549,152 @@ TidListEval(TidScanState *tidstate)
 	 * the list.  Sorting makes it easier to detect duplicates, and as a bonus
 	 * ensures that we will visit the heap in the most efficient way.
 	 */
-	if (numTids > 1)
+	if (numRanges > 1)
 	{
-		int			lastTid;
+		int			lastRange;
 		int			i;
 
 		/* CurrentOfExpr could never appear OR'd with something else */
 		Assert(!tidstate->tss_isCurrentOf);
 
-		qsort((void *) tidList, numTids, sizeof(ItemPointerData),
-			  itemptr_comparator);
-		lastTid = 0;
-		for (i = 1; i < numTids; i++)
+		qsort((void *) tidRanges, numRanges, sizeof(TidRange), tidrange_comparator);
+		lastRange = 0;
+		for (i = 1; i < numRanges; i++)
 		{
-			if (!ItemPointerEquals(&tidList[lastTid], &tidList[i]))
-				tidList[++lastTid] = tidList[i];
+			if (!MergeTidRanges(&tidRanges[lastRange], &tidRanges[i]))
+				tidRanges[++lastRange] = tidRanges[i];
 		}
-		numTids = lastTid + 1;
+		numRanges = lastRange + 1;
 	}
 
-	tidstate->tss_TidList = tidList;
-	tidstate->tss_NumTids = numTids;
-	tidstate->tss_TidPtr = -1;
+	tidstate->tss_TidRanges = tidRanges;
+	tidstate->tss_NumTidRanges = numRanges;
+	tidstate->tss_TidRangePtr = -1;
+}
+
+/*
+ * If two ranges overlap, merge them into one.
+ * Assumes the two ranges are already ordered by (first, last).
+ * Returns true if they were merged.
+ */
+static bool
+MergeTidRanges(TidRange * a, TidRange * b)
+{
+	ItemPointerData a_last = a->last;
+	ItemPointerData b_last;
+
+	if (!ItemPointerIsValid(&a_last))
+		a_last = a->first;
+
+	/*
+	 * If the first range ends before the second one begins, they don't
+	 * overlap.
+	 */
+	if (ItemPointerCompare(&a_last, &b->first) < 0)
+		return false;
+
+	b_last = b->last;
+	if (!ItemPointerIsValid(&b_last))
+		b_last = b->first;
+
+	/*
+	 * Since they overlap, the end of the new range should be the maximum of
+	 * the original two range ends.
+	 */
+	if (ItemPointerCompare(&a_last, &b_last) < 0)
+		a->last = b->last;
+	return true;
 }
 
 /*
- * qsort comparator for ItemPointerData items
+ * qsort comparator for TidRange items
  */
 static int
-itemptr_comparator(const void *a, const void *b)
+tidrange_comparator(const void *a, const void *b)
 {
-	const ItemPointerData *ipa = (const ItemPointerData *) a;
-	const ItemPointerData *ipb = (const ItemPointerData *) b;
-	BlockNumber ba = ItemPointerGetBlockNumber(ipa);
-	BlockNumber bb = ItemPointerGetBlockNumber(ipb);
-	OffsetNumber oa = ItemPointerGetOffsetNumber(ipa);
-	OffsetNumber ob = ItemPointerGetOffsetNumber(ipb);
-
-	if (ba < bb)
-		return -1;
-	if (ba > bb)
-		return 1;
-	if (oa < ob)
-		return -1;
-	if (oa > ob)
-		return 1;
-	return 0;
+	TidRange   *tra = (TidRange *) a;
+	TidRange   *trb = (TidRange *) b;
+	int			cmp_first = ItemPointerCompare(&tra->first, &trb->first);
+
+	if (cmp_first != 0)
+		return cmp_first;
+	else
+		return ItemPointerCompare(&tra->last, &trb->last);
+}
+
+static HeapScanDesc
+BeginTidRangeScan(TidScanState *node, TidRange * range)
+{
+	HeapScanDesc scandesc = node->ss.ss_currentScanDesc;
+	BlockNumber first_block = ItemPointerGetBlockNumberNoCheck(&range->first);
+	BlockNumber last_block = ItemPointerGetBlockNumberNoCheck(&range->last);
+
+	if (!scandesc)
+	{
+		EState	   *estate = node->ss.ps.state;
+
+		scandesc = heap_beginscan_strat(node->ss.ss_currentRelation,
+										estate->es_snapshot,
+										0, NULL,
+										false, false);
+		node->ss.ss_currentScanDesc = scandesc;
+	}
+	else
+		heap_rescan(scandesc, NULL);
+
+	heap_setscanlimits(scandesc, first_block, last_block - first_block + 1);
+	node->tss_inScan = true;
+	return scandesc;
+}
+
+static HeapTuple
+NextInTidRange(HeapScanDesc scandesc, ScanDirection direction, TidRange * range)
+{
+	BlockNumber first_block = ItemPointerGetBlockNumber(&range->first);
+	OffsetNumber first_offset = ItemPointerGetOffsetNumber(&range->first);
+	BlockNumber last_block = ItemPointerGetBlockNumber(&range->last);
+	OffsetNumber last_offset = ItemPointerGetOffsetNumber(&range->last);
+	HeapTuple	tuple;
+
+	for (;;)
+	{
+		BlockNumber block;
+		OffsetNumber offset;
+
+		tuple = heap_getnext(scandesc, direction);
+		if (!tuple)
+			break;
+
+		/* Check that the tuple is within the required range. */
+		block = ItemPointerGetBlockNumber(&tuple->t_self);
+		offset = ItemPointerGetOffsetNumber(&tuple->t_self);
+
+		/*
+		 * If the tuple is in the first block of the range and before the first
+		 * requested offset, then we can either skip it (if scanning forward),
+		 * or end the scan (if scanning backward).
+		 */
+		if (block == first_block && offset < first_offset)
+		{
+			if (ScanDirectionIsForward(direction))
+				continue;
+			else
+				return NULL;
+		}
+
+		/* Similarly for the last block, after the last requested offset. */
+		if (block == last_block && offset > last_offset)
+		{
+			if (ScanDirectionIsBackward(direction))
+				continue;
+			else
+				return NULL;
+		}
+
+		break;
+	}
+
+	return tuple;
 }
 
 /* ----------------------------------------------------------------
@@ -302,6 +708,7 @@ itemptr_comparator(const void *a, const void *b)
 static TupleTableSlot *
 TidNext(TidScanState *node)
 {
+	HeapScanDesc scandesc;
 	EState	   *estate;
 	ScanDirection direction;
 	Snapshot	snapshot;
@@ -309,105 +716,141 @@ TidNext(TidScanState *node)
 	HeapTuple	tuple;
 	TupleTableSlot *slot;
 	Buffer		buffer = InvalidBuffer;
-	ItemPointerData *tidList;
-	int			numTids;
-	bool		bBackward;
+	int			numRanges;
 
 	/*
 	 * extract necessary information from tid scan node
 	 */
+	scandesc = node->ss.ss_currentScanDesc;
 	estate = node->ss.ps.state;
 	direction = estate->es_direction;
 	snapshot = estate->es_snapshot;
 	heapRelation = node->ss.ss_currentRelation;
 	slot = node->ss.ss_ScanTupleSlot;
 
-	/*
-	 * First time through, compute the list of TIDs to be visited
-	 */
-	if (node->tss_TidList == NULL)
+	/* First time through, compute the list of TID ranges to be visited */
+	if (node->tss_TidRanges == NULL)
+	{
 		TidListEval(node);
 
-	tidList = node->tss_TidList;
-	numTids = node->tss_NumTids;
+		node->tss_TidRangePtr = -1;
+	}
 
-	/*
-	 * We use node->tss_htup as the tuple pointer; note this can't just be a
-	 * local variable here, as the scan tuple slot will keep a pointer to it.
-	 */
-	tuple = &(node->tss_htup);
+	numRanges = node->tss_NumTidRanges;
 
-	/*
-	 * Initialize or advance scan position, depending on direction.
-	 */
-	bBackward = ScanDirectionIsBackward(direction);
-	if (bBackward)
+	tuple = NULL;
+	for (;;)
 	{
-		if (node->tss_TidPtr < 0)
-		{
-			/* initialize for backward scan */
-			node->tss_TidPtr = numTids - 1;
-		}
-		else
-			node->tss_TidPtr--;
-	}
-	else
-	{
-		if (node->tss_TidPtr < 0)
+		TidRange   *currentRange;
+
+		if (!node->tss_inScan)
 		{
-			/* initialize for forward scan */
-			node->tss_TidPtr = 0;
+			/* Initialize or advance scan position, depending on direction. */
+			bool		bBackward = ScanDirectionIsBackward(direction);
+
+			if (bBackward)
+			{
+				if (node->tss_TidRangePtr < 0)
+				{
+					/* initialize for backward scan */
+					node->tss_TidRangePtr = numRanges - 1;
+				}
+				else
+					node->tss_TidRangePtr--;
+			}
+			else
+			{
+				if (node->tss_TidRangePtr < 0)
+				{
+					/* initialize for forward scan */
+					node->tss_TidRangePtr = 0;
+				}
+				else
+					node->tss_TidRangePtr++;
+			}
 		}
-		else
-			node->tss_TidPtr++;
-	}
 
-	while (node->tss_TidPtr >= 0 && node->tss_TidPtr < numTids)
-	{
-		tuple->t_self = tidList[node->tss_TidPtr];
+		if (node->tss_TidRangePtr >= numRanges || node->tss_TidRangePtr < 0)
+			break;
+
+		currentRange = &node->tss_TidRanges[node->tss_TidRangePtr];
 
 		/*
-		 * For WHERE CURRENT OF, the tuple retrieved from the cursor might
-		 * since have been updated; if so, we should fetch the version that is
-		 * current according to our snapshot.
+		 * Ranges with only one item -- including one resulting from a
+		 * CURRENT-OF qual -- are handled by looking up the item directly.
 		 */
-		if (node->tss_isCurrentOf)
-			heap_get_latest_tid(heapRelation, snapshot, &tuple->t_self);
-
-		if (heap_fetch(heapRelation, snapshot, tuple, &buffer, false, NULL))
+		if (ItemPointerEquals(&currentRange->first, &currentRange->last))
 		{
 			/*
-			 * Store the scanned tuple in the scan tuple slot of the scan
-			 * state.  Eventually we will only do this and not return a tuple.
+			 * We use node->tss_htup as the tuple pointer; note this can't
+			 * just be a local variable here, as the scan tuple slot will keep
+			 * a pointer to it.
 			 */
-			ExecStoreBufferHeapTuple(tuple, /* tuple to store */
-									 slot,	/* slot to store in */
-									 buffer);	/* buffer associated with
-												 * tuple */
+			tuple = &(node->tss_htup);
+			tuple->t_self = currentRange->first;
 
 			/*
-			 * At this point we have an extra pin on the buffer, because
-			 * ExecStoreHeapTuple incremented the pin count. Drop our local
-			 * pin.
+			 * For WHERE CURRENT OF, the tuple retrieved from the cursor might
+			 * since have been updated; if so, we should fetch the version
+			 * that is current according to our snapshot.
 			 */
-			ReleaseBuffer(buffer);
+			if (node->tss_isCurrentOf)
+				heap_get_latest_tid(heapRelation, snapshot, &tuple->t_self);
 
-			return slot;
+			if (heap_fetch(heapRelation, snapshot, tuple, &buffer, false, NULL))
+			{
+				/*
+				 * Store the scanned tuple in the scan tuple slot of the scan
+				 * state.  Eventually we will only do this and not return a
+				 * tuple.
+				 */
+				ExecStoreBufferHeapTuple(tuple, /* tuple to store */
+										 slot,	/* slot to store in */
+										 buffer);	/* buffer associated with
+													 * tuple */
+
+				/*
+				 * At this point we have an extra pin on the buffer, because
+				 * ExecStoreBufferHeapTuple incremented the pin count. Drop our
+				 * local pin.
+				 */
+				ReleaseBuffer(buffer);
+
+				return slot;
+			}
+			else
+			{
+				tuple = NULL;
+			}
 		}
-		/* Bad TID or failed snapshot qual; try next */
-		if (bBackward)
-			node->tss_TidPtr--;
 		else
-			node->tss_TidPtr++;
+		{
+			if (!node->tss_inScan)
+				scandesc = BeginTidRangeScan(node, currentRange);
+
+			tuple = NextInTidRange(scandesc, direction, currentRange);
+			if (tuple)
+				break;
 
-		CHECK_FOR_INTERRUPTS();
+			node->tss_inScan = false;
+		}
 	}
 
 	/*
-	 * if we get here it means the tid scan failed so we are at the end of the
-	 * scan..
+	 * save the tuple and the buffer returned to us by the access methods in
+	 * our scan tuple slot and return the slot.  Note also that
+	 * ExecStoreBufferHeapTuple will increment the refcount of the buffer; the
+	 * refcount will not be dropped until the tuple table slot is cleared.
 	 */
-	return ExecClearTuple(slot);
+	if (tuple)
+		ExecStoreBufferHeapTuple(tuple, /* tuple to store */
+								 slot,	/* slot to store in */
+								 scandesc->rs_cbuf);	/* buffer associated
+														 * with this tuple */
+	else
+		ExecClearTuple(slot);
+
+	return slot;
 }
 
 /*
@@ -460,11 +903,13 @@ ExecTidScan(PlanState *pstate)
 void
 ExecReScanTidScan(TidScanState *node)
 {
-	if (node->tss_TidList)
-		pfree(node->tss_TidList);
-	node->tss_TidList = NULL;
-	node->tss_NumTids = 0;
-	node->tss_TidPtr = -1;
+	if (node->tss_TidRanges)
+		pfree(node->tss_TidRanges);
+
+	node->tss_TidRanges = NULL;
+	node->tss_NumTidRanges = 0;
+	node->tss_TidRangePtr = -1;
+	node->tss_inScan = false;
 
 	ExecScanReScan(&node->ss);
 }
@@ -479,6 +924,8 @@ ExecReScanTidScan(TidScanState *node)
 void
 ExecEndTidScan(TidScanState *node)
 {
+	HeapScanDesc scan = node->ss.ss_currentScanDesc;
+
 	/*
 	 * Free the exprcontext
 	 */
@@ -490,6 +937,10 @@ ExecEndTidScan(TidScanState *node)
 	if (node->ss.ps.ps_ResultTupleSlot)
 		ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
 	ExecClearTuple(node->ss.ss_ScanTupleSlot);
+
+	/* close heap scan */
+	if (scan != NULL)
+		heap_endscan(scan);
 }
 
 /* ----------------------------------------------------------------
@@ -525,11 +976,12 @@ ExecInitTidScan(TidScan *node, EState *estate, int eflags)
 	ExecAssignExprContext(estate, &tidstate->ss.ps);
 
 	/*
-	 * mark tid list as not computed yet
+	 * mark tid range list as not computed yet
 	 */
-	tidstate->tss_TidList = NULL;
-	tidstate->tss_NumTids = 0;
-	tidstate->tss_TidPtr = -1;
+	tidstate->tss_TidRanges = NULL;
+	tidstate->tss_NumTidRanges = 0;
+	tidstate->tss_TidRangePtr = -1;
+	tidstate->tss_inScan = false;
 
 	/*
 	 * open the scan relation
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 7bf67a0..bffd2c0 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -1184,9 +1184,12 @@ cost_tidscan(Path *path, PlannerInfo *root,
 	QualCost	qpqual_cost;
 	Cost		cpu_per_tuple;
 	QualCost	tid_qual_cost;
-	int			ntuples;
+	double		ntuples;
+	double		nrandompages;
+	double		nseqpages;
 	ListCell   *l;
 	double		spc_random_page_cost;
+	double		spc_seq_page_cost;
 
 	/* Should only be applied to base relations */
 	Assert(baserel->relid > 0);
@@ -1198,8 +1201,10 @@ cost_tidscan(Path *path, PlannerInfo *root,
 	else
 		path->rows = baserel->rows;
 
-	/* Count how many tuples we expect to retrieve */
-	ntuples = 0;
+	/* Count how many tuples and pages we expect to retrieve */
+	ntuples = 0.0;
+	nrandompages = 0.0;
+	nseqpages = 0.0;
 	foreach(l, tidquals)
 	{
 		if (IsA(lfirst(l), ScalarArrayOpExpr))
@@ -1207,19 +1212,37 @@ cost_tidscan(Path *path, PlannerInfo *root,
 			/* Each element of the array yields 1 tuple */
 			ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) lfirst(l);
 			Node	   *arraynode = (Node *) lsecond(saop->args);
+			int			array_len = estimate_array_length(arraynode);
 
-			ntuples += estimate_array_length(arraynode);
+			ntuples += array_len;
+			nrandompages += array_len;
 		}
 		else if (IsA(lfirst(l), CurrentOfExpr))
 		{
 			/* CURRENT OF yields 1 tuple */
 			isCurrentOf = true;
-			ntuples++;
+			ntuples += 1.0;
+			nrandompages += 1.0;
 		}
 		else
 		{
-			/* It's just CTID = something, count 1 tuple */
-			ntuples++;
+			/*
+			 * For anything else, we'll use the normal selectivity estimate.
+			 * Count the first page as a random page, the rest as sequential.
+			 */
+			Selectivity selectivity = clause_selectivity(root, lfirst(l),
+														 baserel->relid,
+														 JOIN_INNER,
+														 NULL);
+			double		pages = selectivity * baserel->pages;
+
+			if (pages <= 0.0)
+				pages = 1.0;
+
+			/* TODO decide what the costs should be */
+			ntuples += selectivity * baserel->tuples;
+			nseqpages += pages - 1.0;
+			nrandompages += 1.0;
 		}
 	}
 
@@ -1248,10 +1271,10 @@ cost_tidscan(Path *path, PlannerInfo *root,
 	/* fetch estimated page cost for tablespace containing table */
 	get_tablespace_page_costs(baserel->reltablespace,
 							  &spc_random_page_cost,
-							  NULL);
+							  &spc_seq_page_cost);
 
-	/* disk costs --- assume each tuple on a different page */
-	run_cost += spc_random_page_cost * ntuples;
+	/* disk costs */
+	run_cost += spc_random_page_cost * nrandompages + spc_seq_page_cost * nseqpages;
 
 	/* Add scanning CPU costs */
 	get_restriction_qual_cost(root, baserel, param_info, &qpqual_cost);
diff --git a/src/backend/optimizer/path/tidpath.c b/src/backend/optimizer/path/tidpath.c
index 3bb5b8d..9005249 100644
--- a/src/backend/optimizer/path/tidpath.c
+++ b/src/backend/optimizer/path/tidpath.c
@@ -4,13 +4,15 @@
  *	  Routines to determine which TID conditions are usable for scanning
  *	  a given relation, and create TidPaths accordingly.
  *
- * What we are looking for here is WHERE conditions of the form
- * "CTID = pseudoconstant", which can be implemented by just fetching
- * the tuple directly via heap_fetch().  We can also handle OR'd conditions
- * such as (CTID = const1) OR (CTID = const2), as well as ScalarArrayOpExpr
- * conditions of the form CTID = ANY(pseudoconstant_array).  In particular
- * this allows
- *		WHERE ctid IN (tid1, tid2, ...)
+ * What we are looking for here is WHERE conditions of the forms:
+ * - "CTID = pseudoconstant", which can be implemented by just fetching
+ *    the tuple directly via heap_fetch().
+ * - "CTID IN (pseudoconstant, ...)" or "CTID = ANY(pseudoconstant_array)"
+ * - "CTID > pseudoconstant", etc. for >, >=, <, and <=.
+ * - "CTID > pseudoconstant AND CTID < pseudoconstant AND ...", etc.
+ *
+ * We can also handle OR'd conditions of the above forms, such as
+ * "(CTID = const1) OR (CTID >= const2) OR CTID IN (...)".
  *
  * We also support "WHERE CURRENT OF cursor" conditions (CurrentOfExpr),
  * which amount to "CTID = run-time-determined-TID".  These could in
@@ -46,32 +48,45 @@
 #include "optimizer/restrictinfo.h"
 
 
-static bool IsTidEqualClause(OpExpr *node, int varno);
+static bool IsTidVar(Var *var, int varno);
+static bool IsTidComparison(OpExpr *node, int varno, Oid expected_comparison_operator);
 static bool IsTidEqualAnyClause(ScalarArrayOpExpr *node, int varno);
+static List *MakeTidRangeQuals(List *quals);
+static List *TidCompoundRangeQualFromExpr(Node *expr, int varno);
 static List *TidQualFromExpr(Node *expr, int varno);
 static List *TidQualFromBaseRestrictinfo(RelOptInfo *rel);
 
 
+static bool
+IsTidVar(Var *var, int varno)
+{
+	return (var->varattno == SelfItemPointerAttributeNumber &&
+			var->vartype == TIDOID &&
+			var->varno == varno &&
+			var->varlevelsup == 0);
+}
+
 /*
  * Check to see if an opclause is of the form
- *		CTID = pseudoconstant
+ *		CTID OP pseudoconstant
  * or
- *		pseudoconstant = CTID
+ *		pseudoconstant OP CTID
+ * where OP is the expected comparison operator.
  *
  * We check that the CTID Var belongs to relation "varno".  That is probably
  * redundant considering this is only applied to restriction clauses, but
  * let's be safe.
  */
 static bool
-IsTidEqualClause(OpExpr *node, int varno)
+IsTidComparison(OpExpr *node, int varno, Oid expected_comparison_operator)
 {
 	Node	   *arg1,
 			   *arg2,
 			   *other;
 	Var		   *var;
 
-	/* Operator must be tideq */
-	if (node->opno != TIDEqualOperator)
+	/* Operator must be the expected one */
+	if (node->opno != expected_comparison_operator)
 		return false;
 	if (list_length(node->args) != 2)
 		return false;
@@ -83,19 +98,13 @@ IsTidEqualClause(OpExpr *node, int varno)
 	if (arg1 && IsA(arg1, Var))
 	{
 		var = (Var *) arg1;
-		if (var->varattno == SelfItemPointerAttributeNumber &&
-			var->vartype == TIDOID &&
-			var->varno == varno &&
-			var->varlevelsup == 0)
+		if (IsTidVar(var, varno))
 			other = arg2;
 	}
 	if (!other && arg2 && IsA(arg2, Var))
 	{
 		var = (Var *) arg2;
-		if (var->varattno == SelfItemPointerAttributeNumber &&
-			var->vartype == TIDOID &&
-			var->varno == varno &&
-			var->varlevelsup == 0)
+		if (IsTidVar(var, varno))
 			other = arg1;
 	}
 	if (!other)
@@ -110,6 +119,17 @@ IsTidEqualClause(OpExpr *node, int varno)
 	return true;				/* success */
 }
 
+#define IsTidEqualClause(node, varno)	IsTidComparison(node, varno, TIDEqualOperator)
+#define IsTidLTClause(node, varno)		IsTidComparison(node, varno, TIDLessOperator)
+#define IsTidLEClause(node, varno)		IsTidComparison(node, varno, TIDLessEqOperator)
+#define IsTidGTClause(node, varno)		IsTidComparison(node, varno, TIDGreaterOperator)
+#define IsTidGEClause(node, varno)		IsTidComparison(node, varno, TIDGreaterEqOperator)
+
+#define IsTidRangeClause(node, varno)	(IsTidLTClause(node, varno) || \
+										 IsTidLEClause(node, varno) || \
+										 IsTidGTClause(node, varno) || \
+										 IsTidGEClause(node, varno))
+
 /*
  * Check to see if a clause is of the form
  *		CTID = ANY (pseudoconstant_array)
@@ -134,10 +154,7 @@ IsTidEqualAnyClause(ScalarArrayOpExpr *node, int varno)
 	{
 		Var		   *var = (Var *) arg1;
 
-		if (var->varattno == SelfItemPointerAttributeNumber &&
-			var->vartype == TIDOID &&
-			var->varno == varno &&
-			var->varlevelsup == 0)
+		if (IsTidVar(var, varno))
 		{
 			/* The other argument must be a pseudoconstant */
 			if (is_pseudo_constant_clause(arg2))
@@ -148,6 +165,42 @@ IsTidEqualAnyClause(ScalarArrayOpExpr *node, int varno)
 	return false;
 }
 
+static List *
+MakeTidRangeQuals(List *quals)
+{
+	if (list_length(quals) == 1)
+		return quals;
+	else
+		return list_make1(make_andclause(quals));
+}
+
+/*
+ * TidCompoundRangeQualFromExpr
+ *
+ * 		Extract a compound CTID range condition from the given qual expression
+ */
+static List *
+TidCompoundRangeQualFromExpr(Node *expr, int varno)
+{
+	ListCell   *l;
+	List	   *found_quals = NIL;
+
+	foreach(l, ((BoolExpr *) expr)->args)
+	{
+		Node	   *clause = (Node *) lfirst(l);
+
+		/* If this clause contains a range qual, add it to the list. */
+		if (is_opclause(clause) && IsTidRangeClause((OpExpr *) clause, varno))
+			found_quals = lappend(found_quals, clause);
+	}
+
+	/* If we found any, make an AND clause out of them. */
+	if (found_quals)
+		return MakeTidRangeQuals(found_quals);
+	else
+		return NIL;
+}
+
 /*
  *	Extract a set of CTID conditions from the given qual expression
  *
@@ -174,6 +227,8 @@ TidQualFromExpr(Node *expr, int varno)
 		/* base case: check for tideq opclause */
 		if (IsTidEqualClause((OpExpr *) expr, varno))
 			rlst = list_make1(expr);
+		else if (IsTidRangeClause((OpExpr *) expr, varno))
+			rlst = list_make1(expr);
 	}
 	else if (expr && IsA(expr, ScalarArrayOpExpr))
 	{
@@ -189,11 +244,18 @@ TidQualFromExpr(Node *expr, int varno)
 	}
 	else if (and_clause(expr))
 	{
-		foreach(l, ((BoolExpr *) expr)->args)
+		/* look for a range qual in the clause */
+		rlst = TidCompoundRangeQualFromExpr(expr, varno);
+
+		/* if no range qual was found, look for any other TID qual */
+		if (!rlst)
 		{
-			rlst = TidQualFromExpr((Node *) lfirst(l), varno);
-			if (rlst)
-				break;
+			foreach(l, ((BoolExpr *) expr)->args)
+			{
+				rlst = TidQualFromExpr((Node *) lfirst(l), varno);
+				if (rlst)
+					break;
+			}
 		}
 	}
 	else if (or_clause(expr))
@@ -217,17 +279,24 @@ TidQualFromExpr(Node *expr, int varno)
 }
 
 /*
- *	Extract a set of CTID conditions from the rel's baserestrictinfo list
+ * Extract a set of CTID conditions from the rel's baserestrictinfo list
+ *
+ * Normally we just use the first RestrictInfo item with some usable quals,
+ * but it's also possible for a good compound range qual, such as
+ * "CTID > ? AND CTID < ?", to be split across multiple items.  So we look for
+ * range quals in all items and use them if any were found.
  */
 static List *
 TidQualFromBaseRestrictinfo(RelOptInfo *rel)
 {
 	List	   *rlst = NIL;
 	ListCell   *l;
+	List	   *found_quals = NIL;
 
 	foreach(l, rel->baserestrictinfo)
 	{
 		RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
+		Node	   *clause = (Node *) rinfo->clause;
 
 		/*
 		 * If clause must wait till after some lower-security-level
@@ -236,10 +305,23 @@ TidQualFromBaseRestrictinfo(RelOptInfo *rel)
 		if (!restriction_is_securely_promotable(rinfo, rel))
 			continue;
 
-		rlst = TidQualFromExpr((Node *) rinfo->clause, rel->relid);
+		/* If this clause contains a range qual, add it to the list. */
+		if (is_opclause(clause) && IsTidRangeClause((OpExpr *) clause, rel->relid))
+		{
+			found_quals = lappend(found_quals, clause);
+			continue;
+		}
+
+		/* Look for other TID quals. */
+		rlst = TidQualFromExpr((Node *) clause, rel->relid);
 		if (rlst)
 			break;
 	}
+
+	/* Use a range qual if any were found. */
+	if (found_quals)
+		rlst = MakeTidRangeQuals(found_quals);
+
 	return rlst;
 }
 
@@ -264,6 +346,7 @@ create_tidscan_paths(PlannerInfo *root, RelOptInfo *rel)
 
 	tidquals = TidQualFromBaseRestrictinfo(rel);
 
+	/* If there are tidquals, then it's worth generating a tidscan path. */
 	if (tidquals)
 		add_path(rel, (Path *) create_tidscan_path(root, rel, tidquals,
 												   required_outer));
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index da7a920..e2c0bce 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -3081,14 +3081,37 @@ create_tidscan_plan(PlannerInfo *root, TidPath *best_path,
 	}
 
 	/*
-	 * Remove any clauses that are TID quals.  This is a bit tricky since the
-	 * tidquals list has implicit OR semantics.
+	 * Remove the tidquals from the scan clauses if possible, which is
+	 * generally if the tidquals were taken verbatim from any of the
+	 * RelOptInfo items.  If the tidquals don't represent the entire
+	 * RelOptInfo qual, then nothing will be removed.  Note that the tidquals
+	 * is a list; if there is more than one, we have to rebuild the equivalent
+	 * OR clause to find a match.
 	 */
 	ortidquals = tidquals;
 	if (list_length(ortidquals) > 1)
 		ortidquals = list_make1(make_orclause(ortidquals));
 	scan_clauses = list_difference(scan_clauses, ortidquals);
 
+	/*
+	 * In the case of a single compound qual such as "ctid > ? AND ...", the
+	 * various parts may have come from different RestrictInfos.  So remove
+	 * each part separately.  (This doesn't happen for multiple compound
+	 * quals, because the top-level OR clause can't be split over multiple
+	 * RestrictInfos.)
+	 */
+	if (list_length(tidquals) == 1)
+	{
+		Node	   *qual = linitial(tidquals);
+
+		if (and_clause(qual))
+		{
+			BoolExpr   *and_qual = ((BoolExpr *) qual);
+
+			scan_clauses = list_difference(scan_clauses, and_qual->args);
+		}
+	}
+
 	scan_plan = make_tidscan(tlist,
 							 scan_clauses,
 							 scan_relid,
diff --git a/src/include/catalog/pg_operator.dat b/src/include/catalog/pg_operator.dat
index ce23c2f..7476916 100644
--- a/src/include/catalog/pg_operator.dat
+++ b/src/include/catalog/pg_operator.dat
@@ -156,15 +156,15 @@
   oprname => '<', oprleft => 'tid', oprright => 'tid', oprresult => 'bool',
   oprcom => '>(tid,tid)', oprnegate => '>=(tid,tid)', oprcode => 'tidlt',
   oprrest => 'scalarltsel', oprjoin => 'scalarltjoinsel' },
-{ oid => '2800', descr => 'greater than',
+{ oid => '2800', oid_symbol => 'TIDGreaterOperator', descr => 'greater than',
   oprname => '>', oprleft => 'tid', oprright => 'tid', oprresult => 'bool',
   oprcom => '<(tid,tid)', oprnegate => '<=(tid,tid)', oprcode => 'tidgt',
   oprrest => 'scalargtsel', oprjoin => 'scalargtjoinsel' },
-{ oid => '2801', descr => 'less than or equal',
+{ oid => '2801', oid_symbol => 'TIDLessEqOperator', descr => 'less than or equal',
   oprname => '<=', oprleft => 'tid', oprright => 'tid', oprresult => 'bool',
   oprcom => '>=(tid,tid)', oprnegate => '>(tid,tid)', oprcode => 'tidle',
   oprrest => 'scalarlesel', oprjoin => 'scalarlejoinsel' },
-{ oid => '2802', descr => 'greater than or equal',
+{ oid => '2802', oid_symbol => 'TIDGreaterEqOperator', descr => 'greater than or equal',
   oprname => '>=', oprleft => 'tid', oprright => 'tid', oprresult => 'bool',
   oprcom => '<=(tid,tid)', oprnegate => '<(tid,tid)', oprcode => 'tidge',
   oprrest => 'scalargesel', oprjoin => 'scalargejoinsel' },
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index ff93910..47c2257 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1492,15 +1492,18 @@ typedef struct BitmapHeapScanState
 	ParallelBitmapHeapState *pstate;
 } BitmapHeapScanState;
 
+typedef struct TidRange TidRange;
+
 /* ----------------
  *	 TidScanState information
  *
- *		tidexprs	   list of TidExpr structs (see nodeTidscan.c)
- *		isCurrentOf    scan has a CurrentOfExpr qual
- *		NumTids		   number of tids in this scan
- *		TidPtr		   index of currently fetched tid
- *		TidList		   evaluated item pointers (array of size NumTids)
- *		htup		   currently-fetched tuple, if any
+ *		tidexprs		list of TidExpr structs (see nodeTidscan.c)
+ *		isCurrentOf		scan has a CurrentOfExpr qual
+ *		NumTidRanges	number of tid ranges in this scan
+ *		TidRangePtr		index of current tid range
+ *		TidRanges		evaluated tid ranges (array of size NumTidRanges)
+ *		inScan			currently in a range scan
+ *		htup			currently-fetched tuple, if any
  * ----------------
  */
 typedef struct TidScanState
@@ -1508,10 +1511,11 @@ typedef struct TidScanState
 	ScanState	ss;				/* its first field is NodeTag */
 	List	   *tss_tidexprs;
 	bool		tss_isCurrentOf;
-	int			tss_NumTids;
-	int			tss_TidPtr;
-	ItemPointerData *tss_TidList;
-	HeapTupleData tss_htup;
+	int			tss_NumTidRanges;
+	int			tss_TidRangePtr;
+	TidRange   *tss_TidRanges;
+	bool		tss_inScan;		/* for range scans */
+	HeapTupleData tss_htup;		/* for current-of and single TID fetches */
 } TidScanState;
 
 /* ----------------
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 6fd2420..895849f 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -1228,14 +1228,21 @@ typedef struct BitmapOrPath
 /*
  * TidPath represents a scan by TID
  *
- * tidquals is an implicitly OR'ed list of qual expressions of the form
- * "CTID = pseudoconstant" or "CTID = ANY(pseudoconstant_array)".
+ * tidquals is an implicitly OR'ed list of qual expressions of the forms:
+ *   - "CTID = pseudoconstant"
+ *   - "CTID = ANY(pseudoconstant_array)"
+ *   - "CURRENT OF cursor"
+ *   - "(CTID relop pseudoconstant AND ...)"
+ *
+ * If tidquals is empty, all CTIDs will match (contrary to the usual meaning
+ * of an empty disjunction).
+ *
  * Note they are bare expressions, not RestrictInfos.
  */
 typedef struct TidPath
 {
 	Path		path;
-	List	   *tidquals;		/* qual(s) involving CTID = something */
+	List	   *tidquals;
 } TidPath;
 
 /*
diff --git a/src/test/regress/expected/tidscan.out b/src/test/regress/expected/tidscan.out
index 521ed1b..8083909 100644
--- a/src/test/regress/expected/tidscan.out
+++ b/src/test/regress/expected/tidscan.out
@@ -177,3 +177,253 @@ UPDATE tidscan SET id = -id WHERE CURRENT OF c RETURNING *;
 ERROR:  cursor "c" is not positioned on a row
 ROLLBACK;
 DROP TABLE tidscan;
+-- tests for tidrangescans
+CREATE TABLE tidrangescan(id integer, data text);
+INSERT INTO tidrangescan SELECT i,repeat('x', 100) FROM generate_series(1,1000) AS s(i);
+DELETE FROM tidrangescan WHERE substring(ctid::text from ',(\d+)\)')::integer > 10 OR substring(ctid::text from '\((\d+),')::integer >= 10;;
+VACUUM tidrangescan;
+-- range scans with upper bound
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid < '(1,0)';
+            QUERY PLAN             
+-----------------------------------
+ Tid Scan on tidrangescan
+   TID Cond: (ctid < '(1,0)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid < '(1,0)';
+  ctid  
+--------
+ (0,1)
+ (0,2)
+ (0,3)
+ (0,4)
+ (0,5)
+ (0,6)
+ (0,7)
+ (0,8)
+ (0,9)
+ (0,10)
+(10 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid <= '(1,5)';
+             QUERY PLAN             
+------------------------------------
+ Tid Scan on tidrangescan
+   TID Cond: (ctid <= '(1,5)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid <= '(1,5)';
+  ctid  
+--------
+ (0,1)
+ (0,2)
+ (0,3)
+ (0,4)
+ (0,5)
+ (0,6)
+ (0,7)
+ (0,8)
+ (0,9)
+ (0,10)
+ (1,1)
+ (1,2)
+ (1,3)
+ (1,4)
+ (1,5)
+(15 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid < '(0,0)';
+            QUERY PLAN             
+-----------------------------------
+ Tid Scan on tidrangescan
+   TID Cond: (ctid < '(0,0)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid < '(0,0)';
+ ctid 
+------
+(0 rows)
+
+-- range scans with lower bound
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid > '(9,8)';
+            QUERY PLAN             
+-----------------------------------
+ Tid Scan on tidrangescan
+   TID Cond: (ctid > '(9,8)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid > '(9,8)';
+  ctid  
+--------
+ (9,9)
+ (9,10)
+(2 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE '(9,8)' < ctid;
+            QUERY PLAN             
+-----------------------------------
+ Tid Scan on tidrangescan
+   TID Cond: ('(9,8)'::tid < ctid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE '(9,8)' < ctid;
+  ctid  
+--------
+ (9,9)
+ (9,10)
+(2 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid >= '(9,8)';
+             QUERY PLAN             
+------------------------------------
+ Tid Scan on tidrangescan
+   TID Cond: (ctid >= '(9,8)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid >= '(9,8)';
+  ctid  
+--------
+ (9,8)
+ (9,9)
+ (9,10)
+(3 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid >= '(100,0)';
+              QUERY PLAN              
+--------------------------------------
+ Tid Scan on tidrangescan
+   TID Cond: (ctid >= '(100,0)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid >= '(100,0)';
+ ctid 
+------
+(0 rows)
+
+-- range scans with both bounds
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid > '(4,4)' AND '(4,7)' >= ctid;
+                           QUERY PLAN                           
+----------------------------------------------------------------
+ Tid Scan on tidrangescan
+   TID Cond: ((ctid > '(4,4)'::tid) AND ('(4,7)'::tid >= ctid))
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid > '(4,4)' AND '(4,7)' >= ctid;
+ ctid  
+-------
+ (4,5)
+ (4,6)
+ (4,7)
+(3 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE '(4,7)' >= ctid AND ctid > '(4,4)';
+                           QUERY PLAN                           
+----------------------------------------------------------------
+ Tid Scan on tidrangescan
+   TID Cond: (('(4,7)'::tid >= ctid) AND (ctid > '(4,4)'::tid))
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE '(4,7)' >= ctid AND ctid > '(4,4)';
+ ctid  
+-------
+ (4,5)
+ (4,6)
+ (4,7)
+(3 rows)
+
+-- combinations
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE '(4,7)' >= ctid AND ctid > '(4,4)' OR ctid = '(2,2)';
+                                        QUERY PLAN                                         
+-------------------------------------------------------------------------------------------
+ Tid Scan on tidrangescan
+   TID Cond: ((('(4,7)'::tid >= ctid) AND (ctid > '(4,4)'::tid)) OR (ctid = '(2,2)'::tid))
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE '(4,7)' >= ctid AND ctid > '(4,4)' OR ctid = '(2,2)';
+ ctid  
+-------
+ (2,2)
+ (4,5)
+ (4,6)
+ (4,7)
+(4 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE '(4,7)' >= ctid AND ctid > '(4,4)' OR ctid = '(2,2)' AND data = 'foo';
+                                                     QUERY PLAN                                                     
+--------------------------------------------------------------------------------------------------------------------
+ Tid Scan on tidrangescan
+   TID Cond: ((('(4,7)'::tid >= ctid) AND (ctid > '(4,4)'::tid)) OR (ctid = '(2,2)'::tid))
+   Filter: ((('(4,7)'::tid >= ctid) AND (ctid > '(4,4)'::tid)) OR ((ctid = '(2,2)'::tid) AND (data = 'foo'::text)))
+(3 rows)
+
+SELECT ctid FROM tidrangescan WHERE '(4,7)' >= ctid AND ctid > '(4,4)' OR ctid = '(2,2)' AND data = 'foo';
+ ctid  
+-------
+ (4,5)
+ (4,6)
+ (4,7)
+(3 rows)
+
+-- extreme offsets
+SELECT ctid FROM tidrangescan where ctid > '(0,65535)' AND ctid < '(1,0)' LIMIT 1;
+ ctid 
+------
+(0 rows)
+
+SELECT ctid FROM tidrangescan where ctid < '(0,0)' LIMIT 1;
+ ctid 
+------
+(0 rows)
+
+-- make sure ranges are combined correctly
+SELECT COUNT(*) FROM tidrangescan WHERE ctid < '(0,3)' OR ctid >= '(0,2)' AND ctid <= '(0,5)';
+ count 
+-------
+     5
+(1 row)
+
+SELECT COUNT(*) FROM tidrangescan WHERE ctid <= '(0,10)' OR ctid >= '(0,2)' AND ctid <= '(0,5)';
+ count 
+-------
+    10
+(1 row)
+
+-- empty table
+CREATE TABLE tidrangescan_empty(id integer, data text);
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan_empty WHERE ctid < '(1, 0)';
+            QUERY PLAN             
+-----------------------------------
+ Tid Scan on tidrangescan_empty
+   TID Cond: (ctid < '(1,0)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan_empty WHERE ctid < '(1, 0)';
+ ctid 
+------
+(0 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan_empty WHERE ctid > '(9, 0)';
+            QUERY PLAN             
+-----------------------------------
+ Tid Scan on tidrangescan_empty
+   TID Cond: (ctid > '(9,0)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan_empty WHERE ctid > '(9, 0)';
+ ctid 
+------
+(0 rows)
+
diff --git a/src/test/regress/sql/tidscan.sql b/src/test/regress/sql/tidscan.sql
index a8472e0..02b094a 100644
--- a/src/test/regress/sql/tidscan.sql
+++ b/src/test/regress/sql/tidscan.sql
@@ -64,3 +64,79 @@ UPDATE tidscan SET id = -id WHERE CURRENT OF c RETURNING *;
 ROLLBACK;
 
 DROP TABLE tidscan;
+
+-- tests for tidrangescans
+
+CREATE TABLE tidrangescan(id integer, data text);
+
+INSERT INTO tidrangescan SELECT i,repeat('x', 100) FROM generate_series(1,1000) AS s(i);
+DELETE FROM tidrangescan WHERE substring(ctid::text from ',(\d+)\)')::integer > 10 OR substring(ctid::text from '\((\d+),')::integer >= 10;;
+VACUUM tidrangescan;
+
+-- range scans with upper bound
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid < '(1,0)';
+SELECT ctid FROM tidrangescan WHERE ctid < '(1,0)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid <= '(1,5)';
+SELECT ctid FROM tidrangescan WHERE ctid <= '(1,5)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid < '(0,0)';
+SELECT ctid FROM tidrangescan WHERE ctid < '(0,0)';
+
+-- range scans with lower bound
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid > '(9,8)';
+SELECT ctid FROM tidrangescan WHERE ctid > '(9,8)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE '(9,8)' < ctid;
+SELECT ctid FROM tidrangescan WHERE '(9,8)' < ctid;
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid >= '(9,8)';
+SELECT ctid FROM tidrangescan WHERE ctid >= '(9,8)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid >= '(100,0)';
+SELECT ctid FROM tidrangescan WHERE ctid >= '(100,0)';
+
+-- range scans with both bounds
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid > '(4,4)' AND '(4,7)' >= ctid;
+SELECT ctid FROM tidrangescan WHERE ctid > '(4,4)' AND '(4,7)' >= ctid;
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE '(4,7)' >= ctid AND ctid > '(4,4)';
+SELECT ctid FROM tidrangescan WHERE '(4,7)' >= ctid AND ctid > '(4,4)';
+
+-- combinations
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE '(4,7)' >= ctid AND ctid > '(4,4)' OR ctid = '(2,2)';
+SELECT ctid FROM tidrangescan WHERE '(4,7)' >= ctid AND ctid > '(4,4)' OR ctid = '(2,2)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE '(4,7)' >= ctid AND ctid > '(4,4)' OR ctid = '(2,2)' AND data = 'foo';
+SELECT ctid FROM tidrangescan WHERE '(4,7)' >= ctid AND ctid > '(4,4)' OR ctid = '(2,2)' AND data = 'foo';
+
+-- extreme offsets
+SELECT ctid FROM tidrangescan where ctid > '(0,65535)' AND ctid < '(1,0)' LIMIT 1;
+SELECT ctid FROM tidrangescan where ctid < '(0,0)' LIMIT 1;
+
+-- make sure ranges are combined correctly
+SELECT COUNT(*) FROM tidrangescan WHERE ctid < '(0,3)' OR ctid >= '(0,2)' AND ctid <= '(0,5)';
+
+SELECT COUNT(*) FROM tidrangescan WHERE ctid <= '(0,10)' OR ctid >= '(0,2)' AND ctid <= '(0,5)';
+
+-- empty table
+CREATE TABLE tidrangescan_empty(id integer, data text);
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan_empty WHERE ctid < '(1, 0)';
+SELECT ctid FROM tidrangescan_empty WHERE ctid < '(1, 0)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan_empty WHERE ctid > '(9, 0)';
+SELECT ctid FROM tidrangescan_empty WHERE ctid > '(9, 0)';
-- 
2.7.4
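
For readers skimming the cost_tidscan() hunk in the patch above: each range qual is charged one random page for the first page it touches, and the remaining pages are charged as sequential. Below is a minimal standalone sketch of that arithmetic only. The struct and function names (TidPageEstimate, add_range_qual, disk_cost) are hypothetical and do not appear in the patch, and the selectivity is passed in directly rather than obtained from clause_selectivity(). As in the patch, a fractional page estimate between 0 and 1 is not clamped, so it contributes a slightly negative sequential page count; the patch itself marks these costs as a TODO.

```c
#include <assert.h>

/* Hypothetical stand-in for the accumulators used in cost_tidscan(). */
typedef struct TidPageEstimate
{
	double		ntuples;		/* tuples expected from all quals */
	double		nrandompages;	/* pages charged at random_page_cost */
	double		nseqpages;		/* pages charged at seq_page_cost */
} TidPageEstimate;

/*
 * Add the estimate for one range qual: the first page is counted as a
 * random page, the rest as sequential pages.
 */
static void
add_range_qual(TidPageEstimate *est, double selectivity,
			   double rel_pages, double rel_tuples)
{
	double		pages;

	assert(selectivity >= 0.0 && selectivity <= 1.0);

	pages = selectivity * rel_pages;
	if (pages <= 0.0)
		pages = 1.0;

	est->ntuples += selectivity * rel_tuples;
	est->nseqpages += pages - 1.0;
	est->nrandompages += 1.0;
}

/* Disk cost, combining the two page counts as the patched run_cost line does. */
static double
disk_cost(const TidPageEstimate *est,
		  double random_page_cost, double seq_page_cost)
{
	return random_page_cost * est->nrandompages +
		seq_page_cost * est->nseqpages;
}
```

Under these assumptions, a qual with selectivity 0.25 on a 100-page, 1000-tuple table adds 250 tuples, 1 random page and 24 sequential pages; with a random page cost of 4.0 and a sequential page cost of 1.0, the disk cost comes to 28.0.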

Attachment: v4-0004-Tid-Scan-results-are-ordered.patch (application/octet-stream)

From 3e5e92eaaaa154c071fb8615d930f1cb7b82f0ed Mon Sep 17 00:00:00 2001
From: Edmund Horner <ejrh00@gmail.com>
Date: Fri, 12 Oct 2018 16:29:44 +1300
Subject: [PATCH 4/4] Tid Scan results are ordered

The planner now knows that the results of a Tid path are ordered by ctid, so
queries that rely on that order no longer need a separate sort.  This improves
cases such as "ORDER BY ctid ASC/DESC", as well as "SELECT MIN(ctid)/MAX(ctid)".
Tid Scans can now be Backward.
---
 src/backend/commands/explain.c          |  36 +++++++---
 src/backend/executor/nodeTidscan.c      |   9 +++
 src/backend/nodes/copyfuncs.c           |   1 +
 src/backend/nodes/outfuncs.c            |   2 +
 src/backend/nodes/readfuncs.c           |   1 +
 src/backend/optimizer/path/pathkeys.c   |  19 +++++
 src/backend/optimizer/path/tidpath.c    |  39 ++++++++--
 src/backend/optimizer/plan/createplan.c |   9 ++-
 src/backend/optimizer/util/pathnode.c   |   4 +-
 src/include/nodes/plannodes.h           |   1 +
 src/include/nodes/relation.h            |   1 +
 src/include/optimizer/pathnode.h        |   3 +-
 src/include/optimizer/paths.h           |   3 +
 src/test/regress/expected/tidscan.out   | 123 +++++++++++++++++++++++++++++++-
 src/test/regress/sql/tidscan.sql        |  39 +++++++++-
 15 files changed, 268 insertions(+), 22 deletions(-)

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 888d994..5a4305d 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -111,6 +111,7 @@ static void show_foreignscan_info(ForeignScanState *fsstate, ExplainState *es);
 static void show_eval_params(Bitmapset *bms_params, ExplainState *es);
 static const char *explain_get_index_name(Oid indexId);
 static void show_buffer_usage(ExplainState *es, const BufferUsage *usage);
+static void show_scan_direction(ExplainState *es, ScanDirection direction);
 static void ExplainIndexScanDetails(Oid indexid, ScanDirection indexorderdir,
 						ExplainState *es);
 static void ExplainScanTarget(Scan *plan, ExplainState *es);
@@ -1270,7 +1271,6 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_SeqScan:
 		case T_SampleScan:
 		case T_BitmapHeapScan:
-		case T_TidScan:
 		case T_SubqueryScan:
 		case T_FunctionScan:
 		case T_TableFuncScan:
@@ -1279,6 +1279,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_WorkTableScan:
 			ExplainScanTarget((Scan *) plan, es);
 			break;
+		case T_TidScan:
+			show_scan_direction(es, ((TidScan *) plan)->direction);
+			ExplainScanTarget((Scan *) plan, es);
+			break;
 		case T_ForeignScan:
 		case T_CustomScan:
 			if (((Scan *) plan)->scanrelid > 0)
@@ -2892,25 +2896,21 @@ show_buffer_usage(ExplainState *es, const BufferUsage *usage)
 }
 
 /*
- * Add some additional details about an IndexScan or IndexOnlyScan
+ * Show the direction of a scan.
  */
 static void
-ExplainIndexScanDetails(Oid indexid, ScanDirection indexorderdir,
-						ExplainState *es)
+show_scan_direction(ExplainState *es, ScanDirection direction)
 {
-	const char *indexname = explain_get_index_name(indexid);
-
 	if (es->format == EXPLAIN_FORMAT_TEXT)
 	{
-		if (ScanDirectionIsBackward(indexorderdir))
+		if (ScanDirectionIsBackward(direction))
 			appendStringInfoString(es->str, " Backward");
-		appendStringInfo(es->str, " using %s", indexname);
 	}
 	else
 	{
 		const char *scandir;
 
-		switch (indexorderdir)
+		switch (direction)
 		{
 			case BackwardScanDirection:
 				scandir = "Backward";
@@ -2926,11 +2926,27 @@ ExplainIndexScanDetails(Oid indexid, ScanDirection indexorderdir,
 				break;
 		}
 		ExplainPropertyText("Scan Direction", scandir, es);
-		ExplainPropertyText("Index Name", indexname, es);
 	}
 }
 
 /*
+ * Add some additional details about an IndexScan or IndexOnlyScan
+ */
+static void
+ExplainIndexScanDetails(Oid indexid, ScanDirection indexorderdir,
+						ExplainState *es)
+{
+	const char *indexname = explain_get_index_name(indexid);
+
+	show_scan_direction(es, indexorderdir);
+
+	if (es->format == EXPLAIN_FORMAT_TEXT)
+		appendStringInfo(es->str, " using %s", indexname);
+	else
+		ExplainPropertyText("Index Name", indexname, es);
+}
+
+/*
  * Show the target of a Scan node
  */
 static void
diff --git a/src/backend/executor/nodeTidscan.c b/src/backend/executor/nodeTidscan.c
index 3897b97..f7e78f0 100644
--- a/src/backend/executor/nodeTidscan.c
+++ b/src/backend/executor/nodeTidscan.c
@@ -738,6 +738,15 @@ TidNext(TidScanState *node)
 
 	numRanges = node->tss_NumTidRanges;
 
+	/* If the plan direction is backward, invert the direction. */
+	if (ScanDirectionIsBackward(((TidScan *) node->ss.ps.plan)->direction))
+	{
+		if (ScanDirectionIsForward(direction))
+			direction = BackwardScanDirection;
+		else if (ScanDirectionIsBackward(direction))
+			direction = ForwardScanDirection;
+	}
+
 	tuple = NULL;
 	for (;;)
 	{
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index cab02a4..2150a7a 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -580,6 +580,7 @@ _copyTidScan(const TidScan *from)
 	 * copy remainder of node
 	 */
 	COPY_NODE_FIELD(tidquals);
+	COPY_SCALAR_FIELD(direction);
 
 	return newnode;
 }
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 647665a..a22eb6b 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -616,6 +616,7 @@ _outTidScan(StringInfo str, const TidScan *node)
 	_outScanInfo(str, (const Scan *) node);
 
 	WRITE_NODE_FIELD(tidquals);
+	WRITE_ENUM_FIELD(direction, ScanDirection);
 }
 
 static void
@@ -1892,6 +1893,7 @@ _outTidPath(StringInfo str, const TidPath *node)
 	_outPathInfo(str, (const Path *) node);
 
 	WRITE_NODE_FIELD(tidquals);
+	WRITE_ENUM_FIELD(direction, ScanDirection);
 }
 
 static void
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index e117867..f936ef1 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1846,6 +1846,7 @@ _readTidScan(void)
 	ReadCommonScan(&local_node->scan);
 
 	READ_NODE_FIELD(tidquals);
+	READ_ENUM_FIELD(direction, ScanDirection);
 
 	READ_DONE();
 }
diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index ec66cb9..b847151 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -18,6 +18,9 @@
 #include "postgres.h"
 
 #include "access/stratnum.h"
+#include "access/sysattr.h"
+#include "catalog/pg_operator.h"
+#include "catalog/pg_type.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
 #include "nodes/plannodes.h"
@@ -848,6 +851,22 @@ build_join_pathkeys(PlannerInfo *root,
 	return truncate_useless_pathkeys(root, joinrel, outer_pathkeys);
 }
 
+/*
+ * build_tidscan_pathkeys
+ *	  Build the path keys corresponding to ORDER BY ctid ASC|DESC.
+ */
+List *
+build_tidscan_pathkeys(PlannerInfo *root,
+					   RelOptInfo *rel,
+					   ScanDirection direction)
+{
+	int			opno = (direction == ForwardScanDirection) ? TIDLessOperator : TIDGreaterOperator;
+	Var		   *varexpr = makeVar(rel->relid, SelfItemPointerAttributeNumber, TIDOID, -1, InvalidOid, 0);
+	List	   *pathkeys = build_expression_pathkey(root, (Expr *) varexpr, NULL, opno, rel->relids, true);
+
+	return pathkeys;
+}
+
 /****************************************************************************
  *		PATHKEYS AND SORT CLAUSES
  ****************************************************************************/
diff --git a/src/backend/optimizer/path/tidpath.c b/src/backend/optimizer/path/tidpath.c
index 9005249..2362193 100644
--- a/src/backend/optimizer/path/tidpath.c
+++ b/src/backend/optimizer/path/tidpath.c
@@ -329,12 +329,16 @@ TidQualFromBaseRestrictinfo(RelOptInfo *rel)
  * create_tidscan_paths
  *	  Create paths corresponding to direct TID scans of the given rel.
  *
+ *	  Path keys and direction will be set on the scans if it looks useful.
+ *
  *	  Candidate paths are added to the rel's pathlist (using add_path).
  */
 void
 create_tidscan_paths(PlannerInfo *root, RelOptInfo *rel)
 {
 	Relids		required_outer;
+	List	   *pathkeys = NULL;
+	ScanDirection direction = ForwardScanDirection;
 	List	   *tidquals;
 
 	/*
@@ -344,10 +348,37 @@ create_tidscan_paths(PlannerInfo *root, RelOptInfo *rel)
 	 */
 	required_outer = rel->lateral_relids;
 
+	/*
+	 * Try to determine the best scan direction and create some useful
+	 * pathkeys.
+	 */
+	if (has_useful_pathkeys(root, rel))
+	{
+		/*
+		 * Build path keys corresponding to ORDER BY ctid ASC, and check
+		 * whether they will be useful for this scan.  If not, build path keys
+		 * for DESC, and try that; set the direction to BackwardScanDirection
+		 * if so.  If neither of them will be useful, no path keys will be
+		 * set.
+		 */
+		pathkeys = build_tidscan_pathkeys(root, rel, ForwardScanDirection);
+		if (!pathkeys_contained_in(pathkeys, root->query_pathkeys))
+		{
+			pathkeys = build_tidscan_pathkeys(root, rel, BackwardScanDirection);
+			if (pathkeys_contained_in(pathkeys, root->query_pathkeys))
+				direction = BackwardScanDirection;
+			else
+				pathkeys = NULL;
+		}
+	}
+
 	tidquals = TidQualFromBaseRestrictinfo(rel);
 
-	/* If there are tidquals, then it's worth generating a tidscan path. */
-	if (tidquals)
-		add_path(rel, (Path *) create_tidscan_path(root, rel, tidquals,
-												   required_outer));
+	/*
+	 * If there are tidquals or some useful pathkeys were found, then it's
+	 * worth generating a tidscan path.
+	 */
+	if (tidquals || pathkeys)
+		add_path(rel, (Path *) create_tidscan_path(root, rel, tidquals, pathkeys,
+												   direction, required_outer));
 }
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index e2c0bce..4fc8aef 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -184,7 +184,7 @@ static BitmapHeapScan *make_bitmap_heapscan(List *qptlist,
 					 List *bitmapqualorig,
 					 Index scanrelid);
 static TidScan *make_tidscan(List *qptlist, List *qpqual, Index scanrelid,
-			 List *tidquals);
+			 List *tidquals, ScanDirection direction);
 static SubqueryScan *make_subqueryscan(List *qptlist,
 				  List *qpqual,
 				  Index scanrelid,
@@ -3115,7 +3115,8 @@ create_tidscan_plan(PlannerInfo *root, TidPath *best_path,
 	scan_plan = make_tidscan(tlist,
 							 scan_clauses,
 							 scan_relid,
-							 tidquals);
+							 tidquals,
+							 best_path->direction);
 
 	copy_generic_path_info(&scan_plan->scan.plan, &best_path->path);
 
@@ -5176,7 +5177,8 @@ static TidScan *
 make_tidscan(List *qptlist,
 			 List *qpqual,
 			 Index scanrelid,
-			 List *tidquals)
+			 List *tidquals,
+			 ScanDirection direction)
 {
 	TidScan    *node = makeNode(TidScan);
 	Plan	   *plan = &node->scan.plan;
@@ -5187,6 +5189,7 @@ make_tidscan(List *qptlist,
 	plan->righttree = NULL;
 	node->scan.scanrelid = scanrelid;
 	node->tidquals = tidquals;
+	node->direction = direction;
 
 	return node;
 }
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index d50d86b..31645c4 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1186,6 +1186,7 @@ create_bitmap_or_path(PlannerInfo *root,
  */
 TidPath *
 create_tidscan_path(PlannerInfo *root, RelOptInfo *rel, List *tidquals,
+					List *pathkeys, ScanDirection direction,
 					Relids required_outer)
 {
 	TidPath    *pathnode = makeNode(TidPath);
@@ -1198,9 +1199,10 @@ create_tidscan_path(PlannerInfo *root, RelOptInfo *rel, List *tidquals,
 	pathnode->path.parallel_aware = false;
 	pathnode->path.parallel_safe = rel->consider_parallel;
 	pathnode->path.parallel_workers = 0;
-	pathnode->path.pathkeys = NIL;	/* always unordered */
+	pathnode->path.pathkeys = pathkeys;
 
 	pathnode->tidquals = tidquals;
+	pathnode->direction = direction;
 
 	cost_tidscan(&pathnode->path, root, rel, tidquals,
 				 pathnode->path.param_info);
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 26e1c40..201d315 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -485,6 +485,7 @@ typedef struct TidScan
 {
 	Scan		scan;
 	List	   *tidquals;		/* qual(s) involving CTID = something */
+	ScanDirection direction;
 } TidScan;
 
 /* ----------------
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 895849f..02210d3 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -1243,6 +1243,7 @@ typedef struct TidPath
 {
 	Path		path;
 	List	   *tidquals;
+	ScanDirection direction;
 } TidPath;
 
 /*
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 81abcf5..d6dda47 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -63,7 +63,8 @@ extern BitmapOrPath *create_bitmap_or_path(PlannerInfo *root,
 					  RelOptInfo *rel,
 					  List *bitmapquals);
 extern TidPath *create_tidscan_path(PlannerInfo *root, RelOptInfo *rel,
-					List *tidquals, Relids required_outer);
+					List *tidquals, List *pathkeys, ScanDirection direction,
+					Relids required_outer);
 extern AppendPath *create_append_path(PlannerInfo *root, RelOptInfo *rel,
 				   List *subpaths, List *partial_subpaths,
 				   Relids required_outer,
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index cafde30..9d0699e 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -211,6 +211,9 @@ extern List *build_join_pathkeys(PlannerInfo *root,
 					RelOptInfo *joinrel,
 					JoinType jointype,
 					List *outer_pathkeys);
+extern List *build_tidscan_pathkeys(PlannerInfo *root,
+					   RelOptInfo *rel,
+					   ScanDirection direction);
 extern List *make_pathkeys_for_sortclauses(PlannerInfo *root,
 							  List *sortclauses,
 							  List *tlist);
diff --git a/src/test/regress/expected/tidscan.out b/src/test/regress/expected/tidscan.out
index 8083909..8dcbf99 100644
--- a/src/test/regress/expected/tidscan.out
+++ b/src/test/regress/expected/tidscan.out
@@ -176,7 +176,6 @@ EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF)
 UPDATE tidscan SET id = -id WHERE CURRENT OF c RETURNING *;
 ERROR:  cursor "c" is not positioned on a row
 ROLLBACK;
-DROP TABLE tidscan;
 -- tests for tidrangescans
 CREATE TABLE tidrangescan(id integer, data text);
 INSERT INTO tidrangescan SELECT i,repeat('x', 100) FROM generate_series(1,1000) AS s(i);
@@ -427,3 +426,125 @@ SELECT ctid FROM tidrangescan_empty WHERE ctid > '(9, 0)';
 ------
 (0 rows)
 
+-- check that ordering on a tidscan doesn't require a sort
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidscan WHERE ctid = ANY(ARRAY['(0,2)', '(0,1)', '(0,3)']::tid[]) ORDER BY ctid;
+                          QUERY PLAN                           
+---------------------------------------------------------------
+ Tid Scan on tidscan
+   TID Cond: (ctid = ANY ('{"(0,2)","(0,1)","(0,3)"}'::tid[]))
+(2 rows)
+
+SELECT ctid FROM tidscan WHERE ctid = ANY(ARRAY['(0,2)', '(0,1)', '(0,3)']::tid[]) ORDER BY ctid;
+ ctid  
+-------
+ (0,1)
+ (0,2)
+ (0,3)
+(3 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidscan WHERE ctid = ANY(ARRAY['(0,2)', '(0,1)', '(0,3)']::tid[]) ORDER BY ctid DESC;
+                          QUERY PLAN                           
+---------------------------------------------------------------
+ Tid Scan Backward on tidscan
+   TID Cond: (ctid = ANY ('{"(0,2)","(0,1)","(0,3)"}'::tid[]))
+(2 rows)
+
+SELECT ctid FROM tidscan WHERE ctid = ANY(ARRAY['(0,2)', '(0,1)', '(0,3)']::tid[]) ORDER BY ctid DESC;
+ ctid  
+-------
+ (0,3)
+ (0,2)
+ (0,1)
+(3 rows)
+
+-- ordering with no quals should use tid range scan
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan ORDER BY ctid ASC;
+        QUERY PLAN        
+--------------------------
+ Tid Scan on tidrangescan
+(1 row)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan ORDER BY ctid DESC;
+            QUERY PLAN             
+-----------------------------------
+ Tid Scan Backward on tidrangescan
+(1 row)
+
+-- min/max
+EXPLAIN (COSTS OFF)
+SELECT MIN(ctid) FROM tidrangescan;
+                 QUERY PLAN                 
+--------------------------------------------
+ Result
+   InitPlan 1 (returns $0)
+     ->  Limit
+           ->  Tid Scan on tidrangescan
+                 Filter: (ctid IS NOT NULL)
+(5 rows)
+
+SELECT MIN(ctid) FROM tidrangescan;
+  min  
+-------
+ (0,1)
+(1 row)
+
+EXPLAIN (COSTS OFF)
+SELECT MAX(ctid) FROM tidrangescan;
+                   QUERY PLAN                    
+-------------------------------------------------
+ Result
+   InitPlan 1 (returns $0)
+     ->  Limit
+           ->  Tid Scan Backward on tidrangescan
+                 Filter: (ctid IS NOT NULL)
+(5 rows)
+
+SELECT MAX(ctid) FROM tidrangescan;
+  max   
+--------
+ (9,10)
+(1 row)
+
+EXPLAIN (COSTS OFF)
+SELECT MIN(ctid) FROM tidrangescan WHERE ctid > '(5,0)';
+                   QUERY PLAN                    
+-------------------------------------------------
+ Result
+   InitPlan 1 (returns $0)
+     ->  Limit
+           ->  Tid Scan on tidrangescan
+                 TID Cond: (ctid > '(5,0)'::tid)
+                 Filter: (ctid IS NOT NULL)
+(6 rows)
+
+SELECT MIN(ctid) FROM tidrangescan WHERE ctid > '(5,0)';
+  min  
+-------
+ (5,1)
+(1 row)
+
+EXPLAIN (COSTS OFF)
+SELECT MAX(ctid) FROM tidrangescan WHERE ctid < '(5,0)';
+                   QUERY PLAN                    
+-------------------------------------------------
+ Result
+   InitPlan 1 (returns $0)
+     ->  Limit
+           ->  Tid Scan Backward on tidrangescan
+                 TID Cond: (ctid < '(5,0)'::tid)
+                 Filter: (ctid IS NOT NULL)
+(6 rows)
+
+SELECT MAX(ctid) FROM tidrangescan WHERE ctid < '(5,0)';
+  max   
+--------
+ (4,10)
+(1 row)
+
+-- clean up
+DROP TABLE tidscan;
+DROP TABLE tidrangescan;
diff --git a/src/test/regress/sql/tidscan.sql b/src/test/regress/sql/tidscan.sql
index 02b094a..8f437e8 100644
--- a/src/test/regress/sql/tidscan.sql
+++ b/src/test/regress/sql/tidscan.sql
@@ -63,8 +63,6 @@ EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF)
 UPDATE tidscan SET id = -id WHERE CURRENT OF c RETURNING *;
 ROLLBACK;
 
-DROP TABLE tidscan;
-
 -- tests for tidrangescans
 
 CREATE TABLE tidrangescan(id integer, data text);
@@ -140,3 +138,40 @@ SELECT ctid FROM tidrangescan_empty WHERE ctid < '(1, 0)';
 EXPLAIN (COSTS OFF)
 SELECT ctid FROM tidrangescan_empty WHERE ctid > '(9, 0)';
 SELECT ctid FROM tidrangescan_empty WHERE ctid > '(9, 0)';
+
+-- check that ordering on a tidscan doesn't require a sort
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidscan WHERE ctid = ANY(ARRAY['(0,2)', '(0,1)', '(0,3)']::tid[]) ORDER BY ctid;
+SELECT ctid FROM tidscan WHERE ctid = ANY(ARRAY['(0,2)', '(0,1)', '(0,3)']::tid[]) ORDER BY ctid;
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidscan WHERE ctid = ANY(ARRAY['(0,2)', '(0,1)', '(0,3)']::tid[]) ORDER BY ctid DESC;
+SELECT ctid FROM tidscan WHERE ctid = ANY(ARRAY['(0,2)', '(0,1)', '(0,3)']::tid[]) ORDER BY ctid DESC;
+
+-- ordering with no quals should use tid range scan
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan ORDER BY ctid ASC;
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan ORDER BY ctid DESC;
+
+-- min/max
+EXPLAIN (COSTS OFF)
+SELECT MIN(ctid) FROM tidrangescan;
+SELECT MIN(ctid) FROM tidrangescan;
+
+EXPLAIN (COSTS OFF)
+SELECT MAX(ctid) FROM tidrangescan;
+SELECT MAX(ctid) FROM tidrangescan;
+
+EXPLAIN (COSTS OFF)
+SELECT MIN(ctid) FROM tidrangescan WHERE ctid > '(5,0)';
+SELECT MIN(ctid) FROM tidrangescan WHERE ctid > '(5,0)';
+
+EXPLAIN (COSTS OFF)
+SELECT MAX(ctid) FROM tidrangescan WHERE ctid < '(5,0)';
+SELECT MAX(ctid) FROM tidrangescan WHERE ctid < '(5,0)';
+
+-- clean up
+DROP TABLE tidscan;
+DROP TABLE tidrangescan;
-- 
2.7.4

#18

David Rowley

david.rowley@2ndquadrant.com

about 7 years ago

In reply to: Edmund Horner (#17)

1 attachment(s)

Re: Tid scan improvements

On Mon, 12 Nov 2018 at 17:35, Edmund Horner <ejrh00@gmail.com> wrote:

Hi, here's the new patch(s).

Mostly the same, but trying to address your comments from earlier as
well as clean up a few other things I noticed.

Thanks for making those changes.

I've now had a look over the latest patches and I've found a few more
things. Many of these are a bit nitpicky, but certainly not all. I
also reviewed 0004 this time.

0001:

1. The row estimates are not quite right. This causes the row
estimation to go the wrong way for isgt.

For example, the following gets 24 rows instead of 26.

postgres=# create table t (a int);
CREATE TABLE
postgres=# insert into t select generate_Series(1,100);
INSERT 0 100
postgres=# analyze t;
postgres=# explain analyze select * from t where ctid >= '(0,75)';
QUERY PLAN
---------------------------------------------------------------------------------------------
Seq Scan on t (cost=0.00..2.25 rows=24 width=4) (actual
time=0.046..0.051 rows=26 loops=1)
Filter: (ctid >= '(0,75)'::tid)
Rows Removed by Filter: 74
Planning Time: 0.065 ms
Execution Time: 0.074 ms
(5 rows)

The < and <= case is not quite right either. < should have 1 fewer
tuple than the calculated average tuples per page, and <= should have
the same (assuming no gaps).

I've attached a small delta patch that I think improves things here.
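
As a rough illustration of the arithmetic involved, here is a standalone sketch of the adjusted estimate. It is not the code from the delta patch: the function name and parameters are made up for the example, and it assumes the part of the target block below the given offset is estimated as offset / density. With 100 tuples on one page and ctid >= '(0,75)' it yields a selectivity of 0.26, i.e. the 26 rows seen above.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical standalone version of the TID inequality estimate.
 * Assumption: the fraction of the target block that lies below "offset"
 * is offset / density, where density is the average tuples per page.
 */
double
tid_ineq_selectivity(double tuples, double pages,
                     unsigned block, unsigned offset,
                     bool isgt, bool iseq)
{
    double density;
    double frac;
    double selec;

    assert(tuples >= 1.0 && pages >= 1.0);

    density = tuples / pages;

    /* Part of the target block at or below the offset, capped at 1. */
    frac = offset / density;
    if (frac > 1.0)
        frac = 1.0;

    /* Whole blocks before the target, plus part of the target block. */
    selec = (block + frac) / pages;

    /*
     * "<" has one tuple fewer, and ">=" has one more once reversed below,
     * so subtract one tuple in both cases (iseq equals isgt for exactly
     * these two operators).
     */
    if (iseq == isgt)
        selec -= 1.0 / tuples;

    /* Reverse the selectivity for ">" and ">=". */
    if (isgt)
        selec = 1.0 - selec;

    /* Clamp to a valid probability. */
    if (selec < 0.0)
        selec = 0.0;
    if (selec > 1.0)
        selec = 1.0;

    return selec;
}
```

Under the same assumptions, "<" gives 0.74, "<=" gives 0.75 and ">" gives 0.25 for this table.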

0002:

2. You should test for a non-empty List with list != NIL

/*
* If no quals were specified, then a complete scan is assumed. Make a
* TidExpr with an empty list of TidOpExprs.
*/
if (!node->tidquals)

Also, can you not just return after that if test? I think the code
would be easier to read with it like that.

3. I'd rather see EnsureTidRangeSpace() keep doubling the size of the
allocation until it reaches the required size. See how
MakeSharedInvalidMessagesArray() does it. Doing it this way ensures
we always have a power of two sized array which is much nicer if we
ever reach the palloc() limit as if the array is sized at the palloc()
limit / 2 + 1, then if we try to double it'll fail. Of course, it's
unlikely to be a problem here, but... the question would be how to
decide on the initial size.
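
A minimal sketch of the "keep doubling until it fits" idea, written as a standalone program fragment rather than backend code. DemoRange and ensure_range_space() are hypothetical stand-ins for the patch's TidRange and EnsureTidRangeSpace(), and realloc() is used where the backend would use repalloc(). Overflow of the capacity is not handled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for the patch's TidRange; the real layout is not shown here. */
typedef struct DemoRange
{
    unsigned first;
    unsigned last;
} DemoRange;

/*
 * Hypothetical analogue of EnsureTidRangeSpace(): double the capacity
 * until at least "needed" entries fit, then resize once.
 * Returns the (possibly moved) array, or NULL if allocation fails.
 */
DemoRange *
ensure_range_space(DemoRange *ranges, size_t *num_alloc, size_t needed)
{
    size_t      new_alloc = *num_alloc;
    DemoRange  *grown;

    assert(*num_alloc > 0);

    /* Nothing to do if the array is already large enough. */
    if (needed <= new_alloc)
        return ranges;

    /* Strict doubling keeps the capacity a power-of-two multiple. */
    while (new_alloc < needed)
        new_alloc *= 2;

    grown = realloc(ranges, new_alloc * sizeof(DemoRange));
    if (grown != NULL)
        *num_alloc = new_alloc;

    return grown;
}
```

If the initial capacity is a power of two, every later capacity is one as well, which is the property asked for above.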

4. "at" needs shifted left a couple of words

/*
* If the lower bound was already or above at the maximum block
* number, then there is no valid range.
*/

but I don't see how it could be "or above". The ctid type does not
have the room for that. Although, that's not to say you should test if
(block == MaxBlockNumber), the >= seems better for the code. I'm just
complaining about the comment.

5. TidInArrayExprEval() lacks a header comment, and any other comments
to mention what it does. The function args also push over the 80 char
line length. There's also a few other functions in nodeTidscan.c that
are missing a header comment.

6. In MergeTidRanges(), you have:

ItemPointerData a_last = a->last;
ItemPointerData b_last;

if (!ItemPointerIsValid(&a_last))
a_last = a->first;

but I don't see anywhere you're setting ->last to an invalid item
pointer. Is this left over from a previous design of the range scan?
It looks like in TidExprEval() you're setting the upperbound to the
last page on the relation.

7. "fist" -> "first"

* If the tuple is in the fist block of the range and before the first

8. tss_TidRangePtr is a pretty confusingly named field.

if (node->tss_TidRangePtr >= numRanges || node->tss_TidRangePtr < 0)
break;

I'd expect anything with ptr in it to be a pointer, but this seems to
be an array index. Maybe "idx" is better than "ptr", or take note from
nodeAppend.c and have something like "tts_whichRange".

UPDATE: I see you've just altered what's there already. Perhaps it's
okay to leave it as you have it, but it's still not ideal.

9. This comment seems to indicate that a range can only have one
bound, but that does not seem to be the case.

* Ranges with only one item -- including one resulting from a
* CURRENT-OF qual -- are handled by looking up the item directly.

It seems open bounded ranges just have the lowest or highest possible
value for a ctid on the open side.

Perhaps the comment could be written as:

/*
* For ranges containing a single tuple, we can simply make an
* attempt to fetch the tuple directly.
*/

10. In cost_tidscan() I think you should ceil() the following:

double pages = selectivity * baserel->pages;

Otherwise, you'll end up partially charging a seq_page_cost, which
seems pretty invalid since you can't partially read a page.

11. In the comment:

/* TODO decide what the costs should be */

I think you can just explain why you're charging 1 random_page_cost
and the remainder in seq_page_cost. Or is there something left to do
here that I've forgotten about?
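
To make points 10 and 11 concrete, here is a hedged sketch of the page-cost arithmetic: round the page count up, charge one random_page_cost for the first page and seq_page_cost for the rest. The function names are invented for the example, page_ceil() stands in for ceil() from <math.h> so the fragment builds without extra link flags, and returning zero cost for an empty range is an assumption of the sketch, not something cost_tidscan() is known to do.

```c
#include <assert.h>

/* Stand-in for ceil() on non-negative values of moderate size. */
double
page_ceil(double x)
{
    double whole;

    assert(x >= 0.0 && x < 9.0e18);

    whole = (double) (long long) x;     /* truncates toward zero */
    return (whole < x) ? whole + 1.0 : whole;
}

/*
 * Hypothetical I/O cost of a TID range scan: whole pages only, the first
 * page at random_page_cost and the remainder at seq_page_cost.
 */
double
tid_range_io_cost(double selectivity, double rel_pages,
                  double random_page_cost, double seq_page_cost)
{
    double pages;

    assert(selectivity >= 0.0 && selectivity <= 1.0);
    assert(rel_pages >= 0.0);

    /* A page cannot be partially read, so round up. */
    pages = page_ceil(selectivity * rel_pages);

    /* Assumption: an empty range reads no pages at all. */
    if (pages <= 0.0)
        return 0.0;

    return random_page_cost + (pages - 1.0) * seq_page_cost;
}
```

For example, a selectivity of 0.25 on a 10-page relation covers 2.5 pages, which is charged as 3 pages.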

12. expected_comparison_operator is a bit long a name:

IsTidComparison(OpExpr *node, int varno, Oid expected_comparison_operator)

How about just expected_opno?

13. !rlst -> rlst != NIL

/* if no range qual was found, look for any other TID qual */
if (!rlst)

(Yeah I know there's various cases where it's done incorrectly there
already :-( )

14. This is not great:

#define IsTidEqualClause(node, varno) IsTidComparison(node, varno,
TIDEqualOperator)
#define IsTidLTClause(node, varno) IsTidComparison(node, varno, TIDLessOperator)
#define IsTidLEClause(node, varno) IsTidComparison(node, varno,
TIDLessEqOperator)
#define IsTidGTClause(node, varno) IsTidComparison(node, varno,
TIDGreaterOperator)
#define IsTidGEClause(node, varno) IsTidComparison(node, varno,
TIDGreaterEqOperator)

#define IsTidRangeClause(node, varno) (IsTidLTClause(node, varno) || \
IsTidLEClause(node, varno) || \
IsTidGTClause(node, varno) || \
IsTidGEClause(node, varno))

The 4 macros for >, >=, < and <= are only used by IsTidRangeClause()
which means IsTidComparison() could get called up to 4 times. Most of
the work it does would be redundant in that case. Maybe it's better
to rethink that?
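
One possible shape for that rethink, sketched outside the backend: test the operator against the four range operators first, then do the structural check on the clause only once. Everything here is hypothetical. DemoTidOp stands in for the operator OIDs (TIDLessOperator and friends), DemoClause for an OpExpr, and the counter exists only to show how often the costly part runs.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder operator identifiers, not real operator OIDs. */
typedef enum DemoTidOp
{
    DEMO_TID_EQ,
    DEMO_TID_LT,
    DEMO_TID_LE,
    DEMO_TID_GT,
    DEMO_TID_GE,
    DEMO_OTHER_OP
} DemoTidOp;

/* Placeholder for an OpExpr of the form "var op something". */
typedef struct DemoClause
{
    DemoTidOp   opno;
    int         varno;
    bool        is_ctid_var;
} DemoClause;

/* Counts how many times the structural check ran. */
int demo_structural_checks = 0;

/* Cheap test: is this one of <, <=, >, >= ? */
bool
demo_is_tid_range_op(DemoTidOp opno)
{
    switch (opno)
    {
        case DEMO_TID_LT:
        case DEMO_TID_LE:
        case DEMO_TID_GT:
        case DEMO_TID_GE:
            return true;
        default:
            return false;
    }
}

/* Stand-in for the more expensive inspection of the clause's arguments. */
bool
demo_clause_matches_ctid(const DemoClause *clause, int varno)
{
    demo_structural_checks++;
    return clause->is_ctid_var && clause->varno == varno;
}

/* Range test that inspects the clause at most once. */
bool
demo_is_tid_range_clause(const DemoClause *clause, int varno)
{
    if (!demo_is_tid_range_op(clause->opno))
        return false;
    return demo_clause_matches_ctid(clause, varno);
}
```

With this arrangement a non-range clause is rejected without any structural check, and a range clause is inspected once instead of up to four times.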

15. There's no field named NumTids:

* TidRanges evaluated item pointers (array of size NumTids)

0003:

16. I think the following comment needs to be updated:

/* start from last page of the scan */

to:

/* When scanning the whole relation, start from the last page of the scan */

and drop:

/* Scanning the full relation: start just before start block. */

then maybe change:

/* Scanning a restricted range: start at end of range. */

to:

/* Otherwise, if scanning just a subset of the relation, start at the
final block in the range */

0004:

17. Can you make a few changes to build_tidscan_pathkeys():

a. build_index_pathkeys() uses ScanDirectionIsBackward(scandir), can
you set the opno based on that rather than doing "direction ==
ForwardScanDirection"?
b. varexpr can be an Expr and just be named expr. Please move the
declaration and assignment out onto separate lines and wrap the long
line.
c. wrap long line with the call to build_expression_pathkey(). Get rid
of the (Expr *) cast.

18. I'd expect the following not to produce a sort above the Tid Scan.

postgres=# set enable_seqscan=0;
SET
postgres=# explain select * from t inner join t t1 on t.ctid = t1.ctid
where t.ctid < '(0,10)' ;
QUERY PLAN
---------------------------------------------------------------------------------------
Merge Join (cost=10000000008.65..10000000009.28 rows=9 width=8)
Merge Cond: (t.ctid = t1.ctid)
-> Sort (cost=3.33..3.35 rows=9 width=10)
Sort Key: t.ctid
-> Tid Scan on t (cost=0.00..3.18 rows=9 width=10)
TID Cond: (ctid < '(0,10)'::tid)
-> Sort (cost=10000000005.32..10000000005.57 rows=100 width=10)
Sort Key: t1.ctid
-> Seq Scan on t t1 (cost=10000000000.00..10000000002.00
rows=100 width=10)
(9 rows)

On looking at why the planner did this, I see it's down to how you've
coded create_tidscan_paths(). You're creating a tidpath if there are any
quals or any pathkeys useful to the query's ORDER BY, but only
including the pathkeys if they're useful for the query's ORDER BY. I
think it'll be better to include the forward pathkeys in all cases,
and just make it a backward Tid Scan if backward keys are useful for
the ORDER BY. There's still a problem with this as a Merge Join
would need a Sort if there was an ORDER BY ctid DESC for one relation
even if the other relation had some valid ctid quals since the 2nd
scan would create a forward Tid Scan. Maybe that's not worth worrying
about. The only fix I can imagine is to always create a forward and
backward Tid Scan path, which is pretty bad as it's two more paths
that likely won't get used 99.9% of the time.

This also caused me to notice the costs are pretty broken for this:

postgres=# explain select * from t order by ctid;
QUERY PLAN
---------------------------------------------------
Tid Scan on t (cost=0.00..0.00 rows=100 width=10)
(1 row)

19. Looks like the ScanDirection's normally get named "scandir":

static TidScan *make_tidscan(List *qptlist, List *qpqual, Index scanrelid,
List *tidquals, ScanDirection direction);

Likewise for the various .h files you've added that as a new field to
various structs.

Setting back to waiting on author.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

fixes_for_v4_0001.diff (application/octet-stream)

diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 5e025a7437..c6286f08ab 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -590,8 +590,10 @@ scalarineqsel(PlannerInfo *root, Oid operator, bool isgt, bool iseq,
 			block = ItemPointerGetBlockNumberNoCheck(itemptr);
 
 			/*
-			 * If there's a useable density (tuples per page) estimate, take
-			 * into account the fraction of a block with a lower TID offset.
+			 * Determine the average number of tuples per page.  We naively
+			 * assume there will never be any dead tuples or empty space at
+			 * the start or in the middle of the page.  This is likely fine
+			 * for the purposes here.
 			 */
 			density = vardata->rel->tuples / vardata->rel->pages;
 			if (density > 0.0)
@@ -603,10 +605,17 @@ scalarineqsel(PlannerInfo *root, Oid operator, bool isgt, bool iseq,
 
 			selec = block / (double) vardata->rel->pages;
 
-			/* For <= and >=, one extra item is included. */
-			if (iseq && vardata->rel->tuples >= 1.0)
-				selec += (1 / vardata->rel->tuples);
+			/*
+			 * We'll have one less tuple for "<" and one additional tuple for
+			 * ">=", the latter of which we'll reverse the selectivity for
+			 * below, so we can simply subtract a tuple here.  We can easily
+			 * detect these two cases by iseq being equal to isgt.  They'll
+			 * either both be true or both be false.
+			 */
+			if (iseq == isgt && vardata->rel->tuples >= 1.0)
+				selec -= (1 / vardata->rel->tuples);
 
+			/* Finally, reverse the selectivity for the ">", ">=" case. */
 			if (isgt)
 				selec = 1.0 - selec;

#19

Tomas Vondra

tomas.vondra@2ndquadrant.com

about 7 years ago

In reply to: David Rowley (#18)

Re: Tid scan improvements

On 11/22/18 8:41 AM, David Rowley wrote:

...

3. I'd rather see EnsureTidRangeSpace() keep doubling the size of the
allocation until it reaches the required size. See how
MakeSharedInvalidMessagesArray() does it. Doing it this way ensures
we always have a power of two sized array which is much nicer if we
ever reach the palloc() limit as if the array is sized at the palloc()
limit / 2 + 1, then if we try to double it'll fail. Of course, it's
unlikely to be a problem here, but... the question would be how to
decide on the initial size.

I think it kinda tries to do that in some cases, by doing this:

*numAllocRanges *= 2;

...

tidRanges = (TidRange *)
repalloc(tidRanges,
*numAllocRanges * sizeof(TidRange));

The problem here is that what matters is not numAllocRanges being 2^N,
but the number of bytes allocated being 2^N. Because that's what ends up
in AllocSet, which keeps lists of 2^N chunks.

And as TidRange is 12B, this is guaranteed to waste memory, because
no matter what the first factor is, the result will never be 2^N.
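
The reason is that 12 = 3 * 4, so any multiple of 12 has a factor of 3 and cannot be a power of two. A small sketch of the resulting slack, assuming requests are simply rounded up to the next power of two. This is a simplification: it ignores chunk headers and any special handling of large allocations, and the helper names are made up.

```c
#include <assert.h>
#include <stddef.h>

/* Smallest power of two that is >= n. */
size_t
next_pow2(size_t n)
{
    size_t p = 1;

    assert(n > 0);

    while (p < n)
        p *= 2;
    return p;
}

/* Bytes left unused when nitems * item_size is rounded up to 2^N. */
size_t
chunk_waste(size_t nitems, size_t item_size)
{
    size_t bytes = nitems * item_size;

    return next_pow2(bytes) - bytes;
}
```

With 12-byte items, 16 items need 192 bytes and would occupy a 256-byte chunk, leaving 64 bytes unused. An 8-byte item type at the same count wastes nothing.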

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#20

Edmund Horner

ejrh00@gmail.com

about 7 years ago

In reply to: Tomas Vondra (#19)

Re: Tid scan improvements

On Fri, 23 Nov 2018 at 07:03, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

On 11/22/18 8:41 AM, David Rowley wrote:

...
3. I'd rather see EnsureTidRangeSpace() keep doubling the size of the
allocation until it reaches the required size. See how
MakeSharedInvalidMessagesArray() does it. Doing it this way ensures
we always have a power of two sized array which is much nicer if we
ever reach the palloc() limit as if the array is sized at the palloc()
limit / 2 + 1, then if we try to double it'll fail. Of course, it's
unlikely to be a problem here, but... the question would be how to
decide on the initial size.

I think it kinda tries to do that in some cases, by doing this:

*numAllocRanges *= 2;
...
tidRanges = (TidRange *)
repalloc(tidRanges,
*numAllocRanges * sizeof(TidRange));

The problem here is that what matters is not numAllocRanges being 2^N,
but the number of bytes allocated being 2^N. Because that's what ends up
in AllocSet, which keeps lists of 2^N chunks.

And as TidRange is 12B, this is guaranteed to waste memory, because
no matter what the first factor is, the result will never be 2^N.

For simplicity, I think making it a strict doubling of capacity each
time is fine. That's what we see in numerous other places in the
backend code.

What we don't really see is intentionally setting the initial capacity
so that each subsequent capacity is close-to-but-not-exceeding a power
of 2 bytes. You can't really do that optimally if working in terms of
whole numbers of items that aren't each a power of 2 size. This step,
there may be 2/3 of an item spare; next step, we'll have a whole item
spare that we're not going to use. So we could keep track in terms of
bytes allocated, and then figure out how many items we can fit at the
current time.

In my opinion, such complexity is overkill for Tid scans.

Currently, we try to pick an initial size based on the number of
expressions. We assume each expression will yield one range, and
allow that a saop expression might require us to enlarge the array.

Again, for simplicity, we should scrap that and pick something like
floor(256/sizeof(TidRange)) = 21 items, with about 1.5% wastage.
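
For reference, the arithmetic behind those numbers as a small sketch. The 12-byte item size is taken from the discussion above, and the helper names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Number of whole items that fit in a budget of "budget" bytes. */
size_t
initial_capacity(size_t budget, size_t item_size)
{
    assert(item_size > 0);

    return budget / item_size;  /* integer division acts as floor() */
}

/* Percentage of the budget left unused by nitems items. */
double
wastage_percent(size_t nitems, size_t item_size, size_t budget)
{
    size_t used = nitems * item_size;

    assert(budget > 0 && used <= budget);

    return 100.0 * (double) (budget - used) / (double) budget;
}
```

Starting from 21 items in 256 bytes, 4 bytes are unused (1.5625%). After a strict doubling to 42 items in 512 bytes, 8 bytes are unused, which is the same percentage, so the relative wastage does not grow as the array doubles.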

#21

Tomas Vondra

tomas.vondra@2ndquadrant.com

about 7 years ago

In reply to: Edmund Horner (#20)

Re: Tid scan improvements

On 11/24/18 1:56 AM, Edmund Horner wrote:

On Fri, 23 Nov 2018 at 07:03, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

On 11/22/18 8:41 AM, David Rowley wrote:

...
3. I'd rather see EnsureTidRangeSpace() keep doubling the size of the
allocation until it reaches the required size. See how
MakeSharedInvalidMessagesArray() does it. Doing it this way ensures
we always have a power of two sized array which is much nicer if we
ever reach the palloc() limit as if the array is sized at the palloc()
limit / 2 + 1, then if we try to double it'll fail. Of course, it's
unlikely to be a problem here, but... the question would be how to
decide on the initial size.

I think it kinda tries to do that in some cases, by doing this:

*numAllocRanges *= 2;
...
tidRanges = (TidRange *)
repalloc(tidRanges,
*numAllocRanges * sizeof(TidRange));

The problem here is that what matters is not numAllocRanges being 2^N,
but the number of bytes allocated being 2^N. Because that's what ends up
in AllocSet, which keeps lists of 2^N chunks.

And as TidRange is 12B, this is guaranteed to waste memory, because
no matter what the first factor is, the result will never be 2^N.

For simplicity, I think making it a strict doubling of capacity each
time is fine. That's what we see in numerous other places in the
backend code.

Sure.

What we don't really see is intentionally setting the initial capacity
so that each subsequent capacity is close-to-but-not-exceeding a power
of 2 bytes. You can't really do that optimally if working in terms of
whole numbers of items that aren't each a power of 2 size. This step,
there may be 2/3 of an item spare; next step, we'll have a whole item
spare that we're not going to use.

Ah, I missed the detail with setting initial size.

So we could keep track in terms of bytes allocated, and then figure
out how many items we can fit at the current time.

In my opinion, such complexity is overkill for Tid scans.

Currently, we try to pick an initial size based on the number of
expressions. We assume each expression will yield one range, and
allow that a saop expression might require us to enlarge the array.

Again, for simplicity, we should scrap that and pick something like
floor(256/sizeof(TidRange)) = 21 items, with about 1.5% wastage.

Probably. I don't think it'd be a lot of code to do the exact sizing,
but you're right 1.5% is close enough. As long as there is a comment
explaining the initial sizing, I'm fine with that.
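The arithmetic behind the "21 items, about 1.5% wastage" figure can be checked with a small sketch. This is only an illustration: DEMO_TID_RANGE_SIZE hard-codes the 12-byte size mentioned above, and items_per_chunk and wasted_bytes are hypothetical helper names, not functions from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed item size: the thread states that TidRange is 12 bytes. */
#define DEMO_TID_RANGE_SIZE ((size_t) 12)

/* Largest whole number of items that fits in a chunk of chunk_bytes. */
static size_t
items_per_chunk(size_t chunk_bytes, size_t item_size)
{
    assert(item_size > 0);
    return chunk_bytes / item_size;
}

/* Bytes left unused at the end of the chunk after packing whole items. */
static size_t
wasted_bytes(size_t chunk_bytes, size_t item_size)
{
    return chunk_bytes - items_per_chunk(chunk_bytes, item_size) * item_size;
}
```

With a 256-byte chunk this gives 21 items and 4 unused bytes (4/256, roughly the 1.5% quoted above). Doubling to 42 items uses 504 of 512 bytes, so strict doubling from 21 keeps the same proportion of waste at every step.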

If I could suggest one more thing, I'd define a struct combining the
array of ranges, numRanges and numAllocRanges like:

typedef struct TidRanges
{
    int numRanges;
    int numAllocRanges;
    TidRange ranges[FLEXIBLE_ARRAY_MEMBER];
} TidRanges;

and use that instead of the plain array. I find it easier to follow
compared to passing the various fields directly (sometimes as a value,
sometimes pointer to the value, etc.).

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#22

Edmund Horner

ejrh00@gmail.com

about 7 years ago

In reply to: Tomas Vondra (#21)

Re: Tid scan improvements

On Sat, 24 Nov 2018 at 15:46, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

On 11/24/18 1:56 AM, Edmund Horner wrote:

On Fri, 23 Nov 2018 at 07:03, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

On 11/22/18 8:41 AM, David Rowley wrote:

...
3. I'd rather see EnsureTidRangeSpace() keep doubling the size of the
allocation until it reaches the required size. See how
MakeSharedInvalidMessagesArray() does it. Doing it this way ensures
we always have a power of two sized array which is much nicer if we
ever reach the palloc() limit as if the array is sized at the palloc()
limit / 2 + 1, then if we try to double it'll fail. Of course, it's
unlikely to be a problem here, but... the question would be how to
decide on the initial size.

I think it kinda tries to do that in some cases, by doing this:

*numAllocRanges *= 2;
...
tidRanges = (TidRange *)
    repalloc(tidRanges,
             *numAllocRanges * sizeof(TidRange));

The problem here is that what matters is not numAllocRanges being 2^N,
but the number of bytes allocated being 2^N. Because that's what ends up
in AllocSet, which keeps lists of 2^N chunks.

And as TidRange is 12B, this is guaranteed to waste memory, because
no matter what the first factor is, the result will never be 2^N.

For simplicity, I think making it a strict doubling of capacity each
time is fine. That's what we see in numerous other places in the
backend code.

Sure.

What we don't really see is intentionally setting the initial capacity
so that each subsequent capacity is close-to-but-not-exceeding a power
of 2 bytes. You can't really do that optimally if working in terms of
whole numbers of items that aren't each a power of 2 size. This step,
there may be 2/3 of an item spare; next step, we'll have a whole item
spare that we're not going to use.

Ah, I missed the detail with setting initial size.

So we could keep track in terms of bytes allocated, and then figure
out how many items we can fit at the current time.

In my opinion, such complexity is overkill for Tid scans.

Currently, we try to pick an initial size based on the number of
expressions. We assume each expression will yield one range, and
allow that a saop expression might require us to enlarge the array.

Again, for simplicity, we should scrap that and pick something like
floor(256/sizeof(TidRange)) = 21 items, with about 1.5% wastage.

Probably. I don't think it'd be a lot of code to do the exact sizing,
but you're right 1.5% is close enough. As long as there is a comment
explaining the initial sizing, I'm fine with that.

If I could suggest one more thing, I'd define a struct combining the
array of ranges, numRanges and numAllocRanges like:

typedef struct TidRanges
{
    int numRanges;
    int numAllocRanges;
    TidRange ranges[FLEXIBLE_ARRAY_MEMBER];
} TidRanges;

and use that instead of the plain array. I find it easier to follow
compared to passing the various fields directly (sometimes as a value,
sometimes pointer to the value, etc.).

OK, I've rewritten it to use a struct:

typedef struct TidRangeArray {
    TidRange *ranges;
    int numRanges;
    int numAllocated;
} TidRangeArray;

which is slightly different from the flexible array member version you
suggested. The TidRangeArray is allocated on the stack in the
function that builds it, and then ranges and numRanges are copied into
the TidScanState before the function returns.

Any particular pros/cons of this versus your approach? With yours, I
presume we'd have a pointer to TidRanges in TidScanState.
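To make the difference between the two layouts concrete, here is a minimal sketch using malloc/realloc in place of palloc/repalloc. Everything in it (DemoRange, RangesFlex, RangesPtr, flex_create, flex_grow, ptr_grow) is a hypothetical stand-in, not code from either version of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in item type; the real TidRange is defined by the patch. */
typedef struct DemoRange
{
    unsigned int first_block;
    unsigned int last_block;
} DemoRange;

/* Layout A: header and items share one allocation (flexible array member). */
typedef struct RangesFlex
{
    int       numRanges;
    int       numAllocated;
    DemoRange ranges[];
} RangesFlex;

/* Layout B: small header pointing at a separately allocated array. */
typedef struct RangesPtr
{
    DemoRange *ranges;
    int        numRanges;
    int        numAllocated;
} RangesPtr;

/* Allocate layout A with room for `capacity` items; NULL on failure. */
static RangesFlex *
flex_create(int capacity)
{
    RangesFlex *r;

    assert(capacity > 0);
    r = malloc(offsetof(RangesFlex, ranges) + (size_t) capacity * sizeof(DemoRange));
    if (r == NULL)
        return NULL;
    r->numRanges = 0;
    r->numAllocated = capacity;
    return r;
}

/*
 * Growing layout A may move the whole struct, so every holder of the old
 * pointer must switch to the returned one.  On failure this returns NULL
 * and the old block is left untouched.
 */
static RangesFlex *
flex_grow(RangesFlex *r, int capacity)
{
    RangesFlex *grown = realloc(r, offsetof(RangesFlex, ranges) +
                                   (size_t) capacity * sizeof(DemoRange));

    if (grown == NULL)
        return NULL;
    grown->numAllocated = capacity;
    return grown;
}

/*
 * Growing layout B only swaps the inner pointer; the header's address is
 * stable, so it can live on the stack or inside another struct.
 * Returns 1 on success, 0 on allocation failure.
 */
static int
ptr_grow(RangesPtr *r, int capacity)
{
    DemoRange *grown = realloc(r->ranges, (size_t) capacity * sizeof(DemoRange));

    if (grown == NULL)
        return 0;
    r->ranges = grown;
    r->numAllocated = capacity;
    return 1;
}
```

The practical trade-off in this sketch is that layout A needs one allocation but callers must re-store the pointer after every grow, while layout B costs an extra pointer hop but lets the header be embedded or stack-allocated.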

My other concern now is that EnsureTidRangeSpace needs a loop to
double the allocated size. Most such arrays in the backend only ever
grow by 1, so a single doubling is fine, but the TID scan one can grow
by an arbitrary number with a scalar array op, and it's nice to not
have to check the space for each individual item. Here's what I've
got.

void
EnsureTidRangeSpace(TidRangeArray *tidRangeArray, int numNewItems)
{
    int requiredSpace = tidRangeArray->numRanges + numNewItems;
    if (requiredSpace <= tidRangeArray->numAllocated)
        return;

    /* it's not safe to double the size unless we're less than half INT_MAX */
    if (requiredSpace >= INT_MAX / 2)
        tidRangeArray->numAllocated = requiredSpace;
    else
        while (tidRangeArray->numAllocated < requiredSpace)
            tidRangeArray->numAllocated *= 2;

    tidRangeArray->ranges = (TidRange *)
        repalloc(tidRangeArray->ranges,
                 tidRangeArray->numAllocated * sizeof(TidRange));
}

If you're in danger of overflowing numAllocated with the number of
TIDs in your query, you're probably going to have other problems. But
I'd prefer to at least not get stuck in an infinite doubling loop.

Note that you don't need any single ScalarArrayOp to return a huge
result, because you can have multiple such ops in your query, and the
results for each all need to get put into the TidRangeArray before
de-duplication occurs.

What's a safe way to check that we're not trying to process too many items?

#23

Edmund Horner

ejrh00@gmail.com

about 7 years ago

In reply to: David Rowley (#18)

5 attachment(s)

Re: Tid scan improvements

On Thu, 22 Nov 2018 at 20:41, David Rowley <david.rowley@2ndquadrant.com> wrote:

I've now had a look over the latest patches and I've found a few more
things. Many of these are a bit nitpicky, but certainly not all. I
also reviewed 0004 this time.

Whew! A lot more things to look at.

I've tried to address most of what you've raised, and attach yet
another set of patches. There are a few things that I'm not settled
on, discussed below under Big Items.

CC'd Tomas, if he wants to check what I've done with the TidRange
array allocation.

***** Big Items *****

0001:

1. The row estimates are not quite right. This causes the row
estimation to go the wrong way for isgt.

For example, the following gets 24 rows instead of 26.

postgres=# create table t (a int);
CREATE TABLE
postgres=# insert into t select generate_Series(1,100);
INSERT 0 100
postgres=# analyze t;
postgres=# explain analyze select * from t where ctid >= '(0,75)';
QUERY PLAN
---------------------------------------------------------------------------------------------
Seq Scan on t (cost=0.00..2.25 rows=24 width=4) (actual
time=0.046..0.051 rows=26 loops=1)
Filter: (ctid >= '(0,75)'::tid)
Rows Removed by Filter: 74
Planning Time: 0.065 ms
Execution Time: 0.074 ms
(5 rows)

The < and <= case is not quite right either. < should have 1 fewer
tuple than the calculated average tuples per page, and <= should have
the same (assuming no gaps)

I've attached a small delta patch that I think improves things here.

Thanks, I've incorporated your patch. I think the logic for iseq and
isgt makes sense now.
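For reference, the estimate can be modelled outside the backend. The following is a standalone sketch of the ctid branch that the 0001 patch adds to scalarineqsel(), not the backend code itself: ctid_ineq_selectivity and clamp_probability are names made up for this example, and the relation statistics and the TID's block and offset are passed in as plain doubles.

```c
#include <assert.h>
#include <stdbool.h>

/* Keep a probability within [0, 1], like CLAMP_PROBABILITY in the backend. */
static double
clamp_probability(double p)
{
    if (p < 0.0)
        return 0.0;
    if (p > 1.0)
        return 1.0;
    return p;
}

/*
 * Estimate the fraction of the table matched by "ctid OP '(block,offset)'",
 * where OP is <, <=, > or >= as described by isgt and iseq.
 */
static double
ctid_ineq_selectivity(double ntuples, double npages,
                      double block, double offset,
                      bool isgt, bool iseq)
{
    double density;
    double selec;

    assert(ntuples >= 0.0 && npages >= 0.0);

    /* An empty relation is treated as fully included. */
    if (npages == 0.0)
        return 1.0;

    /* Average tuples per page; add the fraction of the page before offset. */
    density = ntuples / npages;
    if (density > 0.0)
    {
        double frac = offset / density;

        block += (frac < 1.0) ? frac : 1.0;
    }

    selec = block / npages;

    /* "<" has one tuple fewer; ">=" has one more once reversed below. */
    if (iseq == isgt && ntuples >= 1.0)
        selec -= 1.0 / ntuples;

    /* Reverse the selectivity for ">" and ">=". */
    if (isgt)
        selec = 1.0 - selec;

    return clamp_probability(selec);
}
```

For the 100-row, one-page table in the example above, ctid >= '(0,75)' comes out at 0.26 (26 rows) and ctid < '(0,10)' at 0.09 (9 rows).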

Since we only have the total number of tuples and the total number of
pages, and no real statistics, this might be the best we can
reasonably do. There's still a noticeable rowcount error for the last
page, and slighter rowcount errors for other pages. We estimate
density = ntuples/npages for all pages; but in a densely populated
table, we'll average only half the number of tuples in the last page
as earlier pages.

I guess we *could* estimate density = ntuples/(npages - 0.5) for all
but the last page; and half that for the last. But that adds
complexity, and you'd still only get a good row count when the last
page was about half full.

I implemented this anyway, and it does improve row counts a bit. I'll
include it in the next patch set and you can take a look.
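As a rough illustration of the half-full-last-page idea (not the code that will be in the patch), the two densities can be written as below. The function names are hypothetical, and modelled_tuples is included only to show that the two densities add back up to ntuples.

```c
#include <assert.h>

/* Density for every page except the last, assuming the last is half full. */
static double
full_page_density(double ntuples, double npages)
{
    assert(npages >= 1.0);
    return ntuples / (npages - 0.5);
}

/* Density for the last page: half that of the other pages. */
static double
last_page_density(double ntuples, double npages)
{
    return full_page_density(ntuples, npages) / 2.0;
}

/* Total tuples implied by the model; should come back to ntuples. */
static double
modelled_tuples(double ntuples, double npages)
{
    return (npages - 1.0) * full_page_density(ntuples, npages)
        + last_page_density(ntuples, npages);
}
```

For a table of 950 tuples in 10 pages, the plain ntuples/npages estimate is 95 per page, whereas this model gives 100 for the first nine pages and 50 for the last.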

I also spent some time today agonising over how visibility would affect
things, but did not come up with anything useful to add to our
formulas.

3. I'd rather see EnsureTidRangeSpace() keep doubling the size of the
allocation until it reaches the required size. See how
MakeSharedInvalidMessagesArray() does it. Doing it this way ensures
we always have a power of two sized array which is much nicer if we
ever reach the palloc() limit as if the array is sized at the palloc()
limit / 2 + 1, then if we try to double it'll fail. Of course, it's
unlikely to be a problem here, but... the question would be how to
decide on the initial size.

I've tried to change things that way, but we still need to deal with
excessive numbers of items.

I've defined a constant MaxTidRanges = MaxAllocSize/sizeof(TidRange),
and raise an error if the required size exceeds that.
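One way the doubling rule and the MaxTidRanges limit could fit together is sketched below. This is an assumption-laden illustration, not the patch's code: DEMO_MAX_ALLOC_SIZE copies the value of PostgreSQL's MaxAllocSize (0x3fffffff), DEMO_ITEM_SIZE is the 12-byte TidRange size mentioned earlier in the thread, next_capacity is a hypothetical helper, and a return value of 0 stands in for raising an error.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_ALLOC_SIZE ((size_t) 0x3fffffff)   /* mirrors MaxAllocSize */
#define DEMO_ITEM_SIZE      ((size_t) 12)           /* assumed sizeof(TidRange) */
#define DEMO_MAX_ITEMS      (DEMO_MAX_ALLOC_SIZE / DEMO_ITEM_SIZE)

/*
 * Return the capacity (in items) needed to hold `required` items, growing
 * by doubling from `current`.  Returns 0 if `required` exceeds the limit,
 * which is where the caller would raise an error.
 */
static size_t
next_capacity(size_t current, size_t required)
{
    size_t capacity = (current > 0) ? current : 1;

    assert(current <= DEMO_MAX_ITEMS);

    if (required > DEMO_MAX_ITEMS)
        return 0;

    while (capacity < required)
    {
        /* Doubling would pass the limit, so settle on the limit itself. */
        if (capacity > DEMO_MAX_ITEMS / 2)
        {
            capacity = DEMO_MAX_ITEMS;
            break;
        }
        capacity *= 2;
    }

    return capacity;
}
```

Because the limit is checked before the loop and the loop clamps rather than doubling past the limit, it always terminates, including when the starting capacity is 0.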

4. "at" needs shifted left a couple of words

/*
* If the lower bound was already or above at the maximum block
* number, then there is no valid range.
*/

but I don't see how it could be "or above". The ctid type does not
have the room for that. Although, that's not to say you should test if
(block == MaxBlockNumber), the >= seems better for the code. I'm just
complaining about the comment.

We have to deal with TIDs entered by the user, which can include
invalid ones like (4294967295,0). MaxBlockNumber is 4294967294.

12. expected_comparison_operator is a bit long a name:

IsTidComparison(OpExpr *node, int varno, Oid expected_comparison_operator)

How about just expected_opno?

14. This is not great:

[horrible macros in tidpath.c]

The 4 macros for >, >=, < and <= are only used by IsTidRangeClause()
which means IsTidComparison() could get called up to 4 times. Most of
the work it does would be redundant in that case. Maybe it's better
to rethink that?

Yeah. I've rewritten all this as two functions, IsTidEqualClause and
IsTidRangeClause, which each check the opno, with a helper function
IsTidBinaryExpression that checks everything else.

18. I'd expect the following not to produce a sort above the Tid Scan.

postgres=# set enable_seqscan=0;
SET
postgres=# explain select * from t inner join t t1 on t.ctid = t1.ctid
where t.ctid < '(0,10)' ;
QUERY PLAN
---------------------------------------------------------------------------------------
Merge Join (cost=10000000008.65..10000000009.28 rows=9 width=8)
Merge Cond: (t.ctid = t1.ctid)
-> Sort (cost=3.33..3.35 rows=9 width=10)
Sort Key: t.ctid
-> Tid Scan on t (cost=0.00..3.18 rows=9 width=10)
TID Cond: (ctid < '(0,10)'::tid)
-> Sort (cost=10000000005.32..10000000005.57 rows=100 width=10)
Sort Key: t1.ctid
-> Seq Scan on t t1 (cost=10000000000.00..10000000002.00
rows=100 width=10)
(9 rows)

On looking at why the planner did this, I see it's down to how you've
coded create_tidscan_paths(). You're creating a tidpath if there's any
quals or any useful pathkeys useful to the query's ORDER BY, but only
including the pathkeys if they're useful for the query's ORDER BY. I
think it'll be better to include the forward pathkeys in all cases,
and just make it a backward Tid Scan if backward keys are useful for
the ORDER BY. There's still a problem with this as a Merge Join
would need a Sort if there was an ORDER BY ctid DESC for one relation
even if the other relation had some valid ctid quals since the 2nd
scan would create a forward Tid Scan. Maybe that's not worth worrying
about. The only fix I can imagine is to always create a forward and
backward Tid Scan path, which is pretty bad as it's two more paths
that likely won't get used 99.9% of the time.

Two paths seems excessive just to cater for these unlikely plans. We
don't provide any other support for joining on CTID.

But setting the path keys doesn't cost much, so we should do that.

This also caused me to notice the costs are pretty broken for this:

postgres=# explain select * from t order by ctid;
QUERY PLAN
---------------------------------------------------
Tid Scan on t (cost=0.00..0.00 rows=100 width=10)
(1 row)

Yeah -- a side effect of treating empty tidquals as a scan over the
whole table. I've added costing code for this case.

***** Smaller items *****

Compacted for brevity (hope you don't mind):

2. You should test for a non-empty List with list != NIL [...] Also, can you not just return after that if test? I think the code
would be easier to read with it like that.
5. TidInArrayExprEval() lacks a header comment [...]
6. In MergeTidRanges(), you have: [leftover code]
7. "fist" -> "first" [...]
8. tss_TidRangePtr is a pretty confusingly named field. [...]
9. This comment seems to indicate that a range can only have one bound, but that does not seem to be the case. [...]
10. In cost_tidscan() I think you should ceil() the following: [...]
11. In the comment: /* TODO decide what the costs should be */ [...]
13. !rlst -> rlst != NIL
15. There's no field named NumTids: [...]
16. I think the following comment needs to be updated: [heapam comments]
17. Can you make a few changes to build_tidscan_pathkeys(): [...]
19. Looks like the ScanDirection's normally get named "scandir": [...]

These are mostly trivial and I've generally gone with your recommendation.

Attachments:

v5-0001-Add-selectivity-and-nullness-estimates-for-CTID-syst.patch (application/octet-stream)

From 57356662700a7cd080de06fb9676c121f3b8267a Mon Sep 17 00:00:00 2001
From: Edmund Horner <ejrh00@gmail.com>
Date: Fri, 12 Oct 2018 13:36:24 +1300
Subject: [PATCH 1/5] Add selectivity and nullness estimates for CTID system
 variables

Previously, estimates for ItemPointer range quals, such as "ctid <= '(5,7)'",
resorted to the default values of 0.33 for range selectivity, and 0.005 for
nullness, although there was special-case handling for equality quals like
"ctid = (5,7)", which used the appropriate selectivity for distinct items.

This change uses the relation size to estimate the selectivity of a range qual,
and also uses a nullness estimate of 0 for ctid, since it is never NULL.
---
 src/backend/utils/adt/selfuncs.c | 61 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 61 insertions(+)

diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index ffca0fe..f5a1ee0 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -581,6 +581,58 @@ scalarineqsel(PlannerInfo *root, Oid operator, bool isgt, bool iseq,
 
 	if (!HeapTupleIsValid(vardata->statsTuple))
 	{
+		/*
+		 * There are no stats for system columns, but for CTID we can estimate
+		 * based on table size.
+		 */
+		if (vardata->var && IsA(vardata->var, Var) &&
+			((Var *) vardata->var)->varattno == SelfItemPointerAttributeNumber)
+		{
+			ItemPointer itemptr;
+			double		block;
+			double		density;
+
+			/* If the relation's empty, we're going to include all of it. */
+			if (vardata->rel->pages == 0)
+				return 1.0;
+
+			itemptr = (ItemPointer) DatumGetPointer(constval);
+			block = ItemPointerGetBlockNumberNoCheck(itemptr);
+
+			/*
+			 * Determine the average number of tuples per page.  We naively
+			 * assume there will never be any dead tuples or empty space at
+			 * the start or in the middle of the page.  This is likely fine
+			 * for the purposes here.
+			 */
+			density = vardata->rel->tuples / vardata->rel->pages;
+			if (density > 0.0)
+			{
+				OffsetNumber offset = ItemPointerGetOffsetNumberNoCheck(itemptr);
+
+				block += Min(offset / density, 1.0);
+			}
+
+			selec = block / (double) vardata->rel->pages;
+
+			/*
+			 * We'll have one less tuple for "<" and one additional tuple for
+			 * ">=", the latter of which we'll reverse the selectivity for
+			 * below, so we can simply subtract a tuple here.  We can easily
+			 * detect these two cases by iseq being equal to isgt.  They'll
+			 * either both be true or both be false.
+			 */
+			if (iseq == isgt && vardata->rel->tuples >= 1.0)
+				selec -= (1 / vardata->rel->tuples);
+
+			/* Finally, reverse the selectivity for the ">", ">=" case. */
+			if (isgt)
+				selec = 1.0 - selec;
+
+			CLAMP_PROBABILITY(selec);
+			return selec;
+		}
+
 		/* no stats available, so default result */
 		return DEFAULT_INEQ_SEL;
 	}
@@ -1795,6 +1847,15 @@ nulltestsel(PlannerInfo *root, NullTestType nulltesttype, Node *arg,
 				return (Selectivity) 0; /* keep compiler quiet */
 		}
 	}
+	else if (vardata.var && IsA(vardata.var, Var) &&
+			 ((Var *) vardata.var)->varattno == SelfItemPointerAttributeNumber)
+	{
+		/*
+		 * There are no stats for system columns, but we know CTID is never
+		 * NULL.
+		 */
+		selec = (nulltesttype == IS_NULL) ? 0.0 : 1.0;
+	}
 	else
 	{
 		/*
-- 
2.7.4

v5-0004-Tid-Scan-results-are-ordered.patch (application/octet-stream)

From 635d1d86732ee86baa545fe31c9b44bf73dfead2 Mon Sep 17 00:00:00 2001
From: Edmund Horner <ejrh00@gmail.com>
Date: Fri, 12 Oct 2018 16:29:44 +1300
Subject: [PATCH 4/5] Tid Scan results are ordered

The planner now knows that the results of a Tid path are ordered by ctid, so
queries that rely on that order no longer need a separate sort.  This improves
cases such as "ORDER BY ctid ASC/DESC", as well as "SELECT MIN(ctid)/MAX(ctid)".
Tid Scans can now be Backward.
---
 src/backend/commands/explain.c          |  48 ++++++++-----
 src/backend/executor/nodeTidscan.c      |  13 +++-
 src/backend/nodes/copyfuncs.c           |   1 +
 src/backend/nodes/outfuncs.c            |   2 +
 src/backend/nodes/readfuncs.c           |   1 +
 src/backend/optimizer/path/pathkeys.c   |  33 +++++++++
 src/backend/optimizer/path/tidpath.c    |  42 +++++++++--
 src/backend/optimizer/plan/createplan.c |   9 ++-
 src/backend/optimizer/util/pathnode.c   |   4 +-
 src/include/nodes/plannodes.h           |   1 +
 src/include/nodes/relation.h            |   1 +
 src/include/optimizer/pathnode.h        |   3 +-
 src/include/optimizer/paths.h           |   3 +
 src/test/regress/expected/tidscan.out   | 123 +++++++++++++++++++++++++++++++-
 src/test/regress/sql/tidscan.sql        |  39 +++++++++-
 15 files changed, 292 insertions(+), 31 deletions(-)

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index c0d0168..73ff03b 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -111,6 +111,7 @@ static void show_foreignscan_info(ForeignScanState *fsstate, ExplainState *es);
 static void show_eval_params(Bitmapset *bms_params, ExplainState *es);
 static const char *explain_get_index_name(Oid indexId);
 static void show_buffer_usage(ExplainState *es, const BufferUsage *usage);
+static void show_scan_direction(ExplainState *es, ScanDirection scandir);
 static void ExplainIndexScanDetails(Oid indexid, ScanDirection indexorderdir,
 						ExplainState *es);
 static void ExplainScanTarget(Scan *plan, ExplainState *es);
@@ -1271,7 +1272,6 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_SeqScan:
 		case T_SampleScan:
 		case T_BitmapHeapScan:
-		case T_TidScan:
 		case T_SubqueryScan:
 		case T_FunctionScan:
 		case T_TableFuncScan:
@@ -1280,6 +1280,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_WorkTableScan:
 			ExplainScanTarget((Scan *) plan, es);
 			break;
+		case T_TidScan:
+			show_scan_direction(es, ((TidScan *) plan)->scandir);
+			ExplainScanTarget((Scan *) plan, es);
+			break;
 		case T_ForeignScan:
 		case T_CustomScan:
 			if (((Scan *) plan)->scanrelid > 0)
@@ -2893,45 +2897,57 @@ show_buffer_usage(ExplainState *es, const BufferUsage *usage)
 }
 
 /*
- * Add some additional details about an IndexScan or IndexOnlyScan
+ * Show the direction of a scan.
  */
 static void
-ExplainIndexScanDetails(Oid indexid, ScanDirection indexorderdir,
-						ExplainState *es)
+show_scan_direction(ExplainState *es, ScanDirection scandir)
 {
-	const char *indexname = explain_get_index_name(indexid);
-
 	if (es->format == EXPLAIN_FORMAT_TEXT)
 	{
-		if (ScanDirectionIsBackward(indexorderdir))
+		if (ScanDirectionIsBackward(scandir))
 			appendStringInfoString(es->str, " Backward");
-		appendStringInfo(es->str, " using %s", indexname);
 	}
 	else
 	{
-		const char *scandir;
+		const char *scandirstr;
 
-		switch (indexorderdir)
+		switch (scandir)
 		{
 			case BackwardScanDirection:
-				scandir = "Backward";
+				scandirstr = "Backward";
 				break;
 			case NoMovementScanDirection:
-				scandir = "NoMovement";
+				scandirstr = "NoMovement";
 				break;
 			case ForwardScanDirection:
-				scandir = "Forward";
+				scandirstr = "Forward";
 				break;
 			default:
-				scandir = "???";
+				scandirstr = "???";
 				break;
 		}
-		ExplainPropertyText("Scan Direction", scandir, es);
-		ExplainPropertyText("Index Name", indexname, es);
+		ExplainPropertyText("Scan Direction", scandirstr, es);
 	}
 }
 
 /*
+ * Add some additional details about an IndexScan or IndexOnlyScan
+ */
+static void
+ExplainIndexScanDetails(Oid indexid, ScanDirection indexorderdir,
+						ExplainState *es)
+{
+	const char *indexname = explain_get_index_name(indexid);
+
+	show_scan_direction(es, indexorderdir);
+
+	if (es->format == EXPLAIN_FORMAT_TEXT)
+		appendStringInfo(es->str, " using %s", indexname);
+	else
+		ExplainPropertyText("Index Name", indexname, es);
+}
+
+/*
  * Show the target of a Scan node
  */
 static void
diff --git a/src/backend/executor/nodeTidscan.c b/src/backend/executor/nodeTidscan.c
index a3b5970..4f05938 100644
--- a/src/backend/executor/nodeTidscan.c
+++ b/src/backend/executor/nodeTidscan.c
@@ -816,6 +816,15 @@ TidNext(TidScanState *node)
 
 	numRanges = node->tss_NumTidRanges;
 
+	/* If the plan direction is backward, invert the direction. */
+	if (ScanDirectionIsBackward(((TidScan *) node->ss.ps.plan)->scandir))
+	{
+		if (ScanDirectionIsForward(direction))
+			direction = BackwardScanDirection;
+		else if (ScanDirectionIsBackward(direction))
+			direction = ForwardScanDirection;
+	}
+
 	tuple = NULL;
 	for (;;)
 	{
@@ -824,9 +833,7 @@ TidNext(TidScanState *node)
 		if (!node->tss_inScan)
 		{
 			/* Initialize or advance scan position, depending on direction. */
-			bool		bBackward = ScanDirectionIsBackward(direction);
-
-			if (bBackward)
+			if (ScanDirectionIsBackward(direction))
 			{
 				if (node->tss_CurrentTidRange < 0)
 				{
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index cab02a4..8c05bdd 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -580,6 +580,7 @@ _copyTidScan(const TidScan *from)
 	 * copy remainder of node
 	 */
 	COPY_NODE_FIELD(tidquals);
+	COPY_SCALAR_FIELD(scandir);
 
 	return newnode;
 }
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 647665a..8295a09 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -616,6 +616,7 @@ _outTidScan(StringInfo str, const TidScan *node)
 	_outScanInfo(str, (const Scan *) node);
 
 	WRITE_NODE_FIELD(tidquals);
+	WRITE_ENUM_FIELD(scandir, ScanDirection);
 }
 
 static void
@@ -1892,6 +1893,7 @@ _outTidPath(StringInfo str, const TidPath *node)
 	_outPathInfo(str, (const Path *) node);
 
 	WRITE_NODE_FIELD(tidquals);
+	WRITE_ENUM_FIELD(scandir, ScanDirection);
 }
 
 static void
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index e117867..5bd01d5 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1846,6 +1846,7 @@ _readTidScan(void)
 	ReadCommonScan(&local_node->scan);
 
 	READ_NODE_FIELD(tidquals);
+	READ_ENUM_FIELD(scandir, ScanDirection);
 
 	READ_DONE();
 }
diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index ec66cb9..b08830f 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -18,6 +18,9 @@
 #include "postgres.h"
 
 #include "access/stratnum.h"
+#include "access/sysattr.h"
+#include "catalog/pg_operator.h"
+#include "catalog/pg_type.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
 #include "nodes/plannodes.h"
@@ -848,6 +851,36 @@ build_join_pathkeys(PlannerInfo *root,
 	return truncate_useless_pathkeys(root, joinrel, outer_pathkeys);
 }
 
+/*
+ * build_tidscan_pathkeys
+ *	  Build the path keys corresponding to ORDER BY ctid ASC|DESC.
+ */
+List *
+build_tidscan_pathkeys(PlannerInfo *root,
+					   RelOptInfo *rel,
+					   ScanDirection scandir)
+{
+	int			opno;
+	Expr	   *expr;
+	List	   *pathkeys;
+
+	opno = ScanDirectionIsForward(scandir) ? TIDLessOperator : TIDGreaterOperator;
+	expr = (Expr *) makeVar(rel->relid,
+							SelfItemPointerAttributeNumber,
+							TIDOID,
+							-1,
+							InvalidOid,
+							0);
+	pathkeys = build_expression_pathkey(root,
+										expr,
+										NULL,
+										opno,
+										rel->relids,
+										true);
+
+	return pathkeys;
+}
+
 /****************************************************************************
  *		PATHKEYS AND SORT CLAUSES
  ****************************************************************************/
diff --git a/src/backend/optimizer/path/tidpath.c b/src/backend/optimizer/path/tidpath.c
index 3290294..38c58fa 100644
--- a/src/backend/optimizer/path/tidpath.c
+++ b/src/backend/optimizer/path/tidpath.c
@@ -357,12 +357,17 @@ TidQualFromBaseRestrictinfo(RelOptInfo *rel)
  * create_tidscan_paths
  *	  Create paths corresponding to direct TID scans of the given rel.
  *
+ *	  Path keys will be set to "CTID ASC" by default, or "CTID DESC" if it
+ *	  looks more useful.
+ *
  *	  Candidate paths are added to the rel's pathlist (using add_path).
  */
 void
 create_tidscan_paths(PlannerInfo *root, RelOptInfo *rel)
 {
 	Relids		required_outer;
+	List	   *pathkeys = NIL;
+	ScanDirection scandir = ForwardScanDirection;
 	List	   *tidquals;
 
 	/*
@@ -374,8 +379,37 @@ create_tidscan_paths(PlannerInfo *root, RelOptInfo *rel)
 
 	tidquals = TidQualFromBaseRestrictinfo(rel);
 
-	/* If there are tidquals, then it's worth generating a tidscan path. */
-	if (tidquals)
-		add_path(rel, (Path *) create_tidscan_path(root, rel, tidquals,
-												   required_outer));
+	/*
+	 * Look for a suitable direction by trying both forward and backward
+	 * pathkeys.  But don't set any pathkeys if neither direction helps the
+	 * scan (we don't want to generate tid paths for everything).
+	 */
+	if (has_useful_pathkeys(root, rel))
+	{
+		pathkeys = build_tidscan_pathkeys(root, rel, ForwardScanDirection);
+		if (!pathkeys_contained_in(pathkeys, root->query_pathkeys))
+		{
+			pathkeys = build_tidscan_pathkeys(root, rel, BackwardScanDirection);
+			if (pathkeys_contained_in(pathkeys, root->query_pathkeys))
+				scandir = BackwardScanDirection;
+			else
+				pathkeys = NIL;
+		}
+	}
+	else if (tidquals)
+	{
+		/*
+		 * Otherwise, default to a forward scan -- but only if tid quals were
+		 * found (we don't want to generate tid paths for everything).
+		 */
+		pathkeys = build_tidscan_pathkeys(root, rel, ForwardScanDirection);
+	}
+
+	/*
+	 * If there are tidquals or some useful pathkeys were found, then it's
+	 * worth generating a tidscan path.
+	 */
+	if (tidquals || pathkeys)
+		add_path(rel, (Path *) create_tidscan_path(root, rel, tidquals, pathkeys,
+												   scandir, required_outer));
 }
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index e2c0bce..66513fe 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -184,7 +184,7 @@ static BitmapHeapScan *make_bitmap_heapscan(List *qptlist,
 					 List *bitmapqualorig,
 					 Index scanrelid);
 static TidScan *make_tidscan(List *qptlist, List *qpqual, Index scanrelid,
-			 List *tidquals);
+			 List *tidquals, ScanDirection scandir);
 static SubqueryScan *make_subqueryscan(List *qptlist,
 				  List *qpqual,
 				  Index scanrelid,
@@ -3115,7 +3115,8 @@ create_tidscan_plan(PlannerInfo *root, TidPath *best_path,
 	scan_plan = make_tidscan(tlist,
 							 scan_clauses,
 							 scan_relid,
-							 tidquals);
+							 tidquals,
+							 best_path->scandir);
 
 	copy_generic_path_info(&scan_plan->scan.plan, &best_path->path);
 
@@ -5176,7 +5177,8 @@ static TidScan *
 make_tidscan(List *qptlist,
 			 List *qpqual,
 			 Index scanrelid,
-			 List *tidquals)
+			 List *tidquals,
+			 ScanDirection scandir)
 {
 	TidScan    *node = makeNode(TidScan);
 	Plan	   *plan = &node->scan.plan;
@@ -5187,6 +5189,7 @@ make_tidscan(List *qptlist,
 	plan->righttree = NULL;
 	node->scan.scanrelid = scanrelid;
 	node->tidquals = tidquals;
+	node->scandir = scandir;
 
 	return node;
 }
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index d50d86b..fcfdaa1 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1186,6 +1186,7 @@ create_bitmap_or_path(PlannerInfo *root,
  */
 TidPath *
 create_tidscan_path(PlannerInfo *root, RelOptInfo *rel, List *tidquals,
+					List *pathkeys, ScanDirection scandir,
 					Relids required_outer)
 {
 	TidPath    *pathnode = makeNode(TidPath);
@@ -1198,9 +1199,10 @@ create_tidscan_path(PlannerInfo *root, RelOptInfo *rel, List *tidquals,
 	pathnode->path.parallel_aware = false;
 	pathnode->path.parallel_safe = rel->consider_parallel;
 	pathnode->path.parallel_workers = 0;
-	pathnode->path.pathkeys = NIL;	/* always unordered */
+	pathnode->path.pathkeys = pathkeys;
 
 	pathnode->tidquals = tidquals;
+	pathnode->scandir = scandir;
 
 	cost_tidscan(&pathnode->path, root, rel, tidquals,
 				 pathnode->path.param_info);
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index bc9ff54..6a1a83d 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -486,6 +486,7 @@ typedef struct TidScan
 {
 	Scan		scan;
 	List	   *tidquals;		/* qual(s) involving CTID = something */
+	ScanDirection scandir;
 } TidScan;
 
 /* ----------------
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 895849f..4baf14e 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -1243,6 +1243,7 @@ typedef struct TidPath
 {
 	Path		path;
 	List	   *tidquals;
+	ScanDirection scandir;
 } TidPath;
 
 /*
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 81abcf5..735d185 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -63,7 +63,8 @@ extern BitmapOrPath *create_bitmap_or_path(PlannerInfo *root,
 					  RelOptInfo *rel,
 					  List *bitmapquals);
 extern TidPath *create_tidscan_path(PlannerInfo *root, RelOptInfo *rel,
-					List *tidquals, Relids required_outer);
+					List *tidquals, List *pathkeys, ScanDirection scandir,
+					Relids required_outer);
 extern AppendPath *create_append_path(PlannerInfo *root, RelOptInfo *rel,
 				   List *subpaths, List *partial_subpaths,
 				   Relids required_outer,
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index cafde30..a4b0a6a 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -211,6 +211,9 @@ extern List *build_join_pathkeys(PlannerInfo *root,
 					RelOptInfo *joinrel,
 					JoinType jointype,
 					List *outer_pathkeys);
+extern List *build_tidscan_pathkeys(PlannerInfo *root,
+					   RelOptInfo *rel,
+					   ScanDirection scandir);
 extern List *make_pathkeys_for_sortclauses(PlannerInfo *root,
 							  List *sortclauses,
 							  List *tlist);
diff --git a/src/test/regress/expected/tidscan.out b/src/test/regress/expected/tidscan.out
index 8083909..8dcbf99 100644
--- a/src/test/regress/expected/tidscan.out
+++ b/src/test/regress/expected/tidscan.out
@@ -176,7 +176,6 @@ EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF)
 UPDATE tidscan SET id = -id WHERE CURRENT OF c RETURNING *;
 ERROR:  cursor "c" is not positioned on a row
 ROLLBACK;
-DROP TABLE tidscan;
 -- tests for tidrangescans
 CREATE TABLE tidrangescan(id integer, data text);
 INSERT INTO tidrangescan SELECT i,repeat('x', 100) FROM generate_series(1,1000) AS s(i);
@@ -427,3 +426,125 @@ SELECT ctid FROM tidrangescan_empty WHERE ctid > '(9, 0)';
 ------
 (0 rows)
 
+-- check that ordering on a tidscan doesn't require a sort
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidscan WHERE ctid = ANY(ARRAY['(0,2)', '(0,1)', '(0,3)']::tid[]) ORDER BY ctid;
+                          QUERY PLAN                           
+---------------------------------------------------------------
+ Tid Scan on tidscan
+   TID Cond: (ctid = ANY ('{"(0,2)","(0,1)","(0,3)"}'::tid[]))
+(2 rows)
+
+SELECT ctid FROM tidscan WHERE ctid = ANY(ARRAY['(0,2)', '(0,1)', '(0,3)']::tid[]) ORDER BY ctid;
+ ctid  
+-------
+ (0,1)
+ (0,2)
+ (0,3)
+(3 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidscan WHERE ctid = ANY(ARRAY['(0,2)', '(0,1)', '(0,3)']::tid[]) ORDER BY ctid DESC;
+                          QUERY PLAN                           
+---------------------------------------------------------------
+ Tid Scan Backward on tidscan
+   TID Cond: (ctid = ANY ('{"(0,2)","(0,1)","(0,3)"}'::tid[]))
+(2 rows)
+
+SELECT ctid FROM tidscan WHERE ctid = ANY(ARRAY['(0,2)', '(0,1)', '(0,3)']::tid[]) ORDER BY ctid DESC;
+ ctid  
+-------
+ (0,3)
+ (0,2)
+ (0,1)
+(3 rows)
+
+-- ordering with no quals should use tid range scan
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan ORDER BY ctid ASC;
+        QUERY PLAN        
+--------------------------
+ Tid Scan on tidrangescan
+(1 row)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan ORDER BY ctid DESC;
+            QUERY PLAN             
+-----------------------------------
+ Tid Scan Backward on tidrangescan
+(1 row)
+
+-- min/max
+EXPLAIN (COSTS OFF)
+SELECT MIN(ctid) FROM tidrangescan;
+                 QUERY PLAN                 
+--------------------------------------------
+ Result
+   InitPlan 1 (returns $0)
+     ->  Limit
+           ->  Tid Scan on tidrangescan
+                 Filter: (ctid IS NOT NULL)
+(5 rows)
+
+SELECT MIN(ctid) FROM tidrangescan;
+  min  
+-------
+ (0,1)
+(1 row)
+
+EXPLAIN (COSTS OFF)
+SELECT MAX(ctid) FROM tidrangescan;
+                   QUERY PLAN                    
+-------------------------------------------------
+ Result
+   InitPlan 1 (returns $0)
+     ->  Limit
+           ->  Tid Scan Backward on tidrangescan
+                 Filter: (ctid IS NOT NULL)
+(5 rows)
+
+SELECT MAX(ctid) FROM tidrangescan;
+  max   
+--------
+ (9,10)
+(1 row)
+
+EXPLAIN (COSTS OFF)
+SELECT MIN(ctid) FROM tidrangescan WHERE ctid > '(5,0)';
+                   QUERY PLAN                    
+-------------------------------------------------
+ Result
+   InitPlan 1 (returns $0)
+     ->  Limit
+           ->  Tid Scan on tidrangescan
+                 TID Cond: (ctid > '(5,0)'::tid)
+                 Filter: (ctid IS NOT NULL)
+(6 rows)
+
+SELECT MIN(ctid) FROM tidrangescan WHERE ctid > '(5,0)';
+  min  
+-------
+ (5,1)
+(1 row)
+
+EXPLAIN (COSTS OFF)
+SELECT MAX(ctid) FROM tidrangescan WHERE ctid < '(5,0)';
+                   QUERY PLAN                    
+-------------------------------------------------
+ Result
+   InitPlan 1 (returns $0)
+     ->  Limit
+           ->  Tid Scan Backward on tidrangescan
+                 TID Cond: (ctid < '(5,0)'::tid)
+                 Filter: (ctid IS NOT NULL)
+(6 rows)
+
+SELECT MAX(ctid) FROM tidrangescan WHERE ctid < '(5,0)';
+  max   
+--------
+ (4,10)
+(1 row)
+
+-- clean up
+DROP TABLE tidscan;
+DROP TABLE tidrangescan;
diff --git a/src/test/regress/sql/tidscan.sql b/src/test/regress/sql/tidscan.sql
index 02b094a..8f437e8 100644
--- a/src/test/regress/sql/tidscan.sql
+++ b/src/test/regress/sql/tidscan.sql
@@ -63,8 +63,6 @@ EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF)
 UPDATE tidscan SET id = -id WHERE CURRENT OF c RETURNING *;
 ROLLBACK;
 
-DROP TABLE tidscan;
-
 -- tests for tidrangescans
 
 CREATE TABLE tidrangescan(id integer, data text);
@@ -140,3 +138,40 @@ SELECT ctid FROM tidrangescan_empty WHERE ctid < '(1, 0)';
 EXPLAIN (COSTS OFF)
 SELECT ctid FROM tidrangescan_empty WHERE ctid > '(9, 0)';
 SELECT ctid FROM tidrangescan_empty WHERE ctid > '(9, 0)';
+
+-- check that ordering on a tidscan doesn't require a sort
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidscan WHERE ctid = ANY(ARRAY['(0,2)', '(0,1)', '(0,3)']::tid[]) ORDER BY ctid;
+SELECT ctid FROM tidscan WHERE ctid = ANY(ARRAY['(0,2)', '(0,1)', '(0,3)']::tid[]) ORDER BY ctid;
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidscan WHERE ctid = ANY(ARRAY['(0,2)', '(0,1)', '(0,3)']::tid[]) ORDER BY ctid DESC;
+SELECT ctid FROM tidscan WHERE ctid = ANY(ARRAY['(0,2)', '(0,1)', '(0,3)']::tid[]) ORDER BY ctid DESC;
+
+-- ordering with no quals should use tid range scan
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan ORDER BY ctid ASC;
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan ORDER BY ctid DESC;
+
+-- min/max
+EXPLAIN (COSTS OFF)
+SELECT MIN(ctid) FROM tidrangescan;
+SELECT MIN(ctid) FROM tidrangescan;
+
+EXPLAIN (COSTS OFF)
+SELECT MAX(ctid) FROM tidrangescan;
+SELECT MAX(ctid) FROM tidrangescan;
+
+EXPLAIN (COSTS OFF)
+SELECT MIN(ctid) FROM tidrangescan WHERE ctid > '(5,0)';
+SELECT MIN(ctid) FROM tidrangescan WHERE ctid > '(5,0)';
+
+EXPLAIN (COSTS OFF)
+SELECT MAX(ctid) FROM tidrangescan WHERE ctid < '(5,0)';
+SELECT MAX(ctid) FROM tidrangescan WHERE ctid < '(5,0)';
+
+-- clean up
+DROP TABLE tidscan;
+DROP TABLE tidrangescan;
-- 
2.7.4

v5-0005-TID-selectivity-reduce-the-density-of-the-last-page-.patch (application/octet-stream)

From d7f00aa0c35d29c41af8f2c4068725e8eeb5f82c Mon Sep 17 00:00:00 2001
From: ejrh <ejrh00@gmail.com>
Date: Tue, 27 Nov 2018 20:31:58 +1300
Subject: [PATCH 5/5] TID selectivity: reduce the density of the last page by
 half

This takes into account the fact that the last page will, on average, have only
half the density of the other pages in a table.
---
 src/backend/utils/adt/selfuncs.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index f5a1ee0..e7ed9a5 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -604,8 +604,18 @@ scalarineqsel(PlannerInfo *root, Oid operator, bool isgt, bool iseq,
 			 * assume there will never be any dead tuples or empty space at
 			 * the start or in the middle of the page.  This is likely fine
 			 * for the purposes here.
+			 *
+			 * Since the last page will, on average, be only half full, we can
+			 * estimate it to have half as many tuples as earlier pages.  So
+			 * give it half the weight of a regular page.
 			 */
-			density = vardata->rel->tuples / vardata->rel->pages;
+			density = vardata->rel->tuples / (vardata->rel->pages - 0.5);
+
+			/* If it's the last page, it has half the density. */
+			if (block >= vardata->rel->pages - 1)
+				density *= 0.5;
+
+			/* Add a fraction of a block to take the offset into account. */
 			if (density > 0.0)
 			{
 				OffsetNumber offset = ItemPointerGetOffsetNumberNoCheck(itemptr);
@@ -613,7 +623,11 @@ scalarineqsel(PlannerInfo *root, Oid operator, bool isgt, bool iseq,
 				block += Min(offset / density, 1.0);
 			}
 
-			selec = block / (double) vardata->rel->pages;
+			/*
+			 * Again, the last page has only half weight when converting the
+			 * relative block number to a selectivity.
+			 */
+			selec = block / (vardata->rel->pages - 0.5);
 
 			/*
 			 * We'll have one less tuple for "<" and one additional tuple for
-- 
2.7.4
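
As a rough standalone illustration of the arithmetic in the hunk above (not part of the patch): the function below takes the relation's tuple and page counts plus the block and offset of the TID constant, and returns the estimated fraction of the table that precedes that TID. The name tid_fraction_before and its plain double parameters are made up for this sketch; the real code reads these values from vardata->rel and the ItemPointer. The early return for a table with fewer than one page is also an assumption of the sketch.

```c
/*
 * Hypothetical standalone version of the last-page weighting used in the
 * scalarineqsel hunk.  All names here are illustrative only.
 */
static double
tid_fraction_before(double tuples, double pages, double block, double offset)
{
	double		density;

	/* Assumption of this sketch: nothing to estimate on an empty table. */
	if (pages < 1.0)
		return 0.0;

	/* The last page counts as half a page when computing tuples per page. */
	density = tuples / (pages - 0.5);

	/* If the TID is on the last page, that page has half the density. */
	if (block >= pages - 1)
		density *= 0.5;

	/* Add a fraction of a block to take the offset into account. */
	if (density > 0.0)
	{
		double		frac = offset / density;

		block += (frac < 1.0) ? frac : 1.0;
	}

	/* Again, the last page has only half weight in the denominator. */
	return block / (pages - 0.5);
}
```

For a table of 10 pages and 950 tuples the density works out to 100 tuples per page, so '(5,0)' maps to 5/9.5 (about 0.53) and '(9,25)' maps to 1.0. Larger offsets on the last page give a value above 1.0; the sketch does not clamp the result, and the hunk itself does not show whether the surrounding code does.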

v5-0003-Support-backward-scans-over-restricted-ranges-in-hea.patch (application/octet-stream)

From 20bff0fbdb4da235b9a3e7d2cefbfd45e3bdabc0 Mon Sep 17 00:00:00 2001
From: Edmund Horner <ejrh00@gmail.com>
Date: Fri, 12 Oct 2018 16:28:58 +1300
Subject: [PATCH 3/5] Support backward scans over restricted ranges in heap
 access method

This is required for backward Tid scans.
---
 src/backend/access/heap/heapam.c | 48 +++++++++++++++++++++++++++++++++-------
 1 file changed, 40 insertions(+), 8 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 9650145..dbe6045 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -575,11 +575,27 @@ heapgettup(HeapScanDesc scan,
 			 * forward scanners.
 			 */
 			scan->rs_syncscan = false;
-			/* start from last page of the scan */
-			if (scan->rs_startblock > 0)
-				page = scan->rs_startblock - 1;
+
+			/*
+			 * When scanning the whole relation, start from the last page of
+			 * the scan.
+			 */
+			if (scan->rs_numblocks == InvalidBlockNumber)
+			{
+				if (scan->rs_startblock > 0)
+					page = scan->rs_startblock - 1;
+				else
+					page = scan->rs_nblocks - 1;
+			}
 			else
-				page = scan->rs_nblocks - 1;
+			{
+				/*
+				 * Otherwise, if scanning just a subset of the relation, start
+				 * at the final block in the range.
+				 */
+				page = scan->rs_startblock + scan->rs_numblocks - 1;
+			}
+
 			heapgetpage(scan, page);
 		}
 		else
@@ -876,11 +892,27 @@ heapgettup_pagemode(HeapScanDesc scan,
 			 * forward scanners.
 			 */
 			scan->rs_syncscan = false;
-			/* start from last page of the scan */
-			if (scan->rs_startblock > 0)
-				page = scan->rs_startblock - 1;
+
+			/*
+			 * When scanning the whole relation, start from the last page of
+			 * the scan.
+			 */
+			if (scan->rs_numblocks == InvalidBlockNumber)
+			{
+				if (scan->rs_startblock > 0)
+					page = scan->rs_startblock - 1;
+				else
+					page = scan->rs_nblocks - 1;
+			}
 			else
-				page = scan->rs_nblocks - 1;
+			{
+				/*
+				 * Otherwise, if scanning just a subset of the relation, start
+				 * at the final block in the range.
+				 */
+				page = scan->rs_startblock + scan->rs_numblocks - 1;
+			}
+
 			heapgetpage(scan, page);
 		}
 		else
-- 
2.7.4
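
The start-page choice that this patch adds to both heapgettup and heapgettup_pagemode can be sketched as a small standalone function. The typedef and macro below are local stand-ins for PostgreSQL's BlockNumber and InvalidBlockNumber, and backward_scan_start_page is a hypothetical name. The parameters correspond to rs_startblock, rs_numblocks and rs_nblocks. The sketch assumes the caller has already ruled out an empty relation and a zero-length range.

```c
#include <stdint.h>

/* Local stand-ins for the PostgreSQL definitions, for illustration only. */
typedef uint32_t BlockNumber;
#define InvalidBlockNumber ((BlockNumber) 0xFFFFFFFF)

/*
 * Pick the first page to visit when scanning backward.
 *
 * numblocks == InvalidBlockNumber means the whole relation is scanned; any
 * other value restricts the scan to numblocks pages beginning at startblock.
 */
static BlockNumber
backward_scan_start_page(BlockNumber startblock,
						 BlockNumber numblocks,
						 BlockNumber nblocks)
{
	if (numblocks == InvalidBlockNumber)
	{
		/* Whole relation: start just before startblock, wrapping to the end. */
		if (startblock > 0)
			return startblock - 1;
		return nblocks - 1;
	}

	/* Restricted range: start at the final block in the range. */
	return startblock + numblocks - 1;
}
```

With a 10-page relation, an unrestricted scan starting at block 0 begins on page 9, while a range of 3 blocks starting at block 5 begins on page 7 and works back towards page 5.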

v5-0002-Support-range-quals-in-Tid-Scan.patch (application/octet-stream)

From dbae3c82dc3d1104e92c131b3129d4ea549ede1c Mon Sep 17 00:00:00 2001
From: Edmund Horner <ejrh00@gmail.com>
Date: Fri, 12 Oct 2018 16:28:19 +1300
Subject: [PATCH 2/5] Support range quals in Tid Scan

This means queries with expressions such as "ctid >= ? AND ctid < ?" can be
answered by scanning over that part of a table, rather than falling back to a
full SeqScan.
---
 src/backend/executor/nodeTidscan.c      | 958 +++++++++++++++++++++++++-------
 src/backend/optimizer/path/costsize.c   |  59 +-
 src/backend/optimizer/path/tidpath.c    | 173 ++++--
 src/backend/optimizer/plan/createplan.c |  27 +-
 src/include/catalog/pg_operator.dat     |   6 +-
 src/include/nodes/execnodes.h           |  24 +-
 src/include/nodes/plannodes.h           |   3 +-
 src/include/nodes/relation.h            |  13 +-
 src/test/regress/expected/tidscan.out   | 250 +++++++++
 src/test/regress/sql/tidscan.sql        |  76 +++
 10 files changed, 1318 insertions(+), 271 deletions(-)

diff --git a/src/backend/executor/nodeTidscan.c b/src/backend/executor/nodeTidscan.c
index afec097..a3b5970 100644
--- a/src/backend/executor/nodeTidscan.c
+++ b/src/backend/executor/nodeTidscan.c
@@ -22,8 +22,13 @@
  */
 #include "postgres.h"
 
+#include <limits.h>
+
+#include "access/relscan.h"
 #include "access/sysattr.h"
+#include "catalog/pg_operator.h"
 #include "catalog/pg_type.h"
+#include "common/int.h"
 #include "executor/execdebug.h"
 #include "executor/nodeTidscan.h"
 #include "miscadmin.h"
@@ -39,21 +44,156 @@
 	 ((Var *) (node))->varattno == SelfItemPointerAttributeNumber && \
 	 ((Var *) (node))->varlevelsup == 0)
 
+typedef enum
+{
+	TIDEXPR_IN_ARRAY,
+	TIDEXPR_EQ,
+	TIDEXPR_UPPER_BOUND,
+	TIDEXPR_LOWER_BOUND
+}			TidExprType;
+
+/* one element in TidExpr's opexprs */
+typedef struct TidOpExpr
+{
+	TidExprType exprtype;		/* type of op */
+	ExprState  *exprstate;		/* ExprState for a TID-yielding subexpr */
+	bool		inclusive;		/* whether op is inclusive */
+}			TidOpExpr;
+
 /* one element in tss_tidexprs */
 typedef struct TidExpr
 {
-	ExprState  *exprstate;		/* ExprState for a TID-yielding subexpr */
-	bool		isarray;		/* if true, it yields tid[] not just tid */
-	CurrentOfExpr *cexpr;		/* alternatively, we can have CURRENT OF */
+	List	   *opexprs;		/* list of individual op exprs */
+	CurrentOfExpr *cexpr;		/* alternatively, we can have CURRENT OF */
 } TidExpr;
 
+/* a range of tids to scan */
+typedef struct TidRange
+{
+	ItemPointerData first;
+	ItemPointerData last;
+}			TidRange;
+
+/*
+ * During construction of the tidrange array, we need to pass it around with its
+ * current size and allocated size.  We bundle them into this struct for
+ * convenience.
+ */
+typedef struct TidRangeArray
+{
+	TidRange   *ranges;
+	int			numRanges;
+	int			numAllocated;
+}			TidRangeArray;
+
+static TidOpExpr * MakeTidOpExpr(OpExpr *expr, TidScanState *tidstate);
+static TidOpExpr * MakeTidScalarArrayOpExpr(ScalarArrayOpExpr *saop,
+											TidScanState *tidstate);
+static List *MakeTidOpExprList(List *exprs, TidScanState *tidstate);
 static void TidExprListCreate(TidScanState *tidstate);
+static void EnsureTidRangeSpace(TidRangeArray * tidRangeArray, int numNewItems);
+static void AddTidRange(TidRangeArray * tidRangeArray,
+			ItemPointer first,
+			ItemPointer last);
+static bool SetTidLowerBound(ItemPointer tid, bool inclusive,
+				 ItemPointer lowerBound);
+static bool SetTidUpperBound(ItemPointer tid, bool inclusive,
+				 ItemPointer upperBound);
 static void TidListEval(TidScanState *tidstate);
-static int	itemptr_comparator(const void *a, const void *b);
+static bool MergeTidRanges(TidRange * a, TidRange * b);
+static int	tidrange_comparator(const void *a, const void *b);
+static HeapScanDesc BeginTidRangeScan(TidScanState *node, TidRange * range);
+static HeapTuple NextInTidRange(HeapScanDesc scandesc, ScanDirection direction,
+			   TidRange * range);
 static TupleTableSlot *TidNext(TidScanState *node);
 
 
 /*
+ * For the given 'expr', build and return an appropriate TidOpExpr taking into
+ * account the expr's operator and operand order.
+ */
+static TidOpExpr *
+MakeTidOpExpr(OpExpr *expr, TidScanState *tidstate)
+{
+	Node	   *arg1 = get_leftop((Expr *) expr);
+	Node	   *arg2 = get_rightop((Expr *) expr);
+	ExprState  *exprstate = NULL;
+	bool		invert = false;
+	TidOpExpr  *tidopexpr;
+
+	if (IsCTIDVar(arg1))
+		exprstate = ExecInitExpr((Expr *) arg2, &tidstate->ss.ps);
+	else if (IsCTIDVar(arg2))
+	{
+		exprstate = ExecInitExpr((Expr *) arg1, &tidstate->ss.ps);
+		invert = true;
+	}
+	else
+		elog(ERROR, "could not identify CTID variable");
+
+	tidopexpr = (TidOpExpr *) palloc0(sizeof(TidOpExpr));
+
+	switch (expr->opno)
+	{
+		case TIDLessEqOperator:
+			tidopexpr->inclusive = true;
+			/* fall through */
+		case TIDLessOperator:
+			tidopexpr->exprtype = invert ? TIDEXPR_LOWER_BOUND : TIDEXPR_UPPER_BOUND;
+			break;
+		case TIDGreaterEqOperator:
+			tidopexpr->inclusive = true;
+			/* fall through */
+		case TIDGreaterOperator:
+			tidopexpr->exprtype = invert ? TIDEXPR_UPPER_BOUND : TIDEXPR_LOWER_BOUND;
+			break;
+		default:
+			tidopexpr->exprtype = TIDEXPR_EQ;
+	}
+
+	tidopexpr->exprstate = exprstate;
+
+	return tidopexpr;
+}
+
+/* For the given 'saop', build and return a TidOpExpr for the scalar array op. */
+static TidOpExpr *
+MakeTidScalarArrayOpExpr(ScalarArrayOpExpr *saop, TidScanState *tidstate)
+{
+	TidOpExpr  *tidopexpr;
+
+	Assert(IsCTIDVar(linitial(saop->args)));
+
+	tidopexpr = (TidOpExpr *) palloc0(sizeof(TidOpExpr));
+	tidopexpr->exprstate = ExecInitExpr(lsecond(saop->args),
+										&tidstate->ss.ps);
+	tidopexpr->exprtype = TIDEXPR_IN_ARRAY;
+
+	return tidopexpr;
+}
+
+/*
+ * Build and return a list of TidOpExprs for the given list of exprs, which
+ * are assumed to be OpExprs.
+ */
+static List *
+MakeTidOpExprList(List *exprs, TidScanState *tidstate)
+{
+	ListCell   *l;
+	List	   *tidopexprs = NIL;
+
+	foreach(l, exprs)
+	{
+		OpExpr	   *opexpr = lfirst(l);
+		TidOpExpr  *tidopexpr = MakeTidOpExpr(opexpr, tidstate);
+
+		tidopexprs = lappend(tidopexprs, tidopexpr);
+	}
+
+	return tidopexprs;
+}
+
+/*
  * Extract the qual subexpressions that yield TIDs to search for,
  * and compile them into ExprStates if they're ordinary expressions.
  *
@@ -69,6 +209,18 @@ TidExprListCreate(TidScanState *tidstate)
 	tidstate->tss_tidexprs = NIL;
 	tidstate->tss_isCurrentOf = false;
 
+	/*
+	 * If no quals were specified, then a complete scan is assumed.  Make a
+	 * TidExpr with an empty list of TidOpExprs.
+	 */
+	if (node->tidquals == NIL)
+	{
+		TidExpr    *tidexpr = (TidExpr *) palloc0(sizeof(TidExpr));
+
+		tidstate->tss_tidexprs = lappend(tidstate->tss_tidexprs, tidexpr);
+		return;
+	}
+
 	foreach(l, node->tidquals)
 	{
 		Expr	   *expr = (Expr *) lfirst(l);
@@ -76,37 +228,31 @@ TidExprListCreate(TidScanState *tidstate)
 
 		if (is_opclause(expr))
 		{
-			Node	   *arg1;
-			Node	   *arg2;
-
-			arg1 = get_leftop(expr);
-			arg2 = get_rightop(expr);
-			if (IsCTIDVar(arg1))
-				tidexpr->exprstate = ExecInitExpr((Expr *) arg2,
-												  &tidstate->ss.ps);
-			else if (IsCTIDVar(arg2))
-				tidexpr->exprstate = ExecInitExpr((Expr *) arg1,
-												  &tidstate->ss.ps);
-			else
-				elog(ERROR, "could not identify CTID variable");
-			tidexpr->isarray = false;
+			OpExpr	   *opexpr = (OpExpr *) expr;
+			TidOpExpr  *tidopexpr = MakeTidOpExpr(opexpr, tidstate);
+
+			tidexpr->opexprs = list_make1(tidopexpr);
 		}
 		else if (expr && IsA(expr, ScalarArrayOpExpr))
 		{
-			ScalarArrayOpExpr *saex = (ScalarArrayOpExpr *) expr;
+			ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) expr;
+			TidOpExpr  *tidopexpr = MakeTidScalarArrayOpExpr(saop, tidstate);
 
-			Assert(IsCTIDVar(linitial(saex->args)));
-			tidexpr->exprstate = ExecInitExpr(lsecond(saex->args),
-											  &tidstate->ss.ps);
-			tidexpr->isarray = true;
+			tidexpr->opexprs = list_make1(tidopexpr);
 		}
 		else if (expr && IsA(expr, CurrentOfExpr))
 		{
 			CurrentOfExpr *cexpr = (CurrentOfExpr *) expr;
 
+			/* For CURRENT OF, save the expression in the TidExpr. */
 			tidexpr->cexpr = cexpr;
 			tidstate->tss_isCurrentOf = true;
 		}
+		else if (and_clause((Node *) expr))
+		{
+			tidexpr->opexprs = MakeTidOpExprList(((BoolExpr *) expr)->args,
+												 tidstate);
+		}
 		else
 			elog(ERROR, "could not identify CTID expression");
 
@@ -119,104 +265,337 @@ TidExprListCreate(TidScanState *tidstate)
 }
 
 /*
- * Compute the list of TIDs to be visited, by evaluating the expressions
- * for them.
- *
- * (The result is actually an array, not a list.)
+ * Ensure the array of TidRange objects has enough space for new items.
+ * Will allocate the array if not yet allocated, and reallocate it if
+ * necessary to accommodate new items.
  */
-static void
-TidListEval(TidScanState *tidstate)
+static void
+EnsureTidRangeSpace(TidRangeArray * tidRangeArray, int numNewItems)
 {
-	ExprContext *econtext = tidstate->ss.ps.ps_ExprContext;
-	BlockNumber nblocks;
-	ItemPointerData *tidList;
-	int			numAllocTids;
-	int			numTids;
-	ListCell   *l;
+	int			requiredSize;
+
+#define MaxTidRanges ((Size) (MaxAllocSize / sizeof(TidRange)))
 
 	/*
-	 * We silently discard any TIDs that are out of range at the time of scan
-	 * start.  (Since we hold at least AccessShareLock on the table, it won't
-	 * be possible for someone to truncate away the blocks we intend to
-	 * visit.)
+	 * This addition should be fine, since numNewItems won't exceed the
+	 * maximum array size, which is MaxAllocSize/sizeof(Datum) (see
+	 * ArrayGetNItems).
 	 */
-	nblocks = RelationGetNumberOfBlocks(tidstate->ss.ss_currentRelation);
+	requiredSize = tidRangeArray->numRanges + numNewItems;
+
+	if (requiredSize > MaxTidRanges)
+		ereport(ERROR,
+				(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
+				 errmsg("number of tid ranges exceeds the maximum allowed (%d)",
+						(int) MaxTidRanges)));
+
+	if (requiredSize <= tidRangeArray->numAllocated)
+		return;
 
 	/*
-	 * We initialize the array with enough slots for the case that all quals
-	 * are simple OpExprs or CurrentOfExprs.  If there are any
-	 * ScalarArrayOpExprs, we may have to enlarge the array.
+	 * If allocating the array for the first time, start with a size that will
+	 * fit nicely into a power of 2 bytes with little wastage.
 	 */
-	numAllocTids = list_length(tidstate->tss_tidexprs);
-	tidList = (ItemPointerData *)
-		palloc(numAllocTids * sizeof(ItemPointerData));
-	numTids = 0;
+#define InitialTidArraySize (int) (256/sizeof(TidRange))
 
-	foreach(l, tidstate->tss_tidexprs)
+	if (tidRangeArray->numAllocated == 0)
+		tidRangeArray->numAllocated = InitialTidArraySize;
+
+	/* It's not safe to double the size unless we're less than half INT_MAX. */
+	Assert(requiredSize < INT_MAX / 2);
+
+	while (tidRangeArray->numAllocated < requiredSize)
+		tidRangeArray->numAllocated *= 2;
+
+	if (tidRangeArray->ranges == NULL)
+		tidRangeArray->ranges = (TidRange *)
+			palloc0(tidRangeArray->numAllocated * sizeof(TidRange));
+	else
+		tidRangeArray->ranges = (TidRange *)
+			repalloc(tidRangeArray->ranges,
+					 tidRangeArray->numAllocated * sizeof(TidRange));
+}
+
+/*
+ * Add a tid range to the array.
+ *
+ * Note: we assume that space for the additional item has already been ensured
+ * by the caller!
+ */
+static void
+AddTidRange(TidRangeArray * tidRangeArray, ItemPointer first, ItemPointer last)
+{
+	tidRangeArray->ranges[tidRangeArray->numRanges].first = *first;
+	tidRangeArray->ranges[tidRangeArray->numRanges].last = *last;
+	tidRangeArray->numRanges++;
+}
+
+/*
+ * Set a lower bound tid, taking into account the inclusivity of the bound.
+ * Return true if the bound is valid.
+ */
+static bool
+SetTidLowerBound(ItemPointer tid, bool inclusive, ItemPointer lowerBound)
+{
+	OffsetNumber offset;
+
+	*lowerBound = *tid;
+	offset = ItemPointerGetOffsetNumberNoCheck(tid);
+
+	if (!inclusive)
 	{
-		TidExpr    *tidexpr = (TidExpr *) lfirst(l);
-		ItemPointer itemptr;
-		bool		isNull;
+		/* Check if the lower bound is actually in the next block. */
+		if (offset >= MaxOffsetNumber)
+		{
+			BlockNumber block = ItemPointerGetBlockNumberNoCheck(lowerBound);
+
+			/*
+			 * If the lower bound was already at or above the maximum block
+			 * number, then there is no valid range.
+			 */
+			if (block >= MaxBlockNumber)
+				return false;
+
+			ItemPointerSetBlockNumber(lowerBound, block + 1);
+			ItemPointerSetOffsetNumber(lowerBound, 1);
+		}
+		else
+			ItemPointerSetOffsetNumber(lowerBound, OffsetNumberNext(offset));
+	}
+	else if (offset == 0)
+		ItemPointerSetOffsetNumber(lowerBound, 1);
 
-		if (tidexpr->exprstate && !tidexpr->isarray)
+	return true;
+}
+
+/*
+ * Set an upper bound tid, taking into account the inclusivity of the bound.
+ * Return true if the bound is valid.
+ */
+static bool
+SetTidUpperBound(ItemPointer tid, bool inclusive, ItemPointer upperBound)
+{
+	OffsetNumber offset;
+
+	*upperBound = *tid;
+	offset = ItemPointerGetOffsetNumberNoCheck(tid);
+
+	/*
+	 * Since TID offsets start at 1, an inclusive upper bound with offset 0
+	 * can be treated as an exclusive bound.  This has the benefit of
+	 * eliminating that block from the scan range.
+	 */
+	if (inclusive && offset == 0)
+		inclusive = false;
+
+	if (!inclusive)
+	{
+		/* Check if the upper bound is actually in the previous block. */
+		if (offset == 0)
 		{
-			itemptr = (ItemPointer)
-				DatumGetPointer(ExecEvalExprSwitchContext(tidexpr->exprstate,
-														  econtext,
-														  &isNull));
-			if (!isNull &&
-				ItemPointerIsValid(itemptr) &&
+			BlockNumber block = ItemPointerGetBlockNumberNoCheck(upperBound);
+
+			/*
+			 * If the upper bound was already in block 0, then there is no
+			 * valid range.
+			 */
+			if (block == 0)
+				return false;
+
+			ItemPointerSetBlockNumber(upperBound, block - 1);
+			ItemPointerSetOffsetNumber(upperBound, MaxOffsetNumber);
+		}
+		else
+			ItemPointerSetOffsetNumber(upperBound, OffsetNumberPrev(offset));
+	}
+
+	return true;
+}
+
+/* ----------------------------------------------------------------
+ *		TidInArrayExprEval
+ *
+ *		Evaluate a TidOpExpr, creating a set of new TidRanges -- one for each
+ *		TID in the expression -- to add to the node's list of ranges to scan.
+ * ----------------------------------------------------------------
+ */
+static void
+TidInArrayExprEval(TidOpExpr * tidopexpr, BlockNumber nblocks,
+				   TidScanState *tidstate, TidRangeArray * tidRangeArray)
+{
+	ExprContext *econtext = tidstate->ss.ps.ps_ExprContext;
+	bool		isNull;
+	Datum		arraydatum;
+	ArrayType  *itemarray;
+	Datum	   *ipdatums;
+	bool	   *ipnulls;
+	int			ndatums;
+	int			i;
+
+	arraydatum = ExecEvalExprSwitchContext(tidopexpr->exprstate,
+										   econtext,
+										   &isNull);
+	if (isNull)
+		return;
+
+	itemarray = DatumGetArrayTypeP(arraydatum);
+	deconstruct_array(itemarray,
+					  TIDOID, sizeof(ItemPointerData), false, 's',
+					  &ipdatums, &ipnulls, &ndatums);
+
+	/* ensure space for all returned TID datums in one swoop */
+	EnsureTidRangeSpace(tidRangeArray, ndatums);
+
+	for (i = 0; i < ndatums; i++)
+	{
+		if (!ipnulls[i])
+		{
+			ItemPointer itemptr = (ItemPointer) DatumGetPointer(ipdatums[i]);
+
+			if (ItemPointerIsValid(itemptr) &&
 				ItemPointerGetBlockNumber(itemptr) < nblocks)
 			{
-				if (numTids >= numAllocTids)
-				{
-					numAllocTids *= 2;
-					tidList = (ItemPointerData *)
-						repalloc(tidList,
-								 numAllocTids * sizeof(ItemPointerData));
-				}
-				tidList[numTids++] = *itemptr;
+				AddTidRange(tidRangeArray, itemptr, itemptr);
 			}
 		}
-		else if (tidexpr->exprstate && tidexpr->isarray)
+	}
+	pfree(ipdatums);
+	pfree(ipnulls);
+}
+
+/* ----------------------------------------------------------------
+ *		TidExprEval
+ *
+ *		Evaluate a TidExpr, creating a new TidRange to add to the node's
+ *		list of ranges to scan.
+ * ----------------------------------------------------------------
+ */
+static void
+TidExprEval(TidExpr *expr, BlockNumber nblocks, TidScanState *tidstate,
+			TidRangeArray * tidRangeArray)
+{
+	ExprContext *econtext = tidstate->ss.ps.ps_ExprContext;
+	ListCell   *l;
+	ItemPointerData lowerBound;
+	ItemPointerData upperBound;
+
+	/* The biggest range on an empty table is empty; just skip it. */
+	if (nblocks == 0)
+		return;
+
+	/* Set the lower and upper bound to scan the whole table. */
+	ItemPointerSetBlockNumber(&lowerBound, 0);
+	ItemPointerSetOffsetNumber(&lowerBound, 1);
+	ItemPointerSetBlockNumber(&upperBound, nblocks - 1);
+	ItemPointerSetOffsetNumber(&upperBound, MaxOffsetNumber);
+
+	foreach(l, expr->opexprs)
+	{
+		TidOpExpr  *tidopexpr = (TidOpExpr *) lfirst(l);
+
+		if (tidopexpr->exprtype == TIDEXPR_IN_ARRAY)
+		{
+			TidInArrayExprEval(tidopexpr, nblocks, tidstate, tidRangeArray);
+
+			/*
+			 * A CTID = ANY expression only exists by itself; there shouldn't
+			 * be any other quals alongside it.  TidInArrayExprEval has
+			 * already added the ranges, so just return here.
+			 */
+			Assert(list_length(expr->opexprs) == 1);
+			return;
+		}
+		else
 		{
-			Datum		arraydatum;
-			ArrayType  *itemarray;
-			Datum	   *ipdatums;
-			bool	   *ipnulls;
-			int			ndatums;
-			int			i;
-
-			arraydatum = ExecEvalExprSwitchContext(tidexpr->exprstate,
-												   econtext,
-												   &isNull);
+			ItemPointer itemptr;
+			bool		isNull;
+
+			/* Evaluate this bound. */
+			itemptr = (ItemPointer)
+				DatumGetPointer(ExecEvalExprSwitchContext(tidopexpr->exprstate,
+														  econtext,
+														  &isNull));
+
+			/* If the bound is NULL, *nothing* matches the qual. */
 			if (isNull)
-				continue;
-			itemarray = DatumGetArrayTypeP(arraydatum);
-			deconstruct_array(itemarray,
-							  TIDOID, sizeof(ItemPointerData), false, 's',
-							  &ipdatums, &ipnulls, &ndatums);
-			if (numTids + ndatums > numAllocTids)
+				return;
+
+			if (tidopexpr->exprtype == TIDEXPR_EQ && ItemPointerIsValid(itemptr))
 			{
-				numAllocTids = numTids + ndatums;
-				tidList = (ItemPointerData *)
-					repalloc(tidList,
-							 numAllocTids * sizeof(ItemPointerData));
+				lowerBound = *itemptr;
+				upperBound = *itemptr;
+
+				/*
+				 * A CTID = ? expression only exists by itself, so set the
+				 * range to this single TID, and exit the loop (the remainder
+				 * of this function will add the range).
+				 */
+				Assert(list_length(expr->opexprs) == 1);
+				break;
 			}
-			for (i = 0; i < ndatums; i++)
+
+			if (tidopexpr->exprtype == TIDEXPR_LOWER_BOUND)
 			{
-				if (!ipnulls[i])
-				{
-					itemptr = (ItemPointer) DatumGetPointer(ipdatums[i]);
-					if (ItemPointerIsValid(itemptr) &&
-						ItemPointerGetBlockNumber(itemptr) < nblocks)
-						tidList[numTids++] = *itemptr;
-				}
+				ItemPointerData lb;
+
+				if (!SetTidLowerBound(itemptr, tidopexpr->inclusive, &lb))
+					return;
+
+				if (ItemPointerCompare(&lb, &lowerBound) > 0)
+					lowerBound = lb;
+			}
+
+			if (tidopexpr->exprtype == TIDEXPR_UPPER_BOUND)
+			{
+				ItemPointerData ub;
+
+				if (!SetTidUpperBound(itemptr, tidopexpr->inclusive, &ub))
+					return;
+
+				if (ItemPointerCompare(&ub, &upperBound) < 0)
+					upperBound = ub;
 			}
-			pfree(ipdatums);
-			pfree(ipnulls);
 		}
-		else
+	}
+
+	/* If the resulting range is not empty, add it to the array. */
+	if (ItemPointerCompare(&lowerBound, &upperBound) <= 0)
+	{
+		EnsureTidRangeSpace(tidRangeArray, 1);
+		AddTidRange(tidRangeArray, &lowerBound, &upperBound);
+	}
+}
+
+/* ----------------------------------------------------------------
+ *		TidListEval
+ *
+ *		Compute the list of TID ranges to be visited, by evaluating the
+ *		expressions for them.
+ *
+ *		(The result is actually an array, not a list.)
+ * ----------------------------------------------------------------
+ */
+static void
+TidListEval(TidScanState *tidstate)
+{
+	ExprContext *econtext = tidstate->ss.ps.ps_ExprContext;
+	BlockNumber nblocks;
+	TidRangeArray tidRangeArray = {NULL, 0, 0}; /* not yet allocated */
+	ListCell   *l;
+
+	/*
+	 * We silently discard any TIDs that are out of range at the time of scan
+	 * start.  (Since we hold at least AccessShareLock on the table, it won't
+	 * be possible for someone to truncate away the blocks we intend to
+	 * visit.)
+	 */
+	nblocks = RelationGetNumberOfBlocks(tidstate->ss.ss_currentRelation);
+
+	foreach(l, tidstate->tss_tidexprs)
+	{
+		TidExpr    *tidexpr = (TidExpr *) lfirst(l);
+
+		if (tidexpr->cexpr)
 		{
 			ItemPointerData cursor_tid;
 
@@ -225,16 +604,14 @@ TidListEval(TidScanState *tidstate)
 							  RelationGetRelid(tidstate->ss.ss_currentRelation),
 							  &cursor_tid))
 			{
-				if (numTids >= numAllocTids)
-				{
-					numAllocTids *= 2;
-					tidList = (ItemPointerData *)
-						repalloc(tidList,
-								 numAllocTids * sizeof(ItemPointerData));
-				}
-				tidList[numTids++] = cursor_tid;
+				EnsureTidRangeSpace(&tidRangeArray, 1);
+				AddTidRange(&tidRangeArray, &cursor_tid, &cursor_tid);
 			}
 		}
+		else
+		{
+			TidExprEval(tidexpr, nblocks, tidstate, &tidRangeArray);
+		}
 	}
 
 	/*
@@ -243,52 +620,159 @@ TidListEval(TidScanState *tidstate)
 	 * the list.  Sorting makes it easier to detect duplicates, and as a bonus
 	 * ensures that we will visit the heap in the most efficient way.
 	 */
-	if (numTids > 1)
+	if (tidRangeArray.numRanges > 1)
 	{
-		int			lastTid;
+		int			lastRange;
 		int			i;
 
 		/* CurrentOfExpr could never appear OR'd with something else */
 		Assert(!tidstate->tss_isCurrentOf);
 
-		qsort((void *) tidList, numTids, sizeof(ItemPointerData),
-			  itemptr_comparator);
-		lastTid = 0;
-		for (i = 1; i < numTids; i++)
+		qsort((void *) tidRangeArray.ranges, tidRangeArray.numRanges,
+			  sizeof(TidRange), tidrange_comparator);
+		lastRange = 0;
+		for (i = 1; i < tidRangeArray.numRanges; i++)
 		{
-			if (!ItemPointerEquals(&tidList[lastTid], &tidList[i]))
-				tidList[++lastTid] = tidList[i];
+			if (!MergeTidRanges(&tidRangeArray.ranges[lastRange],
+								&tidRangeArray.ranges[i]))
+				tidRangeArray.ranges[++lastRange] = tidRangeArray.ranges[i];
 		}
-		numTids = lastTid + 1;
+		tidRangeArray.numRanges = lastRange + 1;
 	}
 
-	tidstate->tss_TidList = tidList;
-	tidstate->tss_NumTids = numTids;
-	tidstate->tss_TidPtr = -1;
+	tidstate->tss_TidRanges = tidRangeArray.ranges;
+	tidstate->tss_NumTidRanges = tidRangeArray.numRanges;
+	tidstate->tss_CurrentTidRange = -1;
+}
+
+/*
+ * MergeTidRanges
+ *		If two ranges overlap, merge them into one.
+ *
+ * Assumes the two ranges a and b are already ordered by (first, last).
+ * Returns true if they were merged, with the result in a.
+ */
+static bool
+MergeTidRanges(TidRange *a, TidRange *b)
+{
+	/*
+	 * If the first range ends before the second one begins, they don't
+	 * overlap, and we can't merge them.
+	 */
+	if (ItemPointerCompare(&a->last, &b->first) < 0)
+		return false;
+
+	/*
+	 * Since they overlap, the end of the new range should be the maximum of
+	 * the original two range ends.
+	 */
+	if (ItemPointerCompare(&a->last, &b->last) < 0)
+		a->last = b->last;
+	return true;
 }
 
 /*
- * qsort comparator for ItemPointerData items
+ * qsort comparator for TidRange items
  */
 static int
-itemptr_comparator(const void *a, const void *b)
+tidrange_comparator(const void *a, const void *b)
 {
-	const ItemPointerData *ipa = (const ItemPointerData *) a;
-	const ItemPointerData *ipb = (const ItemPointerData *) b;
-	BlockNumber ba = ItemPointerGetBlockNumber(ipa);
-	BlockNumber bb = ItemPointerGetBlockNumber(ipb);
-	OffsetNumber oa = ItemPointerGetOffsetNumber(ipa);
-	OffsetNumber ob = ItemPointerGetOffsetNumber(ipb);
-
-	if (ba < bb)
-		return -1;
-	if (ba > bb)
-		return 1;
-	if (oa < ob)
-		return -1;
-	if (oa > ob)
-		return 1;
-	return 0;
+	TidRange   *tra = (TidRange *) a;
+	TidRange   *trb = (TidRange *) b;
+	int			cmp_first = ItemPointerCompare(&tra->first, &trb->first);
+
+	if (cmp_first != 0)
+		return cmp_first;
+	else
+		return ItemPointerCompare(&tra->last, &trb->last);
+}
+
+/* ----------------------------------------------------------------
+ *		BeginTidRangeScan
+ *
+ *		Begin scanning a range of TIDs by setting up the TidScan node's
+ *		scandesc, and setting the tss_inScan flag.
+ * ----------------------------------------------------------------
+ */
+static HeapScanDesc
+BeginTidRangeScan(TidScanState *node, TidRange *range)
+{
+	HeapScanDesc scandesc = node->ss.ss_currentScanDesc;
+	BlockNumber first_block = ItemPointerGetBlockNumberNoCheck(&range->first);
+	BlockNumber last_block = ItemPointerGetBlockNumberNoCheck(&range->last);
+
+	if (!scandesc)
+	{
+		EState	   *estate = node->ss.ps.state;
+
+		scandesc = heap_beginscan_strat(node->ss.ss_currentRelation,
+										estate->es_snapshot,
+										0, NULL,
+										false, false);
+		node->ss.ss_currentScanDesc = scandesc;
+	}
+	else
+		heap_rescan(scandesc, NULL);
+
+	heap_setscanlimits(scandesc, first_block, last_block - first_block + 1);
+	node->tss_inScan = true;
+	return scandesc;
+}
+
+/* ----------------------------------------------------------------
+ *		NextInTidRange
+ *
+ *		Fetch the next tuple when scanning a range of TIDs.
+ * ----------------------------------------------------------------
+ */
+static HeapTuple
+NextInTidRange(HeapScanDesc scandesc, ScanDirection direction, TidRange *range)
+{
+	BlockNumber first_block = ItemPointerGetBlockNumber(&range->first);
+	OffsetNumber first_offset = ItemPointerGetOffsetNumber(&range->first);
+	BlockNumber last_block = ItemPointerGetBlockNumber(&range->last);
+	OffsetNumber last_offset = ItemPointerGetOffsetNumber(&range->last);
+	HeapTuple	tuple;
+
+	for (;;)
+	{
+		BlockNumber block;
+		OffsetNumber offset;
+
+		tuple = heap_getnext(scandesc, direction);
+		if (!tuple)
+			break;
+
+		/* Check that the tuple is within the required range. */
+		block = ItemPointerGetBlockNumber(&tuple->t_self);
+		offset = ItemPointerGetOffsetNumber(&tuple->t_self);
+
+		/*
+		 * If the tuple is in the first block of the range and before the
+		 * first requested offset, then we can either skip it (if scanning
+		 * forward), or end the scan (if scanning backward).
+		 */
+		if (block == first_block && offset < first_offset)
+		{
+			if (ScanDirectionIsForward(direction))
+				continue;
+			else
+				return NULL;
+		}
+
+		/* Similarly for the last block, after the last requested offset. */
+		if (block == last_block && offset > last_offset)
+		{
+			if (ScanDirectionIsBackward(direction))
+				continue;
+			else
+				return NULL;
+		}
+
+		break;
+	}
+
+	return tuple;
 }
 
 /* ----------------------------------------------------------------
@@ -302,6 +786,7 @@ itemptr_comparator(const void *a, const void *b)
 static TupleTableSlot *
 TidNext(TidScanState *node)
 {
+	HeapScanDesc scandesc;
 	EState	   *estate;
 	ScanDirection direction;
 	Snapshot	snapshot;
@@ -309,105 +794,149 @@ TidNext(TidScanState *node)
 	HeapTuple	tuple;
 	TupleTableSlot *slot;
 	Buffer		buffer = InvalidBuffer;
-	ItemPointerData *tidList;
-	int			numTids;
-	bool		bBackward;
+	int			numRanges;
 
 	/*
 	 * extract necessary information from tid scan node
 	 */
+	scandesc = node->ss.ss_currentScanDesc;
 	estate = node->ss.ps.state;
 	direction = estate->es_direction;
 	snapshot = estate->es_snapshot;
 	heapRelation = node->ss.ss_currentRelation;
 	slot = node->ss.ss_ScanTupleSlot;
 
-	/*
-	 * First time through, compute the list of TIDs to be visited
-	 */
-	if (node->tss_TidList == NULL)
+	/* First time through, compute the list of TID ranges to be visited */
+	if (node->tss_TidRanges == NULL)
+	{
 		TidListEval(node);
 
-	tidList = node->tss_TidList;
-	numTids = node->tss_NumTids;
+		node->tss_CurrentTidRange = -1;
+	}
 
-	/*
-	 * We use node->tss_htup as the tuple pointer; note this can't just be a
-	 * local variable here, as the scan tuple slot will keep a pointer to it.
-	 */
-	tuple = &(node->tss_htup);
+	numRanges = node->tss_NumTidRanges;
 
-	/*
-	 * Initialize or advance scan position, depending on direction.
-	 */
-	bBackward = ScanDirectionIsBackward(direction);
-	if (bBackward)
-	{
-		if (node->tss_TidPtr < 0)
-		{
-			/* initialize for backward scan */
-			node->tss_TidPtr = numTids - 1;
-		}
-		else
-			node->tss_TidPtr--;
-	}
-	else
+	tuple = NULL;
+	for (;;)
 	{
-		if (node->tss_TidPtr < 0)
+		TidRange   *currentRange;
+
+		if (!node->tss_inScan)
 		{
-			/* initialize for forward scan */
-			node->tss_TidPtr = 0;
+			/* Initialize or advance scan position, depending on direction. */
+			bool		bBackward = ScanDirectionIsBackward(direction);
+
+			if (bBackward)
+			{
+				if (node->tss_CurrentTidRange < 0)
+				{
+					/* initialize for backward scan */
+					node->tss_CurrentTidRange = numRanges - 1;
+				}
+				else
+					node->tss_CurrentTidRange--;
+			}
+			else
+			{
+				if (node->tss_CurrentTidRange < 0)
+				{
+					/* initialize for forward scan */
+					node->tss_CurrentTidRange = 0;
+				}
+				else
+					node->tss_CurrentTidRange++;
+			}
 		}
-		else
-			node->tss_TidPtr++;
-	}
 
-	while (node->tss_TidPtr >= 0 && node->tss_TidPtr < numTids)
-	{
-		tuple->t_self = tidList[node->tss_TidPtr];
+		/* If we've finished iterating over the ranges, exit the loop. */
+		if (node->tss_CurrentTidRange >= numRanges ||
+			node->tss_CurrentTidRange < 0)
+			break;
+
+		currentRange = &node->tss_TidRanges[node->tss_CurrentTidRange];
 
 		/*
-		 * For WHERE CURRENT OF, the tuple retrieved from the cursor might
-		 * since have been updated; if so, we should fetch the version that is
-		 * current according to our snapshot.
+		 * For ranges containing a single tuple, we can simply make an attempt
+		 * to fetch the tuple directly.
 		 */
-		if (node->tss_isCurrentOf)
-			heap_get_latest_tid(heapRelation, snapshot, &tuple->t_self);
-
-		if (heap_fetch(heapRelation, snapshot, tuple, &buffer, false, NULL))
+		if (ItemPointerEquals(&currentRange->first, &currentRange->last))
 		{
 			/*
-			 * Store the scanned tuple in the scan tuple slot of the scan
-			 * state.  Eventually we will only do this and not return a tuple.
+			 * We use node->tss_htup as the tuple pointer; note this can't
+			 * just be a local variable here, as the scan tuple slot will keep
+			 * a pointer to it.
 			 */
-			ExecStoreBufferHeapTuple(tuple, /* tuple to store */
-									 slot,	/* slot to store in */
-									 buffer);	/* buffer associated with
-												 * tuple */
+			tuple = &(node->tss_htup);
+			tuple->t_self = currentRange->first;
 
 			/*
-			 * At this point we have an extra pin on the buffer, because
-			 * ExecStoreHeapTuple incremented the pin count. Drop our local
-			 * pin.
+			 * For WHERE CURRENT OF, the tuple retrieved from the cursor might
+			 * since have been updated; if so, we should fetch the version
+			 * that is current according to our snapshot.
 			 */
-			ReleaseBuffer(buffer);
+			if (node->tss_isCurrentOf)
+				heap_get_latest_tid(heapRelation, snapshot, &tuple->t_self);
+
+			if (heap_fetch(heapRelation, snapshot, tuple, &buffer, false, NULL))
+			{
+				/*
+				 * Store the scanned tuple in the scan tuple slot of the scan
+				 * state.  Eventually we will only do this and not return a
+				 * tuple.
+				 */
+				ExecStoreBufferHeapTuple(tuple, /* tuple to store */
+										 slot,	/* slot to store in */
+										 buffer);	/* buffer associated with
+													 * tuple */
 
-			return slot;
+				/*
+				 * At this point we have an extra pin on the buffer, because
+				 * ExecStoreBufferHeapTuple incremented the pin count. Drop
+				 * our local pin.
+				 */
+				ReleaseBuffer(buffer);
+
+				return slot;
+			}
+			else
+			{
+				/* No tuple found for this TID range. */
+				tuple = NULL;
+			}
 		}
-		/* Bad TID or failed snapshot qual; try next */
-		if (bBackward)
-			node->tss_TidPtr--;
 		else
-			node->tss_TidPtr++;
+		{
+			/*
+			 * For a bigger TID range, we'll use a scan, starting a new one if
+			 * we're not already in one.
+			 */
+			if (!node->tss_inScan)
+				scandesc = BeginTidRangeScan(node, currentRange);
 
-		CHECK_FOR_INTERRUPTS();
+			tuple = NextInTidRange(scandesc, direction, currentRange);
+			if (tuple)
+				break;
+
+			/* No more tuples in this scan, so finish it. */
+			node->tss_inScan = false;
+		}
 	}
 
 	/*
-	 * if we get here it means the tid scan failed so we are at the end of the
-	 * scan..
+	 * save the tuple and the buffer returned to us by the access methods in
+	 * our scan tuple slot and return the slot.  Note also that
+	 * ExecStoreBufferHeapTuple will increment the refcount of the buffer; the
+	 * refcount will not be dropped until the tuple table slot is cleared.
 	 */
-	return ExecClearTuple(slot);
+	if (tuple)
+		ExecStoreBufferHeapTuple(tuple, /* tuple to store */
+								 slot,	/* slot to store in */
+								 scandesc->rs_cbuf);	/* buffer associated
+														 * with this tuple */
+	else
+		ExecClearTuple(slot);
+
+	return slot;
 }
 
 /*
@@ -440,7 +969,7 @@ TidRecheck(TidScanState *node, TupleTableSlot *slot)
  *		Initial States:
  *		  -- the relation indicated is opened for scanning so that the
  *			 "cursor" is positioned before the first qualifying tuple.
- *		  -- tidPtr is -1.
+ *		  -- tss_CurrentTidRange is -1.
  * ----------------------------------------------------------------
  */
 static TupleTableSlot *
@@ -460,11 +989,13 @@ ExecTidScan(PlanState *pstate)
 void
 ExecReScanTidScan(TidScanState *node)
 {
-	if (node->tss_TidList)
-		pfree(node->tss_TidList);
-	node->tss_TidList = NULL;
-	node->tss_NumTids = 0;
-	node->tss_TidPtr = -1;
+	if (node->tss_TidRanges)
+		pfree(node->tss_TidRanges);
+
+	node->tss_TidRanges = NULL;
+	node->tss_NumTidRanges = 0;
+	node->tss_CurrentTidRange = -1;
+	node->tss_inScan = false;
 
 	ExecScanReScan(&node->ss);
 }
@@ -479,6 +1010,8 @@ ExecReScanTidScan(TidScanState *node)
 void
 ExecEndTidScan(TidScanState *node)
 {
+	HeapScanDesc scan = node->ss.ss_currentScanDesc;
+
 	/*
 	 * Free the exprcontext
 	 */
@@ -490,6 +1023,10 @@ ExecEndTidScan(TidScanState *node)
 	if (node->ss.ps.ps_ResultTupleSlot)
 		ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
 	ExecClearTuple(node->ss.ss_ScanTupleSlot);
+
+	/* close heap scan */
+	if (scan != NULL)
+		heap_endscan(scan);
 }
 
 /* ----------------------------------------------------------------
@@ -525,11 +1062,12 @@ ExecInitTidScan(TidScan *node, EState *estate, int eflags)
 	ExecAssignExprContext(estate, &tidstate->ss.ps);
 
 	/*
-	 * mark tid list as not computed yet
+	 * mark tid range list as not computed yet
 	 */
-	tidstate->tss_TidList = NULL;
-	tidstate->tss_NumTids = 0;
-	tidstate->tss_TidPtr = -1;
+	tidstate->tss_TidRanges = NULL;
+	tidstate->tss_NumTidRanges = 0;
+	tidstate->tss_CurrentTidRange = -1;
+	tidstate->tss_inScan = false;
 
 	/*
 	 * open the scan relation
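
Reviewer's aside on the nodeTidscan.c changes above: the sort-then-merge step in TidListEval can be tried out on its own. What follows is a minimal standalone sketch, not code from the patch. SketchTid, SketchRange and the sketch_* functions are hypothetical stand-ins for ItemPointerData, TidRange, ItemPointerCompare, tidrange_comparator and MergeTidRanges, and they assume the same (block, offset) ordering and inclusive range ends that the patch uses.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for ItemPointerData and TidRange. */
typedef struct
{
	uint32_t	block;
	uint16_t	offset;
} SketchTid;

typedef struct
{
	SketchTid	first;			/* inclusive lower bound */
	SketchTid	last;			/* inclusive upper bound */
} SketchRange;

/* Order TIDs by block number, then by offset within the block. */
static int
sketch_tid_cmp(const SketchTid *a, const SketchTid *b)
{
	if (a->block != b->block)
		return (a->block < b->block) ? -1 : 1;
	if (a->offset != b->offset)
		return (a->offset < b->offset) ? -1 : 1;
	return 0;
}

/* qsort comparator: order ranges by (first, last). */
static int
sketch_range_cmp(const void *a, const void *b)
{
	const SketchRange *ra = (const SketchRange *) a;
	const SketchRange *rb = (const SketchRange *) b;
	int			cmp = sketch_tid_cmp(&ra->first, &rb->first);

	return (cmp != 0) ? cmp : sketch_tid_cmp(&ra->last, &rb->last);
}

/*
 * Sort the ranges in place and merge any that overlap.  Returns the
 * number of ranges remaining at the front of the array.
 */
static int
sketch_merge_ranges(SketchRange *ranges, int n)
{
	int			last = 0;

	assert(n >= 0);
	if (n <= 1)
		return n;

	qsort(ranges, (size_t) n, sizeof(SketchRange), sketch_range_cmp);

	for (int i = 1; i < n; i++)
	{
		SketchRange *a = &ranges[last];
		SketchRange *b = &ranges[i];

		if (sketch_tid_cmp(&a->last, &b->first) < 0)
			ranges[++last] = *b;	/* disjoint: keep b as a new range */
		else if (sketch_tid_cmp(&a->last, &b->last) < 0)
			a->last = b->last;	/* overlapping: extend a to cover b */
	}
	return last + 1;
}
```

As in the patch, a single TID is a range whose first and last are equal, so duplicate TIDs collapse into one entry through the same merge path. Ranges that are adjacent but do not overlap stay separate.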
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 7bf67a0..d9eb3fa 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -1184,9 +1184,12 @@ cost_tidscan(Path *path, PlannerInfo *root,
 	QualCost	qpqual_cost;
 	Cost		cpu_per_tuple;
 	QualCost	tid_qual_cost;
-	int			ntuples;
+	double		ntuples;
+	double		nrandompages;
+	double		nseqpages;
 	ListCell   *l;
 	double		spc_random_page_cost;
+	double		spc_seq_page_cost;
 
 	/* Should only be applied to base relations */
 	Assert(baserel->relid > 0);
@@ -1198,8 +1201,10 @@ cost_tidscan(Path *path, PlannerInfo *root,
 	else
 		path->rows = baserel->rows;
 
-	/* Count how many tuples we expect to retrieve */
-	ntuples = 0;
+	/* Count how many tuples and pages we expect to scan */
+	ntuples = 0.0;
+	nrandompages = 0.0;
+	nseqpages = 0.0;
 	foreach(l, tidquals)
 	{
 		if (IsA(lfirst(l), ScalarArrayOpExpr))
@@ -1207,22 +1212,48 @@ cost_tidscan(Path *path, PlannerInfo *root,
 			/* Each element of the array yields 1 tuple */
 			ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) lfirst(l);
 			Node	   *arraynode = (Node *) lsecond(saop->args);
+			int			array_len = estimate_array_length(arraynode);
 
-			ntuples += estimate_array_length(arraynode);
+			ntuples += array_len;
+			nrandompages += array_len;
 		}
 		else if (IsA(lfirst(l), CurrentOfExpr))
 		{
 			/* CURRENT OF yields 1 tuple */
 			isCurrentOf = true;
-			ntuples++;
+			ntuples += 1.0;
+			nrandompages += 1.0;
 		}
 		else
 		{
-			/* It's just CTID = something, count 1 tuple */
-			ntuples++;
+			/* For anything else, we'll use the normal selectivity estimate. */
+			Selectivity selectivity = clause_selectivity(root, lfirst(l),
+														 baserel->relid,
+														 JOIN_INNER,
+														 NULL);
+			double		pages = ceil(selectivity * baserel->pages);
+
+			if (pages <= 0.0)
+				pages = 1.0;
+
+			/*
+			 * The first page in a range requires a random seek, but each
+			 * subsequent page is just a normal sequential page read.
+			 */
+			ntuples += selectivity * baserel->tuples;
+			nseqpages += pages - 1.0;
+			nrandompages += 1.0;
 		}
 	}
 
+	/* An empty tidquals list means we're going to scan the whole table. */
+	if (tidquals == NIL)
+	{
+		ntuples += baserel->tuples;
+		nseqpages += baserel->pages - 1.0;
+		nrandompages += 1.0;
+	}
+
 	/*
 	 * We must force TID scan for WHERE CURRENT OF, because only nodeTidscan.c
 	 * understands how to do it correctly.  Therefore, honor enable_tidscan
@@ -1248,15 +1279,21 @@ cost_tidscan(Path *path, PlannerInfo *root,
 	/* fetch estimated page cost for tablespace containing table */
 	get_tablespace_page_costs(baserel->reltablespace,
 							  &spc_random_page_cost,
-							  NULL);
+							  &spc_seq_page_cost);
 
-	/* disk costs --- assume each tuple on a different page */
-	run_cost += spc_random_page_cost * ntuples;
+	/* disk costs */
+	run_cost += spc_random_page_cost * nrandompages + spc_seq_page_cost * nseqpages;
 
 	/* Add scanning CPU costs */
 	get_restriction_qual_cost(root, baserel, param_info, &qpqual_cost);
 
-	/* XXX currently we assume TID quals are a subset of qpquals */
+	/*
+	 * XXX currently we assume TID quals are a subset of qpquals at this
+	 * point; they will be removed (if possible) when we create the plan, so
+	 * we subtract their cost from the total qpqual cost.  (If the TID quals
+	 * can't be removed, this is a mistake and we're going to underestimate
+	 * the CPU cost a bit.)
+	 */
 	startup_cost += qpqual_cost.startup + tid_qual_cost.per_tuple;
 	cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple -
 		tid_qual_cost.per_tuple;
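
Reviewer's aside on the costsize.c changes above: the page accounting in cost_tidscan can be checked with a few lines of arithmetic. What follows is a minimal standalone sketch, not code from the patch. SketchTidCost and the sketch_* helpers are hypothetical names, the selectivity is passed in directly rather than obtained from clause_selectivity(), and sketch_ceil_nonneg stands in for ceil() only so that the example needs no math library.

```c
#include <assert.h>

/* Hypothetical accumulator mirroring ntuples/nrandompages/nseqpages. */
typedef struct
{
	double		ntuples;
	double		nrandompages;
	double		nseqpages;
} SketchTidCost;

/* Ceiling for small non-negative values; the patch itself calls ceil(). */
static double
sketch_ceil_nonneg(double x)
{
	double		whole = (double) (long long) x;

	return (whole < x) ? whole + 1.0 : whole;
}

/* "CTID = ANY(array)" or "CTID = const": one random page per TID. */
static void
sketch_add_point_quals(SketchTidCost *c, int ntids)
{
	assert(ntids >= 0);
	c->ntuples += ntids;
	c->nrandompages += ntids;
}

/* A range qual: one random seek, then sequential reads for the rest. */
static void
sketch_add_range_qual(SketchTidCost *c, double selectivity,
					  double rel_pages, double rel_tuples)
{
	double		pages = sketch_ceil_nonneg(selectivity * rel_pages);

	if (pages <= 0.0)
		pages = 1.0;

	c->ntuples += selectivity * rel_tuples;
	c->nseqpages += pages - 1.0;
	c->nrandompages += 1.0;
}

/* Disk part of run_cost, as computed in the patched cost_tidscan. */
static double
sketch_disk_cost(const SketchTidCost *c,
				 double random_page_cost, double seq_page_cost)
{
	return random_page_cost * c->nrandompages +
		seq_page_cost * c->nseqpages;
}
```

For a 1000-page table, a range qual with selectivity 0.25 is charged one random page and 249 sequential pages. The old code would have counted it as a single tuple on a single random page.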
diff --git a/src/backend/optimizer/path/tidpath.c b/src/backend/optimizer/path/tidpath.c
index 3bb5b8d..3290294 100644
--- a/src/backend/optimizer/path/tidpath.c
+++ b/src/backend/optimizer/path/tidpath.c
@@ -4,13 +4,15 @@
  *	  Routines to determine which TID conditions are usable for scanning
  *	  a given relation, and create TidPaths accordingly.
  *
- * What we are looking for here is WHERE conditions of the form
- * "CTID = pseudoconstant", which can be implemented by just fetching
- * the tuple directly via heap_fetch().  We can also handle OR'd conditions
- * such as (CTID = const1) OR (CTID = const2), as well as ScalarArrayOpExpr
- * conditions of the form CTID = ANY(pseudoconstant_array).  In particular
- * this allows
- *		WHERE ctid IN (tid1, tid2, ...)
+ * What we are looking for here is WHERE conditions of the forms:
+ * - "CTID = pseudoconstant", which can be implemented by just fetching
+ *    the tuple directly via heap_fetch().
+ * - "CTID IN (pseudoconstant, ...)" or "CTID = ANY(pseudoconstant_array)"
+ * - "CTID > pseudoconstant", etc. for >, >=, <, and <=.
+ * - "CTID > pseudoconstant AND CTID < pseudoconstant AND ...", etc.
+ *
+ * We can also handle OR'd conditions of the above form, such as
+ * "(CTID = const1) OR (CTID >= const2) OR CTID IN (...)".
  *
  * We also support "WHERE CURRENT OF cursor" conditions (CurrentOfExpr),
  * which amount to "CTID = run-time-determined-TID".  These could in
@@ -46,33 +48,48 @@
 #include "optimizer/restrictinfo.h"
 
 
+static bool IsTidVar(Var *var, int varno);
+static bool IsTidBinaryExpression(OpExpr *node, int varno);
 static bool IsTidEqualClause(OpExpr *node, int varno);
+static bool IsTidRangeClause(OpExpr *node, int varno);
 static bool IsTidEqualAnyClause(ScalarArrayOpExpr *node, int varno);
+static List *MakeTidRangeQuals(List *quals);
+static List *TidCompoundRangeQualFromExpr(Node *expr, int varno);
 static List *TidQualFromExpr(Node *expr, int varno);
 static List *TidQualFromBaseRestrictinfo(RelOptInfo *rel);
 
 
+/* Quick check to see if `var` looks like CTID. */
+static bool
+IsTidVar(Var *var, int varno)
+{
+	return (var->varattno == SelfItemPointerAttributeNumber &&
+			var->vartype == TIDOID &&
+			var->varno == varno &&
+			var->varlevelsup == 0);
+}
+
 /*
  * Check to see if an opclause is of the form
- *		CTID = pseudoconstant
+ *		CTID OP pseudoconstant
  * or
- *		pseudoconstant = CTID
+ *		pseudoconstant OP CTID
+ * where OP is assumed to be a binary operator.  We don't check opno -- that's
+ * usually done by the caller -- but we do check the number of arguments.
  *
  * We check that the CTID Var belongs to relation "varno".  That is probably
  * redundant considering this is only applied to restriction clauses, but
  * let's be safe.
  */
 static bool
-IsTidEqualClause(OpExpr *node, int varno)
+IsTidBinaryExpression(OpExpr *node, int varno)
 {
 	Node	   *arg1,
 			   *arg2,
 			   *other;
 	Var		   *var;
 
-	/* Operator must be tideq */
-	if (node->opno != TIDEqualOperator)
-		return false;
+	/* Caller has checked opno; here we only require exactly two arguments */
 	if (list_length(node->args) != 2)
 		return false;
 	arg1 = linitial(node->args);
@@ -83,19 +100,13 @@ IsTidEqualClause(OpExpr *node, int varno)
 	if (arg1 && IsA(arg1, Var))
 	{
 		var = (Var *) arg1;
-		if (var->varattno == SelfItemPointerAttributeNumber &&
-			var->vartype == TIDOID &&
-			var->varno == varno &&
-			var->varlevelsup == 0)
+		if (IsTidVar(var, varno))
 			other = arg2;
 	}
 	if (!other && arg2 && IsA(arg2, Var))
 	{
 		var = (Var *) arg2;
-		if (var->varattno == SelfItemPointerAttributeNumber &&
-			var->vartype == TIDOID &&
-			var->varno == varno &&
-			var->varlevelsup == 0)
+		if (IsTidVar(var, varno))
 			other = arg1;
 	}
 	if (!other)
@@ -112,6 +123,38 @@ IsTidEqualClause(OpExpr *node, int varno)
 
 /*
  * Check to see if a clause is of the form
+ *		CTID = pseudoconstant
+ * or
+ *		pseudoconstant = CTID
+ */
+static bool
+IsTidEqualClause(OpExpr *node, int varno)
+{
+	if (node->opno != TIDEqualOperator)
+		return false;
+	return IsTidBinaryExpression(node, varno);
+}
+
+/*
+ * Check to see if a clause is of the form
+ *		CTID op pseudoconstant
+ * or
+ *		pseudoconstant op CTID
+ * where op is a range comparison operator like >, >=, <, or <=.
+ */
+static bool
+IsTidRangeClause(OpExpr *node, int varno)
+{
+	if (node->opno != TIDLessOperator &&
+		node->opno != TIDLessEqOperator &&
+		node->opno != TIDGreaterOperator &&
+		node->opno != TIDGreaterEqOperator)
+		return false;
+	return IsTidBinaryExpression(node, varno);
+}
+
+/*
+ * Check to see if a clause is of the form
  *		CTID = ANY (pseudoconstant_array)
  */
 static bool
@@ -134,10 +177,7 @@ IsTidEqualAnyClause(ScalarArrayOpExpr *node, int varno)
 	{
 		Var		   *var = (Var *) arg1;
 
-		if (var->varattno == SelfItemPointerAttributeNumber &&
-			var->vartype == TIDOID &&
-			var->varno == varno &&
-			var->varlevelsup == 0)
+		if (IsTidVar(var, varno))
 		{
 			/* The other argument must be a pseudoconstant */
 			if (is_pseudo_constant_clause(arg2))
@@ -149,6 +189,46 @@ IsTidEqualAnyClause(ScalarArrayOpExpr *node, int varno)
 }
 
 /*
+ * Turn a list of range quals into the expected structure: if there's more than
+ * one, wrap them in a top-level AND-clause.
+ */
+static List *
+MakeTidRangeQuals(List *quals)
+{
+	if (list_length(quals) == 1)
+		return quals;
+	else
+		return list_make1(make_andclause(quals));
+}
+
+/*
+ * TidCompoundRangeQualFromExpr
+ *
+ * 		Extract a compound CTID range condition from the given qual expression
+ */
+static List *
+TidCompoundRangeQualFromExpr(Node *expr, int varno)
+{
+	ListCell   *l;
+	List	   *found_quals = NIL;
+
+	foreach(l, ((BoolExpr *) expr)->args)
+	{
+		Node	   *clause = (Node *) lfirst(l);
+
+		/* If this clause contains a range qual, add it to the list. */
+		if (is_opclause(clause) && IsTidRangeClause((OpExpr *) clause, varno))
+			found_quals = lappend(found_quals, clause);
+	}
+
+	/* If we found any, make an AND clause out of them. */
+	if (found_quals)
+		return MakeTidRangeQuals(found_quals);
+	else
+		return NIL;
+}
+
+/*
  *	Extract a set of CTID conditions from the given qual expression
  *
  *	Returns a List of CTID qual expressions (with implicit OR semantics
@@ -174,6 +254,8 @@ TidQualFromExpr(Node *expr, int varno)
 		/* base case: check for tideq opclause */
 		if (IsTidEqualClause((OpExpr *) expr, varno))
 			rlst = list_make1(expr);
+		else if (IsTidRangeClause((OpExpr *) expr, varno))
+			rlst = list_make1(expr);
 	}
 	else if (expr && IsA(expr, ScalarArrayOpExpr))
 	{
@@ -189,11 +271,18 @@ TidQualFromExpr(Node *expr, int varno)
 	}
 	else if (and_clause(expr))
 	{
-		foreach(l, ((BoolExpr *) expr)->args)
+		/* look for a range qual in the clause */
+		rlst = TidCompoundRangeQualFromExpr(expr, varno);
+
+		/* if no range qual was found, look for any other TID qual */
+		if (rlst == NIL)
 		{
-			rlst = TidQualFromExpr((Node *) lfirst(l), varno);
-			if (rlst)
-				break;
+			foreach(l, ((BoolExpr *) expr)->args)
+			{
+				rlst = TidQualFromExpr((Node *) lfirst(l), varno);
+				if (rlst)
+					break;
+			}
 		}
 	}
 	else if (or_clause(expr))
@@ -217,17 +306,24 @@ TidQualFromExpr(Node *expr, int varno)
 }
 
 /*
- *	Extract a set of CTID conditions from the rel's baserestrictinfo list
+ * Extract a set of CTID conditions from the rel's baserestrictinfo list
+ *
+ * Normally we just use the first RestrictInfo item with some usable quals,
+ * but it's also possible for a good compound range qual, such as
+ * "CTID > ? AND CTID < ?", to be split across multiple items.  So we look for
+ * range quals in all items and use them if any were found.
  */
 static List *
 TidQualFromBaseRestrictinfo(RelOptInfo *rel)
 {
 	List	   *rlst = NIL;
 	ListCell   *l;
+	List	   *found_quals = NIL;
 
 	foreach(l, rel->baserestrictinfo)
 	{
 		RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
+		Node	   *clause = (Node *) rinfo->clause;
 
 		/*
 		 * If clause must wait till after some lower-security-level
@@ -236,10 +332,24 @@ TidQualFromBaseRestrictinfo(RelOptInfo *rel)
 		if (!restriction_is_securely_promotable(rinfo, rel))
 			continue;
 
-		rlst = TidQualFromExpr((Node *) rinfo->clause, rel->relid);
+		/* If this clause contains a range qual, add it to the list. */
+		if (is_opclause(clause) &&
+			IsTidRangeClause((OpExpr *) clause, rel->relid))
+		{
+			found_quals = lappend(found_quals, clause);
+			continue;
+		}
+
+		/* Look for other TID quals. */
+		rlst = TidQualFromExpr((Node *) clause, rel->relid);
 		if (rlst)
 			break;
 	}
+
+	/* Use a range qual if any were found. */
+	if (found_quals)
+		rlst = MakeTidRangeQuals(found_quals);
+
 	return rlst;
 }
 
@@ -264,6 +374,7 @@ create_tidscan_paths(PlannerInfo *root, RelOptInfo *rel)
 
 	tidquals = TidQualFromBaseRestrictinfo(rel);
 
+	/* If there are tidquals, then it's worth generating a tidscan path. */
 	if (tidquals)
 		add_path(rel, (Path *) create_tidscan_path(root, rel, tidquals,
 												   required_outer));
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index da7a920..e2c0bce 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -3081,14 +3081,37 @@ create_tidscan_plan(PlannerInfo *root, TidPath *best_path,
 	}
 
 	/*
-	 * Remove any clauses that are TID quals.  This is a bit tricky since the
-	 * tidquals list has implicit OR semantics.
+	 * Remove the tidquals from the scan clauses if possible, which is
+	 * generally if the tidquals were taken verbatim from any of the
+	 * RelOptInfo items.  If the tidquals don't represent the entire
+	 * RelOptInfo qual, then nothing will be removed.  Note that the tidquals
+	 * is a list; if there is more than one, we have to rebuild the equivalent
+	 * OR clause to find a match.
 	 */
 	ortidquals = tidquals;
 	if (list_length(ortidquals) > 1)
 		ortidquals = list_make1(make_orclause(ortidquals));
 	scan_clauses = list_difference(scan_clauses, ortidquals);
 
+	/*
+	 * In the case of a single compound qual such as "ctid > ? AND ...", the
+	 * various parts may have come from different RestrictInfos.  So remove
+	 * each part separately.  (This doesn't happen for multiple compound
+	 * quals, because the top-level OR clause can't be split over multiple
+	 * RestrictInfos.)
+	 */
+	if (list_length(tidquals) == 1)
+	{
+		Node	   *qual = linitial(tidquals);
+
+		if (and_clause(qual))
+		{
+			BoolExpr   *and_qual = ((BoolExpr *) qual);
+
+			scan_clauses = list_difference(scan_clauses, and_qual->args);
+		}
+	}
+
 	scan_plan = make_tidscan(tlist,
 							 scan_clauses,
 							 scan_relid,
diff --git a/src/include/catalog/pg_operator.dat b/src/include/catalog/pg_operator.dat
index ce23c2f..7476916 100644
--- a/src/include/catalog/pg_operator.dat
+++ b/src/include/catalog/pg_operator.dat
@@ -156,15 +156,15 @@
   oprname => '<', oprleft => 'tid', oprright => 'tid', oprresult => 'bool',
   oprcom => '>(tid,tid)', oprnegate => '>=(tid,tid)', oprcode => 'tidlt',
   oprrest => 'scalarltsel', oprjoin => 'scalarltjoinsel' },
-{ oid => '2800', descr => 'greater than',
+{ oid => '2800', oid_symbol => 'TIDGreaterOperator', descr => 'greater than',
   oprname => '>', oprleft => 'tid', oprright => 'tid', oprresult => 'bool',
   oprcom => '<(tid,tid)', oprnegate => '<=(tid,tid)', oprcode => 'tidgt',
   oprrest => 'scalargtsel', oprjoin => 'scalargtjoinsel' },
-{ oid => '2801', descr => 'less than or equal',
+{ oid => '2801', oid_symbol => 'TIDLessEqOperator', descr => 'less than or equal',
   oprname => '<=', oprleft => 'tid', oprright => 'tid', oprresult => 'bool',
   oprcom => '>=(tid,tid)', oprnegate => '>(tid,tid)', oprcode => 'tidle',
   oprrest => 'scalarlesel', oprjoin => 'scalarlejoinsel' },
-{ oid => '2802', descr => 'greater than or equal',
+{ oid => '2802', oid_symbol => 'TIDGreaterEqOperator', descr => 'greater than or equal',
   oprname => '>=', oprleft => 'tid', oprright => 'tid', oprresult => 'bool',
   oprcom => '<=(tid,tid)', oprnegate => '<(tid,tid)', oprcode => 'tidge',
   oprrest => 'scalargesel', oprjoin => 'scalargejoinsel' },
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index ac03f46..51c04b9 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1537,15 +1537,18 @@ typedef struct BitmapHeapScanState
 	ParallelBitmapHeapState *pstate;
 } BitmapHeapScanState;
 
+typedef struct TidRange TidRange;
+
 /* ----------------
  *	 TidScanState information
  *
- *		tidexprs	   list of TidExpr structs (see nodeTidscan.c)
- *		isCurrentOf    scan has a CurrentOfExpr qual
- *		NumTids		   number of tids in this scan
- *		TidPtr		   index of currently fetched tid
- *		TidList		   evaluated item pointers (array of size NumTids)
- *		htup		   currently-fetched tuple, if any
+ *		tidexprs		list of TidExpr structs (see nodeTidscan.c)
+ *		isCurrentOf		scan has a CurrentOfExpr qual
+ *		NumTidRanges	number of tid ranges in this scan
+ *		CurrentTidRange	index of current tid range
+ *		TidRanges		evaluated TID ranges (array of size NumTidRanges)
+ *		inScan			currently in a range scan
+ *		htup			currently-fetched tuple, if any
  * ----------------
  */
 typedef struct TidScanState
@@ -1553,10 +1556,11 @@ typedef struct TidScanState
 	ScanState	ss;				/* its first field is NodeTag */
 	List	   *tss_tidexprs;
 	bool		tss_isCurrentOf;
-	int			tss_NumTids;
-	int			tss_TidPtr;
-	ItemPointerData *tss_TidList;
-	HeapTupleData tss_htup;
+	int			tss_NumTidRanges;
+	int			tss_CurrentTidRange;
+	TidRange   *tss_TidRanges;
+	bool		tss_inScan;		/* for range scans */
+	HeapTupleData tss_htup;		/* for current-of and single TID fetches */
 } TidScanState;
 
 /* ----------------
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 64139f8..bc9ff54 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -478,7 +478,8 @@ typedef struct BitmapHeapScan
  *		tid scan node
  *
  * tidquals is an implicitly OR'ed list of qual expressions of the form
- * "CTID = pseudoconstant" or "CTID = ANY(pseudoconstant_array)".
+ * "CTID = pseudoconstant", "CTID = ANY(pseudoconstant_array)", or
+ * "(CTID OP pseudoconstant AND ...)" for OP in >, >=, <, <=.
  * ----------------
  */
 typedef struct TidScan
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 6fd2420..895849f 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -1228,14 +1228,21 @@ typedef struct BitmapOrPath
 /*
  * TidPath represents a scan by TID
  *
- * tidquals is an implicitly OR'ed list of qual expressions of the form
- * "CTID = pseudoconstant" or "CTID = ANY(pseudoconstant_array)".
+ * tidquals is an implicitly OR'ed list of qual expressions of the forms:
+ *   - "CTID = pseudoconstant"
+ *   - "CTID = ANY(pseudoconstant_array)"
+ *   - "CURRENT OF cursor"
+ *   - "(CTID relop pseudoconstant AND ...)"
+ *
+ * If tidquals is empty, all CTIDs will match (contrary to the usual meaning
+ * of an empty disjunction).
+ *
  * Note they are bare expressions, not RestrictInfos.
  */
 typedef struct TidPath
 {
 	Path		path;
-	List	   *tidquals;		/* qual(s) involving CTID = something */
+	List	   *tidquals;
 } TidPath;
 
 /*
diff --git a/src/test/regress/expected/tidscan.out b/src/test/regress/expected/tidscan.out
index 521ed1b..8083909 100644
--- a/src/test/regress/expected/tidscan.out
+++ b/src/test/regress/expected/tidscan.out
@@ -177,3 +177,253 @@ UPDATE tidscan SET id = -id WHERE CURRENT OF c RETURNING *;
 ERROR:  cursor "c" is not positioned on a row
 ROLLBACK;
 DROP TABLE tidscan;
+-- tests for tidrangescans
+CREATE TABLE tidrangescan(id integer, data text);
+INSERT INTO tidrangescan SELECT i,repeat('x', 100) FROM generate_series(1,1000) AS s(i);
+DELETE FROM tidrangescan WHERE substring(ctid::text from ',(\d+)\)')::integer > 10 OR substring(ctid::text from '\((\d+),')::integer >= 10;;
+VACUUM tidrangescan;
+-- range scans with upper bound
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid < '(1,0)';
+            QUERY PLAN             
+-----------------------------------
+ Tid Scan on tidrangescan
+   TID Cond: (ctid < '(1,0)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid < '(1,0)';
+  ctid  
+--------
+ (0,1)
+ (0,2)
+ (0,3)
+ (0,4)
+ (0,5)
+ (0,6)
+ (0,7)
+ (0,8)
+ (0,9)
+ (0,10)
+(10 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid <= '(1,5)';
+             QUERY PLAN             
+------------------------------------
+ Tid Scan on tidrangescan
+   TID Cond: (ctid <= '(1,5)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid <= '(1,5)';
+  ctid  
+--------
+ (0,1)
+ (0,2)
+ (0,3)
+ (0,4)
+ (0,5)
+ (0,6)
+ (0,7)
+ (0,8)
+ (0,9)
+ (0,10)
+ (1,1)
+ (1,2)
+ (1,3)
+ (1,4)
+ (1,5)
+(15 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid < '(0,0)';
+            QUERY PLAN             
+-----------------------------------
+ Tid Scan on tidrangescan
+   TID Cond: (ctid < '(0,0)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid < '(0,0)';
+ ctid 
+------
+(0 rows)
+
+-- range scans with lower bound
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid > '(9,8)';
+            QUERY PLAN             
+-----------------------------------
+ Tid Scan on tidrangescan
+   TID Cond: (ctid > '(9,8)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid > '(9,8)';
+  ctid  
+--------
+ (9,9)
+ (9,10)
+(2 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE '(9,8)' < ctid;
+            QUERY PLAN             
+-----------------------------------
+ Tid Scan on tidrangescan
+   TID Cond: ('(9,8)'::tid < ctid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE '(9,8)' < ctid;
+  ctid  
+--------
+ (9,9)
+ (9,10)
+(2 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid >= '(9,8)';
+             QUERY PLAN             
+------------------------------------
+ Tid Scan on tidrangescan
+   TID Cond: (ctid >= '(9,8)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid >= '(9,8)';
+  ctid  
+--------
+ (9,8)
+ (9,9)
+ (9,10)
+(3 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid >= '(100,0)';
+              QUERY PLAN              
+--------------------------------------
+ Tid Scan on tidrangescan
+   TID Cond: (ctid >= '(100,0)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid >= '(100,0)';
+ ctid 
+------
+(0 rows)
+
+-- range scans with both bounds
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid > '(4,4)' AND '(4,7)' >= ctid;
+                           QUERY PLAN                           
+----------------------------------------------------------------
+ Tid Scan on tidrangescan
+   TID Cond: ((ctid > '(4,4)'::tid) AND ('(4,7)'::tid >= ctid))
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid > '(4,4)' AND '(4,7)' >= ctid;
+ ctid  
+-------
+ (4,5)
+ (4,6)
+ (4,7)
+(3 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE '(4,7)' >= ctid AND ctid > '(4,4)';
+                           QUERY PLAN                           
+----------------------------------------------------------------
+ Tid Scan on tidrangescan
+   TID Cond: (('(4,7)'::tid >= ctid) AND (ctid > '(4,4)'::tid))
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE '(4,7)' >= ctid AND ctid > '(4,4)';
+ ctid  
+-------
+ (4,5)
+ (4,6)
+ (4,7)
+(3 rows)
+
+-- combinations
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE '(4,7)' >= ctid AND ctid > '(4,4)' OR ctid = '(2,2)';
+                                        QUERY PLAN                                         
+-------------------------------------------------------------------------------------------
+ Tid Scan on tidrangescan
+   TID Cond: ((('(4,7)'::tid >= ctid) AND (ctid > '(4,4)'::tid)) OR (ctid = '(2,2)'::tid))
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE '(4,7)' >= ctid AND ctid > '(4,4)' OR ctid = '(2,2)';
+ ctid  
+-------
+ (2,2)
+ (4,5)
+ (4,6)
+ (4,7)
+(4 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE '(4,7)' >= ctid AND ctid > '(4,4)' OR ctid = '(2,2)' AND data = 'foo';
+                                                     QUERY PLAN                                                     
+--------------------------------------------------------------------------------------------------------------------
+ Tid Scan on tidrangescan
+   TID Cond: ((('(4,7)'::tid >= ctid) AND (ctid > '(4,4)'::tid)) OR (ctid = '(2,2)'::tid))
+   Filter: ((('(4,7)'::tid >= ctid) AND (ctid > '(4,4)'::tid)) OR ((ctid = '(2,2)'::tid) AND (data = 'foo'::text)))
+(3 rows)
+
+SELECT ctid FROM tidrangescan WHERE '(4,7)' >= ctid AND ctid > '(4,4)' OR ctid = '(2,2)' AND data = 'foo';
+ ctid  
+-------
+ (4,5)
+ (4,6)
+ (4,7)
+(3 rows)
+
+-- extreme offsets
+SELECT ctid FROM tidrangescan where ctid > '(0,65535)' AND ctid < '(1,0)' LIMIT 1;
+ ctid 
+------
+(0 rows)
+
+SELECT ctid FROM tidrangescan where ctid < '(0,0)' LIMIT 1;
+ ctid 
+------
+(0 rows)
+
+-- make sure ranges are combined correctly
+SELECT COUNT(*) FROM tidrangescan WHERE ctid < '(0,3)' OR ctid >= '(0,2)' AND ctid <= '(0,5)';
+ count 
+-------
+     5
+(1 row)
+
+SELECT COUNT(*) FROM tidrangescan WHERE ctid <= '(0,10)' OR ctid >= '(0,2)' AND ctid <= '(0,5)';
+ count 
+-------
+    10
+(1 row)
+
+-- empty table
+CREATE TABLE tidrangescan_empty(id integer, data text);
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan_empty WHERE ctid < '(1, 0)';
+            QUERY PLAN             
+-----------------------------------
+ Tid Scan on tidrangescan_empty
+   TID Cond: (ctid < '(1,0)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan_empty WHERE ctid < '(1, 0)';
+ ctid 
+------
+(0 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan_empty WHERE ctid > '(9, 0)';
+            QUERY PLAN             
+-----------------------------------
+ Tid Scan on tidrangescan_empty
+   TID Cond: (ctid > '(9,0)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan_empty WHERE ctid > '(9, 0)';
+ ctid 
+------
+(0 rows)
+
diff --git a/src/test/regress/sql/tidscan.sql b/src/test/regress/sql/tidscan.sql
index a8472e0..02b094a 100644
--- a/src/test/regress/sql/tidscan.sql
+++ b/src/test/regress/sql/tidscan.sql
@@ -64,3 +64,79 @@ UPDATE tidscan SET id = -id WHERE CURRENT OF c RETURNING *;
 ROLLBACK;
 
 DROP TABLE tidscan;
+
+-- tests for tidrangescans
+
+CREATE TABLE tidrangescan(id integer, data text);
+
+INSERT INTO tidrangescan SELECT i,repeat('x', 100) FROM generate_series(1,1000) AS s(i);
+DELETE FROM tidrangescan WHERE substring(ctid::text from ',(\d+)\)')::integer > 10 OR substring(ctid::text from '\((\d+),')::integer >= 10;;
+VACUUM tidrangescan;
+
+-- range scans with upper bound
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid < '(1,0)';
+SELECT ctid FROM tidrangescan WHERE ctid < '(1,0)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid <= '(1,5)';
+SELECT ctid FROM tidrangescan WHERE ctid <= '(1,5)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid < '(0,0)';
+SELECT ctid FROM tidrangescan WHERE ctid < '(0,0)';
+
+-- range scans with lower bound
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid > '(9,8)';
+SELECT ctid FROM tidrangescan WHERE ctid > '(9,8)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE '(9,8)' < ctid;
+SELECT ctid FROM tidrangescan WHERE '(9,8)' < ctid;
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid >= '(9,8)';
+SELECT ctid FROM tidrangescan WHERE ctid >= '(9,8)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid >= '(100,0)';
+SELECT ctid FROM tidrangescan WHERE ctid >= '(100,0)';
+
+-- range scans with both bounds
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid > '(4,4)' AND '(4,7)' >= ctid;
+SELECT ctid FROM tidrangescan WHERE ctid > '(4,4)' AND '(4,7)' >= ctid;
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE '(4,7)' >= ctid AND ctid > '(4,4)';
+SELECT ctid FROM tidrangescan WHERE '(4,7)' >= ctid AND ctid > '(4,4)';
+
+-- combinations
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE '(4,7)' >= ctid AND ctid > '(4,4)' OR ctid = '(2,2)';
+SELECT ctid FROM tidrangescan WHERE '(4,7)' >= ctid AND ctid > '(4,4)' OR ctid = '(2,2)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE '(4,7)' >= ctid AND ctid > '(4,4)' OR ctid = '(2,2)' AND data = 'foo';
+SELECT ctid FROM tidrangescan WHERE '(4,7)' >= ctid AND ctid > '(4,4)' OR ctid = '(2,2)' AND data = 'foo';
+
+-- extreme offsets
+SELECT ctid FROM tidrangescan where ctid > '(0,65535)' AND ctid < '(1,0)' LIMIT 1;
+SELECT ctid FROM tidrangescan where ctid < '(0,0)' LIMIT 1;
+
+-- make sure ranges are combined correctly
+SELECT COUNT(*) FROM tidrangescan WHERE ctid < '(0,3)' OR ctid >= '(0,2)' AND ctid <= '(0,5)';
+
+SELECT COUNT(*) FROM tidrangescan WHERE ctid <= '(0,10)' OR ctid >= '(0,2)' AND ctid <= '(0,5)';
+
+-- empty table
+CREATE TABLE tidrangescan_empty(id integer, data text);
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan_empty WHERE ctid < '(1, 0)';
+SELECT ctid FROM tidrangescan_empty WHERE ctid < '(1, 0)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan_empty WHERE ctid > '(9, 0)';
+SELECT ctid FROM tidrangescan_empty WHERE ctid > '(9, 0)';
-- 
2.7.4

#24

David Rowley

david.rowley@2ndquadrant.com

about 7 years ago

In reply to: Edmund Horner (#23)

Re: Tid scan improvements

Review of v5:

0001: looks good.

0002:

1. I don't think you need palloc0() here. palloc() looks like it would be fine.

    if (tidRangeArray->ranges == NULL)
        tidRangeArray->ranges = (TidRange *)
            palloc0(tidRangeArray->numAllocated * sizeof(TidRange));

If that weren't the case, you'd also need to zero the additional
memory when you repalloc().
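
As a rough standalone illustration of point 1 (not code from the patch): the sketch below uses malloc()/realloc() as stand-ins for palloc()/repalloc(), and the DemoRange, DemoRangeArray and demo_add_range names are invented for the example. Every slot is fully written before it is counted as in use, so neither the initial allocation nor the enlarged tail needs to be zeroed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the patch's TidRange and its array wrapper. */
typedef struct DemoRange
{
    unsigned    first_block;
    unsigned    last_block;
} DemoRange;

typedef struct DemoRangeArray
{
    DemoRange  *ranges;
    int         numRanges;      /* slots in use */
    int         numAllocated;   /* slots allocated */
} DemoRangeArray;

/*
 * Append a range, growing the array as needed.  Only slots below numRanges
 * are ever read, and each one is fully written here first, so a non-zeroing
 * allocator is sufficient for both the first allocation and the regrowth.
 */
void
demo_add_range(DemoRangeArray *arr, unsigned first, unsigned last)
{
    if (arr->ranges == NULL)
    {
        arr->numAllocated = 4;
        arr->ranges = malloc(arr->numAllocated * sizeof(DemoRange));
    }
    else if (arr->numRanges >= arr->numAllocated)
    {
        arr->numAllocated *= 2;
        arr->ranges = realloc(arr->ranges,
                              arr->numAllocated * sizeof(DemoRange));
    }
    assert(arr->ranges != NULL);    /* a real caller would report OOM */

    arr->ranges[arr->numRanges].first_block = first;
    arr->ranges[arr->numRanges].last_block = last;
    arr->numRanges++;
}
```

If some code path did read slots at or beyond numRanges, the zeroing would have to be repeated for the newly added tail after each regrowth, which is the inconsistency the review comment points at.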

2. Can't the following code be moved into the correct
forwards/backwards if block inside the if inscan block above?

    /* If we've finished iterating over the ranges, exit the loop. */
    if (node->tss_CurrentTidRange >= numRanges ||
        node->tss_CurrentTidRange < 0)
        break;

Something like:

    if (bBackward)
    {
        if (node->tss_CurrentTidRange < 0)
        {
            /* initialize for backward scan */
            node->tss_CurrentTidRange = numRanges - 1;
        }
        else if (node->tss_CurrentTidRange == 0)
            break;
        else
            node->tss_CurrentTidRange--;
    }
    else
    {
        if (node->tss_CurrentTidRange < 0)
        {
            /* initialize for forward scan */
            node->tss_CurrentTidRange = 0;
        }
        else if (node->tss_CurrentTidRange >= numRanges - 1)
            break;
        else
            node->tss_CurrentTidRange++;
    }

I think that's a few fewer lines and instructions and (I think) a bit neater too.
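
For anyone wanting to poke at the stepping logic in isolation, here is a minimal standalone sketch of it. The function name demo_next_range and the bare int index are made up for the example; the real code works on node->tss_CurrentTidRange inside the executor loop. The numRanges == 0 guard is an addition of this sketch, since the snippet above does not cover an empty range list on its own.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Advance *current to the next range index in the scan direction.
 * A negative *current means "not started yet".  Returns false, leaving
 * *current unchanged, when there is no further range to visit.
 */
bool
demo_next_range(int *current, int numRanges, bool backward)
{
    assert(numRanges >= 0);

    if (numRanges == 0)
        return false;           /* guard added by this sketch */

    if (backward)
    {
        if (*current < 0)
            *current = numRanges - 1;   /* initialize for backward scan */
        else if (*current == 0)
            return false;
        else
            (*current)--;
    }
    else
    {
        if (*current < 0)
            *current = 0;       /* initialize for forward scan */
        else if (*current >= numRanges - 1)
            return false;
        else
            (*current)++;
    }
    return true;
}
```

A caller would loop with something like while (demo_next_range(&cur, n, backward)), scanning range cur on each iteration. It does not model a cursor changing direction part-way through a scan.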

3. Please write if (found_quals != NIL) here (yeah, I know there are
already lots of places not doing this):

    /* If we found any, make an AND clause out of them. */
    if (found_quals)

likewise in:

    /* Use a range qual if any were found. */
    if (found_quals)

4. The new tests in tidscan.sql should drop the newly created tables.
(I see some get dropped in the 0004 patch, but not all. Best not to
rely on a later patch to do work that this patch should do)

0003: looks okay.

0004:

5. Please add a comment to scandir in:

    typedef struct TidScan
    {
        Scan        scan;
        List       *tidquals;   /* qual(s) involving CTID = something */
        ScanDirection scandir;
    } TidScan;

/* forward or backward or don't care */ would do.

Likewise for struct TidPath. Likely IndexPath can be used for guidance.

6. Is it worth adding a Merge Join regression test for this patch?

Something like:

postgres=# explain select * from t1 inner join t1 t2 on t1.ctid = t2.ctid order by t1.ctid desc;
                                 QUERY PLAN
-----------------------------------------------------------------------------
 Merge Join  (cost=0.00..21.25 rows=300 width=14)
   Merge Cond: (t1.ctid = t2.ctid)
   ->  Tid Scan Backward on t1  (cost=0.00..8.00 rows=300 width=10)
   ->  Materialize  (cost=0.00..8.75 rows=300 width=10)
         ->  Tid Scan Backward on t1 t2  (cost=0.00..8.00 rows=300 width=10)
(5 rows)

0005:

7. I see the logic behind this new patch, but quite possibly the
majority of the time relpages will be out of date and you'll
mistakenly apply this to a page that is not the final one. I'm neither
here nor there with it. I imagine you might feel the same since you
didn't merge it with 0001. Maybe we can leave it out for now and see
what others think.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#25

Tom Lane

tgl@sss.pgh.pa.us

about 7 years ago

In reply to: Edmund Horner (#23)

Re: Tid scan improvements

Edmund Horner <ejrh00@gmail.com> writes:

[ tid scan patches ]

I'm having a hard time wrapping my mind around why you'd bother with
backwards TID scans. The amount of code needed versus the amount of
usefulness seems like a pretty bad cost/benefit ratio, IMO. I can
see that there might be value in knowing that a regular scan has
"ORDER BY ctid ASC" pathkeys (mainly, that it might let us mergejoin
on TID without an explicit sort). It does not, however, follow that
there's any additional value in supporting the DESC case.

regards, tom lane

#26

Andres Freund

andres@anarazel.de

about 7 years ago

In reply to: Tom Lane (#25)

Re: Tid scan improvements

Hi,

On 2018-12-20 17:21:07 -0500, Tom Lane wrote:

Edmund Horner <ejrh00@gmail.com> writes:

[ tid scan patches ]

I'm having a hard time wrapping my mind around why you'd bother with
backwards TID scans. The amount of code needed versus the amount of
usefulness seems like a pretty bad cost/benefit ratio, IMO. I can
see that there might be value in knowing that a regular scan has
"ORDER BY ctid ASC" pathkeys (mainly, that it might let us mergejoin
on TID without an explicit sort). It does not, however, follow that
there's any additional value in supporting the DESC case.

I've not followed this thread, but wouldn't that be quite useful to be
able to move old tuples to free space earlier in the table?

I've written multiple scripts that update the later pages in a table, to
force reuse of earlier free pages (in my case by generating ctid = ANY()
style queries with all possible tids for the last few pages, the most
efficient way I could think of).

Greetings,

Andres Freund

#27

Tom Lane

tgl@sss.pgh.pa.us

about 7 years ago

In reply to: Andres Freund (#26)

Re: Tid scan improvements

Andres Freund <andres@anarazel.de> writes:

On 2018-12-20 17:21:07 -0500, Tom Lane wrote:

I'm having a hard time wrapping my mind around why you'd bother with
backwards TID scans.

I've not followed this thread, but wouldn't that be quite useful to be
able to move old tuples to free space earlier in the table?
I've written multiple scripts that update the later pages in a table, to
force reuse of earlier free pages (in my case by generating ctid = ANY()
style queries with all possible tids for the last few pages, the most
efficient way I could think of).

Sure, but wouldn't you now write those using something on the order of

WHERE ctid >= '(cutoff_page_here, 1)'

? I don't see that you'd want to write "ORDER BY ctid DESC LIMIT n"
because you wouldn't know what value of n to use to get all the
tuples on some-number-of-ending-pages.

regards, tom lane

#28

Andres Freund

andres@anarazel.de

about 7 years ago

In reply to: Tom Lane (#27)

Re: Tid scan improvements

Hi,

On 2018-12-20 18:06:41 -0500, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

On 2018-12-20 17:21:07 -0500, Tom Lane wrote:

I'm having a hard time wrapping my mind around why you'd bother with
backwards TID scans.

I've not followed this thread, but wouldn't that be quite useful to be
able to move old tuples to free space earlier in the table?
I've written multiple scripts that update the later pages in a table, to
force reuse of earlier free pages (in my case by generating ctid = ANY()
style queries with all possible tids for the last few pages, the most
efficient way I could think of).

Sure, but wouldn't you now write those using something on the order of

WHERE ctid >= '(cutoff_page_here, 1)'

? I don't see that you'd want to write "ORDER BY ctid DESC LIMIT n"
because you wouldn't know what value of n to use to get all the
tuples on some-number-of-ending-pages.

I think you'd want both, to make sure there's not more tuples than
estimated. With the limit calculated to ensure there's enough free space
for them to actually fit.

Greetings,

Andres Freund

#29

Edmund Horner

ejrh00@gmail.com

about 7 years ago

In reply to: Tom Lane (#25)

Re: Tid scan improvements

On Fri, 21 Dec 2018 at 11:21, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Edmund Horner <ejrh00@gmail.com> writes:

[ tid scan patches ]

I'm having a hard time wrapping my mind around why you'd bother with
backwards TID scans. The amount of code needed versus the amount of
usefulness seems like a pretty bad cost/benefit ratio, IMO. I can
see that there might be value in knowing that a regular scan has
"ORDER BY ctid ASC" pathkeys (mainly, that it might let us mergejoin
on TID without an explicit sort). It does not, however, follow that
there's any additional value in supporting the DESC case.

I have occasionally found myself running "SELECT MAX(ctid) FROM t"
when I was curious about why a table is so big after vacuuming.

Perhaps that's not a common enough use case to justify the amount of
code, especially the changes to heapam.c and explain.c.

We'd still need the pathkeys to make good use of forward scans. (And
I think the executor still needs to support seeking backward for
cursors.)

#30

David Rowley

david.rowley@2ndquadrant.com

about 7 years ago

In reply to: Edmund Horner (#29)

Re: Tid scan improvements

On Fri, 21 Dec 2018 at 13:09, Edmund Horner <ejrh00@gmail.com> wrote:

On Fri, 21 Dec 2018 at 11:21, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I'm having a hard time wrapping my mind around why you'd bother with
backwards TID scans. The amount of code needed versus the amount of
usefulness seems like a pretty bad cost/benefit ratio, IMO. I can
see that there might be value in knowing that a regular scan has
"ORDER BY ctid ASC" pathkeys (mainly, that it might let us mergejoin
on TID without an explicit sort). It does not, however, follow that
there's any additional value in supporting the DESC case.

I have occasionally found myself running "SELECT MAX(ctid) FROM t"
when I was curious about why a table is so big after vacuuming.

Perhaps that's not a common enough use case to justify the amount of
code, especially the changes to heapam.c and explain.c.

We'd still need the pathkeys to make good use of forward scans. (And
I think the executor still needs to support seeking backward for
cursors.)

I think the best thing to do here is separate out all the additional
backwards scan code into a separate patch so that it can more easily be
considered and approved, or rejected. I think if there's any hint of
this blocking the main patch then it should be a separate patch to
allow its worth to be considered independently.

Also, my primary interest in this patch is to find tuples that are
stopping the heap being truncated during a vacuum. Generally, when I'm
looking for that I have a good idea of what size I expect the relation
should be, (otherwise I'd not think it was bloated), in which case I'd
be doing WHERE ctid >= '(N,1)'. However, it might be easier to write
some auto-bloat-removal script if we could have an ORDER BY ctid DESC
LIMIT n.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#31

Edmund Horner

ejrh00@gmail.com

about 7 years ago

In reply to: David Rowley (#30)

Re: Tid scan improvements

On Fri, 21 Dec 2018 at 13:25, David Rowley <david.rowley@2ndquadrant.com> wrote:

On Fri, 21 Dec 2018 at 13:09, Edmund Horner <ejrh00@gmail.com> wrote:

On Fri, 21 Dec 2018 at 11:21, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I'm having a hard time wrapping my mind around why you'd bother with
backwards TID scans. The amount of code needed versus the amount of
usefulness seems like a pretty bad cost/benefit ratio, IMO. I can
see that there might be value in knowing that a regular scan has
"ORDER BY ctid ASC" pathkeys (mainly, that it might let us mergejoin
on TID without an explicit sort). It does not, however, follow that
there's any additional value in supporting the DESC case.

I have occasionally found myself running "SELECT MAX(ctid) FROM t"
when I was curious about why a table is so big after vacuuming.

Perhaps that's not a common enough use case to justify the amount of
code, especially the changes to heapam.c and explain.c.

We'd still need the pathkeys to make good use of forward scans. (And
I think the executor still needs to support seeking backward for
cursors.)

I think the best thing to do here is separate out all the additional
backwards scan code into a separate patch so that it can more easily be
considered and approved, or rejected. I think if there's any hint of
this blocking the main patch then it should be a separate patch to
allow its worth to be considered independently.

Yeah I think you're right. I'll separate those parts into the basic
forward scan, and then the optional backward scan support. I think
we'll still only generate a backward scan if the query_pathkeys makes
use of it.

For the forward scan, I seem to recall, from your merge join example,
that it's useful to set the pathkeys even when there are no
query_pathkeys. We just have to unconditionally set them so that the
larger plan can make use of them.

Also, my primary interest in this patch is to find tuples that are
stopping the heap being truncated during a vacuum. Generally, when I'm
looking for that I have a good idea of what size I expect the relation
should be, (otherwise I'd not think it was bloated), in which case I'd
be doing WHERE ctid >= '(N,1)'. However, it might be easier to write
some auto-bloat-removal script if we could have an ORDER BY ctid DESC
LIMIT n.

#32

Tom Lane

tgl@sss.pgh.pa.us

about 7 years ago

In reply to: Edmund Horner (#31)

Re: Tid scan improvements

Edmund Horner <ejrh00@gmail.com> writes:

For the forward scan, I seem to recall, from your merge join example,
that it's useful to set the pathkeys even when there are no
query_pathkeys. We just have to unconditionally set them so that the
larger plan can make use of them.

No. Look at indxpath.c: it does not worry about pathkeys unless
has_useful_pathkeys is true, and it definitely does not generate
pathkeys that don't get past truncate_useless_pathkeys. Those
functions are responsible for worrying about whether mergejoin
can use the pathkeys. It's not tidpath.c's job to outthink them.

regards, tom lane

#33

Edmund Horner

ejrh00@gmail.com

about 7 years ago

In reply to: Tom Lane (#32)

Re: Tid scan improvements

On Fri, 21 Dec 2018 at 16:31, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Edmund Horner <ejrh00@gmail.com> writes:

For the forward scan, I seem to recall, from your merge join example,
that it's useful to set the pathkeys even when there are no
query_pathkeys. We just have to unconditionally set them so that the
larger plan can make use of them.

No. Look at indxpath.c: it does not worry about pathkeys unless
has_useful_pathkeys is true, and it definitely does not generate
pathkeys that don't get past truncate_useless_pathkeys. Those
functions are responsible for worrying about whether mergejoin
can use the pathkeys. It's not tidpath.c's job to outthink them.

Ok. I think that will simplify things. So if I follow you correctly,
we should do:

1. If has_useful_pathkeys is true: generate pathkeys (for CTID ASC),
and use truncate_useless_pathkeys on them.
2. If we have tid quals or pathkeys, emit a TID scan path.

For the (optional) backwards scan support patch, should we separately
emit another path, in the reverse direction? (My current patch only
creates one path, and tries to decide what the best direction is by
looking at query_pathkeys. This doesn't fit into the above
algorithm.)

#34

Tom Lane

tgl@sss.pgh.pa.us

about 7 years ago

In reply to: Edmund Horner (#33)

Re: Tid scan improvements

Edmund Horner <ejrh00@gmail.com> writes:

Ok. I think that will simplify things. So if I follow you correctly,
we should do:

1. If has_useful_pathkeys is true: generate pathkeys (for CTID ASC),
and use truncate_useless_pathkeys on them.
2. If we have tid quals or pathkeys, emit a TID scan path.

Check.

For the (optional) backwards scan support patch, should we separately
emit another path, in the reverse direction?

What indxpath.c does is, if has_useful_pathkeys is true, to generate
pathkeys both ways and then build paths if the pathkeys get past
truncate_useless_pathkeys. That seems sufficient in this case too.
There are various heuristics about whether it's really useful to
consider both sort directions, but that intelligence is already
built into truncate_useless_pathkeys. tid quals with no pathkeys
would be reason to generate a forward path, but not reason to
generate a reverse path, because then that would be duplicative.
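
A toy model of that decision rule, for illustration only: the booleans below stand in for the results of has_useful_pathkeys and of checking whether each direction's pathkeys get past truncate_useless_pathkeys. The DemoPathChoice type and demo_choose_tid_paths function are invented here and are not planner API.

```c
#include <assert.h>
#include <stdbool.h>

/* Which TID scan paths the toy rule would emit. */
typedef struct DemoPathChoice
{
    bool        emit_forward;
    bool        emit_backward;
} DemoPathChoice;

/*
 * have_tidquals:          usable TID quals were found
 * has_useful_pathkeys:    some ordering could be useful for this rel
 * forward_keys_survive:   CTID ASC pathkeys are not truncated away
 * backward_keys_survive:  CTID DESC pathkeys are not truncated away
 */
DemoPathChoice
demo_choose_tid_paths(bool have_tidquals,
                      bool has_useful_pathkeys,
                      bool forward_keys_survive,
                      bool backward_keys_survive)
{
    DemoPathChoice choice = {false, false};
    bool        fwd_useful = has_useful_pathkeys && forward_keys_survive;
    bool        bwd_useful = has_useful_pathkeys && backward_keys_survive;

    /* TID quals alone justify a forward path; so do useful ASC pathkeys. */
    choice.emit_forward = have_tidquals || fwd_useful;

    /*
     * A backward path is only worth building for its ordering; with no
     * useful DESC pathkeys it would duplicate the forward path.
     */
    choice.emit_backward = bwd_useful;

    return choice;
}
```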

regards, tom lane

#35

Tom Lane

tgl@sss.pgh.pa.us

about 7 years ago

In reply to: Edmund Horner (#23)

Re: Tid scan improvements

BTW, with respect to this bit in 0001:

@@ -1795,6 +1847,15 @@ nulltestsel(PlannerInfo *root, NullTestType nulltesttype, Node *arg,
                 return (Selectivity) 0; /* keep compiler quiet */
         }
     }
+    else if (vardata.var && IsA(vardata.var, Var) &&
+             ((Var *) vardata.var)->varattno == SelfItemPointerAttributeNumber)
+    {
+        /*
+         * There are no stats for system columns, but we know CTID is never
+         * NULL.
+         */
+        selec = (nulltesttype == IS_NULL) ? 0.0 : 1.0;
+    }
     else
     {
         /*

I'm not entirely sure why you're bothering; surely nulltestsel is
unrelated to what this patch is about? And would anybody really
write "WHERE ctid IS NULL"?

However, if we do think it's worth adding code to cover this case,
I wouldn't make it specific to CTID. *All* system columns can be
assumed not null, see heap_getsysattr().

regards, tom lane

#36

Edmund Horner

ejrh00@gmail.com

almost 7 years ago

In reply to: Tom Lane (#35)

Re: Tid scan improvements

On Sat, 22 Dec 2018 at 07:10, Tom Lane <tgl@sss.pgh.pa.us> wrote:

BTW, with respect to this bit in 0001:

@@ -1795,6 +1847,15 @@ nulltestsel(PlannerInfo *root, NullTestType nulltesttype, Node *arg,
                 return (Selectivity) 0; /* keep compiler quiet */
         }
     }
+    else if (vardata.var && IsA(vardata.var, Var) &&
+             ((Var *) vardata.var)->varattno == SelfItemPointerAttributeNumber)
+    {
+        /*
+         * There are no stats for system columns, but we know CTID is never
+         * NULL.
+         */
+        selec = (nulltesttype == IS_NULL) ? 0.0 : 1.0;
+    }
     else
     {
         /*

I'm not entirely sure why you're bothering; surely nulltestsel is
unrelated to what this patch is about? And would anybody really
write "WHERE ctid IS NULL"?

I found that it made a difference with selectivity of range comparisons,
because clauselist_selectivity tries to correct for it (clausesel.c:274):

    s2 = rqlist->hibound + rqlist->lobound - 1.0;

    /* Adjust for double-exclusion of NULLs */
    s2 += nulltestsel(root, IS_NULL, rqlist->var,
                      varRelid, jointype, sjinfo);

It was adding DEFAULT_UNK_SEL = 0.005 to the selectivity, which (while not
major) did make the selectivity less accurate.
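
To put numbers on that, here is a small standalone sketch of the same arithmetic. The function name and the figures are made up: hibound = 0.2 and lobound = 0.9 would correspond to something like ctid >= '(1000,0)' AND ctid < '(2000,0)' on a hypothetical 10000-page table with evenly filled pages. The only value taken from the discussion above is the 0.005 default.

```c
#include <assert.h>

/* The default NULL-test selectivity quoted above. */
#define DEMO_DEFAULT_UNK_SEL 0.005

/*
 * Combine the two one-sided selectivities of a range pair.
 * hibound:   fraction of rows satisfying "var < hi"
 * lobound:   fraction of rows satisfying "var > lo"
 * null_sel:  estimated fraction of rows where var IS NULL
 */
double
demo_range_selectivity(double hibound, double lobound, double null_sel)
{
    double      s2 = hibound + lobound - 1.0;

    /* Adjust for double-exclusion of NULLs */
    s2 += null_sel;

    return s2;
}
```

With the default NULL estimate the result is about 0.105; with a NULL fraction of 0.0, which is what the nulltestsel change gives for CTID, it is about 0.1, matching the 1000 pages out of 10000 that the range actually covers.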

However, if we do think it's worth adding code to cover this case,
I wouldn't make it specific to CTID. *All* system columns can be
assumed not null, see heap_getsysattr().

I guess we could have a standalone patch to add this for all system columns?

#37

Tom Lane

tgl@sss.pgh.pa.us

almost 7 years ago

In reply to: Edmund Horner (#36)

Re: Tid scan improvements

Edmund Horner <ejrh00@gmail.com> writes:

On Sat, 22 Dec 2018 at 07:10, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I'm not entirely sure why you're bothering; surely nulltestsel is
unrelated to what this patch is about?

I found that it made a difference with selectivity of range comparisons,
because clauselist_selectivity tries to correct for it (clausesel.c:274):

Oh, I see.

I guess we could have a standalone patch to add this for all system columns?

regards, tom lane

#38

Edmund Horner

ejrh00@gmail.com

almost 7 years ago

In reply to: Tom Lane (#34)

Re: Tid scan improvements

Hi all,

I am a bit stuck and I think it's best to try to explain where.

I'm still rebasing the patches for the changes Tom made to support
parameterised TID paths for joins. While the addition of join support
itself does not really touch the same code, the modernisation -- in
particular, returning a list of RestrictInfos rather than raw quals -- does
rewrite quite a bit of tidpath.c.

The original code returned:

    List (with OR semantics)
      CTID = ?   or   CTID = ANY (...)   or   IS CURRENT OF
      (more items)

That changed recently to return:

    List (with OR semantics)
      RestrictInfo
        CTID = ?   or ...
      (more items)

My last set of patches extended the tidqual extraction to pull out lists
(with AND semantics) of range quals of the form CTID < ?, etc. Each list
of more than one item was converted into an AND clause before being added
to the tidqual list; a single range qual can be added to tidquals as is.

This required looking across multiple RestrictInfos at the top level, for
example:

- "WHERE ctid > ? AND ctid < ?" would arrive at tidpath as a list of two
RestrictInfos, from which we extract a single tidqual in the form of an AND
clause.
- "WHERE ctid = ? OR (ctid > ? AND ctid < ?)" arrives as only one
RestrictInfo, but we extract two tidquals (an OpExpr, and an AND clause).

The code could also ignore additional unusable quals from a list of
top-level RestrictInfos, or from a list of quals from an AND clause, for
example:

- "WHERE foo = ? AND ctid > ? AND ctid < ?" gives us the single tidqual
"ctid > ? AND ctid < ?".
- "WHERE (ctid = ? AND bar = ?) OR (foo = ? AND ctid > ? AND ctid < ?)"
gives us the two tidquals "ctid = ?" and "ctid > ? AND ctid < ?".

As the extracted tidquals no longer match the original query quals, they
aren't removed from scan_clauses in createplan.c, and hence are correctly
checked by the filter.

Aside: The analogous situation with an indexed user attribute "x" behaves a
bit differently:
- "WHERE x = ? OR (x > ? AND x < ?)", won't use a regular index scan, but
might use a bitmap index scan.

My patch uses the same path type and executor for all extractable tidquals.

This worked pretty well, but I am finding it difficult to reimplement it in
the new tidpath.c code.

In the query information given to the path generator, there is no existing
RestrictInfo relating to the whole expression "ctid > ? AND ctid < ?". I
am still learning about RestrictInfos, but my understanding is it doesn't
make sense to have a RestrictInfo for an AND clause, anyway; you're
supposed to have them for the sub-expressions of it.

And it doesn't seem a good idea to try to create new RestrictInfos in the
path generation just to pass the tidquals back to plan creation. They're
complicated objects.

There's also the generation of scan_clauses in create_tidscan_plan
(createplan.c:3107). This now uses RestrictInfos -- I'd imagine we'd need
each AND clause to be wrapped in a RestrictInfo to be able to check it
properly.

To summarise, I'm not sure what kind of structure I should add to the
tidquals list to represent a compound range expression. Maybe it's better
to create a different path (either a new path type, or a flag in TidPath to
say what kind of quals are attached)?

Edmund

#39

Tom Lane

tgl@sss.pgh.pa.us

almost 7 years ago

In reply to: Edmund Horner (#38)

Re: Tid scan improvements

Edmund Horner <ejrh00@gmail.com> writes:

My patch uses the same path type and executor for all extractable tidquals.

This worked pretty well, but I am finding it difficult to reimplement it in
the new tidpath.c code.

I didn't like that approach to begin with, and would suggest that you go
over to using a separate path type and executor node. I don't think the
amount of commonality for the two cases is all that large, and doing it
as you had it required some ugly ad-hoc conventions about the semantics
of the tidquals list. Where I think this should go is that the tidquals
list still has OR semantics in the existing path type, but you use AND
semantics in the new path type, so that "ctid > ? AND ctid < ?" is just
represented as an implicit-AND list of two simple RestrictInfos.

Now admittedly, this wouldn't give us an efficient way to execute
queries with conditions like "WHERE ctid = X OR (ctid > Y AND ctid < Z)",
but I find myself quite unable to get excited about supporting that.
I see no reason for the new code to worry about any cases more complex
than one or two TID inequalities at top level of the restriction list.

In the query information given to the path generator, there is no existing
RestrictInfo relating to the whole expression "ctid > ? AND ctid < ?". I
am still learning about RestrictInfos, but my understanding is it doesn't
make sense to have a RestrictInfo for an AND clause, anyway; you're
supposed to have them for the sub-expressions of it.

FWIW, the actual data structure for cases like that is that there's
a RestrictInfo for the whole clause ctid = X OR (ctid > Y AND ctid < Z),
and if you look into its "orclause" field, you will find RestrictInfos
attached to the primitive clauses ctid = X, ctid > Y, ctid < Z. (The
old code in tidpath.c didn't know that, because it'd never been rewritten
since RestrictInfos were invented.) However, I think this new code should
not worry about OR cases at all, but just pull out top-level TID
comparison clauses.

And it doesn't seem a good idea to try to create new RestrictInfos in the
path generation just to pass the tidquals back to plan creation.

No, you should avoid that. There are places that assume there's only
one RestrictInfo for any given original clause (or sub-clause).

regards, tom lane
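
Editorial sketch: the "implicit-AND list" representation suggested above can be illustrated outside the planner. The following is a minimal, self-contained C sketch, not PostgreSQL code. The types Tid, TidCmp and TidRange and the function tid_range_from_quals are hypothetical names invented for this illustration; real planner code would walk a List of RestrictInfos instead of a plain array.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a TID: (block, offset).  Hypothetical type. */
typedef struct { uint32_t block; uint16_t offset; } Tid;

typedef enum { CMP_LT, CMP_LE, CMP_GT, CMP_GE } CmpOp;

/* One simple qual of the form "ctid <op> bound". */
typedef struct { CmpOp op; Tid bound; } TidCmp;

/* The single range obtained by ANDing all quals together. */
typedef struct {
    bool has_lower, lower_incl;
    bool has_upper, upper_incl;
    Tid  lower, upper;
} TidRange;

static int tid_compare(Tid a, Tid b)
{
    if (a.block != b.block)
        return a.block < b.block ? -1 : 1;
    if (a.offset != b.offset)
        return a.offset < b.offset ? -1 : 1;
    return 0;
}

/*
 * AND semantics: each qual can only narrow the range, so the tightest
 * bound on each side wins.  On equal bounds, exclusive beats inclusive.
 */
static TidRange tid_range_from_quals(const TidCmp *quals, size_t n)
{
    TidRange r = {0};

    for (size_t i = 0; i < n; i++) {
        const TidCmp *q = &quals[i];
        bool incl = (q->op == CMP_LE || q->op == CMP_GE);

        if (q->op == CMP_GT || q->op == CMP_GE) {
            int c = r.has_lower ? tid_compare(q->bound, r.lower) : 1;

            if (c > 0 || (c == 0 && !incl)) {
                r.lower = q->bound;
                r.lower_incl = incl;
                r.has_lower = true;
            }
        } else {
            int c = r.has_upper ? tid_compare(q->bound, r.upper) : -1;

            if (c < 0 || (c == 0 && !incl)) {
                r.upper = q->bound;
                r.upper_incl = incl;
                r.has_upper = true;
            }
        }
    }
    return r;
}
```

Given the two quals ctid > '(1000,0)' and ctid < '(2000,0)', the sketch yields one range with both ends exclusive. A condition such as "ctid = X OR (ctid > Y AND ctid < Z)" has no such single-range form, which is the case the message above proposes not to handle in the new path type.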

#40

Edmund Horner

ejrh00@gmail.com

almost 7 years ago

In reply to: Tom Lane (#39)

Re: Tid scan improvements

On Sat, 19 Jan 2019 at 05:35, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Edmund Horner <ejrh00@gmail.com> writes:

My patch uses the same path type and executor for all extractable tidquals.

This worked pretty well, but I am finding it difficult to reimplement it in
the new tidpath.c code.

I didn't like that approach to begin with, and would suggest that you go
over to using a separate path type and executor node. I don't think the
amount of commonality for the two cases is all that large, and doing it
as you had it required some ugly ad-hoc conventions about the semantics
of the tidquals list. Where I think this should go is that the tidquals
list still has OR semantics in the existing path type, but you use AND
semantics in the new path type, so that "ctid > ? AND ctid < ?" is just
represented as an implicit-AND list of two simple RestrictInfos.

Thanks for the advice. This approach resembles my first draft, which had a
separate executor type. However, it did have a combined path type, with an
enum TidPathMethod to determine how tidquals was interpreted. At this
point, I think a different path type is clearer, though generation of both
types can live in tidpath.c (just as indxpath.c generates different index
path types).

Now admittedly, this wouldn't give us an efficient way to execute
queries with conditions like "WHERE ctid = X OR (ctid > Y AND ctid < Z)",
but I find myself quite unable to get excited about supporting that.
I see no reason for the new code to worry about any cases more complex
than one or two TID inequalities at top level of the restriction list.

I'm a bit sad to see support for multiple ranges go, though I never saw
such queries as ever being particularly common. (And there was always a
nagging feeling that tidpath.c was beginning to perform feats of boolean
acrobatics out of proportion to its importance. Perhaps in some distant
future, TID quals will become another way of supplying TIDs to a bitmap
heap scan, which would enable complicated boolean queries using both
indexes and TID scans. But that's just musing, not a proposal.)

In the query information given to the path generator, there is no existing
RestrictInfo relating to the whole expression "ctid > ? AND ctid < ?". I
am still learning about RestrictInfos, but my understanding is it doesn't
make sense to have a RestrictInfo for an AND clause, anyway; you're
supposed to have them for the sub-expressions of it.

FWIW, the actual data structure for cases like that is that there's
a RestrictInfo for the whole clause ctid = X OR (ctid > Y AND ctid < Z),
and if you look into its "orclause" field, you will find RestrictInfos
attached to the primitive clauses ctid = X, ctid > Y, ctid < Z. (The
old code in tidpath.c didn't know that, because it'd never been rewritten
since RestrictInfos were invented.) However, I think this new code should
not worry about OR cases at all, but just pull out top-level TID
comparison clauses.

Thanks for the explanation.

And it doesn't seem a good idea to try to create new RestrictInfos in the
path generation just to pass the tidquals back to plan creation.

No, you should avoid that. There are places that assume there's only
one RestrictInfo for any given original clause (or sub-clause).

#41

Andres Freund

andres@anarazel.de

almost 7 years ago

In reply to: Edmund Horner (#40)

Re: Tid scan improvements

Hi,

The commitfest has ended, and you've not updated the patch to address
the feedback yet. Are you planning to do so soon? Otherwise I think we
ought to mark the patch as returned with feedback?

Greetings,

Andres Freund

#42

Edmund Horner

ejrh00@gmail.com

almost 7 years ago

In reply to: Andres Freund (#41)

Re: Tid scan improvements

Hi, my apologies for the delay.

I've finished rebasing and rewriting it for Tom's changes to tidpath.c and
his recommendations for tid range scans, but I then found a bug with cursor
interaction. Specifically, FETCH LAST scans through the whole range, and
then proceeds to scan backwards to get the last row. It worked in both my
very first draft, and in the most recent draft before the changes to
tidpath, but I haven't got it working yet for the new version.

I'm hoping to get that fixed in the next 24 hours, and I'll then post the
new patch.

Edmund

On Sun, 3 Feb 2019 at 23:34, Andres Freund <andres@anarazel.de> wrote:

Hi,

The commitfest has ended, and you've not updated the patch to address
the feedback yet. Are you planning to do so soon? Otherwise I think we
ought to mark the patch as returned with feedback?

Greetings,

Andres Freund

#43

Edmund Horner

ejrh00@gmail.com

almost 7 years ago

In reply to: Edmund Horner (#40)

4 attachment(s)

Re: Tid scan improvements

On Sat, 19 Jan 2019 at 17:04, Edmund Horner <ejrh00@gmail.com> wrote:

On Sat, 19 Jan 2019 at 05:35, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Edmund Horner <ejrh00@gmail.com> writes:

My patch uses the same path type and executor for all extractable tidquals.

This worked pretty well, but I am finding it difficult to reimplement it in
the new tidpath.c code.

I didn't like that approach to begin with, and would suggest that you go
over to using a separate path type and executor node. I don't think the
amount of commonality for the two cases is all that large, and doing it
as you had it required some ugly ad-hoc conventions about the semantics
of the tidquals list. Where I think this should go is that the tidquals
list still has OR semantics in the existing path type, but you use AND
semantics in the new path type, so that "ctid > ? AND ctid < ?" is just
represented as an implicit-AND list of two simple RestrictInfos.

Thanks for the advice. This approach resembles my first draft, which had
a separate executor type. However, it did have a combined path type, with
an enum TidPathMethod to determine how tidquals was interpreted. At this
point, I think a different path type is clearer, though generation of both
types can live in tidpath.c (just as indxpath.c generates different index
path types).

Hi, here's a new set of patches. This one adds a new path type called
TidRangePath and a new execution node called TidRangeScan. I haven't
included any of the patches for adding pathkeys to TidPaths or
TidRangePaths.

1. v6-0001-Add-selectivity-estimate-for-CTID-system-variables.patch
2. v6-0002-Support-backward-scans-over-restricted-ranges-in-hea.patch
3. v6-0003-Support-range-quals-in-Tid-Scan.patch
4. v6-0004-TID-selectivity-reduce-the-density-of-the-last-page-.patch

Patches 1, 2, and 4 are basically unchanged from my previous post. Patch 4
is an optional tweak to the CTID selectivity estimates.

Patch 3 is a substantial rewrite from what I had before. I've checked
David's most recent review and tried to make sure the new code meets his
suggestions where applicable, although there is one spot where I left the
code as "if (tidrangequals) ..." instead of the preferred "if
(tidrangequals != NIL) ...", just for consistency with the surrounding code.

Questions --

1. Tid Range Paths are costed as random_page_cost for the first page, and
seq_page_cost for the remaining pages. It made sense when there
could be multiple non-overlapping ranges. Now that there's only one range,
it might not, but it has the benefit of making Tid Range Scans a little bit
more expensive than Sequential Scans, so that they are less likely to be
picked when a Seq Scan will do just as well. Is there a better cost
formula to use?

2. Is it worth trying to get rid of some of the code duplication between
the TidPath and TidRangePath handling, such as in costsize.c or
createplan.c?

3. TidRangeRecheck (copied from TidRecheck) has an existing comment asking
whether it should actually be performing a check on the returned tuple. It
seems to me that as long as TidRangeNext doesn't return a tuple outside the
requested range, then the check shouldn't be necessary (and we'd simplify
the comment to "nothing to check"). If a range key can change at runtime,
it should never have been included in the TidRangePath. Is my
understanding correct?

4. I'm a little uncomfortable with the way heapam.c changes the scan limits
("--scan->rs_numblocks") as it progresses through the pages. I have the
executor node reset the scan limits after scanning all the tuples, which
seems to work for the tests I have, but I'm using the
heap_setscanlimits feature in a slightly different way from the only
existing use, which is for the one-off scans when building a BRIN index. I
have added some tests for cursor fetches which seem to exercise the code,
but I'd still appreciate close review of how I'm using heapam.

Edmund
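
Editorial sketch: the disk-cost rule described in question 1 can be written out as plain arithmetic. This is a minimal illustration in self-contained C, not the patch's costsize.c code. The function names are hypothetical, the constants are the default values of the seq_page_cost and random_page_cost settings, and all CPU and per-tuple costs are deliberately ignored.

```c
/* Default values of the seq_page_cost and random_page_cost settings. */
#define SEQ_PAGE_COST    1.0
#define RANDOM_PAGE_COST 4.0

/* I/O cost of reading every page with a plain sequential scan. */
static double seqscan_io_cost(double pages)
{
    return pages * SEQ_PAGE_COST;
}

/*
 * I/O cost as described in question 1: the first page of the range is
 * charged as a random fetch, every following page as a sequential one.
 */
static double tidrange_io_cost(double pages)
{
    if (pages <= 0.0)
        return 0.0;
    return RANDOM_PAGE_COST + (pages - 1.0) * SEQ_PAGE_COST;
}
```

Under these assumptions, a range covering all 1000 pages of a table costs 1003 against 1000 for the sequential scan. The gap is always random_page_cost minus seq_page_cost, which is the small penalty the question refers to.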

Attachments:

v6-0001-Add-selectivity-estimate-for-CTID-system-variables.patch (application/octet-stream)

From 23d684ad935d913276072c37180ad83f58447cee Mon Sep 17 00:00:00 2001
From: Edmund Horner <ejrh00@gmail.com>
Date: Fri, 12 Oct 2018 13:36:24 +1300
Subject: [PATCH 1/4] Add selectivity estimate for CTID system variables

Previously, estimates for ItemPointer range quals, such as "ctid <= '(5,7)'",
resorted to the default values of 0.33 for range selectivity, although there was
special-case handling for equality quals like "ctid = '(5,7)'", which used the
appropriate selectivity for distinct items.

This change uses the relation size to estimate the selectivity of a range qual.
---
 src/backend/utils/adt/selfuncs.c | 52 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 52 insertions(+)

diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index fb00504..9bb224d 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -583,6 +583,58 @@ scalarineqsel(PlannerInfo *root, Oid operator, bool isgt, bool iseq,
 
 	if (!HeapTupleIsValid(vardata->statsTuple))
 	{
+		/*
+		 * There are no stats for system columns, but for CTID we can estimate
+		 * based on table size.
+		 */
+		if (vardata->var && IsA(vardata->var, Var) &&
+			((Var *) vardata->var)->varattno == SelfItemPointerAttributeNumber)
+		{
+			ItemPointer itemptr;
+			double		block;
+			double		density;
+
+			/* If the relation's empty, we're going to include all of it. */
+			if (vardata->rel->pages == 0)
+				return 1.0;
+
+			itemptr = (ItemPointer) DatumGetPointer(constval);
+			block = ItemPointerGetBlockNumberNoCheck(itemptr);
+
+			/*
+			 * Determine the average number of tuples per page.  We naively
+			 * assume there will never be any dead tuples or empty space at
+			 * the start or in the middle of the page.  This is likely fine
+			 * for the purposes here.
+			 */
+			density = vardata->rel->tuples / vardata->rel->pages;
+			if (density > 0.0)
+			{
+				OffsetNumber offset = ItemPointerGetOffsetNumberNoCheck(itemptr);
+
+				block += Min(offset / density, 1.0);
+			}
+
+			selec = block / (double) vardata->rel->pages;
+
+			/*
+			 * We'll have one less tuple for "<" and one additional tuple for
+			 * ">=", the latter of which we'll reverse the selectivity for
+			 * below, so we can simply subtract a tuple here.  We can easily
+			 * detect these two cases by iseq being equal to isgt.  They'll
+			 * either both be true or both be false.
+			 */
+			if (iseq == isgt && vardata->rel->tuples >= 1.0)
+				selec -= (1 / vardata->rel->tuples);
+
+			/* Finally, reverse the selectivity for the ">", ">=" case. */
+			if (isgt)
+				selec = 1.0 - selec;
+
+			CLAMP_PROBABILITY(selec);
+			return selec;
+		}
+
 		/* no stats available, so default result */
 		return DEFAULT_INEQ_SEL;
 	}
-- 
2.7.4
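
Editorial sketch: the arithmetic in the hunk above can be tried outside the server. The following self-contained C function mirrors the patch's steps one for one, but takes plain numbers instead of planner structures. The name ctid_ineq_selectivity and its parameter list are hypothetical, and clamp_probability stands in for the CLAMP_PROBABILITY macro.

```c
#include <stdbool.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Stand-in for CLAMP_PROBABILITY: force the value into [0, 1]. */
static double clamp_probability(double p)
{
    if (p < 0.0)
        return 0.0;
    if (p > 1.0)
        return 1.0;
    return p;
}

/*
 * Estimate the fraction of a table matched by "ctid <op> (block_no,offset_no)".
 * isgt is true for ">" and ">=", iseq is true for "<=" and ">=".
 */
static double ctid_ineq_selectivity(double rel_pages, double rel_tuples,
                                    unsigned block_no, unsigned offset_no,
                                    bool isgt, bool iseq)
{
    double block, density, selec;

    /* An empty relation is included in full. */
    if (rel_pages == 0)
        return 1.0;

    block = block_no;

    /* Average tuples per page; the offset adds at most one page's worth. */
    density = rel_tuples / rel_pages;
    if (density > 0.0)
        block += MIN(offset_no / density, 1.0);

    selec = block / rel_pages;

    /* "<" excludes the boundary tuple; ">=" includes it after reversal. */
    if (iseq == isgt && rel_tuples >= 1.0)
        selec -= 1.0 / rel_tuples;

    /* Reverse the estimate for ">" and ">=". */
    if (isgt)
        selec = 1.0 - selec;

    return clamp_probability(selec);
}
```

For a table of 100 pages and 10000 tuples, "ctid < '(50,0)'" comes out just under one half and "ctid >= '(50,0)'" just over, while "ctid <= '(50,0)'" is exactly 0.5.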

v6-0003-Support-range-quals-in-Tid-Scan.patch (application/octet-stream)

From 33ae3016e4e6d3ebb143a5bb143767ec85032928 Mon Sep 17 00:00:00 2001
From: ejrh <ejrh00@gmail.com>
Date: Wed, 30 Jan 2019 10:37:10 +1300
Subject: [PATCH 3/4] Support range quals in Tid Scan

This means queries with expressions such as "ctid >= ? AND ctid < ?" can be
answered by scanning over that part of a table, rather than falling back to a
full SeqScan.
---
 src/backend/commands/explain.c             |  23 ++
 src/backend/executor/Makefile              |   1 +
 src/backend/executor/execAmi.c             |   6 +
 src/backend/executor/execProcnode.c        |  10 +
 src/backend/executor/nodeTidrangescan.c    | 598 +++++++++++++++++++++++++++++
 src/backend/nodes/copyfuncs.c              |  24 ++
 src/backend/nodes/outfuncs.c               |  13 +
 src/backend/optimizer/path/costsize.c      |  96 +++++
 src/backend/optimizer/path/tidpath.c       | 106 ++++-
 src/backend/optimizer/plan/createplan.c    |  98 +++++
 src/backend/optimizer/plan/setrefs.c       |  13 +
 src/backend/optimizer/plan/subselect.c     |   6 +
 src/backend/optimizer/util/pathnode.c      |  29 ++
 src/include/catalog/pg_operator.dat        |   6 +-
 src/include/executor/nodeTidrangescan.h    |  23 ++
 src/include/nodes/execnodes.h              |  22 ++
 src/include/nodes/nodes.h                  |   3 +
 src/include/nodes/pathnodes.h              |  12 +
 src/include/nodes/plannodes.h              |  13 +
 src/include/optimizer/cost.h               |   2 +
 src/include/optimizer/pathnode.h           |   2 +
 src/test/regress/expected/tidrangescan.out | 238 ++++++++++++
 src/test/regress/parallel_schedule         |   2 +-
 src/test/regress/sql/tidrangescan.sql      |  74 ++++
 src/tools/pgindent/typedefs.list           |   5 +
 25 files changed, 1411 insertions(+), 14 deletions(-)
 create mode 100644 src/backend/executor/nodeTidrangescan.c
 create mode 100644 src/include/executor/nodeTidrangescan.h
 create mode 100644 src/test/regress/expected/tidrangescan.out
 create mode 100644 src/test/regress/sql/tidrangescan.sql

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 400f3c9..6a63010 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -933,6 +933,7 @@ ExplainPreScanNode(PlanState *planstate, Bitmapset **rels_used)
 		case T_IndexOnlyScan:
 		case T_BitmapHeapScan:
 		case T_TidScan:
+		case T_TidRangeScan:
 		case T_SubqueryScan:
 		case T_FunctionScan:
 		case T_TableFuncScan:
@@ -1079,6 +1080,9 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_TidScan:
 			pname = sname = "Tid Scan";
 			break;
+		case T_TidRangeScan:
+			pname = sname = "Tid Range Scan";
+			break;
 		case T_SubqueryScan:
 			pname = sname = "Subquery Scan";
 			break;
@@ -1270,6 +1274,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_SampleScan:
 		case T_BitmapHeapScan:
 		case T_TidScan:
+		case T_TidRangeScan:
 		case T_SubqueryScan:
 		case T_FunctionScan:
 		case T_TableFuncScan:
@@ -1691,6 +1696,23 @@ ExplainNode(PlanState *planstate, List *ancestors,
 											   planstate, es);
 			}
 			break;
+		case T_TidRangeScan:
+			{
+				/*
+				 * The tidrangequals list has AND semantics, so be sure to
+				 * show it as an AND condition.
+				 */
+				List	   *tidquals = ((TidRangeScan *) plan)->tidrangequals;
+
+				if (list_length(tidquals) > 1)
+					tidquals = list_make1(make_andclause(tidquals));
+				show_scan_qual(tidquals, "TID Cond", planstate, ancestors, es);
+				show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
+				if (plan->qual)
+					show_instrumentation_count("Rows Removed by Filter", 1,
+											   planstate, es);
+			}
+			break;
 		case T_ForeignScan:
 			show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
 			if (plan->qual)
@@ -2978,6 +3000,7 @@ ExplainTargetRel(Plan *plan, Index rti, ExplainState *es)
 		case T_IndexOnlyScan:
 		case T_BitmapHeapScan:
 		case T_TidScan:
+		case T_TidRangeScan:
 		case T_ForeignScan:
 		case T_CustomScan:
 		case T_ModifyTable:
diff --git a/src/backend/executor/Makefile b/src/backend/executor/Makefile
index cc09895..0152e31 100644
--- a/src/backend/executor/Makefile
+++ b/src/backend/executor/Makefile
@@ -28,6 +28,7 @@ OBJS = execAmi.o execCurrent.o execExpr.o execExprInterp.o \
        nodeValuesscan.o \
        nodeCtescan.o nodeNamedtuplestorescan.o nodeWorktablescan.o \
        nodeGroup.o nodeSubplan.o nodeSubqueryscan.o nodeTidscan.o \
+       nodeTidrangescan.o \
        nodeForeignscan.o nodeWindowAgg.o tstoreReceiver.o tqueue.o spi.o \
        nodeTableFuncscan.o
 
diff --git a/src/backend/executor/execAmi.c b/src/backend/executor/execAmi.c
index 187f892..e85ed61 100644
--- a/src/backend/executor/execAmi.c
+++ b/src/backend/executor/execAmi.c
@@ -51,6 +51,7 @@
 #include "executor/nodeSubplan.h"
 #include "executor/nodeSubqueryscan.h"
 #include "executor/nodeTableFuncscan.h"
+#include "executor/nodeTidrangescan.h"
 #include "executor/nodeTidscan.h"
 #include "executor/nodeUnique.h"
 #include "executor/nodeValuesscan.h"
@@ -198,6 +199,10 @@ ExecReScan(PlanState *node)
 			ExecReScanTidScan((TidScanState *) node);
 			break;
 
+		case T_TidRangeScanState:
+			ExecReScanTidRangeScan((TidRangeScanState *) node);
+			break;
+
 		case T_SubqueryScanState:
 			ExecReScanSubqueryScan((SubqueryScanState *) node);
 			break;
@@ -524,6 +529,7 @@ ExecSupportsBackwardScan(Plan *node)
 
 		case T_SeqScan:
 		case T_TidScan:
+		case T_TidRangeScan:
 		case T_FunctionScan:
 		case T_ValuesScan:
 		case T_CteScan:
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index 4ab2903..46b39d0 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -108,6 +108,7 @@
 #include "executor/nodeSubplan.h"
 #include "executor/nodeSubqueryscan.h"
 #include "executor/nodeTableFuncscan.h"
+#include "executor/nodeTidrangescan.h"
 #include "executor/nodeTidscan.h"
 #include "executor/nodeUnique.h"
 #include "executor/nodeValuesscan.h"
@@ -238,6 +239,11 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
 												   estate, eflags);
 			break;
 
+		case T_TidRangeScan:
+			result = (PlanState *) ExecInitTidRangeScan((TidRangeScan *) node,
+														estate, eflags);
+			break;
+
 		case T_SubqueryScan:
 			result = (PlanState *) ExecInitSubqueryScan((SubqueryScan *) node,
 														estate, eflags);
@@ -632,6 +638,10 @@ ExecEndNode(PlanState *node)
 			ExecEndTidScan((TidScanState *) node);
 			break;
 
+		case T_TidRangeScanState:
+			ExecEndTidRangeScan((TidRangeScanState *) node);
+			break;
+
 		case T_SubqueryScanState:
 			ExecEndSubqueryScan((SubqueryScanState *) node);
 			break;
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
new file mode 100644
index 0000000..163407c
--- /dev/null
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -0,0 +1,598 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeTidrangescan.c
+ *	  Routines to support tid range scans of relations
+ *
+ * Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/executor/nodeTidrangescan.c
+ *
+ *-------------------------------------------------------------------------
+ */
+/*
+ * INTERFACE ROUTINES
+ *
+ *		ExecTidRangeScan		scans a relation using a range of tids
+ *		ExecInitTidRangeScan	creates and initializes state info.
+ *		ExecReScanTidRangeScan	rescans the tid relation.
+ *		ExecEndTidRangeScan		releases all storage.
+ */
+#include "postgres.h"
+
+#include "access/heapam.h"
+#include "access/relscan.h"
+#include "access/sysattr.h"
+#include "catalog/pg_operator.h"
+#include "executor/execdebug.h"
+#include "executor/nodeTidrangescan.h"
+#include "nodes/nodeFuncs.h"
+#include "storage/bufmgr.h"
+#include "utils/rel.h"
+
+
+#define IsCTIDVar(node)  \
+	((node) != NULL && \
+	 IsA((node), Var) && \
+	 ((Var *) (node))->varattno == SelfItemPointerAttributeNumber && \
+	 ((Var *) (node))->varlevelsup == 0)
+
+typedef enum
+{
+	TIDEXPR_UPPER_BOUND,
+	TIDEXPR_LOWER_BOUND
+} TidExprType;
+
+/* one element in TidExpr's opexprs */
+typedef struct TidOpExpr
+{
+	TidExprType exprtype;		/* type of op */
+	ExprState  *exprstate;		/* ExprState for a TID-yielding subexpr */
+	bool		inclusive;		/* whether op is inclusive */
+} TidOpExpr;
+
+/*
+ * For the given 'expr', build and return an appropriate TidOpExpr taking into
+ * account the expr's operator and operand order.
+ */
+static TidOpExpr *
+MakeTidOpExpr(OpExpr *expr, TidRangeScanState *tidstate)
+{
+	Node	   *arg1 = get_leftop((Expr *) expr);
+	Node	   *arg2 = get_rightop((Expr *) expr);
+	ExprState  *exprstate = NULL;
+	bool		invert = false;
+	TidOpExpr  *tidopexpr;
+
+	if (IsCTIDVar(arg1))
+		exprstate = ExecInitExpr((Expr *) arg2, &tidstate->ss.ps);
+	else if (IsCTIDVar(arg2))
+	{
+		exprstate = ExecInitExpr((Expr *) arg1, &tidstate->ss.ps);
+		invert = true;
+	}
+	else
+		elog(ERROR, "could not identify CTID variable");
+
+	tidopexpr = (TidOpExpr *) palloc0(sizeof(TidOpExpr));
+
+	switch (expr->opno)
+	{
+		case TIDLessEqOperator:
+			tidopexpr->inclusive = true;
+			/* fall through */
+		case TIDLessOperator:
+			tidopexpr->exprtype = invert ? TIDEXPR_LOWER_BOUND : TIDEXPR_UPPER_BOUND;
+			break;
+		case TIDGreaterEqOperator:
+			tidopexpr->inclusive = true;
+			/* fall through */
+		case TIDGreaterOperator:
+			tidopexpr->exprtype = invert ? TIDEXPR_UPPER_BOUND : TIDEXPR_LOWER_BOUND;
+			break;
+		default:
+			elog(ERROR, "could not identify CTID expression");
+	}
+
+	tidopexpr->exprstate = exprstate;
+
+	return tidopexpr;
+}
+
+/*
+ * Extract the qual subexpressions that yield TIDs to search for,
+ * and compile them into ExprStates if they're ordinary expressions.
+ */
+static void
+TidExprListCreate(TidRangeScanState *tidrangestate)
+{
+	TidRangeScan *node = (TidRangeScan *) tidrangestate->ss.ps.plan;
+	List	   *tidexprs = NIL;
+	ListCell   *l;
+
+	foreach(l, node->tidrangequals)
+	{
+		OpExpr	   *opexpr = lfirst(l);
+		TidOpExpr  *tidopexpr = MakeTidOpExpr(opexpr, tidrangestate);
+
+		tidexprs = lappend(tidexprs, tidopexpr);
+	}
+
+	tidrangestate->trss_tidexprs = tidexprs;
+}
+
+/*
+ * Set a lower bound tid, taking into account the inclusivity of the bound.
+ * Return true if the bound is valid.
+ */
+static bool
+SetTidLowerBound(ItemPointer tid, bool inclusive, ItemPointer lowerBound)
+{
+	OffsetNumber offset;
+
+	*lowerBound = *tid;
+	offset = ItemPointerGetOffsetNumberNoCheck(tid);
+
+	if (!inclusive)
+	{
+		/* Check if the lower bound is actually in the next block. */
+		if (offset >= MaxOffsetNumber)
+		{
+			BlockNumber block = ItemPointerGetBlockNumberNoCheck(lowerBound);
+
+			/*
+			 * If the lower bound was already at or above the maximum block
+			 * number, then there is no valid range.
+			 */
+			if (block >= MaxBlockNumber)
+				return false;
+
+			ItemPointerSetBlockNumber(lowerBound, block + 1);
+			ItemPointerSetOffsetNumber(lowerBound, 1);
+		}
+		else
+			ItemPointerSetOffsetNumber(lowerBound, OffsetNumberNext(offset));
+	}
+	else if (offset == 0)
+		ItemPointerSetOffsetNumber(lowerBound, 1);
+
+	return true;
+}
+
+/*
+ * Set an upper bound tid, taking into account the inclusivity of the bound.
+ * Return true if the bound is valid.
+ */
+static bool
+SetTidUpperBound(ItemPointer tid, bool inclusive, ItemPointer upperBound)
+{
+	OffsetNumber offset;
+
+	*upperBound = *tid;
+	offset = ItemPointerGetOffsetNumberNoCheck(tid);
+
+	/*
+	 * Since TID offsets start at 1, an inclusive upper bound with offset 0
+	 * can be treated as an exclusive bound.  This has the benefit of
+	 * eliminating that block from the scan range.
+	 */
+	if (inclusive && offset == 0)
+		inclusive = false;
+
+	if (!inclusive)
+	{
+		/* Check if the upper bound is actually in the previous block. */
+		if (offset == 0)
+		{
+			BlockNumber block = ItemPointerGetBlockNumberNoCheck(upperBound);
+
+			/*
+			 * If the upper bound was already in block 0, then there is no
+			 * valid range.
+			 */
+			if (block == 0)
+				return false;
+
+			ItemPointerSetBlockNumber(upperBound, block - 1);
+			ItemPointerSetOffsetNumber(upperBound, MaxOffsetNumber);
+		}
+		else
+			ItemPointerSetOffsetNumber(upperBound, OffsetNumberPrev(offset));
+	}
+
+	return true;
+}
+
+/* ----------------------------------------------------------------
+ *		TidRangeEval
+ *
+ *		Compute the range of TIDs to scan, by evaluating the
+ *		expressions for them.
+ * ----------------------------------------------------------------
+ */
+static void
+TidRangeEval(TidRangeScanState *node)
+{
+	ExprContext *econtext = node->ss.ps.ps_ExprContext;
+	BlockNumber nblocks;
+	ItemPointerData lowerBound;
+	ItemPointerData upperBound;
+	ListCell   *l;
+
+	/*
+	 * We silently discard any TIDs that are out of range at the time of scan
+	 * start.  (Since we hold at least AccessShareLock on the table, it won't
+	 * be possible for someone to truncate away the blocks we intend to
+	 * visit.)
+	 */
+	nblocks = RelationGetNumberOfBlocks(node->ss.ss_currentRelation);
+
+
+	/* The biggest range on an empty table is empty; just skip it. */
+	if (nblocks == 0)
+		return;
+
+	/* Set the lower and upper bound to scan the whole table. */
+	ItemPointerSetBlockNumber(&lowerBound, 0);
+	ItemPointerSetOffsetNumber(&lowerBound, 1);
+	ItemPointerSetBlockNumber(&upperBound, nblocks - 1);
+	ItemPointerSetOffsetNumber(&upperBound, MaxOffsetNumber);
+
+	foreach(l, node->trss_tidexprs)
+	{
+		TidOpExpr  *tidopexpr = (TidOpExpr *) lfirst(l);
+		ItemPointer itemptr;
+		bool		isNull;
+
+		/* Evaluate this bound. */
+		itemptr = (ItemPointer)
+			DatumGetPointer(ExecEvalExprSwitchContext(tidopexpr->exprstate,
+													  econtext,
+													  &isNull));
+
+		/* If the bound is NULL, *nothing* matches the qual. */
+		if (isNull)
+			return;
+
+		if (tidopexpr->exprtype == TIDEXPR_LOWER_BOUND)
+		{
+			ItemPointerData lb;
+
+			if (!SetTidLowerBound(itemptr, tidopexpr->inclusive, &lb))
+				return;
+
+			if (ItemPointerCompare(&lb, &lowerBound) > 0)
+				lowerBound = lb;
+		}
+
+		if (tidopexpr->exprtype == TIDEXPR_UPPER_BOUND)
+		{
+			ItemPointerData ub;
+
+			if (!SetTidUpperBound(itemptr, tidopexpr->inclusive, &ub))
+				return;
+
+			if (ItemPointerCompare(&ub, &upperBound) < 0)
+				upperBound = ub;
+		}
+	}
+
+	/* If the resulting range is not empty, use it. */
+	if (ItemPointerCompare(&lowerBound, &upperBound) <= 0)
+	{
+		node->trss_startBlock = ItemPointerGetBlockNumberNoCheck(&lowerBound);
+		node->trss_endBlock = ItemPointerGetBlockNumberNoCheck(&upperBound);
+		node->trss_startOffset = ItemPointerGetOffsetNumberNoCheck(&lowerBound);
+		node->trss_endOffset = ItemPointerGetOffsetNumberNoCheck(&upperBound);
+	}
+}
+
+/* ----------------------------------------------------------------
+ *		NextInTidRange
+ *
+ *		Fetch the next tuple when scanning a range of TIDs.
+ *
+ *		Since the heap access method may return tuples that are in the scan
+ *		limit, but not within the required TID range, this function will
+ *		check for such tuples and skip over them.
+ * ----------------------------------------------------------------
+ */
+static HeapTuple
+NextInTidRange(TidRangeScanState *node, HeapScanDesc scandesc, ScanDirection direction)
+{
+	HeapTuple	tuple;
+
+	for (;;)
+	{
+		BlockNumber block;
+		OffsetNumber offset;
+
+		tuple = heap_getnext(scandesc, direction);
+		if (!tuple)
+			break;
+
+		/* Check that the tuple is within the required range. */
+		block = ItemPointerGetBlockNumber(&tuple->t_self);
+		offset = ItemPointerGetOffsetNumber(&tuple->t_self);
+
+		/* The tuple should never come from outside the scan limits. */
+		Assert(block >= node->trss_startBlock &&
+			   block <= node->trss_endBlock);
+
+		/*
+		 * If the tuple is in the first block of the range and before the
+		 * first requested offset, then we can either skip it (if scanning
+		 * forward), or end the scan (if scanning backward).
+		 */
+		if (block == node->trss_startBlock && offset < node->trss_startOffset)
+		{
+			if (ScanDirectionIsForward(direction))
+				continue;
+			else
+				tuple = NULL;
+		}
+
+		/* Similarly for the last block, after the last requested offset. */
+		if (block == node->trss_endBlock && offset > node->trss_endOffset)
+		{
+			if (ScanDirectionIsBackward(direction))
+				continue;
+			else
+				tuple = NULL;
+		}
+
+		break;
+	}
+
+	return tuple;
+}
+
+/* ----------------------------------------------------------------
+ *		TidRangeNext
+ *
+ *		Retrieve a tuple from the TidRangeScan node's currentRelation
+ *		using the tids in the TidRangeScanState information.
+ *
+ * ----------------------------------------------------------------
+ */
+static TupleTableSlot *
+TidRangeNext(TidRangeScanState *node)
+{
+	HeapScanDesc scandesc;
+	EState	   *estate;
+	ScanDirection direction;
+	HeapTuple	tuple;
+	TupleTableSlot *slot;
+
+	/*
+	 * extract necessary information from tid range scan node
+	 */
+	scandesc = node->ss.ss_currentScanDesc;
+	estate = node->ss.ps.state;
+	direction = estate->es_direction;
+	slot = node->ss.ss_ScanTupleSlot;
+
+	if (!node->trss_inScan)
+	{
+		BlockNumber blocks_to_scan;
+
+		/* First time through, compute the TID range to be visited */
+		if (node->trss_startBlock == InvalidBlockNumber)
+			TidRangeEval(node);
+
+		if (scandesc == NULL)
+		{
+			scandesc = heap_beginscan_strat(node->ss.ss_currentRelation,
+											estate->es_snapshot,
+											0, NULL,
+											false, false);
+			node->ss.ss_currentScanDesc = scandesc;
+		}
+
+		/* Compute the number of blocks to scan and set the scan limits. */
+		if (node->trss_startBlock == InvalidBlockNumber)
+		{
+			/* If the range is empty, set the scan limits to zero blocks. */
+			node->trss_startBlock = 0;
+			blocks_to_scan = 0;
+		}
+		else
+			blocks_to_scan = node->trss_endBlock - node->trss_startBlock + 1;
+
+		heap_setscanlimits(scandesc, node->trss_startBlock, blocks_to_scan);
+		node->trss_inScan = true;
+	}
+
+	/* Fetch the next tuple. */
+	tuple = NextInTidRange(node, scandesc, direction);
+
+	/*
+	 * If we've exhausted all the tuples in the range, reset the inScan flag.
+	 * This will cause the heap to be rescanned for any subsequent fetches,
+	 * which is important for some cursor operations: for instance, FETCH LAST
+	 * fetches all the tuples in order and then fetches one tuple in reverse.
+	 */
+	if (!tuple)
+		node->trss_inScan = false;
+
+	/*
+	 * save the tuple and the buffer returned to us by the access methods in
+	 * our scan tuple slot and return the slot.  Note also that
+	 * ExecStoreBufferHeapTuple will increment the refcount of the buffer; the
+	 * refcount will not be dropped until the tuple table slot is cleared.
+	 */
+	if (tuple)
+		ExecStoreBufferHeapTuple(tuple, /* tuple to store */
+								 slot,	/* slot to store in */
+								 scandesc->rs_cbuf);	/* buffer associated
+														 * with this tuple */
+	else
+		ExecClearTuple(slot);
+
+	return slot;
+}
+
+/*
+ * TidRangeRecheck -- access method routine to recheck a tuple in EvalPlanQual
+ */
+static bool
+TidRangeRecheck(TidRangeScanState *node, TupleTableSlot *slot)
+{
+	/*
+	 * XXX shouldn't we check here to make sure tuple is in TID range? In
+	 * runtime-key case this is not certain, is it?
+	 */
+	return true;
+}
+
+/* ----------------------------------------------------------------
+ *		ExecTidRangeScan(node)
+ *
+ *		Scans the relation using tids and returns the next qualifying tuple
+ *		in the direction specified.
+ *		We call the ExecScan() routine and pass it the appropriate
+ *		access method functions.
+ *
+ *		Conditions:
+ *		  -- the "cursor" maintained by the AMI is positioned at the tuple
+ *			 returned previously.
+ *
+ *		Initial States:
+ *		  -- the relation indicated is opened for scanning so that the
+ *			 "cursor" is positioned before the first qualifying tuple.
+ *		  -- trss_startBlock is InvalidBlockNumber
+ * ----------------------------------------------------------------
+ */
+static TupleTableSlot *
+ExecTidRangeScan(PlanState *pstate)
+{
+	TidRangeScanState *node = castNode(TidRangeScanState, pstate);
+
+	return ExecScan(&node->ss,
+					(ExecScanAccessMtd) TidRangeNext,
+					(ExecScanRecheckMtd) TidRangeRecheck);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecReScanTidRangeScan(node)
+ * ----------------------------------------------------------------
+ */
+void
+ExecReScanTidRangeScan(TidRangeScanState *node)
+{
+	HeapScanDesc scan = node->ss.ss_currentScanDesc;
+
+	if (scan != NULL)
+		heap_rescan(scan,		/* scan desc */
+					NULL);		/* new scan keys */
+
+	/* mark scan as not in progress, and TID range as not computed yet */
+	node->trss_inScan = false;
+	node->trss_startBlock = InvalidBlockNumber;
+
+	ExecScanReScan(&node->ss);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecEndTidRangeScan
+ *
+ *		Releases any storage allocated through C routines.
+ *		Returns nothing.
+ * ----------------------------------------------------------------
+ */
+void
+ExecEndTidRangeScan(TidRangeScanState *node)
+{
+	HeapScanDesc scan = node->ss.ss_currentScanDesc;
+
+	/*
+	 * Free the exprcontext
+	 */
+	ExecFreeExprContext(&node->ss.ps);
+
+	/*
+	 * clear out tuple table slots
+	 */
+	if (node->ss.ps.ps_ResultTupleSlot)
+		ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
+	ExecClearTuple(node->ss.ss_ScanTupleSlot);
+
+	/* close heap scan */
+	if (scan != NULL)
+		heap_endscan(scan);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecInitTidRangeScan
+ *
+ *		Initializes the tid range scan's state information, compiles
+ *		the TID bound expressions, and opens the scan relation.
+ *
+ *		Parameters:
+ *		  node: TidRangeScan node produced by the planner.
+ *		  estate: the execution state initialized in InitPlan.
+ * ----------------------------------------------------------------
+ */
+TidRangeScanState *
+ExecInitTidRangeScan(TidRangeScan *node, EState *estate, int eflags)
+{
+	TidRangeScanState *tidrangestate;
+	Relation	currentRelation;
+
+	/*
+	 * create state structure
+	 */
+	tidrangestate = makeNode(TidRangeScanState);
+	tidrangestate->ss.ps.plan = (Plan *) node;
+	tidrangestate->ss.ps.state = estate;
+	tidrangestate->ss.ps.ExecProcNode = ExecTidRangeScan;
+
+	/*
+	 * Miscellaneous initialization
+	 *
+	 * create expression context for node
+	 */
+	ExecAssignExprContext(estate, &tidrangestate->ss.ps);
+
+	/*
+	 * mark scan as not in progress, and TID range as not computed yet
+	 */
+	tidrangestate->trss_inScan = false;
+	tidrangestate->trss_startBlock = InvalidBlockNumber;
+
+	/*
+	 * open the scan relation
+	 */
+	currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+
+	tidrangestate->ss.ss_currentRelation = currentRelation;
+	tidrangestate->ss.ss_currentScanDesc = NULL;	/* begun lazily in TidRangeNext */
+
+	/*
+	 * get the scan type from the relation descriptor.
+	 */
+	ExecInitScanTupleSlot(estate, &tidrangestate->ss,
+						  RelationGetDescr(currentRelation),
+						  &TTSOpsBufferHeapTuple);
+
+	/*
+	 * Initialize result type and projection.
+	 */
+	ExecInitResultTypeTL(&tidrangestate->ss.ps);
+	ExecAssignScanProjectionInfo(&tidrangestate->ss);
+
+	/*
+	 * initialize child expressions
+	 */
+	tidrangestate->ss.ps.qual =
+		ExecInitQual(node->scan.plan.qual, (PlanState *) tidrangestate);
+
+	TidExprListCreate(tidrangestate);
+
+	/*
+	 * all done.
+	 */
+	return tidrangestate;
+}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 02a87b7..4127d6d 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -585,6 +585,27 @@ _copyTidScan(const TidScan *from)
 }
 
 /*
+ * _copyTidRangeScan
+ */
+static TidRangeScan *
+_copyTidRangeScan(const TidRangeScan *from)
+{
+	TidRangeScan *newnode = makeNode(TidRangeScan);
+
+	/*
+	 * copy node superclass fields
+	 */
+	CopyScanFields((const Scan *) from, (Scan *) newnode);
+
+	/*
+	 * copy remainder of node
+	 */
+	COPY_NODE_FIELD(tidrangequals);
+
+	return newnode;
+}
+
+/*
  * _copySubqueryScan
  */
 static SubqueryScan *
@@ -4839,6 +4860,9 @@ copyObjectImpl(const void *from)
 		case T_TidScan:
 			retval = _copyTidScan(from);
 			break;
+		case T_TidRangeScan:
+			retval = _copyTidRangeScan(from);
+			break;
 		case T_SubqueryScan:
 			retval = _copySubqueryScan(from);
 			break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index e36d8b2..cc84b68 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -608,6 +608,16 @@ _outTidScan(StringInfo str, const TidScan *node)
 }
 
 static void
+_outTidRangeScan(StringInfo str, const TidRangeScan *node)
+{
+	WRITE_NODE_TYPE("TIDRANGESCAN");
+
+	_outScanInfo(str, (const Scan *) node);
+
+	WRITE_NODE_FIELD(tidrangequals);
+}
+
+static void
 _outSubqueryScan(StringInfo str, const SubqueryScan *node)
 {
 	WRITE_NODE_TYPE("SUBQUERYSCAN");
@@ -3669,6 +3679,9 @@ outNode(StringInfo str, const void *obj)
 			case T_TidScan:
 				_outTidScan(str, obj);
 				break;
+			case T_TidRangeScan:
+				_outTidRangeScan(str, obj);
+				break;
 			case T_SubqueryScan:
 				_outSubqueryScan(str, obj);
 				break;
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index b8d406f..d81b36b 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -1276,6 +1276,102 @@ cost_tidscan(Path *path, PlannerInfo *root,
 }
 
 /*
+ * cost_tidrangescan
+ *	  Determines and returns the cost of scanning a relation using a range of
+ *	  TIDs.
+ *
+ * 'baserel' is the relation to be scanned
+ * 'tidrangequals' is the list of TID-checkable range quals
+ * 'param_info' is the ParamPathInfo if this is a parameterized path, else NULL
+ */
+void
+cost_tidrangescan(Path *path, PlannerInfo *root,
+				  RelOptInfo *baserel, List *tidrangequals, ParamPathInfo *param_info)
+{
+	Selectivity selectivity;
+	double		pages;
+	Cost		startup_cost = 0;
+	Cost		run_cost = 0;
+	QualCost	qpqual_cost;
+	Cost		cpu_per_tuple;
+	QualCost	tid_qual_cost;
+	double		ntuples;
+	double		nrandompages;
+	double		nseqpages;
+	double		spc_random_page_cost;
+	double		spc_seq_page_cost;
+
+	/* Should only be applied to base relations */
+	Assert(baserel->relid > 0);
+	Assert(baserel->rtekind == RTE_RELATION);
+
+	/* Mark the path with the correct row estimate */
+	if (param_info)
+		path->rows = param_info->ppi_rows;
+	else
+		path->rows = baserel->rows;
+
+	/* Count how many tuples and pages we expect to scan */
+	selectivity = clauselist_selectivity(root, tidrangequals, baserel->relid,
+										 JOIN_INNER, NULL);
+	pages = ceil(selectivity * baserel->pages);
+
+	if (pages <= 0.0)
+		pages = 1.0;
+
+	/*
+	 * The first page in a range requires a random seek, but each subsequent
+	 * page is just a normal sequential page read. NOTE: it's desirable for
+	 * Tid Range Scans to cost more than the equivalent Sequential Scans,
+	 * because Seq Scans have some performance advantages such as scan
+	 * synchronization and parallelizability, and we'd prefer one of them to
+	 * be picked unless a Tid Range Scan really is better.
+	 */
+	ntuples = selectivity * baserel->tuples;
+	nseqpages = pages - 1.0;
+	nrandompages = 1.0;
+
+	if (!enable_tidscan)
+		startup_cost += disable_cost;
+
+	/*
+	 * The TID qual expressions will be computed once, any other baserestrict
+	 * quals once per retrieved tuple.
+	 */
+	cost_qual_eval(&tid_qual_cost, tidrangequals, root);
+
+	/* fetch estimated page cost for tablespace containing table */
+	get_tablespace_page_costs(baserel->reltablespace,
+							  &spc_random_page_cost,
+							  &spc_seq_page_cost);
+
+	/* disk costs */
+	run_cost += spc_random_page_cost * nrandompages + spc_seq_page_cost * nseqpages;
+
+	/* Add scanning CPU costs */
+	get_restriction_qual_cost(root, baserel, param_info, &qpqual_cost);
+
+	/*
+	 * XXX currently we assume TID quals are a subset of qpquals at this
+	 * point; they will be removed (if possible) when we create the plan, so
+	 * we subtract their cost from the total qpqual cost.  (If the TID quals
+	 * can't be removed, this is a mistake and we're going to underestimate
+	 * the CPU cost a bit.)
+	 */
+	startup_cost += qpqual_cost.startup + tid_qual_cost.per_tuple;
+	cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple -
+		tid_qual_cost.per_tuple;
+	run_cost += cpu_per_tuple * ntuples;
+
+	/* tlist eval costs are paid per output row, not per tuple scanned */
+	startup_cost += path->pathtarget->cost.startup;
+	run_cost += path->pathtarget->cost.per_tuple * path->rows;
+
+	path->startup_cost = startup_cost;
+	path->total_cost = startup_cost + run_cost;
+}
+
+/*
  * cost_subqueryscan
  *	  Determines and returns the cost of scanning a subquery RTE.
  *
diff --git a/src/backend/optimizer/path/tidpath.c b/src/backend/optimizer/path/tidpath.c
index 466e996..533e936 100644
--- a/src/backend/optimizer/path/tidpath.c
+++ b/src/backend/optimizer/path/tidpath.c
@@ -2,9 +2,9 @@
  *
  * tidpath.c
  *	  Routines to determine which TID conditions are usable for scanning
- *	  a given relation, and create TidPaths accordingly.
+ *	  a given relation, and create TidPaths and TidRangePaths accordingly.
  *
- * What we are looking for here is WHERE conditions of the form
+ * For TidPaths, we look for WHERE conditions of the form
  * "CTID = pseudoconstant", which can be implemented by just fetching
  * the tuple directly via heap_fetch().  We can also handle OR'd conditions
  * such as (CTID = const1) OR (CTID = const2), as well as ScalarArrayOpExpr
@@ -23,6 +23,9 @@
  * a function, but in practice it works better to keep the special node
  * representation all the way through to execution.
  *
+ * Additionally, TidRangePaths may be created for conditions of the form
+ * "CTID relop pseudoconstant", where relop is one of >,>=,<,<=, and
+ * AND-clauses composed of such conditions.
  *
  * Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
@@ -63,14 +66,14 @@ IsCTIDVar(Var *var, RelOptInfo *rel)
 
 /*
  * Check to see if a RestrictInfo is of the form
- *		CTID = pseudoconstant
+ *		CTID OP pseudoconstant
  * or
- *		pseudoconstant = CTID
- * where the CTID Var belongs to relation "rel", and nothing on the
- * other side of the clause does.
+ *		pseudoconstant OP CTID
+ * where OP is a binary operation, the CTID Var belongs to relation "rel",
+ * and nothing on the other side of the clause does.
  */
 static bool
-IsTidEqualClause(RestrictInfo *rinfo, RelOptInfo *rel)
+IsTidBinaryClause(RestrictInfo *rinfo, RelOptInfo *rel)
 {
 	OpExpr	   *node;
 	Node	   *arg1,
@@ -83,10 +86,9 @@ IsTidEqualClause(RestrictInfo *rinfo, RelOptInfo *rel)
 		return false;
 	node = (OpExpr *) rinfo->clause;
 
-	/* Operator must be tideq */
-	if (node->opno != TIDEqualOperator)
+	/* Operator must take two arguments */
+	if (list_length(node->args) != 2)
 		return false;
-	Assert(list_length(node->args) == 2);
 	arg1 = linitial(node->args);
 	arg2 = lsecond(node->args);
 
@@ -118,6 +120,44 @@ IsTidEqualClause(RestrictInfo *rinfo, RelOptInfo *rel)
 
 /*
  * Check to see if a RestrictInfo is of the form
+ *		CTID = pseudoconstant
+ * or
+ *		pseudoconstant = CTID
+ * where the CTID Var belongs to relation "rel", and nothing on the
+ * other side of the clause does.
+ */
+static bool
+IsTidEqualClause(RestrictInfo *rinfo, RelOptInfo *rel)
+{
+	if (!IsTidBinaryClause(rinfo, rel))
+		return false;
+	return ((OpExpr *) rinfo->clause)->opno == TIDEqualOperator;
+}
+
+/*
+ * Check to see if a RestrictInfo is of the form
+ *		CTID OP pseudoconstant
+ * or
+ *		pseudoconstant OP CTID
+ * where OP is a range operator such as <, <=, >, or >=, the CTID Var belongs
+ * to relation "rel", and nothing on the other side of the clause does.
+ */
+static bool
+IsTidRangeClause(RestrictInfo *rinfo, RelOptInfo *rel)
+{
+	Oid			opno;
+
+	if (!IsTidBinaryClause(rinfo, rel))
+		return false;
+	opno = ((OpExpr *) rinfo->clause)->opno;
+	return opno == TIDLessOperator ||
+		opno == TIDLessEqOperator ||
+		opno == TIDGreaterOperator ||
+		opno == TIDGreaterEqOperator;
+}
+
+/*
+ * Check to see if a RestrictInfo is of the form
  *		CTID = ANY (pseudoconstant_array)
  * where the CTID Var belongs to relation "rel", and nothing on the
  * other side of the clause does.
@@ -302,6 +342,32 @@ TidQualFromRestrictInfoList(List *rlist, RelOptInfo *rel)
 }
 
 /*
+ * Extract a set of CTID range conditions from implicit-AND List of RestrictInfos
+ *
+ * Returns a List of CTID range qual RestrictInfos for the specified rel
+ * (with implicit AND semantics across the list), or NIL if there are no
+ * usable conditions.
+ */
+static List *
+TidRangeQualFromRestrictInfoList(List *rlist, RelOptInfo *rel)
+{
+	List	   *rlst = NIL;
+	ListCell   *l;
+
+	foreach(l, rlist)
+	{
+		RestrictInfo *rinfo = lfirst_node(RestrictInfo, l);
+
+		if (IsTidRangeClause(rinfo, rel))
+		{
+			rlst = lappend(rlst, rinfo);
+		}
+	}
+
+	return rlst;
+}
+
+/*
  * Given a list of join clauses involving our rel, create a parameterized
  * TidPath for each one that is a suitable TidEqual clause.
  *
@@ -385,6 +451,7 @@ void
 create_tidscan_paths(PlannerInfo *root, RelOptInfo *rel)
 {
 	List	   *tidquals;
+	List	   *tidrangequals;
 
 	/*
 	 * If any suitable quals exist in the rel's baserestrict list, generate a
@@ -405,6 +472,25 @@ create_tidscan_paths(PlannerInfo *root, RelOptInfo *rel)
 	}
 
 	/*
+	 * If there are range quals in the baserestrict list, generate a
+	 * TidRangePath.
+	 */
+	tidrangequals = TidRangeQualFromRestrictInfoList(rel->baserestrictinfo, rel);
+
+	if (tidrangequals)
+	{
+		/*
+		 * This path uses no join clauses, but it could still have required
+		 * parameterization due to LATERAL refs in its tlist.
+		 */
+		Relids		required_outer = rel->lateral_relids;
+
+		add_path(rel, (Path *) create_tidrangescan_path(root, rel,
+														tidrangequals,
+														required_outer));
+	}
+
+	/*
 	 * Try to generate parameterized TidPaths using equality clauses extracted
 	 * from EquivalenceClasses.  (This is important since simple "t1.ctid =
 	 * t2.ctid" clauses will turn into ECs.)
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 8f7f1f9..3ba1b3b 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -125,6 +125,8 @@ static Plan *create_bitmap_subplan(PlannerInfo *root, Path *bitmapqual,
 static void bitmap_subplan_mark_shared(Plan *plan);
 static TidScan *create_tidscan_plan(PlannerInfo *root, TidPath *best_path,
 					List *tlist, List *scan_clauses);
+static TidRangeScan *create_tidrangescan_plan(PlannerInfo *root, TidRangePath *best_path,
+						 List *tlist, List *scan_clauses);
 static SubqueryScan *create_subqueryscan_plan(PlannerInfo *root,
 						 SubqueryScanPath *best_path,
 						 List *tlist, List *scan_clauses);
@@ -184,6 +186,8 @@ static BitmapHeapScan *make_bitmap_heapscan(List *qptlist,
 					 Index scanrelid);
 static TidScan *make_tidscan(List *qptlist, List *qpqual, Index scanrelid,
 			 List *tidquals);
+static TidRangeScan *make_tidrangescan(List *qptlist, List *qpqual, Index scanrelid,
+				  List *tidrangequals);
 static SubqueryScan *make_subqueryscan(List *qptlist,
 				  List *qpqual,
 				  Index scanrelid,
@@ -368,6 +372,7 @@ create_plan_recurse(PlannerInfo *root, Path *best_path, int flags)
 		case T_IndexOnlyScan:
 		case T_BitmapHeapScan:
 		case T_TidScan:
+		case T_TidRangeScan:
 		case T_SubqueryScan:
 		case T_FunctionScan:
 		case T_TableFuncScan:
@@ -656,6 +661,13 @@ create_scan_plan(PlannerInfo *root, Path *best_path, int flags)
 												scan_clauses);
 			break;
 
+		case T_TidRangeScan:
+			plan = (Plan *) create_tidrangescan_plan(root,
+													 (TidRangePath *) best_path,
+													 tlist,
+													 scan_clauses);
+			break;
+
 		case T_SubqueryScan:
 			plan = (Plan *) create_subqueryscan_plan(root,
 													 (SubqueryScanPath *) best_path,
@@ -3196,6 +3208,73 @@ create_tidscan_plan(PlannerInfo *root, TidPath *best_path,
 }
 
 /*
+ * create_tidrangescan_plan
+ *	 Returns a tidrangescan plan for the base relation scanned by 'best_path'
+ *	 with restriction clauses 'scan_clauses' and targetlist 'tlist'.
+ */
+static TidRangeScan *
+create_tidrangescan_plan(PlannerInfo *root, TidRangePath *best_path,
+						 List *tlist, List *scan_clauses)
+{
+	TidRangeScan *scan_plan;
+	Index		scan_relid = best_path->path.parent->relid;
+	List	   *tidrangequals = best_path->tidrangequals;
+
+	/* it should be a base rel... */
+	Assert(scan_relid > 0);
+	Assert(best_path->path.parent->rtekind == RTE_RELATION);
+
+	/*
+	 * The qpqual list must contain all restrictions not enforced by the
+	 * tidrangequals list.  tidrangequals has AND semantics, so we can
+	 * simply remove any qual that appears in it.
+	 */
+	{
+		List	   *qpqual = NIL;
+		ListCell   *l;
+
+		foreach(l, scan_clauses)
+		{
+			RestrictInfo *rinfo = lfirst_node(RestrictInfo, l);
+
+			if (rinfo->pseudoconstant)
+				continue;		/* we may drop pseudoconstants here */
+			if (list_member_ptr(tidrangequals, rinfo))
+				continue;		/* simple duplicate */
+			if (is_redundant_derived_clause(rinfo, tidrangequals))
+				continue;		/* derived from same EquivalenceClass */
+			qpqual = lappend(qpqual, rinfo);
+		}
+		scan_clauses = qpqual;
+	}
+
+	/* Sort clauses into best execution order */
+	scan_clauses = order_qual_clauses(root, scan_clauses);
+
+	/* Reduce RestrictInfo lists to bare expressions; ignore pseudoconstants */
+	tidrangequals = extract_actual_clauses(tidrangequals, false);
+	scan_clauses = extract_actual_clauses(scan_clauses, false);
+
+	/* Replace any outer-relation variables with nestloop params */
+	if (best_path->path.param_info)
+	{
+		tidrangequals = (List *)
+			replace_nestloop_params(root, (Node *) tidrangequals);
+		scan_clauses = (List *)
+			replace_nestloop_params(root, (Node *) scan_clauses);
+	}
+
+	scan_plan = make_tidrangescan(tlist,
+								  scan_clauses,
+								  scan_relid,
+								  tidrangequals);
+
+	copy_generic_path_info(&scan_plan->scan.plan, &best_path->path);
+
+	return scan_plan;
+}
+
+/*
  * create_subqueryscan_plan
  *	 Returns a subqueryscan plan for the base relation scanned by 'best_path'
  *	 with restriction clauses 'scan_clauses' and targetlist 'tlist'.
@@ -5143,6 +5222,25 @@ make_tidscan(List *qptlist,
 	return node;
 }
 
+static TidRangeScan *
+make_tidrangescan(List *qptlist,
+				  List *qpqual,
+				  Index scanrelid,
+				  List *tidrangequals)
+{
+	TidRangeScan *node = makeNode(TidRangeScan);
+	Plan	   *plan = &node->scan.plan;
+
+	plan->targetlist = qptlist;
+	plan->qual = qpqual;
+	plan->lefttree = NULL;
+	plan->righttree = NULL;
+	node->scan.scanrelid = scanrelid;
+	node->tidrangequals = tidrangequals;
+
+	return node;
+}
+
 static SubqueryScan *
 make_subqueryscan(List *qptlist,
 				  List *qpqual,
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 0213a37..0d208e9 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -537,6 +537,19 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 					fix_scan_list(root, splan->tidquals, rtoffset);
 			}
 			break;
+		case T_TidRangeScan:
+			{
+				TidRangeScan *splan = (TidRangeScan *) plan;
+
+				splan->scan.scanrelid += rtoffset;
+				splan->scan.plan.targetlist =
+					fix_scan_list(root, splan->scan.plan.targetlist, rtoffset);
+				splan->scan.plan.qual =
+					fix_scan_list(root, splan->scan.plan.qual, rtoffset);
+				splan->tidrangequals =
+					fix_scan_list(root, splan->tidrangequals, rtoffset);
+			}
+			break;
 		case T_SubqueryScan:
 			/* Needs special treatment, see comments below */
 			return set_subqueryscan_references(root,
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 915c6d0..c66c18a 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -2064,6 +2064,12 @@ finalize_plan(PlannerInfo *root, Plan *plan,
 			context.paramids = bms_add_members(context.paramids, scan_params);
 			break;
 
+		case T_TidRangeScan:
+			finalize_primnode((Node *) ((TidRangeScan *) plan)->tidrangequals,
+							  &context);
+			context.paramids = bms_add_members(context.paramids, scan_params);
+			break;
+
 		case T_SubqueryScan:
 			{
 				SubqueryScan *sscan = (SubqueryScan *) plan;
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index b57de6b..e6ee83d 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1210,6 +1210,35 @@ create_tidscan_path(PlannerInfo *root, RelOptInfo *rel, List *tidquals,
 }
 
 /*
+ * create_tidrangescan_path
+ *	  Creates a path corresponding to a scan by a range of TIDs, returning
+ *	  the pathnode.
+ */
+TidRangePath *
+create_tidrangescan_path(PlannerInfo *root, RelOptInfo *rel, List *tidrangequals,
+						 Relids required_outer)
+{
+	TidRangePath *pathnode = makeNode(TidRangePath);
+
+	pathnode->path.pathtype = T_TidRangeScan;
+	pathnode->path.parent = rel;
+	pathnode->path.pathtarget = rel->reltarget;
+	pathnode->path.param_info = get_baserel_parampathinfo(root, rel,
+														  required_outer);
+	pathnode->path.parallel_aware = false;
+	pathnode->path.parallel_safe = rel->consider_parallel;
+	pathnode->path.parallel_workers = 0;
+	pathnode->path.pathkeys = NIL;	/* always unordered */
+
+	pathnode->tidrangequals = tidrangequals;
+
+	cost_tidrangescan(&pathnode->path, root, rel, tidrangequals,
+					  pathnode->path.param_info);
+
+	return pathnode;
+}
+
+/*
  * create_append_path
  *	  Creates a path corresponding to an Append plan, returning the
  *	  pathnode.
diff --git a/src/include/catalog/pg_operator.dat b/src/include/catalog/pg_operator.dat
index 06aec07..fd642af 100644
--- a/src/include/catalog/pg_operator.dat
+++ b/src/include/catalog/pg_operator.dat
@@ -216,15 +216,15 @@
   oprname => '<', oprleft => 'tid', oprright => 'tid', oprresult => 'bool',
   oprcom => '>(tid,tid)', oprnegate => '>=(tid,tid)', oprcode => 'tidlt',
   oprrest => 'scalarltsel', oprjoin => 'scalarltjoinsel' },
-{ oid => '2800', descr => 'greater than',
+{ oid => '2800', oid_symbol => 'TIDGreaterOperator', descr => 'greater than',
   oprname => '>', oprleft => 'tid', oprright => 'tid', oprresult => 'bool',
   oprcom => '<(tid,tid)', oprnegate => '<=(tid,tid)', oprcode => 'tidgt',
   oprrest => 'scalargtsel', oprjoin => 'scalargtjoinsel' },
-{ oid => '2801', descr => 'less than or equal',
+{ oid => '2801', oid_symbol => 'TIDLessEqOperator', descr => 'less than or equal',
   oprname => '<=', oprleft => 'tid', oprright => 'tid', oprresult => 'bool',
   oprcom => '>=(tid,tid)', oprnegate => '>(tid,tid)', oprcode => 'tidle',
   oprrest => 'scalarlesel', oprjoin => 'scalarlejoinsel' },
-{ oid => '2802', descr => 'greater than or equal',
+{ oid => '2802', oid_symbol => 'TIDGreaterEqOperator', descr => 'greater than or equal',
   oprname => '>=', oprleft => 'tid', oprright => 'tid', oprresult => 'bool',
   oprcom => '<=(tid,tid)', oprnegate => '<(tid,tid)', oprcode => 'tidge',
   oprrest => 'scalargesel', oprjoin => 'scalargejoinsel' },
diff --git a/src/include/executor/nodeTidrangescan.h b/src/include/executor/nodeTidrangescan.h
new file mode 100644
index 0000000..cff8790
--- /dev/null
+++ b/src/include/executor/nodeTidrangescan.h
@@ -0,0 +1,23 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeTidrangescan.h
+ *
+ *
+ *
+ * Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/executor/nodeTidrangescan.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef NODETIDRANGESCAN_H
+#define NODETIDRANGESCAN_H
+
+#include "nodes/execnodes.h"
+
+extern TidRangeScanState *ExecInitTidRangeScan(TidRangeScan *node, EState *estate, int eflags);
+extern void ExecEndTidRangeScan(TidRangeScanState *node);
+extern void ExecReScanTidRangeScan(TidRangeScanState *node);
+
+#endif							/* NODETIDRANGESCAN_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index e22fcc0..75bffed 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1559,6 +1559,28 @@ typedef struct TidScanState
 } TidScanState;
 
 /* ----------------
+ *	 TidRangeScanState information
+ *
+ *		trss_tidexprs		list of TidExpr structs (see nodeTidscan.c)
+ *		trss_startBlock		first block to scan
+ *		trss_endBlock		last block to scan (inclusive)
+ *		trss_startOffset	first offset in first block to scan
+ *		trss_endOffset		last offset in last block to scan (inclusive)
+ *		trss_inScan			is a scan currently in progress?
+ * ----------------
+ */
+typedef struct TidRangeScanState
+{
+	ScanState	ss;				/* its first field is NodeTag */
+	List	   *trss_tidexprs;
+	BlockNumber trss_startBlock;
+	BlockNumber trss_endBlock;
+	OffsetNumber trss_startOffset;
+	OffsetNumber trss_endOffset;
+	bool		trss_inScan;
+} TidRangeScanState;
+
+/* ----------------
  *	 SubqueryScanState information
  *
  *		SubqueryScanState is used for scanning a sub-query in the range table.
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index e215ad4..284d506 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -59,6 +59,7 @@ typedef enum NodeTag
 	T_BitmapIndexScan,
 	T_BitmapHeapScan,
 	T_TidScan,
+	T_TidRangeScan,
 	T_SubqueryScan,
 	T_FunctionScan,
 	T_ValuesScan,
@@ -115,6 +116,7 @@ typedef enum NodeTag
 	T_BitmapIndexScanState,
 	T_BitmapHeapScanState,
 	T_TidScanState,
+	T_TidRangeScanState,
 	T_SubqueryScanState,
 	T_FunctionScanState,
 	T_TableFuncScanState,
@@ -229,6 +231,7 @@ typedef enum NodeTag
 	T_BitmapAndPath,
 	T_BitmapOrPath,
 	T_TidPath,
+	T_TidRangePath,
 	T_SubqueryScanPath,
 	T_ForeignPath,
 	T_CustomPath,
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index d264bdf..2396120 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -1244,6 +1244,18 @@ typedef struct TidPath
 } TidPath;
 
 /*
+ * TidRangePath represents a scan by a contiguous range of TIDs
+ *
+ * tidrangequals is an implicitly AND'ed list of qual expressions of the form
+ * "CTID relop pseudoconstant", where relop is one of >,>=,<,<=.
+ */
+typedef struct TidRangePath
+{
+	Path		path;
+	List	   *tidrangequals;
+} TidRangePath;
+
+/*
  * SubqueryScanPath represents a scan of an unflattened subquery-in-FROM
  *
  * Note that the subpath comes from a different planning domain; for example
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index ea37bda..e73f98f 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -489,6 +489,19 @@ typedef struct TidScan
 } TidScan;
 
 /* ----------------
+ *		tid range scan node
+ *
+ * tidrangequals is an implicitly AND'ed list of qual expressions of the form
+ * "CTID relop pseudoconstant", where relop is one of >,>=,<,<=.
+ * ----------------
+ */
+typedef struct TidRangeScan
+{
+	Scan		scan;
+	List	   *tidrangequals;	/* qual(s) involving CTID op something */
+} TidRangeScan;
+
+/* ----------------
  *		subquery scan node
  *
  * SubqueryScan is for scanning the output of a sub-query in the range table.
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index ac6de0f..e534fb8 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -82,6 +82,8 @@ extern void cost_bitmap_or_node(BitmapOrPath *path, PlannerInfo *root);
 extern void cost_bitmap_tree_node(Path *path, Cost *cost, Selectivity *selec);
 extern void cost_tidscan(Path *path, PlannerInfo *root,
 			 RelOptInfo *baserel, List *tidquals, ParamPathInfo *param_info);
+extern void cost_tidrangescan(Path *path, PlannerInfo *root,
+				  RelOptInfo *baserel, List *tidquals, ParamPathInfo *param_info);
 extern void cost_subqueryscan(SubqueryScanPath *path, PlannerInfo *root,
 				  RelOptInfo *baserel, ParamPathInfo *param_info);
 extern void cost_functionscan(Path *path, PlannerInfo *root,
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 601f5ab..c7bd88b 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -64,6 +64,8 @@ extern BitmapOrPath *create_bitmap_or_path(PlannerInfo *root,
 					  List *bitmapquals);
 extern TidPath *create_tidscan_path(PlannerInfo *root, RelOptInfo *rel,
 					List *tidquals, Relids required_outer);
+extern TidRangePath *create_tidrangescan_path(PlannerInfo *root, RelOptInfo *rel,
+						 List *tidrangequals, Relids required_outer);
 extern AppendPath *create_append_path(PlannerInfo *root, RelOptInfo *rel,
 				   List *subpaths, List *partial_subpaths,
 				   Relids required_outer,
diff --git a/src/test/regress/expected/tidrangescan.out b/src/test/regress/expected/tidrangescan.out
new file mode 100644
index 0000000..fbe961b
--- /dev/null
+++ b/src/test/regress/expected/tidrangescan.out
@@ -0,0 +1,238 @@
+-- tests for tidrangescans
+CREATE TABLE tidrangescan(id integer, data text);
+INSERT INTO tidrangescan SELECT i,repeat('x', 100) FROM generate_series(1,1000) AS s(i);
+DELETE FROM tidrangescan WHERE substring(ctid::text from ',(\d+)\)')::integer > 10 OR substring(ctid::text from '\((\d+),')::integer >= 10;;
+VACUUM tidrangescan;
+-- range scans with upper bound
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid < '(1,0)';
+            QUERY PLAN             
+-----------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: (ctid < '(1,0)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid < '(1,0)';
+  ctid  
+--------
+ (0,1)
+ (0,2)
+ (0,3)
+ (0,4)
+ (0,5)
+ (0,6)
+ (0,7)
+ (0,8)
+ (0,9)
+ (0,10)
+(10 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid <= '(1,5)';
+             QUERY PLAN             
+------------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: (ctid <= '(1,5)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid <= '(1,5)';
+  ctid  
+--------
+ (0,1)
+ (0,2)
+ (0,3)
+ (0,4)
+ (0,5)
+ (0,6)
+ (0,7)
+ (0,8)
+ (0,9)
+ (0,10)
+ (1,1)
+ (1,2)
+ (1,3)
+ (1,4)
+ (1,5)
+(15 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid < '(0,0)';
+            QUERY PLAN             
+-----------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: (ctid < '(0,0)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid < '(0,0)';
+ ctid 
+------
+(0 rows)
+
+-- range scans with lower bound
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid > '(9,8)';
+            QUERY PLAN             
+-----------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: (ctid > '(9,8)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid > '(9,8)';
+  ctid  
+--------
+ (9,9)
+ (9,10)
+(2 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE '(9,8)' < ctid;
+            QUERY PLAN             
+-----------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: ('(9,8)'::tid < ctid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE '(9,8)' < ctid;
+  ctid  
+--------
+ (9,9)
+ (9,10)
+(2 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid >= '(9,8)';
+             QUERY PLAN             
+------------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: (ctid >= '(9,8)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid >= '(9,8)';
+  ctid  
+--------
+ (9,8)
+ (9,9)
+ (9,10)
+(3 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid >= '(100,0)';
+              QUERY PLAN              
+--------------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: (ctid >= '(100,0)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid >= '(100,0)';
+ ctid 
+------
+(0 rows)
+
+-- range scans with both bounds
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid > '(4,4)' AND '(4,7)' >= ctid;
+                           QUERY PLAN                           
+----------------------------------------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: ((ctid > '(4,4)'::tid) AND ('(4,7)'::tid >= ctid))
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid > '(4,4)' AND '(4,7)' >= ctid;
+ ctid  
+-------
+ (4,5)
+ (4,6)
+ (4,7)
+(3 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE '(4,7)' >= ctid AND ctid > '(4,4)';
+                           QUERY PLAN                           
+----------------------------------------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: (('(4,7)'::tid >= ctid) AND (ctid > '(4,4)'::tid))
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE '(4,7)' >= ctid AND ctid > '(4,4)';
+ ctid  
+-------
+ (4,5)
+ (4,6)
+ (4,7)
+(3 rows)
+
+-- extreme offsets
+SELECT ctid FROM tidrangescan where ctid > '(0,65535)' AND ctid < '(1,0)' LIMIT 1;
+ ctid 
+------
+(0 rows)
+
+SELECT ctid FROM tidrangescan where ctid < '(0,0)' LIMIT 1;
+ ctid 
+------
+(0 rows)
+
+-- empty table
+CREATE TABLE tidrangescan_empty(id integer, data text);
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan_empty WHERE ctid < '(1, 0)';
+              QUERY PLAN              
+--------------------------------------
+ Tid Range Scan on tidrangescan_empty
+   TID Cond: (ctid < '(1,0)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan_empty WHERE ctid < '(1, 0)';
+ ctid 
+------
+(0 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan_empty WHERE ctid > '(9, 0)';
+              QUERY PLAN              
+--------------------------------------
+ Tid Range Scan on tidrangescan_empty
+   TID Cond: (ctid > '(9,0)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan_empty WHERE ctid > '(9, 0)';
+ ctid 
+------
+(0 rows)
+
+-- cursors
+BEGIN;
+DECLARE c CURSOR FOR SELECT ctid FROM tidrangescan WHERE ctid < '(1,0)';
+FETCH NEXT c;
+ ctid  
+-------
+ (0,1)
+(1 row)
+
+FETCH NEXT c;
+ ctid  
+-------
+ (0,2)
+(1 row)
+
+FETCH PRIOR c;
+ ctid  
+-------
+ (0,1)
+(1 row)
+
+FETCH FIRST c;
+ ctid  
+-------
+ (0,1)
+(1 row)
+
+FETCH LAST c;
+  ctid  
+--------
+ (0,10)
+(1 row)
+
+COMMIT;
+DROP TABLE tidrangescan;
+DROP TABLE tidrangescan_empty;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index cc0bbf5..cf8d10e 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -89,7 +89,7 @@ test: brin gin gist spgist privileges init_privs security_label collate matview
 # ----------
 # Another group of parallel tests
 # ----------
-test: alter_generic alter_operator misc psql async dbsize misc_functions sysviews tsrf tidscan stats_ext
+test: alter_generic alter_operator misc psql async dbsize misc_functions sysviews tsrf tidscan tidrangescan stats_ext
 
 # rules cannot run concurrently with any test that creates a view
 test: rules psql_crosstab amutils
diff --git a/src/test/regress/sql/tidrangescan.sql b/src/test/regress/sql/tidrangescan.sql
new file mode 100644
index 0000000..042c743
--- /dev/null
+++ b/src/test/regress/sql/tidrangescan.sql
@@ -0,0 +1,74 @@
+-- tests for tidrangescans
+
+CREATE TABLE tidrangescan(id integer, data text);
+
+INSERT INTO tidrangescan SELECT i,repeat('x', 100) FROM generate_series(1,1000) AS s(i);
+DELETE FROM tidrangescan WHERE substring(ctid::text from ',(\d+)\)')::integer > 10 OR substring(ctid::text from '\((\d+),')::integer >= 10;;
+VACUUM tidrangescan;
+
+-- range scans with upper bound
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid < '(1,0)';
+SELECT ctid FROM tidrangescan WHERE ctid < '(1,0)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid <= '(1,5)';
+SELECT ctid FROM tidrangescan WHERE ctid <= '(1,5)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid < '(0,0)';
+SELECT ctid FROM tidrangescan WHERE ctid < '(0,0)';
+
+-- range scans with lower bound
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid > '(9,8)';
+SELECT ctid FROM tidrangescan WHERE ctid > '(9,8)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE '(9,8)' < ctid;
+SELECT ctid FROM tidrangescan WHERE '(9,8)' < ctid;
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid >= '(9,8)';
+SELECT ctid FROM tidrangescan WHERE ctid >= '(9,8)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid >= '(100,0)';
+SELECT ctid FROM tidrangescan WHERE ctid >= '(100,0)';
+
+-- range scans with both bounds
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid > '(4,4)' AND '(4,7)' >= ctid;
+SELECT ctid FROM tidrangescan WHERE ctid > '(4,4)' AND '(4,7)' >= ctid;
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE '(4,7)' >= ctid AND ctid > '(4,4)';
+SELECT ctid FROM tidrangescan WHERE '(4,7)' >= ctid AND ctid > '(4,4)';
+
+-- extreme offsets
+SELECT ctid FROM tidrangescan where ctid > '(0,65535)' AND ctid < '(1,0)' LIMIT 1;
+SELECT ctid FROM tidrangescan where ctid < '(0,0)' LIMIT 1;
+
+-- empty table
+CREATE TABLE tidrangescan_empty(id integer, data text);
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan_empty WHERE ctid < '(1, 0)';
+SELECT ctid FROM tidrangescan_empty WHERE ctid < '(1, 0)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan_empty WHERE ctid > '(9, 0)';
+SELECT ctid FROM tidrangescan_empty WHERE ctid > '(9, 0)';
+
+-- cursors
+BEGIN;
+DECLARE c CURSOR FOR SELECT ctid FROM tidrangescan WHERE ctid < '(1,0)';
+FETCH NEXT c;
+FETCH NEXT c;
+FETCH PRIOR c;
+FETCH FIRST c;
+FETCH LAST c;
+COMMIT;
+
+DROP TABLE tidrangescan;
+DROP TABLE tidrangescan_empty;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 3d3c76d..07069dd 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2340,8 +2340,13 @@ TextPositionState
 TheLexeme
 TheSubstitute
 TidExpr
+TidExprType
 TidHashKey
+TidOpExpr
 TidPath
+TidRangePath
+TidRangeScan
+TidRangeScanState
 TidScan
 TidScanState
 TimeADT
-- 
2.7.4

v6-0004-TID-selectivity-reduce-the-density-of-the-last-page-.patch (application/octet-stream)

From fa33a33654557bb577c6b58fea1f7c9cd7637412 Mon Sep 17 00:00:00 2001
From: ejrh <ejrh00@gmail.com>
Date: Tue, 27 Nov 2018 20:31:58 +1300
Subject: [PATCH 4/4] TID selectivity: reduce the density of the last page by
 half

This takes into account the fact that the last page will have, on average, only half the
density of the other pages in a table.
---
 src/backend/utils/adt/selfuncs.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 9bb224d..ccb284b 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -606,8 +606,18 @@ scalarineqsel(PlannerInfo *root, Oid operator, bool isgt, bool iseq,
 			 * assume there will never be any dead tuples or empty space at
 			 * the start or in the middle of the page.  This is likely fine
 			 * for the purposes here.
+			 *
+			 * Since the last page will, on average, be only half full, we can
+			 * estimate it to have half as many tuples as earlier pages.  So
+			 * give it half the weight of a regular page.
 			 */
-			density = vardata->rel->tuples / vardata->rel->pages;
+			density = vardata->rel->tuples / (vardata->rel->pages - 0.5);
+
+			/* If it's the last page, it has half the density. */
+			if (block >= vardata->rel->pages - 1)
+				density *= 0.5;
+
+			/* Add a fraction of a block to take the offset into account. */
 			if (density > 0.0)
 			{
 				OffsetNumber offset = ItemPointerGetOffsetNumberNoCheck(itemptr);
@@ -615,7 +625,11 @@ scalarineqsel(PlannerInfo *root, Oid operator, bool isgt, bool iseq,
 				block += Min(offset / density, 1.0);
 			}
 
-			selec = block / (double) vardata->rel->pages;
+			/*
+			 * Again, the last page has only half weight when converting the
+			 * relative block number to a selectivity.
+			 */
+			selec = block / (vardata->rel->pages - 0.5);
 
 			/*
 			 * We'll have one less tuple for "<" and one additional tuple for
-- 
2.7.4

v6-0002-Support-backward-scans-over-restricted-ranges-in-hea.patch (application/octet-stream)

From e4b50f956cf18bc728cf40c5938899af0cf20b16 Mon Sep 17 00:00:00 2001
From: ejrh <ejrh00@gmail.com>
Date: Mon, 4 Feb 2019 17:58:29 +1300
Subject: [PATCH 2/4] Support backward scans over restricted ranges in heap
 access method

This is required for backward TID scans, including those caused by a FETCH LAST command.
---
 src/backend/access/heap/heapam.c | 48 +++++++++++++++++++++++++++++++++-------
 1 file changed, 40 insertions(+), 8 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index dc34993..45864cd 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -563,11 +563,27 @@ heapgettup(HeapScanDesc scan,
 			 * forward scanners.
 			 */
 			scan->rs_syncscan = false;
-			/* start from last page of the scan */
-			if (scan->rs_startblock > 0)
-				page = scan->rs_startblock - 1;
+
+			/*
+			 * When scanning the whole relation, start from the last page of
+			 * the scan.
+			 */
+			if (scan->rs_numblocks == InvalidBlockNumber)
+			{
+				if (scan->rs_startblock > 0)
+					page = scan->rs_startblock - 1;
+				else
+					page = scan->rs_nblocks - 1;
+			}
 			else
-				page = scan->rs_nblocks - 1;
+			{
+				/*
+				 * Otherwise, if scanning just a subset of the relation, start
+				 * at the final block in the range.
+				 */
+				page = scan->rs_startblock + scan->rs_numblocks - 1;
+			}
+
 			heapgetpage(scan, page);
 		}
 		else
@@ -864,11 +880,27 @@ heapgettup_pagemode(HeapScanDesc scan,
 			 * forward scanners.
 			 */
 			scan->rs_syncscan = false;
-			/* start from last page of the scan */
-			if (scan->rs_startblock > 0)
-				page = scan->rs_startblock - 1;
+
+			/*
+			 * When scanning the whole relation, start from the last page of
+			 * the scan.
+			 */
+			if (scan->rs_numblocks == InvalidBlockNumber)
+			{
+				if (scan->rs_startblock > 0)
+					page = scan->rs_startblock - 1;
+				else
+					page = scan->rs_nblocks - 1;
+			}
 			else
-				page = scan->rs_nblocks - 1;
+			{
+				/*
+				 * Otherwise, if scanning just a subset of the relation, start
+				 * at the final block in the range.
+				 */
+				page = scan->rs_startblock + scan->rs_numblocks - 1;
+			}
+
 			heapgetpage(scan, page);
 		}
 		else
-- 
2.7.4

#44

David Rowley

david.rowley@2ndquadrant.com

almost 7 years ago

In reply to: Edmund Horner (#43)

Re: Tid scan improvements

On Mon, 4 Feb 2019 at 18:37, Edmund Horner <ejrh00@gmail.com> wrote:

1. v6-0001-Add-selectivity-estimate-for-CTID-system-variables.patch

I think 0001 is good to go. It's a clear improvement over what we do today.

(t1 = 1 million row table with a single int column.)

Patched:
# explain (analyze, timing off) select * from t1 where ctid < '(1, 90)';
Seq Scan on t1 (cost=0.00..16925.00 rows=315 width=4) (actual rows=315 loops=1)

# explain (analyze, timing off) select * from t1 where ctid <= '(1, 90)';
Seq Scan on t1 (cost=0.00..16925.00 rows=316 width=4) (actual rows=316 loops=1)

Master:
# explain (analyze, timing off) select * from t1 where ctid < '(1, 90)';
Seq Scan on t1 (cost=0.00..16925.00 rows=333333 width=4) (actual rows=315 loops=1)

# explain (analyze, timing off) select * from t1 where ctid <= '(1, 90)';
Seq Scan on t1 (cost=0.00..16925.00 rows=333333 width=4) (actual rows=316 loops=1)

The only possible risk I can foresee is that it may be more likely we
underestimate the selectivity and that causes something like a nested
loop join due to the estimate being, say, 1 row.

It could happen in a case like:

SELECT * FROM bloated_table WHERE ctid >= <last ctid that would exist
without bloat>

but I don't think we should keep using DEFAULT_INEQ_SEL just in case
this happens. We could probably fix 90% of those cases by returning 2
rows instead of 1.
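
For anyone who wants to check the patched numbers above by hand, here is a
rough standalone C model of the CTID branch that 0001 adds to
scalarineqsel(). It is only a sketch: the function names
ctid_ineq_selectivity and estimated_rows are made up for illustration,
rounding to the nearest row is a simplification of what the planner does,
and the page count of 4425 for t1 is an assumption inferred from the
16925.00 cost under default cost settings
(16925 - 1000000 * (0.01 + 0.0025) = 4425).

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Standalone model of the CTID branch in scalarineqsel() from 0001.
 * "pages" and "tuples" stand in for rel->pages and rel->tuples; the
 * function name is hypothetical and not part of the patch.
 */
static double
ctid_ineq_selectivity(double pages, double tuples,
					  unsigned block_no, unsigned offset,
					  bool isgt, bool iseq)
{
	double		block = block_no;
	double		density;
	double		selec;

	assert(pages >= 0.0 && tuples >= 0.0);

	/* An empty relation: everything (i.e. nothing) is included. */
	if (pages == 0.0)
		return 1.0;

	/* Average tuples per page. */
	density = tuples / pages;

	/* Add the fraction of a block implied by the offset, at most 1. */
	if (density > 0.0)
	{
		double		frac = offset / density;

		block += (frac < 1.0) ? frac : 1.0;
	}

	selec = block / pages;

	/* One tuple fewer for "<" and for ">=" (which is reversed below). */
	if (iseq == isgt && tuples >= 1.0)
		selec -= 1.0 / tuples;

	if (isgt)
		selec = 1.0 - selec;

	/* Equivalent of CLAMP_PROBABILITY. */
	if (selec < 0.0)
		selec = 0.0;
	else if (selec > 1.0)
		selec = 1.0;

	return selec;
}

/* Simplified row estimate: round to nearest, never below one row. */
static double
estimated_rows(double tuples, double selec)
{
	double		rows = (double) (long long) (tuples * selec + 0.5);

	return (rows < 1.0) ? 1.0 : rows;
}
```

With pages = 4425 and tuples = 1000000, the "<" case for '(1, 90)' comes
out at 315 rows and the "<=" case at 316, which matches the patched plans.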

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#45

David Rowley

david.rowley@2ndquadrant.com

almost 7 years ago

In reply to: Edmund Horner (#43)

Re: Tid scan improvements

On Mon, 4 Feb 2019 at 18:37, Edmund Horner <ejrh00@gmail.com> wrote:

2. v6-0002-Support-backward-scans-over-restricted-ranges-in-hea.patch
3. v6-0003-Support-range-quals-in-Tid-Scan.patch
4. v6-0004-TID-selectivity-reduce-the-density-of-the-last-page-.patch

These ones need a rebase.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#46

Edmund Horner

ejrh00@gmail.com

almost 7 years ago

In reply to: David Rowley (#44)

Re: Tid scan improvements

On Thu, 14 Mar 2019 at 16:46, David Rowley <david.rowley@2ndquadrant.com> wrote:

The only possible risk I can foresee is that it may be more likely we
underestimate the selectivity and that causes something like a nested
loop join due to the estimate being, say, 1 row.

It could happen in a case like:

SELECT * FROM bloated_table WHERE ctid >= <last ctid that would exist
without bloat>

but I don't think we should keep using DEFAULT_INEQ_SEL just in case
this happens. We could probably fix 90% of those cases by returning 2
rows instead of 1.

Thanks for looking at the patch David.

I'm not sure how an unreasonable underestimation would occur here. If
you have a table bloated to say 10x its minimal size, the estimator
still assumes an even distribution of tuples (I don't think we can do
much better than that). So the selectivity of "ctid >= <last ctid
that would exist without bloat>" is still going to be 0.9.
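
Spelled out as a worked example, under some simplifying assumptions: let P
be rel->pages and T be rel->tuples for the bloated table, and take the
"last ctid that would exist without bloat" to be at block B = P/10 with
offset 0 (a nonzero offset adds at most 1/P to the first term). Following
the arithmetic in 0001:

```latex
\[
  \mathit{selec}_{<} \;=\; \frac{B}{P} - \frac{1}{T}
                     \;=\; \frac{1}{10} - \frac{1}{T},
  \qquad
  \mathit{selec}_{\ge} \;=\; 1 - \mathit{selec}_{<}
                       \;=\; \frac{9}{10} + \frac{1}{T}
                       \;\approx\; 0.9 .
\]
```

So the estimate only depends on where the block sits relative to the
current relation size, not on how many live tuples are actually beyond it.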

Edmund

#47

David Rowley

david.rowley@2ndquadrant.com

almost 7 years ago

In reply to: Edmund Horner (#46)

Re: Tid scan improvements

On Thu, 14 Mar 2019 at 21:12, Edmund Horner <ejrh00@gmail.com> wrote:

I'm not sure how an unreasonable underestimation would occur here. If
you have a table bloated to say 10x its minimal size, the estimator
still assumes an even distribution of tuples (I don't think we can do
much better than that). So the selectivity of "ctid >= <last ctid
that would exist without bloat>" is still going to be 0.9.

Okay, I think you're right there. I guess the only risk there is just
varying tuple density per page, and that seems no greater risk than we
have with the existing stats.

Just looking again, I think the block of code starting:

+ if (density > 0.0)

needs a comment to mention what it's doing. Perhaps:

+ /*
+ * Using the average tuples per page, calculate how far into
+ * the page the itemptr is likely to be and adjust block
+ * accordingly.
+ */
+ if (density > 0.0)

Or some better choice of words. With that done, I think 0001 is good to go.
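
To illustrate the step that comment describes, here is a small standalone
sketch. The function name block_with_offset_fraction is invented for the
example and is not in the patch; it only mirrors the "block += Min(offset /
density, 1.0)" adjustment, including the case where density is zero and the
block is left alone.

```c
#include <assert.h>

/*
 * Hypothetical helper: turn (block, offset) into a fractional block
 * position, given the average number of tuples per page.  The offset
 * contributes at most one whole block, however large it is.
 */
static double
block_with_offset_fraction(unsigned block_no, unsigned offset, double density)
{
	double		block = block_no;

	assert(density >= 0.0);

	if (density > 0.0)
	{
		double		frac = offset / density;

		block += (frac < 1.0) ? frac : 1.0;
	}

	return block;
}
```

For instance, with 100 tuples per page, '(5,50)' lands at block 5.5, while
an extreme offset such as '(0,65535)' is capped at block 1.0 rather than
running far past the end of page 0.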

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#48

Edmund Horner

ejrh00@gmail.com

almost 7 years ago

In reply to: David Rowley (#47)

Re: Tid scan improvements

On Thu, 14 Mar 2019 at 23:06, David Rowley <david.rowley@2ndquadrant.com> wrote:

On Thu, 14 Mar 2019 at 21:12, Edmund Horner <ejrh00@gmail.com> wrote:

I'm not sure how an unreasonable underestimation would occur here. If
you have a table bloated to say 10x its minimal size, the estimator
still assumes an even distribution of tuples (I don't think we can do
much better than that). So the selectivity of "ctid >= <last ctid
that would exist without bloat>" is still going to be 0.9.

Okay, I think you're right there. I guess the only risk there is just
varying tuple density per page, and that seems no greater risk than we
have with the existing stats.

Yeah that is a risk, and will probably come up in practice. But at
least we're not just picking a hardcoded selectivity any more.

Just looking again, I think the block of code starting:

+ if (density > 0.0)

needs a comment to mention what it's doing. Perhaps:
+ /*
+ * Using the average tuples per page, calculate how far into
+ * the page the itemptr is likely to be and adjust block
+ * accordingly.
+ */
+ if (density > 0.0)
Or some better choice of words. With that done, I think 0001 is good to go.

Ok, I'll look at it and hopefully get a new patch up soon.

Edmund

#49

Edmund Horner

ejrh00@gmail.com

almost 7 years ago

In reply to: Edmund Horner (#48)

4 attachment(s)

Re: Tid scan improvements

On Thu, 14 Mar 2019 at 23:37, Edmund Horner <ejrh00@gmail.com> wrote:

On Thu, 14 Mar 2019 at 23:06, David Rowley <david.rowley@2ndquadrant.com> wrote:
Just looking again, I think the block of code starting:

+ if (density > 0.0)

needs a comment to mention what it's doing. Perhaps:
+ /*
+ * Using the average tuples per page, calculate how far into
+ * the page the itemptr is likely to be and adjust block
+ * accordingly.
+ */
+ if (density > 0.0)
Or some better choice of words. With that done, I think 0001 is good to go.
Ok, I'll look at it and hopefully get a new patch up soon.

Hullo,

Here's a new set of patches.

It includes new versions of the other patches, which needed to be
rebased because of the introduction of the "tableam" API by
c2fe139c20.

I've had to adapt it to use the table scan API. I've got it compiling
and passing tests, but I'm uneasy about some things that still use the
heapam API.

1. I call heap_setscanlimits as I'm not sure there is a tableam equivalent.
2. I'm not sure whether non-heap tableam implementations can also be
supported by my TID Range Scan: we need to be able to set the scan
limits. There may not be any other implementations yet, but when
there are, how do we stop the planner using a TID Range Scan for
non-heap relations?
3. When fetching tuples, I see that nodeSeqscan.c uses
table_scan_getnextslot, which saves dealing with HeapTuples. But
nodeTidrangescan wants to do some checking of the block and offset
before returning the slot. So I have it using heap_getnext and
ExecStoreBufferHeapTuple. Apart from being heapam-specific, it's just
not as clean as the new API calls.
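
To make point 1 a bit more concrete: the scan limits amount to a start
block and a block count, and the backward-scan change in 0002 picks its
first page from those. Below is a standalone sketch of that rule. The
function name backward_scan_first_page is made up, the parameters just
mirror the rs_startblock, rs_numblocks and rs_nblocks fields used in the
patch, and the sketch assumes the relation and the range are both
non-empty.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t BlockNumber;

#define InvalidBlockNumber ((BlockNumber) 0xFFFFFFFF)

/*
 * Hypothetical model of how a backward heap scan chooses its first page
 * after 0002.  rs_numblocks == InvalidBlockNumber means no limit was set.
 */
static BlockNumber
backward_scan_first_page(BlockNumber rs_startblock,
						 BlockNumber rs_numblocks,
						 BlockNumber rs_nblocks)
{
	/* Assumption of this sketch: nothing to decide for empty scans. */
	assert(rs_nblocks > 0 && rs_numblocks > 0);

	if (rs_numblocks == InvalidBlockNumber)
	{
		/* Whole relation: start from the last page of the scan. */
		if (rs_startblock > 0)
			return rs_startblock - 1;
		return rs_nblocks - 1;
	}

	/* Restricted range: start at the final block in the range. */
	return rs_startblock + rs_numblocks - 1;
}
```

For a range covering only block 0 of a 10-block table (start 0, count 1),
this returns block 0, whereas the old rule would have started at
rs_nblocks - 1, outside the restricted range.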

Ideally, we can get it to support general tableam implementations
rather than using heapam-specific calls. Any advice on how to do
this?

Thanks
Edmund

Attachments:

v7-0001-Add-selectivity-estimate-for-CTID-system-variables.patch (application/octet-stream)

From ceb815ecea6743c61353bf0773a014e10f9df41c Mon Sep 17 00:00:00 2001
From: Edmund Horner <ejrh00@gmail.com>
Date: Fri, 12 Oct 2018 13:36:24 +1300
Subject: [PATCH 1/4] Add selectivity estimate for CTID system variables

Previously, estimates for ItemPointer range quals, such as "ctid <= '(5,7)'",
resorted to the default value of 0.33 for range selectivity, although there was
special-case handling for equality quals like "ctid = '(5,7)'", which used the
appropriate selectivity for distinct items.

The estimator will now use the relation size to estimate the selectivity of a range qual.
---
 src/backend/utils/adt/selfuncs.c | 59 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 59 insertions(+)

diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 12d30d7..eca20c1 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -557,6 +557,65 @@ scalarineqsel(PlannerInfo *root, Oid operator, bool isgt, bool iseq,
 
 	if (!HeapTupleIsValid(vardata->statsTuple))
 	{
+		/*
+		 * There are no stats for system columns, but for CTID we can estimate
+		 * based on table size.
+		 */
+		if (vardata->var && IsA(vardata->var, Var) &&
+			((Var *) vardata->var)->varattno == SelfItemPointerAttributeNumber)
+		{
+			ItemPointer itemptr;
+			double		block;
+			double		density;
+
+			/* If the relation's empty, we're going to include all of it. */
+			if (vardata->rel->pages == 0)
+				return 1.0;
+
+			itemptr = (ItemPointer) DatumGetPointer(constval);
+			block = ItemPointerGetBlockNumberNoCheck(itemptr);
+
+			/*
+			 * Determine the average number of tuples per page (density).  We
+			 * naively assume there will never be any dead tuples or empty
+			 * space at the start or in the middle of the page.  This is
+			 * likely fine for the purposes here.
+			 */
+			density = vardata->rel->tuples / vardata->rel->pages;
+
+			/*
+			 * Using the average tuples per page, calculate how far into the
+			 * page the itemptr is likely to be and adjust block accordingly,
+			 * by adding that fraction of a whole block (but never more than
+			 * a whole block, no matter how high the itemptr's offset is).
+			 */
+			if (density > 0.0)
+			{
+				OffsetNumber offset = ItemPointerGetOffsetNumberNoCheck(itemptr);
+
+				block += Min(offset / density, 1.0);
+			}
+
+			selec = block / (double) vardata->rel->pages;
+
+			/*
+			 * We'll have one less tuple for "<" and one additional tuple for
+			 * ">=", the latter of which we'll reverse the selectivity for
+			 * below, so we can simply subtract a tuple here.  We can easily
+			 * detect these two cases by iseq being equal to isgt.  They'll
+			 * either both be true or both be false.
+			 */
+			if (iseq == isgt && vardata->rel->tuples >= 1.0)
+				selec -= (1 / vardata->rel->tuples);
+
+			/* Finally, reverse the selectivity for the ">", ">=" case. */
+			if (isgt)
+				selec = 1.0 - selec;
+
+			CLAMP_PROBABILITY(selec);
+			return selec;
+		}
+
 		/* no stats available, so default result */
 		return DEFAULT_INEQ_SEL;
 	}
-- 
2.7.4

v7-0002-Support-backward-scans-over-restricted-ranges-in-hea.patch (application/octet-stream)

From 86b14bdf6ed315be47a393842131cbe57fd808fc Mon Sep 17 00:00:00 2001
From: ejrh <ejrh00@gmail.com>
Date: Mon, 4 Feb 2019 17:58:29 +1300
Subject: [PATCH 2/4] Support backward scans over restricted ranges in heap
 access method

This is required for backward TID scans, including those caused by a FETCH LAST command.
---
 src/backend/access/heap/heapam.c | 46 +++++++++++++++++++++++++++++++++-------
 1 file changed, 38 insertions(+), 8 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 3c8a5da..cd6bcc3 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -560,11 +560,26 @@ heapgettup(HeapScanDesc scan,
 			 * forward scanners.
 			 */
 			scan->rs_base.rs_syncscan = false;
-			/* start from last page of the scan */
-			if (scan->rs_startblock > 0)
-				page = scan->rs_startblock - 1;
+
+			/*
+			 * When scanning the whole relation, start from the last page of
+			 * the scan.
+			 */
+			if (scan->rs_numblocks == InvalidBlockNumber)
+			{
+				if (scan->rs_startblock > 0)
+					page = scan->rs_startblock - 1;
+				else
+					page = scan->rs_nblocks - 1;
+			}
 			else
-				page = scan->rs_nblocks - 1;
+			{
+				/*
+				 * Otherwise, if scanning just a subset of the relation, start
+				 * at the final block in the range.
+				 */
+				page = scan->rs_startblock + scan->rs_numblocks - 1;
+			}
 			heapgetpage((TableScanDesc) scan, page);
 		}
 		else
@@ -871,11 +886,26 @@ heapgettup_pagemode(HeapScanDesc scan,
 			 * forward scanners.
 			 */
 			scan->rs_base.rs_syncscan = false;
-			/* start from last page of the scan */
-			if (scan->rs_startblock > 0)
-				page = scan->rs_startblock - 1;
+
+			/*
+			 * When scanning the whole relation, start from the last page of
+			 * the scan.
+			 */
+			if (scan->rs_numblocks == InvalidBlockNumber)
+			{
+				if (scan->rs_startblock > 0)
+					page = scan->rs_startblock - 1;
+				else
+					page = scan->rs_nblocks - 1;
+			}
 			else
-				page = scan->rs_nblocks - 1;
+			{
+				/*
+				 * Otherwise, if scanning just a subset of the relation, start
+				 * at the final block in the range.
+				 */
+				page = scan->rs_startblock + scan->rs_numblocks - 1;
+			}
 			heapgetpage((TableScanDesc) scan, page);
 		}
 		else
-- 
2.7.4
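
The start-page choice that both hunks of this patch add can be restated as a small pure function, which may make the three cases easier to compare. This is only an illustrative sketch: the function name is hypothetical, the typedef and macro mirror PostgreSQL's BlockNumber and InvalidBlockNumber, and it assumes that a zero-length range has already been handled before this point is reached.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t BlockNumber;

#define InvalidBlockNumber ((BlockNumber) 0xFFFFFFFF)

/*
 * Hypothetical helper (illustration only): pick the page on which a
 * backward scan starts.
 *
 * startblock: first block of the scan (rs_startblock in the patch)
 * numblocks:  number of blocks to scan, or InvalidBlockNumber for the
 *             whole relation (rs_numblocks)
 * nblocks:    total number of blocks in the relation (rs_nblocks)
 */
BlockNumber
backward_scan_start_page(BlockNumber startblock, BlockNumber numblocks,
						 BlockNumber nblocks)
{
	if (numblocks == InvalidBlockNumber)
	{
		/* Whole relation: start on the page just before the start block. */
		if (startblock > 0)
			return startblock - 1;

		/* Start block is 0, so wrap around to the relation's last page. */
		return nblocks - 1;
	}

	/* Assumption: empty ranges never reach this point. */
	assert(numblocks > 0);

	/* Restricted range: start on the final block of the range. */
	return startblock + numblocks - 1;
}
```

For example, a range that starts at block 1000 and covers 1000 blocks is scanned backward from block 1999, whatever the size of the relation.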

v7-0004-TID-selectivity-reduce-the-density-of-the-last-page-.patch (application/octet-stream)

From f760433da4826ad3f33148b812a2bb7dad3b4580 Mon Sep 17 00:00:00 2001
From: ejrh <ejrh00@gmail.com>
Date: Tue, 27 Nov 2018 20:31:58 +1300
Subject: [PATCH 4/4] TID selectivity: reduce the density of the last page by
 half

This takes into account the fact that the last page will, on average, have only
half the density of the other pages in a table.
---
 src/backend/utils/adt/selfuncs.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index eca20c1..2202a98 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -580,8 +580,16 @@ scalarineqsel(PlannerInfo *root, Oid operator, bool isgt, bool iseq,
 			 * naively assume there will never be any dead tuples or empty
 			 * space at the start or in the middle of the page.  This is
 			 * likely fine for the purposes here.
+			 *
+			 * Since the last page will, on average, be only half full, we
+			 * can estimate it to have half as many tuples as earlier pages.
+			 * So give it half the weight of a regular page.
 			 */
-			density = vardata->rel->tuples / vardata->rel->pages;
+			density = vardata->rel->tuples / (vardata->rel->pages - 0.5);
+
+			/* If it's the last page, it has half the density. */
+			if (block >= vardata->rel->pages - 1)
+				density *= 0.5;
 
 			/*
 			 * Using the average tuples per page, calculate how far into the
@@ -596,7 +604,11 @@ scalarineqsel(PlannerInfo *root, Oid operator, bool isgt, bool iseq,
 				block += Min(offset / density, 1.0);
 			}
 
-			selec = block / (double) vardata->rel->pages;
+			/*
+			 * Again, the last page has only half weight when converting the
+			 * relative block number to a selectivity.
+			 */
+			selec = block / (vardata->rel->pages - 0.5);
 
 			/*
 			 * We'll have one less tuple for "<" and one additional tuple for
-- 
2.7.4
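
The "pages - 0.5" divisor follows from the half-full assumption: if every page but the last holds d tuples and the last holds d / 2, then tuples = d * (pages - 1) + d / 2 = d * (pages - 0.5). Below is a rough standalone sketch of the adjusted block-fraction calculation, under the same caveats as any illustration: the function name and plain numeric parameters are hypothetical, "block" is assumed to start as the TID's block number, and the later adjustments for "<", ">" and ">=" are left out.

```c
#include <assert.h>

/*
 * Hypothetical helper (illustration only): the fraction of the table at or
 * before TID (block_no, offset), giving the last page half the weight of
 * the other pages.
 */
double
tid_fraction_half_last_page(unsigned block_no, unsigned offset,
							double rel_pages, double rel_tuples)
{
	double		block = (double) block_no;
	double		weighted_pages = rel_pages - 0.5;
	double		density;

	/* Assumption: the relation has at least one page. */
	assert(rel_pages >= 1.0);

	/* Tuples per full page, counting the last page as half a page. */
	density = rel_tuples / weighted_pages;

	/* If it's the last page, it has half the density. */
	if (block >= rel_pages - 1)
		density *= 0.5;

	/* Fraction of the current block covered, capped at one whole block. */
	if (density > 0.0)
	{
		double		frac = offset / density;

		block += (frac < 1.0) ? frac : 1.0;
	}

	/*
	 * This can exceed 1.0 towards the end of the last page; in the patch
	 * the later CLAMP_PROBABILITY call bounds the final selectivity.
	 */
	return block / weighted_pages;
}
```

With 100 pages and 9950 tuples the full-page density works out to 100 tuples per page, so '(50,50)' maps to 50.5 / 99.5, or roughly 0.5075.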

v7-0003-Support-range-quals-in-Tid-Scan.patch (application/octet-stream)

From f395014170dad3d71756c80ff8343f54b9042410 Mon Sep 17 00:00:00 2001
From: ejrh <ejrh00@gmail.com>
Date: Wed, 30 Jan 2019 10:37:10 +1300
Subject: [PATCH 3/4] Support range quals in Tid Scan

This means queries with expressions such as "ctid >= ? AND ctid < ?" can be
answered by scanning over that part of a table, rather than falling back to a
full SeqScan.
---
 src/backend/commands/explain.c             |  23 ++
 src/backend/executor/Makefile              |   1 +
 src/backend/executor/execAmi.c             |   6 +
 src/backend/executor/execProcnode.c        |  10 +
 src/backend/executor/nodeTidrangescan.c    | 599 +++++++++++++++++++++++++++++
 src/backend/nodes/copyfuncs.c              |  24 ++
 src/backend/nodes/outfuncs.c               |  13 +
 src/backend/optimizer/path/costsize.c      |  96 +++++
 src/backend/optimizer/path/tidpath.c       | 106 ++++-
 src/backend/optimizer/plan/createplan.c    |  98 +++++
 src/backend/optimizer/plan/setrefs.c       |  13 +
 src/backend/optimizer/plan/subselect.c     |   6 +
 src/backend/optimizer/util/pathnode.c      |  29 ++
 src/include/catalog/pg_operator.dat        |   6 +-
 src/include/executor/nodeTidrangescan.h    |  23 ++
 src/include/nodes/execnodes.h              |  22 ++
 src/include/nodes/nodes.h                  |   3 +
 src/include/nodes/pathnodes.h              |  12 +
 src/include/nodes/plannodes.h              |  13 +
 src/include/optimizer/cost.h               |   2 +
 src/include/optimizer/pathnode.h           |   2 +
 src/test/regress/expected/tidrangescan.out | 238 ++++++++++++
 src/test/regress/parallel_schedule         |   2 +-
 src/test/regress/sql/tidrangescan.sql      |  74 ++++
 src/tools/pgindent/typedefs.list           |   5 +
 25 files changed, 1412 insertions(+), 14 deletions(-)
 create mode 100644 src/backend/executor/nodeTidrangescan.c
 create mode 100644 src/include/executor/nodeTidrangescan.h
 create mode 100644 src/test/regress/expected/tidrangescan.out
 create mode 100644 src/test/regress/sql/tidrangescan.sql

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 400f3c9..6a63010 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -933,6 +933,7 @@ ExplainPreScanNode(PlanState *planstate, Bitmapset **rels_used)
 		case T_IndexOnlyScan:
 		case T_BitmapHeapScan:
 		case T_TidScan:
+		case T_TidRangeScan:
 		case T_SubqueryScan:
 		case T_FunctionScan:
 		case T_TableFuncScan:
@@ -1079,6 +1080,9 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_TidScan:
 			pname = sname = "Tid Scan";
 			break;
+		case T_TidRangeScan:
+			pname = sname = "Tid Range Scan";
+			break;
 		case T_SubqueryScan:
 			pname = sname = "Subquery Scan";
 			break;
@@ -1270,6 +1274,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_SampleScan:
 		case T_BitmapHeapScan:
 		case T_TidScan:
+		case T_TidRangeScan:
 		case T_SubqueryScan:
 		case T_FunctionScan:
 		case T_TableFuncScan:
@@ -1691,6 +1696,23 @@ ExplainNode(PlanState *planstate, List *ancestors,
 											   planstate, es);
 			}
 			break;
+		case T_TidRangeScan:
+			{
+				/*
+				 * The tidrangequals list has AND semantics, so be sure to
+				 * show it as an AND condition.
+				 */
+				List	   *tidquals = ((TidRangeScan *) plan)->tidrangequals;
+
+				if (list_length(tidquals) > 1)
+					tidquals = list_make1(make_andclause(tidquals));
+				show_scan_qual(tidquals, "TID Cond", planstate, ancestors, es);
+				show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
+				if (plan->qual)
+					show_instrumentation_count("Rows Removed by Filter", 1,
+											   planstate, es);
+			}
+			break;
 		case T_ForeignScan:
 			show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
 			if (plan->qual)
@@ -2978,6 +3000,7 @@ ExplainTargetRel(Plan *plan, Index rti, ExplainState *es)
 		case T_IndexOnlyScan:
 		case T_BitmapHeapScan:
 		case T_TidScan:
+		case T_TidRangeScan:
 		case T_ForeignScan:
 		case T_CustomScan:
 		case T_ModifyTable:
diff --git a/src/backend/executor/Makefile b/src/backend/executor/Makefile
index cc09895..0152e31 100644
--- a/src/backend/executor/Makefile
+++ b/src/backend/executor/Makefile
@@ -28,6 +28,7 @@ OBJS = execAmi.o execCurrent.o execExpr.o execExprInterp.o \
        nodeValuesscan.o \
        nodeCtescan.o nodeNamedtuplestorescan.o nodeWorktablescan.o \
        nodeGroup.o nodeSubplan.o nodeSubqueryscan.o nodeTidscan.o \
+       nodeTidrangescan.o \
        nodeForeignscan.o nodeWindowAgg.o tstoreReceiver.o tqueue.o spi.o \
        nodeTableFuncscan.o
 
diff --git a/src/backend/executor/execAmi.c b/src/backend/executor/execAmi.c
index 187f892..e85ed61 100644
--- a/src/backend/executor/execAmi.c
+++ b/src/backend/executor/execAmi.c
@@ -51,6 +51,7 @@
 #include "executor/nodeSubplan.h"
 #include "executor/nodeSubqueryscan.h"
 #include "executor/nodeTableFuncscan.h"
+#include "executor/nodeTidrangescan.h"
 #include "executor/nodeTidscan.h"
 #include "executor/nodeUnique.h"
 #include "executor/nodeValuesscan.h"
@@ -198,6 +199,10 @@ ExecReScan(PlanState *node)
 			ExecReScanTidScan((TidScanState *) node);
 			break;
 
+		case T_TidRangeScanState:
+			ExecReScanTidRangeScan((TidRangeScanState *) node);
+			break;
+
 		case T_SubqueryScanState:
 			ExecReScanSubqueryScan((SubqueryScanState *) node);
 			break;
@@ -524,6 +529,7 @@ ExecSupportsBackwardScan(Plan *node)
 
 		case T_SeqScan:
 		case T_TidScan:
+		case T_TidRangeScan:
 		case T_FunctionScan:
 		case T_ValuesScan:
 		case T_CteScan:
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index 4ab2903..46b39d0 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -108,6 +108,7 @@
 #include "executor/nodeSubplan.h"
 #include "executor/nodeSubqueryscan.h"
 #include "executor/nodeTableFuncscan.h"
+#include "executor/nodeTidrangescan.h"
 #include "executor/nodeTidscan.h"
 #include "executor/nodeUnique.h"
 #include "executor/nodeValuesscan.h"
@@ -238,6 +239,11 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
 												   estate, eflags);
 			break;
 
+		case T_TidRangeScan:
+			result = (PlanState *) ExecInitTidRangeScan((TidRangeScan *) node,
+														estate, eflags);
+			break;
+
 		case T_SubqueryScan:
 			result = (PlanState *) ExecInitSubqueryScan((SubqueryScan *) node,
 														estate, eflags);
@@ -632,6 +638,10 @@ ExecEndNode(PlanState *node)
 			ExecEndTidScan((TidScanState *) node);
 			break;
 
+		case T_TidRangeScanState:
+			ExecEndTidRangeScan((TidRangeScanState *) node);
+			break;
+
 		case T_SubqueryScanState:
 			ExecEndSubqueryScan((SubqueryScanState *) node);
 			break;
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
new file mode 100644
index 0000000..a5065f9
--- /dev/null
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -0,0 +1,599 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeTidrangescan.c
+ *	  Routines to support tid range scans of relations
+ *
+ * Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/executor/nodeTidrangescan.c
+ *
+ *-------------------------------------------------------------------------
+ */
+/*
+ * INTERFACE ROUTINES
+ *
+ *		ExecTidRangeScan		scans a relation using a range of tids
+ *		ExecInitTidRangeScan	creates and initializes state info.
+ *		ExecReScanTidRangeScan	rescans the tid relation.
+ *		ExecEndTidRangeScan		releases all storage.
+ */
+#include "postgres.h"
+
+#include "access/heapam.h"
+#include "access/relscan.h"
+#include "access/sysattr.h"
+#include "access/tableam.h"
+#include "catalog/pg_operator.h"
+#include "executor/execdebug.h"
+#include "executor/nodeTidrangescan.h"
+#include "nodes/nodeFuncs.h"
+#include "storage/bufmgr.h"
+#include "utils/rel.h"
+
+
+#define IsCTIDVar(node)  \
+	((node) != NULL && \
+	 IsA((node), Var) && \
+	 ((Var *) (node))->varattno == SelfItemPointerAttributeNumber && \
+	 ((Var *) (node))->varlevelsup == 0)
+
+typedef enum
+{
+	TIDEXPR_UPPER_BOUND,
+	TIDEXPR_LOWER_BOUND
+} TidExprType;
+
+/* one element in TidRangeScanState's trss_tidexprs list */
+typedef struct TidOpExpr
+{
+	TidExprType exprtype;		/* type of op */
+	ExprState  *exprstate;		/* ExprState for a TID-yielding subexpr */
+	bool		inclusive;		/* whether op is inclusive */
+} TidOpExpr;
+
+/*
+ * For the given 'expr', build and return an appropriate TidOpExpr taking into
+ * account the expr's operator and operand order.
+ */
+static TidOpExpr *
+MakeTidOpExpr(OpExpr *expr, TidRangeScanState *tidstate)
+{
+	Node	   *arg1 = get_leftop((Expr *) expr);
+	Node	   *arg2 = get_rightop((Expr *) expr);
+	ExprState  *exprstate = NULL;
+	bool		invert = false;
+	TidOpExpr  *tidopexpr;
+
+	if (IsCTIDVar(arg1))
+		exprstate = ExecInitExpr((Expr *) arg2, &tidstate->ss.ps);
+	else if (IsCTIDVar(arg2))
+	{
+		exprstate = ExecInitExpr((Expr *) arg1, &tidstate->ss.ps);
+		invert = true;
+	}
+	else
+		elog(ERROR, "could not identify CTID variable");
+
+	tidopexpr = (TidOpExpr *) palloc0(sizeof(TidOpExpr));
+
+	switch (expr->opno)
+	{
+		case TIDLessEqOperator:
+			tidopexpr->inclusive = true;
+			/* fall through */
+		case TIDLessOperator:
+			tidopexpr->exprtype = invert ? TIDEXPR_LOWER_BOUND : TIDEXPR_UPPER_BOUND;
+			break;
+		case TIDGreaterEqOperator:
+			tidopexpr->inclusive = true;
+			/* fall through */
+		case TIDGreaterOperator:
+			tidopexpr->exprtype = invert ? TIDEXPR_UPPER_BOUND : TIDEXPR_LOWER_BOUND;
+			break;
+		default:
+			elog(ERROR, "could not identify CTID expression");
+	}
+
+	tidopexpr->exprstate = exprstate;
+
+	return tidopexpr;
+}
+
+/*
+ * Extract the qual subexpressions that yield TIDs to search for,
+ * and compile them into ExprStates if they're ordinary expressions.
+ */
+static void
+TidExprListCreate(TidRangeScanState *tidrangestate)
+{
+	TidRangeScan *node = (TidRangeScan *) tidrangestate->ss.ps.plan;
+	List	   *tidexprs = NIL;
+	ListCell   *l;
+
+	foreach(l, node->tidrangequals)
+	{
+		OpExpr	   *opexpr = lfirst(l);
+		TidOpExpr  *tidopexpr = MakeTidOpExpr(opexpr, tidrangestate);
+
+		tidexprs = lappend(tidexprs, tidopexpr);
+	}
+
+	tidrangestate->trss_tidexprs = tidexprs;
+}
+
+/*
+ * Set a lower bound tid, taking into account the inclusivity of the bound.
+ * Return true if the bound is valid.
+ */
+static bool
+SetTidLowerBound(ItemPointer tid, bool inclusive, ItemPointer lowerBound)
+{
+	OffsetNumber offset;
+
+	*lowerBound = *tid;
+	offset = ItemPointerGetOffsetNumberNoCheck(tid);
+
+	if (!inclusive)
+	{
+		/* Check if the lower bound is actually in the next block. */
+		if (offset >= MaxOffsetNumber)
+		{
+			BlockNumber block = ItemPointerGetBlockNumberNoCheck(lowerBound);
+
+			/*
+			 * If the lower bound was already at or above the maximum block
+			 * number, then there is no valid range.
+			 */
+			if (block >= MaxBlockNumber)
+				return false;
+
+			ItemPointerSetBlockNumber(lowerBound, block + 1);
+			ItemPointerSetOffsetNumber(lowerBound, 1);
+		}
+		else
+			ItemPointerSetOffsetNumber(lowerBound, OffsetNumberNext(offset));
+	}
+	else if (offset == 0)
+		ItemPointerSetOffsetNumber(lowerBound, 1);
+
+	return true;
+}
+
+/*
+ * Set an upper bound tid, taking into account the inclusivity of the bound.
+ * Return true if the bound is valid.
+ */
+static bool
+SetTidUpperBound(ItemPointer tid, bool inclusive, ItemPointer upperBound)
+{
+	OffsetNumber offset;
+
+	*upperBound = *tid;
+	offset = ItemPointerGetOffsetNumberNoCheck(tid);
+
+	/*
+	 * Since TID offsets start at 1, an inclusive upper bound with offset 0
+	 * can be treated as an exclusive bound.  This has the benefit of
+	 * eliminating that block from the scan range.
+	 */
+	if (inclusive && offset == 0)
+		inclusive = false;
+
+	if (!inclusive)
+	{
+		/* Check if the upper bound is actually in the previous block. */
+		if (offset == 0)
+		{
+			BlockNumber block = ItemPointerGetBlockNumberNoCheck(upperBound);
+
+			/*
+			 * If the upper bound was already in block 0, then there is no
+			 * valid range.
+			 */
+			if (block == 0)
+				return false;
+
+			ItemPointerSetBlockNumber(upperBound, block - 1);
+			ItemPointerSetOffsetNumber(upperBound, MaxOffsetNumber);
+		}
+		else
+			ItemPointerSetOffsetNumber(upperBound, OffsetNumberPrev(offset));
+	}
+
+	return true;
+}
+
+/* ----------------------------------------------------------------
+ *		TidRangeEval
+ *
+ *		Compute the range of TIDs to scan, by evaluating the
+ *		expressions for them.
+ * ----------------------------------------------------------------
+ */
+static void
+TidRangeEval(TidRangeScanState *node)
+{
+	ExprContext *econtext = node->ss.ps.ps_ExprContext;
+	BlockNumber nblocks;
+	ItemPointerData lowerBound;
+	ItemPointerData upperBound;
+	ListCell   *l;
+
+	/*
+	 * We silently discard any TIDs that are out of range at the time of scan
+	 * start.  (Since we hold at least AccessShareLock on the table, it won't
+	 * be possible for someone to truncate away the blocks we intend to
+	 * visit.)
+	 */
+	nblocks = RelationGetNumberOfBlocks(node->ss.ss_currentRelation);
+
+
+	/* The biggest range on an empty table is empty; just skip it. */
+	if (nblocks == 0)
+		return;
+
+	/* Set the lower and upper bound to scan the whole table. */
+	ItemPointerSetBlockNumber(&lowerBound, 0);
+	ItemPointerSetOffsetNumber(&lowerBound, 1);
+	ItemPointerSetBlockNumber(&upperBound, nblocks - 1);
+	ItemPointerSetOffsetNumber(&upperBound, MaxOffsetNumber);
+
+	foreach(l, node->trss_tidexprs)
+	{
+		TidOpExpr  *tidopexpr = (TidOpExpr *) lfirst(l);
+		ItemPointer itemptr;
+		bool		isNull;
+
+		/* Evaluate this bound. */
+		itemptr = (ItemPointer)
+			DatumGetPointer(ExecEvalExprSwitchContext(tidopexpr->exprstate,
+													  econtext,
+													  &isNull));
+
+		/* If the bound is NULL, *nothing* matches the qual. */
+		if (isNull)
+			return;
+
+		if (tidopexpr->exprtype == TIDEXPR_LOWER_BOUND)
+		{
+			ItemPointerData lb;
+
+			if (!SetTidLowerBound(itemptr, tidopexpr->inclusive, &lb))
+				return;
+
+			if (ItemPointerCompare(&lb, &lowerBound) > 0)
+				lowerBound = lb;
+		}
+
+		if (tidopexpr->exprtype == TIDEXPR_UPPER_BOUND)
+		{
+			ItemPointerData ub;
+
+			if (!SetTidUpperBound(itemptr, tidopexpr->inclusive, &ub))
+				return;
+
+			if (ItemPointerCompare(&ub, &upperBound) < 0)
+				upperBound = ub;
+		}
+	}
+
+	/* If the resulting range is not empty, use it. */
+	if (ItemPointerCompare(&lowerBound, &upperBound) <= 0)
+	{
+		node->trss_startBlock = ItemPointerGetBlockNumberNoCheck(&lowerBound);
+		node->trss_endBlock = ItemPointerGetBlockNumberNoCheck(&upperBound);
+		node->trss_startOffset = ItemPointerGetOffsetNumberNoCheck(&lowerBound);
+		node->trss_endOffset = ItemPointerGetOffsetNumberNoCheck(&upperBound);
+	}
+}
+
+/* ----------------------------------------------------------------
+ *		NextInTidRange
+ *
+ *		Fetch the next tuple when scanning a range of TIDs.
+ *
+ *		Since the heap access method may return tuples that are in the scan
+ *		limit, but not within the required TID range, this function will
+ *		check for such tuples and skip over them.
+ * ----------------------------------------------------------------
+ */
+static HeapTuple
+NextInTidRange(TidRangeScanState *node, TableScanDesc scandesc, ScanDirection direction)
+{
+	HeapTuple	tuple;
+
+	for (;;)
+	{
+		BlockNumber block;
+		OffsetNumber offset;
+
+		tuple = heap_getnext(scandesc, direction);
+		if (!tuple)
+			break;
+
+		/* Check that the tuple is within the required range. */
+		block = ItemPointerGetBlockNumber(&tuple->t_self);
+		offset = ItemPointerGetOffsetNumber(&tuple->t_self);
+
+		/* The tuple should never come from outside the scan limits. */
+		Assert(block >= node->trss_startBlock &&
+			   block <= node->trss_endBlock);
+
+		/*
+		 * If the tuple is in the first block of the range and before the
+		 * first requested offset, then we can either skip it (if scanning
+		 * forward), or end the scan (if scanning backward).
+		 */
+		if (block == node->trss_startBlock && offset < node->trss_startOffset)
+		{
+			if (ScanDirectionIsForward(direction))
+				continue;
+			else
+				tuple = NULL;
+		}
+
+		/* Similarly for the last block, after the last requested offset. */
+		if (block == node->trss_endBlock && offset > node->trss_endOffset)
+		{
+			if (ScanDirectionIsBackward(direction))
+				continue;
+			else
+				tuple = NULL;
+		}
+
+		break;
+	}
+
+	return tuple;
+}
+
+/* ----------------------------------------------------------------
+ *		TidRangeNext
+ *
+ *		Retrieve a tuple from the TidRangeScan node's currentRelation
+ *		using the tids in the TidRangeScanState information.
+ *
+ * ----------------------------------------------------------------
+ */
+static TupleTableSlot *
+TidRangeNext(TidRangeScanState *node)
+{
+	TableScanDesc scandesc;
+	EState	   *estate;
+	ScanDirection direction;
+	HeapTuple	tuple;
+	TupleTableSlot *slot;
+
+	/*
+	 * extract necessary information from tid scan node
+	 */
+	scandesc = node->ss.ss_currentScanDesc;
+	estate = node->ss.ps.state;
+	direction = estate->es_direction;
+	slot = node->ss.ss_ScanTupleSlot;
+
+	if (!node->trss_inScan)
+	{
+		BlockNumber blocks_to_scan;
+
+		/* First time through, compute the TID range to be visited */
+		if (node->trss_startBlock == InvalidBlockNumber)
+			TidRangeEval(node);
+
+		if (scandesc == NULL)
+		{
+			scandesc = table_beginscan_strat(node->ss.ss_currentRelation,
+											estate->es_snapshot,
+											0, NULL,
+											false, false);
+			node->ss.ss_currentScanDesc = scandesc;
+		}
+
+		/* Compute the number of blocks to scan and set the scan limits. */
+		if (node->trss_startBlock == InvalidBlockNumber)
+		{
+			/* If the range is empty, set the scan limits to zero blocks. */
+			node->trss_startBlock = 0;
+			blocks_to_scan = 0;
+		}
+		else
+			blocks_to_scan = node->trss_endBlock - node->trss_startBlock + 1;
+
+		heap_setscanlimits(scandesc, node->trss_startBlock, blocks_to_scan);
+		node->trss_inScan = true;
+	}
+
+	/* Fetch the next tuple. */
+	tuple = NextInTidRange(node, scandesc, direction);
+
+	/*
+	 * If we've exhausted all the tuples in the range, reset the inScan flag.
+	 * This will cause the heap to be rescanned for any subsequent fetches,
+	 * which is important for some cursor operations: for instance, FETCH LAST
+	 * fetches all the tuples in order and then fetches one tuple in reverse.
+	 */
+	if (!tuple)
+		node->trss_inScan = false;
+
+	/*
+	 * save the tuple and the buffer returned to us by the access methods in
+	 * our scan tuple slot and return the slot.  Note also that
+	 * ExecStoreBufferHeapTuple will increment the refcount of the buffer; the
+	 * refcount will not be dropped until the tuple table slot is cleared.
+	 */
+	if (tuple)
+		ExecStoreBufferHeapTuple(tuple, /* tuple to store */
+								 slot,	/* slot to store in */
+								 ((HeapScanDesc) scandesc)->rs_cbuf);	/* buffer associated
+																		 * with this tuple */
+	else
+		ExecClearTuple(slot);
+
+	return slot;
+}
+
+/*
+ * TidRangeRecheck -- access method routine to recheck a tuple in EvalPlanQual
+ */
+static bool
+TidRangeRecheck(TidRangeScanState *node, TupleTableSlot *slot)
+{
+	/*
+	 * XXX shouldn't we check here to make sure tuple is in TID range? In
+	 * runtime-key case this is not certain, is it?
+	 */
+	return true;
+}
+
+/* ----------------------------------------------------------------
+ *		ExecTidRangeScan(node)
+ *
+ *		Scans the relation using tids and returns the next qualifying tuple
+ *		in the direction specified.
+ *		We call the ExecScan() routine and pass it the appropriate
+ *		access method functions.
+ *
+ *		Conditions:
+ *		  -- the "cursor" maintained by the AMI is positioned at the tuple
+ *			 returned previously.
+ *
+ *		Initial States:
+ *		  -- the relation indicated is opened for scanning so that the
+ *			 "cursor" is positioned before the first qualifying tuple.
+ *		  -- trss_startBlock is InvalidBlockNumber
+ * ----------------------------------------------------------------
+ */
+static TupleTableSlot *
+ExecTidRangeScan(PlanState *pstate)
+{
+	TidRangeScanState *node = castNode(TidRangeScanState, pstate);
+
+	return ExecScan(&node->ss,
+					(ExecScanAccessMtd) TidRangeNext,
+					(ExecScanRecheckMtd) TidRangeRecheck);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecReScanTidRangeScan(node)
+ * ----------------------------------------------------------------
+ */
+void
+ExecReScanTidRangeScan(TidRangeScanState *node)
+{
+	TableScanDesc scan = node->ss.ss_currentScanDesc;
+
+	if (scan != NULL)
+		table_rescan(scan,		/* scan desc */
+					 NULL);		/* new scan keys */
+
+	/* mark scan as not in progress, and tid range list as not computed yet */
+	node->trss_inScan = false;
+	node->trss_startBlock = InvalidBlockNumber;
+
+	ExecScanReScan(&node->ss);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecEndTidRangeScan
+ *
+ *		Releases any storage allocated through C routines.
+ *		Returns nothing.
+ * ----------------------------------------------------------------
+ */
+void
+ExecEndTidRangeScan(TidRangeScanState *node)
+{
+	TableScanDesc scan = node->ss.ss_currentScanDesc;
+
+	/*
+	 * Free the exprcontext
+	 */
+	ExecFreeExprContext(&node->ss.ps);
+
+	/*
+	 * clear out tuple table slots
+	 */
+	if (node->ss.ps.ps_ResultTupleSlot)
+		ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
+	ExecClearTuple(node->ss.ss_ScanTupleSlot);
+
+	/* close heap scan */
+	if (scan != NULL)
+		table_endscan(scan);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecInitTidRangeScan
+ *
+ *		Initializes the tid range scan's state information, creates
+ *		scan keys, and opens the base and tid relations.
+ *
+ *		Parameters:
+ *		  node: TidRangeScan node produced by the planner.
+ *		  estate: the execution state initialized in InitPlan.
+ * ----------------------------------------------------------------
+ */
+TidRangeScanState *
+ExecInitTidRangeScan(TidRangeScan *node, EState *estate, int eflags)
+{
+	TidRangeScanState *tidrangestate;
+	Relation	currentRelation;
+
+	/*
+	 * create state structure
+	 */
+	tidrangestate = makeNode(TidRangeScanState);
+	tidrangestate->ss.ps.plan = (Plan *) node;
+	tidrangestate->ss.ps.state = estate;
+	tidrangestate->ss.ps.ExecProcNode = ExecTidRangeScan;
+
+	/*
+	 * Miscellaneous initialization
+	 *
+	 * create expression context for node
+	 */
+	ExecAssignExprContext(estate, &tidrangestate->ss.ps);
+
+	/*
+	 * mark scan as not in progress, and tid range list as not computed yet
+	 */
+	tidrangestate->trss_inScan = false;
+	tidrangestate->trss_startBlock = InvalidBlockNumber;
+
+	/*
+	 * open the scan relation
+	 */
+	currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+
+	tidrangestate->ss.ss_currentRelation = currentRelation;
+	tidrangestate->ss.ss_currentScanDesc = NULL;	/* no heap scan here */
+
+	/*
+	 * get the scan type from the relation descriptor.
+	 */
+	ExecInitScanTupleSlot(estate, &tidrangestate->ss,
+						  RelationGetDescr(currentRelation),
+						  table_slot_callbacks(currentRelation));
+
+	/*
+	 * Initialize result type and projection.
+	 */
+	ExecInitResultTypeTL(&tidrangestate->ss.ps);
+	ExecAssignScanProjectionInfo(&tidrangestate->ss);
+
+	/*
+	 * initialize child expressions
+	 */
+	tidrangestate->ss.ps.qual =
+		ExecInitQual(node->scan.plan.qual, (PlanState *) tidrangestate);
+
+	TidExprListCreate(tidrangestate);
+
+	/*
+	 * all done.
+	 */
+	return tidrangestate;
+}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 74b23b7..a9ee89c 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -585,6 +585,27 @@ _copyTidScan(const TidScan *from)
 }
 
 /*
+ * _copyTidRangeScan
+ */
+static TidRangeScan *
+_copyTidRangeScan(const TidRangeScan *from)
+{
+	TidRangeScan *newnode = makeNode(TidRangeScan);
+
+	/*
+	 * copy node superclass fields
+	 */
+	CopyScanFields((const Scan *) from, (Scan *) newnode);
+
+	/*
+	 * copy remainder of node
+	 */
+	COPY_NODE_FIELD(tidrangequals);
+
+	return newnode;
+}
+
+/*
  * _copySubqueryScan
  */
 static SubqueryScan *
@@ -4843,6 +4864,9 @@ copyObjectImpl(const void *from)
 		case T_TidScan:
 			retval = _copyTidScan(from);
 			break;
+		case T_TidRangeScan:
+			retval = _copyTidRangeScan(from);
+			break;
 		case T_SubqueryScan:
 			retval = _copySubqueryScan(from);
 			break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 8fd5ad8..0aa21e2 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -608,6 +608,16 @@ _outTidScan(StringInfo str, const TidScan *node)
 }
 
 static void
+_outTidRangeScan(StringInfo str, const TidRangeScan *node)
+{
+	WRITE_NODE_TYPE("TIDRANGESCAN");
+
+	_outScanInfo(str, (const Scan *) node);
+
+	WRITE_NODE_FIELD(tidrangequals);
+}
+
+static void
 _outSubqueryScan(StringInfo str, const SubqueryScan *node)
 {
 	WRITE_NODE_TYPE("SUBQUERYSCAN");
@@ -3683,6 +3693,9 @@ outNode(StringInfo str, const void *obj)
 			case T_TidScan:
 				_outTidScan(str, obj);
 				break;
+			case T_TidRangeScan:
+				_outTidRangeScan(str, obj);
+				break;
 			case T_SubqueryScan:
 				_outSubqueryScan(str, obj);
 				break;
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 4b9be13..2d9846d 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -1272,6 +1272,102 @@ cost_tidscan(Path *path, PlannerInfo *root,
 }
 
 /*
+ * cost_tidrangescan
+ *	  Determines and returns the cost of scanning a relation using a range of
+ *	  TIDs.
+ *
+ * 'baserel' is the relation to be scanned
+ * 'tidrangequals' is the list of TID-checkable range quals
+ * 'param_info' is the ParamPathInfo if this is a parameterized path, else NULL
+ */
+void
+cost_tidrangescan(Path *path, PlannerInfo *root,
+				  RelOptInfo *baserel, List *tidrangequals, ParamPathInfo *param_info)
+{
+	Selectivity selectivity;
+	double		pages;
+	Cost		startup_cost = 0;
+	Cost		run_cost = 0;
+	QualCost	qpqual_cost;
+	Cost		cpu_per_tuple;
+	QualCost	tid_qual_cost;
+	double		ntuples;
+	double		nrandompages;
+	double		nseqpages;
+	double		spc_random_page_cost;
+	double		spc_seq_page_cost;
+
+	/* Should only be applied to base relations */
+	Assert(baserel->relid > 0);
+	Assert(baserel->rtekind == RTE_RELATION);
+
+	/* Mark the path with the correct row estimate */
+	if (param_info)
+		path->rows = param_info->ppi_rows;
+	else
+		path->rows = baserel->rows;
+
+	/* Count how many tuples and pages we expect to scan */
+	selectivity = clauselist_selectivity(root, tidrangequals, baserel->relid,
+										 JOIN_INNER, NULL);
+	pages = ceil(selectivity * baserel->pages);
+
+	if (pages <= 0.0)
+		pages = 1.0;
+
+	/*
+	 * The first page in a range requires a random seek, but each subsequent
+	 * page is just a normal sequential page read. NOTE: it's desirable for
+	 * Tid Range Scans to cost more than the equivalent Sequential Scans,
+	 * because Seq Scans have some performance advantages such as scan
+	 * synchronization and parallelizability, and we'd prefer one of them to
+	 * be picked unless a Tid Range Scan really is better.
+	 */
+	ntuples = selectivity * baserel->tuples;
+	nseqpages = pages - 1.0;
+	nrandompages = 1.0;
+
+	if (!enable_tidscan)
+		startup_cost += disable_cost;
+
+	/*
+	 * The TID qual expressions will be computed once, any other baserestrict
+	 * quals once per retrieved tuple.
+	 */
+	cost_qual_eval(&tid_qual_cost, tidrangequals, root);
+
+	/* fetch estimated page cost for tablespace containing table */
+	get_tablespace_page_costs(baserel->reltablespace,
+							  &spc_random_page_cost,
+							  &spc_seq_page_cost);
+
+	/* disk costs */
+	run_cost += spc_random_page_cost * nrandompages + spc_seq_page_cost * nseqpages;
+
+	/* Add scanning CPU costs */
+	get_restriction_qual_cost(root, baserel, param_info, &qpqual_cost);
+
+	/*
+	 * XXX currently we assume TID quals are a subset of qpquals at this
+	 * point; they will be removed (if possible) when we create the plan, so
+	 * we subtract their cost from the total qpqual cost.  (If the TID quals
+	 * can't be removed, this is a mistake and we're going to underestimate
+	 * the CPU cost a bit.)
+	 */
+	startup_cost += qpqual_cost.startup + tid_qual_cost.per_tuple;
+	cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple -
+		tid_qual_cost.per_tuple;
+	run_cost += cpu_per_tuple * ntuples;
+
+	/* tlist eval costs are paid per output row, not per tuple scanned */
+	startup_cost += path->pathtarget->cost.startup;
+	run_cost += path->pathtarget->cost.per_tuple * path->rows;
+
+	path->startup_cost = startup_cost;
+	path->total_cost = startup_cost + run_cost;
+}
+
+/*
  * cost_subqueryscan
  *	  Determines and returns the cost of scanning a subquery RTE.
  *
diff --git a/src/backend/optimizer/path/tidpath.c b/src/backend/optimizer/path/tidpath.c
index 466e996..533e936 100644
--- a/src/backend/optimizer/path/tidpath.c
+++ b/src/backend/optimizer/path/tidpath.c
@@ -2,9 +2,9 @@
  *
  * tidpath.c
  *	  Routines to determine which TID conditions are usable for scanning
- *	  a given relation, and create TidPaths accordingly.
+ *	  a given relation, and create TidPaths and TidRangePaths accordingly.
  *
- * What we are looking for here is WHERE conditions of the form
+ * For TidPaths, we look for WHERE conditions of the form
  * "CTID = pseudoconstant", which can be implemented by just fetching
  * the tuple directly via heap_fetch().  We can also handle OR'd conditions
  * such as (CTID = const1) OR (CTID = const2), as well as ScalarArrayOpExpr
@@ -23,6 +23,9 @@
  * a function, but in practice it works better to keep the special node
  * representation all the way through to execution.
  *
+ * Additionally, TidRangePaths may be created for conditions of the form
+ * "CTID relop pseudoconstant", where relop is one of >,>=,<,<=, and
+ * AND-clauses composed of such conditions.
  *
  * Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
@@ -63,14 +66,14 @@ IsCTIDVar(Var *var, RelOptInfo *rel)
 
 /*
  * Check to see if a RestrictInfo is of the form
- *		CTID = pseudoconstant
+ *		CTID OP pseudoconstant
  * or
- *		pseudoconstant = CTID
- * where the CTID Var belongs to relation "rel", and nothing on the
- * other side of the clause does.
+ *		pseudoconstant OP CTID
+ * where OP is a binary operation, the CTID Var belongs to relation "rel",
+ * and nothing on the other side of the clause does.
  */
 static bool
-IsTidEqualClause(RestrictInfo *rinfo, RelOptInfo *rel)
+IsTidBinaryClause(RestrictInfo *rinfo, RelOptInfo *rel)
 {
 	OpExpr	   *node;
 	Node	   *arg1,
@@ -83,10 +86,9 @@ IsTidEqualClause(RestrictInfo *rinfo, RelOptInfo *rel)
 		return false;
 	node = (OpExpr *) rinfo->clause;
 
-	/* Operator must be tideq */
-	if (node->opno != TIDEqualOperator)
+	/* Operator must take two arguments */
+	if (list_length(node->args) != 2)
 		return false;
-	Assert(list_length(node->args) == 2);
 	arg1 = linitial(node->args);
 	arg2 = lsecond(node->args);
 
@@ -118,6 +120,44 @@ IsTidEqualClause(RestrictInfo *rinfo, RelOptInfo *rel)
 
 /*
  * Check to see if a RestrictInfo is of the form
+ *		CTID = pseudoconstant
+ * or
+ *		pseudoconstant = CTID
+ * where the CTID Var belongs to relation "rel", and nothing on the
+ * other side of the clause does.
+ */
+static bool
+IsTidEqualClause(RestrictInfo *rinfo, RelOptInfo *rel)
+{
+	if (!IsTidBinaryClause(rinfo, rel))
+		return false;
+	return ((OpExpr *) rinfo->clause)->opno == TIDEqualOperator;
+}
+
+/*
+ * Check to see if a RestrictInfo is of the form
+ *		CTID OP pseudoconstant
+ * or
+ *		pseudoconstant OP CTID
+ * where OP is a range operator such as <, <=, >, or >=, the CTID Var belongs
+ * to relation "rel", and nothing on the other side of the clause does.
+ */
+static bool
+IsTidRangeClause(RestrictInfo *rinfo, RelOptInfo *rel)
+{
+	Oid			opno;
+
+	if (!IsTidBinaryClause(rinfo, rel))
+		return false;
+	opno = ((OpExpr *) rinfo->clause)->opno;
+	return opno == TIDLessOperator ||
+		opno == TIDLessEqOperator ||
+		opno == TIDGreaterOperator ||
+		opno == TIDGreaterEqOperator;
+}
+
+/*
+ * Check to see if a RestrictInfo is of the form
  *		CTID = ANY (pseudoconstant_array)
  * where the CTID Var belongs to relation "rel", and nothing on the
  * other side of the clause does.
@@ -302,6 +342,32 @@ TidQualFromRestrictInfoList(List *rlist, RelOptInfo *rel)
 }
 
 /*
+ * Extract a set of CTID range conditions from implicit-AND List of RestrictInfos
+ *
+ * Returns a List of CTID range qual RestrictInfos for the specified rel
+ * (with implicit AND semantics across the list), or NIL if there are no
+ * usable conditions.
+ */
+static List *
+TidRangeQualFromRestrictInfoList(List *rlist, RelOptInfo *rel)
+{
+	List	   *rlst = NIL;
+	ListCell   *l;
+
+	foreach(l, rlist)
+	{
+		RestrictInfo *rinfo = lfirst_node(RestrictInfo, l);
+
+		if (IsTidRangeClause(rinfo, rel))
+		{
+			rlst = lappend(rlst, rinfo);
+		}
+	}
+
+	return rlst;
+}
+
+/*
  * Given a list of join clauses involving our rel, create a parameterized
  * TidPath for each one that is a suitable TidEqual clause.
  *
@@ -385,6 +451,7 @@ void
 create_tidscan_paths(PlannerInfo *root, RelOptInfo *rel)
 {
 	List	   *tidquals;
+	List	   *tidrangequals;
 
 	/*
 	 * If any suitable quals exist in the rel's baserestrict list, generate a
@@ -405,6 +472,25 @@ create_tidscan_paths(PlannerInfo *root, RelOptInfo *rel)
 	}
 
 	/*
+	 * If there are range quals in the baserestrict list, generate a
+	 * TidRangePath.
+	 */
+	tidrangequals = TidRangeQualFromRestrictInfoList(rel->baserestrictinfo, rel);
+
+	if (tidrangequals)
+	{
+		/*
+		 * This path uses no join clauses, but it could still have required
+		 * parameterization due to LATERAL refs in its tlist.
+		 */
+		Relids		required_outer = rel->lateral_relids;
+
+		add_path(rel, (Path *) create_tidrangescan_path(root, rel,
+														tidrangequals,
+														required_outer));
+	}
+
+	/*
 	 * Try to generate parameterized TidPaths using equality clauses extracted
 	 * from EquivalenceClasses.  (This is important since simple "t1.ctid =
 	 * t2.ctid" clauses will turn into ECs.)
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index bc0ed37..f96ff23 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -125,6 +125,8 @@ static Plan *create_bitmap_subplan(PlannerInfo *root, Path *bitmapqual,
 static void bitmap_subplan_mark_shared(Plan *plan);
 static TidScan *create_tidscan_plan(PlannerInfo *root, TidPath *best_path,
 					List *tlist, List *scan_clauses);
+static TidRangeScan *create_tidrangescan_plan(PlannerInfo *root, TidRangePath *best_path,
+						 List *tlist, List *scan_clauses);
 static SubqueryScan *create_subqueryscan_plan(PlannerInfo *root,
 						 SubqueryScanPath *best_path,
 						 List *tlist, List *scan_clauses);
@@ -189,6 +191,8 @@ static BitmapHeapScan *make_bitmap_heapscan(List *qptlist,
 					 Index scanrelid);
 static TidScan *make_tidscan(List *qptlist, List *qpqual, Index scanrelid,
 			 List *tidquals);
+static TidRangeScan *make_tidrangescan(List *qptlist, List *qpqual, Index scanrelid,
+				  List *tidrangequals);
 static SubqueryScan *make_subqueryscan(List *qptlist,
 				  List *qpqual,
 				  Index scanrelid,
@@ -373,6 +377,7 @@ create_plan_recurse(PlannerInfo *root, Path *best_path, int flags)
 		case T_IndexOnlyScan:
 		case T_BitmapHeapScan:
 		case T_TidScan:
+		case T_TidRangeScan:
 		case T_SubqueryScan:
 		case T_FunctionScan:
 		case T_TableFuncScan:
@@ -661,6 +666,13 @@ create_scan_plan(PlannerInfo *root, Path *best_path, int flags)
 												scan_clauses);
 			break;
 
+		case T_TidRangeScan:
+			plan = (Plan *) create_tidrangescan_plan(root,
+													 (TidRangePath *) best_path,
+													 tlist,
+													 scan_clauses);
+			break;
+
 		case T_SubqueryScan:
 			plan = (Plan *) create_subqueryscan_plan(root,
 													 (SubqueryScanPath *) best_path,
@@ -3208,6 +3220,73 @@ create_tidscan_plan(PlannerInfo *root, TidPath *best_path,
 }
 
 /*
+ * create_tidrangescan_plan
+ *	 Returns a tidrangescan plan for the base relation scanned by 'best_path'
+ *	 with restriction clauses 'scan_clauses' and targetlist 'tlist'.
+ */
+static TidRangeScan *
+create_tidrangescan_plan(PlannerInfo *root, TidRangePath *best_path,
+						 List *tlist, List *scan_clauses)
+{
+	TidRangeScan *scan_plan;
+	Index		scan_relid = best_path->path.parent->relid;
+	List	   *tidrangequals = best_path->tidrangequals;
+
+	/* it should be a base rel... */
+	Assert(scan_relid > 0);
+	Assert(best_path->path.parent->rtekind == RTE_RELATION);
+
+	/*
+	 * The qpqual list must contain all restrictions not enforced by the
+	 * tidrangequals list.  tidrangequals has AND semantics, so we can simply
+	 * remove any qual that appears in it.
+	 */
+	{
+		List	   *qpqual = NIL;
+		ListCell   *l;
+
+		foreach(l, scan_clauses)
+		{
+			RestrictInfo *rinfo = lfirst_node(RestrictInfo, l);
+
+			if (rinfo->pseudoconstant)
+				continue;		/* we may drop pseudoconstants here */
+			if (list_member_ptr(tidrangequals, rinfo))
+				continue;		/* simple duplicate */
+			if (is_redundant_derived_clause(rinfo, tidrangequals))
+				continue;		/* derived from same EquivalenceClass */
+			qpqual = lappend(qpqual, rinfo);
+		}
+		scan_clauses = qpqual;
+	}
+
+	/* Sort clauses into best execution order */
+	scan_clauses = order_qual_clauses(root, scan_clauses);
+
+	/* Reduce RestrictInfo lists to bare expressions; ignore pseudoconstants */
+	tidrangequals = extract_actual_clauses(tidrangequals, false);
+	scan_clauses = extract_actual_clauses(scan_clauses, false);
+
+	/* Replace any outer-relation variables with nestloop params */
+	if (best_path->path.param_info)
+	{
+		tidrangequals = (List *)
+			replace_nestloop_params(root, (Node *) tidrangequals);
+		scan_clauses = (List *)
+			replace_nestloop_params(root, (Node *) scan_clauses);
+	}
+
+	scan_plan = make_tidrangescan(tlist,
+								  scan_clauses,
+								  scan_relid,
+								  tidrangequals);
+
+	copy_generic_path_info(&scan_plan->scan.plan, &best_path->path);
+
+	return scan_plan;
+}
+
+/*
  * create_subqueryscan_plan
  *	 Returns a subqueryscan plan for the base relation scanned by 'best_path'
  *	 with restriction clauses 'scan_clauses' and targetlist 'tlist'.
@@ -5109,6 +5188,25 @@ make_tidscan(List *qptlist,
 	return node;
 }
 
+static TidRangeScan *
+make_tidrangescan(List *qptlist,
+				  List *qpqual,
+				  Index scanrelid,
+				  List *tidrangequals)
+{
+	TidRangeScan *node = makeNode(TidRangeScan);
+	Plan	   *plan = &node->scan.plan;
+
+	plan->targetlist = qptlist;
+	plan->qual = qpqual;
+	plan->lefttree = NULL;
+	plan->righttree = NULL;
+	node->scan.scanrelid = scanrelid;
+	node->tidrangequals = tidrangequals;
+
+	return node;
+}
+
 static SubqueryScan *
 make_subqueryscan(List *qptlist,
 				  List *qpqual,
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 0213a37..0d208e9 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -537,6 +537,19 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 					fix_scan_list(root, splan->tidquals, rtoffset);
 			}
 			break;
+		case T_TidRangeScan:
+			{
+				TidRangeScan *splan = (TidRangeScan *) plan;
+
+				splan->scan.scanrelid += rtoffset;
+				splan->scan.plan.targetlist =
+					fix_scan_list(root, splan->scan.plan.targetlist, rtoffset);
+				splan->scan.plan.qual =
+					fix_scan_list(root, splan->scan.plan.qual, rtoffset);
+				splan->tidrangequals =
+					fix_scan_list(root, splan->tidrangequals, rtoffset);
+			}
+			break;
 		case T_SubqueryScan:
 			/* Needs special treatment, see comments below */
 			return set_subqueryscan_references(root,
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 33e47cc..4a958a6 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -2235,6 +2235,12 @@ finalize_plan(PlannerInfo *root, Plan *plan,
 			context.paramids = bms_add_members(context.paramids, scan_params);
 			break;
 
+		case T_TidRangeScan:
+			finalize_primnode((Node *) ((TidRangeScan *) plan)->tidrangequals,
+							  &context);
+			context.paramids = bms_add_members(context.paramids, scan_params);
+			break;
+
 		case T_SubqueryScan:
 			{
 				SubqueryScan *sscan = (SubqueryScan *) plan;
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 169e51e..a87ccf8 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1198,6 +1198,35 @@ create_tidscan_path(PlannerInfo *root, RelOptInfo *rel, List *tidquals,
 }
 
 /*
+ * create_tidrangescan_path
+ *	  Creates a path corresponding to a scan by a range of TIDs, returning
+ *	  the pathnode.
+ */
+TidRangePath *
+create_tidrangescan_path(PlannerInfo *root, RelOptInfo *rel, List *tidrangequals,
+						 Relids required_outer)
+{
+	TidRangePath *pathnode = makeNode(TidRangePath);
+
+	pathnode->path.pathtype = T_TidRangeScan;
+	pathnode->path.parent = rel;
+	pathnode->path.pathtarget = rel->reltarget;
+	pathnode->path.param_info = get_baserel_parampathinfo(root, rel,
+														  required_outer);
+	pathnode->path.parallel_aware = false;
+	pathnode->path.parallel_safe = rel->consider_parallel;
+	pathnode->path.parallel_workers = 0;
+	pathnode->path.pathkeys = NIL;	/* always unordered */
+
+	pathnode->tidrangequals = tidrangequals;
+
+	cost_tidrangescan(&pathnode->path, root, rel, tidrangequals,
+					  pathnode->path.param_info);
+
+	return pathnode;
+}
+
+/*
  * create_append_path
  *	  Creates a path corresponding to an Append plan, returning the
  *	  pathnode.
diff --git a/src/include/catalog/pg_operator.dat b/src/include/catalog/pg_operator.dat
index 06aec07..fd642af 100644
--- a/src/include/catalog/pg_operator.dat
+++ b/src/include/catalog/pg_operator.dat
@@ -216,15 +216,15 @@
   oprname => '<', oprleft => 'tid', oprright => 'tid', oprresult => 'bool',
   oprcom => '>(tid,tid)', oprnegate => '>=(tid,tid)', oprcode => 'tidlt',
   oprrest => 'scalarltsel', oprjoin => 'scalarltjoinsel' },
-{ oid => '2800', descr => 'greater than',
+{ oid => '2800', oid_symbol => 'TIDGreaterOperator', descr => 'greater than',
   oprname => '>', oprleft => 'tid', oprright => 'tid', oprresult => 'bool',
   oprcom => '<(tid,tid)', oprnegate => '<=(tid,tid)', oprcode => 'tidgt',
   oprrest => 'scalargtsel', oprjoin => 'scalargtjoinsel' },
-{ oid => '2801', descr => 'less than or equal',
+{ oid => '2801', oid_symbol => 'TIDLessEqOperator', descr => 'less than or equal',
   oprname => '<=', oprleft => 'tid', oprright => 'tid', oprresult => 'bool',
   oprcom => '>=(tid,tid)', oprnegate => '>(tid,tid)', oprcode => 'tidle',
   oprrest => 'scalarlesel', oprjoin => 'scalarlejoinsel' },
-{ oid => '2802', descr => 'greater than or equal',
+{ oid => '2802', oid_symbol => 'TIDGreaterEqOperator', descr => 'greater than or equal',
   oprname => '>=', oprleft => 'tid', oprright => 'tid', oprresult => 'bool',
   oprcom => '<=(tid,tid)', oprnegate => '<(tid,tid)', oprcode => 'tidge',
   oprrest => 'scalargesel', oprjoin => 'scalargejoinsel' },
diff --git a/src/include/executor/nodeTidrangescan.h b/src/include/executor/nodeTidrangescan.h
new file mode 100644
index 0000000..cff8790
--- /dev/null
+++ b/src/include/executor/nodeTidrangescan.h
@@ -0,0 +1,23 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeTidrangescan.h
+ *
+ *
+ *
+ * Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/executor/nodeTidrangescan.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef NODETIDRANGESCAN_H
+#define NODETIDRANGESCAN_H
+
+#include "nodes/execnodes.h"
+
+extern TidRangeScanState *ExecInitTidRangeScan(TidRangeScan *node, EState *estate, int eflags);
+extern void ExecEndTidRangeScan(TidRangeScanState *node);
+extern void ExecReScanTidRangeScan(TidRangeScanState *node);
+
+#endif							/* NODETIDRANGESCAN_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 770b56c..44b146e 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1562,6 +1562,28 @@ typedef struct TidScanState
 } TidScanState;
 
 /* ----------------
+ *	 TidRangeScanState information
+ *
+ *		trss_tidexprs		list of TidExpr structs (see nodeTidscan.c)
+ *		trss_startBlock		first block to scan
+ *		trss_endBlock		last block to scan (inclusive)
+ *		trss_startOffset	first offset in first block to scan
+ *		trss_endOffset		last offset in last block to scan (inclusive)
+ *		trss_inScan			is a scan currently in progress?
+ * ----------------
+ */
+typedef struct TidRangeScanState
+{
+	ScanState	ss;				/* its first field is NodeTag */
+	List	   *trss_tidexprs;
+	BlockNumber trss_startBlock;
+	BlockNumber trss_endBlock;
+	OffsetNumber trss_startOffset;
+	OffsetNumber trss_endOffset;
+	bool		trss_inScan;
+} TidRangeScanState;
+
+/* ----------------
  *	 SubqueryScanState information
  *
  *		SubqueryScanState is used for scanning a sub-query in the range table.
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index ffb4cd4..8d7dfd3 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -59,6 +59,7 @@ typedef enum NodeTag
 	T_BitmapIndexScan,
 	T_BitmapHeapScan,
 	T_TidScan,
+	T_TidRangeScan,
 	T_SubqueryScan,
 	T_FunctionScan,
 	T_ValuesScan,
@@ -115,6 +116,7 @@ typedef enum NodeTag
 	T_BitmapIndexScanState,
 	T_BitmapHeapScanState,
 	T_TidScanState,
+	T_TidRangeScanState,
 	T_SubqueryScanState,
 	T_FunctionScanState,
 	T_TableFuncScanState,
@@ -229,6 +231,7 @@ typedef enum NodeTag
 	T_BitmapAndPath,
 	T_BitmapOrPath,
 	T_TidPath,
+	T_TidRangePath,
 	T_SubqueryScanPath,
 	T_ForeignPath,
 	T_CustomPath,
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 4b15d26..645cfc8 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -1276,6 +1276,18 @@ typedef struct TidPath
 } TidPath;
 
 /*
+ * TidRangePath represents a scan by a contiguous range of TIDs
+ *
+ * tidrangequals is an implicitly AND'ed list of qual expressions of the form
+ * "CTID relop pseudoconstant", where relop is one of >,>=,<,<=.
+ */
+typedef struct TidRangePath
+{
+	Path		path;
+	List	   *tidrangequals;
+} TidRangePath;
+
+/*
  * SubqueryScanPath represents a scan of an unflattened subquery-in-FROM
  *
  * Note that the subpath comes from a different planning domain; for example
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 93d7f32..eaaa11b 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -489,6 +489,19 @@ typedef struct TidScan
 } TidScan;
 
 /* ----------------
+ *		tid range scan node
+ *
+ * tidrangequals is an implicitly AND'ed list of qual expressions of the form
+ * "CTID relop pseudoconstant", where relop is one of >,>=,<,<=.
+ * ----------------
+ */
+typedef struct TidRangeScan
+{
+	Scan		scan;
+	List	   *tidrangequals;	/* qual(s) involving CTID op something */
+} TidRangeScan;
+
+/* ----------------
  *		subquery scan node
  *
  * SubqueryScan is for scanning the output of a sub-query in the range table.
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index ac6de0f..e534fb8 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -82,6 +82,8 @@ extern void cost_bitmap_or_node(BitmapOrPath *path, PlannerInfo *root);
 extern void cost_bitmap_tree_node(Path *path, Cost *cost, Selectivity *selec);
 extern void cost_tidscan(Path *path, PlannerInfo *root,
 			 RelOptInfo *baserel, List *tidquals, ParamPathInfo *param_info);
+extern void cost_tidrangescan(Path *path, PlannerInfo *root,
+				  RelOptInfo *baserel, List *tidrangequals, ParamPathInfo *param_info);
 extern void cost_subqueryscan(SubqueryScanPath *path, PlannerInfo *root,
 				  RelOptInfo *baserel, ParamPathInfo *param_info);
 extern void cost_functionscan(Path *path, PlannerInfo *root,
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index a51a6dc..aec02f4 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -63,6 +63,8 @@ extern BitmapOrPath *create_bitmap_or_path(PlannerInfo *root,
 					  List *bitmapquals);
 extern TidPath *create_tidscan_path(PlannerInfo *root, RelOptInfo *rel,
 					List *tidquals, Relids required_outer);
+extern TidRangePath *create_tidrangescan_path(PlannerInfo *root, RelOptInfo *rel,
+						 List *tidrangequals, Relids required_outer);
 extern AppendPath *create_append_path(PlannerInfo *root, RelOptInfo *rel,
 				   List *subpaths, List *partial_subpaths,
 				   Relids required_outer,
diff --git a/src/test/regress/expected/tidrangescan.out b/src/test/regress/expected/tidrangescan.out
new file mode 100644
index 0000000..fbe961b
--- /dev/null
+++ b/src/test/regress/expected/tidrangescan.out
@@ -0,0 +1,238 @@
+-- tests for tidrangescans
+CREATE TABLE tidrangescan(id integer, data text);
+INSERT INTO tidrangescan SELECT i,repeat('x', 100) FROM generate_series(1,1000) AS s(i);
+DELETE FROM tidrangescan WHERE substring(ctid::text from ',(\d+)\)')::integer > 10 OR substring(ctid::text from '\((\d+),')::integer >= 10;;
+VACUUM tidrangescan;
+-- range scans with upper bound
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid < '(1,0)';
+            QUERY PLAN             
+-----------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: (ctid < '(1,0)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid < '(1,0)';
+  ctid  
+--------
+ (0,1)
+ (0,2)
+ (0,3)
+ (0,4)
+ (0,5)
+ (0,6)
+ (0,7)
+ (0,8)
+ (0,9)
+ (0,10)
+(10 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid <= '(1,5)';
+             QUERY PLAN             
+------------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: (ctid <= '(1,5)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid <= '(1,5)';
+  ctid  
+--------
+ (0,1)
+ (0,2)
+ (0,3)
+ (0,4)
+ (0,5)
+ (0,6)
+ (0,7)
+ (0,8)
+ (0,9)
+ (0,10)
+ (1,1)
+ (1,2)
+ (1,3)
+ (1,4)
+ (1,5)
+(15 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid < '(0,0)';
+            QUERY PLAN             
+-----------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: (ctid < '(0,0)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid < '(0,0)';
+ ctid 
+------
+(0 rows)
+
+-- range scans with lower bound
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid > '(9,8)';
+            QUERY PLAN             
+-----------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: (ctid > '(9,8)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid > '(9,8)';
+  ctid  
+--------
+ (9,9)
+ (9,10)
+(2 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE '(9,8)' < ctid;
+            QUERY PLAN             
+-----------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: ('(9,8)'::tid < ctid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE '(9,8)' < ctid;
+  ctid  
+--------
+ (9,9)
+ (9,10)
+(2 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid >= '(9,8)';
+             QUERY PLAN             
+------------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: (ctid >= '(9,8)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid >= '(9,8)';
+  ctid  
+--------
+ (9,8)
+ (9,9)
+ (9,10)
+(3 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid >= '(100,0)';
+              QUERY PLAN              
+--------------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: (ctid >= '(100,0)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid >= '(100,0)';
+ ctid 
+------
+(0 rows)
+
+-- range scans with both bounds
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid > '(4,4)' AND '(4,7)' >= ctid;
+                           QUERY PLAN                           
+----------------------------------------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: ((ctid > '(4,4)'::tid) AND ('(4,7)'::tid >= ctid))
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid > '(4,4)' AND '(4,7)' >= ctid;
+ ctid  
+-------
+ (4,5)
+ (4,6)
+ (4,7)
+(3 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE '(4,7)' >= ctid AND ctid > '(4,4)';
+                           QUERY PLAN                           
+----------------------------------------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: (('(4,7)'::tid >= ctid) AND (ctid > '(4,4)'::tid))
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE '(4,7)' >= ctid AND ctid > '(4,4)';
+ ctid  
+-------
+ (4,5)
+ (4,6)
+ (4,7)
+(3 rows)
+
+-- extreme offsets
+SELECT ctid FROM tidrangescan where ctid > '(0,65535)' AND ctid < '(1,0)' LIMIT 1;
+ ctid 
+------
+(0 rows)
+
+SELECT ctid FROM tidrangescan where ctid < '(0,0)' LIMIT 1;
+ ctid 
+------
+(0 rows)
+
+-- empty table
+CREATE TABLE tidrangescan_empty(id integer, data text);
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan_empty WHERE ctid < '(1, 0)';
+              QUERY PLAN              
+--------------------------------------
+ Tid Range Scan on tidrangescan_empty
+   TID Cond: (ctid < '(1,0)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan_empty WHERE ctid < '(1, 0)';
+ ctid 
+------
+(0 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan_empty WHERE ctid > '(9, 0)';
+              QUERY PLAN              
+--------------------------------------
+ Tid Range Scan on tidrangescan_empty
+   TID Cond: (ctid > '(9,0)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan_empty WHERE ctid > '(9, 0)';
+ ctid 
+------
+(0 rows)
+
+-- cursors
+BEGIN;
+DECLARE c CURSOR FOR SELECT ctid FROM tidrangescan WHERE ctid < '(1,0)';
+FETCH NEXT c;
+ ctid  
+-------
+ (0,1)
+(1 row)
+
+FETCH NEXT c;
+ ctid  
+-------
+ (0,2)
+(1 row)
+
+FETCH PRIOR c;
+ ctid  
+-------
+ (0,1)
+(1 row)
+
+FETCH FIRST c;
+ ctid  
+-------
+ (0,1)
+(1 row)
+
+FETCH LAST c;
+  ctid  
+--------
+ (0,10)
+(1 row)
+
+COMMIT;
+DROP TABLE tidrangescan;
+DROP TABLE tidrangescan_empty;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 030a71f..47070f7 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -89,7 +89,7 @@ test: brin gin gist spgist privileges init_privs security_label collate matview
 # ----------
 # Another group of parallel tests
 # ----------
-test: create_table_like alter_generic alter_operator misc psql async dbsize misc_functions sysviews tsrf tidscan stats_ext
+test: create_table_like alter_generic alter_operator misc psql async dbsize misc_functions sysviews tsrf tidscan tidrangescan stats_ext
 
 # rules cannot run concurrently with any test that creates a view
 test: rules psql_crosstab amutils
diff --git a/src/test/regress/sql/tidrangescan.sql b/src/test/regress/sql/tidrangescan.sql
new file mode 100644
index 0000000..042c743
--- /dev/null
+++ b/src/test/regress/sql/tidrangescan.sql
@@ -0,0 +1,74 @@
+-- tests for tidrangescans
+
+CREATE TABLE tidrangescan(id integer, data text);
+
+INSERT INTO tidrangescan SELECT i,repeat('x', 100) FROM generate_series(1,1000) AS s(i);
+DELETE FROM tidrangescan WHERE substring(ctid::text from ',(\d+)\)')::integer > 10 OR substring(ctid::text from '\((\d+),')::integer >= 10;;
+VACUUM tidrangescan;
+
+-- range scans with upper bound
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid < '(1,0)';
+SELECT ctid FROM tidrangescan WHERE ctid < '(1,0)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid <= '(1,5)';
+SELECT ctid FROM tidrangescan WHERE ctid <= '(1,5)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid < '(0,0)';
+SELECT ctid FROM tidrangescan WHERE ctid < '(0,0)';
+
+-- range scans with lower bound
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid > '(9,8)';
+SELECT ctid FROM tidrangescan WHERE ctid > '(9,8)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE '(9,8)' < ctid;
+SELECT ctid FROM tidrangescan WHERE '(9,8)' < ctid;
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid >= '(9,8)';
+SELECT ctid FROM tidrangescan WHERE ctid >= '(9,8)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid >= '(100,0)';
+SELECT ctid FROM tidrangescan WHERE ctid >= '(100,0)';
+
+-- range scans with both bounds
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid > '(4,4)' AND '(4,7)' >= ctid;
+SELECT ctid FROM tidrangescan WHERE ctid > '(4,4)' AND '(4,7)' >= ctid;
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE '(4,7)' >= ctid AND ctid > '(4,4)';
+SELECT ctid FROM tidrangescan WHERE '(4,7)' >= ctid AND ctid > '(4,4)';
+
+-- extreme offsets
+SELECT ctid FROM tidrangescan where ctid > '(0,65535)' AND ctid < '(1,0)' LIMIT 1;
+SELECT ctid FROM tidrangescan where ctid < '(0,0)' LIMIT 1;
+
+-- empty table
+CREATE TABLE tidrangescan_empty(id integer, data text);
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan_empty WHERE ctid < '(1, 0)';
+SELECT ctid FROM tidrangescan_empty WHERE ctid < '(1, 0)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan_empty WHERE ctid > '(9, 0)';
+SELECT ctid FROM tidrangescan_empty WHERE ctid > '(9, 0)';
+
+-- cursors
+BEGIN;
+DECLARE c CURSOR FOR SELECT ctid FROM tidrangescan WHERE ctid < '(1,0)';
+FETCH NEXT c;
+FETCH NEXT c;
+FETCH PRIOR c;
+FETCH FIRST c;
+FETCH LAST c;
+COMMIT;
+
+DROP TABLE tidrangescan;
+DROP TABLE tidrangescan_empty;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index b821df9..c0da577 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2347,8 +2347,13 @@ TextPositionState
 TheLexeme
 TheSubstitute
 TidExpr
+TidExprType
 TidHashKey
+TidOpExpr
 TidPath
+TidRangePath
+TidRangeScan
+TidRangeScanState
 TidScan
 TidScanState
 TimeADT
-- 
2.7.4
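
For readers skimming the patch, the disk-cost part of cost_tidrangescan can be illustrated with a small standalone sketch: one random page fetch for the first page of the range, then sequential fetches for the remaining pages. This is not the planner code. The function name sketch_tid_range_disk_cost and its plain double parameters are made up for the example, and the rounding is done by hand only to keep the snippet free of library dependencies.

```c
#include <assert.h>

/*
 * Hypothetical standalone version of the disk-cost arithmetic in
 * cost_tidrangescan.  All names here are illustrative, not planner API.
 */
static double
sketch_tid_range_disk_cost(double selectivity, double rel_pages,
                           double random_page_cost, double seq_page_cost)
{
    double      exact;
    double      pages;

    assert(selectivity >= 0.0 && selectivity <= 1.0);
    assert(rel_pages >= 0.0);

    /* Round the expected page count up by hand (stands in for ceil()). */
    exact = selectivity * rel_pages;
    pages = (double) (long long) exact;
    if (pages < exact)
        pages += 1.0;

    /* Always charge for at least one page, as the patch does. */
    if (pages <= 0.0)
        pages = 1.0;

    /* First page needs a random seek; the rest are read sequentially. */
    return random_page_cost * 1.0 + seq_page_cost * (pages - 1.0);
}
```

For example, a range expected to cover a quarter of a 1000-page table costs 4.0 + 249 * 1.0 = 253.0 when the random and sequential page costs are 4.0 and 1.0.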

#50

David Rowley

david.rowley@2ndquadrant.com

almost 7 years ago

In reply to: Edmund Horner (#49)

Re: Tid scan improvements

On Fri, 15 Mar 2019 at 18:42, Edmund Horner <ejrh00@gmail.com> wrote:

I've had to adapt it to use the table scan API. I've got it compiling
and passing tests, but I'm uneasy about some things that still use the
heapam API.

1. I call heap_setscanlimits as I'm not sure there is a tableam equivalent.
2. I'm not sure whether non-heap tableam implementations can also be
supported by my TID Range Scan: we need to be able to set the scan
limits. There may not be any other implementations yet, but when
there are, how do we stop the planner using a TID Range Scan for
non-heap relations?
3. When fetching tuples, I see that nodeSeqscan.c uses
table_scan_getnextslot, which saves dealing with HeapTuples. But
nodeTidrangescan wants to do some checking of the block and offset
before returning the slot. So I have it using heap_getnext and
ExecStoreBufferHeapTuple. Apart from being heapam-specific, it's just
not as clean as the new API calls.

Ideally, we can get it to support general tableam implementations
rather than using heapam-specific calls. Any advice on how to do
this?
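
To make point 3 concrete, the block and offset check that the node wants to apply before returning a slot might look roughly like the sketch below. It is a standalone illustration only: SketchTid is a hypothetical stand-in for ItemPointerData, and the function names are invented for the example. The inclusive bounds mirror the trss_start*/trss_end* fields of TidRangeScanState in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for ItemPointerData: (block, offset). */
typedef struct SketchTid
{
    uint32_t    block;
    uint16_t    offset;
} SketchTid;

/* Three-way comparison: block number first, then offset. */
static int
sketch_tid_cmp(SketchTid a, SketchTid b)
{
    if (a.block != b.block)
        return (a.block < b.block) ? -1 : 1;
    if (a.offset != b.offset)
        return (a.offset < b.offset) ? -1 : 1;
    return 0;
}

/* Is tid within the inclusive range [lo, hi]? */
static bool
sketch_tid_in_range(SketchTid tid, SketchTid lo, SketchTid hi)
{
    /* Callers are assumed to pass a non-empty range. */
    assert(sketch_tid_cmp(lo, hi) <= 0);

    return sketch_tid_cmp(tid, lo) >= 0 && sketch_tid_cmp(tid, hi) <= 0;
}
```

A check of this shape only needs the tuple's TID, so in principle it does not depend on whether the tuple arrived as a HeapTuple or in a slot.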

The commit message in 8586bf7ed mentions:

Subsequent commits will incrementally abstract table access
functionality to be routed through table access methods. That change
is too large to be reviewed & committed at once, so it'll be done
incrementally.

and looking at [1] I see patch 0004 introduces some changes in
nodeTidscan.c to call a new tableam API function named
heapam_fetch_row_version. I see this function does have an ItemPointer
argument, so I guess we must be keeping those as unique row
identifiers in the API.

Patch 0001 does change the signature of heap_setscanlimits() (appears
to be committed already), and then in 0010 the only code that calls
heap_setscanlimits() (IndexBuildHeapRangeScan()) is moved and renamed
to heapam_index_build_range_scan() and set to be called via the
index_build_range_scan TableAmRoutine method. So it looks like out of
that patch series nothing is there to allow you to access
heap_setscanlimits() directly via the TableAmRoutine API, so perhaps
for this to work heap_setscanlimits will need to be interfaced,
however, I'm unsure if that'll violate any assumptions that Andres
wants to keep out of the API... Andres?

[1]: /messages/by-id/20190311193746.hhv4e4e62nxtq3k6@alap3.anarazel.de

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#51

David Steele

david@pgmasters.net

almost 7 years ago

In reply to: David Rowley (#50)

Re: Re: Tid scan improvements

On 3/18/19 1:35 PM, David Rowley wrote:

On Fri, 15 Mar 2019 at 18:42, Edmund Horner <ejrh00@gmail.com> wrote:

Subsequent commits will incrementally abstract table access
functionality to be routed through table access methods. That change
is too large to be reviewed & committed at once, so it'll be done
incrementally.

and looking at [1] I see patch 0004 introduces some changes in
nodeTidscan.c to call a new tableam API function named
heapam_fetch_row_version. I see this function does have an ItemPointer
argument, so I guess we must be keeping those as unique row
identifiers in the API.

Patch 0001 does change the signature of heap_setscanlimits() (appears
to be committed already), and then in 0010 the only code that calls
heap_setscanlimits() (IndexBuildHeapRangeScan()) is moved and renamed
to heapam_index_build_range_scan() and set to be called via the
index_build_range_scan TableAmRoutine method. So it looks like out of
that patch series nothing is there to allow you to access
heap_setscanlimits() directly via the TableAmRoutine API, so perhaps
for this to work heap_setscanlimits will need to be interfaced,
however, I'm unsure if that'll violate any assumptions that Andres
wants to keep out of the API... Andres?

Thoughts on this, Andres?

Regards,
--
-David
david@pgmasters.net

#52

Andres Freund

andres@anarazel.de

almost 7 years ago

In reply to: Edmund Horner (#49)

Re: Tid scan improvements

Hi,

On 2019-03-15 18:42:40 +1300, Edmund Horner wrote:

I've had to adapt it to use the table scan API. I've got it compiling
and passing tests, but I'm uneasy about some things that still use the
heapam API.

1. I call heap_setscanlimits as I'm not sure there is a tableam
equivalent.

There used to be, but it wasn't clear that it was useful. In core pg the
only callers are index range scans, and those are - in a later patch in
the series - moved into the AM as well, as they need to deal with things
like HOT.

2. I'm not sure whether non-heap tableam implementations can also be
supported by my TID Range Scan: we need to be able to set the scan
limits. There may not be any other implementations yet, but when
there are, how do we stop the planner using a TID Range Scan for
non-heap relations?

I've not yet looked through your code, but if required we'd probably
need to add a new tableam callback. It'd be marked optional, and the
planner could just check for its presence. A later part of the pluggable
storage series does that for bitmap scans, perhaps it's worth looking at
that?

3. When fetching tuples, I see that nodeSeqscan.c uses
table_scan_getnextslot, which saves dealing with HeapTuples. But
nodeTidrangescan wants to do some checking of the block and offset
before returning the slot. So I have it using heap_getnext and
ExecStoreBufferHeapTuple. Apart from being heapam-specific, it's just
not as clean as the new API calls.

Yea, that's not ok. Note that, since yesterday, nodeTidscan doesn't
call heap_fetch() anymore (there's still a heap dependency, but that's
just for heap_get_latest_tid(), which I'll move into execMain or such).

Ideally, we can get it to support general tableam implementations
rather than using heapam-specific calls. Any advice on how to do
this?

Not yet - could you perhaps look at the bitmap scan patch in the tableam
queue, and see if that gives you inspiration?

- Andres

#53

Andres Freund

andres@anarazel.de

almost 7 years ago

In reply to: David Rowley (#50)

Re: Tid scan improvements

Hi,

On 2019-03-18 22:35:05 +1300, David Rowley wrote:

The commit message in 8586bf7ed mentions:

Subsequent commits will incrementally abstract table access
functionality to be routed through table access methods. That change
is too large to be reviewed & committed at once, so it'll be done
incrementally.

and looking at [1] I see patch 0004 introduces some changes in
nodeTidscan.c to call a new tableam API function named
heapam_fetch_row_version. I see this function does have an ItemPointer
argument, so I guess we must be keeping those as unique row
identifiers in the API.

Right, we are. At least for now - there are some discussions around
allowing different formats for TIDs, to allow things like index organized
tables, but that's for later.

Patch 0001 does change the signature of heap_setscanlimits() (appears
to be committed already), and then in 0010 the only code that calls
heap_setscanlimits() (IndexBuildHeapRangeScan()) is moved and renamed
to heapam_index_build_range_scan() and set to be called via the
index_build_range_scan TableAmRoutine method. So it looks like out of
that patch series nothing is there to allow you to access
heap_setscanlimits() directly via the TableAmRoutine API, so perhaps
for this to work heap_setscanlimits will need to be interfaced,
however, I'm unsure if that'll violate any assumptions that Andres
wants to keep out of the API...

I was kinda hoping to keep block numbers out of the "main" APIs, to
avoid assuming everything is BLCKSZ based. I don't have a particular
problem allowing an optional setscanlimits type callback that works with
block numbers. The planner could check its presence and just not build
tid range scans if not present. Alternatively a bespoke scan API for
tid range scans, like the later patches in the tableam series for
bitmap, sample, analyze scans, might be an option.

Greetings,

Andres Freund

#54

Tom Lane

tgl@sss.pgh.pa.us

almost 7 years ago

In reply to: Andres Freund (#53)

Re: Tid scan improvements

Andres Freund <andres@anarazel.de> writes:

I was kinda hoping to keep block numbers out of the "main" APIs, to
avoid assuming everything is BLCKSZ based. I don't have a particular
problem allowing an optional setscanlimits type callback that works with
block numbers. The planner could check its presence and just not build
tid range scans if not present. Alternatively a bespoke scan API for
tid range scans, like the later patches in the tableam series for
bitmap, sample, analyze scans, might be an option.

Given Andres' API concerns, and the short amount of time remaining in
this CF, I'm not sure how much of this patch set we can expect to land
in v12. It seems like it might be a good idea to scale back our ambitions
and see whether there's a useful subset we can push in easily.

With that in mind, I went ahead and pushed 0001+0004, since improving
the planner's selectivity estimate for a "ctid vs constant" qual is
likely to be helpful whether the executor is smart about it or not.

FWIW, I don't really see the point of treating 0002 as a separate patch.
If it had some utility on its own, then it'd be sensible, but what
would that be? Also, it looks from 0002 like you are trying to overload
rs_startblock with a different meaning than it has for syncscans, and
I think that might be a bad idea.

regards, tom lane

#55

Edmund Horner

ejrh00@gmail.com

almost 7 years ago

In reply to: Tom Lane (#54)

Re: Tid scan improvements

On Tue, 26 Mar 2019 at 11:54, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andres Freund <andres@anarazel.de> writes:

I was kinda hoping to keep block numbers out of the "main" APIs, to
avoid assuming everything is BLCKSZ based. I don't have a particular
problem allowing an optional setscanlimits type callback that works with
block numbers. The planner could check its presence and just not build
tid range scans if not present. Alternatively a bespoke scan API for
tid range scans, like the later patches in the tableam series for
bitmap, sample, analyze scans, might be an option.

Given Andres' API concerns, and the short amount of time remaining in
this CF, I'm not sure how much of this patch set we can expect to land
in v12. It seems like it might be a good idea to scale back our ambitions
and see whether there's a useful subset we can push in easily.

I agree. It'll take some time to digest Andres' advice and write a
better patch.

Should I update the CF app to a) set the target version to 13, and/or
b) move it to the next commitfest?

With that in mind, I went ahead and pushed 0001+0004, since improving
the planner's selectivity estimate for a "ctid vs constant" qual is
likely to be helpful whether the executor is smart about it or not.

Cool.

FWIW, I don't really see the point of treating 0002 as a separate patch.
If it had some utility on its own, then it'd be sensible, but what
would that be? Also, it looks from 0002 like you are trying to overload
rs_startblock with a different meaning than it has for syncscans, and
I think that might be a bad idea.

Yeah I don't think either patch is useful without the other. They
were separate because, initially, only some of the TidRangeScan
functionality depended on it, and I was particularly uncomfortable
with what I was doing to heapam.c.

The changes in heapam.c were required for backward scan support, as
used by ORDER BY ctid DESC and MAX(ctid); and also for FETCH LAST and
FETCH PRIOR. I have removed the backward scans functionality from the
current set of patches, but support for backward cursor fetches
remains.

I guess to brutally simplify the patch further, we could give up
backward cursor fetches entirely? This means such cursors that end up
using a TidRangeScan will require SCROLL to go backwards (which is a
small pain for user experience), but TBH I don't think backwards-going
cursors on CTID will be hugely common.

I'm still not familiar enough with heapam.c to have any better ideas
on how to support backward scanning a limited range.

#56

David Steele

david@pgmasters.net

almost 7 years ago

In reply to: Edmund Horner (#55)

Re: Tid scan improvements

On 3/26/19 8:11 AM, Edmund Horner wrote:

On Tue, 26 Mar 2019 at 11:54, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andres Freund <andres@anarazel.de> writes:

I was kinda hoping to keep block numbers out of the "main" APIs, to
avoid assuming everything is BLCKSZ based. I don't have a particular
problem allowing an optional setscanlimits type callback that works with
block numbers. The planner could check its presence and just not build
tid range scans if not present. Alternatively a bespoke scan API for
tid range scans, like the later patches in the tableam series for
bitmap, sample, analyze scans, might be an option.

Given Andres' API concerns, and the short amount of time remaining in
this CF, I'm not sure how much of this patch set we can expect to land
in v12. It seems like it might be a good idea to scale back our ambitions
and see whether there's a useful subset we can push in easily.

I agree. It'll take some time to digest Andres' advice and write a
better patch.

Should I update the CF app to a) set the target version to 13, and/or
b) move it to the next commitfest?

If you plan to continue working on it in this CF then you can just
change the target to PG13. If you plan to take a break and pick up the
work later then go ahead and push it to the next CF.

Regards,
--
-David
david@pgmasters.net

#57

Andres Freund

andres@anarazel.de

almost 7 years ago

In reply to: Edmund Horner (#55)

Re: Tid scan improvements

Hi,

On 2019-03-26 19:11:13 +1300, Edmund Horner wrote:

The changes in heapam.c were required for backward scan support, as
used by ORDER BY ctid DESC and MAX(ctid); and also for FETCH LAST and
FETCH PRIOR. I have removed the backward scans functionality from the
current set of patches, but support for backward cursor fetches
remains.

I guess to brutally simplify the patch further, we could give up
backward cursor fetches entirely? This means such cursors that end up
using a TidRangeScan will require SCROLL to go backwards (which is a
small pain for user experience), but TBH I don't think backwards-going
cursors on CTID will be hugely common.

FWIW, I think it'd be entirely reasonable to remove support for backward
scans without SCROLL. In fact, I think it'd be wasted effort to maintain
code for it, without a pretty clear reason why we need it (unless it
were trivial to support, which it isn't).

Greetings,

Andres Freund

#58

Thomas Munro

thomas.munro@gmail.com

over 6 years ago

In reply to: David Steele (#56)

Re: Tid scan improvements

On Tue, Mar 26, 2019 at 7:25 PM David Steele <david@pgmasters.net> wrote:

On 3/26/19 8:11 AM, Edmund Horner wrote:

Should I update the CF app to a) set the target version to 13, and/or
b) move it to the next commitfest?

If you plan to continue working on it in this CF then you can just
change the target to PG13. If you plan to take a break and pick up the
work later then go ahead and push it to the next CF.

Hi Edmund,

The new CF is here. I'm going through poking threads for submissions
that don't apply, but it sounds like this needs more than a rebase?
Perhaps this belongs in the next CF?

--
Thomas Munro
https://enterprisedb.com

#59

David Rowley

david.rowley@2ndquadrant.com

over 6 years ago

In reply to: Thomas Munro (#58)

Re: Tid scan improvements

On Mon, 1 Jul 2019 at 23:29, Thomas Munro <thomas.munro@gmail.com> wrote:

The new CF is here. I'm going through poking threads for submissions
that don't apply, but it sounds like this needs more than a rebase?
Perhaps this belongs in the next CF?

0001 and 0004 of v7 got pushed in PG12. The CFbot will be trying to
apply 0001 still, but on testing 0002, no joy there either.

It would be good to see this back in PG13. For now, I'll mark it as
waiting on author.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#60

Edmund Horner

ejrh00@gmail.com

over 6 years ago

In reply to: David Rowley (#59)

Re: Tid scan improvements

On Thu, 4 Jul 2019 at 15:43, David Rowley <david.rowley@2ndquadrant.com> wrote:

On Mon, 1 Jul 2019 at 23:29, Thomas Munro <thomas.munro@gmail.com> wrote:

The new CF is here. I'm going through poking threads for submissions
that don't apply, but it sounds like this needs more than a rebase?
Perhaps this belongs in the next CF?

0001 and 0004 of v7 got pushed in PG12. The CFbot will be trying to
apply 0001 still, but on testing 0002, no joy there either.

It would be good to see this back in PG13. For now, I'll mark it as
waiting on author.

Hi,

I'm not really sure how to proceed. I started with a fairly pragmatic
solution to "WHERE ctid > ? AND ctid < ?" for tables, and then tableam
came along.

The options I see are:

A. Continue to target only heapam tables, making the bare minimum
changes necessary for the new tableam api.
B. Try to do something more general that works on all tableam
implementations for which it may be useful.

There may not be much difference between them, but B. means a bit more
research into zheap, zstore and other possible tableams.

Next question, how will the executor access the table?

1. Continue to use the seqscan tableam methods, by setting limits.
2. Use the bitmap scan methods, for instance by faking a BitmapIteratorResult.
3. Add new tableam methods specially for scanning a range of TIDs.

Any thoughts?

#61

David Rowley

david.rowley@2ndquadrant.com

over 6 years ago

In reply to: Edmund Horner (#60)

Re: Tid scan improvements

On Sun, 7 Jul 2019 at 15:32, Edmund Horner <ejrh00@gmail.com> wrote:

I'm not really sure how to proceed. I started with a fairly pragmatic
solution to "WHERE ctid > ? AND ctid < ?" for tables, and then tableam
came along.

The options I see are:

A. Continue to target only heapam tables, making the bare minimum
changes necessary for the new tableam api.
B. Try to do something more general that works on all tableam
implementations for which it may be useful.

Going by the conversation with Andres above:

On Tue, 26 Mar 2019 at 10:52, Andres Freund <andres@anarazel.de> wrote:

On 2019-03-18 22:35:05 +1300, David Rowley wrote:

The commit message in 8586bf7ed mentions:

Subsequent commits will incrementally abstract table access
functionality to be routed through table access methods. That change
is too large to be reviewed & committed at once, so it'll be done
incrementally.

and looking at [1] I see patch 0004 introduces some changes in
nodeTidscan.c to call a new tableam API function named
heapam_fetch_row_version. I see this function does have an ItemPointer
argument, so I guess we must be keeping those as unique row
identifiers in the API.

Right, we are. At least for now - there are some discussions around
allowing different formats for TIDs, to allow things like index organized
tables, but that's for later.

So it seems that the plan is to insist that TIDs are tuple identifiers
for all table AMs, for now. If that changes in the future, then so be
it, but I don't think that's cause for delaying any work on TID Range
Scans. Also from scanning around tableam.h, I see that there's no
shortage of usages of BlockNumber, so it seems reasonable to assume
table AMs must use blocks... It's hard to imagine moving away from
that given that we have shared buffers.

We do appear to have some table AM methods that are optional, although
I'm not sure where the documentation is about that. For example, in
get_relation_info() I see:

info->amhasgetbitmap = amroutine->amgetbitmap != NULL &&
relation->rd_tableam->scan_bitmap_next_block != NULL;

so it appears that at least scan_bitmap_next_block is optional.

I think what I'd do would be to add a table_setscanlimits API method
for table AM and perhaps have the planner only add TID range scan
paths if the relation has a non-NULL function pointer for that API
function. It would be good to stick a comment at least in tableam.h
that mentions that the callback is optional. That might help a bit
when it comes to writing documentation on each API function and what
they do.
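
As a rough illustration of the optional-callback idea, here is a standalone sketch. It uses simplified stand-in types, not the real TableAmRoutine or RelOptInfo, and the names scan_setlimits, has_scan_setlimits and every mock_* identifier are hypothetical, chosen only for this example. The planner-side check could look something like this:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins; these are NOT the real PostgreSQL definitions. */
typedef uint32_t BlockNumber;

typedef struct MockScanDesc
{
	BlockNumber start_block;	/* first block to scan */
	BlockNumber num_blocks;		/* number of blocks to scan */
} MockScanDesc;

typedef struct MockTableAmRoutine
{
	/* Optional callback: an AM that cannot limit a scan leaves this NULL. */
	void		(*scan_setlimits) (MockScanDesc *scan,
								   BlockNumber start_block,
								   BlockNumber num_blocks);
} MockTableAmRoutine;

typedef struct MockRelOptInfo
{
	bool		has_scan_setlimits; /* may TID range paths be built? */
} MockRelOptInfo;

/* A heap-like implementation of the optional callback. */
void
mock_heap_setscanlimits(MockScanDesc *scan, BlockNumber start_block,
						BlockNumber num_blocks)
{
	scan->start_block = start_block;
	scan->num_blocks = num_blocks;
}

/* One AM that provides the callback, and one that does not. */
const MockTableAmRoutine mock_heap_am = {
	.scan_setlimits = mock_heap_setscanlimits
};
const MockTableAmRoutine mock_other_am = {
	.scan_setlimits = NULL
};

/* Planner side: record once whether the AM provides the callback. */
void
mock_get_relation_info(const MockTableAmRoutine *am, MockRelOptInfo *rel)
{
	assert(am != NULL && rel != NULL);
	rel->has_scan_setlimits = (am->scan_setlimits != NULL);
}

/* Planner side: only consider a TID range path when the flag is set. */
bool
mock_can_build_tidrange_path(const MockRelOptInfo *rel)
{
	return rel->has_scan_setlimits;
}
```

With that shape, a table AM that leaves the pointer NULL simply never gets a TID range path, and the executor never has to cope with a missing callback.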

There may not be much difference between them, but B. means a bit more
research into zheap, zstore and other possible tableams.

Next question, how will the executor access the table?

1. Continue to use the seqscan tableam methods, by setting limits.
2. Use the bitmap scan methods, for instance by faking a BitmapIteratorResult.
3. Add new tableam methods specially for scanning a range of TIDs.

Any thoughts?

I think #1 is fine for now. #3 might be slightly more efficient since
you'd not need to read each tuple in the given page before the given
offset and throw it away, but I don't really think it's worth adding a
new table AM method for.
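
To make the cost of #1 concrete, here is a small standalone sketch. It is only a model: MockTid and the mock_* functions are invented for this example and the "table" is a plain array, so it shows the shape of the filtering rather than anything in the actual patch. The scan is limited to the blocks covered by the range, and tuples on the first and last block that fall outside the TID bounds are read and then thrown away:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a TID: (block number, offset within block). */
typedef struct MockTid
{
	uint32_t	block;
	uint16_t	offset;
} MockTid;

/* Three-way comparison: block number first, then offset. */
int
mock_tid_cmp(MockTid a, MockTid b)
{
	if (a.block != b.block)
		return (a.block < b.block) ? -1 : 1;
	if (a.offset != b.offset)
		return (a.offset < b.offset) ? -1 : 1;
	return 0;
}

/*
 * Model of a block-limited scan over the inclusive TID range [lo, hi].
 * Blocks outside lo.block..hi.block are skipped entirely (the "scan
 * limits"); tuples on the boundary blocks are still visited and must be
 * filtered one by one.  Returns the number of tuples that qualify and
 * stores in *discarded the number that were visited but rejected.
 */
size_t
mock_tid_range_scan(const MockTid *tuples, size_t ntuples,
					MockTid lo, MockTid hi, size_t *discarded)
{
	size_t		returned = 0;

	assert(tuples != NULL && discarded != NULL);
	*discarded = 0;

	for (size_t i = 0; i < ntuples; i++)
	{
		MockTid		tid = tuples[i];

		/* Outside the block limits: never read at all. */
		if (tid.block < lo.block || tid.block > hi.block)
			continue;

		/* Inside the block limits but outside the TID range: wasted read. */
		if (mock_tid_cmp(tid, lo) < 0 || mock_tid_cmp(tid, hi) > 0)
		{
			(*discarded)++;
			continue;
		}

		returned++;
	}

	return returned;
}
```

The wasted work is bounded by the tuples on at most two boundary blocks, which is why a dedicated AM method would only be slightly more efficient.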

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#62

Edmund Horner

ejrh00@gmail.com

over 6 years ago

In reply to: David Rowley (#61)

Re: Tid scan improvements

On Thu, 11 Jul 2019 at 10:22, David Rowley <david.rowley@2ndquadrant.com> wrote:

A. Continue to target only heapam tables, making the bare minimum
changes necessary for the new tableam api.
B. Try to do something more general that works on all tableam
implementations for which it may be useful.

Going by the conversation with Andres above:

On Tue, 26 Mar 2019 at 10:52, Andres Freund <andres@anarazel.de> wrote:

On 2019-03-18 22:35:05 +1300, David Rowley wrote:

The commit message in 8586bf7ed mentions:

Subsequent commits will incrementally abstract table access
functionality to be routed through table access methods. That change
is too large to be reviewed & committed at once, so it'll be done
incrementally.

and looking at [1] I see patch 0004 introduces some changes in
nodeTidscan.c to call a new tableam API function named
heapam_fetch_row_version. I see this function does have an ItemPointer
argument, so I guess we must be keeping those as unique row
identifiers in the API.

Right, we are. At least for now - there are some discussions around
allowing different formats for TIDs, to allow things like index organized
tables, but that's for later.

So it seems that the plan is to insist that TIDs are tuple identifiers
for all table AMs, for now. If that changes in the future, then so be
it, but I don't think that's cause for delaying any work on TID Range
Scans. Also from scanning around tableam.h, I see that there's no
shortage of usages of BlockNumber, so it seems reasonable to assume
table AMs must use blocks... It's hard to imagine moving away from
that given that we have shared buffers.

We do appear to have some table AM methods that are optional, although
I'm not sure where the documentation is about that. For example, in
get_relation_info() I see:

info->amhasgetbitmap = amroutine->amgetbitmap != NULL &&
relation->rd_tableam->scan_bitmap_next_block != NULL;

so it appears that at least scan_bitmap_next_block is optional.

I think what I'd do would be to add a table_setscanlimits API method
for table AM and perhaps have the planner only add TID range scan
paths if the relation has a non-NULL function pointer for that API
function. It would be good to stick a comment at least in tableam.h
that mentions that the callback is optional. That might help a bit
when it comes to writing documentation on each API function and what
they do.

This seems like a good path forward.

There may not be much difference between them, but B. means a bit more
research into zheap, zstore and other possible tableams.

Next question, how will the executor access the table?

1. Continue to use the seqscan tableam methods, by setting limits.
2. Use the bitmap scan methods, for instance by faking a BitmapIteratorResult.
3. Add new tableam methods specially for scanning a range of TIDs.

Any thoughts?

I think #1 is fine for now. #3 might be slightly more efficient since
you'd not need to read each tuple in the given page before the given
offset and throw it away, but I don't really think it's worth adding a
new table AM method for.

Yeah, it's not perfectly efficient in that regard. But it's at least
a step in the right direction.

Thanks for the guidance David. I think I'll be able to make some
progress now before hitting the next obstacle!

Edmund

#63

Edmund Horner

ejrh00@gmail.com

over 6 years ago

In reply to: David Rowley (#61)

1 attachment(s)

Re: Tid scan improvements

On Thu, 11 Jul 2019 at 10:22, David Rowley <david.rowley@2ndquadrant.com> wrote:

So it seems that the plan is to insist that TIDs are tuple identifiers
for all table AMs, for now. If that changes in the future, then so be
it, but I don't think that's cause for delaying any work on TID Range
Scans. Also from scanning around tableam.h, I see that there's no
shortage of usages of BlockNumber, so it seems reasonable to assume
table AMs must use blocks... It's hard to imagine moving away from
that given that we have shared buffers.

We do appear to have some table AM methods that are optional, although
I'm not sure where the documentation is about that. For example, in
get_relation_info() I see:

info->amhasgetbitmap = amroutine->amgetbitmap != NULL &&
relation->rd_tableam->scan_bitmap_next_block != NULL;

so it appears that at least scan_bitmap_next_block is optional.

I think what I'd do would be to add a table_setscanlimits API method
for table AM and perhaps have the planner only add TID range scan
paths if the relation has a non-NULL function pointer for that API
function. It would be good to stick a comment at least in tableam.h
that mentions that the callback is optional. That might help a bit
when it comes to writing documentation on each API function and what
they do.

Hi. Here's a new patch.

Summary of changes compared to last time:
- I've added the additional "scan_setlimits" table AM method. To
check whether it's implemented in the planner, I have added an
additional "has_scan_setlimits" flag to RelOptInfo. It seems to work.
- I've also changed nodeTidrangescan to not require anything from heapam now.
- To simplify the patch and avoid changing heapam, I've removed the
backward scan support (which was needed for FETCH LAST/PRIOR) and made
ExecSupportsBackwardScan return false for this plan type.
- I've removed the vestigial passing of "direction" through
nodeTidrangescan. If my understanding is correct, the direction
passed to TidRangeNext will always be forward. But I did leave FETCH
LAST/PRIOR in the regression tests (after adding SCROLL to the
cursor).

The patch now only targets the simple use case of restricting the
range of a table to scan. I think it would be nice to eventually
support the other major use cases of ORDER BY ctid ASC/DESC and
MIN/MAX(ctid), but that can be another feature...

Edmund

Attachments:

v8-0001-Add-a-new-plan-type-Tid-Range-Scan-to-support-range-.patch (application/octet-stream)

From a87440180dbc3356739abf6e493a84a6e6f8505f Mon Sep 17 00:00:00 2001
From: ejrh <ejrh00@gmail.com>
Date: Wed, 30 Jan 2019 10:37:10 +1300
Subject: [PATCH] Add a new plan type, Tid Range Scan, to support range quals
 over CTID.

This means queries with expressions such as "ctid >= ? AND ctid < ?" can be
answered by scanning over that part of a table, rather than falling back to a
full SeqScan.
---
 src/backend/access/heap/heapam_handler.c   |   1 +
 src/backend/commands/explain.c             |  23 ++
 src/backend/executor/Makefile              |   1 +
 src/backend/executor/execAmi.c             |   9 +
 src/backend/executor/execProcnode.c        |  10 +
 src/backend/executor/nodeTidrangescan.c    | 573 +++++++++++++++++++++++++++++
 src/backend/nodes/copyfuncs.c              |  24 ++
 src/backend/nodes/outfuncs.c               |  13 +
 src/backend/optimizer/README               |   1 +
 src/backend/optimizer/path/costsize.c      |  96 +++++
 src/backend/optimizer/path/tidpath.c       | 109 +++++-
 src/backend/optimizer/plan/createplan.c    |  98 +++++
 src/backend/optimizer/plan/setrefs.c       |  13 +
 src/backend/optimizer/plan/subselect.c     |   6 +
 src/backend/optimizer/util/pathnode.c      |  29 ++
 src/backend/optimizer/util/plancat.c       |   3 +
 src/backend/optimizer/util/relnode.c       |   3 +
 src/include/access/tableam.h               |  20 +
 src/include/catalog/pg_operator.dat        |   6 +-
 src/include/executor/nodeTidrangescan.h    |  23 ++
 src/include/nodes/execnodes.h              |  22 ++
 src/include/nodes/nodes.h                  |   3 +
 src/include/nodes/pathnodes.h              |  13 +
 src/include/nodes/plannodes.h              |  13 +
 src/include/optimizer/cost.h               |   2 +
 src/include/optimizer/pathnode.h           |   2 +
 src/test/regress/expected/tidrangescan.out | 238 ++++++++++++
 src/test/regress/parallel_schedule         |   2 +-
 src/test/regress/sql/tidrangescan.sql      |  74 ++++
 src/tools/pgindent/typedefs.list           |   5 +
 30 files changed, 1421 insertions(+), 14 deletions(-)
 create mode 100644 src/backend/executor/nodeTidrangescan.c
 create mode 100644 src/include/executor/nodeTidrangescan.h
 create mode 100644 src/test/regress/expected/tidrangescan.out
 create mode 100644 src/test/regress/sql/tidrangescan.sql

diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 09bc6fe..1751ca6 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -2499,6 +2499,7 @@ static const TableAmRoutine heapam_methods = {
 	.scan_begin = heap_beginscan,
 	.scan_end = heap_endscan,
 	.scan_rescan = heap_rescan,
+	.scan_setlimits = heap_setscanlimits,
 	.scan_getnextslot = heap_getnextslot,
 
 	.parallelscan_estimate = table_block_parallelscan_estimate,
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index dff2ed3..ea779bc 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -1009,6 +1009,7 @@ ExplainPreScanNode(PlanState *planstate, Bitmapset **rels_used)
 		case T_IndexOnlyScan:
 		case T_BitmapHeapScan:
 		case T_TidScan:
+		case T_TidRangeScan:
 		case T_SubqueryScan:
 		case T_FunctionScan:
 		case T_TableFuncScan:
@@ -1155,6 +1156,9 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_TidScan:
 			pname = sname = "Tid Scan";
 			break;
+		case T_TidRangeScan:
+			pname = sname = "Tid Range Scan";
+			break;
 		case T_SubqueryScan:
 			pname = sname = "Subquery Scan";
 			break;
@@ -1346,6 +1350,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_SampleScan:
 		case T_BitmapHeapScan:
 		case T_TidScan:
+		case T_TidRangeScan:
 		case T_SubqueryScan:
 		case T_FunctionScan:
 		case T_TableFuncScan:
@@ -1767,6 +1772,23 @@ ExplainNode(PlanState *planstate, List *ancestors,
 											   planstate, es);
 			}
 			break;
+		case T_TidRangeScan:
+			{
+				/*
+				 * The tidrangequals list has AND semantics, so be sure to
+				 * show it as an AND condition.
+				 */
+				List	   *tidquals = ((TidRangeScan *) plan)->tidrangequals;
+
+				if (list_length(tidquals) > 1)
+					tidquals = list_make1(make_andclause(tidquals));
+				show_scan_qual(tidquals, "TID Cond", planstate, ancestors, es);
+				show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
+				if (plan->qual)
+					show_instrumentation_count("Rows Removed by Filter", 1,
+											   planstate, es);
+			}
+			break;
 		case T_ForeignScan:
 			show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
 			if (plan->qual)
@@ -3054,6 +3076,7 @@ ExplainTargetRel(Plan *plan, Index rti, ExplainState *es)
 		case T_IndexOnlyScan:
 		case T_BitmapHeapScan:
 		case T_TidScan:
+		case T_TidRangeScan:
 		case T_ForeignScan:
 		case T_CustomScan:
 		case T_ModifyTable:
diff --git a/src/backend/executor/Makefile b/src/backend/executor/Makefile
index cc09895..0152e31 100644
--- a/src/backend/executor/Makefile
+++ b/src/backend/executor/Makefile
@@ -28,6 +28,7 @@ OBJS = execAmi.o execCurrent.o execExpr.o execExprInterp.o \
        nodeValuesscan.o \
        nodeCtescan.o nodeNamedtuplestorescan.o nodeWorktablescan.o \
        nodeGroup.o nodeSubplan.o nodeSubqueryscan.o nodeTidscan.o \
+       nodeTidrangescan.o \
        nodeForeignscan.o nodeWindowAgg.o tstoreReceiver.o tqueue.o spi.o \
        nodeTableFuncscan.o
 
diff --git a/src/backend/executor/execAmi.c b/src/backend/executor/execAmi.c
index 1f18e5d..d64771a 100644
--- a/src/backend/executor/execAmi.c
+++ b/src/backend/executor/execAmi.c
@@ -51,6 +51,7 @@
 #include "executor/nodeSubplan.h"
 #include "executor/nodeSubqueryscan.h"
 #include "executor/nodeTableFuncscan.h"
+#include "executor/nodeTidrangescan.h"
 #include "executor/nodeTidscan.h"
 #include "executor/nodeUnique.h"
 #include "executor/nodeValuesscan.h"
@@ -198,6 +199,10 @@ ExecReScan(PlanState *node)
 			ExecReScanTidScan((TidScanState *) node);
 			break;
 
+		case T_TidRangeScanState:
+			ExecReScanTidRangeScan((TidRangeScanState *) node);
+			break;
+
 		case T_SubqueryScanState:
 			ExecReScanSubqueryScan((SubqueryScanState *) node);
 			break;
@@ -531,6 +536,10 @@ ExecSupportsBackwardScan(Plan *node)
 			/* Simplify life for tablesample methods by disallowing this */
 			return false;
 
+		case T_TidRangeScan:
+			/* Keep TidRangeScan as simple as possible. */
+			return false;
+
 		case T_Gather:
 			return false;
 
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index c227282..23561c9 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -108,6 +108,7 @@
 #include "executor/nodeSubplan.h"
 #include "executor/nodeSubqueryscan.h"
 #include "executor/nodeTableFuncscan.h"
+#include "executor/nodeTidrangescan.h"
 #include "executor/nodeTidscan.h"
 #include "executor/nodeUnique.h"
 #include "executor/nodeValuesscan.h"
@@ -238,6 +239,11 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
 												   estate, eflags);
 			break;
 
+		case T_TidRangeScan:
+			result = (PlanState *) ExecInitTidRangeScan((TidRangeScan *) node,
+														estate, eflags);
+			break;
+
 		case T_SubqueryScan:
 			result = (PlanState *) ExecInitSubqueryScan((SubqueryScan *) node,
 														estate, eflags);
@@ -632,6 +638,10 @@ ExecEndNode(PlanState *node)
 			ExecEndTidScan((TidScanState *) node);
 			break;
 
+		case T_TidRangeScanState:
+			ExecEndTidRangeScan((TidRangeScanState *) node);
+			break;
+
 		case T_SubqueryScanState:
 			ExecEndSubqueryScan((SubqueryScanState *) node);
 			break;
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
new file mode 100644
index 0000000..c4706e3
--- /dev/null
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -0,0 +1,573 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeTidrangescan.c
+ *	  Routines to support tid range scans of relations
+ *
+ * Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/executor/nodeTidrangescan.c
+ *
+ *-------------------------------------------------------------------------
+ */
+/*
+ * INTERFACE ROUTINES
+ *
+ *		ExecTidRangeScan		scans a relation using a range of tids
+ *		ExecInitTidRangeScan	creates and initializes state info.
+ *		ExecReScanTidRangeScan	rescans the relation.
+ *		ExecEndTidRangeScan		releases all storage.
+ */
+#include "postgres.h"
+
+#include "access/relscan.h"
+#include "access/sysattr.h"
+#include "access/tableam.h"
+#include "catalog/pg_operator.h"
+#include "executor/execdebug.h"
+#include "executor/nodeTidrangescan.h"
+#include "nodes/nodeFuncs.h"
+#include "storage/bufmgr.h"
+#include "utils/rel.h"
+
+
+#define IsCTIDVar(node)  \
+	((node) != NULL && \
+	 IsA((node), Var) && \
+	 ((Var *) (node))->varattno == SelfItemPointerAttributeNumber && \
+	 ((Var *) (node))->varlevelsup == 0)
+
+typedef enum
+{
+	TIDEXPR_UPPER_BOUND,
+	TIDEXPR_LOWER_BOUND
+} TidExprType;
+
+/* one element in TidExpr's opexprs */
+typedef struct TidOpExpr
+{
+	TidExprType exprtype;		/* type of op */
+	ExprState  *exprstate;		/* ExprState for a TID-yielding subexpr */
+	bool		inclusive;		/* whether op is inclusive */
+} TidOpExpr;
+
+/*
+ * For the given 'expr', build and return an appropriate TidOpExpr taking into
+ * account the expr's operator and operand order.
+ */
+static TidOpExpr *
+MakeTidOpExpr(OpExpr *expr, TidRangeScanState *tidstate)
+{
+	Node	   *arg1 = get_leftop((Expr *) expr);
+	Node	   *arg2 = get_rightop((Expr *) expr);
+	ExprState  *exprstate = NULL;
+	bool		invert = false;
+	TidOpExpr  *tidopexpr;
+
+	if (IsCTIDVar(arg1))
+		exprstate = ExecInitExpr((Expr *) arg2, &tidstate->ss.ps);
+	else if (IsCTIDVar(arg2))
+	{
+		exprstate = ExecInitExpr((Expr *) arg1, &tidstate->ss.ps);
+		invert = true;
+	}
+	else
+		elog(ERROR, "could not identify CTID variable");
+
+	tidopexpr = (TidOpExpr *) palloc0(sizeof(TidOpExpr));
+
+	switch (expr->opno)
+	{
+		case TIDLessEqOperator:
+			tidopexpr->inclusive = true;
+			/* fall through */
+		case TIDLessOperator:
+			tidopexpr->exprtype = invert ? TIDEXPR_LOWER_BOUND : TIDEXPR_UPPER_BOUND;
+			break;
+		case TIDGreaterEqOperator:
+			tidopexpr->inclusive = true;
+			/* fall through */
+		case TIDGreaterOperator:
+			tidopexpr->exprtype = invert ? TIDEXPR_UPPER_BOUND : TIDEXPR_LOWER_BOUND;
+			break;
+		default:
+			elog(ERROR, "could not identify CTID expression");
+	}
+
+	tidopexpr->exprstate = exprstate;
+
+	return tidopexpr;
+}
+
+/*
+ * Extract the qual subexpressions that yield TIDs to search for,
+ * and compile them into ExprStates if they're ordinary expressions.
+ */
+static void
+TidExprListCreate(TidRangeScanState *tidrangestate)
+{
+	TidRangeScan *node = (TidRangeScan *) tidrangestate->ss.ps.plan;
+	List	   *tidexprs = NIL;
+	ListCell   *l;
+
+	foreach(l, node->tidrangequals)
+	{
+		OpExpr	   *opexpr = lfirst(l);
+		TidOpExpr  *tidopexpr = MakeTidOpExpr(opexpr, tidrangestate);
+
+		tidexprs = lappend(tidexprs, tidopexpr);
+	}
+
+	tidrangestate->trss_tidexprs = tidexprs;
+}
+
+/*
+ * Set a lower bound tid, taking into account the inclusivity of the bound.
+ * Return true if the bound is valid.
+ */
+static bool
+SetTidLowerBound(ItemPointer tid, bool inclusive, ItemPointer lowerBound)
+{
+	OffsetNumber offset;
+
+	*lowerBound = *tid;
+	offset = ItemPointerGetOffsetNumberNoCheck(tid);
+
+	if (!inclusive)
+	{
+		/* Check if the lower bound is actually in the next block. */
+		if (offset >= MaxOffsetNumber)
+		{
+			BlockNumber block = ItemPointerGetBlockNumberNoCheck(lowerBound);
+
+			/*
+			 * If the lower bound was already at or above the maximum block
+			 * number, then there is no valid range.
+			 */
+			if (block >= MaxBlockNumber)
+				return false;
+
+			ItemPointerSet(lowerBound, block + 1, 1);
+		}
+		else
+			ItemPointerSetOffsetNumber(lowerBound, OffsetNumberNext(offset));
+	}
+	else if (offset == 0)
+		ItemPointerSetOffsetNumber(lowerBound, 1);
+
+	return true;
+}
+
+/*
+ * Set an upper bound tid, taking into account the inclusivity of the bound.
+ * Return true if the bound is valid.
+ */
+static bool
+SetTidUpperBound(ItemPointer tid, bool inclusive, ItemPointer upperBound)
+{
+	OffsetNumber offset;
+
+	*upperBound = *tid;
+	offset = ItemPointerGetOffsetNumberNoCheck(tid);
+
+	/*
+	 * Since TID offsets start at 1, an inclusive upper bound with offset 0
+	 * can be treated as an exclusive bound.  This has the benefit of
+	 * eliminating that block from the scan range.
+	 */
+	if (inclusive && offset == 0)
+		inclusive = false;
+
+	if (!inclusive)
+	{
+		/* Check if the upper bound is actually in the previous block. */
+		if (offset == 0)
+		{
+			BlockNumber block = ItemPointerGetBlockNumberNoCheck(upperBound);
+
+			/*
+			 * If the upper bound was already in block 0, then there is no
+			 * valid range.
+			 */
+			if (block == 0)
+				return false;
+
+			ItemPointerSet(upperBound, block - 1, MaxOffsetNumber);
+		}
+		else
+			ItemPointerSetOffsetNumber(upperBound, OffsetNumberPrev(offset));
+	}
+
+	return true;
+}
+
+/* ----------------------------------------------------------------
+ *		TidRangeEval
+ *
+ *		Compute the range of TIDs to scan, by evaluating the
+ *		expressions for them.
+ * ----------------------------------------------------------------
+ */
+static void
+TidRangeEval(TidRangeScanState *node)
+{
+	ExprContext *econtext = node->ss.ps.ps_ExprContext;
+	BlockNumber nblocks;
+	ItemPointerData lowerBound;
+	ItemPointerData upperBound;
+	ListCell   *l;
+
+	/*
+	 * We silently discard any TIDs that are out of range at the time of scan
+	 * start.  (Since we hold at least AccessShareLock on the table, it won't
+	 * be possible for someone to truncate away the blocks we intend to
+	 * visit.)
+	 */
+	nblocks = RelationGetNumberOfBlocks(node->ss.ss_currentRelation);
+
+
+	/* The biggest range on an empty table is empty; just skip it. */
+	if (nblocks == 0)
+		return;
+
+	/* Set the lower and upper bound to scan the whole table. */
+	ItemPointerSet(&lowerBound, 0, 1);
+	ItemPointerSet(&upperBound, nblocks - 1, MaxOffsetNumber);
+
+	foreach(l, node->trss_tidexprs)
+	{
+		TidOpExpr  *tidopexpr = (TidOpExpr *) lfirst(l);
+		ItemPointer itemptr;
+		bool		isNull;
+
+		/* Evaluate this bound. */
+		itemptr = (ItemPointer)
+			DatumGetPointer(ExecEvalExprSwitchContext(tidopexpr->exprstate,
+													  econtext,
+													  &isNull));
+
+		/* If the bound is NULL, *nothing* matches the qual. */
+		if (isNull)
+			return;
+
+		if (tidopexpr->exprtype == TIDEXPR_LOWER_BOUND)
+		{
+			ItemPointerData lb;
+
+			if (!SetTidLowerBound(itemptr, tidopexpr->inclusive, &lb))
+				return;
+
+			if (ItemPointerCompare(&lb, &lowerBound) > 0)
+				lowerBound = lb;
+		}
+
+		if (tidopexpr->exprtype == TIDEXPR_UPPER_BOUND)
+		{
+			ItemPointerData ub;
+
+			if (!SetTidUpperBound(itemptr, tidopexpr->inclusive, &ub))
+				return;
+
+			if (ItemPointerCompare(&ub, &upperBound) < 0)
+				upperBound = ub;
+		}
+	}
+
+	/* If the resulting range is not empty, use it. */
+	if (ItemPointerCompare(&lowerBound, &upperBound) <= 0)
+	{
+		node->trss_startBlock = ItemPointerGetBlockNumberNoCheck(&lowerBound);
+		node->trss_endBlock = ItemPointerGetBlockNumberNoCheck(&upperBound);
+		node->trss_startOffset = ItemPointerGetOffsetNumberNoCheck(&lowerBound);
+		node->trss_endOffset = ItemPointerGetOffsetNumberNoCheck(&upperBound);
+	}
+}
+
+/* ----------------------------------------------------------------
+ *		NextInTidRange
+ *
+ *		Fetch the next tuple when scanning a range of TIDs.
+ *
+ *		Since the heap access method may return tuples that are in the scan
+ *		limit, but not within the required TID range, this function will
+ *		check for such tuples and skip over them.
+ * ----------------------------------------------------------------
+ */
+static bool
+NextInTidRange(TidRangeScanState *node, TableScanDesc scandesc,
+			   TupleTableSlot *slot)
+{
+	for (;;)
+	{
+		BlockNumber block;
+		OffsetNumber offset;
+
+		if (!table_scan_getnextslot(scandesc, ForwardScanDirection, slot))
+			return false;
+
+		/* Check that the tuple is within the required range. */
+		block = ItemPointerGetBlockNumber(&slot->tts_tid);
+		offset = ItemPointerGetOffsetNumber(&slot->tts_tid);
+
+		/* The tuple should never come from outside the scan limits. */
+		Assert(block >= node->trss_startBlock &&
+			   block <= node->trss_endBlock);
+
+		/*
+		 * If the tuple is in the first block of the range and before the
+		 * first requested offset, then we can skip it.
+		 */
+		if (block == node->trss_startBlock && offset < node->trss_startOffset)
+		{
+			ExecClearTuple(slot);
+			continue;
+		}
+
+		/*
+		 * Similarly, if the tuple is in the last block and after the last
+		 * requested offset, we can end the scan.
+		 */
+		if (block == node->trss_endBlock && offset > node->trss_endOffset)
+		{
+			ExecClearTuple(slot);
+			return false;
+		}
+
+		return true;
+	}
+}
+
+/* ----------------------------------------------------------------
+ *		TidRangeNext
+ *
+ *		Retrieve a tuple from the TidRangeScan node's currentRelation
+ *		using the tids in the TidRangeScanState information.
+ *
+ * ----------------------------------------------------------------
+ */
+static TupleTableSlot *
+TidRangeNext(TidRangeScanState *node)
+{
+	TableScanDesc scandesc;
+	EState	   *estate;
+	TupleTableSlot *slot;
+	bool		foundTuple;
+
+	/*
+	 * extract necessary information from tid range scan node
+	 */
+	scandesc = node->ss.ss_currentScanDesc;
+	estate = node->ss.ps.state;
+	slot = node->ss.ss_ScanTupleSlot;
+
+	Assert(ScanDirectionIsForward(estate->es_direction));
+
+	if (!node->trss_inScan)
+	{
+		BlockNumber blocks_to_scan;
+
+		/* First time through, compute the TID range to be visited */
+		if (node->trss_startBlock == InvalidBlockNumber)
+			TidRangeEval(node);
+
+		if (scandesc == NULL)
+		{
+			scandesc = table_beginscan_strat(node->ss.ss_currentRelation,
+											 estate->es_snapshot,
+											 0, NULL,
+											 false, false);
+			node->ss.ss_currentScanDesc = scandesc;
+		}
+
+		/* Compute the number of blocks to scan and set the scan limits. */
+		if (node->trss_startBlock == InvalidBlockNumber)
+		{
+			/* If the range is empty, set the scan limits to zero blocks. */
+			node->trss_startBlock = 0;
+			blocks_to_scan = 0;
+		}
+		else
+			blocks_to_scan = node->trss_endBlock - node->trss_startBlock + 1;
+
+		table_scan_setlimits(scandesc, node->trss_startBlock, blocks_to_scan);
+		node->trss_inScan = true;
+	}
+
+	/* Fetch the next tuple. */
+	foundTuple = NextInTidRange(node, scandesc, slot);
+
+	/*
+	 * If we've exhausted all the tuples in the range, reset the inScan flag.
+	 * This will cause the heap to be rescanned for any subsequent fetches,
+	 * which is important for some cursor operations: for instance, FETCH LAST
+	 * fetches all the tuples in order and then fetches one tuple in reverse.
+	 */
+	if (!foundTuple)
+		node->trss_inScan = false;
+
+	return slot;
+}
+
+/*
+ * TidRangeRecheck -- access method routine to recheck a tuple in EvalPlanQual
+ */
+static bool
+TidRangeRecheck(TidRangeScanState *node, TupleTableSlot *slot)
+{
+	/*
+	 * XXX shouldn't we check here to make sure tuple is in TID range? In
+	 * runtime-key case this is not certain, is it?
+	 */
+	return true;
+}
+
+/* ----------------------------------------------------------------
+ *		ExecTidRangeScan(node)
+ *
+ *		Scans the relation using tids and returns the next qualifying tuple.
+ *		We call the ExecScan() routine and pass it the appropriate
+ *		access method functions.
+ *
+ *		Conditions:
+ *		  -- the "cursor" maintained by the AMI is positioned at the tuple
+ *			 returned previously.
+ *
+ *		Initial States:
+ *		  -- the relation indicated is opened for scanning so that the
+ *			 "cursor" is positioned before the first qualifying tuple.
+ *		  -- trss_startBlock is InvalidBlockNumber
+ * ----------------------------------------------------------------
+ */
+static TupleTableSlot *
+ExecTidRangeScan(PlanState *pstate)
+{
+	TidRangeScanState *node = castNode(TidRangeScanState, pstate);
+
+	return ExecScan(&node->ss,
+					(ExecScanAccessMtd) TidRangeNext,
+					(ExecScanRecheckMtd) TidRangeRecheck);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecReScanTidRangeScan(node)
+ * ----------------------------------------------------------------
+ */
+void
+ExecReScanTidRangeScan(TidRangeScanState *node)
+{
+	TableScanDesc scan = node->ss.ss_currentScanDesc;
+
+	if (scan != NULL)
+		table_rescan(scan,		/* scan desc */
+					 NULL);		/* new scan keys */
+
+	/* mark scan as not in progress, and tid range as not computed yet */
+	node->trss_inScan = false;
+	node->trss_startBlock = InvalidBlockNumber;
+
+	ExecScanReScan(&node->ss);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecEndTidRangeScan
+ *
+ *		Releases any storage allocated through C routines.
+ *		Returns nothing.
+ * ----------------------------------------------------------------
+ */
+void
+ExecEndTidRangeScan(TidRangeScanState *node)
+{
+	TableScanDesc scan = node->ss.ss_currentScanDesc;
+
+	/*
+	 * Free the exprcontext
+	 */
+	ExecFreeExprContext(&node->ss.ps);
+
+	/*
+	 * clear out tuple table slots
+	 */
+	if (node->ss.ps.ps_ResultTupleSlot)
+		ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
+	ExecClearTuple(node->ss.ss_ScanTupleSlot);
+
+	/* close heap scan */
+	if (scan != NULL)
+		table_endscan(scan);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecInitTidRangeScan
+ *
+ *		Initializes the tid range scan's state information, compiles
+ *		the range bound expressions, and opens the scan relation.
+ *
+ *		Parameters:
+ *		  node: TidRangeScan node produced by the planner.
+ *		  estate: the execution state initialized in InitPlan.
+ * ----------------------------------------------------------------
+ */
+TidRangeScanState *
+ExecInitTidRangeScan(TidRangeScan *node, EState *estate, int eflags)
+{
+	TidRangeScanState *tidrangestate;
+	Relation	currentRelation;
+
+	/*
+	 * create state structure
+	 */
+	tidrangestate = makeNode(TidRangeScanState);
+	tidrangestate->ss.ps.plan = (Plan *) node;
+	tidrangestate->ss.ps.state = estate;
+	tidrangestate->ss.ps.ExecProcNode = ExecTidRangeScan;
+
+	/*
+	 * Miscellaneous initialization
+	 *
+	 * create expression context for node
+	 */
+	ExecAssignExprContext(estate, &tidrangestate->ss.ps);
+
+	/*
+	 * mark scan as not in progress, and tid range as not computed yet
+	 */
+	tidrangestate->trss_inScan = false;
+	tidrangestate->trss_startBlock = InvalidBlockNumber;
+
+	/*
+	 * open the scan relation
+	 */
+	currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+
+	tidrangestate->ss.ss_currentRelation = currentRelation;
+	tidrangestate->ss.ss_currentScanDesc = NULL;	/* no heap scan here */
+
+	/*
+	 * get the scan type from the relation descriptor.
+	 */
+	ExecInitScanTupleSlot(estate, &tidrangestate->ss,
+						  RelationGetDescr(currentRelation),
+						  table_slot_callbacks(currentRelation));
+
+	/*
+	 * Initialize result type and projection.
+	 */
+	ExecInitResultTypeTL(&tidrangestate->ss.ps);
+	ExecAssignScanProjectionInfo(&tidrangestate->ss);
+
+	/*
+	 * initialize child expressions
+	 */
+	tidrangestate->ss.ps.qual =
+		ExecInitQual(node->scan.plan.qual, (PlanState *) tidrangestate);
+
+	TidExprListCreate(tidrangestate);
+
+	/*
+	 * all done.
+	 */
+	return tidrangestate;
+}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 78deade..551b221 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -586,6 +586,27 @@ _copyTidScan(const TidScan *from)
 }
 
 /*
+ * _copyTidRangeScan
+ */
+static TidRangeScan *
+_copyTidRangeScan(const TidRangeScan *from)
+{
+	TidRangeScan *newnode = makeNode(TidRangeScan);
+
+	/*
+	 * copy node superclass fields
+	 */
+	CopyScanFields((const Scan *) from, (Scan *) newnode);
+
+	/*
+	 * copy remainder of node
+	 */
+	COPY_NODE_FIELD(tidrangequals);
+
+	return newnode;
+}
+
+/*
  * _copySubqueryScan
  */
 static SubqueryScan *
@@ -4855,6 +4876,9 @@ copyObjectImpl(const void *from)
 		case T_TidScan:
 			retval = _copyTidScan(from);
 			break;
+		case T_TidRangeScan:
+			retval = _copyTidRangeScan(from);
+			break;
 		case T_SubqueryScan:
 			retval = _copySubqueryScan(from);
 			break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 8400dd3..a12c8bb 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -609,6 +609,16 @@ _outTidScan(StringInfo str, const TidScan *node)
 }
 
 static void
+_outTidRangeScan(StringInfo str, const TidRangeScan *node)
+{
+	WRITE_NODE_TYPE("TIDRANGESCAN");
+
+	_outScanInfo(str, (const Scan *) node);
+
+	WRITE_NODE_FIELD(tidrangequals);
+}
+
+static void
 _outSubqueryScan(StringInfo str, const SubqueryScan *node)
 {
 	WRITE_NODE_TYPE("SUBQUERYSCAN");
@@ -3701,6 +3711,9 @@ outNode(StringInfo str, const void *obj)
 			case T_TidScan:
 				_outTidScan(str, obj);
 				break;
+			case T_TidRangeScan:
+				_outTidRangeScan(str, obj);
+				break;
 			case T_SubqueryScan:
 				_outSubqueryScan(str, obj);
 				break;
diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README
index 89ce373..6d2f7b8 100644
--- a/src/backend/optimizer/README
+++ b/src/backend/optimizer/README
@@ -374,6 +374,7 @@ RelOptInfo      - a relation or joined relations
   IndexPath     - index scan
   BitmapHeapPath - top of a bitmapped index scan
   TidPath       - scan by CTID
+  TidRangePath  - scan a contiguous range of CTIDs
   SubqueryScanPath - scan a subquery-in-FROM
   ForeignPath   - scan a foreign table, foreign join or foreign upper-relation
   CustomPath    - for custom scan providers
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index a2a9b1f..abf8b73 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -1272,6 +1272,102 @@ cost_tidscan(Path *path, PlannerInfo *root,
 }
 
 /*
+ * cost_tidrangescan
+ *	  Determines and returns the cost of scanning a relation using a range of
+ *	  TIDs.
+ *
+ * 'baserel' is the relation to be scanned
+ * 'tidrangequals' is the list of TID-checkable range quals
+ * 'param_info' is the ParamPathInfo if this is a parameterized path, else NULL
+ */
+void
+cost_tidrangescan(Path *path, PlannerInfo *root,
+				  RelOptInfo *baserel, List *tidrangequals, ParamPathInfo *param_info)
+{
+	Selectivity selectivity;
+	double		pages;
+	Cost		startup_cost = 0;
+	Cost		run_cost = 0;
+	QualCost	qpqual_cost;
+	Cost		cpu_per_tuple;
+	QualCost	tid_qual_cost;
+	double		ntuples;
+	double		nrandompages;
+	double		nseqpages;
+	double		spc_random_page_cost;
+	double		spc_seq_page_cost;
+
+	/* Should only be applied to base relations */
+	Assert(baserel->relid > 0);
+	Assert(baserel->rtekind == RTE_RELATION);
+
+	/* Mark the path with the correct row estimate */
+	if (param_info)
+		path->rows = param_info->ppi_rows;
+	else
+		path->rows = baserel->rows;
+
+	/* Count how many tuples and pages we expect to scan */
+	selectivity = clauselist_selectivity(root, tidrangequals, baserel->relid,
+										 JOIN_INNER, NULL);
+	pages = ceil(selectivity * baserel->pages);
+
+	if (pages <= 0.0)
+		pages = 1.0;
+
+	/*
+	 * The first page in a range requires a random seek, but each subsequent
+	 * page is just a normal sequential page read. NOTE: it's desirable for
+	 * Tid Range Scans to cost more than the equivalent Sequential Scans,
+	 * because Seq Scans have some performance advantages such as scan
+	 * synchronization and parallelizability, and we'd prefer one of them to
+	 * be picked unless a Tid Range Scan really is better.
+	 */
+	ntuples = selectivity * baserel->tuples;
+	nseqpages = pages - 1.0;
+	nrandompages = 1.0;
+
+	if (!enable_tidscan)
+		startup_cost += disable_cost;
+
+	/*
+	 * The TID qual expressions will be computed once, any other baserestrict
+	 * quals once per retrieved tuple.
+	 */
+	cost_qual_eval(&tid_qual_cost, tidrangequals, root);
+
+	/* fetch estimated page cost for tablespace containing table */
+	get_tablespace_page_costs(baserel->reltablespace,
+							  &spc_random_page_cost,
+							  &spc_seq_page_cost);
+
+	/* disk costs */
+	run_cost += spc_random_page_cost * nrandompages + spc_seq_page_cost * nseqpages;
+
+	/* Add scanning CPU costs */
+	get_restriction_qual_cost(root, baserel, param_info, &qpqual_cost);
+
+	/*
+	 * XXX currently we assume TID quals are a subset of qpquals at this
+	 * point; they will be removed (if possible) when we create the plan, so
+	 * we subtract their cost from the total qpqual cost.  (If the TID quals
+	 * can't be removed, this is a mistake and we're going to underestimate
+	 * the CPU cost a bit.)
+	 */
+	startup_cost += qpqual_cost.startup + tid_qual_cost.per_tuple;
+	cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple -
+		tid_qual_cost.per_tuple;
+	run_cost += cpu_per_tuple * ntuples;
+
+	/* tlist eval costs are paid per output row, not per tuple scanned */
+	startup_cost += path->pathtarget->cost.startup;
+	run_cost += path->pathtarget->cost.per_tuple * path->rows;
+
+	path->startup_cost = startup_cost;
+	path->total_cost = startup_cost + run_cost;
+}
+
+/*
  * cost_subqueryscan
  *	  Determines and returns the cost of scanning a subquery RTE.
  *
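Note on cost_tidrangescan above: leaving aside the qual and target-list evaluation costs, the estimate comes down to one random page fetch, sequential fetches for the remaining pages in the range, and a per-tuple CPU charge. The following is a rough standalone sketch of that arithmetic only, not part of the patch. The SketchCost type and the sketch_* function names are invented for illustration, and the parameters stand in for the planner's selectivity estimate, baserel->pages, baserel->tuples and the usual cost GUCs.

```c
/* Hypothetical result type; the patch stores these in the Path instead. */
typedef struct
{
	double		startup;
	double		total;
} SketchCost;

/* Round a non-negative page estimate up to a whole number of pages. */
static double
sketch_ceil_pages(double x)
{
	double		whole = (double) (long long) x;

	return (whole < x) ? whole + 1.0 : whole;
}

/*
 * Simplified version of the page and tuple arithmetic in cost_tidrangescan.
 * Qual evaluation costs, tlist costs and disable_cost are deliberately
 * omitted here.
 */
SketchCost
sketch_tidrange_cost(double selectivity, double rel_pages, double rel_tuples,
					 double random_page_cost, double seq_page_cost,
					 double cpu_tuple_cost)
{
	SketchCost	cost;
	double		pages = sketch_ceil_pages(selectivity * rel_pages);
	double		ntuples = selectivity * rel_tuples;
	double		run_cost;

	/* Always charge for at least one page. */
	if (pages <= 0.0)
		pages = 1.0;

	/* First page is a random read; the rest are sequential reads. */
	run_cost = random_page_cost * 1.0 + seq_page_cost * (pages - 1.0);

	/* Basic per-tuple CPU cost for every tuple in the range. */
	run_cost += cpu_tuple_cost * ntuples;

	cost.startup = 0.0;
	cost.total = cost.startup + run_cost;
	return cost;
}
```

Because the first page is charged at the random rate, a range covering the whole table comes out slightly more expensive than a plain sequential scan under this arithmetic (whenever random_page_cost exceeds seq_page_cost), which is the bias the comment in cost_tidrangescan asks for.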
diff --git a/src/backend/optimizer/path/tidpath.c b/src/backend/optimizer/path/tidpath.c
index 466e996..3f8533c 100644
--- a/src/backend/optimizer/path/tidpath.c
+++ b/src/backend/optimizer/path/tidpath.c
@@ -2,9 +2,9 @@
  *
  * tidpath.c
  *	  Routines to determine which TID conditions are usable for scanning
- *	  a given relation, and create TidPaths accordingly.
+ *	  a given relation, and create TidPaths and TidRangePaths accordingly.
  *
- * What we are looking for here is WHERE conditions of the form
+ * For TidPaths, we look for WHERE conditions of the form
  * "CTID = pseudoconstant", which can be implemented by just fetching
  * the tuple directly via heap_fetch().  We can also handle OR'd conditions
  * such as (CTID = const1) OR (CTID = const2), as well as ScalarArrayOpExpr
@@ -23,6 +23,9 @@
  * a function, but in practice it works better to keep the special node
  * representation all the way through to execution.
  *
+ * Additionally, TidRangePaths may be created for conditions of the form
+ * "CTID relop pseudoconstant", where relop is one of >,>=,<,<=, and
+ * AND-clauses composed of such conditions.
  *
  * Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
@@ -63,14 +66,14 @@ IsCTIDVar(Var *var, RelOptInfo *rel)
 
 /*
  * Check to see if a RestrictInfo is of the form
- *		CTID = pseudoconstant
+ *		CTID OP pseudoconstant
  * or
- *		pseudoconstant = CTID
- * where the CTID Var belongs to relation "rel", and nothing on the
- * other side of the clause does.
+ *		pseudoconstant OP CTID
+ * where OP is a binary operation, the CTID Var belongs to relation "rel",
+ * and nothing on the other side of the clause does.
  */
 static bool
-IsTidEqualClause(RestrictInfo *rinfo, RelOptInfo *rel)
+IsTidBinaryClause(RestrictInfo *rinfo, RelOptInfo *rel)
 {
 	OpExpr	   *node;
 	Node	   *arg1,
@@ -83,10 +86,9 @@ IsTidEqualClause(RestrictInfo *rinfo, RelOptInfo *rel)
 		return false;
 	node = (OpExpr *) rinfo->clause;
 
-	/* Operator must be tideq */
-	if (node->opno != TIDEqualOperator)
+	/* Operator must take two arguments */
+	if (list_length(node->args) != 2)
 		return false;
-	Assert(list_length(node->args) == 2);
 	arg1 = linitial(node->args);
 	arg2 = lsecond(node->args);
 
@@ -118,6 +120,44 @@ IsTidEqualClause(RestrictInfo *rinfo, RelOptInfo *rel)
 
 /*
  * Check to see if a RestrictInfo is of the form
+ *		CTID = pseudoconstant
+ * or
+ *		pseudoconstant = CTID
+ * where the CTID Var belongs to relation "rel", and nothing on the
+ * other side of the clause does.
+ */
+static bool
+IsTidEqualClause(RestrictInfo *rinfo, RelOptInfo *rel)
+{
+	if (!IsTidBinaryClause(rinfo, rel))
+		return false;
+	return ((OpExpr *) rinfo->clause)->opno == TIDEqualOperator;
+}
+
+/*
+ * Check to see if a RestrictInfo is of the form
+ *		CTID OP pseudoconstant
+ * or
+ *		pseudoconstant OP CTID
+ * where OP is a range operator such as <, <=, >, or >=, the CTID Var belongs
+ * to relation "rel", and nothing on the other side of the clause does.
+ */
+static bool
+IsTidRangeClause(RestrictInfo *rinfo, RelOptInfo *rel)
+{
+	Oid			opno;
+
+	if (!IsTidBinaryClause(rinfo, rel))
+		return false;
+	opno = ((OpExpr *) rinfo->clause)->opno;
+	return opno == TIDLessOperator ||
+		opno == TIDLessEqOperator ||
+		opno == TIDGreaterOperator ||
+		opno == TIDGreaterEqOperator;
+}
+
+/*
+ * Check to see if a RestrictInfo is of the form
  *		CTID = ANY (pseudoconstant_array)
  * where the CTID Var belongs to relation "rel", and nothing on the
  * other side of the clause does.
@@ -302,6 +342,35 @@ TidQualFromRestrictInfoList(List *rlist, RelOptInfo *rel)
 }
 
 /*
+ * Extract a set of CTID range conditions from implicit-AND List of RestrictInfos
+ *
+ * Returns a List of CTID range qual RestrictInfos for the specified rel
+ * (with implicit AND semantics across the list), or NIL if there are no
+ * usable conditions.
+ */
+static List *
+TidRangeQualFromRestrictInfoList(List *rlist, RelOptInfo *rel)
+{
+	List	   *rlst = NIL;
+	ListCell   *l;
+
+	if (!rel->has_scan_setlimits)
+		return NIL;
+
+	foreach(l, rlist)
+	{
+		RestrictInfo *rinfo = lfirst_node(RestrictInfo, l);
+
+		if (IsTidRangeClause(rinfo, rel))
+		{
+			rlst = lappend(rlst, rinfo);
+		}
+	}
+
+	return rlst;
+}
+
+/*
  * Given a list of join clauses involving our rel, create a parameterized
  * TidPath for each one that is a suitable TidEqual clause.
  *
@@ -385,6 +454,7 @@ void
 create_tidscan_paths(PlannerInfo *root, RelOptInfo *rel)
 {
 	List	   *tidquals;
+	List	   *tidrangequals;
 
 	/*
 	 * If any suitable quals exist in the rel's baserestrict list, generate a
@@ -405,6 +475,25 @@ create_tidscan_paths(PlannerInfo *root, RelOptInfo *rel)
 	}
 
 	/*
+	 * If there are range quals in the baserestrict list, generate a
+	 * TidRangePath.
+	 */
+	tidrangequals = TidRangeQualFromRestrictInfoList(rel->baserestrictinfo, rel);
+
+	if (tidrangequals)
+	{
+		/*
+		 * This path uses no join clauses, but it could still have required
+		 * parameterization due to LATERAL refs in its tlist.
+		 */
+		Relids		required_outer = rel->lateral_relids;
+
+		add_path(rel, (Path *) create_tidrangescan_path(root, rel,
+														tidrangequals,
+														required_outer));
+	}
+
+	/*
 	 * Try to generate parameterized TidPaths using equality clauses extracted
 	 * from EquivalenceClasses.  (This is important since simple "t1.ctid =
 	 * t2.ctid" clauses will turn into ECs.)
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 12fba56..b845194 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -127,6 +127,8 @@ static Plan *create_bitmap_subplan(PlannerInfo *root, Path *bitmapqual,
 static void bitmap_subplan_mark_shared(Plan *plan);
 static TidScan *create_tidscan_plan(PlannerInfo *root, TidPath *best_path,
 									List *tlist, List *scan_clauses);
+static TidRangeScan *create_tidrangescan_plan(PlannerInfo *root, TidRangePath *best_path,
+											  List *tlist, List *scan_clauses);
 static SubqueryScan *create_subqueryscan_plan(PlannerInfo *root,
 											  SubqueryScanPath *best_path,
 											  List *tlist, List *scan_clauses);
@@ -191,6 +193,8 @@ static BitmapHeapScan *make_bitmap_heapscan(List *qptlist,
 											Index scanrelid);
 static TidScan *make_tidscan(List *qptlist, List *qpqual, Index scanrelid,
 							 List *tidquals);
+static TidRangeScan *make_tidrangescan(List *qptlist, List *qpqual,
+									   Index scanrelid, List *tidrangequals);
 static SubqueryScan *make_subqueryscan(List *qptlist,
 									   List *qpqual,
 									   Index scanrelid,
@@ -373,6 +377,7 @@ create_plan_recurse(PlannerInfo *root, Path *best_path, int flags)
 		case T_IndexOnlyScan:
 		case T_BitmapHeapScan:
 		case T_TidScan:
+		case T_TidRangeScan:
 		case T_SubqueryScan:
 		case T_FunctionScan:
 		case T_TableFuncScan:
@@ -663,6 +668,13 @@ create_scan_plan(PlannerInfo *root, Path *best_path, int flags)
 												scan_clauses);
 			break;
 
+		case T_TidRangeScan:
+			plan = (Plan *) create_tidrangescan_plan(root,
+													 (TidRangePath *) best_path,
+													 tlist,
+													 scan_clauses);
+			break;
+
 		case T_SubqueryScan:
 			plan = (Plan *) create_subqueryscan_plan(root,
 													 (SubqueryScanPath *) best_path,
@@ -3357,6 +3369,73 @@ create_tidscan_plan(PlannerInfo *root, TidPath *best_path,
 }
 
 /*
+ * create_tidrangescan_plan
+ *	 Returns a tidrangescan plan for the base relation scanned by 'best_path'
+ *	 with restriction clauses 'scan_clauses' and targetlist 'tlist'.
+ */
+static TidRangeScan *
+create_tidrangescan_plan(PlannerInfo *root, TidRangePath *best_path,
+						 List *tlist, List *scan_clauses)
+{
+	TidRangeScan *scan_plan;
+	Index		scan_relid = best_path->path.parent->relid;
+	List	   *tidrangequals = best_path->tidrangequals;
+
+	/* it should be a base rel... */
+	Assert(scan_relid > 0);
+	Assert(best_path->path.parent->rtekind == RTE_RELATION);
+
+	/*
+	 * The qpqual list must contain all restrictions not enforced by the
+	 * tidrangequals list.  tidrangequals has AND semantics, so we can simply
+	 * remove any qual that appears in it.
+	 */
+	{
+		List	   *qpqual = NIL;
+		ListCell   *l;
+
+		foreach(l, scan_clauses)
+		{
+			RestrictInfo *rinfo = lfirst_node(RestrictInfo, l);
+
+			if (rinfo->pseudoconstant)
+				continue;		/* we may drop pseudoconstants here */
+			if (list_member_ptr(tidrangequals, rinfo))
+				continue;		/* simple duplicate */
+			if (is_redundant_derived_clause(rinfo, tidrangequals))
+				continue;		/* derived from same EquivalenceClass */
+			qpqual = lappend(qpqual, rinfo);
+		}
+		scan_clauses = qpqual;
+	}
+
+	/* Sort clauses into best execution order */
+	scan_clauses = order_qual_clauses(root, scan_clauses);
+
+	/* Reduce RestrictInfo lists to bare expressions; ignore pseudoconstants */
+	tidrangequals = extract_actual_clauses(tidrangequals, false);
+	scan_clauses = extract_actual_clauses(scan_clauses, false);
+
+	/* Replace any outer-relation variables with nestloop params */
+	if (best_path->path.param_info)
+	{
+		tidrangequals = (List *)
+			replace_nestloop_params(root, (Node *) tidrangequals);
+		scan_clauses = (List *)
+			replace_nestloop_params(root, (Node *) scan_clauses);
+	}
+
+	scan_plan = make_tidrangescan(tlist,
+								  scan_clauses,
+								  scan_relid,
+								  tidrangequals);
+
+	copy_generic_path_info(&scan_plan->scan.plan, &best_path->path);
+
+	return scan_plan;
+}
+
+/*
  * create_subqueryscan_plan
  *	 Returns a subqueryscan plan for the base relation scanned by 'best_path'
  *	 with restriction clauses 'scan_clauses' and targetlist 'tlist'.
@@ -5258,6 +5337,25 @@ make_tidscan(List *qptlist,
 	return node;
 }
 
+static TidRangeScan *
+make_tidrangescan(List *qptlist,
+				  List *qpqual,
+				  Index scanrelid,
+				  List *tidrangequals)
+{
+	TidRangeScan *node = makeNode(TidRangeScan);
+	Plan	   *plan = &node->scan.plan;
+
+	plan->targetlist = qptlist;
+	plan->qual = qpqual;
+	plan->lefttree = NULL;
+	plan->righttree = NULL;
+	node->scan.scanrelid = scanrelid;
+	node->tidrangequals = tidrangequals;
+
+	return node;
+}
+
 static SubqueryScan *
 make_subqueryscan(List *qptlist,
 				  List *qpqual,
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index dc11f09..69a5e73 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -547,6 +547,19 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 					fix_scan_list(root, splan->tidquals, rtoffset);
 			}
 			break;
+		case T_TidRangeScan:
+			{
+				TidRangeScan *splan = (TidRangeScan *) plan;
+
+				splan->scan.scanrelid += rtoffset;
+				splan->scan.plan.targetlist =
+					fix_scan_list(root, splan->scan.plan.targetlist, rtoffset);
+				splan->scan.plan.qual =
+					fix_scan_list(root, splan->scan.plan.qual, rtoffset);
+				splan->tidrangequals =
+					fix_scan_list(root, splan->tidrangequals, rtoffset);
+			}
+			break;
 		case T_SubqueryScan:
 			/* Needs special treatment, see comments below */
 			return set_subqueryscan_references(root,
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index efd0fbc..0725125 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -2301,6 +2301,12 @@ finalize_plan(PlannerInfo *root, Plan *plan,
 			context.paramids = bms_add_members(context.paramids, scan_params);
 			break;
 
+		case T_TidRangeScan:
+			finalize_primnode((Node *) ((TidRangeScan *) plan)->tidrangequals,
+							  &context);
+			context.paramids = bms_add_members(context.paramids, scan_params);
+			break;
+
 		case T_SubqueryScan:
 			{
 				SubqueryScan *sscan = (SubqueryScan *) plan;
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index d884d2b..91aea4d 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1198,6 +1198,35 @@ create_tidscan_path(PlannerInfo *root, RelOptInfo *rel, List *tidquals,
 }
 
 /*
+ * create_tidrangescan_path
+ *	  Creates a path corresponding to a scan by a range of TIDs, returning
+ *	  the pathnode.
+ */
+TidRangePath *
+create_tidrangescan_path(PlannerInfo *root, RelOptInfo *rel, List *tidrangequals,
+						 Relids required_outer)
+{
+	TidRangePath *pathnode = makeNode(TidRangePath);
+
+	pathnode->path.pathtype = T_TidRangeScan;
+	pathnode->path.parent = rel;
+	pathnode->path.pathtarget = rel->reltarget;
+	pathnode->path.param_info = get_baserel_parampathinfo(root, rel,
+														  required_outer);
+	pathnode->path.parallel_aware = false;
+	pathnode->path.parallel_safe = rel->consider_parallel;
+	pathnode->path.parallel_workers = 0;
+	pathnode->path.pathkeys = NIL;	/* always unordered */
+
+	pathnode->tidrangequals = tidrangequals;
+
+	cost_tidrangescan(&pathnode->path, root, rel, tidrangequals,
+					  pathnode->path.param_info);
+
+	return pathnode;
+}
+
+/*
  * create_append_path
  *	  Creates a path corresponding to an Append plan, returning the
  *	  pathnode.
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 40f4976..5c49df9 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -444,6 +444,9 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 	/* Collect info about relation's foreign keys, if relevant */
 	get_relation_foreign_keys(root, rel, relation, inhparent);
 
+	/* Collect info about functions implemented by the rel's table AM. */
+	rel->has_scan_setlimits = relation->rd_tableam && relation->rd_tableam->scan_bitmap_next_block != NULL;
+
 	/*
 	 * Collect info about relation's partitioning scheme, if any. Only
 	 * inheritance parents may be partitioned.
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 6054bd2..fd485fd 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -234,6 +234,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
 	rel->baserestrict_min_security = UINT_MAX;
 	rel->joininfo = NIL;
 	rel->has_eclass_joins = false;
+	rel->has_scan_setlimits = false;
 	rel->consider_partitionwise_join = false;	/* might get changed later */
 	rel->part_scheme = NULL;
 	rel->nparts = 0;
@@ -645,6 +646,7 @@ build_join_rel(PlannerInfo *root,
 	joinrel->baserestrict_min_security = UINT_MAX;
 	joinrel->joininfo = NIL;
 	joinrel->has_eclass_joins = false;
+	joinrel->has_scan_setlimits = false;
 	joinrel->consider_partitionwise_join = false;	/* might get changed later */
 	joinrel->top_parent_relids = NULL;
 	joinrel->part_scheme = NULL;
@@ -820,6 +822,7 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
 	joinrel->baserestrictcost.per_tuple = 0;
 	joinrel->joininfo = NIL;
 	joinrel->has_eclass_joins = false;
+	joinrel->has_scan_setlimits = false;
 	joinrel->consider_partitionwise_join = false;	/* might get changed later */
 	joinrel->top_parent_relids = NULL;
 	joinrel->part_scheme = NULL;
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index c2b0481..109741b 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -217,6 +217,15 @@ typedef struct TableAmRoutine
 								bool allow_sync, bool allow_pagemode);
 
 	/*
+	 * Set the range of a scan.
+	 *
+	 * Optional callback: A table AM can implement this to enable TID range
+	 * scans.
+	 */
+	void		(*scan_setlimits) (TableScanDesc scan,
+								   BlockNumber startBlk, BlockNumber numBlks);
+
+	/*
 	 * Return next tuple from `scan`, store in slot.
 	 */
 	bool		(*scan_getnextslot) (TableScanDesc scan,
@@ -844,6 +853,17 @@ table_rescan(TableScanDesc scan,
 }
 
 /*
+ * Set the range of a scan.
+ */
+static inline void
+table_scan_setlimits(TableScanDesc scan,
+					 BlockNumber startBlk, BlockNumber numBlks)
+{
+	Assert(scan->rs_rd->rd_tableam->scan_setlimits != NULL);
+	scan->rs_rd->rd_tableam->scan_setlimits(scan, startBlk, numBlks);
+}
+
+/*
  * Restart a relation scan after changing params.
  *
  * This call allows changing the buffer strategy, syncscan, and pagemode
diff --git a/src/include/catalog/pg_operator.dat b/src/include/catalog/pg_operator.dat
index 96823cd..5a32361 100644
--- a/src/include/catalog/pg_operator.dat
+++ b/src/include/catalog/pg_operator.dat
@@ -216,15 +216,15 @@
   oprname => '<', oprleft => 'tid', oprright => 'tid', oprresult => 'bool',
   oprcom => '>(tid,tid)', oprnegate => '>=(tid,tid)', oprcode => 'tidlt',
   oprrest => 'scalarltsel', oprjoin => 'scalarltjoinsel' },
-{ oid => '2800', descr => 'greater than',
+{ oid => '2800', oid_symbol => 'TIDGreaterOperator', descr => 'greater than',
   oprname => '>', oprleft => 'tid', oprright => 'tid', oprresult => 'bool',
   oprcom => '<(tid,tid)', oprnegate => '<=(tid,tid)', oprcode => 'tidgt',
   oprrest => 'scalargtsel', oprjoin => 'scalargtjoinsel' },
-{ oid => '2801', descr => 'less than or equal',
+{ oid => '2801', oid_symbol => 'TIDLessEqOperator', descr => 'less than or equal',
   oprname => '<=', oprleft => 'tid', oprright => 'tid', oprresult => 'bool',
   oprcom => '>=(tid,tid)', oprnegate => '>(tid,tid)', oprcode => 'tidle',
   oprrest => 'scalarlesel', oprjoin => 'scalarlejoinsel' },
-{ oid => '2802', descr => 'greater than or equal',
+{ oid => '2802', oid_symbol => 'TIDGreaterEqOperator', descr => 'greater than or equal',
   oprname => '>=', oprleft => 'tid', oprright => 'tid', oprresult => 'bool',
   oprcom => '<=(tid,tid)', oprnegate => '<(tid,tid)', oprcode => 'tidge',
   oprrest => 'scalargesel', oprjoin => 'scalargejoinsel' },
diff --git a/src/include/executor/nodeTidrangescan.h b/src/include/executor/nodeTidrangescan.h
new file mode 100644
index 0000000..cff8790
--- /dev/null
+++ b/src/include/executor/nodeTidrangescan.h
@@ -0,0 +1,23 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeTidrangescan.h
+ *
+ *
+ *
+ * Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/executor/nodeTidrangescan.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef NODETIDRANGESCAN_H
+#define NODETIDRANGESCAN_H
+
+#include "nodes/execnodes.h"
+
+extern TidRangeScanState *ExecInitTidRangeScan(TidRangeScan *node, EState *estate, int eflags);
+extern void ExecEndTidRangeScan(TidRangeScanState *node);
+extern void ExecReScanTidRangeScan(TidRangeScanState *node);
+
+#endif							/* NODETIDRANGESCAN_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 98bdcbc..fdf37b0 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1575,6 +1575,28 @@ typedef struct TidScanState
 } TidScanState;
 
 /* ----------------
+ *	 TidRangeScanState information
+ *
+ *		trss_tidexprs		list of TidOpExpr structs (see nodeTidrangescan.c)
+ *		trss_startBlock		first block to scan
+ *		trss_endBlock		last block to scan (inclusive)
+ *		trss_startOffset	first offset in first block to scan
+ *		trss_endOffset		last offset in last block to scan (inclusive)
+ *		trss_inScan			is a scan currently in progress?
+ * ----------------
+ */
+typedef struct TidRangeScanState
+{
+	ScanState	ss;				/* its first field is NodeTag */
+	List	   *trss_tidexprs;
+	BlockNumber trss_startBlock;
+	BlockNumber trss_endBlock;
+	OffsetNumber trss_startOffset;
+	OffsetNumber trss_endOffset;
+	bool		trss_inScan;
+} TidRangeScanState;
+
+/* ----------------
  *	 SubqueryScanState information
  *
  *		SubqueryScanState is used for scanning a sub-query in the range table.
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 4e2fb39..5a1cdce 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -59,6 +59,7 @@ typedef enum NodeTag
 	T_BitmapIndexScan,
 	T_BitmapHeapScan,
 	T_TidScan,
+	T_TidRangeScan,
 	T_SubqueryScan,
 	T_FunctionScan,
 	T_ValuesScan,
@@ -115,6 +116,7 @@ typedef enum NodeTag
 	T_BitmapIndexScanState,
 	T_BitmapHeapScanState,
 	T_TidScanState,
+	T_TidRangeScanState,
 	T_SubqueryScanState,
 	T_FunctionScanState,
 	T_TableFuncScanState,
@@ -229,6 +231,7 @@ typedef enum NodeTag
 	T_BitmapAndPath,
 	T_BitmapOrPath,
 	T_TidPath,
+	T_TidRangePath,
 	T_SubqueryScanPath,
 	T_ForeignPath,
 	T_CustomPath,
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 441e64e..a0b44c6 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -703,6 +703,7 @@ typedef struct RelOptInfo
 	List	   *joininfo;		/* RestrictInfo structures for join clauses
 								 * involving this rel */
 	bool		has_eclass_joins;	/* T means joininfo is incomplete */
+	bool		has_scan_setlimits; /* Rel's table AM has scan_setlimits */
 
 	/* used by partitionwise joins: */
 	bool		consider_partitionwise_join;	/* consider partitionwise join
@@ -1286,6 +1287,18 @@ typedef struct TidPath
 } TidPath;
 
 /*
+ * TidRangePath represents a scan by a contiguous range of TIDs
+ *
+ * tidrangequals is an implicitly AND'ed list of qual expressions of the form
+ * "CTID relop pseudoconstant", where relop is one of >,>=,<,<=.
+ */
+typedef struct TidRangePath
+{
+	Path		path;
+	List	   *tidrangequals;
+} TidRangePath;
+
+/*
  * SubqueryScanPath represents a scan of an unflattened subquery-in-FROM
  *
  * Note that the subpath comes from a different planning domain; for example
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 70f8b8e..5b97ac5 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -490,6 +490,19 @@ typedef struct TidScan
 } TidScan;
 
 /* ----------------
+ *		tid range scan node
+ *
+ * tidrangequals is an implicitly AND'ed list of qual expressions of the form
+ * "CTID relop pseudoconstant", where relop is one of >,>=,<,<=.
+ * ----------------
+ */
+typedef struct TidRangeScan
+{
+	Scan		scan;
+	List	   *tidrangequals;	/* qual(s) involving CTID op something */
+} TidRangeScan;
+
+/* ----------------
  *		subquery scan node
  *
  * SubqueryScan is for scanning the output of a sub-query in the range table.
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index b3d0b4f..9d90a99 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -82,6 +82,8 @@ extern void cost_bitmap_or_node(BitmapOrPath *path, PlannerInfo *root);
 extern void cost_bitmap_tree_node(Path *path, Cost *cost, Selectivity *selec);
 extern void cost_tidscan(Path *path, PlannerInfo *root,
 						 RelOptInfo *baserel, List *tidquals, ParamPathInfo *param_info);
+extern void cost_tidrangescan(Path *path, PlannerInfo *root,
+							  RelOptInfo *baserel, List *tidquals, ParamPathInfo *param_info);
 extern void cost_subqueryscan(SubqueryScanPath *path, PlannerInfo *root,
 							  RelOptInfo *baserel, ParamPathInfo *param_info);
 extern void cost_functionscan(Path *path, PlannerInfo *root,
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 182ffee..23b6bc4 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -63,6 +63,8 @@ extern BitmapOrPath *create_bitmap_or_path(PlannerInfo *root,
 										   List *bitmapquals);
 extern TidPath *create_tidscan_path(PlannerInfo *root, RelOptInfo *rel,
 									List *tidquals, Relids required_outer);
+extern TidRangePath *create_tidrangescan_path(PlannerInfo *root, RelOptInfo *rel,
+											  List *tidrangequals, Relids required_outer);
 extern AppendPath *create_append_path(PlannerInfo *root, RelOptInfo *rel,
 									  List *subpaths, List *partial_subpaths,
 									  List *pathkeys, Relids required_outer,
diff --git a/src/test/regress/expected/tidrangescan.out b/src/test/regress/expected/tidrangescan.out
new file mode 100644
index 0000000..6a1d086
--- /dev/null
+++ b/src/test/regress/expected/tidrangescan.out
@@ -0,0 +1,238 @@
+-- tests for tidrangescans
+CREATE TABLE tidrangescan(id integer, data text);
+INSERT INTO tidrangescan SELECT i,repeat('x', 100) FROM generate_series(1,1000) AS s(i);
+DELETE FROM tidrangescan WHERE substring(ctid::text from ',(\d+)\)')::integer > 10 OR substring(ctid::text from '\((\d+),')::integer >= 10;;
+VACUUM tidrangescan;
+-- range scans with upper bound
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid < '(1,0)';
+            QUERY PLAN             
+-----------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: (ctid < '(1,0)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid < '(1,0)';
+  ctid  
+--------
+ (0,1)
+ (0,2)
+ (0,3)
+ (0,4)
+ (0,5)
+ (0,6)
+ (0,7)
+ (0,8)
+ (0,9)
+ (0,10)
+(10 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid <= '(1,5)';
+             QUERY PLAN             
+------------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: (ctid <= '(1,5)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid <= '(1,5)';
+  ctid  
+--------
+ (0,1)
+ (0,2)
+ (0,3)
+ (0,4)
+ (0,5)
+ (0,6)
+ (0,7)
+ (0,8)
+ (0,9)
+ (0,10)
+ (1,1)
+ (1,2)
+ (1,3)
+ (1,4)
+ (1,5)
+(15 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid < '(0,0)';
+            QUERY PLAN             
+-----------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: (ctid < '(0,0)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid < '(0,0)';
+ ctid 
+------
+(0 rows)
+
+-- range scans with lower bound
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid > '(9,8)';
+            QUERY PLAN             
+-----------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: (ctid > '(9,8)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid > '(9,8)';
+  ctid  
+--------
+ (9,9)
+ (9,10)
+(2 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE '(9,8)' < ctid;
+            QUERY PLAN             
+-----------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: ('(9,8)'::tid < ctid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE '(9,8)' < ctid;
+  ctid  
+--------
+ (9,9)
+ (9,10)
+(2 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid >= '(9,8)';
+             QUERY PLAN             
+------------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: (ctid >= '(9,8)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid >= '(9,8)';
+  ctid  
+--------
+ (9,8)
+ (9,9)
+ (9,10)
+(3 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid >= '(100,0)';
+              QUERY PLAN              
+--------------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: (ctid >= '(100,0)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid >= '(100,0)';
+ ctid 
+------
+(0 rows)
+
+-- range scans with both bounds
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid > '(4,4)' AND '(4,7)' >= ctid;
+                           QUERY PLAN                           
+----------------------------------------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: ((ctid > '(4,4)'::tid) AND ('(4,7)'::tid >= ctid))
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid > '(4,4)' AND '(4,7)' >= ctid;
+ ctid  
+-------
+ (4,5)
+ (4,6)
+ (4,7)
+(3 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE '(4,7)' >= ctid AND ctid > '(4,4)';
+                           QUERY PLAN                           
+----------------------------------------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: (('(4,7)'::tid >= ctid) AND (ctid > '(4,4)'::tid))
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE '(4,7)' >= ctid AND ctid > '(4,4)';
+ ctid  
+-------
+ (4,5)
+ (4,6)
+ (4,7)
+(3 rows)
+
+-- extreme offsets
+SELECT ctid FROM tidrangescan where ctid > '(0,65535)' AND ctid < '(1,0)' LIMIT 1;
+ ctid 
+------
+(0 rows)
+
+SELECT ctid FROM tidrangescan where ctid < '(0,0)' LIMIT 1;
+ ctid 
+------
+(0 rows)
+
+-- empty table
+CREATE TABLE tidrangescan_empty(id integer, data text);
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan_empty WHERE ctid < '(1, 0)';
+              QUERY PLAN              
+--------------------------------------
+ Tid Range Scan on tidrangescan_empty
+   TID Cond: (ctid < '(1,0)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan_empty WHERE ctid < '(1, 0)';
+ ctid 
+------
+(0 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan_empty WHERE ctid > '(9, 0)';
+              QUERY PLAN              
+--------------------------------------
+ Tid Range Scan on tidrangescan_empty
+   TID Cond: (ctid > '(9,0)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan_empty WHERE ctid > '(9, 0)';
+ ctid 
+------
+(0 rows)
+
+-- cursors
+BEGIN;
+DECLARE c SCROLL CURSOR FOR SELECT ctid FROM tidrangescan WHERE ctid < '(1,0)';
+FETCH NEXT c;
+ ctid  
+-------
+ (0,1)
+(1 row)
+
+FETCH NEXT c;
+ ctid  
+-------
+ (0,2)
+(1 row)
+
+FETCH PRIOR c;
+ ctid  
+-------
+ (0,1)
+(1 row)
+
+FETCH FIRST c;
+ ctid  
+-------
+ (0,1)
+(1 row)
+
+FETCH LAST c;
+  ctid  
+--------
+ (0,10)
+(1 row)
+
+COMMIT;
+DROP TABLE tidrangescan;
+DROP TABLE tidrangescan_empty;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 8fb55f0..86b42ce 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -78,7 +78,7 @@ test: brin gin gist spgist privileges init_privs security_label collate matview
 # ----------
 # Another group of parallel tests
 # ----------
-test: create_table_like alter_generic alter_operator misc async dbsize misc_functions sysviews tsrf tidscan
+test: create_table_like alter_generic alter_operator misc async dbsize misc_functions sysviews tsrf tidscan tidrangescan
 
 # rules cannot run concurrently with any test that creates
 # a view or rule in the public schema
diff --git a/src/test/regress/sql/tidrangescan.sql b/src/test/regress/sql/tidrangescan.sql
new file mode 100644
index 0000000..1baf584
--- /dev/null
+++ b/src/test/regress/sql/tidrangescan.sql
@@ -0,0 +1,74 @@
+-- tests for tidrangescans
+
+CREATE TABLE tidrangescan(id integer, data text);
+
+INSERT INTO tidrangescan SELECT i,repeat('x', 100) FROM generate_series(1,1000) AS s(i);
+DELETE FROM tidrangescan WHERE substring(ctid::text from ',(\d+)\)')::integer > 10 OR substring(ctid::text from '\((\d+),')::integer >= 10;;
+VACUUM tidrangescan;
+
+-- range scans with upper bound
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid < '(1,0)';
+SELECT ctid FROM tidrangescan WHERE ctid < '(1,0)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid <= '(1,5)';
+SELECT ctid FROM tidrangescan WHERE ctid <= '(1,5)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid < '(0,0)';
+SELECT ctid FROM tidrangescan WHERE ctid < '(0,0)';
+
+-- range scans with lower bound
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid > '(9,8)';
+SELECT ctid FROM tidrangescan WHERE ctid > '(9,8)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE '(9,8)' < ctid;
+SELECT ctid FROM tidrangescan WHERE '(9,8)' < ctid;
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid >= '(9,8)';
+SELECT ctid FROM tidrangescan WHERE ctid >= '(9,8)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid >= '(100,0)';
+SELECT ctid FROM tidrangescan WHERE ctid >= '(100,0)';
+
+-- range scans with both bounds
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid > '(4,4)' AND '(4,7)' >= ctid;
+SELECT ctid FROM tidrangescan WHERE ctid > '(4,4)' AND '(4,7)' >= ctid;
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE '(4,7)' >= ctid AND ctid > '(4,4)';
+SELECT ctid FROM tidrangescan WHERE '(4,7)' >= ctid AND ctid > '(4,4)';
+
+-- extreme offsets
+SELECT ctid FROM tidrangescan where ctid > '(0,65535)' AND ctid < '(1,0)' LIMIT 1;
+SELECT ctid FROM tidrangescan where ctid < '(0,0)' LIMIT 1;
+
+-- empty table
+CREATE TABLE tidrangescan_empty(id integer, data text);
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan_empty WHERE ctid < '(1, 0)';
+SELECT ctid FROM tidrangescan_empty WHERE ctid < '(1, 0)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan_empty WHERE ctid > '(9, 0)';
+SELECT ctid FROM tidrangescan_empty WHERE ctid > '(9, 0)';
+
+-- cursors
+BEGIN;
+DECLARE c SCROLL CURSOR FOR SELECT ctid FROM tidrangescan WHERE ctid < '(1,0)';
+FETCH NEXT c;
+FETCH NEXT c;
+FETCH PRIOR c;
+FETCH FIRST c;
+FETCH LAST c;
+COMMIT;
+
+DROP TABLE tidrangescan;
+DROP TABLE tidrangescan_empty;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 432d2d8..8579646 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2426,8 +2426,13 @@ TextPositionState
 TheLexeme
 TheSubstitute
 TidExpr
+TidExprType
 TidHashKey
+TidOpExpr
 TidPath
+TidRangePath
+TidRangeScan
+TidRangeScanState
 TidScan
 TidScanState
 TimeADT
-- 
2.7.4

#64

David Rowley

david.rowley@2ndquadrant.com

over 6 years ago

In reply to: Edmund Horner (#63)

2 attachment(s)

Re: Tid scan improvements

On Mon, 15 Jul 2019 at 17:54, Edmund Horner <ejrh00@gmail.com> wrote:

Summary of changes compared to last time:
- I've added the additional "scan_setlimits" table AM method. To
check whether it's implemented in the planner, I have added an
additional "has_scan_setlimits" flag to RelOptInfo. It seems to work.
- I've also changed nodeTidrangescan to not require anything from heapam now.
- To simplify the patch and avoid changing heapam, I've removed the
backward scan support (which was needed for FETCH LAST/PRIOR) and made
ExecSupportsBackwardScan return false for this plan type.
- I've removed the vestigial passing of "direction" through
nodeTidrangescan. If my understanding is correct, the direction
passed to TidRangeNext will always be forward. But I did leave FETCH
LAST/PRIOR in the regression tests (after adding SCROLL to the
cursor).

I spent some time today hacking at this. I fixed a bug in how
has_scan_setlimits is set, rewrote a few comments and simplified some
of the code.

When I mentioned up-thread about the optional scan_setlimits table AM
callback, I'd forgotten that you'd not have access to check that
directly during planning. As you mention above, you've added
RelOptInfo has_scan_setlimits so the planner knows if it can use TID
Range scans or not. It would be nice to not have to add this flag, but
that would require either:

1. Making scan_setlimits a non-optional callback function in table AM, or;
2. Allowing the planner to have access to the opened Relation.

#2 is not for this patch, but there has been some talk about it. It
was done for the executor last year in d73f4c74dd3.

I wonder if Andres has any thoughts on #1?
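
To make the trade-off concrete, here is a minimal, self-contained sketch
of the "optional callback plus planner flag" arrangement described above.
It is not the patch's code: every Demo*/demo_* name is a hypothetical
stand-in for the real TableAmRoutine, RelOptInfo and heapam pieces, and
only the general shape (a NULL-able function pointer, a flag computed
once at planning time, an asserting wrapper at execution time) is taken
from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the real PostgreSQL definitions. */
typedef unsigned int BlockNumber;

typedef struct DemoScan
{
	BlockNumber start_blk;		/* first block the scan may visit */
	BlockNumber num_blks;		/* number of blocks the scan may visit */
} DemoScan;

typedef struct DemoTableAm
{
	/* Optional callback: NULL if the AM cannot limit a scan to a block range */
	void		(*scan_setlimits) (DemoScan *scan,
								   BlockNumber startBlk, BlockNumber numBlks);
} DemoTableAm;

typedef struct DemoRelOptInfo
{
	bool		has_scan_setlimits; /* does the rel's AM provide the callback? */
} DemoRelOptInfo;

/* A heap-like AM that does implement the callback. */
static void
demo_heap_setlimits(DemoScan *scan, BlockNumber startBlk, BlockNumber numBlks)
{
	scan->start_blk = startBlk;
	scan->num_blks = numBlks;
}

/*
 * Planning time: the planner cannot inspect the AM later on, so record
 * once whether the callback exists.  Note that the test must be on
 * scan_setlimits itself.
 */
static void
demo_get_relation_info(DemoRelOptInfo *rel, const DemoTableAm *am)
{
	rel->has_scan_setlimits = (am != NULL && am->scan_setlimits != NULL);
}

/*
 * Execution time: the wrapper only asserts, because a range scan plan
 * should never have been built for an AM without the callback.
 */
static void
demo_table_scan_setlimits(const DemoTableAm *am, DemoScan *scan,
						  BlockNumber startBlk, BlockNumber numBlks)
{
	assert(am->scan_setlimits != NULL);
	am->scan_setlimits(scan, startBlk, numBlks);
}
```

With option 1 (a mandatory callback) the flag and the NULL test would
both disappear; with option 2 the planner could test the function
pointer directly instead of caching the answer in the flag.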

The other thing I was thinking about was if enable_tidscan should be
in charge of TID Range scans too. I see you have it that way, but
should we be adding enable_tidrangescan? The docs claim that
enable_tidscan: "Enables or disables the query planner's use of TID
scan plan types.". Note: "types" is plural. Maybe we could call that
fate and keep it the way the patch has it already. Does anyone have
another idea about that?
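
As a rough illustration of what "one switch for both plan types" means,
the sketch below has a single flag penalise the startup cost of either
kind of TID path. It assumes the switch works by adding a large penalty
to the path cost rather than forbidding the path outright; the demo_*
names and the DemoTidPathKind enum are hypothetical, and this is not the
patch's costing code.

```c
#include <stdbool.h>

typedef double Cost;

/* Assumed penalty for a disabled plan type; a large constant. */
static const Cost demo_disable_cost = 1.0e10;

/* Hypothetical stand-in for the enable_tidscan setting. */
static bool demo_enable_tidscan = true;

typedef enum
{
	DEMO_TID_SCAN,				/* ctid = ... / ctid IN (...) */
	DEMO_TID_RANGE_SCAN			/* ctid >, >=, <, <= ... */
} DemoTidPathKind;

/*
 * Return the startup cost for a TID path of the given kind.  Both kinds
 * honour the same switch, so 'kind' does not affect the penalty.
 */
static Cost
demo_tid_startup_cost(DemoTidPathKind kind, Cost base_startup_cost)
{
	(void) kind;				/* same treatment for both plan types */

	if (!demo_enable_tidscan)
		return base_startup_cost + demo_disable_cost;
	return base_startup_cost;
}
```

A separate enable_tidrangescan would amount to a second flag consulted
only when kind is DEMO_TID_RANGE_SCAN.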

I've attached a delta of the changes I made and also a complete v9 patch.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

v8_to_v9_delta.patch (application/octet-stream)

diff --git a/src/backend/executor/Makefile b/src/backend/executor/Makefile
index 0152e31f0c..ea053209a1 100644
--- a/src/backend/executor/Makefile
+++ b/src/backend/executor/Makefile
@@ -27,9 +27,8 @@ OBJS = execAmi.o execCurrent.o execExpr.o execExprInterp.o \
        nodeSamplescan.o nodeSeqscan.o nodeSetOp.o nodeSort.o nodeUnique.o \
        nodeValuesscan.o \
        nodeCtescan.o nodeNamedtuplestorescan.o nodeWorktablescan.o \
-       nodeGroup.o nodeSubplan.o nodeSubqueryscan.o nodeTidscan.o \
-       nodeTidrangescan.o \
-       nodeForeignscan.o nodeWindowAgg.o tstoreReceiver.o tqueue.o spi.o \
-       nodeTableFuncscan.o
+       nodeGroup.o nodeSubplan.o nodeSubqueryscan.o nodeTidrangescan.o \
+       nodeTidscan.o nodeForeignscan.o nodeWindowAgg.o tstoreReceiver.o \
+       tqueue.o spi.o nodeTableFuncscan.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/executor/execAmi.c b/src/backend/executor/execAmi.c
index d64771a097..58e4d8555a 100644
--- a/src/backend/executor/execAmi.c
+++ b/src/backend/executor/execAmi.c
@@ -536,10 +536,6 @@ ExecSupportsBackwardScan(Plan *node)
 			/* Simplify life for tablesample methods by disallowing this */
 			return false;
 
-		case T_TidRangeScan:
-			/* Keep TidRangeScan as simple as possible. */
-			return false;
-
 		case T_Gather:
 			return false;
 
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
index c4706e3677..8a72f52074 100644
--- a/src/backend/executor/nodeTidrangescan.c
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -12,14 +12,6 @@
  *
  *-------------------------------------------------------------------------
  */
-/*
- * INTERFACE ROUTINES
- *
- *		ExecTidRangeScan		scans a relation using a range of tids
- *		ExecInitTidRangeScan	creates and initializes state info.
- *		ExecReScanTidRangeScan	rescans the tid relation.
- *		ExecEndTidRangeScan		releases all storage.
- */
 #include "postgres.h"
 
 #include "access/relscan.h"
@@ -45,7 +37,7 @@ typedef enum
 	TIDEXPR_LOWER_BOUND
 } TidExprType;
 
-/* one element in TidExpr's opexprs */
+/* Upper or lower range bound for scan */
 typedef struct TidOpExpr
 {
 	TidExprType exprtype;		/* type of op */
@@ -93,7 +85,7 @@ MakeTidOpExpr(OpExpr *expr, TidRangeScanState *tidstate)
 			tidopexpr->exprtype = invert ? TIDEXPR_UPPER_BOUND : TIDEXPR_LOWER_BOUND;
 			break;
 		default:
-			elog(ERROR, "could not identify CTID expression");
+			elog(ERROR, "could not identify CTID operator");
 	}
 
 	tidopexpr->exprstate = exprstate;
@@ -115,8 +107,12 @@ TidExprListCreate(TidRangeScanState *tidrangestate)
 	foreach(l, node->tidrangequals)
 	{
 		OpExpr	   *opexpr = lfirst(l);
-		TidOpExpr  *tidopexpr = MakeTidOpExpr(opexpr, tidrangestate);
+		TidOpExpr  *tidopexpr;
 
+		if (!IsA(opexpr, OpExpr))
+			elog(ERROR, "could not identify CTID expression");
+
+		tidopexpr = MakeTidOpExpr(opexpr, tidrangestate);
 		tidexprs = lappend(tidexprs, tidopexpr);
 	}
 
@@ -124,8 +120,10 @@ TidExprListCreate(TidRangeScanState *tidrangestate)
 }
 
 /*
- * Set a lower bound tid, taking into account the inclusivity of the bound.
- * Return true if the bound is valid.
+ * Set 'lowerBound' based on 'tid'.  If 'inclusive' is false then the
+ * lowerBound is incremented to the next tid value so that it becomes
+ * inclusive.  If there is no valid next tid value then we return false,
+ * otherwise we return true.
  */
 static bool
 SetTidLowerBound(ItemPointer tid, bool inclusive, ItemPointer lowerBound)
@@ -144,11 +142,12 @@ SetTidLowerBound(ItemPointer tid, bool inclusive, ItemPointer lowerBound)
 
 			/*
 			 * If the lower bound was already at or above the maximum block
-			 * number, then there is no valid range.
+			 * number, then there is no valid value for it to be set to.
 			 */
 			if (block >= MaxBlockNumber)
 				return false;
 
+			/* Set the lowerBound to the first offset in the next block */
 			ItemPointerSet(lowerBound, block + 1, 1);
 		}
 		else
@@ -161,8 +160,10 @@ SetTidLowerBound(ItemPointer tid, bool inclusive, ItemPointer lowerBound)
 }
 
 /*
- * Set an upper bound tid, taking into account the inclusivity of the bound.
- * Return true if the bound is valid.
+ * Set 'upperBound' based on 'tid'.  If 'inclusive' is false then the
+ * upperBound is decremented to the previous tid value so that it becomes
+ * inclusive.  If there is no valid previous tid value then we return false,
+ * otherwise we return true.
  */
 static bool
 SetTidUpperBound(ItemPointer tid, bool inclusive, ItemPointer upperBound)
@@ -189,7 +190,7 @@ SetTidUpperBound(ItemPointer tid, bool inclusive, ItemPointer upperBound)
 
 			/*
 			 * If the upper bound was already in block 0, then there is no
-			 * valid range.
+			 * valid value for it to be set to.
 			 */
 			if (block == 0)
 				return false;
@@ -206,8 +207,9 @@ SetTidUpperBound(ItemPointer tid, bool inclusive, ItemPointer upperBound)
 /* ----------------------------------------------------------------
  *		TidRangeEval
  *
- *		Compute the range of TIDs to scan, by evaluating the
- *		expressions for them.
+ *		Compute and set node's block and offset range to scan by evaluating
+ *		the trss_tidexprs.  If we detect an invalid range that cannot yield
+ *		any rows, the range is left unset.
  * ----------------------------------------------------------------
  */
 static void
@@ -227,7 +229,6 @@ TidRangeEval(TidRangeScanState *node)
 	 */
 	nblocks = RelationGetNumberOfBlocks(node->ss.ss_currentRelation);
 
-
 	/* The biggest range on an empty table is empty; just skip it. */
 	if (nblocks == 0)
 		return;
@@ -256,6 +257,10 @@ TidRangeEval(TidRangeScanState *node)
 		{
 			ItemPointerData lb;
 
+			/*
+			 * If the lower bound is beyond the maximum value for ctid, then
+			 * just bail without setting the range.  No rows can match.
+			 */
 			if (!SetTidLowerBound(itemptr, tidopexpr->inclusive, &lb))
 				return;
 
@@ -267,6 +272,10 @@ TidRangeEval(TidRangeScanState *node)
 		{
 			ItemPointerData ub;
 
+			/*
+			 * If the upper bound is below the minimum value for ctid, then
+			 * just bail without setting the range.  No rows can match.
+			 */
 			if (!SetTidUpperBound(itemptr, tidopexpr->inclusive, &ub))
 				return;
 
@@ -275,7 +284,7 @@ TidRangeEval(TidRangeScanState *node)
 		}
 	}
 
-	/* If the resulting range is not empty, use it. */
+	/* If the resulting range is not empty, set it. */
 	if (ItemPointerCompare(&lowerBound, &upperBound) <= 0)
 	{
 		node->trss_startBlock = ItemPointerGetBlockNumberNoCheck(&lowerBound);
@@ -290,7 +299,7 @@ TidRangeEval(TidRangeScanState *node)
  *
  *		Fetch the next tuple when scanning a range of TIDs.
  *
- *		Since the heap access method may return tuples that are in the scan
+ *		Since the table access method may return tuples that are in the scan
  *		limit, but not within the required TID range, this function will
  *		check for such tuples and skip over them.
  * ----------------------------------------------------------------
@@ -399,7 +408,7 @@ TidRangeNext(TidRangeScanState *node)
 	foundTuple = NextInTidRange(node, scandesc, slot);
 
 	/*
-	 * If we've exhuasted all the tuples in the range, reset the inScan flag.
+	 * If we've exhausted all the tuples in the range, reset the inScan flag.
 	 * This will cause the heap to be rescanned for any subsequent fetches,
 	 * which is important for some cursor operations: for instance, FETCH LAST
 	 * fetches all the tuples in order and then fetches one tuple in reverse.
@@ -460,8 +469,7 @@ ExecReScanTidRangeScan(TidRangeScanState *node)
 	TableScanDesc scan = node->ss.ss_currentScanDesc;
 
 	if (scan != NULL)
-		table_rescan(scan,		/* scan desc */
-					 NULL);		/* new scan keys */
+		table_rescan(scan, NULL);
 
 	/* mark scan as not in progress, and tid range list as not computed yet */
 	node->trss_inScan = false;
@@ -482,6 +490,9 @@ ExecEndTidRangeScan(TidRangeScanState *node)
 {
 	TableScanDesc scan = node->ss.ss_currentScanDesc;
 
+	if (scan != NULL)
+		table_endscan(scan);
+
 	/*
 	 * Free the exprcontext
 	 */
@@ -493,10 +504,6 @@ ExecEndTidRangeScan(TidRangeScanState *node)
 	if (node->ss.ps.ps_ResultTupleSlot)
 		ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
 	ExecClearTuple(node->ss.ss_ScanTupleSlot);
-
-	/* close heap scan */
-	if (scan != NULL)
-		table_endscan(scan);
 }
 
 /* ----------------------------------------------------------------
@@ -532,7 +539,7 @@ ExecInitTidRangeScan(TidRangeScan *node, EState *estate, int eflags)
 	ExecAssignExprContext(estate, &tidrangestate->ss.ps);
 
 	/*
-	 * mark scan as not in progress, and tid range list as not computed yet
+	 * mark scan as not in progress, and tid range as not computed yet
 	 */
 	tidrangestate->trss_inScan = false;
 	tidrangestate->trss_startBlock = InvalidBlockNumber;
@@ -543,7 +550,7 @@ ExecInitTidRangeScan(TidRangeScan *node, EState *estate, int eflags)
 	currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
 
 	tidrangestate->ss.ss_currentRelation = currentRelation;
-	tidrangestate->ss.ss_currentScanDesc = NULL;	/* no heap scan here */
+	tidrangestate->ss.ss_currentScanDesc = NULL;	/* no table scan here */
 
 	/*
 	 * get the scan type from the relation descriptor.
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index a67111ee6b..616fe75749 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -1273,8 +1273,8 @@ cost_tidscan(Path *path, PlannerInfo *root,
 
 /*
  * cost_tidrangescan
- *	  Determines and returns the cost of scanning a relation using a range of
- *	  TIDs.
+ *	  Determines and sets the costs of scanning a relation using a range of
+ *	  TIDs for 'path'
  *
  * 'baserel' is the relation to be scanned
  * 'tidrangequals' is the list of TID-checkable range quals
@@ -1282,7 +1282,8 @@ cost_tidscan(Path *path, PlannerInfo *root,
  */
 void
 cost_tidrangescan(Path *path, PlannerInfo *root,
-				  RelOptInfo *baserel, List *tidrangequals, ParamPathInfo *param_info)
+				  RelOptInfo *baserel, List *tidrangequals,
+				  ParamPathInfo *param_info)
 {
 	Selectivity selectivity;
 	double		pages;
@@ -1292,7 +1293,6 @@ cost_tidrangescan(Path *path, PlannerInfo *root,
 	Cost		cpu_per_tuple;
 	QualCost	tid_qual_cost;
 	double		ntuples;
-	double		nrandompages;
 	double		nseqpages;
 	double		spc_random_page_cost;
 	double		spc_seq_page_cost;
@@ -1325,7 +1325,6 @@ cost_tidrangescan(Path *path, PlannerInfo *root,
 	 */
 	ntuples = selectivity * baserel->tuples;
 	nseqpages = pages - 1.0;
-	nrandompages = 1.0;
 
 	if (!enable_tidscan)
 		startup_cost += disable_cost;
@@ -1341,8 +1340,8 @@ cost_tidrangescan(Path *path, PlannerInfo *root,
 							  &spc_random_page_cost,
 							  &spc_seq_page_cost);
 
-	/* disk costs */
-	run_cost += spc_random_page_cost * nrandompages + spc_seq_page_cost * nseqpages;
+	/* disk costs; 1 random page and the remainder as seq pages */
+	run_cost += spc_random_page_cost + spc_seq_page_cost * nseqpages;
 
 	/* Add scanning CPU costs */
 	get_restriction_qual_cost(root, baserel, param_info, &qpqual_cost);
diff --git a/src/backend/optimizer/path/tidpath.c b/src/backend/optimizer/path/tidpath.c
index 3f8533c8d8..2e8535fa14 100644
--- a/src/backend/optimizer/path/tidpath.c
+++ b/src/backend/optimizer/path/tidpath.c
@@ -73,7 +73,7 @@ IsCTIDVar(Var *var, RelOptInfo *rel)
  * and nothing on the other side of the clause does.
  */
 static bool
-IsTidBinaryClause(RestrictInfo *rinfo, RelOptInfo *rel)
+IsBinaryTidClause(RestrictInfo *rinfo, RelOptInfo *rel)
 {
 	OpExpr	   *node;
 	Node	   *arg1,
@@ -86,7 +86,7 @@ IsTidBinaryClause(RestrictInfo *rinfo, RelOptInfo *rel)
 		return false;
 	node = (OpExpr *) rinfo->clause;
 
-	/* Operator must take two arguments */
+	/* OpExpr must have two arguments */
 	if (list_length(node->args) != 2)
 		return false;
 	arg1 = linitial(node->args);
@@ -129,9 +129,13 @@ IsTidBinaryClause(RestrictInfo *rinfo, RelOptInfo *rel)
 static bool
 IsTidEqualClause(RestrictInfo *rinfo, RelOptInfo *rel)
 {
-	if (!IsTidBinaryClause(rinfo, rel))
+	if (!IsBinaryTidClause(rinfo, rel))
 		return false;
-	return ((OpExpr *) rinfo->clause)->opno == TIDEqualOperator;
+
+	if (((OpExpr *) rinfo->clause)->opno == TIDEqualOperator)
+		return true;
+
+	return false;
 }
 
 /*
@@ -147,13 +151,15 @@ IsTidRangeClause(RestrictInfo *rinfo, RelOptInfo *rel)
 {
 	Oid			opno;
 
-	if (!IsTidBinaryClause(rinfo, rel))
+	if (!IsBinaryTidClause(rinfo, rel))
 		return false;
 	opno = ((OpExpr *) rinfo->clause)->opno;
-	return opno == TIDLessOperator ||
-		opno == TIDLessEqOperator ||
-		opno == TIDGreaterOperator ||
-		opno == TIDGreaterEqOperator;
+
+	if (opno == TIDLessOperator || opno == TIDLessEqOperator ||
+		opno == TIDGreaterOperator || opno == TIDGreaterEqOperator)
+		return true;
+
+	return false;
 }
 
 /*
@@ -262,7 +268,7 @@ TidQualFromRestrictInfo(RestrictInfo *rinfo, RelOptInfo *rel)
  *
  * Returns a List of CTID qual RestrictInfos for the specified rel (with
  * implicit OR semantics across the list), or NIL if there are no usable
- * conditions.
+ * equality conditions.
  *
  * This function is just concerned with handling AND/OR recursion.
  */
@@ -346,7 +352,7 @@ TidQualFromRestrictInfoList(List *rlist, RelOptInfo *rel)
  *
  * Returns a List of CTID range qual RestrictInfos for the specified rel
  * (with implicit AND semantics across the list), or NIL if there are no
- * usable conditions.
+ * usable range conditions.
  */
 static List *
 TidRangeQualFromRestrictInfoList(List *rlist, RelOptInfo *rel)
@@ -362,9 +368,7 @@ TidRangeQualFromRestrictInfoList(List *rlist, RelOptInfo *rel)
 		RestrictInfo *rinfo = lfirst_node(RestrictInfo, l);
 
 		if (IsTidRangeClause(rinfo, rel))
-		{
 			rlst = lappend(rlst, rinfo);
-		}
 	}
 
 	return rlst;
@@ -478,7 +482,8 @@ create_tidscan_paths(PlannerInfo *root, RelOptInfo *rel)
 	 * If there are range quals in the baserestrict list, generate a
 	 * TidRangePath.
 	 */
-	tidrangequals = TidRangeQualFromRestrictInfoList(rel->baserestrictinfo, rel);
+	tidrangequals = TidRangeQualFromRestrictInfoList(rel->baserestrictinfo,
+													 rel);
 
 	if (tidrangequals)
 	{
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 018da82719..104be4082d 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -127,8 +127,10 @@ static Plan *create_bitmap_subplan(PlannerInfo *root, Path *bitmapqual,
 static void bitmap_subplan_mark_shared(Plan *plan);
 static TidScan *create_tidscan_plan(PlannerInfo *root, TidPath *best_path,
 									List *tlist, List *scan_clauses);
-static TidRangeScan *create_tidrangescan_plan(PlannerInfo *root, TidRangePath *best_path,
-											  List *tlist, List *scan_clauses);
+static TidRangeScan *create_tidrangescan_plan(PlannerInfo *root,
+											  TidRangePath *best_path,
+											  List *tlist,
+											  List *scan_clauses);
 static SubqueryScan *create_subqueryscan_plan(PlannerInfo *root,
 											  SubqueryScanPath *best_path,
 											  List *tlist, List *scan_clauses);
@@ -3386,7 +3388,7 @@ create_tidrangescan_plan(PlannerInfo *root, TidRangePath *best_path,
 
 	/*
 	 * The qpqual list must contain all restrictions not enforced by the
-	 * tidrangequals list.  tidquals has AND semantics, so we can simply
+	 * tidrangequals list.  tidrangequals has AND semantics, so we can simply
 	 * remove any qual that appears in it.
 	 */
 	{
@@ -3401,8 +3403,6 @@ create_tidrangescan_plan(PlannerInfo *root, TidRangePath *best_path,
 				continue;		/* we may drop pseudoconstants here */
 			if (list_member_ptr(tidrangequals, rinfo))
 				continue;		/* simple duplicate */
-			if (is_redundant_derived_clause(rinfo, tidrangequals))
-				continue;		/* derived from same EquivalenceClass */
 			qpqual = lappend(qpqual, rinfo);
 		}
 		scan_clauses = qpqual;
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index c06a053ca9..5c7a3a04d0 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1178,8 +1178,8 @@ create_tidscan_path(PlannerInfo *root, RelOptInfo *rel, List *tidquals,
  *	  the pathnode.
  */
 TidRangePath *
-create_tidrangescan_path(PlannerInfo *root, RelOptInfo *rel, List *tidrangequals,
-						 Relids required_outer)
+create_tidrangescan_path(PlannerInfo *root, RelOptInfo *rel,
+						 List *tidrangequals, Relids required_outer)
 {
 	TidRangePath *pathnode = makeNode(TidRangePath);
 
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 162a369655..77e4ba6726 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -445,7 +445,8 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 	get_relation_foreign_keys(root, rel, relation, inhparent);
 
 	/* Collect info about functions implemented by the rel's table AM. */
-	rel->has_scan_setlimits = relation->rd_tableam && relation->rd_tableam->scan_bitmap_next_block != NULL;
+	rel->has_scan_setlimits = relation->rd_tableam &&
+							  relation->rd_tableam->scan_setlimits != NULL;
 
 	/*
 	 * Collect info about relation's partitioning scheme, if any. Only
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 109741b52e..fa47119213 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -859,7 +859,6 @@ static inline void
 table_scan_setlimits(TableScanDesc scan,
 					 BlockNumber startBlk, BlockNumber numBlks)
 {
-	Assert(scan->rs_rd->rd_tableam->scan_setlimits != NULL);
 	scan->rs_rd->rd_tableam->scan_setlimits(scan, startBlk, numBlks);
 }
 
diff --git a/src/include/executor/nodeTidrangescan.h b/src/include/executor/nodeTidrangescan.h
index cff87907fa..f0bbcc6a04 100644
--- a/src/include/executor/nodeTidrangescan.h
+++ b/src/include/executor/nodeTidrangescan.h
@@ -16,7 +16,8 @@
 
 #include "nodes/execnodes.h"
 
-extern TidRangeScanState *ExecInitTidRangeScan(TidRangeScan *node, EState *estate, int eflags);
+extern TidRangeScanState *ExecInitTidRangeScan(TidRangeScan *node,
+											   EState *estate, int eflags);
 extern void ExecEndTidRangeScan(TidRangeScanState *node);
 extern void ExecReScanTidRangeScan(TidRangeScanState *node);
 
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index fdf37b0bc9..6a1328481d 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1580,7 +1580,8 @@ typedef struct TidScanState
  *		trss_tidexprs		list of TidOpExpr structs (see nodeTidrangescan.c)
  *		trss_startBlock		first block to scan
  *		trss_endBlock		last block to scan (inclusive)
- *		trss_startOffset	first offset in first block to scan
+ *		trss_startOffset	first offset in first block to scan or InvalidBlockNumber
+ *							when the range is not set
  *		trss_endOffset		last offset in last block to scan (inclusive)
  *		trss_inScan			is a scan currently in progress?
  * ----------------
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 9d90a996df..c352a7c1a7 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -83,7 +83,8 @@ extern void cost_bitmap_tree_node(Path *path, Cost *cost, Selectivity *selec);
 extern void cost_tidscan(Path *path, PlannerInfo *root,
 						 RelOptInfo *baserel, List *tidquals, ParamPathInfo *param_info);
 extern void cost_tidrangescan(Path *path, PlannerInfo *root,
-							  RelOptInfo *baserel, List *tidquals, ParamPathInfo *param_info);
+							  RelOptInfo *baserel, List *tidquals,
+							  ParamPathInfo *param_info);
 extern void cost_subqueryscan(SubqueryScanPath *path, PlannerInfo *root,
 							  RelOptInfo *baserel, ParamPathInfo *param_info);
 extern void cost_functionscan(Path *path, PlannerInfo *root,
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 23b6bc4d3d..dff8fdd126 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -63,8 +63,10 @@ extern BitmapOrPath *create_bitmap_or_path(PlannerInfo *root,
 										   List *bitmapquals);
 extern TidPath *create_tidscan_path(PlannerInfo *root, RelOptInfo *rel,
 									List *tidquals, Relids required_outer);
-extern TidRangePath *create_tidrangescan_path(PlannerInfo *root, RelOptInfo *rel,
-											  List *tidrangequals, Relids required_outer);
+extern TidRangePath *create_tidrangescan_path(PlannerInfo *root,
+											  RelOptInfo *rel,
+											  List *tidrangequals,
+											  Relids required_outer);
 extern AppendPath *create_append_path(PlannerInfo *root, RelOptInfo *rel,
 									  List *subpaths, List *partial_subpaths,
 									  List *pathkeys, Relids required_outer,
diff --git a/src/test/regress/expected/tidrangescan.out b/src/test/regress/expected/tidrangescan.out
index 6a1d08686c..fc11894c8e 100644
--- a/src/test/regress/expected/tidrangescan.out
+++ b/src/test/regress/expected/tidrangescan.out
@@ -1,7 +1,13 @@
 -- tests for tidrangescans
+SET enable_seqscan TO off;
 CREATE TABLE tidrangescan(id integer, data text);
-INSERT INTO tidrangescan SELECT i,repeat('x', 100) FROM generate_series(1,1000) AS s(i);
-DELETE FROM tidrangescan WHERE substring(ctid::text from ',(\d+)\)')::integer > 10 OR substring(ctid::text from '\((\d+),')::integer >= 10;;
+-- insert enough tuples to fill at least two pages
+INSERT INTO tidrangescan SELECT i,repeat('x', 100) FROM generate_series(1,200) AS s(i);
+-- remove all tuples after the 10th tuple on each page.  Trying to ensure
+-- we get the same layout with all CPU architectures and smaller than standard
+-- page sizes.
+DELETE FROM tidrangescan
+WHERE substring(ctid::text from ',(\d+)\)')::integer > 10 OR substring(ctid::text from '\((\d+),')::integer > 2;
 VACUUM tidrangescan;
 -- range scans with upper bound
 EXPLAIN (COSTS OFF)
@@ -70,49 +76,49 @@ SELECT ctid FROM tidrangescan WHERE ctid < '(0,0)';
 
 -- range scans with lower bound
 EXPLAIN (COSTS OFF)
-SELECT ctid FROM tidrangescan WHERE ctid > '(9,8)';
+SELECT ctid FROM tidrangescan WHERE ctid > '(2,8)';
             QUERY PLAN             
 -----------------------------------
  Tid Range Scan on tidrangescan
-   TID Cond: (ctid > '(9,8)'::tid)
+   TID Cond: (ctid > '(2,8)'::tid)
 (2 rows)
 
-SELECT ctid FROM tidrangescan WHERE ctid > '(9,8)';
+SELECT ctid FROM tidrangescan WHERE ctid > '(2,8)';
   ctid  
 --------
- (9,9)
- (9,10)
+ (2,9)
+ (2,10)
 (2 rows)
 
 EXPLAIN (COSTS OFF)
-SELECT ctid FROM tidrangescan WHERE '(9,8)' < ctid;
+SELECT ctid FROM tidrangescan WHERE '(2,8)' < ctid;
             QUERY PLAN             
 -----------------------------------
  Tid Range Scan on tidrangescan
-   TID Cond: ('(9,8)'::tid < ctid)
+   TID Cond: ('(2,8)'::tid < ctid)
 (2 rows)
 
-SELECT ctid FROM tidrangescan WHERE '(9,8)' < ctid;
+SELECT ctid FROM tidrangescan WHERE '(2,8)' < ctid;
   ctid  
 --------
- (9,9)
- (9,10)
+ (2,9)
+ (2,10)
 (2 rows)
 
 EXPLAIN (COSTS OFF)
-SELECT ctid FROM tidrangescan WHERE ctid >= '(9,8)';
+SELECT ctid FROM tidrangescan WHERE ctid >= '(2,8)';
              QUERY PLAN             
 ------------------------------------
  Tid Range Scan on tidrangescan
-   TID Cond: (ctid >= '(9,8)'::tid)
+   TID Cond: (ctid >= '(2,8)'::tid)
 (2 rows)
 
-SELECT ctid FROM tidrangescan WHERE ctid >= '(9,8)';
+SELECT ctid FROM tidrangescan WHERE ctid >= '(2,8)';
   ctid  
 --------
- (9,8)
- (9,9)
- (9,10)
+ (2,8)
+ (2,9)
+ (2,10)
 (3 rows)
 
 EXPLAIN (COSTS OFF)
@@ -130,35 +136,35 @@ SELECT ctid FROM tidrangescan WHERE ctid >= '(100,0)';
 
 -- range scans with both bounds
 EXPLAIN (COSTS OFF)
-SELECT ctid FROM tidrangescan WHERE ctid > '(4,4)' AND '(4,7)' >= ctid;
+SELECT ctid FROM tidrangescan WHERE ctid > '(1,4)' AND '(1,7)' >= ctid;
                            QUERY PLAN                           
 ----------------------------------------------------------------
  Tid Range Scan on tidrangescan
-   TID Cond: ((ctid > '(4,4)'::tid) AND ('(4,7)'::tid >= ctid))
+   TID Cond: ((ctid > '(1,4)'::tid) AND ('(1,7)'::tid >= ctid))
 (2 rows)
 
-SELECT ctid FROM tidrangescan WHERE ctid > '(4,4)' AND '(4,7)' >= ctid;
+SELECT ctid FROM tidrangescan WHERE ctid > '(1,4)' AND '(1,7)' >= ctid;
  ctid  
 -------
- (4,5)
- (4,6)
- (4,7)
+ (1,5)
+ (1,6)
+ (1,7)
 (3 rows)
 
 EXPLAIN (COSTS OFF)
-SELECT ctid FROM tidrangescan WHERE '(4,7)' >= ctid AND ctid > '(4,4)';
+SELECT ctid FROM tidrangescan WHERE '(1,7)' >= ctid AND ctid > '(1,4)';
                            QUERY PLAN                           
 ----------------------------------------------------------------
  Tid Range Scan on tidrangescan
-   TID Cond: (('(4,7)'::tid >= ctid) AND (ctid > '(4,4)'::tid))
+   TID Cond: (('(1,7)'::tid >= ctid) AND (ctid > '(1,4)'::tid))
 (2 rows)
 
-SELECT ctid FROM tidrangescan WHERE '(4,7)' >= ctid AND ctid > '(4,4)';
+SELECT ctid FROM tidrangescan WHERE '(1,7)' >= ctid AND ctid > '(1,4)';
  ctid  
 -------
- (4,5)
- (4,6)
- (4,7)
+ (1,5)
+ (1,6)
+ (1,7)
 (3 rows)
 
 -- extreme offsets
@@ -236,3 +242,4 @@ FETCH LAST c;
 COMMIT;
 DROP TABLE tidrangescan;
 DROP TABLE tidrangescan_empty;
+RESET enable_seqscan;
diff --git a/src/test/regress/sql/tidrangescan.sql b/src/test/regress/sql/tidrangescan.sql
index 1baf584937..d60439d56c 100644
--- a/src/test/regress/sql/tidrangescan.sql
+++ b/src/test/regress/sql/tidrangescan.sql
@@ -1,9 +1,16 @@
 -- tests for tidrangescans
 
+SET enable_seqscan TO off;
 CREATE TABLE tidrangescan(id integer, data text);
 
-INSERT INTO tidrangescan SELECT i,repeat('x', 100) FROM generate_series(1,1000) AS s(i);
-DELETE FROM tidrangescan WHERE substring(ctid::text from ',(\d+)\)')::integer > 10 OR substring(ctid::text from '\((\d+),')::integer >= 10;;
+-- insert enough tuples to fill at least two pages
+INSERT INTO tidrangescan SELECT i,repeat('x', 100) FROM generate_series(1,200) AS s(i);
+
+-- remove all tuples after the 10th tuple on each page.  Trying to ensure
+-- we get the same layout with all CPU architectures and smaller than standard
+-- page sizes.
+DELETE FROM tidrangescan
+WHERE substring(ctid::text from ',(\d+)\)')::integer > 10 OR substring(ctid::text from '\((\d+),')::integer > 2;
 VACUUM tidrangescan;
 
 -- range scans with upper bound
@@ -21,16 +28,16 @@ SELECT ctid FROM tidrangescan WHERE ctid < '(0,0)';
 
 -- range scans with lower bound
 EXPLAIN (COSTS OFF)
-SELECT ctid FROM tidrangescan WHERE ctid > '(9,8)';
-SELECT ctid FROM tidrangescan WHERE ctid > '(9,8)';
+SELECT ctid FROM tidrangescan WHERE ctid > '(2,8)';
+SELECT ctid FROM tidrangescan WHERE ctid > '(2,8)';
 
 EXPLAIN (COSTS OFF)
-SELECT ctid FROM tidrangescan WHERE '(9,8)' < ctid;
-SELECT ctid FROM tidrangescan WHERE '(9,8)' < ctid;
+SELECT ctid FROM tidrangescan WHERE '(2,8)' < ctid;
+SELECT ctid FROM tidrangescan WHERE '(2,8)' < ctid;
 
 EXPLAIN (COSTS OFF)
-SELECT ctid FROM tidrangescan WHERE ctid >= '(9,8)';
-SELECT ctid FROM tidrangescan WHERE ctid >= '(9,8)';
+SELECT ctid FROM tidrangescan WHERE ctid >= '(2,8)';
+SELECT ctid FROM tidrangescan WHERE ctid >= '(2,8)';
 
 EXPLAIN (COSTS OFF)
 SELECT ctid FROM tidrangescan WHERE ctid >= '(100,0)';
@@ -38,12 +45,12 @@ SELECT ctid FROM tidrangescan WHERE ctid >= '(100,0)';
 
 -- range scans with both bounds
 EXPLAIN (COSTS OFF)
-SELECT ctid FROM tidrangescan WHERE ctid > '(4,4)' AND '(4,7)' >= ctid;
-SELECT ctid FROM tidrangescan WHERE ctid > '(4,4)' AND '(4,7)' >= ctid;
+SELECT ctid FROM tidrangescan WHERE ctid > '(1,4)' AND '(1,7)' >= ctid;
+SELECT ctid FROM tidrangescan WHERE ctid > '(1,4)' AND '(1,7)' >= ctid;
 
 EXPLAIN (COSTS OFF)
-SELECT ctid FROM tidrangescan WHERE '(4,7)' >= ctid AND ctid > '(4,4)';
-SELECT ctid FROM tidrangescan WHERE '(4,7)' >= ctid AND ctid > '(4,4)';
+SELECT ctid FROM tidrangescan WHERE '(1,7)' >= ctid AND ctid > '(1,4)';
+SELECT ctid FROM tidrangescan WHERE '(1,7)' >= ctid AND ctid > '(1,4)';
 
 -- extreme offsets
 SELECT ctid FROM tidrangescan where ctid > '(0,65535)' AND ctid < '(1,0)' LIMIT 1;
@@ -72,3 +79,5 @@ COMMIT;
 
 DROP TABLE tidrangescan;
 DROP TABLE tidrangescan_empty;
+
+RESET enable_seqscan;
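
For readers following the costsize.c hunk above: the disk-cost term now charges one random page fetch to reach the start of the range and treats every remaining page as a sequential read. Below is a minimal standalone sketch of just that arithmetic. The function name and its parameters are hypothetical stand-ins for illustration, not the planner's actual interface, and it assumes the range covers at least one page.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for the disk-cost term in cost_tidrangescan().
 * 'pages' is the number of pages the TID range is expected to touch;
 * the two cost parameters play the role of spc_random_page_cost and
 * spc_seq_page_cost in the patch.
 */
static double
sketch_tidrange_disk_cost(double pages,
						  double random_page_cost,
						  double seq_page_cost)
{
	double		nseqpages;

	/* Assumption of this sketch: the range touches at least one page. */
	assert(pages >= 1.0);

	/* The first page is fetched randomly, the remainder sequentially. */
	nseqpages = pages - 1.0;

	return random_page_cost + seq_page_cost * nseqpages;
}
```

With the stock settings random_page_cost = 4.0 and seq_page_cost = 1.0, a ten-page range comes out at 4.0 + 9 * 1.0 = 13.0, and a single-page range costs exactly one random fetch.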

Attachment: v9_tid_range_scans.patch (application/octet-stream)

diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 09bc6fe98a..1751ca6b35 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -2499,6 +2499,7 @@ static const TableAmRoutine heapam_methods = {
 	.scan_begin = heap_beginscan,
 	.scan_end = heap_endscan,
 	.scan_rescan = heap_rescan,
+	.scan_setlimits = heap_setscanlimits,
 	.scan_getnextslot = heap_getnextslot,
 
 	.parallelscan_estimate = table_block_parallelscan_estimate,
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 62fb3434a3..460c1b095c 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -1009,6 +1009,7 @@ ExplainPreScanNode(PlanState *planstate, Bitmapset **rels_used)
 		case T_IndexOnlyScan:
 		case T_BitmapHeapScan:
 		case T_TidScan:
+		case T_TidRangeScan:
 		case T_SubqueryScan:
 		case T_FunctionScan:
 		case T_TableFuncScan:
@@ -1155,6 +1156,9 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_TidScan:
 			pname = sname = "Tid Scan";
 			break;
+		case T_TidRangeScan:
+			pname = sname = "Tid Range Scan";
+			break;
 		case T_SubqueryScan:
 			pname = sname = "Subquery Scan";
 			break;
@@ -1346,6 +1350,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_SampleScan:
 		case T_BitmapHeapScan:
 		case T_TidScan:
+		case T_TidRangeScan:
 		case T_SubqueryScan:
 		case T_FunctionScan:
 		case T_TableFuncScan:
@@ -1767,6 +1772,23 @@ ExplainNode(PlanState *planstate, List *ancestors,
 											   planstate, es);
 			}
 			break;
+		case T_TidRangeScan:
+			{
+				/*
+				 * The tidrangequals list has AND semantics, so be sure to
+				 * show it as an AND condition.
+				 */
+				List	   *tidquals = ((TidRangeScan *) plan)->tidrangequals;
+
+				if (list_length(tidquals) > 1)
+					tidquals = list_make1(make_andclause(tidquals));
+				show_scan_qual(tidquals, "TID Cond", planstate, ancestors, es);
+				show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
+				if (plan->qual)
+					show_instrumentation_count("Rows Removed by Filter", 1,
+											   planstate, es);
+			}
+			break;
 		case T_ForeignScan:
 			show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
 			if (plan->qual)
@@ -3054,6 +3076,7 @@ ExplainTargetRel(Plan *plan, Index rti, ExplainState *es)
 		case T_IndexOnlyScan:
 		case T_BitmapHeapScan:
 		case T_TidScan:
+		case T_TidRangeScan:
 		case T_ForeignScan:
 		case T_CustomScan:
 		case T_ModifyTable:
diff --git a/src/backend/executor/Makefile b/src/backend/executor/Makefile
index cc09895fa5..ea053209a1 100644
--- a/src/backend/executor/Makefile
+++ b/src/backend/executor/Makefile
@@ -27,8 +27,8 @@ OBJS = execAmi.o execCurrent.o execExpr.o execExprInterp.o \
        nodeSamplescan.o nodeSeqscan.o nodeSetOp.o nodeSort.o nodeUnique.o \
        nodeValuesscan.o \
        nodeCtescan.o nodeNamedtuplestorescan.o nodeWorktablescan.o \
-       nodeGroup.o nodeSubplan.o nodeSubqueryscan.o nodeTidscan.o \
-       nodeForeignscan.o nodeWindowAgg.o tstoreReceiver.o tqueue.o spi.o \
-       nodeTableFuncscan.o
+       nodeGroup.o nodeSubplan.o nodeSubqueryscan.o nodeTidrangescan.o \
+       nodeTidscan.o nodeForeignscan.o nodeWindowAgg.o tstoreReceiver.o \
+       tqueue.o spi.o nodeTableFuncscan.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/executor/execAmi.c b/src/backend/executor/execAmi.c
index 1f18e5d3a2..58e4d8555a 100644
--- a/src/backend/executor/execAmi.c
+++ b/src/backend/executor/execAmi.c
@@ -51,6 +51,7 @@
 #include "executor/nodeSubplan.h"
 #include "executor/nodeSubqueryscan.h"
 #include "executor/nodeTableFuncscan.h"
+#include "executor/nodeTidrangescan.h"
 #include "executor/nodeTidscan.h"
 #include "executor/nodeUnique.h"
 #include "executor/nodeValuesscan.h"
@@ -198,6 +199,10 @@ ExecReScan(PlanState *node)
 			ExecReScanTidScan((TidScanState *) node);
 			break;
 
+		case T_TidRangeScanState:
+			ExecReScanTidRangeScan((TidRangeScanState *) node);
+			break;
+
 		case T_SubqueryScanState:
 			ExecReScanSubqueryScan((SubqueryScanState *) node);
 			break;
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index c227282975..23561c9ba5 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -108,6 +108,7 @@
 #include "executor/nodeSubplan.h"
 #include "executor/nodeSubqueryscan.h"
 #include "executor/nodeTableFuncscan.h"
+#include "executor/nodeTidrangescan.h"
 #include "executor/nodeTidscan.h"
 #include "executor/nodeUnique.h"
 #include "executor/nodeValuesscan.h"
@@ -238,6 +239,11 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
 												   estate, eflags);
 			break;
 
+		case T_TidRangeScan:
+			result = (PlanState *) ExecInitTidRangeScan((TidRangeScan *) node,
+														estate, eflags);
+			break;
+
 		case T_SubqueryScan:
 			result = (PlanState *) ExecInitSubqueryScan((SubqueryScan *) node,
 														estate, eflags);
@@ -632,6 +638,10 @@ ExecEndNode(PlanState *node)
 			ExecEndTidScan((TidScanState *) node);
 			break;
 
+		case T_TidRangeScanState:
+			ExecEndTidRangeScan((TidRangeScanState *) node);
+			break;
+
 		case T_SubqueryScanState:
 			ExecEndSubqueryScan((SubqueryScanState *) node);
 			break;
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
new file mode 100644
index 0000000000..8a72f52074
--- /dev/null
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -0,0 +1,580 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeTidrangescan.c
+ *	  Routines to support tid range scans of relations
+ *
+ * Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/executor/nodeTidrangescan.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/relscan.h"
+#include "access/sysattr.h"
+#include "access/tableam.h"
+#include "catalog/pg_operator.h"
+#include "executor/execdebug.h"
+#include "executor/nodeTidrangescan.h"
+#include "nodes/nodeFuncs.h"
+#include "storage/bufmgr.h"
+#include "utils/rel.h"
+
+
+#define IsCTIDVar(node)  \
+	((node) != NULL && \
+	 IsA((node), Var) && \
+	 ((Var *) (node))->varattno == SelfItemPointerAttributeNumber && \
+	 ((Var *) (node))->varlevelsup == 0)
+
+typedef enum
+{
+	TIDEXPR_UPPER_BOUND,
+	TIDEXPR_LOWER_BOUND
+} TidExprType;
+
+/* Upper or lower range bound for scan */
+typedef struct TidOpExpr
+{
+	TidExprType exprtype;		/* type of op */
+	ExprState  *exprstate;		/* ExprState for a TID-yielding subexpr */
+	bool		inclusive;		/* whether op is inclusive */
+} TidOpExpr;
+
+/*
+ * For the given 'expr', build and return an appropriate TidOpExpr taking into
+ * account the expr's operator and operand order.
+ */
+static TidOpExpr *
+MakeTidOpExpr(OpExpr *expr, TidRangeScanState *tidstate)
+{
+	Node	   *arg1 = get_leftop((Expr *) expr);
+	Node	   *arg2 = get_rightop((Expr *) expr);
+	ExprState  *exprstate = NULL;
+	bool		invert = false;
+	TidOpExpr  *tidopexpr;
+
+	if (IsCTIDVar(arg1))
+		exprstate = ExecInitExpr((Expr *) arg2, &tidstate->ss.ps);
+	else if (IsCTIDVar(arg2))
+	{
+		exprstate = ExecInitExpr((Expr *) arg1, &tidstate->ss.ps);
+		invert = true;
+	}
+	else
+		elog(ERROR, "could not identify CTID variable");
+
+	tidopexpr = (TidOpExpr *) palloc0(sizeof(TidOpExpr));
+
+	switch (expr->opno)
+	{
+		case TIDLessEqOperator:
+			tidopexpr->inclusive = true;
+			/* fall through */
+		case TIDLessOperator:
+			tidopexpr->exprtype = invert ? TIDEXPR_LOWER_BOUND : TIDEXPR_UPPER_BOUND;
+			break;
+		case TIDGreaterEqOperator:
+			tidopexpr->inclusive = true;
+			/* fall through */
+		case TIDGreaterOperator:
+			tidopexpr->exprtype = invert ? TIDEXPR_UPPER_BOUND : TIDEXPR_LOWER_BOUND;
+			break;
+		default:
+			elog(ERROR, "could not identify CTID operator");
+	}
+
+	tidopexpr->exprstate = exprstate;
+
+	return tidopexpr;
+}
+
+/*
+ * Extract the qual subexpressions that yield TIDs to search for,
+ * and compile them into ExprStates if they're ordinary expressions.
+ */
+static void
+TidExprListCreate(TidRangeScanState *tidrangestate)
+{
+	TidRangeScan *node = (TidRangeScan *) tidrangestate->ss.ps.plan;
+	List	   *tidexprs = NIL;
+	ListCell   *l;
+
+	foreach(l, node->tidrangequals)
+	{
+		OpExpr	   *opexpr = lfirst(l);
+		TidOpExpr  *tidopexpr;
+
+		if (!IsA(opexpr, OpExpr))
+			elog(ERROR, "could not identify CTID expression");
+
+		tidopexpr = MakeTidOpExpr(opexpr, tidrangestate);
+		tidexprs = lappend(tidexprs, tidopexpr);
+	}
+
+	tidrangestate->trss_tidexprs = tidexprs;
+}
+
+/*
+ * Set 'lowerBound' based on 'tid'.  If 'inclusive' is false then the
+ * lowerBound is incremented to the next tid value so that it becomes
+ * inclusive.  If there is no valid next tid value then we return false,
+ * otherwise we return true.
+ */
+static bool
+SetTidLowerBound(ItemPointer tid, bool inclusive, ItemPointer lowerBound)
+{
+	OffsetNumber offset;
+
+	*lowerBound = *tid;
+	offset = ItemPointerGetOffsetNumberNoCheck(tid);
+
+	if (!inclusive)
+	{
+		/* Check if the lower bound is actually in the next block. */
+		if (offset >= MaxOffsetNumber)
+		{
+			BlockNumber block = ItemPointerGetBlockNumberNoCheck(lowerBound);
+
+			/*
+			 * If the lower bound was already at or above the maximum block
+			 * number, then there is no valid value for it to be set to.
+			 */
+			if (block >= MaxBlockNumber)
+				return false;
+
+			/* Set the lowerBound to the first offset in the next block */
+			ItemPointerSet(lowerBound, block + 1, 1);
+		}
+		else
+			ItemPointerSetOffsetNumber(lowerBound, OffsetNumberNext(offset));
+	}
+	else if (offset == 0)
+		ItemPointerSetOffsetNumber(lowerBound, 1);
+
+	return true;
+}
+
+/*
+ * Set 'upperBound' based on 'tid'.  If 'inclusive' is false then the
+ * upperBound is decremented to the previous tid value so that it becomes
+ * inclusive.  If there is no valid previous tid value then we return false,
+ * otherwise we return true.
+ */
+static bool
+SetTidUpperBound(ItemPointer tid, bool inclusive, ItemPointer upperBound)
+{
+	OffsetNumber offset;
+
+	*upperBound = *tid;
+	offset = ItemPointerGetOffsetNumberNoCheck(tid);
+
+	/*
+	 * Since TID offsets start at 1, an inclusive upper bound with offset 0
+	 * can be treated as an exclusive bound.  This has the benefit of
+	 * eliminating that block from the scan range.
+	 */
+	if (inclusive && offset == 0)
+		inclusive = false;
+
+	if (!inclusive)
+	{
+		/* Check if the upper bound is actually in the previous block. */
+		if (offset == 0)
+		{
+			BlockNumber block = ItemPointerGetBlockNumberNoCheck(upperBound);
+
+			/*
+			 * If the upper bound was already in block 0, then there is no
+			 * valid value for it to be set to.
+			 */
+			if (block == 0)
+				return false;
+
+			ItemPointerSet(upperBound, block - 1, MaxOffsetNumber);
+		}
+		else
+			ItemPointerSetOffsetNumber(upperBound, OffsetNumberPrev(offset));
+	}
+
+	return true;
+}
+
+/* ----------------------------------------------------------------
+ *		TidRangeEval
+ *
+ *		Compute and set node's block and offset range to scan by evaluating
+ *		the trss_tidexprs.  If we detect an invalid range that cannot yield
+ *		any rows, the range is left unset.
+ * ----------------------------------------------------------------
+ */
+static void
+TidRangeEval(TidRangeScanState *node)
+{
+	ExprContext *econtext = node->ss.ps.ps_ExprContext;
+	BlockNumber nblocks;
+	ItemPointerData lowerBound;
+	ItemPointerData upperBound;
+	ListCell   *l;
+
+	/*
+	 * We silently discard any TIDs that are out of range at the time of scan
+	 * start.  (Since we hold at least AccessShareLock on the table, it won't
+	 * be possible for someone to truncate away the blocks we intend to
+	 * visit.)
+	 */
+	nblocks = RelationGetNumberOfBlocks(node->ss.ss_currentRelation);
+
+	/* The biggest range on an empty table is empty; just skip it. */
+	if (nblocks == 0)
+		return;
+
+	/* Set the lower and upper bound to scan the whole table. */
+	ItemPointerSet(&lowerBound, 0, 1);
+	ItemPointerSet(&upperBound, nblocks - 1, MaxOffsetNumber);
+
+	foreach(l, node->trss_tidexprs)
+	{
+		TidOpExpr  *tidopexpr = (TidOpExpr *) lfirst(l);
+		ItemPointer itemptr;
+		bool		isNull;
+
+		/* Evaluate this bound. */
+		itemptr = (ItemPointer)
+			DatumGetPointer(ExecEvalExprSwitchContext(tidopexpr->exprstate,
+													  econtext,
+													  &isNull));
+
+		/* If the bound is NULL, *nothing* matches the qual. */
+		if (isNull)
+			return;
+
+		if (tidopexpr->exprtype == TIDEXPR_LOWER_BOUND)
+		{
+			ItemPointerData lb;
+
+			/*
+			 * If the lower bound is beyond the maximum value for ctid, then
+			 * just bail without setting the range.  No rows can match.
+			 */
+			if (!SetTidLowerBound(itemptr, tidopexpr->inclusive, &lb))
+				return;
+
+			if (ItemPointerCompare(&lb, &lowerBound) > 0)
+				lowerBound = lb;
+		}
+
+		if (tidopexpr->exprtype == TIDEXPR_UPPER_BOUND)
+		{
+			ItemPointerData ub;
+
+			/*
+			 * If the upper bound is below the minimum value for ctid, then
+			 * just bail without setting the range.  No rows can match.
+			 */
+			if (!SetTidUpperBound(itemptr, tidopexpr->inclusive, &ub))
+				return;
+
+			if (ItemPointerCompare(&ub, &upperBound) < 0)
+				upperBound = ub;
+		}
+	}
+
+	/* If the resulting range is not empty, set it. */
+	if (ItemPointerCompare(&lowerBound, &upperBound) <= 0)
+	{
+		node->trss_startBlock = ItemPointerGetBlockNumberNoCheck(&lowerBound);
+		node->trss_endBlock = ItemPointerGetBlockNumberNoCheck(&upperBound);
+		node->trss_startOffset = ItemPointerGetOffsetNumberNoCheck(&lowerBound);
+		node->trss_endOffset = ItemPointerGetOffsetNumberNoCheck(&upperBound);
+	}
+}
+
+/* ----------------------------------------------------------------
+ *		NextInTidRange
+ *
+ *		Fetch the next tuple when scanning a range of TIDs.
+ *
+ *		Since the table access method may return tuples that are in the scan
+ *		limit, but not within the required TID range, this function will
+ *		check for such tuples and skip over them.
+ * ----------------------------------------------------------------
+ */
+static bool
+NextInTidRange(TidRangeScanState *node, TableScanDesc scandesc,
+			   TupleTableSlot *slot)
+{
+	for (;;)
+	{
+		BlockNumber block;
+		OffsetNumber offset;
+
+		if (!table_scan_getnextslot(scandesc, ForwardScanDirection, slot))
+			return false;
+
+		/* Check that the tuple is within the required range. */
+		block = ItemPointerGetBlockNumber(&slot->tts_tid);
+		offset = ItemPointerGetOffsetNumber(&slot->tts_tid);
+
+		/* The tuple should never come from outside the scan limits. */
+		Assert(block >= node->trss_startBlock &&
+			   block <= node->trss_endBlock);
+
+		/*
+		 * If the tuple is in the first block of the range and before the
+		 * first requested offset, then we can skip it.
+		 */
+		if (block == node->trss_startBlock && offset < node->trss_startOffset)
+		{
+			ExecClearTuple(slot);
+			continue;
+		}
+
+		/*
+		 * Similarly, if the tuple is in the last block and after the last
+		 * requested offset, we can end the scan.
+		 */
+		if (block == node->trss_endBlock && offset > node->trss_endOffset)
+		{
+			ExecClearTuple(slot);
+			return false;
+		}
+
+		return true;
+	}
+}
+
+/* ----------------------------------------------------------------
+ *		TidRangeNext
+ *
+ *		Retrieve a tuple from the TidRangeScan node's currentRelation
+ *		using the tids in the TidRangeScanState information.
+ *
+ * ----------------------------------------------------------------
+ */
+static TupleTableSlot *
+TidRangeNext(TidRangeScanState *node)
+{
+	TableScanDesc scandesc;
+	EState	   *estate;
+	TupleTableSlot *slot;
+	bool		foundTuple;
+
+	/*
+	 * extract necessary information from tid range scan node
+	 */
+	scandesc = node->ss.ss_currentScanDesc;
+	estate = node->ss.ps.state;
+	slot = node->ss.ss_ScanTupleSlot;
+
+	Assert(ScanDirectionIsForward(estate->es_direction));
+
+	if (!node->trss_inScan)
+	{
+		BlockNumber blocks_to_scan;
+
+		/* First time through, compute the TID range to be visited */
+		if (node->trss_startBlock == InvalidBlockNumber)
+			TidRangeEval(node);
+
+		if (scandesc == NULL)
+		{
+			scandesc = table_beginscan_strat(node->ss.ss_currentRelation,
+											 estate->es_snapshot,
+											 0, NULL,
+											 false, false);
+			node->ss.ss_currentScanDesc = scandesc;
+		}
+
+		/* Compute the number of blocks to scan and set the scan limits. */
+		if (node->trss_startBlock == InvalidBlockNumber)
+		{
+			/* If the range is empty, set the scan limits to zero blocks. */
+			node->trss_startBlock = 0;
+			blocks_to_scan = 0;
+		}
+		else
+			blocks_to_scan = node->trss_endBlock - node->trss_startBlock + 1;
+
+		table_scan_setlimits(scandesc, node->trss_startBlock, blocks_to_scan);
+		node->trss_inScan = true;
+	}
+
+	/* Fetch the next tuple. */
+	foundTuple = NextInTidRange(node, scandesc, slot);
+
+	/*
+	 * If we've exhausted all the tuples in the range, reset the inScan flag.
+	 * This will cause the heap to be rescanned for any subsequent fetches,
+	 * which is important for some cursor operations: for instance, FETCH LAST
+	 * fetches all the tuples in order and then fetches one tuple in reverse.
+	 */
+	if (!foundTuple)
+		node->trss_inScan = false;
+
+	return slot;
+}
+
+/*
+ * TidRangeRecheck -- access method routine to recheck a tuple in EvalPlanQual
+ */
+static bool
+TidRangeRecheck(TidRangeScanState *node, TupleTableSlot *slot)
+{
+	/*
+	 * XXX shouldn't we check here to make sure tuple is in TID range? In
+	 * runtime-key case this is not certain, is it?
+	 */
+	return true;
+}
+
+/* ----------------------------------------------------------------
+ *		ExecTidRangeScan(node)
+ *
+ *		Scans the relation using tids and returns the next qualifying tuple.
+ *		We call the ExecScan() routine and pass it the appropriate
+ *		access method functions.
+ *
+ *		Conditions:
+ *		  -- the "cursor" maintained by the AMI is positioned at the tuple
+ *			 returned previously.
+ *
+ *		Initial States:
+ *		  -- the relation indicated is opened for scanning so that the
+ *			 "cursor" is positioned before the first qualifying tuple.
+ *		  -- trss_startBlock is InvalidBlockNumber
+ * ----------------------------------------------------------------
+ */
+static TupleTableSlot *
+ExecTidRangeScan(PlanState *pstate)
+{
+	TidRangeScanState *node = castNode(TidRangeScanState, pstate);
+
+	return ExecScan(&node->ss,
+					(ExecScanAccessMtd) TidRangeNext,
+					(ExecScanRecheckMtd) TidRangeRecheck);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecReScanTidRangeScan(node)
+ * ----------------------------------------------------------------
+ */
+void
+ExecReScanTidRangeScan(TidRangeScanState *node)
+{
+	TableScanDesc scan = node->ss.ss_currentScanDesc;
+
+	if (scan != NULL)
+		table_rescan(scan, NULL);
+
+	/* mark scan as not in progress, and tid range as not computed yet */
+	node->trss_inScan = false;
+	node->trss_startBlock = InvalidBlockNumber;
+
+	ExecScanReScan(&node->ss);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecEndTidRangeScan
+ *
+ *		Releases any storage allocated through C routines.
+ *		Returns nothing.
+ * ----------------------------------------------------------------
+ */
+void
+ExecEndTidRangeScan(TidRangeScanState *node)
+{
+	TableScanDesc scan = node->ss.ss_currentScanDesc;
+
+	if (scan != NULL)
+		table_endscan(scan);
+
+	/*
+	 * Free the exprcontext
+	 */
+	ExecFreeExprContext(&node->ss.ps);
+
+	/*
+	 * clear out tuple table slots
+	 */
+	if (node->ss.ps.ps_ResultTupleSlot)
+		ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
+	ExecClearTuple(node->ss.ss_ScanTupleSlot);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecInitTidRangeScan
+ *
+ *		Initializes the tid range scan's state information, compiles
+ *		the TID range expressions, and opens the scan relation.
+ *
+ *		Parameters:
+ *		  node: TidRangeScan node produced by the planner.
+ *		  estate: the execution state initialized in InitPlan.
+ * ----------------------------------------------------------------
+ */
+TidRangeScanState *
+ExecInitTidRangeScan(TidRangeScan *node, EState *estate, int eflags)
+{
+	TidRangeScanState *tidrangestate;
+	Relation	currentRelation;
+
+	/*
+	 * create state structure
+	 */
+	tidrangestate = makeNode(TidRangeScanState);
+	tidrangestate->ss.ps.plan = (Plan *) node;
+	tidrangestate->ss.ps.state = estate;
+	tidrangestate->ss.ps.ExecProcNode = ExecTidRangeScan;
+
+	/*
+	 * Miscellaneous initialization
+	 *
+	 * create expression context for node
+	 */
+	ExecAssignExprContext(estate, &tidrangestate->ss.ps);
+
+	/*
+	 * mark scan as not in progress, and tid range as not computed yet
+	 */
+	tidrangestate->trss_inScan = false;
+	tidrangestate->trss_startBlock = InvalidBlockNumber;
+
+	/*
+	 * open the scan relation
+	 */
+	currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+
+	tidrangestate->ss.ss_currentRelation = currentRelation;
+	tidrangestate->ss.ss_currentScanDesc = NULL;	/* no table scan here */
+
+	/*
+	 * get the scan type from the relation descriptor.
+	 */
+	ExecInitScanTupleSlot(estate, &tidrangestate->ss,
+						  RelationGetDescr(currentRelation),
+						  table_slot_callbacks(currentRelation));
+
+	/*
+	 * Initialize result type and projection.
+	 */
+	ExecInitResultTypeTL(&tidrangestate->ss.ps);
+	ExecAssignScanProjectionInfo(&tidrangestate->ss);
+
+	/*
+	 * initialize child expressions
+	 */
+	tidrangestate->ss.ps.qual =
+		ExecInitQual(node->scan.plan.qual, (PlanState *) tidrangestate);
+
+	TidExprListCreate(tidrangestate);
+
+	/*
+	 * all done.
+	 */
+	return tidrangestate;
+}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 6414aded0e..132a22d9d7 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -585,6 +585,27 @@ _copyTidScan(const TidScan *from)
 	return newnode;
 }
 
+/*
+ * _copyTidRangeScan
+ */
+static TidRangeScan *
+_copyTidRangeScan(const TidRangeScan *from)
+{
+	TidRangeScan *newnode = makeNode(TidRangeScan);
+
+	/*
+	 * copy node superclass fields
+	 */
+	CopyScanFields((const Scan *) from, (Scan *) newnode);
+
+	/*
+	 * copy remainder of node
+	 */
+	COPY_NODE_FIELD(tidrangequals);
+
+	return newnode;
+}
+
 /*
  * _copySubqueryScan
  */
@@ -4813,6 +4834,9 @@ copyObjectImpl(const void *from)
 		case T_TidScan:
 			retval = _copyTidScan(from);
 			break;
+		case T_TidRangeScan:
+			retval = _copyTidRangeScan(from);
+			break;
 		case T_SubqueryScan:
 			retval = _copySubqueryScan(from);
 			break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 8e31fae47f..33ddf94a6f 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -608,6 +608,16 @@ _outTidScan(StringInfo str, const TidScan *node)
 	WRITE_NODE_FIELD(tidquals);
 }
 
+static void
+_outTidRangeScan(StringInfo str, const TidRangeScan *node)
+{
+	WRITE_NODE_TYPE("TIDRANGESCAN");
+
+	_outScanInfo(str, (const Scan *) node);
+
+	WRITE_NODE_FIELD(tidrangequals);
+}
+
 static void
 _outSubqueryScan(StringInfo str, const SubqueryScan *node)
 {
@@ -3701,6 +3711,9 @@ outNode(StringInfo str, const void *obj)
 			case T_TidScan:
 				_outTidScan(str, obj);
 				break;
+			case T_TidRangeScan:
+				_outTidRangeScan(str, obj);
+				break;
 			case T_SubqueryScan:
 				_outSubqueryScan(str, obj);
 				break;
diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README
index 89ce373d5e..6d2f7b80b0 100644
--- a/src/backend/optimizer/README
+++ b/src/backend/optimizer/README
@@ -374,6 +374,7 @@ RelOptInfo      - a relation or joined relations
   IndexPath     - index scan
   BitmapHeapPath - top of a bitmapped index scan
   TidPath       - scan by CTID
+  TidRangePath  - scan a contiguous range of CTIDs
   SubqueryScanPath - scan a subquery-in-FROM
   ForeignPath   - scan a foreign table, foreign join or foreign upper-relation
   CustomPath    - for custom scan providers
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 3a9a994733..616fe75749 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -1271,6 +1271,101 @@ cost_tidscan(Path *path, PlannerInfo *root,
 	path->total_cost = startup_cost + run_cost;
 }
 
+/*
+ * cost_tidrangescan
+ *	  Determines and sets the costs of scanning a relation using a range of
+ *	  TIDs for 'path'
+ *
+ * 'baserel' is the relation to be scanned
+ * 'tidrangequals' is the list of TID-checkable range quals
+ * 'param_info' is the ParamPathInfo if this is a parameterized path, else NULL
+ */
+void
+cost_tidrangescan(Path *path, PlannerInfo *root,
+				  RelOptInfo *baserel, List *tidrangequals,
+				  ParamPathInfo *param_info)
+{
+	Selectivity selectivity;
+	double		pages;
+	Cost		startup_cost = 0;
+	Cost		run_cost = 0;
+	QualCost	qpqual_cost;
+	Cost		cpu_per_tuple;
+	QualCost	tid_qual_cost;
+	double		ntuples;
+	double		nseqpages;
+	double		spc_random_page_cost;
+	double		spc_seq_page_cost;
+
+	/* Should only be applied to base relations */
+	Assert(baserel->relid > 0);
+	Assert(baserel->rtekind == RTE_RELATION);
+
+	/* Mark the path with the correct row estimate */
+	if (param_info)
+		path->rows = param_info->ppi_rows;
+	else
+		path->rows = baserel->rows;
+
+	/* Count how many tuples and pages we expect to scan */
+	selectivity = clauselist_selectivity(root, tidrangequals, baserel->relid,
+										 JOIN_INNER, NULL);
+	pages = ceil(selectivity * baserel->pages);
+
+	if (pages <= 0.0)
+		pages = 1.0;
+
+	/*
+	 * The first page in a range requires a random seek, but each subsequent
+	 * page is just a normal sequential page read. NOTE: it's desirable for
+	 * Tid Range Scans to cost more than the equivalent Sequential Scans,
+	 * because Seq Scans have some performance advantages such as scan
+	 * synchronization and parallelizability, and we'd prefer one of them to
+	 * be picked unless a Tid Range Scan really is better.
+	 */
+	ntuples = selectivity * baserel->tuples;
+	nseqpages = pages - 1.0;
+
+	if (!enable_tidscan)
+		startup_cost += disable_cost;
+
+	/*
+	 * The TID qual expressions will be computed once, any other baserestrict
+	 * quals once per retrieved tuple.
+	 */
+	cost_qual_eval(&tid_qual_cost, tidrangequals, root);
+
+	/* fetch estimated page cost for tablespace containing table */
+	get_tablespace_page_costs(baserel->reltablespace,
+							  &spc_random_page_cost,
+							  &spc_seq_page_cost);
+
+	/* disk costs; 1 random page and the remainder as seq pages */
+	run_cost += spc_random_page_cost + spc_seq_page_cost * nseqpages;
+
+	/* Add scanning CPU costs */
+	get_restriction_qual_cost(root, baserel, param_info, &qpqual_cost);
+
+	/*
+	 * XXX currently we assume TID quals are a subset of qpquals at this
+	 * point; they will be removed (if possible) when we create the plan, so
+	 * we subtract their cost from the total qpqual cost.  (If the TID quals
+	 * can't be removed, this is a mistake and we're going to underestimate
+	 * the CPU cost a bit.)
+	 */
+	startup_cost += qpqual_cost.startup + tid_qual_cost.per_tuple;
+	cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple -
+		tid_qual_cost.per_tuple;
+	run_cost += cpu_per_tuple * ntuples;
+
+	/* tlist eval costs are paid per output row, not per tuple scanned */
+	startup_cost += path->pathtarget->cost.startup;
+	run_cost += path->pathtarget->cost.per_tuple * path->rows;
+
+	path->startup_cost = startup_cost;
+	path->total_cost = startup_cost + run_cost;
+}
+
 /*
  * cost_subqueryscan
  *	  Determines and returns the cost of scanning a subquery RTE.
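
Note on the disk-cost term in cost_tidrangescan() above: a rough standalone sketch of just that term may help when comparing against a sequential scan. This is not part of the patch. The sketch_* names are hypothetical, the CPU and qual costs are deliberately left out, and the rounding helper replaces ceil() only to keep the example free of libm.

```c
#include <assert.h>

/* Round a non-negative page estimate up to a whole number of pages. */
static double
sketch_ceil_pages(double x)
{
	double		whole = (double) (long long) x;

	return (whole < x) ? whole + 1.0 : whole;
}

/*
 * Disk cost only: one random page fetch for the first page of the range,
 * then sequential fetches for the remaining pages.
 */
static double
sketch_tidrange_disk_cost(double selectivity, double rel_pages,
						  double random_page_cost, double seq_page_cost)
{
	double		pages;

	assert(selectivity >= 0.0 && selectivity <= 1.0);

	pages = sketch_ceil_pages(selectivity * rel_pages);
	if (pages <= 0.0)
		pages = 1.0;			/* always charge for at least one page */

	return random_page_cost + seq_page_cost * (pages - 1.0);
}
```

With the default random_page_cost of 4.0 and seq_page_cost of 1.0, a range covering a quarter of a 1000-page table comes out at 253.0. A range covering the whole table comes out at 1003.0, slightly above the 1000.0 that a sequential scan would be charged for the same pages, which matches the intent stated in the comment that a Seq Scan should win unless the range really is narrower.
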
diff --git a/src/backend/optimizer/path/tidpath.c b/src/backend/optimizer/path/tidpath.c
index 466e996011..2e8535fa14 100644
--- a/src/backend/optimizer/path/tidpath.c
+++ b/src/backend/optimizer/path/tidpath.c
@@ -2,9 +2,9 @@
  *
  * tidpath.c
  *	  Routines to determine which TID conditions are usable for scanning
- *	  a given relation, and create TidPaths accordingly.
+ *	  a given relation, and create TidPaths and TidRangePaths accordingly.
  *
- * What we are looking for here is WHERE conditions of the form
+ * For TidPaths, we look for WHERE conditions of the form
  * "CTID = pseudoconstant", which can be implemented by just fetching
  * the tuple directly via heap_fetch().  We can also handle OR'd conditions
  * such as (CTID = const1) OR (CTID = const2), as well as ScalarArrayOpExpr
@@ -23,6 +23,9 @@
  * a function, but in practice it works better to keep the special node
  * representation all the way through to execution.
  *
+ * Additionally, TidRangePaths may be created for conditions of the form
+ * "CTID relop pseudoconstant", where relop is one of >, >=, <, <=, and
+ * AND-clauses composed of such conditions.
  *
  * Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
@@ -63,14 +66,14 @@ IsCTIDVar(Var *var, RelOptInfo *rel)
 
 /*
  * Check to see if a RestrictInfo is of the form
- *		CTID = pseudoconstant
+ *		CTID OP pseudoconstant
  * or
- *		pseudoconstant = CTID
- * where the CTID Var belongs to relation "rel", and nothing on the
- * other side of the clause does.
+ *		pseudoconstant OP CTID
+ * where OP is a binary operation, the CTID Var belongs to relation "rel",
+ * and nothing on the other side of the clause does.
  */
 static bool
-IsTidEqualClause(RestrictInfo *rinfo, RelOptInfo *rel)
+IsBinaryTidClause(RestrictInfo *rinfo, RelOptInfo *rel)
 {
 	OpExpr	   *node;
 	Node	   *arg1,
@@ -83,10 +86,9 @@ IsTidEqualClause(RestrictInfo *rinfo, RelOptInfo *rel)
 		return false;
 	node = (OpExpr *) rinfo->clause;
 
-	/* Operator must be tideq */
-	if (node->opno != TIDEqualOperator)
+	/* OpExpr must have two arguments */
+	if (list_length(node->args) != 2)
 		return false;
-	Assert(list_length(node->args) == 2);
 	arg1 = linitial(node->args);
 	arg2 = lsecond(node->args);
 
@@ -116,6 +118,50 @@ IsTidEqualClause(RestrictInfo *rinfo, RelOptInfo *rel)
 	return true;				/* success */
 }
 
+/*
+ * Check to see if a RestrictInfo is of the form
+ *		CTID = pseudoconstant
+ * or
+ *		pseudoconstant = CTID
+ * where the CTID Var belongs to relation "rel", and nothing on the
+ * other side of the clause does.
+ */
+static bool
+IsTidEqualClause(RestrictInfo *rinfo, RelOptInfo *rel)
+{
+	if (!IsBinaryTidClause(rinfo, rel))
+		return false;
+
+	if (((OpExpr *) rinfo->clause)->opno == TIDEqualOperator)
+		return true;
+
+	return false;
+}
+
+/*
+ * Check to see if a RestrictInfo is of the form
+ *		CTID OP pseudoconstant
+ * or
+ *		pseudoconstant OP CTID
+ * where OP is a range operator such as <, <=, >, or >=, the CTID Var belongs
+ * to relation "rel", and nothing on the other side of the clause does.
+ */
+static bool
+IsTidRangeClause(RestrictInfo *rinfo, RelOptInfo *rel)
+{
+	Oid			opno;
+
+	if (!IsBinaryTidClause(rinfo, rel))
+		return false;
+	opno = ((OpExpr *) rinfo->clause)->opno;
+
+	if (opno == TIDLessOperator || opno == TIDLessEqOperator ||
+		opno == TIDGreaterOperator || opno == TIDGreaterEqOperator)
+		return true;
+
+	return false;
+}
+
 /*
  * Check to see if a RestrictInfo is of the form
  *		CTID = ANY (pseudoconstant_array)
@@ -222,7 +268,7 @@ TidQualFromRestrictInfo(RestrictInfo *rinfo, RelOptInfo *rel)
  *
  * Returns a List of CTID qual RestrictInfos for the specified rel (with
  * implicit OR semantics across the list), or NIL if there are no usable
- * conditions.
+ * equality conditions.
  *
  * This function is just concerned with handling AND/OR recursion.
  */
@@ -301,6 +347,33 @@ TidQualFromRestrictInfoList(List *rlist, RelOptInfo *rel)
 	return rlst;
 }
 
+/*
+ * Extract a set of CTID range conditions from implicit-AND List of RestrictInfos
+ *
+ * Returns a List of CTID range qual RestrictInfos for the specified rel
+ * (with implicit AND semantics across the list), or NIL if there are no
+ * usable range conditions.
+ */
+static List *
+TidRangeQualFromRestrictInfoList(List *rlist, RelOptInfo *rel)
+{
+	List	   *rlst = NIL;
+	ListCell   *l;
+
+	if (!rel->has_scan_setlimits)
+		return NIL;
+
+	foreach(l, rlist)
+	{
+		RestrictInfo *rinfo = lfirst_node(RestrictInfo, l);
+
+		if (IsTidRangeClause(rinfo, rel))
+			rlst = lappend(rlst, rinfo);
+	}
+
+	return rlst;
+}
+
 /*
  * Given a list of join clauses involving our rel, create a parameterized
  * TidPath for each one that is a suitable TidEqual clause.
@@ -385,6 +458,7 @@ void
 create_tidscan_paths(PlannerInfo *root, RelOptInfo *rel)
 {
 	List	   *tidquals;
+	List	   *tidrangequals;
 
 	/*
 	 * If any suitable quals exist in the rel's baserestrict list, generate a
@@ -404,6 +478,26 @@ create_tidscan_paths(PlannerInfo *root, RelOptInfo *rel)
 												   required_outer));
 	}
 
+	/*
+	 * If there are range quals in the baserestrict list, generate a
+	 * TidRangePath.
+	 */
+	tidrangequals = TidRangeQualFromRestrictInfoList(rel->baserestrictinfo,
+													 rel);
+
+	if (tidrangequals)
+	{
+		/*
+		 * This path uses no join clauses, but it could still have required
+		 * parameterization due to LATERAL refs in its tlist.
+		 */
+		Relids		required_outer = rel->lateral_relids;
+
+		add_path(rel, (Path *) create_tidrangescan_path(root, rel,
+														tidrangequals,
+														required_outer));
+	}
+
 	/*
 	 * Try to generate parameterized TidPaths using equality clauses extracted
 	 * from EquivalenceClasses.  (This is important since simple "t1.ctid =
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index c6b8553a08..104be4082d 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -127,6 +127,10 @@ static Plan *create_bitmap_subplan(PlannerInfo *root, Path *bitmapqual,
 static void bitmap_subplan_mark_shared(Plan *plan);
 static TidScan *create_tidscan_plan(PlannerInfo *root, TidPath *best_path,
 									List *tlist, List *scan_clauses);
+static TidRangeScan *create_tidrangescan_plan(PlannerInfo *root,
+											  TidRangePath *best_path,
+											  List *tlist,
+											  List *scan_clauses);
 static SubqueryScan *create_subqueryscan_plan(PlannerInfo *root,
 											  SubqueryScanPath *best_path,
 											  List *tlist, List *scan_clauses);
@@ -191,6 +195,8 @@ static BitmapHeapScan *make_bitmap_heapscan(List *qptlist,
 											Index scanrelid);
 static TidScan *make_tidscan(List *qptlist, List *qpqual, Index scanrelid,
 							 List *tidquals);
+static TidRangeScan *make_tidrangescan(List *qptlist, List *qpqual,
+									   Index scanrelid, List *tidrangequals);
 static SubqueryScan *make_subqueryscan(List *qptlist,
 									   List *qpqual,
 									   Index scanrelid,
@@ -373,6 +379,7 @@ create_plan_recurse(PlannerInfo *root, Path *best_path, int flags)
 		case T_IndexOnlyScan:
 		case T_BitmapHeapScan:
 		case T_TidScan:
+		case T_TidRangeScan:
 		case T_SubqueryScan:
 		case T_FunctionScan:
 		case T_TableFuncScan:
@@ -663,6 +670,13 @@ create_scan_plan(PlannerInfo *root, Path *best_path, int flags)
 												scan_clauses);
 			break;
 
+		case T_TidRangeScan:
+			plan = (Plan *) create_tidrangescan_plan(root,
+													 (TidRangePath *) best_path,
+													 tlist,
+													 scan_clauses);
+			break;
+
 		case T_SubqueryScan:
 			plan = (Plan *) create_subqueryscan_plan(root,
 													 (SubqueryScanPath *) best_path,
@@ -3355,6 +3369,71 @@ create_tidscan_plan(PlannerInfo *root, TidPath *best_path,
 	return scan_plan;
 }
 
+/*
+ * create_tidrangescan_plan
+ *	 Returns a tidrangescan plan for the base relation scanned by 'best_path'
+ *	 with restriction clauses 'scan_clauses' and targetlist 'tlist'.
+ */
+static TidRangeScan *
+create_tidrangescan_plan(PlannerInfo *root, TidRangePath *best_path,
+						 List *tlist, List *scan_clauses)
+{
+	TidRangeScan *scan_plan;
+	Index		scan_relid = best_path->path.parent->relid;
+	List	   *tidrangequals = best_path->tidrangequals;
+
+	/* it should be a base rel... */
+	Assert(scan_relid > 0);
+	Assert(best_path->path.parent->rtekind == RTE_RELATION);
+
+	/*
+	 * The qpqual list must contain all restrictions not enforced by the
+	 * tidrangequals list.  tidrangequals has AND semantics, so we can simply
+	 * remove any qual that appears in it.
+	 */
+	{
+		List	   *qpqual = NIL;
+		ListCell   *l;
+
+		foreach(l, scan_clauses)
+		{
+			RestrictInfo *rinfo = lfirst_node(RestrictInfo, l);
+
+			if (rinfo->pseudoconstant)
+				continue;		/* we may drop pseudoconstants here */
+			if (list_member_ptr(tidrangequals, rinfo))
+				continue;		/* simple duplicate */
+			qpqual = lappend(qpqual, rinfo);
+		}
+		scan_clauses = qpqual;
+	}
+
+	/* Sort clauses into best execution order */
+	scan_clauses = order_qual_clauses(root, scan_clauses);
+
+	/* Reduce RestrictInfo lists to bare expressions; ignore pseudoconstants */
+	tidrangequals = extract_actual_clauses(tidrangequals, false);
+	scan_clauses = extract_actual_clauses(scan_clauses, false);
+
+	/* Replace any outer-relation variables with nestloop params */
+	if (best_path->path.param_info)
+	{
+		tidrangequals = (List *)
+			replace_nestloop_params(root, (Node *) tidrangequals);
+		scan_clauses = (List *)
+			replace_nestloop_params(root, (Node *) scan_clauses);
+	}
+
+	scan_plan = make_tidrangescan(tlist,
+								  scan_clauses,
+								  scan_relid,
+								  tidrangequals);
+
+	copy_generic_path_info(&scan_plan->scan.plan, &best_path->path);
+
+	return scan_plan;
+}
+
 /*
  * create_subqueryscan_plan
  *	 Returns a subqueryscan plan for the base relation scanned by 'best_path'
@@ -5257,6 +5336,25 @@ make_tidscan(List *qptlist,
 	return node;
 }
 
+static TidRangeScan *
+make_tidrangescan(List *qptlist,
+				  List *qpqual,
+				  Index scanrelid,
+				  List *tidrangequals)
+{
+	TidRangeScan *node = makeNode(TidRangeScan);
+	Plan	   *plan = &node->scan.plan;
+
+	plan->targetlist = qptlist;
+	plan->qual = qpqual;
+	plan->lefttree = NULL;
+	plan->righttree = NULL;
+	node->scan.scanrelid = scanrelid;
+	node->tidrangequals = tidrangequals;
+
+	return node;
+}
+
 static SubqueryScan *
 make_subqueryscan(List *qptlist,
 				  List *qpqual,
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index dc11f098e0..69a5e73fb7 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -547,6 +547,19 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 					fix_scan_list(root, splan->tidquals, rtoffset);
 			}
 			break;
+		case T_TidRangeScan:
+			{
+				TidRangeScan *splan = (TidRangeScan *) plan;
+
+				splan->scan.scanrelid += rtoffset;
+				splan->scan.plan.targetlist =
+					fix_scan_list(root, splan->scan.plan.targetlist, rtoffset);
+				splan->scan.plan.qual =
+					fix_scan_list(root, splan->scan.plan.qual, rtoffset);
+				splan->tidrangequals =
+					fix_scan_list(root, splan->tidrangequals, rtoffset);
+			}
+			break;
 		case T_SubqueryScan:
 			/* Needs special treatment, see comments below */
 			return set_subqueryscan_references(root,
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 48b62a55de..890e6de8d3 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -2301,6 +2301,12 @@ finalize_plan(PlannerInfo *root, Plan *plan,
 			context.paramids = bms_add_members(context.paramids, scan_params);
 			break;
 
+		case T_TidRangeScan:
+			finalize_primnode((Node *) ((TidRangeScan *) plan)->tidrangequals,
+							  &context);
+			context.paramids = bms_add_members(context.paramids, scan_params);
+			break;
+
 		case T_SubqueryScan:
 			{
 				SubqueryScan *sscan = (SubqueryScan *) plan;
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 0ac73984d2..5c7a3a04d0 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1172,6 +1172,35 @@ create_tidscan_path(PlannerInfo *root, RelOptInfo *rel, List *tidquals,
 	return pathnode;
 }
 
+/*
+ * create_tidrangescan_path
+ *	  Creates a path corresponding to a scan by a range of TIDs, returning
+ *	  the pathnode.
+ */
+TidRangePath *
+create_tidrangescan_path(PlannerInfo *root, RelOptInfo *rel,
+						 List *tidrangequals, Relids required_outer)
+{
+	TidRangePath *pathnode = makeNode(TidRangePath);
+
+	pathnode->path.pathtype = T_TidRangeScan;
+	pathnode->path.parent = rel;
+	pathnode->path.pathtarget = rel->reltarget;
+	pathnode->path.param_info = get_baserel_parampathinfo(root, rel,
+														  required_outer);
+	pathnode->path.parallel_aware = false;
+	pathnode->path.parallel_safe = rel->consider_parallel;
+	pathnode->path.parallel_workers = 0;
+	pathnode->path.pathkeys = NIL;	/* always unordered */
+
+	pathnode->tidrangequals = tidrangequals;
+
+	cost_tidrangescan(&pathnode->path, root, rel, tidrangequals,
+					  pathnode->path.param_info);
+
+	return pathnode;
+}
+
 /*
  * create_append_path
  *	  Creates a path corresponding to an Append plan, returning the
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 6ea625a148..77e4ba6726 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -444,6 +444,10 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 	/* Collect info about relation's foreign keys, if relevant */
 	get_relation_foreign_keys(root, rel, relation, inhparent);
 
+	/* Collect info about functions implemented by the rel's table AM. */
+	rel->has_scan_setlimits = relation->rd_tableam &&
+							  relation->rd_tableam->scan_setlimits != NULL;
+
 	/*
 	 * Collect info about relation's partitioning scheme, if any. Only
 	 * inheritance parents may be partitioned.
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 6054bd2b53..fd485fd306 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -234,6 +234,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
 	rel->baserestrict_min_security = UINT_MAX;
 	rel->joininfo = NIL;
 	rel->has_eclass_joins = false;
+	rel->has_scan_setlimits = false;
 	rel->consider_partitionwise_join = false;	/* might get changed later */
 	rel->part_scheme = NULL;
 	rel->nparts = 0;
@@ -645,6 +646,7 @@ build_join_rel(PlannerInfo *root,
 	joinrel->baserestrict_min_security = UINT_MAX;
 	joinrel->joininfo = NIL;
 	joinrel->has_eclass_joins = false;
+	joinrel->has_scan_setlimits = false;
 	joinrel->consider_partitionwise_join = false;	/* might get changed later */
 	joinrel->top_parent_relids = NULL;
 	joinrel->part_scheme = NULL;
@@ -820,6 +822,7 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
 	joinrel->baserestrictcost.per_tuple = 0;
 	joinrel->joininfo = NIL;
 	joinrel->has_eclass_joins = false;
+	joinrel->has_scan_setlimits = false;
 	joinrel->consider_partitionwise_join = false;	/* might get changed later */
 	joinrel->top_parent_relids = NULL;
 	joinrel->part_scheme = NULL;
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index c2b0481e7e..fa47119213 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -216,6 +216,15 @@ typedef struct TableAmRoutine
 								bool set_params, bool allow_strat,
 								bool allow_sync, bool allow_pagemode);
 
+	/*
+	 * Set the range of a scan.
+	 *
+	 * Optional callback: A table AM can implement this to enable TID range
+	 * scans.
+	 */
+	void		(*scan_setlimits) (TableScanDesc scan,
+								   BlockNumber startBlk, BlockNumber numBlks);
+
 	/*
 	 * Return next tuple from `scan`, store in slot.
 	 */
@@ -843,6 +852,16 @@ table_rescan(TableScanDesc scan,
 	scan->rs_rd->rd_tableam->scan_rescan(scan, key, false, false, false, false);
 }
 
+/*
+ * Set the range of a scan.
+ */
+static inline void
+table_scan_setlimits(TableScanDesc scan,
+					 BlockNumber startBlk, BlockNumber numBlks)
+{
+	scan->rs_rd->rd_tableam->scan_setlimits(scan, startBlk, numBlks);
+}
+
 /*
  * Restart a relation scan after changing params.
  *
diff --git a/src/include/catalog/pg_operator.dat b/src/include/catalog/pg_operator.dat
index 96823cd59b..5a32361f7a 100644
--- a/src/include/catalog/pg_operator.dat
+++ b/src/include/catalog/pg_operator.dat
@@ -216,15 +216,15 @@
   oprname => '<', oprleft => 'tid', oprright => 'tid', oprresult => 'bool',
   oprcom => '>(tid,tid)', oprnegate => '>=(tid,tid)', oprcode => 'tidlt',
   oprrest => 'scalarltsel', oprjoin => 'scalarltjoinsel' },
-{ oid => '2800', descr => 'greater than',
+{ oid => '2800', oid_symbol => 'TIDGreaterOperator', descr => 'greater than',
   oprname => '>', oprleft => 'tid', oprright => 'tid', oprresult => 'bool',
   oprcom => '<(tid,tid)', oprnegate => '<=(tid,tid)', oprcode => 'tidgt',
   oprrest => 'scalargtsel', oprjoin => 'scalargtjoinsel' },
-{ oid => '2801', descr => 'less than or equal',
+{ oid => '2801', oid_symbol => 'TIDLessEqOperator', descr => 'less than or equal',
   oprname => '<=', oprleft => 'tid', oprright => 'tid', oprresult => 'bool',
   oprcom => '>=(tid,tid)', oprnegate => '>(tid,tid)', oprcode => 'tidle',
   oprrest => 'scalarlesel', oprjoin => 'scalarlejoinsel' },
-{ oid => '2802', descr => 'greater than or equal',
+{ oid => '2802', oid_symbol => 'TIDGreaterEqOperator', descr => 'greater than or equal',
   oprname => '>=', oprleft => 'tid', oprright => 'tid', oprresult => 'bool',
   oprcom => '<=(tid,tid)', oprnegate => '<(tid,tid)', oprcode => 'tidge',
   oprrest => 'scalargesel', oprjoin => 'scalargejoinsel' },
diff --git a/src/include/executor/nodeTidrangescan.h b/src/include/executor/nodeTidrangescan.h
new file mode 100644
index 0000000000..f0bbcc6a04
--- /dev/null
+++ b/src/include/executor/nodeTidrangescan.h
@@ -0,0 +1,24 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeTidrangescan.h
+ *
+ *
+ *
+ * Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/executor/nodeTidrangescan.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef NODETIDRANGESCAN_H
+#define NODETIDRANGESCAN_H
+
+#include "nodes/execnodes.h"
+
+extern TidRangeScanState *ExecInitTidRangeScan(TidRangeScan *node,
+											   EState *estate, int eflags);
+extern void ExecEndTidRangeScan(TidRangeScanState *node);
+extern void ExecReScanTidRangeScan(TidRangeScanState *node);
+
+#endif							/* NODETIDRANGESCAN_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 98bdcbcef5..6a1328481d 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1574,6 +1574,29 @@ typedef struct TidScanState
 	HeapTupleData tss_htup;
 } TidScanState;
 
+/* ----------------
+ *	 TidRangeScanState information
+ *
+ *		trss_tidexprs		list of TidOpExpr structs (see nodeTidrangescan.c)
+ *		trss_startBlock		first block to scan
+ *		trss_endBlock		last block to scan (inclusive)
+ *		trss_startOffset	first offset in first block to scan or InvalidBlockNumber
+ *							when the range is not set
+ *		trss_endOffset		last offset in last block to scan (inclusive)
+ *		trss_inScan			is a scan currently in progress?
+ * ----------------
+ */
+typedef struct TidRangeScanState
+{
+	ScanState	ss;				/* its first field is NodeTag */
+	List	   *trss_tidexprs;
+	BlockNumber trss_startBlock;
+	BlockNumber trss_endBlock;
+	OffsetNumber trss_startOffset;
+	OffsetNumber trss_endOffset;
+	bool		trss_inScan;
+} TidRangeScanState;
+
 /* ----------------
  *	 SubqueryScanState information
  *
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 4e2fb39105..5a1cdce824 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -59,6 +59,7 @@ typedef enum NodeTag
 	T_BitmapIndexScan,
 	T_BitmapHeapScan,
 	T_TidScan,
+	T_TidRangeScan,
 	T_SubqueryScan,
 	T_FunctionScan,
 	T_ValuesScan,
@@ -115,6 +116,7 @@ typedef enum NodeTag
 	T_BitmapIndexScanState,
 	T_BitmapHeapScanState,
 	T_TidScanState,
+	T_TidRangeScanState,
 	T_SubqueryScanState,
 	T_FunctionScanState,
 	T_TableFuncScanState,
@@ -229,6 +231,7 @@ typedef enum NodeTag
 	T_BitmapAndPath,
 	T_BitmapOrPath,
 	T_TidPath,
+	T_TidRangePath,
 	T_SubqueryScanPath,
 	T_ForeignPath,
 	T_CustomPath,
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 441e64eca9..a0b44c6de8 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -703,6 +703,7 @@ typedef struct RelOptInfo
 	List	   *joininfo;		/* RestrictInfo structures for join clauses
 								 * involving this rel */
 	bool		has_eclass_joins;	/* T means joininfo is incomplete */
+	bool		has_scan_setlimits; /* Rel's table AM has scan_setlimits */
 
 	/* used by partitionwise joins: */
 	bool		consider_partitionwise_join;	/* consider partitionwise join
@@ -1285,6 +1286,18 @@ typedef struct TidPath
 	List	   *tidquals;		/* qual(s) involving CTID = something */
 } TidPath;
 
+/*
+ * TidRangePath represents a scan by a contiguous range of TIDs
+ *
+ * tidrangequals is an implicitly AND'ed list of qual expressions of the form
+ * "CTID relop pseudoconstant", where relop is one of >,>=,<,<=.
+ */
+typedef struct TidRangePath
+{
+	Path		path;
+	List	   *tidrangequals;
+} TidRangePath;
+
 /*
  * SubqueryScanPath represents a scan of an unflattened subquery-in-FROM
  *
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 70f8b8e22b..5b97ac5484 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -489,6 +489,19 @@ typedef struct TidScan
 	List	   *tidquals;		/* qual(s) involving CTID = something */
 } TidScan;
 
+/* ----------------
+ *		tid range scan node
+ *
+ * tidrangequals is an implicitly AND'ed list of qual expressions of the form
+ * "CTID relop pseudoconstant", where relop is one of >,>=,<,<=.
+ * ----------------
+ */
+typedef struct TidRangeScan
+{
+	Scan		scan;
+	List	   *tidrangequals;	/* qual(s) involving CTID op something */
+} TidRangeScan;
+
 /* ----------------
  *		subquery scan node
  *
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index b3d0b4f6fb..c352a7c1a7 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -82,6 +82,9 @@ extern void cost_bitmap_or_node(BitmapOrPath *path, PlannerInfo *root);
 extern void cost_bitmap_tree_node(Path *path, Cost *cost, Selectivity *selec);
 extern void cost_tidscan(Path *path, PlannerInfo *root,
 						 RelOptInfo *baserel, List *tidquals, ParamPathInfo *param_info);
+extern void cost_tidrangescan(Path *path, PlannerInfo *root,
+							  RelOptInfo *baserel, List *tidquals,
+							  ParamPathInfo *param_info);
 extern void cost_subqueryscan(SubqueryScanPath *path, PlannerInfo *root,
 							  RelOptInfo *baserel, ParamPathInfo *param_info);
 extern void cost_functionscan(Path *path, PlannerInfo *root,
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 182ffeef4b..dff8fdd126 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -63,6 +63,10 @@ extern BitmapOrPath *create_bitmap_or_path(PlannerInfo *root,
 										   List *bitmapquals);
 extern TidPath *create_tidscan_path(PlannerInfo *root, RelOptInfo *rel,
 									List *tidquals, Relids required_outer);
+extern TidRangePath *create_tidrangescan_path(PlannerInfo *root,
+											  RelOptInfo *rel,
+											  List *tidrangequals,
+											  Relids required_outer);
 extern AppendPath *create_append_path(PlannerInfo *root, RelOptInfo *rel,
 									  List *subpaths, List *partial_subpaths,
 									  List *pathkeys, Relids required_outer,
diff --git a/src/test/regress/expected/tidrangescan.out b/src/test/regress/expected/tidrangescan.out
new file mode 100644
index 0000000000..fc11894c8e
--- /dev/null
+++ b/src/test/regress/expected/tidrangescan.out
@@ -0,0 +1,245 @@
+-- tests for tidrangescans
+SET enable_seqscan TO off;
+CREATE TABLE tidrangescan(id integer, data text);
+-- insert enough tuples to fill at least two pages
+INSERT INTO tidrangescan SELECT i,repeat('x', 100) FROM generate_series(1,200) AS s(i);
+-- remove all tuples after the 10th tuple on each page.  Trying to ensure
+-- we get the same layout with all CPU architectures and smaller than standard
+-- page sizes.
+DELETE FROM tidrangescan
+WHERE substring(ctid::text from ',(\d+)\)')::integer > 10 OR substring(ctid::text from '\((\d+),')::integer > 2;
+VACUUM tidrangescan;
+-- range scans with upper bound
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid < '(1,0)';
+            QUERY PLAN             
+-----------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: (ctid < '(1,0)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid < '(1,0)';
+  ctid  
+--------
+ (0,1)
+ (0,2)
+ (0,3)
+ (0,4)
+ (0,5)
+ (0,6)
+ (0,7)
+ (0,8)
+ (0,9)
+ (0,10)
+(10 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid <= '(1,5)';
+             QUERY PLAN             
+------------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: (ctid <= '(1,5)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid <= '(1,5)';
+  ctid  
+--------
+ (0,1)
+ (0,2)
+ (0,3)
+ (0,4)
+ (0,5)
+ (0,6)
+ (0,7)
+ (0,8)
+ (0,9)
+ (0,10)
+ (1,1)
+ (1,2)
+ (1,3)
+ (1,4)
+ (1,5)
+(15 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid < '(0,0)';
+            QUERY PLAN             
+-----------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: (ctid < '(0,0)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid < '(0,0)';
+ ctid 
+------
+(0 rows)
+
+-- range scans with lower bound
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid > '(2,8)';
+            QUERY PLAN             
+-----------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: (ctid > '(2,8)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid > '(2,8)';
+  ctid  
+--------
+ (2,9)
+ (2,10)
+(2 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE '(2,8)' < ctid;
+            QUERY PLAN             
+-----------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: ('(2,8)'::tid < ctid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE '(2,8)' < ctid;
+  ctid  
+--------
+ (2,9)
+ (2,10)
+(2 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid >= '(2,8)';
+             QUERY PLAN             
+------------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: (ctid >= '(2,8)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid >= '(2,8)';
+  ctid  
+--------
+ (2,8)
+ (2,9)
+ (2,10)
+(3 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid >= '(100,0)';
+              QUERY PLAN              
+--------------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: (ctid >= '(100,0)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid >= '(100,0)';
+ ctid 
+------
+(0 rows)
+
+-- range scans with both bounds
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid > '(1,4)' AND '(1,7)' >= ctid;
+                           QUERY PLAN                           
+----------------------------------------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: ((ctid > '(1,4)'::tid) AND ('(1,7)'::tid >= ctid))
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid > '(1,4)' AND '(1,7)' >= ctid;
+ ctid  
+-------
+ (1,5)
+ (1,6)
+ (1,7)
+(3 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE '(1,7)' >= ctid AND ctid > '(1,4)';
+                           QUERY PLAN                           
+----------------------------------------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: (('(1,7)'::tid >= ctid) AND (ctid > '(1,4)'::tid))
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE '(1,7)' >= ctid AND ctid > '(1,4)';
+ ctid  
+-------
+ (1,5)
+ (1,6)
+ (1,7)
+(3 rows)
+
+-- extreme offsets
+SELECT ctid FROM tidrangescan where ctid > '(0,65535)' AND ctid < '(1,0)' LIMIT 1;
+ ctid 
+------
+(0 rows)
+
+SELECT ctid FROM tidrangescan where ctid < '(0,0)' LIMIT 1;
+ ctid 
+------
+(0 rows)
+
+-- empty table
+CREATE TABLE tidrangescan_empty(id integer, data text);
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan_empty WHERE ctid < '(1, 0)';
+              QUERY PLAN              
+--------------------------------------
+ Tid Range Scan on tidrangescan_empty
+   TID Cond: (ctid < '(1,0)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan_empty WHERE ctid < '(1, 0)';
+ ctid 
+------
+(0 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan_empty WHERE ctid > '(9, 0)';
+              QUERY PLAN              
+--------------------------------------
+ Tid Range Scan on tidrangescan_empty
+   TID Cond: (ctid > '(9,0)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan_empty WHERE ctid > '(9, 0)';
+ ctid 
+------
+(0 rows)
+
+-- cursors
+BEGIN;
+DECLARE c SCROLL CURSOR FOR SELECT ctid FROM tidrangescan WHERE ctid < '(1,0)';
+FETCH NEXT c;
+ ctid  
+-------
+ (0,1)
+(1 row)
+
+FETCH NEXT c;
+ ctid  
+-------
+ (0,2)
+(1 row)
+
+FETCH PRIOR c;
+ ctid  
+-------
+ (0,1)
+(1 row)
+
+FETCH FIRST c;
+ ctid  
+-------
+ (0,1)
+(1 row)
+
+FETCH LAST c;
+  ctid  
+--------
+ (0,10)
+(1 row)
+
+COMMIT;
+DROP TABLE tidrangescan;
+DROP TABLE tidrangescan_empty;
+RESET enable_seqscan;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 8fb55f045e..86b42ce21d 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -78,7 +78,7 @@ test: brin gin gist spgist privileges init_privs security_label collate matview
 # ----------
 # Another group of parallel tests
 # ----------
-test: create_table_like alter_generic alter_operator misc async dbsize misc_functions sysviews tsrf tidscan
+test: create_table_like alter_generic alter_operator misc async dbsize misc_functions sysviews tsrf tidscan tidrangescan
 
 # rules cannot run concurrently with any test that creates
 # a view or rule in the public schema
diff --git a/src/test/regress/sql/tidrangescan.sql b/src/test/regress/sql/tidrangescan.sql
new file mode 100644
index 0000000000..d60439d56c
--- /dev/null
+++ b/src/test/regress/sql/tidrangescan.sql
@@ -0,0 +1,83 @@
+-- tests for tidrangescans
+
+SET enable_seqscan TO off;
+CREATE TABLE tidrangescan(id integer, data text);
+
+-- insert enough tuples to fill at least two pages
+INSERT INTO tidrangescan SELECT i,repeat('x', 100) FROM generate_series(1,200) AS s(i);
+
+-- remove all tuples after the 10th tuple on each page.  Trying to ensure
+-- we get the same layout with all CPU architectures and smaller than standard
+-- page sizes.
+DELETE FROM tidrangescan
+WHERE substring(ctid::text from ',(\d+)\)')::integer > 10 OR substring(ctid::text from '\((\d+),')::integer > 2;
+VACUUM tidrangescan;
+
+-- range scans with upper bound
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid < '(1,0)';
+SELECT ctid FROM tidrangescan WHERE ctid < '(1,0)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid <= '(1,5)';
+SELECT ctid FROM tidrangescan WHERE ctid <= '(1,5)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid < '(0,0)';
+SELECT ctid FROM tidrangescan WHERE ctid < '(0,0)';
+
+-- range scans with lower bound
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid > '(2,8)';
+SELECT ctid FROM tidrangescan WHERE ctid > '(2,8)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE '(2,8)' < ctid;
+SELECT ctid FROM tidrangescan WHERE '(2,8)' < ctid;
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid >= '(2,8)';
+SELECT ctid FROM tidrangescan WHERE ctid >= '(2,8)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid >= '(100,0)';
+SELECT ctid FROM tidrangescan WHERE ctid >= '(100,0)';
+
+-- range scans with both bounds
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid > '(1,4)' AND '(1,7)' >= ctid;
+SELECT ctid FROM tidrangescan WHERE ctid > '(1,4)' AND '(1,7)' >= ctid;
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE '(1,7)' >= ctid AND ctid > '(1,4)';
+SELECT ctid FROM tidrangescan WHERE '(1,7)' >= ctid AND ctid > '(1,4)';
+
+-- extreme offsets
+SELECT ctid FROM tidrangescan where ctid > '(0,65535)' AND ctid < '(1,0)' LIMIT 1;
+SELECT ctid FROM tidrangescan where ctid < '(0,0)' LIMIT 1;
+
+-- empty table
+CREATE TABLE tidrangescan_empty(id integer, data text);
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan_empty WHERE ctid < '(1, 0)';
+SELECT ctid FROM tidrangescan_empty WHERE ctid < '(1, 0)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan_empty WHERE ctid > '(9, 0)';
+SELECT ctid FROM tidrangescan_empty WHERE ctid > '(9, 0)';
+
+-- cursors
+BEGIN;
+DECLARE c SCROLL CURSOR FOR SELECT ctid FROM tidrangescan WHERE ctid < '(1,0)';
+FETCH NEXT c;
+FETCH NEXT c;
+FETCH PRIOR c;
+FETCH FIRST c;
+FETCH LAST c;
+COMMIT;
+
+DROP TABLE tidrangescan;
+DROP TABLE tidrangescan_empty;
+
+RESET enable_seqscan;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 432d2d812e..85796462ec 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2426,8 +2426,13 @@ TextPositionState
 TheLexeme
 TheSubstitute
 TidExpr
+TidExprType
 TidHashKey
+TidOpExpr
 TidPath
+TidRangePath
+TidRangeScan
+TidRangeScanState
 TidScan
 TidScanState
 TimeADT

#65

Edmund Horner

ejrh00@gmail.com

over 6 years ago

In reply to: David Rowley (#64)

Re: Tid scan improvements

Thanks for the edits and fixing that pretty glaring copy-paste bug.

Regarding enable_tidscan, I couldn't decide whether we really need it,
and erred on the side of not adding yet another setting.

The current patch only creates a tid range path if there's at least
one ctid qual. But during development of earlier patches I was a bit
concerned about the possibility of tid range scan being picked instead
of seq scan when the whole table is scanned, perhaps due to a tiny
discrepancy in costing. Both scans will scan the whole table, but seq
scan is preferred since it can be parallelised, synchronised with
other scans, and has a bit less overhead with tuple checking. If a
future change creates tid range paths for more queries, for instance
for MIN/MAX(ctid) or ORDER BY ctid, then it might be more important to
have a separate setting for it.

On Wed, 17 Jul 2019 at 23:11, David Rowley <david.rowley@2ndquadrant.com> wrote:

On Mon, 15 Jul 2019 at 17:54, Edmund Horner <ejrh00@gmail.com> wrote:

Summary of changes compared to last time:
- I've added the additional "scan_setlimits" table AM method. To
check whether it's implemented in the planner, I have added an
additional "has_scan_setlimits" flag to RelOptInfo. It seems to work.
- I've also changed nodeTidrangescan to not require anything from heapam now.
- To simplify the patch and avoid changing heapam, I've removed the
backward scan support (which was needed for FETCH LAST/PRIOR) and made
ExecSupportsBackwardScan return false for this plan type.
- I've removed the vestigial passing of "direction" through
nodeTidrangescan. If my understanding is correct, the direction
passed to TidRangeNext will always be forward. But I did leave FETCH
LAST/PRIOR in the regression tests (after adding SCROLL to the
cursor).

I spent some time today hacking at this. I fixed a bug in how
has_scan_setlimits is set, rewrote a few comments and simplified some of
the code.

When I mentioned up-thread about the optional scan_setlimits table AM
callback, I'd forgotten that you'd not have access to check that
directly during planning. As you mention above, you've added
RelOptInfo has_scan_setlimits so the planner knows if it can use TID
Range scans or not. It would be nice to not have to add this flag, but
that would require either:

1. Making scan_setlimits a non-optional callback function in table AM, or;
2. Allowing the planner to have access to the opened Relation.

#2 is not for this patch, but there has been some talk about it. It
was done for the executor last year in d73f4c74dd3.

I wonder if Andres has any thoughts on #1?

The other thing I was thinking about was if enable_tidscan should be
in charge of TID Range scans too. I see you have it that way, but
should we be adding enable_tidrangescan? The docs claim that
enable_tidscan: "Enables or disables the query planner's use of TID
scan plan types.". Note: "types" is plural. Maybe we could call that
fate and keep it the way the patch has it already. Does anyone have
another idea about that?

I've attached a delta of the changes I made and also a complete v9 patch.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#66

Andres Freund

andres@anarazel.de

over 6 years ago

In reply to: David Rowley (#64)

Re: Tid scan improvements

Hi,

On 2019-07-17 23:10:52 +1200, David Rowley wrote:

When I mentioned up-thread about the optional scan_setlimits table AM
callback, I'd forgotten that you'd not have access to check that
directly during planning. As you mention above, you've added
RelOptInfo has_scan_setlimits so the planner knows if it can use TID
Range scans or not. It would be nice to not have to add this flag, but
that would require either:

Is it really a problem to add that flag? We've obviously so far not
cared about space in RelOptInfo, otherwise it'd have union members for
the per reloptinfo contents...

1. Making scan_setlimits a non-optional callback function in table AM, or;
2. Allowing the planner to have access to the opened Relation.

#2 is not for this patch, but there has been some talk about it. It
was done for the executor last year in d73f4c74dd3.

I wonder if Andres has any thoughts on #1?

I'm inclined to think that 1) isn't a good idea. I'd very much like to
avoid adding further dependencies on BlockNumber in non-optional APIs
(we ought to go the other way). Most of the current ones are at least
semi-reasonably implementable for most AMs (e.g. converting to PG
pagesize for relation_estimate_size isn't a problem), but it doesn't
seem to make sense to implement this for scan limits: Many AMs won't use
the BlockNumber/Offset split as heap does.

I think the AM part of this patch might be the wrong approach - it won't
do anything meaningful for an AM that doesn't directly map the ctid to a
specific location in a block (e.g. zedstore). To me it seems the
callback ought to be to get a range of tids, and the tidrange scan
shouldn't do anything but determine the range of tids the AM should
return.

- Andres

#67

David Rowley

david.rowley@2ndquadrant.com

over 6 years ago

In reply to: Andres Freund (#66)

Re: Tid scan improvements

On Thu, 18 Jul 2019 at 14:30, Andres Freund <andres@anarazel.de> wrote:

I think the AM part of this patch might be the wrong approach - it won't
do anything meaningful for an AM that doesn't directly map the ctid to a
specific location in a block (e.g. zedstore). To me it seems the
callback ought to be to get a range of tids, and the tidrange scan
shouldn't do anything but determine the range of tids the AM should
return.

Sounds like that's going to require adding some new fields to
HeapScanDescData, then some callback similar to heap_setscanlimits to
set those fields.

Then, we'd either need to:

1. Make the table_scan_getnextslot() implementations check the tuple
falls within the range, or
2. add another callback that pays attention to the set TID range.

The problem with #1 is that it would add overhead to normal seqscans,
which seems like a bad idea.

Did you imagine two additional callbacks, one to set the TID range,
then one to scan it? Duplicating the logic in heapgettup_pagemode()
and heapgettup() looks pretty horrible, but I guess we could add a
wrapper around it that loops until it gets the first tuple and bails
once it scans beyond the final tuple.

Is that what you had in mind?
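
The wrapper idea can be sketched roughly as follows, assuming the underlying scan returns tuples in ascending TID order. DemoScan, demo_getnext and demo_getnext_in_range are hypothetical names; demo_getnext merely stands in for an unrestricted "get next tuple" routine and is not the heapam function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a heap TID. */
typedef struct DemoTid
{
	uint32_t	block;
	uint16_t	offset;
} DemoTid;

/* A toy scan over an array of TIDs kept in ascending order. */
typedef struct DemoScan
{
	const DemoTid *tuples;
	size_t		ntuples;
	size_t		next;			/* index of the next tuple to return */
	DemoTid		lo;				/* inclusive lower bound */
	DemoTid		hi;				/* inclusive upper bound */
} DemoScan;

static int
demo_tid_cmp(DemoTid a, DemoTid b)
{
	if (a.block != b.block)
		return (a.block < b.block) ? -1 : 1;
	if (a.offset != b.offset)
		return (a.offset < b.offset) ? -1 : 1;
	return 0;
}

/* Unrestricted scan step: return the next tuple, if any. */
static bool
demo_getnext(DemoScan *scan, DemoTid *out)
{
	if (scan->next >= scan->ntuples)
		return false;
	*out = scan->tuples[scan->next++];
	return true;
}

/*
 * Range-aware wrapper: skip tuples before the range, return tuples
 * inside it, and end the scan at the first tuple beyond it.
 */
static bool
demo_getnext_in_range(DemoScan *scan, DemoTid *out)
{
	assert(scan != NULL && out != NULL);

	while (demo_getnext(scan, out))
	{
		if (demo_tid_cmp(*out, scan->lo) < 0)
			continue;			/* not yet at the first tuple */

		if (demo_tid_cmp(*out, scan->hi) > 0)
		{
			scan->next = scan->ntuples; /* past the end: stop for good */
			return false;
		}

		return true;
	}

	return false;
}
```

Because the range check lives only in the wrapper, the unrestricted path used by ordinary sequential scans pays nothing for it.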

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#68

Andres Freund

andres@anarazel.de

over 6 years ago

In reply to: David Rowley (#67)

Re: Tid scan improvements

Hi,

On 2019-07-19 13:54:59 +1200, David Rowley wrote:

On Thu, 18 Jul 2019 at 14:30, Andres Freund <andres@anarazel.de> wrote:

I think the AM part of this patch might be the wrong approach - it won't
do anything meaningful for an AM that doesn't directly map the ctid to a
specific location in a block (e.g. zedstore). To me it seems the
callback ought to be to get a range of tids, and the tidrange scan
shouldn't do anything but determine the range of tids the AM should
return.

Sounds like that's going to require adding some new fields to
HeapScanDescData, then some callback similar to heap_setscanlimits to
set those fields.

Then, we'd either need to:

1. Make the table_scan_getnextslot() implementations check the tuple
falls within the range, or
2. add another callback that pays attention to the set TID range.

The problem with #1 is that it would add overhead to normal seqscans,
which seems like a bad idea.

Did you imagine two additional callbacks, one to set the TID range,
then one to scan it? Duplicating the logic in heapgettup_pagemode()
and heapgettup() looks pretty horrible, but I guess we could add a
wrapper around it that loops until it gets the first tuple and bails
once it scans beyond the final tuple.

Is that what you had in mind?

Yea, I was thinking of something like 2. We already have a few extra
types of scan nodes (bitmap heap and sample scan), it'd not be bad to
add another. And as you say, they can just share most of the guts: For
heap I'd just implement pagemode, and perhaps split heapgettup_pagemode
into two parts (one to do the page processing, the other to determine
the relevant page).

You say that we'd need new fields in HeapScanDescData - not so sure
about that, it seems feasible to just provide the boundaries in the
call? But I think it'd also just be fine to have the additional fields.

Greetings,

Andres Freund

#69

David Rowley

david.rowley@2ndquadrant.com

over 6 years ago

In reply to: Andres Freund (#68)

Re: Tid scan improvements

On Sat, 20 Jul 2019 at 06:21, Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2019-07-19 13:54:59 +1200, David Rowley wrote:

Did you imagine two additional callbacks, one to set the TID range,
then one to scan it? Duplicating the logic in heapgettup_pagemode()
and heapgettup() looks pretty horrible, but I guess we could add a
wrapper around it that loops until it gets the first tuple and bails
once it scans beyond the final tuple.

Is that what you had in mind?

Yea, I was thinking of something like 2. We already have a few extra
types of scan nodes (bitmap heap and sample scan), it'd not be bad to
add another. And as you say, they can just share most of the guts: For
heap I'd just implement pagemode, and perhaps split heapgettup_pagemode
into two parts (one to do the page processing, the other to determine
the relevant page).

You say that we'd need new fields in HeapScanDescData - not so sure
about that, it seems feasible to just provide the boundaries in the
call? But I think it'd also just be fine to have the additional fields.

Thanks for explaining.

I've set the CF entry for the patch back to waiting on author.

I think if we get this part the way Andres would like it, then we're
pretty close.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#70

Edmund Horner

ejrh00@gmail.com

over 6 years ago

In reply to: David Rowley (#69)

Re: Tid scan improvements

On Mon, 22 Jul 2019 at 19:25, David Rowley <david.rowley@2ndquadrant.com> wrote:

On Sat, 20 Jul 2019 at 06:21, Andres Freund <andres@anarazel.de> wrote:

Yea, I was thinking of something like 2. We already have a few extra
types of scan nodes (bitmap heap and sample scan), it'd not be bad to
add another. And as you say, they can just share most of the guts: For
heap I'd just implement pagemode, and perhaps split heapgettup_pagemode
into two parts (one to do the page processing, the other to determine
the relevant page).

You say that we'd need new fields in HeapScanDescData - not so sure
about that, it seems feasible to just provide the boundaries in the
call? But I think it'd also just be fine to have the additional fields.

Thanks for explaining.

I've set the CF entry for the patch back to waiting on author.

I think if we get this part the way Andres would like it, then we're
pretty close.

I've moved the code in question into heapam, with:

* a new scan type SO_TYPE_TIDRANGE (renumbering some of the other
enums).

* a wrapper table_beginscan_tidrange(Relation rel, Snapshot snapshot);
I'm not sure whether we need scankeys and the other flags?

* a new optional callback scan_settidrange(TableScanDesc scan,
ItemPointer startTid, ItemPointer endTid) with wrapper
table_scan_settidrange.
I thought about combining it with table_beginscan_tidrange but we're not
really doing that with any of the other beginscan methods.

* another optional callback scan_getnexttidrangeslot. The presence of
these two callbacks indicates that TidRangeScan is supported for
this relation.

Let me know if you can think of better names.
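
For illustration, the "presence of both callbacks means supported" rule could look roughly like this. DemoTableAm is a hypothetical two-member stand-in for the real TableAmRoutine, the demo_* functions are made up for this sketch, and the callback signatures are only approximations of the ones listed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; these are not PostgreSQL's types. */
typedef struct DemoTid
{
	uint32_t	block;
	uint16_t	offset;
} DemoTid;

typedef struct DemoScanDesc
{
	DemoTid		start;			/* inclusive start of the TID range */
	DemoTid		end;			/* inclusive end of the TID range */
} DemoScanDesc;

/* Only the two optional callbacks discussed here are modelled. */
typedef struct DemoTableAm
{
	void		(*scan_settidrange) (DemoScanDesc *scan,
									 const DemoTid *start_tid,
									 const DemoTid *end_tid);
	bool		(*scan_getnexttidrangeslot) (DemoScanDesc *scan,
											 DemoTid *out);
} DemoTableAm;

/* Record the requested range in the scan descriptor. */
static void
demo_heap_settidrange(DemoScanDesc *scan, const DemoTid *start_tid,
					  const DemoTid *end_tid)
{
	assert(scan != NULL && start_tid != NULL && end_tid != NULL);
	scan->start = *start_tid;
	scan->end = *end_tid;
}

/* Placeholder: a real AM would return the next tuple in the range. */
static bool
demo_heap_getnexttidrangeslot(DemoScanDesc *scan, DemoTid *out)
{
	(void) scan;
	(void) out;
	return false;
}

/* An AM that implements both callbacks, and one that implements neither. */
static const DemoTableAm demo_heap_am = {
	demo_heap_settidrange,
	demo_heap_getnexttidrangeslot
};
static const DemoTableAm demo_other_am = {NULL, NULL};

/* TID range scans are usable only when both callbacks are present. */
static bool
demo_supports_tidrangescan(const DemoTableAm *am)
{
	return am != NULL &&
		am->scan_settidrange != NULL &&
		am->scan_getnexttidrangeslot != NULL;
}
```

A check of this kind would be what the planner consults before building a TID range path for a relation.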

However, the heap_getnexttidrangeslot function is just the same
iterative code calling heap_getnextslot and checking the tuples
against the tid range (two fields added to the ScanDesc).
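
As a rough picture of that iterative approach, here is a toy model in plain C. It does not use the real heapam types: ToyTid, ToyScan, toy_getnext and toy_getnext_tidrange are hypothetical stand-ins for ItemPointerData, the scan descriptor, heap_getnextslot and heap_getnexttidrangeslot. Stopping as soon as a tuple is past the range is an assumption of this sketch, and it relies on tuples arriving in ascending TID order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for ItemPointerData: a (block, offset) pair. */
typedef struct ToyTid
{
	uint32_t	block;
	uint16_t	offset;
} ToyTid;

/* Toy scan descriptor, including the two added range fields. */
typedef struct ToyScan
{
	const ToyTid *tuples;		/* all tuples, in ascending TID order */
	size_t		ntuples;
	size_t		pos;			/* index of the next tuple to return */
	ToyTid		start_tid;		/* inclusive lower bound */
	ToyTid		end_tid;		/* inclusive upper bound */
} ToyScan;

/* Compare two TIDs: block number first, then offset. */
static int
toy_tid_compare(const ToyTid *a, const ToyTid *b)
{
	if (a->block != b->block)
		return (a->block < b->block) ? -1 : 1;
	if (a->offset != b->offset)
		return (a->offset < b->offset) ? -1 : 1;
	return 0;
}

/* Plain sequential fetch; plays the role of heap_getnextslot. */
static bool
toy_getnext(ToyScan *scan, ToyTid *out)
{
	if (scan->pos >= scan->ntuples)
		return false;
	*out = scan->tuples[scan->pos++];
	return true;
}

/* Naive range fetch: read every tuple and check it against the range. */
static bool
toy_getnext_tidrange(ToyScan *scan, ToyTid *out)
{
	assert(scan != NULL && out != NULL);

	while (toy_getnext(scan, out))
	{
		if (toy_tid_compare(out, &scan->start_tid) < 0)
			continue;			/* before the range: keep reading */
		if (toy_tid_compare(out, &scan->end_tid) > 0)
			return false;		/* past the range: nothing more to return */
		return true;
	}
	return false;
}
```

The cost of this shape is that every tuple before the lower bound is still read and discarded, which is why working out the relevant pages up front is the better design.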

I'll have to spend a bit of time looking at heapgettup_pagemode to
figure out how to split it as Andres suggests.

Thanks,
Edmund

#71

Edmund Horner

ejrh00@gmail.com

over 6 years ago

In reply to: Edmund Horner (#70)

Re: Tid scan improvements

On Mon, 22 Jul 2019 at 19:44, Edmund Horner <ejrh00@gmail.com> wrote:

On Mon, 22 Jul 2019 at 19:25, David Rowley <david.rowley@2ndquadrant.com> wrote:

On Sat, 20 Jul 2019 at 06:21, Andres Freund <andres@anarazel.de> wrote:

Yea, I was thinking of something like 2. We already have a few extra
types of scan nodes (bitmap heap and sample scan), it'd not be bad to
add another. And as you say, they can just share most of the guts: For
heap I'd just implement pagemode, and perhaps split heapgettup_pagemode
into two parts (one to do the page processing, the other to determine
the relevant page).

You say that we'd need new fields in HeapScanDescData - not so sure
about that, it seems feasible to just provide the boundaries in the
call? But I think it'd also just be fine to have the additional fields.

Thanks for explaining.

I've set the CF entry for the patch back to waiting on author.

I think if we get this part the way Andres would like it, then we're
pretty close.

[...]

I'll have to spend a bit of time looking at heapgettup_pagemode to
figure out how to split it as Andres suggests.

Hi everyone,

Sadly it is the end of the CF and I have not had much time to work on
this. I'll probably get some time in the next month to look at the
heapam changes.

Should we move it to the next CF? We have been in the CF cycle for almost a year now...

Edmund

#72

Thomas Munro

thomas.munro@gmail.com

over 6 years ago

In reply to: Edmund Horner (#71)

Re: Tid scan improvements

On Thu, Aug 1, 2019 at 5:34 PM Edmund Horner <ejrh00@gmail.com> wrote:

Sadly it is the end of the CF and I have not had much time to work on
this. I'll probably get some time in the next month to look at the
heapam changes.

Should we move it to the next CF? We have been in the CF cycle for almost a year now...

Hi Edmund,

No worries, that's how it goes sometimes. Please move it to the next
CF if you think you'll find some time before September. Don't worry
if it might have to be moved again. We want the feature, and good
things take time!

--
Thomas Munro
https://enterprisedb.com

#73

Edmund Horner

ejrh00@gmail.com

over 6 years ago

In reply to: Thomas Munro (#72)

Re: Tid scan improvements

On Thu, 1 Aug 2019 at 17:47, Thomas Munro <thomas.munro@gmail.com> wrote:

On Thu, Aug 1, 2019 at 5:34 PM Edmund Horner <ejrh00@gmail.com> wrote:

Should we move it to the next CF? We have been in the CF cycle for almost a year now...

Hi Edmund,

No worries, that's how it goes sometimes. Please move it to the next
CF if you think you'll find some time before September. Don't worry
if it might have to be moved again. We want the feature, and good
things take time!

Ok thanks.

I tried moving it to the new commitfest, but it seems I cannot from
its current state.

If it's ok, I'll just leave it to you in 7 hours' time!

Edmund

#74

Thomas Munro

thomas.munro@gmail.com

over 6 years ago

In reply to: Edmund Horner (#73)

Re: Tid scan improvements

On Thu, Aug 1, 2019 at 5:51 PM Edmund Horner <ejrh00@gmail.com> wrote:

On Thu, 1 Aug 2019 at 17:47, Thomas Munro <thomas.munro@gmail.com> wrote:

On Thu, Aug 1, 2019 at 5:34 PM Edmund Horner <ejrh00@gmail.com> wrote:

Should we move it to the next CF? We have been in the CF cycle for almost a year now...

Hi Edmund,

No worries, that's how it goes sometimes. Please move it to the next
CF if you think you'll find some time before September. Don't worry
if it might have to be moved again. We want the feature, and good
things take time!

Ok thanks.

I tried moving it to the new commitfest, but it seems I cannot from
its current state.

Done. You have to move it to "Needs review" first, and then move to
next. (Perhaps we should change that... I don't think that obstacle
achieves anything?)

--
Thomas Munro
https://enterprisedb.com

#75

David Rowley

david.rowley@2ndquadrant.com

over 6 years ago

In reply to: Thomas Munro (#74)

Re: Tid scan improvements

On Thu, 1 Aug 2019 at 17:57, Thomas Munro <thomas.munro@gmail.com> wrote:

On Thu, Aug 1, 2019 at 5:51 PM Edmund Horner <ejrh00@gmail.com> wrote:

I tried moving it to the new commitfest, but it seems I cannot from
its current state.

Done. You have to move it to "Needs review" first, and then move to
next. (Perhaps we should change that... I don't think that obstacle
achieves anything?)

I think it's there as a measure to try to trim down the number of
patches that are constantly bounced to the next 'fest that are still
waiting on author. In my experience, it's a little annoying since if
you don't set it to "needs review" first, then it means closing the
patch and having to create a new CF entry when you're ready, with all
the history of the previous one lost.

It seems reasonable to me to keep the patch in the queue if the author
is still actively working on the patch and it seems pretty unfair if a
last-minute review came in just before the end of the CF and that
means their patch must be "returned with feedback" instead of pushed
to the next 'fest.

Perhaps there are other measures we could take to reduce the number of
patches getting kicked out to the next CF all the time. Maybe some
icons that appear if it's been waiting on author for more than 2
months, or if it went through an entire CF as waiting on author.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#76

Alvaro Herrera

alvherre@2ndquadrant.com

over 6 years ago

In reply to: Edmund Horner (#71)

Re: Tid scan improvements

On 2019-Aug-01, Edmund Horner wrote:

Hi everyone,

Sadly it is the end of the CF and I have not had much time to work on
this. I'll probably get some time in the next month to look at the
heapam changes.

Should we move it to the next CF? We have been in the CF cycle for almost a year now...

Hello, do we have an update on this patch? Last version that was posted
was v9 from David on July 17th; you said you had made some changes on
July 22nd but didn't attach any patch. v9 doesn't apply anymore.

Thanks

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#77

Edmund Horner

ejrh00@gmail.com

over 6 years ago

In reply to: Alvaro Herrera (#76)

Re: Tid scan improvements

On Wed, 4 Sep 2019 at 10:34, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

On 2019-Aug-01, Edmund Horner wrote:

Hi everyone,

Sadly it is the end of the CF and I have not had much time to work on
this. I'll probably get some time in the next month to look at the
heapam changes.

Should we move it to the next CF? We have been in the CF cycle for almost a year now...

Hello, do we have an update on this patch? Last version that was posted
was v9 from David on July 17th; you said you had made some changes on
July 22nd but didn't attach any patch. v9 doesn't apply anymore.

Hi pgsql-hackers,

I have had a lot of difficulty making the changes to heapam.c and I
think it's becoming obvious I won't get them done by myself.

The last *working* patch pushed the work into heapam.c, but it was
just a naive loop over the whole table. I haven't found how to
rewrite heapgettup_pagemode in the way Andres suggests.

So, I think we need to either get some help from someone familiar with
heapam.c, or maybe shelve the patch. It has been work in progress for
over a year now.

Edmund

#78

Michael Paquier

michael@paquier.xyz

about 6 years ago

In reply to: Edmund Horner (#77)

Re: Tid scan improvements

On Thu, Sep 05, 2019 at 01:06:56PM +1200, Edmund Horner wrote:

So, I think we need to either get some help from someone familiar with
heapam.c, or maybe shelve the patch. It has been work in progress for
over a year now.

Okay, still nothing has happened after two months. Once this is
solved a new patch submission could be done. For now I have marked
the entry as returned with feedback.
--
Michael

#79

David Fetter

david@fetter.org

about 5 years ago

In reply to: Michael Paquier (#78)

1 attachment(s)

Re: Tid scan improvements

On Sun, Dec 01, 2019 at 11:34:16AM +0900, Michael Paquier wrote:

On Thu, Sep 05, 2019 at 01:06:56PM +1200, Edmund Horner wrote:

So, I think we need to either get some help from someone familiar with
heapam.c, or maybe shelve the patch. It has been work in progress for
over a year now.

Okay, still nothing has happened after two months. Once this is
solved a new patch submission could be done. For now I have marked
the entry as returned with feedback.

I dusted off the last version of the patch without implementing the
suggestions in this thread between here and there.

I think this is a capability worth having, as I was surprised when it
turned out that it didn't exist when I was looking to make an
improvement to pg_dump. My idea, which I'll get back to when this is
in, was to use special knowledge of heap AM tables to make it possible
to parallelize dumps of large tables by working separately on each
underlying file, of which there could be quite a few for a large one.
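
To make the per-file idea concrete, here is a small sketch of how the ctid ranges might be computed. It is only an illustration under stated assumptions: heap tables stored in 1 GB segment files of 8 kB blocks (131072 blocks per file, the default build settings), and hypothetical helper names toy_segment_count and toy_segment_predicate that do not exist in pg_dump. Clamping the last range to the block count measured up front is a simplification; blocks added later would be missed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed default: 1 GB segment files made of 8 kB blocks. */
#define TOY_SEG_BLOCKS 131072u

/* Number of segment files needed to hold nblocks blocks. */
static uint32_t
toy_segment_count(uint32_t nblocks)
{
	return (uint32_t) (((uint64_t) nblocks + TOY_SEG_BLOCKS - 1) /
					   TOY_SEG_BLOCKS);
}

/*
 * Write into buf a WHERE-clause fragment selecting the blocks that live in
 * segment file 'segno'.  Returns what snprintf returns.
 */
static int
toy_segment_predicate(uint32_t segno, uint32_t nblocks,
					  char *buf, size_t buflen)
{
	uint64_t	start = (uint64_t) segno * TOY_SEG_BLOCKS;
	uint64_t	end = start + TOY_SEG_BLOCKS;

	assert(buf != NULL && buflen > 0);

	/* Do not run past the end of the table as measured by the caller. */
	if (end > nblocks)
		end = nblocks;

	return snprintf(buf, buflen,
					"ctid >= '(%llu,0)' AND ctid < '(%llu,0)'",
					(unsigned long long) start,
					(unsigned long long) end);
}
```

Each worker could then run one query per segment with the generated predicate, which is exactly the kind of qual a TID range scan is meant to serve.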

Will try to understand the suggestions upthread better and implement
same.

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Attachments:

v10-0001-first-cut.patch (text/x-diff; charset=us-ascii)

From 96046239014de8a7dec62e2f60b5210deb1bd32a Mon Sep 17 00:00:00 2001
From: David Fetter <dfetter@appen.com>
Date: Thu, 31 Dec 2020 16:42:07 -0800
Subject: [PATCH v10] first cut
To: hackers
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="------------2.29.2"

This is a multi-part message in MIME format.
--------------2.29.2
Content-Type: text/plain; charset=UTF-8; format=fixed
Content-Transfer-Encoding: 8bit


 create mode 100644 src/include/executor/nodeTidrangescan.h
 create mode 100644 src/backend/executor/nodeTidrangescan.c
 create mode 100644 src/test/regress/expected/tidrangescan.out
 create mode 100644 src/test/regress/sql/tidrangescan.sql


--------------2.29.2
Content-Type: text/x-patch; name="v10-0001-first-cut.patch"
Content-Transfer-Encoding: 8bit
Content-Disposition: attachment; filename="v10-0001-first-cut.patch"

diff --git src/include/access/tableam.h src/include/access/tableam.h
index 387eb34a61..5776f8ba6e 100644
--- src/include/access/tableam.h
+++ src/include/access/tableam.h
@@ -218,6 +218,15 @@ typedef struct TableAmRoutine
 								bool set_params, bool allow_strat,
 								bool allow_sync, bool allow_pagemode);
 
+	/*
+	 * Set the range of a scan.
+	 *
+	 * Optional callback: A table AM can implement this to enable TID range
+	 * scans.
+	 */
+	void		(*scan_setlimits) (TableScanDesc scan,
+								   BlockNumber startBlk, BlockNumber numBlks);
+
 	/*
 	 * Return next tuple from `scan`, store in slot.
 	 */
@@ -875,6 +884,16 @@ table_rescan(TableScanDesc scan,
 	scan->rs_rd->rd_tableam->scan_rescan(scan, key, false, false, false, false);
 }
 
+/*
+ * Set the range of a scan.
+ */
+static inline void
+table_scan_setlimits(TableScanDesc scan,
+					 BlockNumber startBlk, BlockNumber numBlks)
+{
+	scan->rs_rd->rd_tableam->scan_setlimits(scan, startBlk, numBlks);
+}
+
 /*
  * Restart a relation scan after changing params.
  *
diff --git src/include/catalog/pg_operator.dat src/include/catalog/pg_operator.dat
index 9c6bf6c9d1..bb7193b9e7 100644
--- src/include/catalog/pg_operator.dat
+++ src/include/catalog/pg_operator.dat
@@ -237,15 +237,15 @@
   oprname => '<', oprleft => 'tid', oprright => 'tid', oprresult => 'bool',
   oprcom => '>(tid,tid)', oprnegate => '>=(tid,tid)', oprcode => 'tidlt',
   oprrest => 'scalarltsel', oprjoin => 'scalarltjoinsel' },
-{ oid => '2800', descr => 'greater than',
+{ oid => '2800', oid_symbol => 'TIDGreaterOperator', descr => 'greater than',
   oprname => '>', oprleft => 'tid', oprright => 'tid', oprresult => 'bool',
   oprcom => '<(tid,tid)', oprnegate => '<=(tid,tid)', oprcode => 'tidgt',
   oprrest => 'scalargtsel', oprjoin => 'scalargtjoinsel' },
-{ oid => '2801', descr => 'less than or equal',
+{ oid => '2801', oid_symbol => 'TIDLessEqOperator', descr => 'less than or equal',
   oprname => '<=', oprleft => 'tid', oprright => 'tid', oprresult => 'bool',
   oprcom => '>=(tid,tid)', oprnegate => '>(tid,tid)', oprcode => 'tidle',
   oprrest => 'scalarlesel', oprjoin => 'scalarlejoinsel' },
-{ oid => '2802', descr => 'greater than or equal',
+{ oid => '2802', oid_symbol => 'TIDGreaterEqOperator', descr => 'greater than or equal',
   oprname => '>=', oprleft => 'tid', oprright => 'tid', oprresult => 'bool',
   oprcom => '<=(tid,tid)', oprnegate => '<(tid,tid)', oprcode => 'tidge',
   oprrest => 'scalargesel', oprjoin => 'scalargejoinsel' },
diff --git src/include/executor/nodeTidrangescan.h src/include/executor/nodeTidrangescan.h
new file mode 100644
index 0000000000..f0bbcc6a04
--- /dev/null
+++ src/include/executor/nodeTidrangescan.h
@@ -0,0 +1,24 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeTidrangescan.h
+ *
+ *
+ *
+ * Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/executor/nodeTidrangescan.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef NODETIDRANGESCAN_H
+#define NODETIDRANGESCAN_H
+
+#include "nodes/execnodes.h"
+
+extern TidRangeScanState *ExecInitTidRangeScan(TidRangeScan *node,
+											   EState *estate, int eflags);
+extern void ExecEndTidRangeScan(TidRangeScanState *node);
+extern void ExecReScanTidRangeScan(TidRangeScanState *node);
+
+#endif							/* NODETIDRANGESCAN_H */
diff --git src/include/nodes/execnodes.h src/include/nodes/execnodes.h
index 61ba4c3666..ae58ea9eb6 100644
--- src/include/nodes/execnodes.h
+++ src/include/nodes/execnodes.h
@@ -1611,6 +1611,29 @@ typedef struct TidScanState
 	HeapTupleData tss_htup;
 } TidScanState;
 
+/* ----------------
+ *	 TidRangeScanState information
+ *
+ *		trss_tidexprs		list of TidOpExpr structs (see nodeTidrangescan.c)
+ *		trss_startBlock		first block to scan, or InvalidBlockNumber
+ *							when the range is not set
+ *		trss_endBlock		last block to scan (inclusive)
+ *		trss_startOffset	first offset in first block to scan
+ *		trss_endOffset		last offset in last block to scan (inclusive)
+ *		trss_inScan			is a scan currently in progress?
+ * ----------------
+ */
+typedef struct TidRangeScanState
+{
+	ScanState	ss;				/* its first field is NodeTag */
+	List	   *trss_tidexprs;
+	BlockNumber trss_startBlock;
+	BlockNumber trss_endBlock;
+	OffsetNumber trss_startOffset;
+	OffsetNumber trss_endOffset;
+	bool		trss_inScan;
+} TidRangeScanState;
+
 /* ----------------
  *	 SubqueryScanState information
  *
diff --git src/include/nodes/nodes.h src/include/nodes/nodes.h
index 3684f87a88..46d8cddfee 100644
--- src/include/nodes/nodes.h
+++ src/include/nodes/nodes.h
@@ -59,6 +59,7 @@ typedef enum NodeTag
 	T_BitmapIndexScan,
 	T_BitmapHeapScan,
 	T_TidScan,
+	T_TidRangeScan,
 	T_SubqueryScan,
 	T_FunctionScan,
 	T_ValuesScan,
@@ -116,6 +117,7 @@ typedef enum NodeTag
 	T_BitmapIndexScanState,
 	T_BitmapHeapScanState,
 	T_TidScanState,
+	T_TidRangeScanState,
 	T_SubqueryScanState,
 	T_FunctionScanState,
 	T_TableFuncScanState,
@@ -229,6 +231,7 @@ typedef enum NodeTag
 	T_BitmapAndPath,
 	T_BitmapOrPath,
 	T_TidPath,
+	T_TidRangePath,
 	T_SubqueryScanPath,
 	T_ForeignPath,
 	T_CustomPath,
diff --git src/include/nodes/pathnodes.h src/include/nodes/pathnodes.h
index b4059895de..79c5f77c82 100644
--- src/include/nodes/pathnodes.h
+++ src/include/nodes/pathnodes.h
@@ -732,6 +732,7 @@ typedef struct RelOptInfo
 	List	   *joininfo;		/* RestrictInfo structures for join clauses
 								 * involving this rel */
 	bool		has_eclass_joins;	/* T means joininfo is incomplete */
+	bool		has_scan_setlimits; /* Rel's table AM has scan_setlimits */
 
 	/* used by partitionwise joins: */
 	bool		consider_partitionwise_join;	/* consider partitionwise join
@@ -1323,6 +1324,18 @@ typedef struct TidPath
 	List	   *tidquals;		/* qual(s) involving CTID = something */
 } TidPath;
 
+/*
+ * TidRangePath represents a scan by a contiguous range of TIDs
+ *
+ * tidrangequals is an implicitly AND'ed list of qual expressions of the form
+ * "CTID relop pseudoconstant", where relop is one of >,>=,<,<=.
+ */
+typedef struct TidRangePath
+{
+	Path		path;
+	List	   *tidrangequals;
+} TidRangePath;
+
 /*
  * SubqueryScanPath represents a scan of an unflattened subquery-in-FROM
  *
diff --git src/include/nodes/plannodes.h src/include/nodes/plannodes.h
index 7e6b10f86b..011fad0ac7 100644
--- src/include/nodes/plannodes.h
+++ src/include/nodes/plannodes.h
@@ -485,6 +485,19 @@ typedef struct TidScan
 	List	   *tidquals;		/* qual(s) involving CTID = something */
 } TidScan;
 
+/* ----------------
+ *		tid range scan node
+ *
+ * tidrangequals is an implicitly AND'ed list of qual expressions of the form
+ * "CTID relop pseudoconstant", where relop is one of >,>=,<,<=.
+ * ----------------
+ */
+typedef struct TidRangeScan
+{
+	Scan		scan;
+	List	   *tidrangequals;	/* qual(s) involving CTID op something */
+} TidRangeScan;
+
 /* ----------------
  *		subquery scan node
  *
diff --git src/include/optimizer/cost.h src/include/optimizer/cost.h
index 8e621d2f76..be980ea6dc 100644
--- src/include/optimizer/cost.h
+++ src/include/optimizer/cost.h
@@ -83,6 +83,9 @@ extern void cost_bitmap_or_node(BitmapOrPath *path, PlannerInfo *root);
 extern void cost_bitmap_tree_node(Path *path, Cost *cost, Selectivity *selec);
 extern void cost_tidscan(Path *path, PlannerInfo *root,
 						 RelOptInfo *baserel, List *tidquals, ParamPathInfo *param_info);
+extern void cost_tidrangescan(Path *path, PlannerInfo *root,
+							  RelOptInfo *baserel, List *tidquals,
+							  ParamPathInfo *param_info);
 extern void cost_subqueryscan(SubqueryScanPath *path, PlannerInfo *root,
 							  RelOptInfo *baserel, ParamPathInfo *param_info);
 extern void cost_functionscan(Path *path, PlannerInfo *root,
diff --git src/include/optimizer/pathnode.h src/include/optimizer/pathnode.h
index 3bd7072ae8..0105f1fac4 100644
--- src/include/optimizer/pathnode.h
+++ src/include/optimizer/pathnode.h
@@ -63,6 +63,10 @@ extern BitmapOrPath *create_bitmap_or_path(PlannerInfo *root,
 										   List *bitmapquals);
 extern TidPath *create_tidscan_path(PlannerInfo *root, RelOptInfo *rel,
 									List *tidquals, Relids required_outer);
+extern TidRangePath *create_tidrangescan_path(PlannerInfo *root,
+											  RelOptInfo *rel,
+											  List *tidrangequals,
+											  Relids required_outer);
 extern AppendPath *create_append_path(PlannerInfo *root, RelOptInfo *rel,
 									  List *subpaths, List *partial_subpaths,
 									  List *pathkeys, Relids required_outer,
diff --git src/backend/access/heap/heapam_handler.c src/backend/access/heap/heapam_handler.c
index 3eea215b85..df9e14234f 100644
--- src/backend/access/heap/heapam_handler.c
+++ src/backend/access/heap/heapam_handler.c
@@ -2539,6 +2539,7 @@ static const TableAmRoutine heapam_methods = {
 	.scan_begin = heap_beginscan,
 	.scan_end = heap_endscan,
 	.scan_rescan = heap_rescan,
+	.scan_setlimits = heap_setscanlimits,
 	.scan_getnextslot = heap_getnextslot,
 
 	.parallelscan_estimate = table_block_parallelscan_estimate,
diff --git src/backend/commands/explain.c src/backend/commands/explain.c
index d797b5f53e..f4930ca8a5 100644
--- src/backend/commands/explain.c
+++ src/backend/commands/explain.c
@@ -1057,6 +1057,7 @@ ExplainPreScanNode(PlanState *planstate, Bitmapset **rels_used)
 		case T_IndexOnlyScan:
 		case T_BitmapHeapScan:
 		case T_TidScan:
+		case T_TidRangeScan:
 		case T_SubqueryScan:
 		case T_FunctionScan:
 		case T_TableFuncScan:
@@ -1223,6 +1224,9 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_TidScan:
 			pname = sname = "Tid Scan";
 			break;
+		case T_TidRangeScan:
+			pname = sname = "Tid Range Scan";
+			break;
 		case T_SubqueryScan:
 			pname = sname = "Subquery Scan";
 			break;
@@ -1417,6 +1421,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_SampleScan:
 		case T_BitmapHeapScan:
 		case T_TidScan:
+		case T_TidRangeScan:
 		case T_SubqueryScan:
 		case T_FunctionScan:
 		case T_TableFuncScan:
@@ -1871,6 +1876,23 @@ ExplainNode(PlanState *planstate, List *ancestors,
 											   planstate, es);
 			}
 			break;
+		case T_TidRangeScan:
+			{
+				/*
+				 * The tidrangequals list has AND semantics, so be sure to
+				 * show it as an AND condition.
+				 */
+				List	   *tidquals = ((TidRangeScan *) plan)->tidrangequals;
+
+				if (list_length(tidquals) > 1)
+					tidquals = list_make1(make_andclause(tidquals));
+				show_scan_qual(tidquals, "TID Cond", planstate, ancestors, es);
+				show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
+				if (plan->qual)
+					show_instrumentation_count("Rows Removed by Filter", 1,
+											   planstate, es);
+			}
+			break;
 		case T_ForeignScan:
 			show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
 			if (plan->qual)
@@ -3558,6 +3580,7 @@ ExplainTargetRel(Plan *plan, Index rti, ExplainState *es)
 		case T_IndexOnlyScan:
 		case T_BitmapHeapScan:
 		case T_TidScan:
+		case T_TidRangeScan:
 		case T_ForeignScan:
 		case T_CustomScan:
 		case T_ModifyTable:
diff --git src/backend/executor/Makefile src/backend/executor/Makefile
index f990c6473a..74ac59faa1 100644
--- src/backend/executor/Makefile
+++ src/backend/executor/Makefile
@@ -67,6 +67,7 @@ OBJS = \
 	nodeSubplan.o \
 	nodeSubqueryscan.o \
 	nodeTableFuncscan.o \
+	nodeTidrangescan.o \
 	nodeTidscan.o \
 	nodeUnique.o \
 	nodeValuesscan.o \
diff --git src/backend/executor/execAmi.c src/backend/executor/execAmi.c
index 0c10f1d35c..5de60a36ac 100644
--- src/backend/executor/execAmi.c
+++ src/backend/executor/execAmi.c
@@ -51,6 +51,7 @@
 #include "executor/nodeSubplan.h"
 #include "executor/nodeSubqueryscan.h"
 #include "executor/nodeTableFuncscan.h"
+#include "executor/nodeTidrangescan.h"
 #include "executor/nodeTidscan.h"
 #include "executor/nodeUnique.h"
 #include "executor/nodeValuesscan.h"
@@ -197,6 +198,10 @@ ExecReScan(PlanState *node)
 			ExecReScanTidScan((TidScanState *) node);
 			break;
 
+		case T_TidRangeScanState:
+			ExecReScanTidRangeScan((TidRangeScanState *) node);
+			break;
+
 		case T_SubqueryScanState:
 			ExecReScanSubqueryScan((SubqueryScanState *) node);
 			break;
diff --git src/backend/executor/execProcnode.c src/backend/executor/execProcnode.c
index 01b7b926bf..a0576ac41a 100644
--- src/backend/executor/execProcnode.c
+++ src/backend/executor/execProcnode.c
@@ -109,6 +109,7 @@
 #include "executor/nodeSubplan.h"
 #include "executor/nodeSubqueryscan.h"
 #include "executor/nodeTableFuncscan.h"
+#include "executor/nodeTidrangescan.h"
 #include "executor/nodeTidscan.h"
 #include "executor/nodeUnique.h"
 #include "executor/nodeValuesscan.h"
@@ -238,6 +239,11 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
 												   estate, eflags);
 			break;
 
+		case T_TidRangeScan:
+			result = (PlanState *) ExecInitTidRangeScan((TidRangeScan *) node,
+														estate, eflags);
+			break;
+
 		case T_SubqueryScan:
 			result = (PlanState *) ExecInitSubqueryScan((SubqueryScan *) node,
 														estate, eflags);
@@ -637,6 +643,10 @@ ExecEndNode(PlanState *node)
 			ExecEndTidScan((TidScanState *) node);
 			break;
 
+		case T_TidRangeScanState:
+			ExecEndTidRangeScan((TidRangeScanState *) node);
+			break;
+
 		case T_SubqueryScanState:
 			ExecEndSubqueryScan((SubqueryScanState *) node);
 			break;
diff --git src/backend/executor/nodeTidrangescan.c src/backend/executor/nodeTidrangescan.c
new file mode 100644
index 0000000000..8a72f52074
--- /dev/null
+++ src/backend/executor/nodeTidrangescan.c
@@ -0,0 +1,580 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeTidrangescan.c
+ *	  Routines to support tid range scans of relations
+ *
+ * Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/executor/nodeTidrangescan.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/relscan.h"
+#include "access/sysattr.h"
+#include "access/tableam.h"
+#include "catalog/pg_operator.h"
+#include "executor/execdebug.h"
+#include "executor/nodeTidrangescan.h"
+#include "nodes/nodeFuncs.h"
+#include "storage/bufmgr.h"
+#include "utils/rel.h"
+
+
+#define IsCTIDVar(node)  \
+	((node) != NULL && \
+	 IsA((node), Var) && \
+	 ((Var *) (node))->varattno == SelfItemPointerAttributeNumber && \
+	 ((Var *) (node))->varlevelsup == 0)
+
+typedef enum
+{
+	TIDEXPR_UPPER_BOUND,
+	TIDEXPR_LOWER_BOUND
+} TidExprType;
+
+/* Upper or lower range bound for scan */
+typedef struct TidOpExpr
+{
+	TidExprType exprtype;		/* type of op */
+	ExprState  *exprstate;		/* ExprState for a TID-yielding subexpr */
+	bool		inclusive;		/* whether op is inclusive */
+} TidOpExpr;
+
+/*
+ * For the given 'expr', build and return an appropriate TidOpExpr taking into
+ * account the expr's operator and operand order.
+ */
+static TidOpExpr *
+MakeTidOpExpr(OpExpr *expr, TidRangeScanState *tidstate)
+{
+	Node	   *arg1 = get_leftop((Expr *) expr);
+	Node	   *arg2 = get_rightop((Expr *) expr);
+	ExprState  *exprstate = NULL;
+	bool		invert = false;
+	TidOpExpr  *tidopexpr;
+
+	if (IsCTIDVar(arg1))
+		exprstate = ExecInitExpr((Expr *) arg2, &tidstate->ss.ps);
+	else if (IsCTIDVar(arg2))
+	{
+		exprstate = ExecInitExpr((Expr *) arg1, &tidstate->ss.ps);
+		invert = true;
+	}
+	else
+		elog(ERROR, "could not identify CTID variable");
+
+	tidopexpr = (TidOpExpr *) palloc0(sizeof(TidOpExpr));
+
+	switch (expr->opno)
+	{
+		case TIDLessEqOperator:
+			tidopexpr->inclusive = true;
+			/* fall through */
+		case TIDLessOperator:
+			tidopexpr->exprtype = invert ? TIDEXPR_LOWER_BOUND : TIDEXPR_UPPER_BOUND;
+			break;
+		case TIDGreaterEqOperator:
+			tidopexpr->inclusive = true;
+			/* fall through */
+		case TIDGreaterOperator:
+			tidopexpr->exprtype = invert ? TIDEXPR_UPPER_BOUND : TIDEXPR_LOWER_BOUND;
+			break;
+		default:
+			elog(ERROR, "could not identify CTID operator");
+	}
+
+	tidopexpr->exprstate = exprstate;
+
+	return tidopexpr;
+}
+
+/*
+ * Extract the qual subexpressions that yield TIDs to search for,
+ * and compile them into ExprStates if they're ordinary expressions.
+ */
+static void
+TidExprListCreate(TidRangeScanState *tidrangestate)
+{
+	TidRangeScan *node = (TidRangeScan *) tidrangestate->ss.ps.plan;
+	List	   *tidexprs = NIL;
+	ListCell   *l;
+
+	foreach(l, node->tidrangequals)
+	{
+		OpExpr	   *opexpr = lfirst(l);
+		TidOpExpr  *tidopexpr;
+
+		if (!IsA(opexpr, OpExpr))
+			elog(ERROR, "could not identify CTID expression");
+
+		tidopexpr = MakeTidOpExpr(opexpr, tidrangestate);
+		tidexprs = lappend(tidexprs, tidopexpr);
+	}
+
+	tidrangestate->trss_tidexprs = tidexprs;
+}
+
+/*
+ * Set 'lowerBound' based on 'tid'.  If 'inclusive' is false then the
+ * lowerBound is incremented to the next tid value so that it becomes
+ * inclusive.  If there is no valid next tid value then we return false,
+ * otherwise we return true.
+ */
+static bool
+SetTidLowerBound(ItemPointer tid, bool inclusive, ItemPointer lowerBound)
+{
+	OffsetNumber offset;
+
+	*lowerBound = *tid;
+	offset = ItemPointerGetOffsetNumberNoCheck(tid);
+
+	if (!inclusive)
+	{
+		/* Check if the lower bound is actually in the next block. */
+		if (offset >= MaxOffsetNumber)
+		{
+			BlockNumber block = ItemPointerGetBlockNumberNoCheck(lowerBound);
+
+			/*
+			 * If the lower bound was already at or above the maximum block
+			 * number, then there is no valid value for it to be set to.
+			 */
+			if (block >= MaxBlockNumber)
+				return false;
+
+			/* Set the lowerBound to the first offset in the next block */
+			ItemPointerSet(lowerBound, block + 1, 1);
+		}
+		else
+			ItemPointerSetOffsetNumber(lowerBound, OffsetNumberNext(offset));
+	}
+	else if (offset == 0)
+		ItemPointerSetOffsetNumber(lowerBound, 1);
+
+	return true;
+}
+
+/*
+ * Set 'upperBound' based on 'tid'.  If 'inclusive' is false then the
+ * upperBound is decremented to the previous tid value so that it becomes
+ * inclusive.  If there is no valid previous tid value then we return false,
+ * otherwise we return true.
+ */
+static bool
+SetTidUpperBound(ItemPointer tid, bool inclusive, ItemPointer upperBound)
+{
+	OffsetNumber offset;
+
+	*upperBound = *tid;
+	offset = ItemPointerGetOffsetNumberNoCheck(tid);
+
+	/*
+	 * Since TID offsets start at 1, an inclusive upper bound with offset 0
+	 * can be treated as an exclusive bound.  This has the benefit of
+	 * eliminating that block from the scan range.
+	 */
+	if (inclusive && offset == 0)
+		inclusive = false;
+
+	if (!inclusive)
+	{
+		/* Check if the upper bound is actually in the previous block. */
+		if (offset == 0)
+		{
+			BlockNumber block = ItemPointerGetBlockNumberNoCheck(upperBound);
+
+			/*
+			 * If the upper bound was already in block 0, then there is no
+			 * valid value for it to be set to.
+			 */
+			if (block == 0)
+				return false;
+
+			ItemPointerSet(upperBound, block - 1, MaxOffsetNumber);
+		}
+		else
+			ItemPointerSetOffsetNumber(upperBound, OffsetNumberPrev(offset));
+	}
+
+	return true;
+}
+
+/* ----------------------------------------------------------------
+ *		TidRangeEval
+ *
+ *		Compute and set node's block and offset range to scan by evaluating
+ *		the trss_tidexprs.  If we detect an invalid range that cannot yield
+ *		any rows, the range is left unset.
+ * ----------------------------------------------------------------
+ */
+static void
+TidRangeEval(TidRangeScanState *node)
+{
+	ExprContext *econtext = node->ss.ps.ps_ExprContext;
+	BlockNumber nblocks;
+	ItemPointerData lowerBound;
+	ItemPointerData upperBound;
+	ListCell   *l;
+
+	/*
+	 * We silently discard any TIDs that are out of range at the time of scan
+	 * start.  (Since we hold at least AccessShareLock on the table, it won't
+	 * be possible for someone to truncate away the blocks we intend to
+	 * visit.)
+	 */
+	nblocks = RelationGetNumberOfBlocks(node->ss.ss_currentRelation);
+
+	/* The biggest range on an empty table is empty; just skip it. */
+	if (nblocks == 0)
+		return;
+
+	/* Set the lower and upper bound to scan the whole table. */
+	ItemPointerSet(&lowerBound, 0, 1);
+	ItemPointerSet(&upperBound, nblocks - 1, MaxOffsetNumber);
+
+	foreach(l, node->trss_tidexprs)
+	{
+		TidOpExpr  *tidopexpr = (TidOpExpr *) lfirst(l);
+		ItemPointer itemptr;
+		bool		isNull;
+
+		/* Evaluate this bound. */
+		itemptr = (ItemPointer)
+			DatumGetPointer(ExecEvalExprSwitchContext(tidopexpr->exprstate,
+													  econtext,
+													  &isNull));
+
+		/* If the bound is NULL, *nothing* matches the qual. */
+		if (isNull)
+			return;
+
+		if (tidopexpr->exprtype == TIDEXPR_LOWER_BOUND)
+		{
+			ItemPointerData lb;
+
+			/*
+			 * If the lower bound is beyond the maximum value for ctid, then
+			 * just bail without setting the range.  No rows can match.
+			 */
+			if (!SetTidLowerBound(itemptr, tidopexpr->inclusive, &lb))
+				return;
+
+			if (ItemPointerCompare(&lb, &lowerBound) > 0)
+				lowerBound = lb;
+		}
+
+		if (tidopexpr->exprtype == TIDEXPR_UPPER_BOUND)
+		{
+			ItemPointerData ub;
+
+			/*
+			 * If the upper bound is below the minimum value for ctid, then
+			 * just bail without setting the range.  No rows can match.
+			 */
+			if (!SetTidUpperBound(itemptr, tidopexpr->inclusive, &ub))
+				return;
+
+			if (ItemPointerCompare(&ub, &upperBound) < 0)
+				upperBound = ub;
+		}
+	}
+
+	/* If the resulting range is not empty, set it. */
+	if (ItemPointerCompare(&lowerBound, &upperBound) <= 0)
+	{
+		node->trss_startBlock = ItemPointerGetBlockNumberNoCheck(&lowerBound);
+		node->trss_endBlock = ItemPointerGetBlockNumberNoCheck(&upperBound);
+		node->trss_startOffset = ItemPointerGetOffsetNumberNoCheck(&lowerBound);
+		node->trss_endOffset = ItemPointerGetOffsetNumberNoCheck(&upperBound);
+	}
+}
+
+/* ----------------------------------------------------------------
+ *		NextInTidRange
+ *
+ *		Fetch the next tuple when scanning a range of TIDs.
+ *
+ *		Since the table access method may return tuples that are in the scan
+ *		limit, but not within the required TID range, this function will
+ *		check for such tuples and skip over them.
+ * ----------------------------------------------------------------
+ */
+static bool
+NextInTidRange(TidRangeScanState *node, TableScanDesc scandesc,
+			   TupleTableSlot *slot)
+{
+	for (;;)
+	{
+		BlockNumber block;
+		OffsetNumber offset;
+
+		if (!table_scan_getnextslot(scandesc, ForwardScanDirection, slot))
+			return false;
+
+		/* Check that the tuple is within the required range. */
+		block = ItemPointerGetBlockNumber(&slot->tts_tid);
+		offset = ItemPointerGetOffsetNumber(&slot->tts_tid);
+
+		/* The tuple should never come from outside the scan limits. */
+		Assert(block >= node->trss_startBlock &&
+			   block <= node->trss_endBlock);
+
+		/*
+		 * If the tuple is in the first block of the range and before the
+		 * first requested offset, then we can skip it.
+		 */
+		if (block == node->trss_startBlock && offset < node->trss_startOffset)
+		{
+			ExecClearTuple(slot);
+			continue;
+		}
+
+		/*
+		 * Similarly, if the tuple is in the last block and after the last
+		 * requested offset, we can end the scan.
+		 */
+		if (block == node->trss_endBlock && offset > node->trss_endOffset)
+		{
+			ExecClearTuple(slot);
+			return false;
+		}
+
+		return true;
+	}
+}
+
+/* ----------------------------------------------------------------
+ *		TidRangeNext
+ *
+ *		Retrieve a tuple from the TidRangeScan node's currentRelation
+ *		using the tids in the TidRangeScanState information.
+ *
+ * ----------------------------------------------------------------
+ */
+static TupleTableSlot *
+TidRangeNext(TidRangeScanState *node)
+{
+	TableScanDesc scandesc;
+	EState	   *estate;
+	TupleTableSlot *slot;
+	bool		foundTuple;
+
+	/*
+	 * extract necessary information from tid range scan node
+	 */
+	scandesc = node->ss.ss_currentScanDesc;
+	estate = node->ss.ps.state;
+	slot = node->ss.ss_ScanTupleSlot;
+
+	Assert(ScanDirectionIsForward(estate->es_direction));
+
+	if (!node->trss_inScan)
+	{
+		BlockNumber blocks_to_scan;
+
+		/* First time through, compute the TID range to be visited */
+		if (node->trss_startBlock == InvalidBlockNumber)
+			TidRangeEval(node);
+
+		if (scandesc == NULL)
+		{
+			scandesc = table_beginscan_strat(node->ss.ss_currentRelation,
+											 estate->es_snapshot,
+											 0, NULL,
+											 false, false);
+			node->ss.ss_currentScanDesc = scandesc;
+		}
+
+		/* Compute the number of blocks to scan and set the scan limits. */
+		if (node->trss_startBlock == InvalidBlockNumber)
+		{
+			/* If the range is empty, set the scan limits to zero blocks. */
+			node->trss_startBlock = 0;
+			blocks_to_scan = 0;
+		}
+		else
+			blocks_to_scan = node->trss_endBlock - node->trss_startBlock + 1;
+
+		table_scan_setlimits(scandesc, node->trss_startBlock, blocks_to_scan);
+		node->trss_inScan = true;
+	}
+
+	/* Fetch the next tuple. */
+	foundTuple = NextInTidRange(node, scandesc, slot);
+
+	/*
+	 * If we've exhausted all the tuples in the range, reset the inScan flag.
+	 * This will cause the heap to be rescanned for any subsequent fetches,
+	 * which is important for some cursor operations: for instance, FETCH LAST
+	 * fetches all the tuples in order and then fetches one tuple in reverse.
+	 */
+	if (!foundTuple)
+		node->trss_inScan = false;
+
+	return slot;
+}
+
+/*
+ * TidRangeRecheck -- access method routine to recheck a tuple in EvalPlanQual
+ */
+static bool
+TidRangeRecheck(TidRangeScanState *node, TupleTableSlot *slot)
+{
+	/*
+	 * XXX shouldn't we check here to make sure tuple is in TID range? In
+	 * runtime-key case this is not certain, is it?
+	 */
+	return true;
+}
+
+/* ----------------------------------------------------------------
+ *		ExecTidRangeScan(node)
+ *
+ *		Scans the relation using tids and returns the next qualifying tuple.
+ *		We call the ExecScan() routine and pass it the appropriate
+ *		access method functions.
+ *
+ *		Conditions:
+ *		  -- the "cursor" maintained by the AMI is positioned at the tuple
+ *			 returned previously.
+ *
+ *		Initial States:
+ *		  -- the relation indicated is opened for scanning so that the
+ *			 "cursor" is positioned before the first qualifying tuple.
+ *		  -- trss_startBlock is InvalidBlockNumber
+ * ----------------------------------------------------------------
+ */
+static TupleTableSlot *
+ExecTidRangeScan(PlanState *pstate)
+{
+	TidRangeScanState *node = castNode(TidRangeScanState, pstate);
+
+	return ExecScan(&node->ss,
+					(ExecScanAccessMtd) TidRangeNext,
+					(ExecScanRecheckMtd) TidRangeRecheck);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecReScanTidRangeScan(node)
+ * ----------------------------------------------------------------
+ */
+void
+ExecReScanTidRangeScan(TidRangeScanState *node)
+{
+	TableScanDesc scan = node->ss.ss_currentScanDesc;
+
+	if (scan != NULL)
+		table_rescan(scan, NULL);
+
+	/* mark scan as not in progress, and tid range as not computed yet */
+	node->trss_inScan = false;
+	node->trss_startBlock = InvalidBlockNumber;
+
+	ExecScanReScan(&node->ss);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecEndTidRangeScan
+ *
+ *		Releases any storage allocated through C routines.
+ *		Returns nothing.
+ * ----------------------------------------------------------------
+ */
+void
+ExecEndTidRangeScan(TidRangeScanState *node)
+{
+	TableScanDesc scan = node->ss.ss_currentScanDesc;
+
+	if (scan != NULL)
+		table_endscan(scan);
+
+	/*
+	 * Free the exprcontext
+	 */
+	ExecFreeExprContext(&node->ss.ps);
+
+	/*
+	 * clear out tuple table slots
+	 */
+	if (node->ss.ps.ps_ResultTupleSlot)
+		ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
+	ExecClearTuple(node->ss.ss_ScanTupleSlot);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecInitTidRangeScan
+ *
+ *		Initializes the tid range scan's state information, sets up the
+ *		TID bound expressions, and opens the scan relation.
+ *
+ *		Parameters:
+ *		  node: TidRangeScan node produced by the planner.
+ *		  estate: the execution state initialized in InitPlan.
+ * ----------------------------------------------------------------
+ */
+TidRangeScanState *
+ExecInitTidRangeScan(TidRangeScan *node, EState *estate, int eflags)
+{
+	TidRangeScanState *tidrangestate;
+	Relation	currentRelation;
+
+	/*
+	 * create state structure
+	 */
+	tidrangestate = makeNode(TidRangeScanState);
+	tidrangestate->ss.ps.plan = (Plan *) node;
+	tidrangestate->ss.ps.state = estate;
+	tidrangestate->ss.ps.ExecProcNode = ExecTidRangeScan;
+
+	/*
+	 * Miscellaneous initialization
+	 *
+	 * create expression context for node
+	 */
+	ExecAssignExprContext(estate, &tidrangestate->ss.ps);
+
+	/*
+	 * mark scan as not in progress, and tid range as not computed yet
+	 */
+	tidrangestate->trss_inScan = false;
+	tidrangestate->trss_startBlock = InvalidBlockNumber;
+
+	/*
+	 * open the scan relation
+	 */
+	currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+
+	tidrangestate->ss.ss_currentRelation = currentRelation;
+	tidrangestate->ss.ss_currentScanDesc = NULL;	/* no table scan here */
+
+	/*
+	 * get the scan type from the relation descriptor.
+	 */
+	ExecInitScanTupleSlot(estate, &tidrangestate->ss,
+						  RelationGetDescr(currentRelation),
+						  table_slot_callbacks(currentRelation));
+
+	/*
+	 * Initialize result type and projection.
+	 */
+	ExecInitResultTypeTL(&tidrangestate->ss.ps);
+	ExecAssignScanProjectionInfo(&tidrangestate->ss);
+
+	/*
+	 * initialize child expressions
+	 */
+	tidrangestate->ss.ps.qual =
+		ExecInitQual(node->scan.plan.qual, (PlanState *) tidrangestate);
+
+	TidExprListCreate(tidrangestate);
+
+	/*
+	 * all done.
+	 */
+	return tidrangestate;
+}
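
(Annotation, not part of the patch.) The bound handling in TidRangeEval and the per-tuple filter in NextInTidRange can be illustrated outside the executor. The following is only a minimal sketch under stated assumptions: the Tid and TidRange types and every function name are hypothetical stand-ins rather than PostgreSQL APIs, only inclusive bounds are modelled (the patch's SetTidLowerBound/SetTidUpperBound also handle exclusive ones), and the maximum offset is passed in as a plain parameter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for ItemPointerData: a (block, offset) pair. */
typedef struct
{
	uint32_t	block;
	uint16_t	offset;
} Tid;

/* Hypothetical inclusive range [lo, hi] of TIDs. */
typedef struct
{
	Tid			lo;
	Tid			hi;
} TidRange;

/* Order TIDs by block first, then by offset (like ItemPointerCompare). */
static int
tid_cmp(Tid a, Tid b)
{
	if (a.block != b.block)
		return a.block < b.block ? -1 : 1;
	if (a.offset != b.offset)
		return a.offset < b.offset ? -1 : 1;
	return 0;
}

/* Widest possible range for a table of nblocks pages; nblocks must be > 0. */
static TidRange
range_whole_table(uint32_t nblocks, uint16_t max_offset)
{
	TidRange	r = {{0, 1}, {nblocks - 1, max_offset}};

	return r;
}

/* An inclusive lower bound can only move the start of the range forward. */
static void
range_raise_lower(TidRange *r, Tid bound)
{
	if (tid_cmp(bound, r->lo) > 0)
		r->lo = bound;
}

/* An inclusive upper bound can only move the end of the range backward. */
static void
range_reduce_upper(TidRange *r, Tid bound)
{
	if (tid_cmp(bound, r->hi) < 0)
		r->hi = bound;
}

/* The range matches nothing once the bounds have crossed. */
static bool
range_is_empty(const TidRange *r)
{
	return tid_cmp(r->lo, r->hi) > 0;
}

/* Per-tuple check corresponding to the filtering done while scanning. */
static bool
tid_in_range(const TidRange *r, Tid t)
{
	return tid_cmp(t, r->lo) >= 0 && tid_cmp(t, r->hi) <= 0;
}
```

With a three-block table, applying a lower bound of (1,5) and an upper bound of (1,7) leaves exactly the TIDs (1,5) through (1,7) in range, while a lower bound of (100,0) crosses the upper bound and yields an empty range, which is the behaviour the regression tests further down expect.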
diff --git src/backend/nodes/copyfuncs.c src/backend/nodes/copyfuncs.c
index 70f8b718e0..2abc276e1c 100644
--- src/backend/nodes/copyfuncs.c
+++ src/backend/nodes/copyfuncs.c
@@ -585,6 +585,27 @@ _copyTidScan(const TidScan *from)
 	return newnode;
 }
 
+/*
+ * _copyTidRangeScan
+ */
+static TidRangeScan *
+_copyTidRangeScan(const TidRangeScan *from)
+{
+	TidRangeScan *newnode = makeNode(TidRangeScan);
+
+	/*
+	 * copy node superclass fields
+	 */
+	CopyScanFields((const Scan *) from, (Scan *) newnode);
+
+	/*
+	 * copy remainder of node
+	 */
+	COPY_NODE_FIELD(tidrangequals);
+
+	return newnode;
+}
+
 /*
  * _copySubqueryScan
  */
@@ -4889,6 +4910,9 @@ copyObjectImpl(const void *from)
 		case T_TidScan:
 			retval = _copyTidScan(from);
 			break;
+		case T_TidRangeScan:
+			retval = _copyTidRangeScan(from);
+			break;
 		case T_SubqueryScan:
 			retval = _copySubqueryScan(from);
 			break;
diff --git src/backend/nodes/outfuncs.c src/backend/nodes/outfuncs.c
index d78b16ed1d..93163e3a2f 100644
--- src/backend/nodes/outfuncs.c
+++ src/backend/nodes/outfuncs.c
@@ -608,6 +608,16 @@ _outTidScan(StringInfo str, const TidScan *node)
 	WRITE_NODE_FIELD(tidquals);
 }
 
+static void
+_outTidRangeScan(StringInfo str, const TidRangeScan *node)
+{
+	WRITE_NODE_TYPE("TIDRANGESCAN");
+
+	_outScanInfo(str, (const Scan *) node);
+
+	WRITE_NODE_FIELD(tidrangequals);
+}
+
 static void
 _outSubqueryScan(StringInfo str, const SubqueryScan *node)
 {
@@ -3770,6 +3780,9 @@ outNode(StringInfo str, const void *obj)
 			case T_TidScan:
 				_outTidScan(str, obj);
 				break;
+			case T_TidRangeScan:
+				_outTidRangeScan(str, obj);
+				break;
 			case T_SubqueryScan:
 				_outSubqueryScan(str, obj);
 				break;
diff --git src/backend/optimizer/README src/backend/optimizer/README
index efb52858c8..4a6c348162 100644
--- src/backend/optimizer/README
+++ src/backend/optimizer/README
@@ -374,6 +374,7 @@ RelOptInfo      - a relation or joined relations
   IndexPath     - index scan
   BitmapHeapPath - top of a bitmapped index scan
   TidPath       - scan by CTID
+  TidRangePath  - scan a contiguous range of CTIDs
   SubqueryScanPath - scan a subquery-in-FROM
   ForeignPath   - scan a foreign table, foreign join or foreign upper-relation
   CustomPath    - for custom scan providers
diff --git src/backend/optimizer/path/costsize.c src/backend/optimizer/path/costsize.c
index 22d6935824..40cd6fe460 100644
--- src/backend/optimizer/path/costsize.c
+++ src/backend/optimizer/path/costsize.c
@@ -1283,6 +1283,101 @@ cost_tidscan(Path *path, PlannerInfo *root,
 	path->total_cost = startup_cost + run_cost;
 }
 
+/*
+ * cost_tidrangescan
+ *	  Determines and sets the costs of scanning a relation using a range of
+ *	  TIDs for 'path'
+ *
+ * 'baserel' is the relation to be scanned
+ * 'tidrangequals' is the list of TID-checkable range quals
+ * 'param_info' is the ParamPathInfo if this is a parameterized path, else NULL
+ */
+void
+cost_tidrangescan(Path *path, PlannerInfo *root,
+				  RelOptInfo *baserel, List *tidrangequals,
+				  ParamPathInfo *param_info)
+{
+	Selectivity selectivity;
+	double		pages;
+	Cost		startup_cost = 0;
+	Cost		run_cost = 0;
+	QualCost	qpqual_cost;
+	Cost		cpu_per_tuple;
+	QualCost	tid_qual_cost;
+	double		ntuples;
+	double		nseqpages;
+	double		spc_random_page_cost;
+	double		spc_seq_page_cost;
+
+	/* Should only be applied to base relations */
+	Assert(baserel->relid > 0);
+	Assert(baserel->rtekind == RTE_RELATION);
+
+	/* Mark the path with the correct row estimate */
+	if (param_info)
+		path->rows = param_info->ppi_rows;
+	else
+		path->rows = baserel->rows;
+
+	/* Count how many tuples and pages we expect to scan */
+	selectivity = clauselist_selectivity(root, tidrangequals, baserel->relid,
+										 JOIN_INNER, NULL);
+	pages = ceil(selectivity * baserel->pages);
+
+	if (pages <= 0.0)
+		pages = 1.0;
+
+	/*
+	 * The first page in a range requires a random seek, but each subsequent
+	 * page is just a normal sequential page read. NOTE: it's desirable for
+	 * Tid Range Scans to cost more than the equivalent Sequential Scans,
+	 * because Seq Scans have some performance advantages such as scan
+	 * synchronization and parallelizability, and we'd prefer one of them to
+	 * be picked unless a Tid Range Scan really is better.
+	 */
+	ntuples = selectivity * baserel->tuples;
+	nseqpages = pages - 1.0;
+
+	if (!enable_tidscan)
+		startup_cost += disable_cost;
+
+	/*
+	 * The TID qual expressions will be computed once, any other baserestrict
+	 * quals once per retrieved tuple.
+	 */
+	cost_qual_eval(&tid_qual_cost, tidrangequals, root);
+
+	/* fetch estimated page cost for tablespace containing table */
+	get_tablespace_page_costs(baserel->reltablespace,
+							  &spc_random_page_cost,
+							  &spc_seq_page_cost);
+
+	/* disk costs; 1 random page and the remainder as seq pages */
+	run_cost += spc_random_page_cost + spc_seq_page_cost * nseqpages;
+
+	/* Add scanning CPU costs */
+	get_restriction_qual_cost(root, baserel, param_info, &qpqual_cost);
+
+	/*
+	 * XXX currently we assume TID quals are a subset of qpquals at this
+	 * point; they will be removed (if possible) when we create the plan, so
+	 * we subtract their cost from the total qpqual cost.  (If the TID quals
+	 * can't be removed, this is a mistake and we're going to underestimate
+	 * the CPU cost a bit.)
+	 */
+	startup_cost += qpqual_cost.startup + tid_qual_cost.per_tuple;
+	cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple -
+		tid_qual_cost.per_tuple;
+	run_cost += cpu_per_tuple * ntuples;
+
+	/* tlist eval costs are paid per output row, not per tuple scanned */
+	startup_cost += path->pathtarget->cost.startup;
+	run_cost += path->pathtarget->cost.per_tuple * path->rows;
+
+	path->startup_cost = startup_cost;
+	path->total_cost = startup_cost + run_cost;
+}
+
 /*
  * cost_subqueryscan
  *	  Determines and returns the cost of scanning a subquery RTE.
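
(Annotation, not part of the patch.) The disk-cost part of cost_tidrangescan charges one random page fetch for the first page of the range and sequential fetches for the rest. A minimal sketch of just that arithmetic follows, assuming a hypothetical helper name and plain doubles in place of the planner's structures; the small ceil_nonneg helper is also an assumption, used here only so the example does not depend on libm.

```c
#include <assert.h>

/* Round a non-negative value up to a whole number (stand-in for ceil()). */
static double
ceil_nonneg(double x)
{
	double		t = (double) (long long) x;

	return t < x ? t + 1.0 : t;
}

/*
 * Hypothetical helper: disk cost of scanning the fraction 'selectivity' of
 * a relation with 'rel_pages' pages.  At least one page is always charged;
 * the first page is costed as a random read, the remainder as sequential.
 */
static double
tid_range_disk_cost(double selectivity, double rel_pages,
					double random_page_cost, double seq_page_cost)
{
	double		pages = ceil_nonneg(selectivity * rel_pages);

	if (pages <= 0.0)
		pages = 1.0;

	return random_page_cost + seq_page_cost * (pages - 1.0);
}
```

For example, a quarter of a 1000-page table with random_page_cost = 4.0 and seq_page_cost = 1.0 gives 4.0 + 249 * 1.0 = 253.0, whereas an empty estimate still costs a single random page (4.0). The real function also adds the CPU and target-list costs shown in the hunk above.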
diff --git src/backend/optimizer/path/tidpath.c src/backend/optimizer/path/tidpath.c
index 1463a82be8..aa4d6aefad 100644
--- src/backend/optimizer/path/tidpath.c
+++ src/backend/optimizer/path/tidpath.c
@@ -2,9 +2,9 @@
  *
  * tidpath.c
  *	  Routines to determine which TID conditions are usable for scanning
- *	  a given relation, and create TidPaths accordingly.
+ *	  a given relation, and create TidPaths and TidRangePaths accordingly.
  *
- * What we are looking for here is WHERE conditions of the form
+ * For TidPaths, we look for WHERE conditions of the form
  * "CTID = pseudoconstant", which can be implemented by just fetching
  * the tuple directly via heap_fetch().  We can also handle OR'd conditions
  * such as (CTID = const1) OR (CTID = const2), as well as ScalarArrayOpExpr
@@ -23,6 +23,9 @@
  * a function, but in practice it works better to keep the special node
  * representation all the way through to execution.
  *
+ * Additionally, TidRangePaths may be created for conditions of the form
+ * "CTID relop pseudoconstant", where relop is one of >,>=,<,<=, and
+ * AND-clauses composed of such conditions.
  *
  * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
@@ -63,14 +66,14 @@ IsCTIDVar(Var *var, RelOptInfo *rel)
 
 /*
  * Check to see if a RestrictInfo is of the form
- *		CTID = pseudoconstant
+ *		CTID OP pseudoconstant
  * or
- *		pseudoconstant = CTID
- * where the CTID Var belongs to relation "rel", and nothing on the
- * other side of the clause does.
+ *		pseudoconstant OP CTID
+ * where OP is a binary operation, the CTID Var belongs to relation "rel",
+ * and nothing on the other side of the clause does.
  */
 static bool
-IsTidEqualClause(RestrictInfo *rinfo, RelOptInfo *rel)
+IsBinaryTidClause(RestrictInfo *rinfo, RelOptInfo *rel)
 {
 	OpExpr	   *node;
 	Node	   *arg1,
@@ -83,10 +86,9 @@ IsTidEqualClause(RestrictInfo *rinfo, RelOptInfo *rel)
 		return false;
 	node = (OpExpr *) rinfo->clause;
 
-	/* Operator must be tideq */
-	if (node->opno != TIDEqualOperator)
+	/* OpExpr must have two arguments */
+	if (list_length(node->args) != 2)
 		return false;
-	Assert(list_length(node->args) == 2);
 	arg1 = linitial(node->args);
 	arg2 = lsecond(node->args);
 
@@ -116,6 +118,50 @@ IsTidEqualClause(RestrictInfo *rinfo, RelOptInfo *rel)
 	return true;				/* success */
 }
 
+/*
+ * Check to see if a RestrictInfo is of the form
+ *		CTID = pseudoconstant
+ * or
+ *		pseudoconstant = CTID
+ * where the CTID Var belongs to relation "rel", and nothing on the
+ * other side of the clause does.
+ */
+static bool
+IsTidEqualClause(RestrictInfo *rinfo, RelOptInfo *rel)
+{
+	if (!IsBinaryTidClause(rinfo, rel))
+		return false;
+
+	if (((OpExpr *) rinfo->clause)->opno == TIDEqualOperator)
+		return true;
+
+	return false;
+}
+
+/*
+ * Check to see if a RestrictInfo is of the form
+ *		CTID OP pseudoconstant
+ * or
+ *		pseudoconstant OP CTID
+ * where OP is a range operator such as <, <=, >, or >=, the CTID Var belongs
+ * to relation "rel", and nothing on the other side of the clause does.
+ */
+static bool
+IsTidRangeClause(RestrictInfo *rinfo, RelOptInfo *rel)
+{
+	Oid			opno;
+
+	if (!IsBinaryTidClause(rinfo, rel))
+		return false;
+	opno = ((OpExpr *) rinfo->clause)->opno;
+
+	if (opno == TIDLessOperator || opno == TIDLessEqOperator ||
+		opno == TIDGreaterOperator || opno == TIDGreaterEqOperator)
+		return true;
+
+	return false;
+}
+
 /*
  * Check to see if a RestrictInfo is of the form
  *		CTID = ANY (pseudoconstant_array)
@@ -222,7 +268,7 @@ TidQualFromRestrictInfo(RestrictInfo *rinfo, RelOptInfo *rel)
  *
  * Returns a List of CTID qual RestrictInfos for the specified rel (with
  * implicit OR semantics across the list), or NIL if there are no usable
- * conditions.
+ * equality conditions.
  *
  * This function is just concerned with handling AND/OR recursion.
  */
@@ -301,6 +347,33 @@ TidQualFromRestrictInfoList(List *rlist, RelOptInfo *rel)
 	return rlst;
 }
 
+/*
+ * Extract a set of CTID range conditions from implicit-AND List of RestrictInfos
+ *
+ * Returns a List of CTID range qual RestrictInfos for the specified rel
+ * (with implicit AND semantics across the list), or NIL if there are no
+ * usable range conditions.
+ */
+static List *
+TidRangeQualFromRestrictInfoList(List *rlist, RelOptInfo *rel)
+{
+	List	   *rlst = NIL;
+	ListCell   *l;
+
+	if (!rel->has_scan_setlimits)
+		return NIL;
+
+	foreach(l, rlist)
+	{
+		RestrictInfo *rinfo = lfirst_node(RestrictInfo, l);
+
+		if (IsTidRangeClause(rinfo, rel))
+			rlst = lappend(rlst, rinfo);
+	}
+
+	return rlst;
+}
+
 /*
  * Given a list of join clauses involving our rel, create a parameterized
  * TidPath for each one that is a suitable TidEqual clause.
@@ -385,6 +458,7 @@ void
 create_tidscan_paths(PlannerInfo *root, RelOptInfo *rel)
 {
 	List	   *tidquals;
+	List	   *tidrangequals;
 
 	/*
 	 * If any suitable quals exist in the rel's baserestrict list, generate a
@@ -404,6 +478,26 @@ create_tidscan_paths(PlannerInfo *root, RelOptInfo *rel)
 												   required_outer));
 	}
 
+	/*
+	 * If there are range quals in the baserestrict list, generate a
+	 * TidRangePath.
+	 */
+	tidrangequals = TidRangeQualFromRestrictInfoList(rel->baserestrictinfo,
+													 rel);
+
+	if (tidrangequals)
+	{
+		/*
+		 * This path uses no join clauses, but it could still have required
+		 * parameterization due to LATERAL refs in its tlist.
+		 */
+		Relids		required_outer = rel->lateral_relids;
+
+		add_path(rel, (Path *) create_tidrangescan_path(root, rel,
+														tidrangequals,
+														required_outer));
+	}
+
 	/*
 	 * Try to generate parameterized TidPaths using equality clauses extracted
 	 * from EquivalenceClasses.  (This is important since simple "t1.ctid =
diff --git src/backend/optimizer/plan/createplan.c src/backend/optimizer/plan/createplan.c
index f7a8dae3c6..bdfee9cc61 100644
--- src/backend/optimizer/plan/createplan.c
+++ src/backend/optimizer/plan/createplan.c
@@ -129,6 +129,10 @@ static Plan *create_bitmap_subplan(PlannerInfo *root, Path *bitmapqual,
 static void bitmap_subplan_mark_shared(Plan *plan);
 static TidScan *create_tidscan_plan(PlannerInfo *root, TidPath *best_path,
 									List *tlist, List *scan_clauses);
+static TidRangeScan *create_tidrangescan_plan(PlannerInfo *root,
+											  TidRangePath *best_path,
+											  List *tlist,
+											  List *scan_clauses);
 static SubqueryScan *create_subqueryscan_plan(PlannerInfo *root,
 											  SubqueryScanPath *best_path,
 											  List *tlist, List *scan_clauses);
@@ -193,6 +197,8 @@ static BitmapHeapScan *make_bitmap_heapscan(List *qptlist,
 											Index scanrelid);
 static TidScan *make_tidscan(List *qptlist, List *qpqual, Index scanrelid,
 							 List *tidquals);
+static TidRangeScan *make_tidrangescan(List *qptlist, List *qpqual,
+									   Index scanrelid, List *tidrangequals);
 static SubqueryScan *make_subqueryscan(List *qptlist,
 									   List *qpqual,
 									   Index scanrelid,
@@ -384,6 +390,7 @@ create_plan_recurse(PlannerInfo *root, Path *best_path, int flags)
 		case T_IndexOnlyScan:
 		case T_BitmapHeapScan:
 		case T_TidScan:
+		case T_TidRangeScan:
 		case T_SubqueryScan:
 		case T_FunctionScan:
 		case T_TableFuncScan:
@@ -679,6 +686,13 @@ create_scan_plan(PlannerInfo *root, Path *best_path, int flags)
 												scan_clauses);
 			break;
 
+		case T_TidRangeScan:
+			plan = (Plan *) create_tidrangescan_plan(root,
+													 (TidRangePath *) best_path,
+													 tlist,
+													 scan_clauses);
+			break;
+
 		case T_SubqueryScan:
 			plan = (Plan *) create_subqueryscan_plan(root,
 													 (SubqueryScanPath *) best_path,
@@ -3440,6 +3454,71 @@ create_tidscan_plan(PlannerInfo *root, TidPath *best_path,
 	return scan_plan;
 }
 
+/*
+ * create_tidrangescan_plan
+ *	 Returns a tidrangescan plan for the base relation scanned by 'best_path'
+ *	 with restriction clauses 'scan_clauses' and targetlist 'tlist'.
+ */
+static TidRangeScan *
+create_tidrangescan_plan(PlannerInfo *root, TidRangePath *best_path,
+						 List *tlist, List *scan_clauses)
+{
+	TidRangeScan *scan_plan;
+	Index		scan_relid = best_path->path.parent->relid;
+	List	   *tidrangequals = best_path->tidrangequals;
+
+	/* it should be a base rel... */
+	Assert(scan_relid > 0);
+	Assert(best_path->path.parent->rtekind == RTE_RELATION);
+
+	/*
+	 * The qpqual list must contain all restrictions not enforced by the
+	 * tidrangequals list.  tidrangequals has AND semantics, so we can simply
+	 * remove any qual that appears in it.
+	 */
+	{
+		List	   *qpqual = NIL;
+		ListCell   *l;
+
+		foreach(l, scan_clauses)
+		{
+			RestrictInfo *rinfo = lfirst_node(RestrictInfo, l);
+
+			if (rinfo->pseudoconstant)
+				continue;		/* we may drop pseudoconstants here */
+			if (list_member_ptr(tidrangequals, rinfo))
+				continue;		/* simple duplicate */
+			qpqual = lappend(qpqual, rinfo);
+		}
+		scan_clauses = qpqual;
+	}
+
+	/* Sort clauses into best execution order */
+	scan_clauses = order_qual_clauses(root, scan_clauses);
+
+	/* Reduce RestrictInfo lists to bare expressions; ignore pseudoconstants */
+	tidrangequals = extract_actual_clauses(tidrangequals, false);
+	scan_clauses = extract_actual_clauses(scan_clauses, false);
+
+	/* Replace any outer-relation variables with nestloop params */
+	if (best_path->path.param_info)
+	{
+		tidrangequals = (List *)
+			replace_nestloop_params(root, (Node *) tidrangequals);
+		scan_clauses = (List *)
+			replace_nestloop_params(root, (Node *) scan_clauses);
+	}
+
+	scan_plan = make_tidrangescan(tlist,
+								  scan_clauses,
+								  scan_relid,
+								  tidrangequals);
+
+	copy_generic_path_info(&scan_plan->scan.plan, &best_path->path);
+
+	return scan_plan;
+}
+
 /*
  * create_subqueryscan_plan
  *	 Returns a subqueryscan plan for the base relation scanned by 'best_path'
@@ -5373,6 +5452,25 @@ make_tidscan(List *qptlist,
 	return node;
 }
 
+static TidRangeScan *
+make_tidrangescan(List *qptlist,
+				  List *qpqual,
+				  Index scanrelid,
+				  List *tidrangequals)
+{
+	TidRangeScan *node = makeNode(TidRangeScan);
+	Plan	   *plan = &node->scan.plan;
+
+	plan->targetlist = qptlist;
+	plan->qual = qpqual;
+	plan->lefttree = NULL;
+	plan->righttree = NULL;
+	node->scan.scanrelid = scanrelid;
+	node->tidrangequals = tidrangequals;
+
+	return node;
+}
+
 static SubqueryScan *
 make_subqueryscan(List *qptlist,
 				  List *qpqual,
diff --git src/backend/optimizer/plan/setrefs.c src/backend/optimizer/plan/setrefs.c
index 127ea3d856..7ce2d00b2b 100644
--- src/backend/optimizer/plan/setrefs.c
+++ src/backend/optimizer/plan/setrefs.c
@@ -619,6 +619,22 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 								  rtoffset, 1);
 			}
 			break;
+		case T_TidRangeScan:
+			{
+				TidRangeScan *splan = (TidRangeScan *) plan;
+
+				splan->scan.scanrelid += rtoffset;
+				splan->scan.plan.targetlist =
+					fix_scan_list(root, splan->scan.plan.targetlist,
+								  rtoffset, NUM_EXEC_TLIST(plan));
+				splan->scan.plan.qual =
+					fix_scan_list(root, splan->scan.plan.qual,
+								  rtoffset, NUM_EXEC_QUAL(plan));
+				splan->tidrangequals =
+					fix_scan_list(root, splan->tidrangequals,
+								  rtoffset, 1); /* v9_tid XXX Not sure this is right */
+			}
+			break;
 		case T_SubqueryScan:
 			/* Needs special treatment, see comments below */
 			return set_subqueryscan_references(root,
diff --git src/backend/optimizer/plan/subselect.c src/backend/optimizer/plan/subselect.c
index fcce81926b..094d5b50d0 100644
--- src/backend/optimizer/plan/subselect.c
+++ src/backend/optimizer/plan/subselect.c
@@ -2367,6 +2367,12 @@ finalize_plan(PlannerInfo *root, Plan *plan,
 			context.paramids = bms_add_members(context.paramids, scan_params);
 			break;
 
+		case T_TidRangeScan:
+			finalize_primnode((Node *) ((TidRangeScan *) plan)->tidrangequals,
+							  &context);
+			context.paramids = bms_add_members(context.paramids, scan_params);
+			break;
+
 		case T_SubqueryScan:
 			{
 				SubqueryScan *sscan = (SubqueryScan *) plan;
diff --git src/backend/optimizer/util/pathnode.c src/backend/optimizer/util/pathnode.c
index 51478957fb..e28d74afe9 100644
--- src/backend/optimizer/util/pathnode.c
+++ src/backend/optimizer/util/pathnode.c
@@ -1203,6 +1203,35 @@ create_tidscan_path(PlannerInfo *root, RelOptInfo *rel, List *tidquals,
 	return pathnode;
 }
 
+/*
+ * create_tidrangescan_path
+ *	  Creates a path corresponding to a scan by a range of TIDs, returning
+ *	  the pathnode.
+ */
+TidRangePath *
+create_tidrangescan_path(PlannerInfo *root, RelOptInfo *rel,
+						 List *tidrangequals, Relids required_outer)
+{
+	TidRangePath *pathnode = makeNode(TidRangePath);
+
+	pathnode->path.pathtype = T_TidRangeScan;
+	pathnode->path.parent = rel;
+	pathnode->path.pathtarget = rel->reltarget;
+	pathnode->path.param_info = get_baserel_parampathinfo(root, rel,
+														  required_outer);
+	pathnode->path.parallel_aware = false;
+	pathnode->path.parallel_safe = rel->consider_parallel;
+	pathnode->path.parallel_workers = 0;
+	pathnode->path.pathkeys = NIL;	/* always unordered */
+
+	pathnode->tidrangequals = tidrangequals;
+
+	cost_tidrangescan(&pathnode->path, root, rel, tidrangequals,
+					  pathnode->path.param_info);
+
+	return pathnode;
+}
+
 /*
  * create_append_path
  *	  Creates a path corresponding to an Append plan, returning the
diff --git src/backend/optimizer/util/plancat.c src/backend/optimizer/util/plancat.c
index daf1759623..4333f6c4c2 100644
--- src/backend/optimizer/util/plancat.c
+++ src/backend/optimizer/util/plancat.c
@@ -466,6 +466,10 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 	/* Collect info about relation's foreign keys, if relevant */
 	get_relation_foreign_keys(root, rel, relation, inhparent);
 
+	/* Collect info about functions implemented by the rel's table AM. */
+	rel->has_scan_setlimits = relation->rd_tableam &&
+							  relation->rd_tableam->scan_setlimits != NULL;
+
 	/*
 	 * Collect info about relation's partitioning scheme, if any. Only
 	 * inheritance parents may be partitioned.
diff --git src/backend/optimizer/util/relnode.c src/backend/optimizer/util/relnode.c
index 9c9a738c80..9536c238fb 100644
--- src/backend/optimizer/util/relnode.c
+++ src/backend/optimizer/util/relnode.c
@@ -247,6 +247,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
 	rel->baserestrict_min_security = UINT_MAX;
 	rel->joininfo = NIL;
 	rel->has_eclass_joins = false;
+	rel->has_scan_setlimits = false;
 	rel->consider_partitionwise_join = false;	/* might get changed later */
 	rel->part_scheme = NULL;
 	rel->nparts = -1;
@@ -659,6 +660,7 @@ build_join_rel(PlannerInfo *root,
 	joinrel->baserestrict_min_security = UINT_MAX;
 	joinrel->joininfo = NIL;
 	joinrel->has_eclass_joins = false;
+	joinrel->has_scan_setlimits = false;
 	joinrel->consider_partitionwise_join = false;	/* might get changed later */
 	joinrel->top_parent_relids = NULL;
 	joinrel->part_scheme = NULL;
@@ -836,6 +838,7 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
 	joinrel->baserestrictcost.per_tuple = 0;
 	joinrel->joininfo = NIL;
 	joinrel->has_eclass_joins = false;
+	joinrel->has_scan_setlimits = false;
 	joinrel->consider_partitionwise_join = false;	/* might get changed later */
 	joinrel->top_parent_relids = NULL;
 	joinrel->part_scheme = NULL;
diff --git src/test/regress/expected/tidrangescan.out src/test/regress/expected/tidrangescan.out
new file mode 100644
index 0000000000..fc11894c8e
--- /dev/null
+++ src/test/regress/expected/tidrangescan.out
@@ -0,0 +1,245 @@
+-- tests for tidrangescans
+SET enable_seqscan TO off;
+CREATE TABLE tidrangescan(id integer, data text);
+-- insert enough tuples to fill at least two pages
+INSERT INTO tidrangescan SELECT i,repeat('x', 100) FROM generate_series(1,200) AS s(i);
+-- remove all tuples after the 10th tuple on each page.  Trying to ensure
+-- we get the same layout with all CPU architectures and smaller than standard
+-- page sizes.
+DELETE FROM tidrangescan
+WHERE substring(ctid::text from ',(\d+)\)')::integer > 10 OR substring(ctid::text from '\((\d+),')::integer > 2;
+VACUUM tidrangescan;
+-- range scans with upper bound
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid < '(1,0)';
+            QUERY PLAN             
+-----------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: (ctid < '(1,0)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid < '(1,0)';
+  ctid  
+--------
+ (0,1)
+ (0,2)
+ (0,3)
+ (0,4)
+ (0,5)
+ (0,6)
+ (0,7)
+ (0,8)
+ (0,9)
+ (0,10)
+(10 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid <= '(1,5)';
+             QUERY PLAN             
+------------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: (ctid <= '(1,5)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid <= '(1,5)';
+  ctid  
+--------
+ (0,1)
+ (0,2)
+ (0,3)
+ (0,4)
+ (0,5)
+ (0,6)
+ (0,7)
+ (0,8)
+ (0,9)
+ (0,10)
+ (1,1)
+ (1,2)
+ (1,3)
+ (1,4)
+ (1,5)
+(15 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid < '(0,0)';
+            QUERY PLAN             
+-----------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: (ctid < '(0,0)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid < '(0,0)';
+ ctid 
+------
+(0 rows)
+
+-- range scans with lower bound
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid > '(2,8)';
+            QUERY PLAN             
+-----------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: (ctid > '(2,8)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid > '(2,8)';
+  ctid  
+--------
+ (2,9)
+ (2,10)
+(2 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE '(2,8)' < ctid;
+            QUERY PLAN             
+-----------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: ('(2,8)'::tid < ctid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE '(2,8)' < ctid;
+  ctid  
+--------
+ (2,9)
+ (2,10)
+(2 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid >= '(2,8)';
+             QUERY PLAN             
+------------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: (ctid >= '(2,8)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid >= '(2,8)';
+  ctid  
+--------
+ (2,8)
+ (2,9)
+ (2,10)
+(3 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid >= '(100,0)';
+              QUERY PLAN              
+--------------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: (ctid >= '(100,0)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid >= '(100,0)';
+ ctid 
+------
+(0 rows)
+
+-- range scans with both bounds
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid > '(1,4)' AND '(1,7)' >= ctid;
+                           QUERY PLAN                           
+----------------------------------------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: ((ctid > '(1,4)'::tid) AND ('(1,7)'::tid >= ctid))
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid > '(1,4)' AND '(1,7)' >= ctid;
+ ctid  
+-------
+ (1,5)
+ (1,6)
+ (1,7)
+(3 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE '(1,7)' >= ctid AND ctid > '(1,4)';
+                           QUERY PLAN                           
+----------------------------------------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: (('(1,7)'::tid >= ctid) AND (ctid > '(1,4)'::tid))
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE '(1,7)' >= ctid AND ctid > '(1,4)';
+ ctid  
+-------
+ (1,5)
+ (1,6)
+ (1,7)
+(3 rows)
+
+-- extreme offsets
+SELECT ctid FROM tidrangescan where ctid > '(0,65535)' AND ctid < '(1,0)' LIMIT 1;
+ ctid 
+------
+(0 rows)
+
+SELECT ctid FROM tidrangescan where ctid < '(0,0)' LIMIT 1;
+ ctid 
+------
+(0 rows)
+
+-- empty table
+CREATE TABLE tidrangescan_empty(id integer, data text);
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan_empty WHERE ctid < '(1, 0)';
+              QUERY PLAN              
+--------------------------------------
+ Tid Range Scan on tidrangescan_empty
+   TID Cond: (ctid < '(1,0)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan_empty WHERE ctid < '(1, 0)';
+ ctid 
+------
+(0 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan_empty WHERE ctid > '(9, 0)';
+              QUERY PLAN              
+--------------------------------------
+ Tid Range Scan on tidrangescan_empty
+   TID Cond: (ctid > '(9,0)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan_empty WHERE ctid > '(9, 0)';
+ ctid 
+------
+(0 rows)
+
+-- cursors
+BEGIN;
+DECLARE c SCROLL CURSOR FOR SELECT ctid FROM tidrangescan WHERE ctid < '(1,0)';
+FETCH NEXT c;
+ ctid  
+-------
+ (0,1)
+(1 row)
+
+FETCH NEXT c;
+ ctid  
+-------
+ (0,2)
+(1 row)
+
+FETCH PRIOR c;
+ ctid  
+-------
+ (0,1)
+(1 row)
+
+FETCH FIRST c;
+ ctid  
+-------
+ (0,1)
+(1 row)
+
+FETCH LAST c;
+  ctid  
+--------
+ (0,10)
+(1 row)
+
+COMMIT;
+DROP TABLE tidrangescan;
+DROP TABLE tidrangescan_empty;
+RESET enable_seqscan;
diff --git src/test/regress/parallel_schedule src/test/regress/parallel_schedule
index e0e1ef71dd..2b9763a869 100644
--- src/test/regress/parallel_schedule
+++ src/test/regress/parallel_schedule
@@ -80,7 +80,7 @@ test: brin gin gist spgist privileges init_privs security_label collate matview
 # ----------
 # Another group of parallel tests
 # ----------
-test: create_table_like alter_generic alter_operator misc async dbsize misc_functions sysviews tsrf tid tidscan collate.icu.utf8 incremental_sort
+test: create_table_like alter_generic alter_operator misc async dbsize misc_functions sysviews tsrf tid tidscan tidrangescan collate.icu.utf8 incremental_sort
 
 # rules cannot run concurrently with any test that creates
 # a view or rule in the public schema
diff --git src/test/regress/sql/tidrangescan.sql src/test/regress/sql/tidrangescan.sql
new file mode 100644
index 0000000000..d60439d56c
--- /dev/null
+++ src/test/regress/sql/tidrangescan.sql
@@ -0,0 +1,83 @@
+-- tests for tidrangescans
+
+SET enable_seqscan TO off;
+CREATE TABLE tidrangescan(id integer, data text);
+
+-- insert enough tuples to fill at least two pages
+INSERT INTO tidrangescan SELECT i,repeat('x', 100) FROM generate_series(1,200) AS s(i);
+
+-- remove all tuples after the 10th tuple on each page.  Trying to ensure
+-- we get the same layout with all CPU architectures and smaller than standard
+-- page sizes.
+DELETE FROM tidrangescan
+WHERE substring(ctid::text from ',(\d+)\)')::integer > 10 OR substring(ctid::text from '\((\d+),')::integer > 2;
+VACUUM tidrangescan;
+
+-- range scans with upper bound
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid < '(1,0)';
+SELECT ctid FROM tidrangescan WHERE ctid < '(1,0)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid <= '(1,5)';
+SELECT ctid FROM tidrangescan WHERE ctid <= '(1,5)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid < '(0,0)';
+SELECT ctid FROM tidrangescan WHERE ctid < '(0,0)';
+
+-- range scans with lower bound
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid > '(2,8)';
+SELECT ctid FROM tidrangescan WHERE ctid > '(2,8)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE '(2,8)' < ctid;
+SELECT ctid FROM tidrangescan WHERE '(2,8)' < ctid;
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid >= '(2,8)';
+SELECT ctid FROM tidrangescan WHERE ctid >= '(2,8)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid >= '(100,0)';
+SELECT ctid FROM tidrangescan WHERE ctid >= '(100,0)';
+
+-- range scans with both bounds
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid > '(1,4)' AND '(1,7)' >= ctid;
+SELECT ctid FROM tidrangescan WHERE ctid > '(1,4)' AND '(1,7)' >= ctid;
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE '(1,7)' >= ctid AND ctid > '(1,4)';
+SELECT ctid FROM tidrangescan WHERE '(1,7)' >= ctid AND ctid > '(1,4)';
+
+-- extreme offsets
+SELECT ctid FROM tidrangescan where ctid > '(0,65535)' AND ctid < '(1,0)' LIMIT 1;
+SELECT ctid FROM tidrangescan where ctid < '(0,0)' LIMIT 1;
+
+-- empty table
+CREATE TABLE tidrangescan_empty(id integer, data text);
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan_empty WHERE ctid < '(1, 0)';
+SELECT ctid FROM tidrangescan_empty WHERE ctid < '(1, 0)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan_empty WHERE ctid > '(9, 0)';
+SELECT ctid FROM tidrangescan_empty WHERE ctid > '(9, 0)';
+
+-- cursors
+BEGIN;
+DECLARE c SCROLL CURSOR FOR SELECT ctid FROM tidrangescan WHERE ctid < '(1,0)';
+FETCH NEXT c;
+FETCH NEXT c;
+FETCH PRIOR c;
+FETCH FIRST c;
+FETCH LAST c;
+COMMIT;
+
+DROP TABLE tidrangescan;
+DROP TABLE tidrangescan_empty;
+
+RESET enable_seqscan;
diff --git src/tools/pgindent/typedefs.list src/tools/pgindent/typedefs.list
index 9cd047ba25..f12d60debf 100644
--- src/tools/pgindent/typedefs.list
+++ src/tools/pgindent/typedefs.list
@@ -2526,8 +2526,13 @@ TextPositionState
 TheLexeme
 TheSubstitute
 TidExpr
+TidExprType
 TidHashKey
+TidOpExpr
 TidPath
+TidRangePath
+TidRangeScan
+TidRangeScanState
 TidScan
 TidScanState
 TimeADT

--------------2.29.2--

#80

Edmund Horner

ejrh00@gmail.com

almost 5 years ago

In reply to: David Fetter (#79)

Re: Tid scan improvements

On Fri, 1 Jan 2021 at 14:30, David Fetter <david@fetter.org> wrote:

On Sun, Dec 01, 2019 at 11:34:16AM +0900, Michael Paquier wrote:

Okay, still nothing has happened after two months. Once this is
solved a new patch submission could be done. For now I have marked
the entry as returned with feedback.

I dusted off the last version of the patch without implementing the
suggestions in this thread between here and there.

I think this is a capability worth having, as I was surprised when it
turned out that it didn't exist when I was looking to make an
improvement to pg_dump. My idea, which I'll get back to when this is
in, was to use special knowledge of heap AM tables to make it possible
to parallelize dumps of large tables by working separately on each
underlying file, of which there could be quite a few for a large one.

Will try to understand the suggestions upthread better and implement
same.

Hi David,

Thanks for updating the patch. I'd be very happy if this got picked up
again, and I'd certainly follow along and do some review.

+                               splan->tidrangequals =
+                                       fix_scan_list(root, splan->tidrangequals,
+                                                                 rtoffset, 1); /* v9_tid XXX Not sure this is right */

I'm pretty sure the parameter num_exec = 1 is fine; it seems to only affect
correlated subselects, which shouldn't really make their way into the
tidrangequals expressions. It's more or less the same situation as
tidquals for TidPath, anyway. We could put a little comment: /*
correlated subselects shouldn't get into tidquals/tidrangequals */ or
something to that effect.

Edmund

#81

David Rowley

dgrowleyml@gmail.com

almost 5 years ago

In reply to: Edmund Horner (#80)

2 attachment(s)

Re: Tid scan improvements

On Wed, 13 Jan 2021 at 15:38, Edmund Horner <ejrh00@gmail.com> wrote:

Thanks for updating the patch. I'd be very happy if this got picked up again, and I'd certainly follow along and do some review.

Likewise here. I think this patch was pretty close, so it seems a shame
to let it slip through the cracks.

I spoke to Andres off-list about this patch. He mentioned that he
wasn't too keen on seeing the setscanlimits being baked into the table
AM API. He mentioned that he'd rather not assume too much about table
AMs having all of their tids in a given range consecutively on a set
of pages. That seems reasonable to me. He suggested that we add a
new callback that just accepts a range of ItemPointers to scan and
leaves it up to the implementation to decide which pages should be
scanned to fetch the tuples within the given range. I didn't argue. I
just went and coded it all, hopefully to Andres' description. The new
table AM callback is optional.
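
To make the shape of that idea concrete, here is a minimal, self-contained
C sketch of an optional range callback. It is not the code from the
attached patch: the Toy* types and toy_* functions are hypothetical
stand-ins, and the toy "table" is just an array of TIDs sorted in
ascending order. The point is only that the caller passes a minimum and
maximum TID, the implementation decides how to find the matching tuples,
and a NULL function pointer means the AM does not support range scans.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for an ItemPointer (block, offset); not the PostgreSQL type. */
typedef struct ToyTid
{
	uint32_t	block;
	uint16_t	offset;
} ToyTid;

/* Toy scan state: walks a TID array that is sorted in ascending order. */
typedef struct ToyScan
{
	const ToyTid *tids;
	size_t		ntids;
	size_t		next;
} ToyScan;

/* Hypothetical AM routine table; the range callback is allowed to be NULL. */
typedef struct ToyAmRoutine
{
	bool		(*getnext) (ToyScan *scan, ToyTid *out);
	bool		(*getnext_inrange) (ToyScan *scan, ToyTid *out,
									const ToyTid *mintid,
									const ToyTid *maxtid);
} ToyAmRoutine;

/* Returns <0, 0 or >0, ordering by block first and then by offset. */
static int
toy_tid_compare(const ToyTid *a, const ToyTid *b)
{
	if (a->block != b->block)
		return (a->block < b->block) ? -1 : 1;
	if (a->offset != b->offset)
		return (a->offset < b->offset) ? -1 : 1;
	return 0;
}

static bool
toy_getnext(ToyScan *scan, ToyTid *out)
{
	if (scan->next >= scan->ntids)
		return false;
	*out = scan->tids[scan->next++];
	return true;
}

/* Forward-only range fetch; how tuples are located is the AM's business. */
static bool
toy_getnext_inrange(ToyScan *scan, ToyTid *out,
					const ToyTid *mintid, const ToyTid *maxtid)
{
	while (toy_getnext(scan, out))
	{
		if (toy_tid_compare(out, mintid) < 0)
			continue;			/* below the range, keep looking */
		if (toy_tid_compare(out, maxtid) > 0)
			return false;		/* sorted input: nothing later can match */
		return true;
	}
	return false;
}

/* Caller-side check: a range scan is only possible if the callback is set. */
static bool
toy_am_supports_tid_range(const ToyAmRoutine *am)
{
	return am->getnext_inrange != NULL;
}

/* Count the tuples in [mintid, maxtid], or return -1 if unsupported. */
static int
toy_count_inrange(const ToyAmRoutine *am, ToyScan *scan,
				  const ToyTid *mintid, const ToyTid *maxtid)
{
	ToyTid		tid;
	int			count = 0;

	assert(am != NULL && scan != NULL);
	if (!toy_am_supports_tid_range(am))
		return -1;
	while (am->getnext_inrange(scan, &tid, mintid, maxtid))
		count++;
	return count;
}
```

An AM that leaves getnext_inrange unset simply never gets asked for a
range scan, which is the sense in which the callback is optional.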

I've implemented this in the attached.

I also took the time to support backwards TID Range scans and added a
few more tests to test rescans. I just found it annoying that TID
Scans supported backwards scans and TID Range scans did not.

The 0002 patch is the guts of it. The 0001 patch fixes an existing bug
that needs to be addressed before 0002 can go in (backwards TID Range
Scans are broken without it). I've posted separately about this bug
in [1].
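
As a rough illustration of the start-page arithmetic involved, the sketch
below compares the pre-patch rule with the one used in the 0001 patch.
The function and type names are made up for this example, and NO_LIMIT
stands in for InvalidBlockNumber; only the two formulas are taken from
the patch.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t BlockNum;

/* Plays the role of InvalidBlockNumber: no scan limit has been set. */
#define NO_LIMIT ((BlockNum) 0xFFFFFFFF)

/* Pre-patch rule: the first page of a backward scan ignores any limit. */
static BlockNum
backward_start_old(BlockNum startblock, BlockNum numblocks, BlockNum nblocks)
{
	(void) numblocks;			/* the old rule never looked at the limit */
	assert(nblocks > 0);
	return (startblock > 0) ? startblock - 1 : nblocks - 1;
}

/* Patched rule: start on the last page of the limited range, if any. */
static BlockNum
backward_start_new(BlockNum startblock, BlockNum numblocks, BlockNum nblocks)
{
	assert(nblocks > 0);
	if (numblocks != NO_LIMIT)
		return (startblock + numblocks - 1) % nblocks;
	return (startblock + nblocks - 1) % nblocks;
}
```

For a 10-block relation limited to blocks 0..2 (startblock 0, numblocks
3), the old rule starts the backward scan on block 9 while the new rule
starts on block 2. With no limit set, both rules agree.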

I also didn't really like the idea of adding possibly lots of bool
fields to RelOptInfo to describe what the planner can do in regards to
what the given table AM supports. I know that IndexOptInfo has such
a set of bool fields. I'd rather not repeat that, so I just went with
a single int field named "amflags" and just added a single constant to
define a flag that specifies if the rel supports scanning ranges of
TIDs.
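
A minimal sketch of the single-flags-field idea, assuming a hypothetical
flag name and a toy struct rather than the real RelOptInfo. The names
TOY_AMFLAG_TID_RANGE, ToyRelOptInfo and the toy_* helpers are invented
for illustration, and the field width is a guess; only the field name
"amflags" comes from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bit: the rel's table AM can scan ranges of TIDs. */
#define TOY_AMFLAG_TID_RANGE (1u << 0)

/* Toy stand-in for the planner's per-relation info. */
typedef struct ToyRelOptInfo
{
	uint32_t	amflags;		/* bitmask of table AM capabilities */
} ToyRelOptInfo;

/* Fill in the flags once, when the relation's info is built. */
static void
toy_set_amflags(ToyRelOptInfo *rel, bool has_range_callback)
{
	rel->amflags = 0;
	if (has_range_callback)
		rel->amflags |= TOY_AMFLAG_TID_RANGE;
}

/* Planner-side test before considering a TID Range path. */
static bool
toy_rel_supports_tid_range(const ToyRelOptInfo *rel)
{
	return (rel->amflags & TOY_AMFLAG_TID_RANGE) != 0;
}
```

Further capabilities would each take another bit in the same field
instead of another bool in the struct.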

Edmund, will you get time to have a look at this?

David

[1]: /messages/by-id/CAApHDvpGc9h0_oVD2CtgBcxCS1N-qDYZSeBRnUh+0CWJA9cMaA@mail.gmail.com

Attachments:

v11-0001-Fix-hypothetical-bug-in-heap-backward-scans.patch (text/plain; charset=US-ASCII)

From 3a0469df93690889793823fdec235f72c8fb81d7 Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Thu, 21 Jan 2021 16:27:25 +1300
Subject: [PATCH v11 1/2] Fix hypothetical bug in heap backward scans

Both heapgettup() and heapgettup_pagemode() incorrectly set the first page
to scan in a backward scan in which the pages to scan were specified by
heap_setscanlimits(). In theory, this could result in the incorrect pages
being scanned.  In practice, nowhere in core code performs backward scans
after having used heap_setscanlimits().  However, it's possible an
extension uses the heap functions in this way.

For the bug to manifest, the scan must be limited to fewer than the number
of pages in the relation and start at page 0.  The scan will start on the
final page in the table rather than the final page in the range of pages
to scan.  The correct number of pages is always scanned; it's just that
the wrong pages can be scanned.

This is a precursor fix to a future patch which allows TID Range scans to
scan a subset of a heap table.

Proper adjustment of the heap scan code seems to have been missed when
heap_setscanlimits() was added in 7516f5259.

Author: David Rowley
Discussion: https://postgr.es/m/CAApHDvpGc9h0_oVD2CtgBcxCS1N-qDYZSeBRnUh+0CWJA9cMaA@mail.gmail.com
---
 src/backend/access/heap/heapam.c | 24 ++++++++++++++++--------
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index faffbb1865..ddd214b7af 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -603,11 +603,15 @@ heapgettup(HeapScanDesc scan,
 			 * forward scanners.
 			 */
 			scan->rs_base.rs_flags &= ~SO_ALLOW_SYNC;
-			/* start from last page of the scan */
-			if (scan->rs_startblock > 0)
-				page = scan->rs_startblock - 1;
+
+			/*
+			 * Start from last page of the scan.  Ensure we take into account
+			 * rs_numblocks if it's been adjusted by heap_setscanlimits().
+			 */
+			if (scan->rs_numblocks != InvalidBlockNumber)
+				page = (scan->rs_startblock + scan->rs_numblocks - 1) % scan->rs_nblocks;
 			else
-				page = scan->rs_nblocks - 1;
+				page = (scan->rs_startblock + scan->rs_nblocks - 1) % scan->rs_nblocks;
 			heapgetpage((TableScanDesc) scan, page);
 		}
 		else
@@ -918,11 +922,15 @@ heapgettup_pagemode(HeapScanDesc scan,
 			 * forward scanners.
 			 */
 			scan->rs_base.rs_flags &= ~SO_ALLOW_SYNC;
-			/* start from last page of the scan */
-			if (scan->rs_startblock > 0)
-				page = scan->rs_startblock - 1;
+
+			/*
+			 * Start from last page of the scan.  Ensure we take into account
+			 * rs_numblocks if it's been adjusted by heap_setscanlimits().
+			 */
+			if (scan->rs_numblocks != InvalidBlockNumber)
+				page = (scan->rs_startblock + scan->rs_numblocks - 1) % scan->rs_nblocks;
 			else
-				page = scan->rs_nblocks - 1;
+				page = (scan->rs_startblock + scan->rs_nblocks - 1) % scan->rs_nblocks;
 			heapgetpage((TableScanDesc) scan, page);
 		}
 		else
-- 
2.27.0

v11-0002-Add-TID-Range-Scans-to-support-efficient-scannin.patch (text/plain; charset=US-ASCII)

From 54654fe357c8370cf2a43de152b533122053c130 Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Thu, 21 Jan 2021 16:48:15 +1300
Subject: [PATCH v11 2/2] Add TID Range Scans to support efficient scanning
 ranges of TIDs

This adds a new node type named TID Range Scan.  The query planner will
generate paths for TID Range scans when quals are discovered on base
relations which search for ranges of ctid.  These ranges may be open at
either end.

To support this, a new optional callback function has been added to table
AM which is named scan_getnextslot_inrange.  This function accepts a
minimum and maximum ItemPointer to allow efficient retrieval of tuples
within this range.  Table AMs where scanning ranges of TIDs does not make
sense or is difficult to implement efficiently may choose to not implement
this function.

Author: Edmund Horner and David Rowley
Discussion: https://postgr.es/m/CAMyN-kB-nFTkF=VA_JPwFNo08S0d-Yk0F741S2B7LDmYAi8eyA@mail.gmail.com
---
 src/backend/access/heap/heapam.c           | 132 +++++++
 src/backend/access/heap/heapam_handler.c   |   1 +
 src/backend/commands/explain.c             |  23 ++
 src/backend/executor/Makefile              |   1 +
 src/backend/executor/execAmi.c             |   6 +
 src/backend/executor/execProcnode.c        |  10 +
 src/backend/executor/nodeTidrangescan.c    | 409 +++++++++++++++++++++
 src/backend/nodes/copyfuncs.c              |  24 ++
 src/backend/nodes/outfuncs.c               |  13 +
 src/backend/optimizer/README               |   1 +
 src/backend/optimizer/path/costsize.c      |  95 +++++
 src/backend/optimizer/path/tidpath.c       | 117 +++++-
 src/backend/optimizer/plan/createplan.c    |  98 +++++
 src/backend/optimizer/plan/setrefs.c       |  16 +
 src/backend/optimizer/plan/subselect.c     |   6 +
 src/backend/optimizer/util/pathnode.c      |  29 ++
 src/backend/optimizer/util/plancat.c       |   4 +
 src/backend/optimizer/util/relnode.c       |   3 +
 src/backend/storage/page/itemptr.c         |  58 +++
 src/include/access/heapam.h                |   4 +
 src/include/access/tableam.h               |  50 +++
 src/include/catalog/pg_operator.dat        |   6 +-
 src/include/executor/nodeTidrangescan.h    |  23 ++
 src/include/nodes/execnodes.h              |  18 +
 src/include/nodes/nodes.h                  |   3 +
 src/include/nodes/pathnodes.h              |  18 +
 src/include/nodes/plannodes.h              |  13 +
 src/include/optimizer/cost.h               |   3 +
 src/include/optimizer/pathnode.h           |   4 +
 src/include/storage/itemptr.h              |   2 +
 src/test/regress/expected/tidrangescan.out | 302 +++++++++++++++
 src/test/regress/parallel_schedule         |   2 +-
 src/test/regress/sql/tidrangescan.sql      | 104 ++++++
 src/tools/pgindent/typedefs.list           |   5 +
 34 files changed, 1588 insertions(+), 15 deletions(-)
 create mode 100644 src/backend/executor/nodeTidrangescan.c
 create mode 100644 src/include/executor/nodeTidrangescan.h
 create mode 100644 src/test/regress/expected/tidrangescan.out
 create mode 100644 src/test/regress/sql/tidrangescan.sql

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index ddd214b7af..4828cdd1e2 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -1387,6 +1387,138 @@ heap_getnextslot(TableScanDesc sscan, ScanDirection direction, TupleTableSlot *s
 	return true;
 }
 
+bool
+heap_getnextslot_inrange(TableScanDesc sscan, ScanDirection direction,
+						 TupleTableSlot *slot, ItemPointer mintid,
+						 ItemPointer maxtid)
+{
+	HeapScanDesc scan = (HeapScanDesc) sscan;
+
+	if (!scan->rs_inited)
+	{
+		BlockNumber startBlk;
+		BlockNumber numBlks;
+		ItemPointerData highestItem;
+		ItemPointerData lowestItem;
+
+		/* A relation with zero blocks won't have any tuples */
+		if (scan->rs_nblocks == 0)
+			return false;
+
+		/*
+		 * Set up some ItemPointers which point to the first and last possible
+		 * tuples in the heap.
+		 */
+		ItemPointerSet(&highestItem, scan->rs_nblocks - 1, MaxOffsetNumber);
+		ItemPointerSet(&lowestItem, 0, FirstOffsetNumber);
+
+		/*
+		 * If the given maximum TID is below the highest possible TID in the
+		 * relation, then restrict the range to that, otherwise we scan to the
+		 * end of the relation.
+		 */
+		if (ItemPointerCompare(maxtid, &highestItem) < 0)
+			ItemPointerCopy(maxtid, &highestItem);
+
+		/*
+		 * If the given minimum TID is above the lowest possible TID in the
+		 * relation, then restrict the range to only scan for TIDs above that.
+		 */
+		if (ItemPointerCompare(mintid, &lowestItem) > 0)
+			ItemPointerCopy(mintid, &lowestItem);
+
+		/*
+		 * Check for an empty range and protect from would be negative results
+		 * from the numBlks to scan calculation below.
+		 */
+		if (ItemPointerCompare(&highestItem, &lowestItem) < 0)
+			return false;
+
+		/*
+		 * Calculate the first block and the number of blocks we must scan.
+		 * We could be more aggressive here and perform some more validation
+		 * to try and further narrow the scope of blocks to scan by checking
+		 * if the lowestItem has an offset above MaxOffsetNumber.  In this
+		 * case, we could advance startBlk by one.  Likewise if highestItem
+		 * has an offset of 0 we could scan one fewer blocks.  However, such
+		 * an optimization does not seem worth troubling over, currently.
+		 */
+		startBlk = ItemPointerGetBlockNumberNoCheck(&lowestItem);
+
+		numBlks = ItemPointerGetBlockNumberNoCheck(&highestItem) -
+				  ItemPointerGetBlockNumberNoCheck(&lowestItem) + 1;
+
+		/* Set the start block and number of blocks to scan */
+		heap_setscanlimits(sscan, startBlk, numBlks);
+	}
+
+	/* Note: no locking manipulations needed */
+	for (;;)
+	{
+
+		if (sscan->rs_flags & SO_ALLOW_PAGEMODE)
+			heapgettup_pagemode(scan, direction, sscan->rs_nkeys, sscan->rs_key);
+		else
+			heapgettup(scan, direction, sscan->rs_nkeys, sscan->rs_key);
+
+		if (scan->rs_ctup.t_data == NULL)
+		{
+			ExecClearTuple(slot);
+			return false;
+		}
+
+		/*
+		 * We've used heap_setscanlimits above so we only look at pages that
+		 * are likely to contain tuples we're interested in.  We must still
+		 * filter out tuples in the first page that are less than mintid.
+		 */
+		if (ItemPointerCompare(&scan->rs_ctup.t_self, mintid) < 0)
+		{
+			ExecClearTuple(slot);
+
+			/*
+			 * When scanning backwards, the TIDs will be in descending order.
+			 * Future tuples in this direction will be lower still, so we can
+			 * just return false to indicate there will be no more tuples.
+			 */
+			if (ScanDirectionIsBackward(direction))
+				return false;
+
+			continue;
+		}
+
+		/*
+		 * Likewise for the final page, we must filter out tids greater than
+		 * maxtid.
+		 */
+		if (ItemPointerCompare(&scan->rs_ctup.t_self, maxtid) > 0)
+		{
+			ExecClearTuple(slot);
+
+			/*
+			 * When scanning forward, the TIDs will be in ascending order.
+			 * Future tuples in this direction will be higher still, so we can
+			 * just return false to indicate there will be no more tuples.
+			 */
+			if (ScanDirectionIsForward(direction))
+				return false;
+			continue;
+		}
+
+		break;
+	}
+
+	/*
+	 * if we get here it means we have a new current scan tuple, so point to
+	 * the proper return buffer and return the tuple.
+	 */
+
+	pgstat_count_heap_getnext(scan->rs_base.rs_rd);
+
+	ExecStoreBufferHeapTuple(&scan->rs_ctup, slot, scan->rs_cbuf);
+	return true;
+}
+
 /*
  *	heap_fetch		- retrieve tuple with given tid
  *
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 4a70e20a14..f8bbcaf448 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -2541,6 +2541,7 @@ static const TableAmRoutine heapam_methods = {
 	.scan_end = heap_endscan,
 	.scan_rescan = heap_rescan,
 	.scan_getnextslot = heap_getnextslot,
+	.scan_getnextslot_inrange = heap_getnextslot_inrange,
 
 	.parallelscan_estimate = table_block_parallelscan_estimate,
 	.parallelscan_initialize = table_block_parallelscan_initialize,
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 5d7eb3574c..3f2ebd3b72 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -1057,6 +1057,7 @@ ExplainPreScanNode(PlanState *planstate, Bitmapset **rels_used)
 		case T_IndexOnlyScan:
 		case T_BitmapHeapScan:
 		case T_TidScan:
+		case T_TidRangeScan:
 		case T_SubqueryScan:
 		case T_FunctionScan:
 		case T_TableFuncScan:
@@ -1223,6 +1224,9 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_TidScan:
 			pname = sname = "Tid Scan";
 			break;
+		case T_TidRangeScan:
+			pname = sname = "Tid Range Scan";
+			break;
 		case T_SubqueryScan:
 			pname = sname = "Subquery Scan";
 			break;
@@ -1417,6 +1421,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_SampleScan:
 		case T_BitmapHeapScan:
 		case T_TidScan:
+		case T_TidRangeScan:
 		case T_SubqueryScan:
 		case T_FunctionScan:
 		case T_TableFuncScan:
@@ -1871,6 +1876,23 @@ ExplainNode(PlanState *planstate, List *ancestors,
 											   planstate, es);
 			}
 			break;
+		case T_TidRangeScan:
+			{
+				/*
+				 * The tidrangequals list has AND semantics, so be sure to
+				 * show it as an AND condition.
+				 */
+				List	   *tidquals = ((TidRangeScan *) plan)->tidrangequals;
+
+				if (list_length(tidquals) > 1)
+					tidquals = list_make1(make_andclause(tidquals));
+				show_scan_qual(tidquals, "TID Cond", planstate, ancestors, es);
+				show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
+				if (plan->qual)
+					show_instrumentation_count("Rows Removed by Filter", 1,
+											   planstate, es);
+			}
+			break;
 		case T_ForeignScan:
 			show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
 			if (plan->qual)
@@ -3558,6 +3580,7 @@ ExplainTargetRel(Plan *plan, Index rti, ExplainState *es)
 		case T_IndexOnlyScan:
 		case T_BitmapHeapScan:
 		case T_TidScan:
+		case T_TidRangeScan:
 		case T_ForeignScan:
 		case T_CustomScan:
 		case T_ModifyTable:
diff --git a/src/backend/executor/Makefile b/src/backend/executor/Makefile
index f990c6473a..74ac59faa1 100644
--- a/src/backend/executor/Makefile
+++ b/src/backend/executor/Makefile
@@ -67,6 +67,7 @@ OBJS = \
 	nodeSubplan.o \
 	nodeSubqueryscan.o \
 	nodeTableFuncscan.o \
+	nodeTidrangescan.o \
 	nodeTidscan.o \
 	nodeUnique.o \
 	nodeValuesscan.o \
diff --git a/src/backend/executor/execAmi.c b/src/backend/executor/execAmi.c
index 23bdb53cd1..4543ac79ed 100644
--- a/src/backend/executor/execAmi.c
+++ b/src/backend/executor/execAmi.c
@@ -51,6 +51,7 @@
 #include "executor/nodeSubplan.h"
 #include "executor/nodeSubqueryscan.h"
 #include "executor/nodeTableFuncscan.h"
+#include "executor/nodeTidrangescan.h"
 #include "executor/nodeTidscan.h"
 #include "executor/nodeUnique.h"
 #include "executor/nodeValuesscan.h"
@@ -197,6 +198,10 @@ ExecReScan(PlanState *node)
 			ExecReScanTidScan((TidScanState *) node);
 			break;
 
+		case T_TidRangeScanState:
+			ExecReScanTidRangeScan((TidRangeScanState *) node);
+			break;
+
 		case T_SubqueryScanState:
 			ExecReScanSubqueryScan((SubqueryScanState *) node);
 			break;
@@ -562,6 +567,7 @@ ExecSupportsBackwardScan(Plan *node)
 
 		case T_SeqScan:
 		case T_TidScan:
+		case T_TidRangeScan:
 		case T_FunctionScan:
 		case T_ValuesScan:
 		case T_CteScan:
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index 414df50a05..29766d8196 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -109,6 +109,7 @@
 #include "executor/nodeSubplan.h"
 #include "executor/nodeSubqueryscan.h"
 #include "executor/nodeTableFuncscan.h"
+#include "executor/nodeTidrangescan.h"
 #include "executor/nodeTidscan.h"
 #include "executor/nodeUnique.h"
 #include "executor/nodeValuesscan.h"
@@ -238,6 +239,11 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
 												   estate, eflags);
 			break;
 
+		case T_TidRangeScan:
+			result = (PlanState *) ExecInitTidRangeScan((TidRangeScan *) node,
+														estate, eflags);
+			break;
+
 		case T_SubqueryScan:
 			result = (PlanState *) ExecInitSubqueryScan((SubqueryScan *) node,
 														estate, eflags);
@@ -637,6 +643,10 @@ ExecEndNode(PlanState *node)
 			ExecEndTidScan((TidScanState *) node);
 			break;
 
+		case T_TidRangeScanState:
+			ExecEndTidRangeScan((TidRangeScanState *) node);
+			break;
+
 		case T_SubqueryScanState:
 			ExecEndSubqueryScan((SubqueryScanState *) node);
 			break;
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
new file mode 100644
index 0000000000..e2a92754da
--- /dev/null
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -0,0 +1,409 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeTidrangescan.c
+ *	  Routines to support tid range scans of relations
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/executor/nodeTidrangescan.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/relscan.h"
+#include "access/sysattr.h"
+#include "access/tableam.h"
+#include "catalog/pg_operator.h"
+#include "executor/execdebug.h"
+#include "executor/nodeTidrangescan.h"
+#include "nodes/nodeFuncs.h"
+#include "storage/bufmgr.h"
+#include "utils/rel.h"
+
+
+#define IsCTIDVar(node)  \
+	((node) != NULL && \
+	 IsA((node), Var) && \
+	 ((Var *) (node))->varattno == SelfItemPointerAttributeNumber && \
+	 ((Var *) (node))->varlevelsup == 0)
+
+typedef enum
+{
+	TIDEXPR_UPPER_BOUND,
+	TIDEXPR_LOWER_BOUND
+} TidExprType;
+
+/* Upper or lower range bound for scan */
+typedef struct TidOpExpr
+{
+	TidExprType exprtype;		/* type of op; lower or upper */
+	ExprState  *exprstate;		/* ExprState for a TID-yielding subexpr */
+	bool		inclusive;		/* whether op is inclusive */
+} TidOpExpr;
+
+/*
+ * For the given 'expr', build and return an appropriate TidOpExpr taking into
+ * account the expr's operator and operand order.
+ */
+static TidOpExpr *
+MakeTidOpExpr(OpExpr *expr, TidRangeScanState *tidstate)
+{
+	Node	   *arg1 = get_leftop((Expr *) expr);
+	Node	   *arg2 = get_rightop((Expr *) expr);
+	ExprState  *exprstate = NULL;
+	bool		invert = false;
+	TidOpExpr  *tidopexpr;
+
+	if (IsCTIDVar(arg1))
+		exprstate = ExecInitExpr((Expr *) arg2, &tidstate->ss.ps);
+	else if (IsCTIDVar(arg2))
+	{
+		exprstate = ExecInitExpr((Expr *) arg1, &tidstate->ss.ps);
+		invert = true;
+	}
+	else
+		elog(ERROR, "could not identify CTID variable");
+
+	tidopexpr = (TidOpExpr *) palloc(sizeof(TidOpExpr));
+	tidopexpr->inclusive = false;		/* for now */
+
+	switch (expr->opno)
+	{
+		case TIDLessEqOperator:
+			tidopexpr->inclusive = true;
+			/* fall through */
+		case TIDLessOperator:
+			tidopexpr->exprtype = invert ? TIDEXPR_LOWER_BOUND : TIDEXPR_UPPER_BOUND;
+			break;
+		case TIDGreaterEqOperator:
+			tidopexpr->inclusive = true;
+			/* fall through */
+		case TIDGreaterOperator:
+			tidopexpr->exprtype = invert ? TIDEXPR_UPPER_BOUND : TIDEXPR_LOWER_BOUND;
+			break;
+		default:
+			elog(ERROR, "could not identify CTID operator");
+	}
+
+	tidopexpr->exprstate = exprstate;
+
+	return tidopexpr;
+}
+
+/*
+ * Extract the qual subexpressions that yield TIDs to search for,
+ * and compile them into ExprStates if they're ordinary expressions.
+ */
+static void
+TidExprListCreate(TidRangeScanState *tidrangestate)
+{
+	TidRangeScan *node = (TidRangeScan *) tidrangestate->ss.ps.plan;
+	List	   *tidexprs = NIL;
+	ListCell   *l;
+
+	foreach(l, node->tidrangequals)
+	{
+		OpExpr	   *opexpr = lfirst(l);
+		TidOpExpr  *tidopexpr;
+
+		if (!IsA(opexpr, OpExpr))
+			elog(ERROR, "could not identify CTID expression");
+
+		tidopexpr = MakeTidOpExpr(opexpr, tidrangestate);
+		tidexprs = lappend(tidexprs, tidopexpr);
+	}
+
+	tidrangestate->trss_tidexprs = tidexprs;
+}
+
+/* ----------------------------------------------------------------
+ *		TidRangeEval
+ *
+ *		Compute and set node's block and offset range to scan by evaluating
+ *		the trss_tidexprs.  Returns false if we detect the range cannot
+ *		contain any tuples.  Returns true if it's possible for the range to
+ *		contain tuples.
+ * ----------------------------------------------------------------
+ */
+static bool
+TidRangeEval(TidRangeScanState *node)
+{
+	ExprContext *econtext = node->ss.ps.ps_ExprContext;
+	ItemPointerData lowerBound;
+	ItemPointerData upperBound;
+	ListCell   *l;
+
+	/*
+	 * Set the upper and lower bounds to the absolute limits of the range of
+	 * the ItemPointer type.  Below we'll try to narrow this range on either
+	 * side by looking at the TidOpExprs.
+	 */
+	ItemPointerSet(&lowerBound, 0, 0);
+	ItemPointerSet(&upperBound, InvalidBlockNumber, PG_UINT16_MAX);
+
+	foreach(l, node->trss_tidexprs)
+	{
+		TidOpExpr  *tidopexpr = (TidOpExpr *) lfirst(l);
+		ItemPointer itemptr;
+		bool		isNull;
+
+		/* Evaluate this bound. */
+		itemptr = (ItemPointer)
+			DatumGetPointer(ExecEvalExprSwitchContext(tidopexpr->exprstate,
+													  econtext,
+													  &isNull));
+
+		/* If the bound is NULL, *nothing* matches the qual. */
+		if (isNull)
+			return false;
+
+		if (tidopexpr->exprtype == TIDEXPR_LOWER_BOUND)
+		{
+			ItemPointerData lb;
+
+			ItemPointerCopy(itemptr, &lb);
+
+			/*
+			 * Normalize non-inclusive ranges to become inclusive.  The
+			 * resulting ItemPointer here may not be a valid item pointer.
+			 */
+			if (!tidopexpr->inclusive)
+				ItemPointerInc(&lb);
+
+			/* Check if we can narrow the range using this qual */
+			if (ItemPointerCompare(&lb, &lowerBound) > 0)
+				ItemPointerCopy(&lb, &lowerBound);
+		}
+
+		if (tidopexpr->exprtype == TIDEXPR_UPPER_BOUND)
+		{
+			ItemPointerData ub;
+
+			ItemPointerCopy(itemptr, &ub);
+
+			/*
+			 * Normalize non-inclusive ranges to become inclusive.  The
+			 * resulting ItemPointer here may not be a valid item pointer.
+			 */
+			if (!tidopexpr->inclusive)
+				ItemPointerDec(&ub);
+
+			/* Check if we can narrow the range using this qual */
+			if (ItemPointerCompare(&ub, &upperBound) < 0)
+				ItemPointerCopy(&ub, &upperBound);
+		}
+	}
+
+	ItemPointerCopy(&lowerBound, &node->trss_mintid);
+	ItemPointerCopy(&upperBound, &node->trss_maxtid);
+
+	return true;
+}
+
+/* ----------------------------------------------------------------
+ *		TidRangeNext
+ *
+ *		Retrieve a tuple from the TidRangeScan node's currentRelation
+ *		using the tids in the TidRangeScanState information.
+ *
+ * ----------------------------------------------------------------
+ */
+static TupleTableSlot *
+TidRangeNext(TidRangeScanState *node)
+{
+	TableScanDesc scandesc;
+	EState	   *estate;
+	ScanDirection direction;
+	TupleTableSlot *slot;
+
+	/*
+	 * extract necessary information from tid scan node
+	 */
+	scandesc = node->ss.ss_currentScanDesc;
+	estate = node->ss.ps.state;
+	slot = node->ss.ss_ScanTupleSlot;
+	direction = estate->es_direction;
+
+	if (!node->trss_inScan)
+	{
+		/* First time through, compute the TID range to scan */
+		if (!TidRangeEval(node))
+			return NULL;
+
+		if (scandesc == NULL)
+		{
+			scandesc = table_beginscan_strat(node->ss.ss_currentRelation,
+											 estate->es_snapshot,
+											 0, NULL,
+											 false, false);
+			node->ss.ss_currentScanDesc = scandesc;
+		}
+
+		node->trss_inScan = true;
+	}
+
+	/* Fetch the next tuple. */
+	if (!table_scan_getnextslot_inrange(scandesc, direction, slot,
+										&node->trss_mintid,
+										&node->trss_maxtid))
+		ExecClearTuple(slot);
+
+	return slot;
+}
+
+/*
+ * TidRangeRecheck -- access method routine to recheck a tuple in EvalPlanQual
+ */
+static bool
+TidRangeRecheck(TidRangeScanState *node, TupleTableSlot *slot)
+{
+	return true;
+}
+
+/* ----------------------------------------------------------------
+ *		ExecTidRangeScan(node)
+ *
+ *		Scans the relation using tids and returns the next qualifying tuple.
+ *		We call the ExecScan() routine and pass it the appropriate
+ *		access method functions.
+ *
+ *		Conditions:
+ *		  -- the "cursor" maintained by the AMI is positioned at the tuple
+ *			 returned previously.
+ *
+ *		Initial States:
+ *		  -- the relation indicated is opened for scanning so that the
+ *			 "cursor" is positioned before the first qualifying tuple.
+ *		  -- trss_inScan is false
+ * ----------------------------------------------------------------
+ */
+static TupleTableSlot *
+ExecTidRangeScan(PlanState *pstate)
+{
+	TidRangeScanState *node = castNode(TidRangeScanState, pstate);
+
+	return ExecScan(&node->ss,
+					(ExecScanAccessMtd) TidRangeNext,
+					(ExecScanRecheckMtd) TidRangeRecheck);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecReScanTidRangeScan(node)
+ * ----------------------------------------------------------------
+ */
+void
+ExecReScanTidRangeScan(TidRangeScanState *node)
+{
+	TableScanDesc scan = node->ss.ss_currentScanDesc;
+
+	if (scan != NULL)
+		table_rescan(scan, NULL);
+
+	/* mark scan as not in progress, and tid range as not computed yet */
+	node->trss_inScan = false;
+
+	ExecScanReScan(&node->ss);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecEndTidRangeScan
+ *
+ *		Releases any storage allocated through C routines.
+ *		Returns nothing.
+ * ----------------------------------------------------------------
+ */
+void
+ExecEndTidRangeScan(TidRangeScanState *node)
+{
+	TableScanDesc scan = node->ss.ss_currentScanDesc;
+
+	if (scan != NULL)
+		table_endscan(scan);
+
+	/*
+	 * Free the exprcontext
+	 */
+	ExecFreeExprContext(&node->ss.ps);
+
+	/*
+	 * clear out tuple table slots
+	 */
+	if (node->ss.ps.ps_ResultTupleSlot)
+		ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
+	ExecClearTuple(node->ss.ss_ScanTupleSlot);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecInitTidRangeScan
+ *
+ *		Initializes the tid range scan's state information and opens
+ *		the relation to be scanned.
+ *
+ *		Parameters:
+ *		  node: TidRangeScan node produced by the planner.
+ *		  estate: the execution state initialized in InitPlan.
+ * ----------------------------------------------------------------
+ */
+TidRangeScanState *
+ExecInitTidRangeScan(TidRangeScan *node, EState *estate, int eflags)
+{
+	TidRangeScanState *tidrangestate;
+	Relation	currentRelation;
+
+	/*
+	 * create state structure
+	 */
+	tidrangestate = makeNode(TidRangeScanState);
+	tidrangestate->ss.ps.plan = (Plan *) node;
+	tidrangestate->ss.ps.state = estate;
+	tidrangestate->ss.ps.ExecProcNode = ExecTidRangeScan;
+
+	/*
+	 * Miscellaneous initialization
+	 *
+	 * create expression context for node
+	 */
+	ExecAssignExprContext(estate, &tidrangestate->ss.ps);
+
+	/*
+	 * mark scan as not in progress, and tid range as not computed yet
+	 */
+	tidrangestate->trss_inScan = false;
+
+	/*
+	 * open the scan relation
+	 */
+	currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+
+	tidrangestate->ss.ss_currentRelation = currentRelation;
+	tidrangestate->ss.ss_currentScanDesc = NULL;	/* no table scan here */
+
+	/*
+	 * get the scan type from the relation descriptor.
+	 */
+	ExecInitScanTupleSlot(estate, &tidrangestate->ss,
+						  RelationGetDescr(currentRelation),
+						  table_slot_callbacks(currentRelation));
+
+	/*
+	 * Initialize result type and projection.
+	 */
+	ExecInitResultTypeTL(&tidrangestate->ss.ps);
+	ExecAssignScanProjectionInfo(&tidrangestate->ss);
+
+	/*
+	 * initialize child expressions
+	 */
+	tidrangestate->ss.ps.qual =
+		ExecInitQual(node->scan.plan.qual, (PlanState *) tidrangestate);
+
+	TidExprListCreate(tidrangestate);
+
+	/*
+	 * all done.
+	 */
+	return tidrangestate;
+}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index ba3ccc712c..3e6e70e8aa 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -585,6 +585,27 @@ _copyTidScan(const TidScan *from)
 	return newnode;
 }
 
+/*
+ * _copyTidRangeScan
+ */
+static TidRangeScan *
+_copyTidRangeScan(const TidRangeScan *from)
+{
+	TidRangeScan *newnode = makeNode(TidRangeScan);
+
+	/*
+	 * copy node superclass fields
+	 */
+	CopyScanFields((const Scan *) from, (Scan *) newnode);
+
+	/*
+	 * copy remainder of node
+	 */
+	COPY_NODE_FIELD(tidrangequals);
+
+	return newnode;
+}
+
 /*
  * _copySubqueryScan
  */
@@ -4903,6 +4924,9 @@ copyObjectImpl(const void *from)
 		case T_TidScan:
 			retval = _copyTidScan(from);
 			break;
+		case T_TidRangeScan:
+			retval = _copyTidRangeScan(from);
+			break;
 		case T_SubqueryScan:
 			retval = _copySubqueryScan(from);
 			break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 8392be6d44..3bbbe42ae8 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -608,6 +608,16 @@ _outTidScan(StringInfo str, const TidScan *node)
 	WRITE_NODE_FIELD(tidquals);
 }
 
+static void
+_outTidRangeScan(StringInfo str, const TidRangeScan *node)
+{
+	WRITE_NODE_TYPE("TIDRANGESCAN");
+
+	_outScanInfo(str, (const Scan *) node);
+
+	WRITE_NODE_FIELD(tidrangequals);
+}
+
 static void
 _outSubqueryScan(StringInfo str, const SubqueryScan *node)
 {
@@ -3782,6 +3792,9 @@ outNode(StringInfo str, const void *obj)
 			case T_TidScan:
 				_outTidScan(str, obj);
 				break;
+			case T_TidRangeScan:
+				_outTidRangeScan(str, obj);
+				break;
 			case T_SubqueryScan:
 				_outSubqueryScan(str, obj);
 				break;
diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README
index efb52858c8..4a6c348162 100644
--- a/src/backend/optimizer/README
+++ b/src/backend/optimizer/README
@@ -374,6 +374,7 @@ RelOptInfo      - a relation or joined relations
   IndexPath     - index scan
   BitmapHeapPath - top of a bitmapped index scan
   TidPath       - scan by CTID
+  TidRangePath  - scan a contiguous range of CTIDs
   SubqueryScanPath - scan a subquery-in-FROM
   ForeignPath   - scan a foreign table, foreign join or foreign upper-relation
   CustomPath    - for custom scan providers
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 380336518f..0b3ef32506 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -1283,6 +1283,101 @@ cost_tidscan(Path *path, PlannerInfo *root,
 	path->total_cost = startup_cost + run_cost;
 }
 
+/*
+ * cost_tidrangescan
+ *	  Determines and sets the costs of scanning a relation using a range of
+ *	  TIDs for 'path'
+ *
+ * 'baserel' is the relation to be scanned
+ * 'tidrangequals' is the list of TID-checkable range quals
+ * 'param_info' is the ParamPathInfo if this is a parameterized path, else NULL
+ */
+void
+cost_tidrangescan(Path *path, PlannerInfo *root,
+				  RelOptInfo *baserel, List *tidrangequals,
+				  ParamPathInfo *param_info)
+{
+	Selectivity selectivity;
+	double		pages;
+	Cost		startup_cost = 0;
+	Cost		run_cost = 0;
+	QualCost	qpqual_cost;
+	Cost		cpu_per_tuple;
+	QualCost	tid_qual_cost;
+	double		ntuples;
+	double		nseqpages;
+	double		spc_random_page_cost;
+	double		spc_seq_page_cost;
+
+	/* Should only be applied to base relations */
+	Assert(baserel->relid > 0);
+	Assert(baserel->rtekind == RTE_RELATION);
+
+	/* Mark the path with the correct row estimate */
+	if (param_info)
+		path->rows = param_info->ppi_rows;
+	else
+		path->rows = baserel->rows;
+
+	/* Count how many tuples and pages we expect to scan */
+	selectivity = clauselist_selectivity(root, tidrangequals, baserel->relid,
+										 JOIN_INNER, NULL);
+	pages = ceil(selectivity * baserel->pages);
+
+	if (pages <= 0.0)
+		pages = 1.0;
+
+	/*
+	 * The first page in a range requires a random seek, but each subsequent
+	 * page is just a normal sequential page read. NOTE: it's desirable for
+	 * Tid Range Scans to cost more than the equivalent Sequential Scans,
+	 * because Seq Scans have some performance advantages such as scan
+	 * synchronization and parallelizability, and we'd prefer one of them to
+	 * be picked unless a Tid Range Scan really is better.
+	 */
+	ntuples = selectivity * baserel->tuples;
+	nseqpages = pages - 1.0;
+
+	if (!enable_tidscan)
+		startup_cost += disable_cost;
+
+	/*
+	 * The TID qual expressions will be computed once, any other baserestrict
+	 * quals once per retrieved tuple.
+	 */
+	cost_qual_eval(&tid_qual_cost, tidrangequals, root);
+
+	/* fetch estimated page cost for tablespace containing table */
+	get_tablespace_page_costs(baserel->reltablespace,
+							  &spc_random_page_cost,
+							  &spc_seq_page_cost);
+
+	/* disk costs; 1 random page and the remainder as seq pages */
+	run_cost += spc_random_page_cost + spc_seq_page_cost * nseqpages;
+
+	/* Add scanning CPU costs */
+	get_restriction_qual_cost(root, baserel, param_info, &qpqual_cost);
+
+	/*
+	 * XXX currently we assume TID quals are a subset of qpquals at this
+	 * point; they will be removed (if possible) when we create the plan, so
+	 * we subtract their cost from the total qpqual cost.  (If the TID quals
+	 * can't be removed, this is a mistake and we're going to underestimate
+	 * the CPU cost a bit.)
+	 */
+	startup_cost += qpqual_cost.startup + tid_qual_cost.per_tuple;
+	cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple -
+		tid_qual_cost.per_tuple;
+	run_cost += cpu_per_tuple * ntuples;
+
+	/* tlist eval costs are paid per output row, not per tuple scanned */
+	startup_cost += path->pathtarget->cost.startup;
+	run_cost += path->pathtarget->cost.per_tuple * path->rows;
+
+	path->startup_cost = startup_cost;
+	path->total_cost = startup_cost + run_cost;
+}
+
 /*
  * cost_subqueryscan
  *	  Determines and returns the cost of scanning a subquery RTE.
diff --git a/src/backend/optimizer/path/tidpath.c b/src/backend/optimizer/path/tidpath.c
index 8ef0406057..9642e96f7a 100644
--- a/src/backend/optimizer/path/tidpath.c
+++ b/src/backend/optimizer/path/tidpath.c
@@ -2,9 +2,9 @@
  *
  * tidpath.c
  *	  Routines to determine which TID conditions are usable for scanning
- *	  a given relation, and create TidPaths accordingly.
+ *	  a given relation, and create TidPaths and TidRangePaths accordingly.
  *
- * What we are looking for here is WHERE conditions of the form
+ * For TidPaths, we look for WHERE conditions of the form
  * "CTID = pseudoconstant", which can be implemented by just fetching
  * the tuple directly via heap_fetch().  We can also handle OR'd conditions
  * such as (CTID = const1) OR (CTID = const2), as well as ScalarArrayOpExpr
@@ -23,6 +23,9 @@
  * a function, but in practice it works better to keep the special node
  * representation all the way through to execution.
  *
+ * Additionally, TidRangePaths may be created for conditions of the form
+ * "CTID relop pseudoconstant", where relop is one of >,>=,<,<=, and
+ * AND-clauses composed of such conditions.
  *
  * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
@@ -63,14 +66,14 @@ IsCTIDVar(Var *var, RelOptInfo *rel)
 
 /*
  * Check to see if a RestrictInfo is of the form
- *		CTID = pseudoconstant
+ *		CTID OP pseudoconstant
  * or
- *		pseudoconstant = CTID
- * where the CTID Var belongs to relation "rel", and nothing on the
- * other side of the clause does.
+ *		pseudoconstant OP CTID
+ * where OP is a binary operator, the CTID Var belongs to relation "rel",
+ * and nothing on the other side of the clause does.
  */
 static bool
-IsTidEqualClause(RestrictInfo *rinfo, RelOptInfo *rel)
+IsBinaryTidClause(RestrictInfo *rinfo, RelOptInfo *rel)
 {
 	OpExpr	   *node;
 	Node	   *arg1,
@@ -83,10 +86,9 @@ IsTidEqualClause(RestrictInfo *rinfo, RelOptInfo *rel)
 		return false;
 	node = (OpExpr *) rinfo->clause;
 
-	/* Operator must be tideq */
-	if (node->opno != TIDEqualOperator)
+	/* OpExpr must have two arguments */
+	if (list_length(node->args) != 2)
 		return false;
-	Assert(list_length(node->args) == 2);
 	arg1 = linitial(node->args);
 	arg2 = lsecond(node->args);
 
@@ -116,6 +118,50 @@ IsTidEqualClause(RestrictInfo *rinfo, RelOptInfo *rel)
 	return true;				/* success */
 }
 
+/*
+ * Check to see if a RestrictInfo is of the form
+ *		CTID = pseudoconstant
+ * or
+ *		pseudoconstant = CTID
+ * where the CTID Var belongs to relation "rel", and nothing on the
+ * other side of the clause does.
+ */
+static bool
+IsTidEqualClause(RestrictInfo *rinfo, RelOptInfo *rel)
+{
+	if (!IsBinaryTidClause(rinfo, rel))
+		return false;
+
+	if (((OpExpr *) rinfo->clause)->opno == TIDEqualOperator)
+		return true;
+
+	return false;
+}
+
+/*
+ * Check to see if a RestrictInfo is of the form
+ *		CTID OP pseudoconstant
+ * or
+ *		pseudoconstant OP CTID
+ * where OP is a range operator such as <, <=, >, or >=, the CTID Var belongs
+ * to relation "rel", and nothing on the other side of the clause does.
+ */
+static bool
+IsTidRangeClause(RestrictInfo *rinfo, RelOptInfo *rel)
+{
+	Oid			opno;
+
+	if (!IsBinaryTidClause(rinfo, rel))
+		return false;
+	opno = ((OpExpr *) rinfo->clause)->opno;
+
+	if (opno == TIDLessOperator || opno == TIDLessEqOperator ||
+		opno == TIDGreaterOperator || opno == TIDGreaterEqOperator)
+		return true;
+
+	return false;
+}
+
 /*
  * Check to see if a RestrictInfo is of the form
  *		CTID = ANY (pseudoconstant_array)
@@ -222,7 +268,7 @@ TidQualFromRestrictInfo(RestrictInfo *rinfo, RelOptInfo *rel)
  *
  * Returns a List of CTID qual RestrictInfos for the specified rel (with
  * implicit OR semantics across the list), or NIL if there are no usable
- * conditions.
+ * equality conditions.
  *
  * This function is just concerned with handling AND/OR recursion.
  */
@@ -301,6 +347,34 @@ TidQualFromRestrictInfoList(List *rlist, RelOptInfo *rel)
 	return rlst;
 }
 
+/*
+ * Extract a set of CTID range conditions from implicit-AND List of RestrictInfos
+ *
+ * Returns a List of CTID range qual RestrictInfos for the specified rel
+ * (with implicit AND semantics across the list), or NIL if there are no
+ * usable range conditions or if the rel's table AM does not support TID range
+ * scans.
+ */
+static List *
+TidRangeQualFromRestrictInfoList(List *rlist, RelOptInfo *rel)
+{
+	List	   *rlst = NIL;
+	ListCell   *l;
+
+	if ((rel->amflags & AMFLAG_HAS_TID_RANGE) == 0)
+		return NIL;
+
+	foreach(l, rlist)
+	{
+		RestrictInfo *rinfo = lfirst_node(RestrictInfo, l);
+
+		if (IsTidRangeClause(rinfo, rel))
+			rlst = lappend(rlst, rinfo);
+	}
+
+	return rlst;
+}
+
 /*
  * Given a list of join clauses involving our rel, create a parameterized
  * TidPath for each one that is a suitable TidEqual clause.
@@ -385,6 +459,7 @@ void
 create_tidscan_paths(PlannerInfo *root, RelOptInfo *rel)
 {
 	List	   *tidquals;
+	List	   *tidrangequals;
 
 	/*
 	 * If any suitable quals exist in the rel's baserestrict list, generate a
@@ -404,6 +479,26 @@ create_tidscan_paths(PlannerInfo *root, RelOptInfo *rel)
 												   required_outer));
 	}
 
+	/*
+	 * If there are range quals in the baserestrict list, generate a
+	 * TidRangePath.
+	 */
+	tidrangequals = TidRangeQualFromRestrictInfoList(rel->baserestrictinfo,
+													 rel);
+
+	if (tidrangequals)
+	{
+		/*
+		 * This path uses no join clauses, but it could still have required
+		 * parameterization due to LATERAL refs in its tlist.
+		 */
+		Relids		required_outer = rel->lateral_relids;
+
+		add_path(rel, (Path *) create_tidrangescan_path(root, rel,
+														tidrangequals,
+														required_outer));
+	}
+
 	/*
 	 * Try to generate parameterized TidPaths using equality clauses extracted
 	 * from EquivalenceClasses.  (This is important since simple "t1.ctid =
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 25d4750ca6..c5653221b7 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -129,6 +129,10 @@ static Plan *create_bitmap_subplan(PlannerInfo *root, Path *bitmapqual,
 static void bitmap_subplan_mark_shared(Plan *plan);
 static TidScan *create_tidscan_plan(PlannerInfo *root, TidPath *best_path,
 									List *tlist, List *scan_clauses);
+static TidRangeScan *create_tidrangescan_plan(PlannerInfo *root,
+											  TidRangePath *best_path,
+											  List *tlist,
+											  List *scan_clauses);
 static SubqueryScan *create_subqueryscan_plan(PlannerInfo *root,
 											  SubqueryScanPath *best_path,
 											  List *tlist, List *scan_clauses);
@@ -193,6 +197,8 @@ static BitmapHeapScan *make_bitmap_heapscan(List *qptlist,
 											Index scanrelid);
 static TidScan *make_tidscan(List *qptlist, List *qpqual, Index scanrelid,
 							 List *tidquals);
+static TidRangeScan *make_tidrangescan(List *qptlist, List *qpqual,
+									   Index scanrelid, List *tidrangequals);
 static SubqueryScan *make_subqueryscan(List *qptlist,
 									   List *qpqual,
 									   Index scanrelid,
@@ -384,6 +390,7 @@ create_plan_recurse(PlannerInfo *root, Path *best_path, int flags)
 		case T_IndexOnlyScan:
 		case T_BitmapHeapScan:
 		case T_TidScan:
+		case T_TidRangeScan:
 		case T_SubqueryScan:
 		case T_FunctionScan:
 		case T_TableFuncScan:
@@ -679,6 +686,13 @@ create_scan_plan(PlannerInfo *root, Path *best_path, int flags)
 												scan_clauses);
 			break;
 
+		case T_TidRangeScan:
+			plan = (Plan *) create_tidrangescan_plan(root,
+													 (TidRangePath *) best_path,
+													 tlist,
+													 scan_clauses);
+			break;
+
 		case T_SubqueryScan:
 			plan = (Plan *) create_subqueryscan_plan(root,
 													 (SubqueryScanPath *) best_path,
@@ -3440,6 +3454,71 @@ create_tidscan_plan(PlannerInfo *root, TidPath *best_path,
 	return scan_plan;
 }
 
+/*
+ * create_tidrangescan_plan
+ *	 Returns a tidrangescan plan for the base relation scanned by 'best_path'
+ *	 with restriction clauses 'scan_clauses' and targetlist 'tlist'.
+ */
+static TidRangeScan *
+create_tidrangescan_plan(PlannerInfo *root, TidRangePath *best_path,
+						 List *tlist, List *scan_clauses)
+{
+	TidRangeScan *scan_plan;
+	Index		scan_relid = best_path->path.parent->relid;
+	List	   *tidrangequals = best_path->tidrangequals;
+
+	/* it should be a base rel... */
+	Assert(scan_relid > 0);
+	Assert(best_path->path.parent->rtekind == RTE_RELATION);
+
+	/*
+	 * The qpqual list must contain all restrictions not enforced by the
+	 * tidrangequals list.  tidrangequals has AND semantics, so we can simply
+	 * remove any qual that appears in it.
+	 */
+	{
+		List	   *qpqual = NIL;
+		ListCell   *l;
+
+		foreach(l, scan_clauses)
+		{
+			RestrictInfo *rinfo = lfirst_node(RestrictInfo, l);
+
+			if (rinfo->pseudoconstant)
+				continue;		/* we may drop pseudoconstants here */
+			if (list_member_ptr(tidrangequals, rinfo))
+				continue;		/* simple duplicate */
+			qpqual = lappend(qpqual, rinfo);
+		}
+		scan_clauses = qpqual;
+	}
+
+	/* Sort clauses into best execution order */
+	scan_clauses = order_qual_clauses(root, scan_clauses);
+
+	/* Reduce RestrictInfo lists to bare expressions; ignore pseudoconstants */
+	tidrangequals = extract_actual_clauses(tidrangequals, false);
+	scan_clauses = extract_actual_clauses(scan_clauses, false);
+
+	/* Replace any outer-relation variables with nestloop params */
+	if (best_path->path.param_info)
+	{
+		tidrangequals = (List *)
+			replace_nestloop_params(root, (Node *) tidrangequals);
+		scan_clauses = (List *)
+			replace_nestloop_params(root, (Node *) scan_clauses);
+	}
+
+	scan_plan = make_tidrangescan(tlist,
+								  scan_clauses,
+								  scan_relid,
+								  tidrangequals);
+
+	copy_generic_path_info(&scan_plan->scan.plan, &best_path->path);
+
+	return scan_plan;
+}
+
 /*
  * create_subqueryscan_plan
  *	 Returns a subqueryscan plan for the base relation scanned by 'best_path'
@@ -5373,6 +5452,25 @@ make_tidscan(List *qptlist,
 	return node;
 }
 
+static TidRangeScan *
+make_tidrangescan(List *qptlist,
+				  List *qpqual,
+				  Index scanrelid,
+				  List *tidrangequals)
+{
+	TidRangeScan *node = makeNode(TidRangeScan);
+	Plan	   *plan = &node->scan.plan;
+
+	plan->targetlist = qptlist;
+	plan->qual = qpqual;
+	plan->lefttree = NULL;
+	plan->righttree = NULL;
+	node->scan.scanrelid = scanrelid;
+	node->tidrangequals = tidrangequals;
+
+	return node;
+}
+
 static SubqueryScan *
 make_subqueryscan(List *qptlist,
 				  List *qpqual,
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index c3c36be13e..42f088ad71 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -619,6 +619,22 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 								  rtoffset, 1);
 			}
 			break;
+		case T_TidRangeScan:
+			{
+				TidRangeScan *splan = (TidRangeScan *) plan;
+
+				splan->scan.scanrelid += rtoffset;
+				splan->scan.plan.targetlist =
+					fix_scan_list(root, splan->scan.plan.targetlist,
+								  rtoffset, NUM_EXEC_TLIST(plan));
+				splan->scan.plan.qual =
+					fix_scan_list(root, splan->scan.plan.qual,
+								  rtoffset, NUM_EXEC_QUAL(plan));
+				splan->tidrangequals =
+					fix_scan_list(root, splan->tidrangequals,
+								  rtoffset, 1);
+			}
+			break;
 		case T_SubqueryScan:
 			/* Needs special treatment, see comments below */
 			return set_subqueryscan_references(root,
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 6d4cc1bcce..0c8d8318cd 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -2367,6 +2367,12 @@ finalize_plan(PlannerInfo *root, Plan *plan,
 			context.paramids = bms_add_members(context.paramids, scan_params);
 			break;
 
+		case T_TidRangeScan:
+			finalize_primnode((Node *) ((TidRangeScan *) plan)->tidrangequals,
+							  &context);
+			context.paramids = bms_add_members(context.paramids, scan_params);
+			break;
+
 		case T_SubqueryScan:
 			{
 				SubqueryScan *sscan = (SubqueryScan *) plan;
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index d465b9e213..8a552812f5 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1203,6 +1203,35 @@ create_tidscan_path(PlannerInfo *root, RelOptInfo *rel, List *tidquals,
 	return pathnode;
 }
 
+/*
+ * create_tidrangescan_path
+ *	  Creates a path corresponding to a scan by a range of TIDs, returning
+ *	  the pathnode.
+ */
+TidRangePath *
+create_tidrangescan_path(PlannerInfo *root, RelOptInfo *rel,
+						 List *tidrangequals, Relids required_outer)
+{
+	TidRangePath *pathnode = makeNode(TidRangePath);
+
+	pathnode->path.pathtype = T_TidRangeScan;
+	pathnode->path.parent = rel;
+	pathnode->path.pathtarget = rel->reltarget;
+	pathnode->path.param_info = get_baserel_parampathinfo(root, rel,
+														  required_outer);
+	pathnode->path.parallel_aware = false;
+	pathnode->path.parallel_safe = rel->consider_parallel;
+	pathnode->path.parallel_workers = 0;
+	pathnode->path.pathkeys = NIL;	/* always unordered */
+
+	pathnode->tidrangequals = tidrangequals;
+
+	cost_tidrangescan(&pathnode->path, root, rel, tidrangequals,
+					  pathnode->path.param_info);
+
+	return pathnode;
+}
+
 /*
  * create_append_path
  *	  Creates a path corresponding to an Append plan, returning the
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index da322b453e..3f0d25fba8 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -466,6 +466,10 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 	/* Collect info about relation's foreign keys, if relevant */
 	get_relation_foreign_keys(root, rel, relation, inhparent);
 
+	/* Collect info about functions implemented by the rel's table AM. */
+	if (relation->rd_tableam && relation->rd_tableam->scan_getnextslot_inrange != NULL)
+		rel->amflags |= AMFLAG_HAS_TID_RANGE;
+
 	/*
 	 * Collect info about relation's partitioning scheme, if any. Only
 	 * inheritance parents may be partitioned.
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 731ff708b9..345c877aeb 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -234,6 +234,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
 	rel->subroot = NULL;
 	rel->subplan_params = NIL;
 	rel->rel_parallel_workers = -1; /* set up in get_relation_info */
+	rel->amflags = 0;
 	rel->serverid = InvalidOid;
 	rel->userid = rte->checkAsUser;
 	rel->useridiscurrent = false;
@@ -646,6 +647,7 @@ build_join_rel(PlannerInfo *root,
 	joinrel->subroot = NULL;
 	joinrel->subplan_params = NIL;
 	joinrel->rel_parallel_workers = -1;
+	joinrel->amflags = 0;
 	joinrel->serverid = InvalidOid;
 	joinrel->userid = InvalidOid;
 	joinrel->useridiscurrent = false;
@@ -826,6 +828,7 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
 	joinrel->eclass_indexes = NULL;
 	joinrel->subroot = NULL;
 	joinrel->subplan_params = NIL;
+	joinrel->amflags = 0;
 	joinrel->serverid = InvalidOid;
 	joinrel->userid = InvalidOid;
 	joinrel->useridiscurrent = false;
diff --git a/src/backend/storage/page/itemptr.c b/src/backend/storage/page/itemptr.c
index 55759c383b..c7aec7dbd9 100644
--- a/src/backend/storage/page/itemptr.c
+++ b/src/backend/storage/page/itemptr.c
@@ -71,3 +71,61 @@ ItemPointerCompare(ItemPointer arg1, ItemPointer arg2)
 	else
 		return 0;
 }
+
+/*
+ * ItemPointerInc
+ *		Increment 'pointer' by 1 only paying attention to the ItemPointer's
+ *		type's range limits and not MaxOffsetNumber and FirstOffsetNumber.
+ *		This may result in 'pointer' becoming !OffsetNumberIsValid.
+ *
+ * If the pointer is already at the maximum value permitted by the range of
+ * the ItemPointer type, then do nothing.
+ */
+void
+ItemPointerInc(ItemPointer pointer)
+{
+	BlockNumber blk = ItemPointerGetBlockNumberNoCheck(pointer);
+	OffsetNumber off = ItemPointerGetOffsetNumberNoCheck(pointer);
+
+	if (off == PG_UINT16_MAX)
+	{
+		if (blk != InvalidBlockNumber)
+		{
+			off = 0;
+			blk++;
+		}
+	}
+	else
+		off++;
+
+	ItemPointerSet(pointer, blk, off);
+}
+
+/*
+ * ItemPointerDec
+ *		Decrement 'pointer' by 1 only paying attention to the ItemPointer's
+ *		type's range limits and not MaxOffsetNumber and FirstOffsetNumber.
+ *		This may result in 'pointer' becoming !OffsetNumberIsValid.
+ *
+ * If the pointer is already at the minimum value permitted by the range of
+ * the ItemPointer type, then do nothing.
+ */
+void
+ItemPointerDec(ItemPointer pointer)
+{
+	BlockNumber blk = ItemPointerGetBlockNumberNoCheck(pointer);
+	OffsetNumber off = ItemPointerGetOffsetNumberNoCheck(pointer);
+
+	if (off == 0)
+	{
+		if (blk != 0)
+		{
+			off = PG_UINT16_MAX;
+			blk--;
+		}
+	}
+	else
+		off--;
+
+	ItemPointerSet(pointer, blk, off);
+}
\ No newline at end of file
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index d96a47b1ce..8f21fb4ba9 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -121,6 +121,10 @@ extern void heap_endscan(TableScanDesc scan);
 extern HeapTuple heap_getnext(TableScanDesc scan, ScanDirection direction);
 extern bool heap_getnextslot(TableScanDesc sscan,
 							 ScanDirection direction, struct TupleTableSlot *slot);
+extern bool heap_getnextslot_inrange(TableScanDesc sscan,
+									 ScanDirection direction,
+									 TupleTableSlot *slot, ItemPointer mintid,
+									 ItemPointer maxtid);
 
 extern bool heap_fetch(Relation relation, Snapshot snapshot,
 					   HeapTuple tuple, Buffer *userbuf);
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 33bffb6815..d1c608b176 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -325,6 +325,26 @@ typedef struct TableAmRoutine
 									 ScanDirection direction,
 									 TupleTableSlot *slot);
 
+	/*
+	 * Return next tuple from `scan` where TID is within the defined range.
+	 * This behaves like scan_getnextslot but only returns tuples from the
+	 * given range of TIDs.  Ranges are inclusive.  This function is optional
+	 * and may be set to NULL if TID range scans are not supported by the AM.
+	 *
+	 * Implementations of this function must themselves handle ItemPointers
+	 * of any value, i.e., they must handle each of the following:
+	 *
+	 * 1) mintid or maxtid is beyond the end of the table; and
+	 * 2) mintid is above maxtid; and
+	 * 3) item offset for mintid or maxtid is beyond the maximum offset
+	 * allowed by the AM.
+	 */
+	bool		(*scan_getnextslot_inrange) (TableScanDesc scan,
+											 ScanDirection direction,
+											 TupleTableSlot *slot,
+											 ItemPointer mintid,
+											 ItemPointer maxtid);
+
 
 	/* ------------------------------------------------------------------------
 	 * Parallel table scan related functions.
@@ -1015,6 +1035,36 @@ table_scan_getnextslot(TableScanDesc sscan, ScanDirection direction, TupleTableS
 	return sscan->rs_rd->rd_tableam->scan_getnextslot(sscan, direction, slot);
 }
 
+/*
+ * Return next tuple from defined TID range from `scan` and store in slot.
+ */
+static inline bool
+table_scan_getnextslot_inrange(TableScanDesc sscan, ScanDirection direction,
+							   TupleTableSlot *slot, ItemPointer mintid,
+							   ItemPointer maxtid)
+{
+	/*
+	 * The planner should never make a plan which uses this function when the
+	 * table AM has not defined any function for this callback.
+	 */
+	Assert(sscan->rs_rd->rd_tableam->scan_getnextslot_inrange != NULL);
+
+	slot->tts_tableOid = RelationGetRelid(sscan->rs_rd);
+
+	/*
+	 * We don't expect direct calls to table_scan_getnextslot_inrange with
+	 * valid CheckXidAlive for catalog or regular tables.  See detailed
+	 * comments in xact.c where these variables are declared.
+	 */
+	if (unlikely(TransactionIdIsValid(CheckXidAlive) && !bsysscan))
+		elog(ERROR, "unexpected table_scan_getnextslot_inrange call during logical decoding");
+
+	return sscan->rs_rd->rd_tableam->scan_getnextslot_inrange(sscan,
+															  direction,
+															  slot,
+															  mintid,
+															  maxtid);
+}
 
 /* ----------------------------------------------------------------------------
  * Parallel table scan related functions.
diff --git a/src/include/catalog/pg_operator.dat b/src/include/catalog/pg_operator.dat
index 0d4eac8f96..85395a81ee 100644
--- a/src/include/catalog/pg_operator.dat
+++ b/src/include/catalog/pg_operator.dat
@@ -237,15 +237,15 @@
   oprname => '<', oprleft => 'tid', oprright => 'tid', oprresult => 'bool',
   oprcom => '>(tid,tid)', oprnegate => '>=(tid,tid)', oprcode => 'tidlt',
   oprrest => 'scalarltsel', oprjoin => 'scalarltjoinsel' },
-{ oid => '2800', descr => 'greater than',
+{ oid => '2800', oid_symbol => 'TIDGreaterOperator', descr => 'greater than',
   oprname => '>', oprleft => 'tid', oprright => 'tid', oprresult => 'bool',
   oprcom => '<(tid,tid)', oprnegate => '<=(tid,tid)', oprcode => 'tidgt',
   oprrest => 'scalargtsel', oprjoin => 'scalargtjoinsel' },
-{ oid => '2801', descr => 'less than or equal',
+{ oid => '2801', oid_symbol => 'TIDLessEqOperator', descr => 'less than or equal',
   oprname => '<=', oprleft => 'tid', oprright => 'tid', oprresult => 'bool',
   oprcom => '>=(tid,tid)', oprnegate => '>(tid,tid)', oprcode => 'tidle',
   oprrest => 'scalarlesel', oprjoin => 'scalarlejoinsel' },
-{ oid => '2802', descr => 'greater than or equal',
+{ oid => '2802', oid_symbol => 'TIDGreaterEqOperator', descr => 'greater than or equal',
   oprname => '>=', oprleft => 'tid', oprright => 'tid', oprresult => 'bool',
   oprcom => '<=(tid,tid)', oprnegate => '<(tid,tid)', oprcode => 'tidge',
   oprrest => 'scalargesel', oprjoin => 'scalargejoinsel' },
diff --git a/src/include/executor/nodeTidrangescan.h b/src/include/executor/nodeTidrangescan.h
new file mode 100644
index 0000000000..e53783a3bf
--- /dev/null
+++ b/src/include/executor/nodeTidrangescan.h
@@ -0,0 +1,23 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeTidrangescan.h
+ *
+ *
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ *
+ * src/include/executor/nodeTidrangescan.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef NODETIDRANGESCAN_H
+#define NODETIDRANGESCAN_H
+
+#include "nodes/execnodes.h"
+
+extern TidRangeScanState *ExecInitTidRangeScan(TidRangeScan *node,
+											   EState *estate, int eflags);
+extern void ExecEndTidRangeScan(TidRangeScanState *node);
+extern void ExecReScanTidRangeScan(TidRangeScanState *node);
+
+#endif							/* NODETIDRANGESCAN_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index d65099c94a..dba1cea745 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1617,6 +1617,24 @@ typedef struct TidScanState
 	HeapTupleData tss_htup;
 } TidScanState;
 
+/* ----------------
+ *	 TidRangeScanState information
+ *
+ *		trss_tidexprs		list of TidOpExpr structs (see nodeTidrangescan.c)
+ *		trss_mintid			the lowest TID in the scan range
+ *		trss_maxtid			the highest TID in the scan range
+ *		trss_inScan			is a scan currently in progress?
+ * ----------------
+ */
+typedef struct TidRangeScanState
+{
+	ScanState	ss;				/* its first field is NodeTag */
+	List	   *trss_tidexprs;
+	ItemPointerData trss_mintid;
+	ItemPointerData trss_maxtid;
+	bool		trss_inScan;
+} TidRangeScanState;
+
 /* ----------------
  *	 SubqueryScanState information
  *
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index caed683ba9..3016836ede 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -59,6 +59,7 @@ typedef enum NodeTag
 	T_BitmapIndexScan,
 	T_BitmapHeapScan,
 	T_TidScan,
+	T_TidRangeScan,
 	T_SubqueryScan,
 	T_FunctionScan,
 	T_ValuesScan,
@@ -116,6 +117,7 @@ typedef enum NodeTag
 	T_BitmapIndexScanState,
 	T_BitmapHeapScanState,
 	T_TidScanState,
+	T_TidRangeScanState,
 	T_SubqueryScanState,
 	T_FunctionScanState,
 	T_TableFuncScanState,
@@ -229,6 +231,7 @@ typedef enum NodeTag
 	T_BitmapAndPath,
 	T_BitmapOrPath,
 	T_TidPath,
+	T_TidRangePath,
 	T_SubqueryScanPath,
 	T_ForeignPath,
 	T_CustomPath,
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index cde2637798..5f93364116 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -621,6 +621,10 @@ typedef struct PartitionSchemeData *PartitionScheme;
  * to simplify matching join clauses to those lists.
  *----------
  */
+
+/* Bitmask of flags supported by table AMs */
+#define AMFLAG_HAS_TID_RANGE (1 << 0)
+
 typedef enum RelOptKind
 {
 	RELOPT_BASEREL,
@@ -710,6 +714,8 @@ typedef struct RelOptInfo
 	PlannerInfo *subroot;		/* if subquery */
 	List	   *subplan_params; /* if subquery */
 	int			rel_parallel_workers;	/* wanted number of parallel workers */
+	int			amflags;		/* Bitmask of optional features supported by
+								 * the table AM */
 
 	/* Information about foreign tables and foreign joins */
 	Oid			serverid;		/* identifies server for the table or join */
@@ -1323,6 +1329,18 @@ typedef struct TidPath
 	List	   *tidquals;		/* qual(s) involving CTID = something */
 } TidPath;
 
+/*
+ * TidRangePath represents a scan by a contiguous range of TIDs
+ *
+ * tidrangequals is an implicitly AND'ed list of qual expressions of the form
+ * "CTID relop pseudoconstant", where relop is one of >,>=,<,<=.
+ */
+typedef struct TidRangePath
+{
+	Path		path;
+	List	   *tidrangequals;
+} TidRangePath;
+
 /*
  * SubqueryScanPath represents a scan of an unflattened subquery-in-FROM
  *
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 43160439f0..6e62104d0b 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -485,6 +485,19 @@ typedef struct TidScan
 	List	   *tidquals;		/* qual(s) involving CTID = something */
 } TidScan;
 
+/* ----------------
+ *		tid range scan node
+ *
+ * tidrangequals is an implicitly AND'ed list of qual expressions of the form
+ * "CTID relop pseudoconstant", where relop is one of >,>=,<,<=.
+ * ----------------
+ */
+typedef struct TidRangeScan
+{
+	Scan		scan;
+	List	   *tidrangequals;	/* qual(s) involving CTID op something */
+} TidRangeScan;
+
 /* ----------------
  *		subquery scan node
  *
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index ed2e4af4be..1be93be098 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -83,6 +83,9 @@ extern void cost_bitmap_or_node(BitmapOrPath *path, PlannerInfo *root);
 extern void cost_bitmap_tree_node(Path *path, Cost *cost, Selectivity *selec);
 extern void cost_tidscan(Path *path, PlannerInfo *root,
 						 RelOptInfo *baserel, List *tidquals, ParamPathInfo *param_info);
+extern void cost_tidrangescan(Path *path, PlannerInfo *root,
+							  RelOptInfo *baserel, List *tidrangequals,
+							  ParamPathInfo *param_info);
 extern void cost_subqueryscan(SubqueryScanPath *path, PlannerInfo *root,
 							  RelOptInfo *baserel, ParamPathInfo *param_info);
 extern void cost_functionscan(Path *path, PlannerInfo *root,
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 23dec14cbd..22c6d4c4fd 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -63,6 +63,10 @@ extern BitmapOrPath *create_bitmap_or_path(PlannerInfo *root,
 										   List *bitmapquals);
 extern TidPath *create_tidscan_path(PlannerInfo *root, RelOptInfo *rel,
 									List *tidquals, Relids required_outer);
+extern TidRangePath *create_tidrangescan_path(PlannerInfo *root,
+											  RelOptInfo *rel,
+											  List *tidrangequals,
+											  Relids required_outer);
 extern AppendPath *create_append_path(PlannerInfo *root, RelOptInfo *rel,
 									  List *subpaths, List *partial_subpaths,
 									  List *pathkeys, Relids required_outer,
diff --git a/src/include/storage/itemptr.h b/src/include/storage/itemptr.h
index 0e6990140b..cd4b8fbacb 100644
--- a/src/include/storage/itemptr.h
+++ b/src/include/storage/itemptr.h
@@ -202,5 +202,7 @@ typedef ItemPointerData *ItemPointer;
 
 extern bool ItemPointerEquals(ItemPointer pointer1, ItemPointer pointer2);
 extern int32 ItemPointerCompare(ItemPointer arg1, ItemPointer arg2);
+extern void ItemPointerInc(ItemPointer pointer);
+extern void ItemPointerDec(ItemPointer pointer);
 
 #endif							/* ITEMPTR_H */
diff --git a/src/test/regress/expected/tidrangescan.out b/src/test/regress/expected/tidrangescan.out
new file mode 100644
index 0000000000..0384304c7f
--- /dev/null
+++ b/src/test/regress/expected/tidrangescan.out
@@ -0,0 +1,302 @@
+-- tests for tidrangescans
+SET enable_seqscan TO off;
+CREATE TABLE tidrangescan(id integer, data text);
+-- insert enough tuples to fill at least two pages
+INSERT INTO tidrangescan SELECT i,repeat('x', 100) FROM generate_series(1,200) AS s(i);
+-- remove all tuples after the 10th tuple on each page.  Trying to ensure
+-- we get the same layout with all CPU architectures and smaller than standard
+-- page sizes.
+DELETE FROM tidrangescan
+WHERE substring(ctid::text FROM ',(\d+)\)')::integer > 10 OR substring(ctid::text FROM '\((\d+),')::integer > 2;
+VACUUM tidrangescan;
+-- range scans with upper bound
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid < '(1,0)';
+            QUERY PLAN             
+-----------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: (ctid < '(1,0)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid < '(1,0)';
+  ctid  
+--------
+ (0,1)
+ (0,2)
+ (0,3)
+ (0,4)
+ (0,5)
+ (0,6)
+ (0,7)
+ (0,8)
+ (0,9)
+ (0,10)
+(10 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid <= '(1,5)';
+             QUERY PLAN             
+------------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: (ctid <= '(1,5)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid <= '(1,5)';
+  ctid  
+--------
+ (0,1)
+ (0,2)
+ (0,3)
+ (0,4)
+ (0,5)
+ (0,6)
+ (0,7)
+ (0,8)
+ (0,9)
+ (0,10)
+ (1,1)
+ (1,2)
+ (1,3)
+ (1,4)
+ (1,5)
+(15 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid < '(0,0)';
+            QUERY PLAN             
+-----------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: (ctid < '(0,0)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid < '(0,0)';
+ ctid 
+------
+(0 rows)
+
+-- range scans with lower bound
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid > '(2,8)';
+            QUERY PLAN             
+-----------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: (ctid > '(2,8)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid > '(2,8)';
+  ctid  
+--------
+ (2,9)
+ (2,10)
+(2 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE '(2,8)' < ctid;
+            QUERY PLAN             
+-----------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: ('(2,8)'::tid < ctid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE '(2,8)' < ctid;
+  ctid  
+--------
+ (2,9)
+ (2,10)
+(2 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid >= '(2,8)';
+             QUERY PLAN             
+------------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: (ctid >= '(2,8)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid >= '(2,8)';
+  ctid  
+--------
+ (2,8)
+ (2,9)
+ (2,10)
+(3 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid >= '(100,0)';
+              QUERY PLAN              
+--------------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: (ctid >= '(100,0)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid >= '(100,0)';
+ ctid 
+------
+(0 rows)
+
+-- range scans with both bounds
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid > '(1,4)' AND '(1,7)' >= ctid;
+                           QUERY PLAN                           
+----------------------------------------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: ((ctid > '(1,4)'::tid) AND ('(1,7)'::tid >= ctid))
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid > '(1,4)' AND '(1,7)' >= ctid;
+ ctid  
+-------
+ (1,5)
+ (1,6)
+ (1,7)
+(3 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE '(1,7)' >= ctid AND ctid > '(1,4)';
+                           QUERY PLAN                           
+----------------------------------------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: (('(1,7)'::tid >= ctid) AND (ctid > '(1,4)'::tid))
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE '(1,7)' >= ctid AND ctid > '(1,4)';
+ ctid  
+-------
+ (1,5)
+ (1,6)
+ (1,7)
+(3 rows)
+
+-- extreme offsets
+SELECT ctid FROM tidrangescan WHERE ctid > '(0,65535)' AND ctid < '(1,0)' LIMIT 1;
+ ctid 
+------
+(0 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid < '(0,0)' LIMIT 1;
+ ctid 
+------
+(0 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid > '(4294967295,65535)';
+ ctid 
+------
+(0 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid < '(0,0)';
+ ctid 
+------
+(0 rows)
+
+-- NULLs in the range cannot return tuples
+SELECT ctid FROM tidrangescan WHERE ctid >= (SELECT NULL::tid);
+ ctid 
+------
+(0 rows)
+
+-- empty table
+CREATE TABLE tidrangescan_empty(id integer, data text);
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan_empty WHERE ctid < '(1, 0)';
+              QUERY PLAN              
+--------------------------------------
+ Tid Range Scan on tidrangescan_empty
+   TID Cond: (ctid < '(1,0)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan_empty WHERE ctid < '(1, 0)';
+ ctid 
+------
+(0 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan_empty WHERE ctid > '(9, 0)';
+              QUERY PLAN              
+--------------------------------------
+ Tid Range Scan on tidrangescan_empty
+   TID Cond: (ctid > '(9,0)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan_empty WHERE ctid > '(9, 0)';
+ ctid 
+------
+(0 rows)
+
+-- rescans
+EXPLAIN (COSTS OFF)
+SELECT t.ctid,t2.c FROM tidrangescan t,
+LATERAL (SELECT count(*) c FROM tidrangescan t2 WHERE t2.ctid <= t.ctid) t2
+WHERE t.ctid < '(1,0)';
+                  QUERY PLAN                   
+-----------------------------------------------
+ Nested Loop
+   ->  Tid Range Scan on tidrangescan t
+         TID Cond: (ctid < '(1,0)'::tid)
+   ->  Aggregate
+         ->  Tid Range Scan on tidrangescan t2
+               TID Cond: (ctid <= t.ctid)
+(6 rows)
+
+SELECT t.ctid,t2.c FROM tidrangescan t,
+LATERAL (SELECT count(*) c FROM tidrangescan t2 WHERE t2.ctid <= t.ctid) t2
+WHERE t.ctid < '(1,0)';
+  ctid  | c  
+--------+----
+ (0,1)  |  1
+ (0,2)  |  2
+ (0,3)  |  3
+ (0,4)  |  4
+ (0,5)  |  5
+ (0,6)  |  6
+ (0,7)  |  7
+ (0,8)  |  8
+ (0,9)  |  9
+ (0,10) | 10
+(10 rows)
+
+-- cursors
+-- Ensure we get a TID Range scan without a Materialize node.
+EXPLAIN (COSTS OFF)
+DECLARE c SCROLL CURSOR FOR SELECT ctid FROM tidrangescan WHERE ctid < '(1,0)';
+            QUERY PLAN             
+-----------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: (ctid < '(1,0)'::tid)
+(2 rows)
+
+BEGIN;
+DECLARE c SCROLL CURSOR FOR SELECT ctid FROM tidrangescan WHERE ctid < '(1,0)';
+FETCH NEXT c;
+ ctid  
+-------
+ (0,1)
+(1 row)
+
+FETCH NEXT c;
+ ctid  
+-------
+ (0,2)
+(1 row)
+
+FETCH PRIOR c;
+ ctid  
+-------
+ (0,1)
+(1 row)
+
+FETCH FIRST c;
+ ctid  
+-------
+ (0,1)
+(1 row)
+
+FETCH LAST c;
+  ctid  
+--------
+ (0,10)
+(1 row)
+
+COMMIT;
+DROP TABLE tidrangescan;
+DROP TABLE tidrangescan_empty;
+RESET enable_seqscan;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index e0e1ef71dd..2b9763a869 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -80,7 +80,7 @@ test: brin gin gist spgist privileges init_privs security_label collate matview
 # ----------
 # Another group of parallel tests
 # ----------
-test: create_table_like alter_generic alter_operator misc async dbsize misc_functions sysviews tsrf tid tidscan collate.icu.utf8 incremental_sort
+test: create_table_like alter_generic alter_operator misc async dbsize misc_functions sysviews tsrf tid tidscan tidrangescan collate.icu.utf8 incremental_sort
 
 # rules cannot run concurrently with any test that creates
 # a view or rule in the public schema
diff --git a/src/test/regress/sql/tidrangescan.sql b/src/test/regress/sql/tidrangescan.sql
new file mode 100644
index 0000000000..2da35807ff
--- /dev/null
+++ b/src/test/regress/sql/tidrangescan.sql
@@ -0,0 +1,104 @@
+-- tests for tidrangescans
+
+SET enable_seqscan TO off;
+CREATE TABLE tidrangescan(id integer, data text);
+
+-- insert enough tuples to fill at least two pages
+INSERT INTO tidrangescan SELECT i,repeat('x', 100) FROM generate_series(1,200) AS s(i);
+
+-- remove all tuples after the 10th tuple on each page.  Trying to ensure
+-- we get the same layout with all CPU architectures and smaller than standard
+-- page sizes.
+DELETE FROM tidrangescan
+WHERE substring(ctid::text FROM ',(\d+)\)')::integer > 10 OR substring(ctid::text FROM '\((\d+),')::integer > 2;
+VACUUM tidrangescan;
+
+-- range scans with upper bound
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid < '(1,0)';
+SELECT ctid FROM tidrangescan WHERE ctid < '(1,0)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid <= '(1,5)';
+SELECT ctid FROM tidrangescan WHERE ctid <= '(1,5)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid < '(0,0)';
+SELECT ctid FROM tidrangescan WHERE ctid < '(0,0)';
+
+-- range scans with lower bound
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid > '(2,8)';
+SELECT ctid FROM tidrangescan WHERE ctid > '(2,8)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE '(2,8)' < ctid;
+SELECT ctid FROM tidrangescan WHERE '(2,8)' < ctid;
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid >= '(2,8)';
+SELECT ctid FROM tidrangescan WHERE ctid >= '(2,8)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid >= '(100,0)';
+SELECT ctid FROM tidrangescan WHERE ctid >= '(100,0)';
+
+-- range scans with both bounds
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid > '(1,4)' AND '(1,7)' >= ctid;
+SELECT ctid FROM tidrangescan WHERE ctid > '(1,4)' AND '(1,7)' >= ctid;
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE '(1,7)' >= ctid AND ctid > '(1,4)';
+SELECT ctid FROM tidrangescan WHERE '(1,7)' >= ctid AND ctid > '(1,4)';
+
+-- extreme offsets
+SELECT ctid FROM tidrangescan WHERE ctid > '(0,65535)' AND ctid < '(1,0)' LIMIT 1;
+SELECT ctid FROM tidrangescan WHERE ctid < '(0,0)' LIMIT 1;
+
+SELECT ctid FROM tidrangescan WHERE ctid > '(4294967295,65535)';
+SELECT ctid FROM tidrangescan WHERE ctid < '(0,0)';
+
+-- NULLs in the range cannot return tuples
+SELECT ctid FROM tidrangescan WHERE ctid >= (SELECT NULL::tid);
+
+-- empty table
+CREATE TABLE tidrangescan_empty(id integer, data text);
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan_empty WHERE ctid < '(1, 0)';
+SELECT ctid FROM tidrangescan_empty WHERE ctid < '(1, 0)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan_empty WHERE ctid > '(9, 0)';
+SELECT ctid FROM tidrangescan_empty WHERE ctid > '(9, 0)';
+
+-- rescans
+EXPLAIN (COSTS OFF)
+SELECT t.ctid,t2.c FROM tidrangescan t,
+LATERAL (SELECT count(*) c FROM tidrangescan t2 WHERE t2.ctid <= t.ctid) t2
+WHERE t.ctid < '(1,0)';
+
+SELECT t.ctid,t2.c FROM tidrangescan t,
+LATERAL (SELECT count(*) c FROM tidrangescan t2 WHERE t2.ctid <= t.ctid) t2
+WHERE t.ctid < '(1,0)';
+
+-- cursors
+
+-- Ensure we get a TID Range scan without a Materialize node.
+EXPLAIN (COSTS OFF)
+DECLARE c SCROLL CURSOR FOR SELECT ctid FROM tidrangescan WHERE ctid < '(1,0)';
+
+BEGIN;
+DECLARE c SCROLL CURSOR FOR SELECT ctid FROM tidrangescan WHERE ctid < '(1,0)';
+FETCH NEXT c;
+FETCH NEXT c;
+FETCH PRIOR c;
+FETCH FIRST c;
+FETCH LAST c;
+COMMIT;
+
+DROP TABLE tidrangescan;
+DROP TABLE tidrangescan_empty;
+
+RESET enable_seqscan;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 943142ced8..5794498b46 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2529,8 +2529,13 @@ TextPositionState
 TheLexeme
 TheSubstitute
 TidExpr
+TidExprType
 TidHashKey
+TidOpExpr
 TidPath
+TidRangePath
+TidRangeScan
+TidRangeScanState
 TidScan
 TidScanState
 TimeADT
-- 
2.27.0
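
The tableam.h comment in this patch says the ranges handed to
scan_getnextslot_inrange are inclusive, while the regression tests also
exercise the strict operators > and <.  One way to bridge the two, and
presumably what the ItemPointerInc() and ItemPointerDec() declarations in
itemptr.h are for, is to step a strict bound to the neighbouring TID.  The
following is only a minimal standalone sketch of that idea: SketchTid and
every sketch_* function are hypothetical names made up for illustration,
not code taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for ItemPointerData: block number plus offset. */
typedef struct SketchTid
{
	uint32_t	block;
	uint16_t	offset;
} SketchTid;

/* Three-way comparison, ordering by block first and offset second. */
int
sketch_tid_compare(SketchTid a, SketchTid b)
{
	if (a.block != b.block)
		return (a.block < b.block) ? -1 : 1;
	if (a.offset != b.offset)
		return (a.offset < b.offset) ? -1 : 1;
	return 0;
}

/* Step to the next representable TID; stay put at the very last one. */
SketchTid
sketch_tid_inc(SketchTid tid)
{
	if (tid.offset == UINT16_MAX)
	{
		if (tid.block != UINT32_MAX)
		{
			tid.block++;
			tid.offset = 0;
		}
	}
	else
		tid.offset++;
	return tid;
}

/* Step to the previous representable TID; stay put at (0,0). */
SketchTid
sketch_tid_dec(SketchTid tid)
{
	if (tid.offset == 0)
	{
		if (tid.block != 0)
		{
			tid.block--;
			tid.offset = UINT16_MAX;
		}
	}
	else
		tid.offset--;
	return tid;
}

/*
 * Turn "ctid > bound" (strict) or "ctid >= bound" into an inclusive lower
 * bound.  Returns false when no TID can satisfy the condition.
 */
bool
sketch_lower_inclusive(SketchTid bound, bool strict, SketchTid *result)
{
	assert(result != NULL);
	if (strict)
	{
		SketchTid	next = sketch_tid_inc(bound);

		if (sketch_tid_compare(next, bound) == 0)
			return false;		/* bound was already the last possible TID */
		bound = next;
	}
	*result = bound;
	return true;
}

/* Same idea for "ctid < bound" (strict) or "ctid <= bound". */
bool
sketch_upper_inclusive(SketchTid bound, bool strict, SketchTid *result)
{
	assert(result != NULL);
	if (strict)
	{
		SketchTid	prev = sketch_tid_dec(bound);

		if (sketch_tid_compare(prev, bound) == 0)
			return false;		/* nothing sorts below (0,0) */
		bound = prev;
	}
	*result = bound;
	return true;
}
```

Under this sketch, ctid > '(0,65535)' AND ctid < '(1,0)' becomes the
inclusive pair [(1,0), (0,65535)].  Its lower bound sorts above its upper
bound, which is the "mintid is above maxtid" case that the callback comment
asks every AM to handle.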

#82

David Rowley

dgrowleyml@gmail.com

almost 5 years ago

In reply to: David Rowley (#81)

1 attachment(s)

Re: Tid scan improvements

On Thu, 21 Jan 2021 at 18:16, David Rowley <dgrowleyml@gmail.com> wrote:

> I've implemented this in the attached.

The bug fix in 0001 is now committed, so I'm just attaching the 0002
patch again after having rebased... This is mostly just to keep the
CFbot happy.

David
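
The block-range setup at the top of heap_getnextslot_inrange() in the
attached patch can be tried outside the server.  The sketch below, assuming
TIDs are plain (block, offset) pairs, clamps the requested range to the
relation and derives the start block and block count that the patch hands
to heap_setscanlimits().  SketchTid, SKETCH_MAX_OFFSET and the sketch_*
functions are hypothetical illustration names, and 2048 is only an assumed
stand-in for MaxOffsetNumber.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed stand-in for MaxOffsetNumber; the real value depends on BLCKSZ. */
#define SKETCH_MAX_OFFSET 2048

/* Hypothetical stand-in for ItemPointerData: block number plus offset. */
typedef struct SketchTid
{
	uint32_t	block;
	uint16_t	offset;
} SketchTid;

/* Three-way comparison, ordering by block first and offset second. */
int
sketch_tid_compare(SketchTid a, SketchTid b)
{
	if (a.block != b.block)
		return (a.block < b.block) ? -1 : 1;
	if (a.offset != b.offset)
		return (a.offset < b.offset) ? -1 : 1;
	return 0;
}

/*
 * Work out which blocks a scan of the inclusive range [mintid, maxtid] has
 * to visit in a relation of nblocks blocks.  Returns false when the range
 * cannot contain any tuple, in which case the outputs are left untouched.
 */
bool
sketch_block_range(uint32_t nblocks, SketchTid mintid, SketchTid maxtid,
				   uint32_t *start_block, uint32_t *num_blocks)
{
	SketchTid	lowest = {0, 1};	/* first possible tuple in the heap */
	SketchTid	highest;

	assert(start_block != NULL && num_blocks != NULL);

	/* A relation with zero blocks has no tuples. */
	if (nblocks == 0)
		return false;

	/* Last possible tuple in the heap. */
	highest.block = nblocks - 1;
	highest.offset = SKETCH_MAX_OFFSET;

	/* Narrow the upper end if maxtid is below the end of the relation. */
	if (sketch_tid_compare(maxtid, highest) < 0)
		highest = maxtid;

	/* Narrow the lower end if mintid is above the first possible tuple. */
	if (sketch_tid_compare(mintid, lowest) > 0)
		lowest = mintid;

	/* Empty range; also keeps the unsigned subtraction below from wrapping. */
	if (sketch_tid_compare(highest, lowest) < 0)
		return false;

	*start_block = lowest.block;
	*num_blocks = highest.block - lowest.block + 1;
	return true;
}
```

For the three-page table built by the regression test, the range
[(1,5), (2,3)] gives a start block of 1 and a count of 2.  Tuples on those
two pages that fall outside the range still have to be filtered one at a
time, which is the job of the loop that follows the setup in the patch.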

Attachments:

v12-0001-Add-TID-Range-Scans-to-support-efficient-scannin.patch (text/plain; charset=US-ASCII)

From e459b522d0599602188fcb1cc9ee6062ac8a4aee Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Thu, 21 Jan 2021 16:48:15 +1300
Subject: [PATCH v12] Add TID Range Scans to support efficient scanning ranges
 of TIDs

This adds a new node type named TID Range Scan.  The query planner will
generate paths for TID Range scans when quals are discovered on base
relations which search for ranges of ctid.  These ranges may be open at
either end.

To support this, a new optional callback function has been added to table
AM which is named scan_getnextslot_inrange.  This function accepts a
minimum and maximum ItemPointer to allow efficient retrieval of tuples
within this range.  Table AMs where scanning ranges of TIDs does not make
sense or is difficult to implement efficiently may choose to not implement
this function.

Author: Edmund Horner and David Rowley
Discussion: https://postgr.es/m/CAMyN-kB-nFTkF=VA_JPwFNo08S0d-Yk0F741S2B7LDmYAi8eyA@mail.gmail.com
---
 src/backend/access/heap/heapam.c           | 132 +++++++
 src/backend/access/heap/heapam_handler.c   |   1 +
 src/backend/commands/explain.c             |  23 ++
 src/backend/executor/Makefile              |   1 +
 src/backend/executor/execAmi.c             |   6 +
 src/backend/executor/execProcnode.c        |  10 +
 src/backend/executor/nodeTidrangescan.c    | 409 +++++++++++++++++++++
 src/backend/nodes/copyfuncs.c              |  24 ++
 src/backend/nodes/outfuncs.c               |  13 +
 src/backend/optimizer/README               |   1 +
 src/backend/optimizer/path/costsize.c      |  95 +++++
 src/backend/optimizer/path/tidpath.c       | 117 +++++-
 src/backend/optimizer/plan/createplan.c    |  98 +++++
 src/backend/optimizer/plan/setrefs.c       |  16 +
 src/backend/optimizer/plan/subselect.c     |   6 +
 src/backend/optimizer/util/pathnode.c      |  29 ++
 src/backend/optimizer/util/plancat.c       |   4 +
 src/backend/optimizer/util/relnode.c       |   3 +
 src/backend/storage/page/itemptr.c         |  58 +++
 src/include/access/heapam.h                |   4 +
 src/include/access/tableam.h               |  50 +++
 src/include/catalog/pg_operator.dat        |   6 +-
 src/include/executor/nodeTidrangescan.h    |  23 ++
 src/include/nodes/execnodes.h              |  18 +
 src/include/nodes/nodes.h                  |   3 +
 src/include/nodes/pathnodes.h              |  18 +
 src/include/nodes/plannodes.h              |  13 +
 src/include/optimizer/cost.h               |   3 +
 src/include/optimizer/pathnode.h           |   4 +
 src/include/storage/itemptr.h              |   2 +
 src/test/regress/expected/tidrangescan.out | 302 +++++++++++++++
 src/test/regress/parallel_schedule         |   2 +-
 src/test/regress/sql/tidrangescan.sql      | 104 ++++++
 src/tools/pgindent/typedefs.list           |   5 +
 34 files changed, 1588 insertions(+), 15 deletions(-)
 create mode 100644 src/backend/executor/nodeTidrangescan.c
 create mode 100644 src/include/executor/nodeTidrangescan.h
 create mode 100644 src/test/regress/expected/tidrangescan.out
 create mode 100644 src/test/regress/sql/tidrangescan.sql

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 9926e2bd54..164a044715 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -1391,6 +1391,138 @@ heap_getnextslot(TableScanDesc sscan, ScanDirection direction, TupleTableSlot *s
 	return true;
 }
 
+bool
+heap_getnextslot_inrange(TableScanDesc sscan, ScanDirection direction,
+						 TupleTableSlot *slot, ItemPointer mintid,
+						 ItemPointer maxtid)
+{
+	HeapScanDesc scan = (HeapScanDesc) sscan;
+
+	if (!scan->rs_inited)
+	{
+		BlockNumber startBlk;
+		BlockNumber numBlks;
+		ItemPointerData highestItem;
+		ItemPointerData lowestItem;
+
+		/* A relation with zero blocks won't have any tuples */
+		if (scan->rs_nblocks == 0)
+			return false;
+
+		/*
+		 * Set up some ItemPointers which point to the first and last possible
+		 * tuples in the heap.
+		 */
+		ItemPointerSet(&highestItem, scan->rs_nblocks - 1, MaxOffsetNumber);
+		ItemPointerSet(&lowestItem, 0, FirstOffsetNumber);
+
+		/*
+		 * If the given maximum TID is below the highest possible TID in the
+		 * relation, then restrict the range to that, otherwise we scan to the
+		 * end of the relation.
+		 */
+		if (ItemPointerCompare(maxtid, &highestItem) < 0)
+			ItemPointerCopy(maxtid, &highestItem);
+
+		/*
+		 * If the given minimum TID is above the lowest possible TID in the
+		 * relation, then restrict the range to only scan for TIDs above that.
+		 */
+		if (ItemPointerCompare(mintid, &lowestItem) > 0)
+			ItemPointerCopy(mintid, &lowestItem);
+
+		/*
+		 * Check for an empty range; this also guards against a negative result
+		 * from the numBlks calculation below.
+		 */
+		if (ItemPointerCompare(&highestItem, &lowestItem) < 0)
+			return false;
+
+		/*
+		 * Calculate the first block and the number of blocks we must scan.
+		 * We could be more aggressive here and perform some more validation
+		 * to try and further narrow the scope of blocks to scan by checking
+		 * if lowestItem has an offset above MaxOffsetNumber.  In this
+		 * case, we could advance startBlk by one.  Likewise if highestItem
+		 * has an offset of 0 we could scan one fewer block.  However, such
+		 * an optimization does not seem worth troubling over, currently.
+		 */
+		startBlk = ItemPointerGetBlockNumberNoCheck(&lowestItem);
+
+		numBlks = ItemPointerGetBlockNumberNoCheck(&highestItem) -
+				  ItemPointerGetBlockNumberNoCheck(&lowestItem) + 1;
+
+		/* Set the start block and number of blocks to scan */
+		heap_setscanlimits(sscan, startBlk, numBlks);
+	}
+
+	/* Note: no locking manipulations needed */
+	for (;;)
+	{
+
+		if (sscan->rs_flags & SO_ALLOW_PAGEMODE)
+			heapgettup_pagemode(scan, direction, sscan->rs_nkeys, sscan->rs_key);
+		else
+			heapgettup(scan, direction, sscan->rs_nkeys, sscan->rs_key);
+
+		if (scan->rs_ctup.t_data == NULL)
+		{
+			ExecClearTuple(slot);
+			return false;
+		}
+
+		/*
+		 * We've used heap_setscanlimits above so we only look at pages that
+		 * are likely to contain tuples we're interested in.  We must still
+		 * filter out tuples in the first page that are less than mintid.
+		 */
+		if (ItemPointerCompare(&scan->rs_ctup.t_self, mintid) < 0)
+		{
+			ExecClearTuple(slot);
+
+			/*
+			 * When scanning backwards, the TIDs will be in descending order.
+			 * Future tuples in this direction will be lower still, so we can
+			 * just return false to indicate there will be no more tuples.
+			 */
+			if (ScanDirectionIsBackward(direction))
+				return false;
+
+			continue;
+		}
+
+		/*
+		 * Likewise for the final page, we must filter out tids greater than
+		 * maxtid.
+		 */
+		if (ItemPointerCompare(&scan->rs_ctup.t_self, maxtid) > 0)
+		{
+			ExecClearTuple(slot);
+
+			/*
+			 * When scanning forward, the TIDs will be in ascending order.
+			 * Future tuples in this direction will be higher still, so we can
+			 * just return false to indicate there will be no more tuples.
+			 */
+			if (ScanDirectionIsForward(direction))
+				return false;
+			continue;
+		}
+
+		break;
+	}
+
+	/*
+	 * if we get here it means we have a new current scan tuple, so point to
+	 * the proper return buffer and return the tuple.
+	 */
+
+	pgstat_count_heap_getnext(scan->rs_base.rs_rd);
+
+	ExecStoreBufferHeapTuple(&scan->rs_ctup, slot, scan->rs_cbuf);
+	return true;
+}
+
 /*
  *	heap_fetch		- retrieve tuple with given tid
  *
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 4a70e20a14..f8bbcaf448 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -2541,6 +2541,7 @@ static const TableAmRoutine heapam_methods = {
 	.scan_end = heap_endscan,
 	.scan_rescan = heap_rescan,
 	.scan_getnextslot = heap_getnextslot,
+	.scan_getnextslot_inrange = heap_getnextslot_inrange,
 
 	.parallelscan_estimate = table_block_parallelscan_estimate,
 	.parallelscan_initialize = table_block_parallelscan_initialize,
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 5d7eb3574c..3f2ebd3b72 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -1057,6 +1057,7 @@ ExplainPreScanNode(PlanState *planstate, Bitmapset **rels_used)
 		case T_IndexOnlyScan:
 		case T_BitmapHeapScan:
 		case T_TidScan:
+		case T_TidRangeScan:
 		case T_SubqueryScan:
 		case T_FunctionScan:
 		case T_TableFuncScan:
@@ -1223,6 +1224,9 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_TidScan:
 			pname = sname = "Tid Scan";
 			break;
+		case T_TidRangeScan:
+			pname = sname = "Tid Range Scan";
+			break;
 		case T_SubqueryScan:
 			pname = sname = "Subquery Scan";
 			break;
@@ -1417,6 +1421,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_SampleScan:
 		case T_BitmapHeapScan:
 		case T_TidScan:
+		case T_TidRangeScan:
 		case T_SubqueryScan:
 		case T_FunctionScan:
 		case T_TableFuncScan:
@@ -1871,6 +1876,23 @@ ExplainNode(PlanState *planstate, List *ancestors,
 											   planstate, es);
 			}
 			break;
+		case T_TidRangeScan:
+			{
+				/*
+				 * The tidrangequals list has AND semantics, so be sure to
+				 * show it as an AND condition.
+				 */
+				List	   *tidquals = ((TidRangeScan *) plan)->tidrangequals;
+
+				if (list_length(tidquals) > 1)
+					tidquals = list_make1(make_andclause(tidquals));
+				show_scan_qual(tidquals, "TID Cond", planstate, ancestors, es);
+				show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
+				if (plan->qual)
+					show_instrumentation_count("Rows Removed by Filter", 1,
+											   planstate, es);
+			}
+			break;
 		case T_ForeignScan:
 			show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
 			if (plan->qual)
@@ -3558,6 +3580,7 @@ ExplainTargetRel(Plan *plan, Index rti, ExplainState *es)
 		case T_IndexOnlyScan:
 		case T_BitmapHeapScan:
 		case T_TidScan:
+		case T_TidRangeScan:
 		case T_ForeignScan:
 		case T_CustomScan:
 		case T_ModifyTable:
diff --git a/src/backend/executor/Makefile b/src/backend/executor/Makefile
index f990c6473a..74ac59faa1 100644
--- a/src/backend/executor/Makefile
+++ b/src/backend/executor/Makefile
@@ -67,6 +67,7 @@ OBJS = \
 	nodeSubplan.o \
 	nodeSubqueryscan.o \
 	nodeTableFuncscan.o \
+	nodeTidrangescan.o \
 	nodeTidscan.o \
 	nodeUnique.o \
 	nodeValuesscan.o \
diff --git a/src/backend/executor/execAmi.c b/src/backend/executor/execAmi.c
index 23bdb53cd1..4543ac79ed 100644
--- a/src/backend/executor/execAmi.c
+++ b/src/backend/executor/execAmi.c
@@ -51,6 +51,7 @@
 #include "executor/nodeSubplan.h"
 #include "executor/nodeSubqueryscan.h"
 #include "executor/nodeTableFuncscan.h"
+#include "executor/nodeTidrangescan.h"
 #include "executor/nodeTidscan.h"
 #include "executor/nodeUnique.h"
 #include "executor/nodeValuesscan.h"
@@ -197,6 +198,10 @@ ExecReScan(PlanState *node)
 			ExecReScanTidScan((TidScanState *) node);
 			break;
 
+		case T_TidRangeScanState:
+			ExecReScanTidRangeScan((TidRangeScanState *) node);
+			break;
+
 		case T_SubqueryScanState:
 			ExecReScanSubqueryScan((SubqueryScanState *) node);
 			break;
@@ -562,6 +567,7 @@ ExecSupportsBackwardScan(Plan *node)
 
 		case T_SeqScan:
 		case T_TidScan:
+		case T_TidRangeScan:
 		case T_FunctionScan:
 		case T_ValuesScan:
 		case T_CteScan:
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index 414df50a05..29766d8196 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -109,6 +109,7 @@
 #include "executor/nodeSubplan.h"
 #include "executor/nodeSubqueryscan.h"
 #include "executor/nodeTableFuncscan.h"
+#include "executor/nodeTidrangescan.h"
 #include "executor/nodeTidscan.h"
 #include "executor/nodeUnique.h"
 #include "executor/nodeValuesscan.h"
@@ -238,6 +239,11 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
 												   estate, eflags);
 			break;
 
+		case T_TidRangeScan:
+			result = (PlanState *) ExecInitTidRangeScan((TidRangeScan *) node,
+														estate, eflags);
+			break;
+
 		case T_SubqueryScan:
 			result = (PlanState *) ExecInitSubqueryScan((SubqueryScan *) node,
 														estate, eflags);
@@ -637,6 +643,10 @@ ExecEndNode(PlanState *node)
 			ExecEndTidScan((TidScanState *) node);
 			break;
 
+		case T_TidRangeScanState:
+			ExecEndTidRangeScan((TidRangeScanState *) node);
+			break;
+
 		case T_SubqueryScanState:
 			ExecEndSubqueryScan((SubqueryScanState *) node);
 			break;
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
new file mode 100644
index 0000000000..e2a92754da
--- /dev/null
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -0,0 +1,409 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeTidrangescan.c
+ *	  Routines to support tid range scans of relations
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/executor/nodeTidrangescan.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/relscan.h"
+#include "access/sysattr.h"
+#include "access/tableam.h"
+#include "catalog/pg_operator.h"
+#include "executor/execdebug.h"
+#include "executor/nodeTidrangescan.h"
+#include "nodes/nodeFuncs.h"
+#include "storage/bufmgr.h"
+#include "utils/rel.h"
+
+
+#define IsCTIDVar(node)  \
+	((node) != NULL && \
+	 IsA((node), Var) && \
+	 ((Var *) (node))->varattno == SelfItemPointerAttributeNumber && \
+	 ((Var *) (node))->varlevelsup == 0)
+
+typedef enum
+{
+	TIDEXPR_UPPER_BOUND,
+	TIDEXPR_LOWER_BOUND
+} TidExprType;
+
+/* Upper or lower range bound for scan */
+typedef struct TidOpExpr
+{
+	TidExprType exprtype;		/* type of op; lower or upper */
+	ExprState  *exprstate;		/* ExprState for a TID-yielding subexpr */
+	bool		inclusive;		/* whether op is inclusive */
+} TidOpExpr;
+
+/*
+ * For the given 'expr', build and return an appropriate TidOpExpr taking into
+ * account the expr's operator and operand order.
+ */
+static TidOpExpr *
+MakeTidOpExpr(OpExpr *expr, TidRangeScanState *tidstate)
+{
+	Node	   *arg1 = get_leftop((Expr *) expr);
+	Node	   *arg2 = get_rightop((Expr *) expr);
+	ExprState  *exprstate = NULL;
+	bool		invert = false;
+	TidOpExpr  *tidopexpr;
+
+	if (IsCTIDVar(arg1))
+		exprstate = ExecInitExpr((Expr *) arg2, &tidstate->ss.ps);
+	else if (IsCTIDVar(arg2))
+	{
+		exprstate = ExecInitExpr((Expr *) arg1, &tidstate->ss.ps);
+		invert = true;
+	}
+	else
+		elog(ERROR, "could not identify CTID variable");
+
+	tidopexpr = (TidOpExpr *) palloc(sizeof(TidOpExpr));
+	tidopexpr->inclusive = false;		/* for now */
+
+	switch (expr->opno)
+	{
+		case TIDLessEqOperator:
+			tidopexpr->inclusive = true;
+			/* fall through */
+		case TIDLessOperator:
+			tidopexpr->exprtype = invert ? TIDEXPR_LOWER_BOUND : TIDEXPR_UPPER_BOUND;
+			break;
+		case TIDGreaterEqOperator:
+			tidopexpr->inclusive = true;
+			/* fall through */
+		case TIDGreaterOperator:
+			tidopexpr->exprtype = invert ? TIDEXPR_UPPER_BOUND : TIDEXPR_LOWER_BOUND;
+			break;
+		default:
+			elog(ERROR, "could not identify CTID operator");
+	}
+
+	tidopexpr->exprstate = exprstate;
+
+	return tidopexpr;
+}
+
+/*
+ * Extract the qual subexpressions that yield TIDs to search for,
+ * and compile them into ExprStates if they're ordinary expressions.
+ */
+static void
+TidExprListCreate(TidRangeScanState *tidrangestate)
+{
+	TidRangeScan *node = (TidRangeScan *) tidrangestate->ss.ps.plan;
+	List	   *tidexprs = NIL;
+	ListCell   *l;
+
+	foreach(l, node->tidrangequals)
+	{
+		OpExpr	   *opexpr = lfirst(l);
+		TidOpExpr  *tidopexpr;
+
+		if (!IsA(opexpr, OpExpr))
+			elog(ERROR, "could not identify CTID expression");
+
+		tidopexpr = MakeTidOpExpr(opexpr, tidrangestate);
+		tidexprs = lappend(tidexprs, tidopexpr);
+	}
+
+	tidrangestate->trss_tidexprs = tidexprs;
+}
+
+/* ----------------------------------------------------------------
+ *		TidRangeEval
+ *
+ *		Compute and set node's block and offset range to scan by evaluating
+ *		the trss_tidexprs.  Returns false if we detect the range cannot
+ *		contain any tuples.  Returns true if it's possible for the range to
+ *		contain tuples.
+ * ----------------------------------------------------------------
+ */
+static bool
+TidRangeEval(TidRangeScanState *node)
+{
+	ExprContext *econtext = node->ss.ps.ps_ExprContext;
+	ItemPointerData lowerBound;
+	ItemPointerData upperBound;
+	ListCell   *l;
+
+	/*
+	 * Set the upper and lower bounds to the absolute limits of the range of
+	 * the ItemPointer type.  Below we'll try to narrow this range on either
+	 * side by looking at the TidOpExprs.
+	 */
+	ItemPointerSet(&lowerBound, 0, 0);
+	ItemPointerSet(&upperBound, InvalidBlockNumber, PG_UINT16_MAX);
+
+	foreach(l, node->trss_tidexprs)
+	{
+		TidOpExpr  *tidopexpr = (TidOpExpr *) lfirst(l);
+		ItemPointer itemptr;
+		bool		isNull;
+
+		/* Evaluate this bound. */
+		itemptr = (ItemPointer)
+			DatumGetPointer(ExecEvalExprSwitchContext(tidopexpr->exprstate,
+													  econtext,
+													  &isNull));
+
+		/* If the bound is NULL, *nothing* matches the qual. */
+		if (isNull)
+			return false;
+
+		if (tidopexpr->exprtype == TIDEXPR_LOWER_BOUND)
+		{
+			ItemPointerData lb;
+
+			ItemPointerCopy(itemptr, &lb);
+
+			/*
+			 * Normalize non-inclusive ranges to become inclusive.  The
+			 * resulting ItemPointer here may not be a valid item pointer.
+			 */
+			if (!tidopexpr->inclusive)
+				ItemPointerInc(&lb);
+
+			/* Check if we can narrow the range using this qual */
+			if (ItemPointerCompare(&lb, &lowerBound) > 0)
+				ItemPointerCopy(&lb, &lowerBound);
+		}
+
+		if (tidopexpr->exprtype == TIDEXPR_UPPER_BOUND)
+		{
+			ItemPointerData ub;
+
+			ItemPointerCopy(itemptr, &ub);
+
+			/*
+			 * Normalize non-inclusive ranges to become inclusive.  The
+			 * resulting ItemPointer here may not be a valid item pointer.
+			 */
+			if (!tidopexpr->inclusive)
+				ItemPointerDec(&ub);
+
+			/* Check if we can narrow the range using this qual */
+			if (ItemPointerCompare(&ub, &upperBound) < 0)
+				ItemPointerCopy(&ub, &upperBound);
+		}
+	}
+
+	ItemPointerCopy(&lowerBound, &node->trss_mintid);
+	ItemPointerCopy(&upperBound, &node->trss_maxtid);
+
+	return true;
+}
+
+/* ----------------------------------------------------------------
+ *		TidRangeNext
+ *
+ *		Retrieve a tuple from the TidRangeScan node's currentRelation
+ *		using the tids in the TidRangeScanState information.
+ *
+ * ----------------------------------------------------------------
+ */
+static TupleTableSlot *
+TidRangeNext(TidRangeScanState *node)
+{
+	TableScanDesc scandesc;
+	EState	   *estate;
+	ScanDirection direction;
+	TupleTableSlot *slot;
+
+	/*
+	 * extract necessary information from tid range scan node
+	 */
+	scandesc = node->ss.ss_currentScanDesc;
+	estate = node->ss.ps.state;
+	slot = node->ss.ss_ScanTupleSlot;
+	direction = estate->es_direction;
+
+	if (!node->trss_inScan)
+	{
+		/* First time through, compute the TID range to be scanned */
+		if (!TidRangeEval(node))
+			return NULL;
+
+		if (scandesc == NULL)
+		{
+			scandesc = table_beginscan_strat(node->ss.ss_currentRelation,
+											 estate->es_snapshot,
+											 0, NULL,
+											 false, false);
+			node->ss.ss_currentScanDesc = scandesc;
+		}
+
+		node->trss_inScan = true;
+	}
+
+	/* Fetch the next tuple. */
+	if (!table_scan_getnextslot_inrange(scandesc, direction, slot,
+										&node->trss_mintid,
+										&node->trss_maxtid))
+		ExecClearTuple(slot);
+
+	return slot;
+}
+
+/*
+ * TidRangeRecheck -- access method routine to recheck a tuple in EvalPlanQual
+ */
+static bool
+TidRangeRecheck(TidRangeScanState *node, TupleTableSlot *slot)
+{
+	return true;
+}
+
+/* ----------------------------------------------------------------
+ *		ExecTidRangeScan(node)
+ *
+ *		Scans the relation using tids and returns the next qualifying tuple.
+ *		We call the ExecScan() routine and pass it the appropriate
+ *		access method functions.
+ *
+ *		Conditions:
+ *		  -- the "cursor" maintained by the AMI is positioned at the tuple
+ *			 returned previously.
+ *
+ *		Initial States:
+ *		  -- the relation indicated is opened for scanning so that the
+ *			 "cursor" is positioned before the first qualifying tuple.
+ *		  -- trss_inScan is false
+ * ----------------------------------------------------------------
+ */
+static TupleTableSlot *
+ExecTidRangeScan(PlanState *pstate)
+{
+	TidRangeScanState *node = castNode(TidRangeScanState, pstate);
+
+	return ExecScan(&node->ss,
+					(ExecScanAccessMtd) TidRangeNext,
+					(ExecScanRecheckMtd) TidRangeRecheck);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecReScanTidRangeScan(node)
+ * ----------------------------------------------------------------
+ */
+void
+ExecReScanTidRangeScan(TidRangeScanState *node)
+{
+	TableScanDesc scan = node->ss.ss_currentScanDesc;
+
+	if (scan != NULL)
+		table_rescan(scan, NULL);
+
+	/* mark scan as not in progress, and tid range as not computed yet */
+	node->trss_inScan = false;
+
+	ExecScanReScan(&node->ss);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecEndTidRangeScan
+ *
+ *		Releases any storage allocated through C routines.
+ *		Returns nothing.
+ * ----------------------------------------------------------------
+ */
+void
+ExecEndTidRangeScan(TidRangeScanState *node)
+{
+	TableScanDesc scan = node->ss.ss_currentScanDesc;
+
+	if (scan != NULL)
+		table_endscan(scan);
+
+	/*
+	 * Free the exprcontext
+	 */
+	ExecFreeExprContext(&node->ss.ps);
+
+	/*
+	 * clear out tuple table slots
+	 */
+	if (node->ss.ps.ps_ResultTupleSlot)
+		ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
+	ExecClearTuple(node->ss.ss_ScanTupleSlot);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecInitTidRangeScan
+ *
+ *		Initializes the tid range scan's state information, compiles the
+ *		range bound expressions, and opens the scan relation.
+ *
+ *		Parameters:
+ *		  node: TidRangeScan node produced by the planner.
+ *		  estate: the execution state initialized in InitPlan.
+ * ----------------------------------------------------------------
+ */
+TidRangeScanState *
+ExecInitTidRangeScan(TidRangeScan *node, EState *estate, int eflags)
+{
+	TidRangeScanState *tidrangestate;
+	Relation	currentRelation;
+
+	/*
+	 * create state structure
+	 */
+	tidrangestate = makeNode(TidRangeScanState);
+	tidrangestate->ss.ps.plan = (Plan *) node;
+	tidrangestate->ss.ps.state = estate;
+	tidrangestate->ss.ps.ExecProcNode = ExecTidRangeScan;
+
+	/*
+	 * Miscellaneous initialization
+	 *
+	 * create expression context for node
+	 */
+	ExecAssignExprContext(estate, &tidrangestate->ss.ps);
+
+	/*
+	 * mark scan as not in progress, and tid range as not computed yet
+	 */
+	tidrangestate->trss_inScan = false;
+
+	/*
+	 * open the scan relation
+	 */
+	currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+
+	tidrangestate->ss.ss_currentRelation = currentRelation;
+	tidrangestate->ss.ss_currentScanDesc = NULL;	/* no table scan here */
+
+	/*
+	 * get the scan type from the relation descriptor.
+	 */
+	ExecInitScanTupleSlot(estate, &tidrangestate->ss,
+						  RelationGetDescr(currentRelation),
+						  table_slot_callbacks(currentRelation));
+
+	/*
+	 * Initialize result type and projection.
+	 */
+	ExecInitResultTypeTL(&tidrangestate->ss.ps);
+	ExecAssignScanProjectionInfo(&tidrangestate->ss);
+
+	/*
+	 * initialize child expressions
+	 */
+	tidrangestate->ss.ps.qual =
+		ExecInitQual(node->scan.plan.qual, (PlanState *) tidrangestate);
+
+	TidExprListCreate(tidrangestate);
+
+	/*
+	 * all done.
+	 */
+	return tidrangestate;
+}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index ba3ccc712c..3e6e70e8aa 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -585,6 +585,27 @@ _copyTidScan(const TidScan *from)
 	return newnode;
 }
 
+/*
+ * _copyTidRangeScan
+ */
+static TidRangeScan *
+_copyTidRangeScan(const TidRangeScan *from)
+{
+	TidRangeScan *newnode = makeNode(TidRangeScan);
+
+	/*
+	 * copy node superclass fields
+	 */
+	CopyScanFields((const Scan *) from, (Scan *) newnode);
+
+	/*
+	 * copy remainder of node
+	 */
+	COPY_NODE_FIELD(tidrangequals);
+
+	return newnode;
+}
+
 /*
  * _copySubqueryScan
  */
@@ -4903,6 +4924,9 @@ copyObjectImpl(const void *from)
 		case T_TidScan:
 			retval = _copyTidScan(from);
 			break;
+		case T_TidRangeScan:
+			retval = _copyTidRangeScan(from);
+			break;
 		case T_SubqueryScan:
 			retval = _copySubqueryScan(from);
 			break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 8392be6d44..3bbbe42ae8 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -608,6 +608,16 @@ _outTidScan(StringInfo str, const TidScan *node)
 	WRITE_NODE_FIELD(tidquals);
 }
 
+static void
+_outTidRangeScan(StringInfo str, const TidRangeScan *node)
+{
+	WRITE_NODE_TYPE("TIDRANGESCAN");
+
+	_outScanInfo(str, (const Scan *) node);
+
+	WRITE_NODE_FIELD(tidrangequals);
+}
+
 static void
 _outSubqueryScan(StringInfo str, const SubqueryScan *node)
 {
@@ -3782,6 +3792,9 @@ outNode(StringInfo str, const void *obj)
 			case T_TidScan:
 				_outTidScan(str, obj);
 				break;
+			case T_TidRangeScan:
+				_outTidRangeScan(str, obj);
+				break;
 			case T_SubqueryScan:
 				_outSubqueryScan(str, obj);
 				break;
diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README
index efb52858c8..4a6c348162 100644
--- a/src/backend/optimizer/README
+++ b/src/backend/optimizer/README
@@ -374,6 +374,7 @@ RelOptInfo      - a relation or joined relations
   IndexPath     - index scan
   BitmapHeapPath - top of a bitmapped index scan
   TidPath       - scan by CTID
+  TidRangePath  - scan a contiguous range of CTIDs
   SubqueryScanPath - scan a subquery-in-FROM
   ForeignPath   - scan a foreign table, foreign join or foreign upper-relation
   CustomPath    - for custom scan providers
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index aab06c7d21..744a9aed3e 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -1283,6 +1283,101 @@ cost_tidscan(Path *path, PlannerInfo *root,
 	path->total_cost = startup_cost + run_cost;
 }
 
+/*
+ * cost_tidrangescan
+ *	  Determines and sets the costs of scanning a relation using a range of
+ *	  TIDs for 'path'
+ *
+ * 'baserel' is the relation to be scanned
+ * 'tidrangequals' is the list of TID-checkable range quals
+ * 'param_info' is the ParamPathInfo if this is a parameterized path, else NULL
+ */
+void
+cost_tidrangescan(Path *path, PlannerInfo *root,
+				  RelOptInfo *baserel, List *tidrangequals,
+				  ParamPathInfo *param_info)
+{
+	Selectivity selectivity;
+	double		pages;
+	Cost		startup_cost = 0;
+	Cost		run_cost = 0;
+	QualCost	qpqual_cost;
+	Cost		cpu_per_tuple;
+	QualCost	tid_qual_cost;
+	double		ntuples;
+	double		nseqpages;
+	double		spc_random_page_cost;
+	double		spc_seq_page_cost;
+
+	/* Should only be applied to base relations */
+	Assert(baserel->relid > 0);
+	Assert(baserel->rtekind == RTE_RELATION);
+
+	/* Mark the path with the correct row estimate */
+	if (param_info)
+		path->rows = param_info->ppi_rows;
+	else
+		path->rows = baserel->rows;
+
+	/* Count how many tuples and pages we expect to scan */
+	selectivity = clauselist_selectivity(root, tidrangequals, baserel->relid,
+										 JOIN_INNER, NULL);
+	pages = ceil(selectivity * baserel->pages);
+
+	if (pages <= 0.0)
+		pages = 1.0;
+
+	/*
+	 * The first page in a range requires a random seek, but each subsequent
+	 * page is just a normal sequential page read. NOTE: it's desirable for
+	 * Tid Range Scans to cost more than the equivalent Sequential Scans,
+	 * because Seq Scans have some performance advantages such as scan
+	 * synchronization and parallelizability, and we'd prefer one of them to
+	 * be picked unless a Tid Range Scan really is better.
+	 */
+	ntuples = selectivity * baserel->tuples;
+	nseqpages = pages - 1.0;
+
+	if (!enable_tidscan)
+		startup_cost += disable_cost;
+
+	/*
+	 * The TID qual expressions will be computed once, any other baserestrict
+	 * quals once per retrieved tuple.
+	 */
+	cost_qual_eval(&tid_qual_cost, tidrangequals, root);
+
+	/* fetch estimated page cost for tablespace containing table */
+	get_tablespace_page_costs(baserel->reltablespace,
+							  &spc_random_page_cost,
+							  &spc_seq_page_cost);
+
+	/* disk costs; 1 random page and the remainder as seq pages */
+	run_cost += spc_random_page_cost + spc_seq_page_cost * nseqpages;
+
+	/* Add scanning CPU costs */
+	get_restriction_qual_cost(root, baserel, param_info, &qpqual_cost);
+
+	/*
+	 * XXX currently we assume TID quals are a subset of qpquals at this
+	 * point; they will be removed (if possible) when we create the plan, so
+	 * we subtract their cost from the total qpqual cost.  (If the TID quals
+	 * can't be removed, this is a mistake and we're going to underestimate
+	 * the CPU cost a bit.)
+	 */
+	startup_cost += qpqual_cost.startup + tid_qual_cost.per_tuple;
+	cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple -
+		tid_qual_cost.per_tuple;
+	run_cost += cpu_per_tuple * ntuples;
+
+	/* tlist eval costs are paid per output row, not per tuple scanned */
+	startup_cost += path->pathtarget->cost.startup;
+	run_cost += path->pathtarget->cost.per_tuple * path->rows;
+
+	path->startup_cost = startup_cost;
+	path->total_cost = startup_cost + run_cost;
+}
+
 /*
  * cost_subqueryscan
  *	  Determines and returns the cost of scanning a subquery RTE.
diff --git a/src/backend/optimizer/path/tidpath.c b/src/backend/optimizer/path/tidpath.c
index 0845b460e2..41d86e42e0 100644
--- a/src/backend/optimizer/path/tidpath.c
+++ b/src/backend/optimizer/path/tidpath.c
@@ -2,9 +2,9 @@
  *
  * tidpath.c
  *	  Routines to determine which TID conditions are usable for scanning
- *	  a given relation, and create TidPaths accordingly.
+ *	  a given relation, and create TidPaths and TidRangePaths accordingly.
  *
- * What we are looking for here is WHERE conditions of the form
+ * For TidPaths, we look for WHERE conditions of the form
  * "CTID = pseudoconstant", which can be implemented by just fetching
  * the tuple directly via heap_fetch().  We can also handle OR'd conditions
  * such as (CTID = const1) OR (CTID = const2), as well as ScalarArrayOpExpr
@@ -23,6 +23,9 @@
  * a function, but in practice it works better to keep the special node
  * representation all the way through to execution.
  *
+ * Additionally, TidRangePaths may be created for conditions of the form
+ * "CTID relop pseudoconstant", where relop is one of >, >=, <, <=, and
+ * AND-clauses composed of such conditions.
  *
  * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
@@ -63,14 +66,14 @@ IsCTIDVar(Var *var, RelOptInfo *rel)
 
 /*
  * Check to see if a RestrictInfo is of the form
- *		CTID = pseudoconstant
+ *		CTID OP pseudoconstant
  * or
- *		pseudoconstant = CTID
- * where the CTID Var belongs to relation "rel", and nothing on the
- * other side of the clause does.
+ *		pseudoconstant OP CTID
+ * where OP is a binary operation, the CTID Var belongs to relation "rel",
+ * and nothing on the other side of the clause does.
  */
 static bool
-IsTidEqualClause(RestrictInfo *rinfo, RelOptInfo *rel)
+IsBinaryTidClause(RestrictInfo *rinfo, RelOptInfo *rel)
 {
 	OpExpr	   *node;
 	Node	   *arg1,
@@ -83,10 +86,9 @@ IsTidEqualClause(RestrictInfo *rinfo, RelOptInfo *rel)
 		return false;
 	node = (OpExpr *) rinfo->clause;
 
-	/* Operator must be tideq */
-	if (node->opno != TIDEqualOperator)
+	/* OpExpr must have two arguments */
+	if (list_length(node->args) != 2)
 		return false;
-	Assert(list_length(node->args) == 2);
 	arg1 = linitial(node->args);
 	arg2 = lsecond(node->args);
 
@@ -116,6 +118,50 @@ IsTidEqualClause(RestrictInfo *rinfo, RelOptInfo *rel)
 	return true;				/* success */
 }
 
+/*
+ * Check to see if a RestrictInfo is of the form
+ *		CTID = pseudoconstant
+ * or
+ *		pseudoconstant = CTID
+ * where the CTID Var belongs to relation "rel", and nothing on the
+ * other side of the clause does.
+ */
+static bool
+IsTidEqualClause(RestrictInfo *rinfo, RelOptInfo *rel)
+{
+	if (!IsBinaryTidClause(rinfo, rel))
+		return false;
+
+	if (((OpExpr *) rinfo->clause)->opno == TIDEqualOperator)
+		return true;
+
+	return false;
+}
+
+/*
+ * Check to see if a RestrictInfo is of the form
+ *		CTID OP pseudoconstant
+ * or
+ *		pseudoconstant OP CTID
+ * where OP is a range operator such as <, <=, >, or >=, the CTID Var belongs
+ * to relation "rel", and nothing on the other side of the clause does.
+ */
+static bool
+IsTidRangeClause(RestrictInfo *rinfo, RelOptInfo *rel)
+{
+	Oid			opno;
+
+	if (!IsBinaryTidClause(rinfo, rel))
+		return false;
+	opno = ((OpExpr *) rinfo->clause)->opno;
+
+	if (opno == TIDLessOperator || opno == TIDLessEqOperator ||
+		opno == TIDGreaterOperator || opno == TIDGreaterEqOperator)
+		return true;
+
+	return false;
+}
+
 /*
  * Check to see if a RestrictInfo is of the form
  *		CTID = ANY (pseudoconstant_array)
@@ -222,7 +268,7 @@ TidQualFromRestrictInfo(PlannerInfo *root, RestrictInfo *rinfo, RelOptInfo *rel)
  *
  * Returns a List of CTID qual RestrictInfos for the specified rel (with
  * implicit OR semantics across the list), or NIL if there are no usable
- * conditions.
+ * equality conditions.
  *
  * This function is just concerned with handling AND/OR recursion.
  */
@@ -301,6 +347,34 @@ TidQualFromRestrictInfoList(PlannerInfo *root, List *rlist, RelOptInfo *rel)
 	return rlst;
 }
 
+/*
+ * Extract a set of CTID range conditions from implicit-AND List of RestrictInfos
+ *
+ * Returns a List of CTID range qual RestrictInfos for the specified rel
+ * (with implicit AND semantics across the list), or NIL if there are no
+ * usable range conditions or if the rel's table AM does not support TID range
+ * scans.
+ */
+static List *
+TidRangeQualFromRestrictInfoList(List *rlist, RelOptInfo *rel)
+{
+	List	   *rlst = NIL;
+	ListCell   *l;
+
+	if ((rel->amflags & AMFLAG_HAS_TID_RANGE) == 0)
+		return NIL;
+
+	foreach(l, rlist)
+	{
+		RestrictInfo *rinfo = lfirst_node(RestrictInfo, l);
+
+		if (IsTidRangeClause(rinfo, rel))
+			rlst = lappend(rlst, rinfo);
+	}
+
+	return rlst;
+}
+
 /*
  * Given a list of join clauses involving our rel, create a parameterized
  * TidPath for each one that is a suitable TidEqual clause.
@@ -385,6 +459,7 @@ void
 create_tidscan_paths(PlannerInfo *root, RelOptInfo *rel)
 {
 	List	   *tidquals;
+	List	   *tidrangequals;
 
 	/*
 	 * If any suitable quals exist in the rel's baserestrict list, generate a
@@ -404,6 +479,26 @@ create_tidscan_paths(PlannerInfo *root, RelOptInfo *rel)
 												   required_outer));
 	}
 
+	/*
+	 * If there are range quals in the baserestrict list, generate a
+	 * TidRangePath.
+	 */
+	tidrangequals = TidRangeQualFromRestrictInfoList(rel->baserestrictinfo,
+													 rel);
+
+	if (tidrangequals)
+	{
+		/*
+		 * This path uses no join clauses, but it could still have required
+		 * parameterization due to LATERAL refs in its tlist.
+		 */
+		Relids		required_outer = rel->lateral_relids;
+
+		add_path(rel, (Path *) create_tidrangescan_path(root, rel,
+														tidrangequals,
+														required_outer));
+	}
+
 	/*
 	 * Try to generate parameterized TidPaths using equality clauses extracted
 	 * from EquivalenceClasses.  (This is important since simple "t1.ctid =
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 25d4750ca6..c5653221b7 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -129,6 +129,10 @@ static Plan *create_bitmap_subplan(PlannerInfo *root, Path *bitmapqual,
 static void bitmap_subplan_mark_shared(Plan *plan);
 static TidScan *create_tidscan_plan(PlannerInfo *root, TidPath *best_path,
 									List *tlist, List *scan_clauses);
+static TidRangeScan *create_tidrangescan_plan(PlannerInfo *root,
+											  TidRangePath *best_path,
+											  List *tlist,
+											  List *scan_clauses);
 static SubqueryScan *create_subqueryscan_plan(PlannerInfo *root,
 											  SubqueryScanPath *best_path,
 											  List *tlist, List *scan_clauses);
@@ -193,6 +197,8 @@ static BitmapHeapScan *make_bitmap_heapscan(List *qptlist,
 											Index scanrelid);
 static TidScan *make_tidscan(List *qptlist, List *qpqual, Index scanrelid,
 							 List *tidquals);
+static TidRangeScan *make_tidrangescan(List *qptlist, List *qpqual,
+									   Index scanrelid, List *tidrangequals);
 static SubqueryScan *make_subqueryscan(List *qptlist,
 									   List *qpqual,
 									   Index scanrelid,
@@ -384,6 +390,7 @@ create_plan_recurse(PlannerInfo *root, Path *best_path, int flags)
 		case T_IndexOnlyScan:
 		case T_BitmapHeapScan:
 		case T_TidScan:
+		case T_TidRangeScan:
 		case T_SubqueryScan:
 		case T_FunctionScan:
 		case T_TableFuncScan:
@@ -679,6 +686,13 @@ create_scan_plan(PlannerInfo *root, Path *best_path, int flags)
 												scan_clauses);
 			break;
 
+		case T_TidRangeScan:
+			plan = (Plan *) create_tidrangescan_plan(root,
+													 (TidRangePath *) best_path,
+													 tlist,
+													 scan_clauses);
+			break;
+
 		case T_SubqueryScan:
 			plan = (Plan *) create_subqueryscan_plan(root,
 													 (SubqueryScanPath *) best_path,
@@ -3440,6 +3454,71 @@ create_tidscan_plan(PlannerInfo *root, TidPath *best_path,
 	return scan_plan;
 }
 
+/*
+ * create_tidrangescan_plan
+ *	 Returns a tidrangescan plan for the base relation scanned by 'best_path'
+ *	 with restriction clauses 'scan_clauses' and targetlist 'tlist'.
+ */
+static TidRangeScan *
+create_tidrangescan_plan(PlannerInfo *root, TidRangePath *best_path,
+						 List *tlist, List *scan_clauses)
+{
+	TidRangeScan *scan_plan;
+	Index		scan_relid = best_path->path.parent->relid;
+	List	   *tidrangequals = best_path->tidrangequals;
+
+	/* it should be a base rel... */
+	Assert(scan_relid > 0);
+	Assert(best_path->path.parent->rtekind == RTE_RELATION);
+
+	/*
+	 * The qpqual list must contain all restrictions not enforced by the
+	 * tidrangequals list.  tidrangequals has AND semantics, so we can simply
+	 * remove any qual that appears in it.
+	 */
+	{
+		List	   *qpqual = NIL;
+		ListCell   *l;
+
+		foreach(l, scan_clauses)
+		{
+			RestrictInfo *rinfo = lfirst_node(RestrictInfo, l);
+
+			if (rinfo->pseudoconstant)
+				continue;		/* we may drop pseudoconstants here */
+			if (list_member_ptr(tidrangequals, rinfo))
+				continue;		/* simple duplicate */
+			qpqual = lappend(qpqual, rinfo);
+		}
+		scan_clauses = qpqual;
+	}
+
+	/* Sort clauses into best execution order */
+	scan_clauses = order_qual_clauses(root, scan_clauses);
+
+	/* Reduce RestrictInfo lists to bare expressions; ignore pseudoconstants */
+	tidrangequals = extract_actual_clauses(tidrangequals, false);
+	scan_clauses = extract_actual_clauses(scan_clauses, false);
+
+	/* Replace any outer-relation variables with nestloop params */
+	if (best_path->path.param_info)
+	{
+		tidrangequals = (List *)
+			replace_nestloop_params(root, (Node *) tidrangequals);
+		scan_clauses = (List *)
+			replace_nestloop_params(root, (Node *) scan_clauses);
+	}
+
+	scan_plan = make_tidrangescan(tlist,
+								  scan_clauses,
+								  scan_relid,
+								  tidrangequals);
+
+	copy_generic_path_info(&scan_plan->scan.plan, &best_path->path);
+
+	return scan_plan;
+}
+
 /*
  * create_subqueryscan_plan
  *	 Returns a subqueryscan plan for the base relation scanned by 'best_path'
@@ -5373,6 +5452,25 @@ make_tidscan(List *qptlist,
 	return node;
 }
 
+static TidRangeScan *
+make_tidrangescan(List *qptlist,
+				  List *qpqual,
+				  Index scanrelid,
+				  List *tidrangequals)
+{
+	TidRangeScan *node = makeNode(TidRangeScan);
+	Plan	   *plan = &node->scan.plan;
+
+	plan->targetlist = qptlist;
+	plan->qual = qpqual;
+	plan->lefttree = NULL;
+	plan->righttree = NULL;
+	node->scan.scanrelid = scanrelid;
+	node->tidrangequals = tidrangequals;
+
+	return node;
+}
+
 static SubqueryScan *
 make_subqueryscan(List *qptlist,
 				  List *qpqual,
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index c3c36be13e..42f088ad71 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -619,6 +619,22 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 								  rtoffset, 1);
 			}
 			break;
+		case T_TidRangeScan:
+			{
+				TidRangeScan *splan = (TidRangeScan *) plan;
+
+				splan->scan.scanrelid += rtoffset;
+				splan->scan.plan.targetlist =
+					fix_scan_list(root, splan->scan.plan.targetlist,
+								  rtoffset, NUM_EXEC_TLIST(plan));
+				splan->scan.plan.qual =
+					fix_scan_list(root, splan->scan.plan.qual,
+								  rtoffset, NUM_EXEC_QUAL(plan));
+				splan->tidrangequals =
+					fix_scan_list(root, splan->tidrangequals,
+								  rtoffset, 1);
+			}
+			break;
 		case T_SubqueryScan:
 			/* Needs special treatment, see comments below */
 			return set_subqueryscan_references(root,
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 54ef61bfb3..f3e46e0959 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -2367,6 +2367,12 @@ finalize_plan(PlannerInfo *root, Plan *plan,
 			context.paramids = bms_add_members(context.paramids, scan_params);
 			break;
 
+		case T_TidRangeScan:
+			finalize_primnode((Node *) ((TidRangeScan *) plan)->tidrangequals,
+							  &context);
+			context.paramids = bms_add_members(context.paramids, scan_params);
+			break;
+
 		case T_SubqueryScan:
 			{
 				SubqueryScan *sscan = (SubqueryScan *) plan;
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index d465b9e213..8a552812f5 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1203,6 +1203,35 @@ create_tidscan_path(PlannerInfo *root, RelOptInfo *rel, List *tidquals,
 	return pathnode;
 }
 
+/*
+ * create_tidrangescan_path
+ *	  Creates a path corresponding to a scan by a range of TIDs, returning
+ *	  the pathnode.
+ */
+TidRangePath *
+create_tidrangescan_path(PlannerInfo *root, RelOptInfo *rel,
+						 List *tidrangequals, Relids required_outer)
+{
+	TidRangePath *pathnode = makeNode(TidRangePath);
+
+	pathnode->path.pathtype = T_TidRangeScan;
+	pathnode->path.parent = rel;
+	pathnode->path.pathtarget = rel->reltarget;
+	pathnode->path.param_info = get_baserel_parampathinfo(root, rel,
+														  required_outer);
+	pathnode->path.parallel_aware = false;
+	pathnode->path.parallel_safe = rel->consider_parallel;
+	pathnode->path.parallel_workers = 0;
+	pathnode->path.pathkeys = NIL;	/* always unordered */
+
+	pathnode->tidrangequals = tidrangequals;
+
+	cost_tidrangescan(&pathnode->path, root, rel, tidrangequals,
+					  pathnode->path.param_info);
+
+	return pathnode;
+}
+
 /*
  * create_append_path
  *	  Creates a path corresponding to an Append plan, returning the
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index da322b453e..3f0d25fba8 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -466,6 +466,10 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 	/* Collect info about relation's foreign keys, if relevant */
 	get_relation_foreign_keys(root, rel, relation, inhparent);
 
+	/* Collect info about functions implemented by the rel's table AM. */
+	if (relation->rd_tableam && relation->rd_tableam->scan_getnextslot_inrange != NULL)
+		rel->amflags |= AMFLAG_HAS_TID_RANGE;
+
 	/*
 	 * Collect info about relation's partitioning scheme, if any. Only
 	 * inheritance parents may be partitioned.
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 731ff708b9..345c877aeb 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -234,6 +234,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
 	rel->subroot = NULL;
 	rel->subplan_params = NIL;
 	rel->rel_parallel_workers = -1; /* set up in get_relation_info */
+	rel->amflags = 0;
 	rel->serverid = InvalidOid;
 	rel->userid = rte->checkAsUser;
 	rel->useridiscurrent = false;
@@ -646,6 +647,7 @@ build_join_rel(PlannerInfo *root,
 	joinrel->subroot = NULL;
 	joinrel->subplan_params = NIL;
 	joinrel->rel_parallel_workers = -1;
+	joinrel->amflags = 0;
 	joinrel->serverid = InvalidOid;
 	joinrel->userid = InvalidOid;
 	joinrel->useridiscurrent = false;
@@ -826,6 +828,7 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
 	joinrel->eclass_indexes = NULL;
 	joinrel->subroot = NULL;
 	joinrel->subplan_params = NIL;
+	joinrel->amflags = 0;
 	joinrel->serverid = InvalidOid;
 	joinrel->userid = InvalidOid;
 	joinrel->useridiscurrent = false;
diff --git a/src/backend/storage/page/itemptr.c b/src/backend/storage/page/itemptr.c
index 55759c383b..c7aec7dbd9 100644
--- a/src/backend/storage/page/itemptr.c
+++ b/src/backend/storage/page/itemptr.c
@@ -71,3 +71,61 @@ ItemPointerCompare(ItemPointer arg1, ItemPointer arg2)
 	else
 		return 0;
 }
+
+/*
+ * ItemPointerInc
+ *		Increment 'pointer' by 1 only paying attention to the ItemPointer's
+ *		type's range limits and not MaxOffsetNumber and FirstOffsetNumber.
+ *		This may result in 'pointer' becoming !OffsetNumberIsValid.
+ *
+ * If the pointer is already the maximum possible values permitted by the
+ * range of the ItemPointer's types, then do nothing.
+ */
+void
+ItemPointerInc(ItemPointer pointer)
+{
+	BlockNumber blk = ItemPointerGetBlockNumberNoCheck(pointer);
+	OffsetNumber off = ItemPointerGetOffsetNumberNoCheck(pointer);
+
+	if (off == PG_UINT16_MAX)
+	{
+		if (blk != InvalidBlockNumber)
+		{
+			off = 0;
+			blk++;
+		}
+	}
+	else
+		off++;
+
+	ItemPointerSet(pointer, blk, off);
+}
+
+/*
+ * ItemPointerDec
+ *		Decrement 'pointer' by 1 only paying attention to the ItemPointer's
+ *		type's range limits and not MaxOffsetNumber and FirstOffsetNumber.
+ *		This may result in 'pointer' becoming !OffsetNumberIsValid.
+ *
+ * If the pointer is already the minimum possible values permitted by the
+ * range of the ItemPointer's types, then do nothing.
+ */
+void
+ItemPointerDec(ItemPointer pointer)
+{
+	BlockNumber blk = ItemPointerGetBlockNumberNoCheck(pointer);
+	OffsetNumber off = ItemPointerGetOffsetNumberNoCheck(pointer);
+
+	if (off == 0)
+	{
+		if (blk != 0)
+		{
+			off = PG_UINT16_MAX;
+			blk--;
+		}
+	}
+	else
+		off--;
+
+	ItemPointerSet(pointer, blk, off);
+}
\ No newline at end of file
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index d96a47b1ce..8f21fb4ba9 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -121,6 +121,10 @@ extern void heap_endscan(TableScanDesc scan);
 extern HeapTuple heap_getnext(TableScanDesc scan, ScanDirection direction);
 extern bool heap_getnextslot(TableScanDesc sscan,
 							 ScanDirection direction, struct TupleTableSlot *slot);
+extern bool heap_getnextslot_inrange(TableScanDesc sscan,
+									 ScanDirection direction,
+									 TupleTableSlot *slot, ItemPointer mintid,
+									 ItemPointer maxtid);
 
 extern bool heap_fetch(Relation relation, Snapshot snapshot,
 					   HeapTuple tuple, Buffer *userbuf);
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 33bffb6815..d1c608b176 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -325,6 +325,26 @@ typedef struct TableAmRoutine
 									 ScanDirection direction,
 									 TupleTableSlot *slot);
 
+	/*
+	 * Return next tuple from `scan` where TID is within the defined range.
+	 * This behaves like scan_getnextslot but only returns tuples from the
+	 * given range of TIDs.  Ranges are inclusive.  This function is optional
+	 * and may be set to NULL if TID range scans are not supported by the AM.
+	 *
+	 * Implementations of this function must themselves handle ItemPointers
+	 * of any value, i.e., they must handle each of the following:
+	 *
+	 * 1) mintid or maxtid is beyond the end of the table; and
+	 * 2) mintid is above maxtid; and
+	 * 3) item offset for mintid or maxtid is beyond the maximum offset
+	 * allowed by the AM.
+	 */
+	bool		(*scan_getnextslot_inrange) (TableScanDesc scan,
+											 ScanDirection direction,
+											 TupleTableSlot *slot,
+											 ItemPointer mintid,
+											 ItemPointer maxtid);
+
 
 	/* ------------------------------------------------------------------------
 	 * Parallel table scan related functions.
@@ -1015,6 +1035,36 @@ table_scan_getnextslot(TableScanDesc sscan, ScanDirection direction, TupleTableS
 	return sscan->rs_rd->rd_tableam->scan_getnextslot(sscan, direction, slot);
 }
 
+/*
+ * Return next tuple from defined TID range from `scan` and store in slot.
+ */
+static inline bool
+table_scan_getnextslot_inrange(TableScanDesc sscan, ScanDirection direction,
+							   TupleTableSlot *slot, ItemPointer mintid,
+							   ItemPointer maxtid)
+{
+	/*
+	 * The planner should never make a plan which uses this function when the
+	 * table AM has not defined any function for this callback.
+	 */
+	Assert(sscan->rs_rd->rd_tableam->scan_getnextslot_inrange != NULL);
+
+	slot->tts_tableOid = RelationGetRelid(sscan->rs_rd);
+
+	/*
+	 * We don't expect direct calls to table_scan_getnextslot_inrange with
+	 * valid CheckXidAlive for catalog or regular tables.  See detailed
+	 * comments in xact.c where these variables are declared.
+	 */
+	if (unlikely(TransactionIdIsValid(CheckXidAlive) && !bsysscan))
+		elog(ERROR, "unexpected table_scan_getnextslot_inrange call during logical decoding");
+
+	return sscan->rs_rd->rd_tableam->scan_getnextslot_inrange(sscan,
+															  direction,
+															  slot,
+															  mintid,
+															  maxtid);
+}
 
 /* ----------------------------------------------------------------------------
  * Parallel table scan related functions.
diff --git a/src/include/catalog/pg_operator.dat b/src/include/catalog/pg_operator.dat
index 0d4eac8f96..85395a81ee 100644
--- a/src/include/catalog/pg_operator.dat
+++ b/src/include/catalog/pg_operator.dat
@@ -237,15 +237,15 @@
   oprname => '<', oprleft => 'tid', oprright => 'tid', oprresult => 'bool',
   oprcom => '>(tid,tid)', oprnegate => '>=(tid,tid)', oprcode => 'tidlt',
   oprrest => 'scalarltsel', oprjoin => 'scalarltjoinsel' },
-{ oid => '2800', descr => 'greater than',
+{ oid => '2800', oid_symbol => 'TIDGreaterOperator', descr => 'greater than',
   oprname => '>', oprleft => 'tid', oprright => 'tid', oprresult => 'bool',
   oprcom => '<(tid,tid)', oprnegate => '<=(tid,tid)', oprcode => 'tidgt',
   oprrest => 'scalargtsel', oprjoin => 'scalargtjoinsel' },
-{ oid => '2801', descr => 'less than or equal',
+{ oid => '2801', oid_symbol => 'TIDLessEqOperator', descr => 'less than or equal',
   oprname => '<=', oprleft => 'tid', oprright => 'tid', oprresult => 'bool',
   oprcom => '>=(tid,tid)', oprnegate => '>(tid,tid)', oprcode => 'tidle',
   oprrest => 'scalarlesel', oprjoin => 'scalarlejoinsel' },
-{ oid => '2802', descr => 'greater than or equal',
+{ oid => '2802', oid_symbol => 'TIDGreaterEqOperator', descr => 'greater than or equal',
   oprname => '>=', oprleft => 'tid', oprright => 'tid', oprresult => 'bool',
   oprcom => '<=(tid,tid)', oprnegate => '<(tid,tid)', oprcode => 'tidge',
   oprrest => 'scalargesel', oprjoin => 'scalargejoinsel' },
diff --git a/src/include/executor/nodeTidrangescan.h b/src/include/executor/nodeTidrangescan.h
new file mode 100644
index 0000000000..e53783a3bf
--- /dev/null
+++ b/src/include/executor/nodeTidrangescan.h
@@ -0,0 +1,23 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeTidrangescan.h
+ *
+ *
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ *
+ * src/include/executor/nodeTidrangescan.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef NODETIDRANGESCAN_H
+#define NODETIDRANGESCAN_H
+
+#include "nodes/execnodes.h"
+
+extern TidRangeScanState *ExecInitTidRangeScan(TidRangeScan *node,
+											   EState *estate, int eflags);
+extern void ExecEndTidRangeScan(TidRangeScanState *node);
+extern void ExecReScanTidRangeScan(TidRangeScanState *node);
+
+#endif							/* NODETIDRANGESCAN_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index d65099c94a..dba1cea745 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1617,6 +1617,24 @@ typedef struct TidScanState
 	HeapTupleData tss_htup;
 } TidScanState;
 
+/* ----------------
+ *	 TidRangeScanState information
+ *
+ *		trss_tidexprs		list of TidOpExpr structs (see nodeTidrangescan.c)
+ *		trss_mintid			the lowest TID in the scan range
+ *		trss_maxtid			the highest TID in the scan range
+ *		trss_inScan			is a scan currently in progress?
+ * ----------------
+ */
+typedef struct TidRangeScanState
+{
+	ScanState	ss;				/* its first field is NodeTag */
+	List	   *trss_tidexprs;
+	ItemPointerData trss_mintid;
+	ItemPointerData trss_maxtid;
+	bool		trss_inScan;
+} TidRangeScanState;
+
 /* ----------------
  *	 SubqueryScanState information
  *
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index caed683ba9..3016836ede 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -59,6 +59,7 @@ typedef enum NodeTag
 	T_BitmapIndexScan,
 	T_BitmapHeapScan,
 	T_TidScan,
+	T_TidRangeScan,
 	T_SubqueryScan,
 	T_FunctionScan,
 	T_ValuesScan,
@@ -116,6 +117,7 @@ typedef enum NodeTag
 	T_BitmapIndexScanState,
 	T_BitmapHeapScanState,
 	T_TidScanState,
+	T_TidRangeScanState,
 	T_SubqueryScanState,
 	T_FunctionScanState,
 	T_TableFuncScanState,
@@ -229,6 +231,7 @@ typedef enum NodeTag
 	T_BitmapAndPath,
 	T_BitmapOrPath,
 	T_TidPath,
+	T_TidRangePath,
 	T_SubqueryScanPath,
 	T_ForeignPath,
 	T_CustomPath,
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index cde2637798..5f93364116 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -621,6 +621,10 @@ typedef struct PartitionSchemeData *PartitionScheme;
  * to simplify matching join clauses to those lists.
  *----------
  */
+
+/* Bitmask of flags supported by table AMs */
+#define AMFLAG_HAS_TID_RANGE (1 << 0)
+
 typedef enum RelOptKind
 {
 	RELOPT_BASEREL,
@@ -710,6 +714,8 @@ typedef struct RelOptInfo
 	PlannerInfo *subroot;		/* if subquery */
 	List	   *subplan_params; /* if subquery */
 	int			rel_parallel_workers;	/* wanted number of parallel workers */
+	int			amflags;		/* Bitmask of optional features supported by
+								 * the table AM */
 
 	/* Information about foreign tables and foreign joins */
 	Oid			serverid;		/* identifies server for the table or join */
@@ -1323,6 +1329,18 @@ typedef struct TidPath
 	List	   *tidquals;		/* qual(s) involving CTID = something */
 } TidPath;
 
+/*
+ * TidRangePath represents a scan by a contiguous range of TIDs
+ *
+ * tidrangequals is an implicitly AND'ed list of qual expressions of the form
+ * "CTID relop pseudoconstant", where relop is one of >,>=,<,<=.
+ */
+typedef struct TidRangePath
+{
+	Path		path;
+	List	   *tidrangequals;
+} TidRangePath;
+
 /*
  * SubqueryScanPath represents a scan of an unflattened subquery-in-FROM
  *
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 43160439f0..6e62104d0b 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -485,6 +485,19 @@ typedef struct TidScan
 	List	   *tidquals;		/* qual(s) involving CTID = something */
 } TidScan;
 
+/* ----------------
+ *		tid range scan node
+ *
+ * tidrangequals is an implicitly AND'ed list of qual expressions of the form
+ * "CTID relop pseudoconstant", where relop is one of >,>=,<,<=.
+ * ----------------
+ */
+typedef struct TidRangeScan
+{
+	Scan		scan;
+	List	   *tidrangequals;	/* qual(s) involving CTID op something */
+} TidRangeScan;
+
 /* ----------------
  *		subquery scan node
  *
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index ed2e4af4be..1be93be098 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -83,6 +83,9 @@ extern void cost_bitmap_or_node(BitmapOrPath *path, PlannerInfo *root);
 extern void cost_bitmap_tree_node(Path *path, Cost *cost, Selectivity *selec);
 extern void cost_tidscan(Path *path, PlannerInfo *root,
 						 RelOptInfo *baserel, List *tidquals, ParamPathInfo *param_info);
+extern void cost_tidrangescan(Path *path, PlannerInfo *root,
+							  RelOptInfo *baserel, List *tidrangequals,
+							  ParamPathInfo *param_info);
 extern void cost_subqueryscan(SubqueryScanPath *path, PlannerInfo *root,
 							  RelOptInfo *baserel, ParamPathInfo *param_info);
 extern void cost_functionscan(Path *path, PlannerInfo *root,
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 23dec14cbd..22c6d4c4fd 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -63,6 +63,10 @@ extern BitmapOrPath *create_bitmap_or_path(PlannerInfo *root,
 										   List *bitmapquals);
 extern TidPath *create_tidscan_path(PlannerInfo *root, RelOptInfo *rel,
 									List *tidquals, Relids required_outer);
+extern TidRangePath *create_tidrangescan_path(PlannerInfo *root,
+											  RelOptInfo *rel,
+											  List *tidrangequals,
+											  Relids required_outer);
 extern AppendPath *create_append_path(PlannerInfo *root, RelOptInfo *rel,
 									  List *subpaths, List *partial_subpaths,
 									  List *pathkeys, Relids required_outer,
diff --git a/src/include/storage/itemptr.h b/src/include/storage/itemptr.h
index 0e6990140b..cd4b8fbacb 100644
--- a/src/include/storage/itemptr.h
+++ b/src/include/storage/itemptr.h
@@ -202,5 +202,7 @@ typedef ItemPointerData *ItemPointer;
 
 extern bool ItemPointerEquals(ItemPointer pointer1, ItemPointer pointer2);
 extern int32 ItemPointerCompare(ItemPointer arg1, ItemPointer arg2);
+extern void ItemPointerInc(ItemPointer pointer);
+extern void ItemPointerDec(ItemPointer pointer);
 
 #endif							/* ITEMPTR_H */
diff --git a/src/test/regress/expected/tidrangescan.out b/src/test/regress/expected/tidrangescan.out
new file mode 100644
index 0000000000..0384304c7f
--- /dev/null
+++ b/src/test/regress/expected/tidrangescan.out
@@ -0,0 +1,302 @@
+-- tests for tidrangescans
+SET enable_seqscan TO off;
+CREATE TABLE tidrangescan(id integer, data text);
+-- insert enough tuples to fill at least two pages
+INSERT INTO tidrangescan SELECT i,repeat('x', 100) FROM generate_series(1,200) AS s(i);
+-- remove all tuples after the 10th tuple on each page.  Trying to ensure
+-- we get the same layout with all CPU architectures and smaller than standard
+-- page sizes.
+DELETE FROM tidrangescan
+WHERE substring(ctid::text FROM ',(\d+)\)')::integer > 10 OR substring(ctid::text FROM '\((\d+),')::integer > 2;
+VACUUM tidrangescan;
+-- range scans with upper bound
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid < '(1,0)';
+            QUERY PLAN             
+-----------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: (ctid < '(1,0)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid < '(1,0)';
+  ctid  
+--------
+ (0,1)
+ (0,2)
+ (0,3)
+ (0,4)
+ (0,5)
+ (0,6)
+ (0,7)
+ (0,8)
+ (0,9)
+ (0,10)
+(10 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid <= '(1,5)';
+             QUERY PLAN             
+------------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: (ctid <= '(1,5)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid <= '(1,5)';
+  ctid  
+--------
+ (0,1)
+ (0,2)
+ (0,3)
+ (0,4)
+ (0,5)
+ (0,6)
+ (0,7)
+ (0,8)
+ (0,9)
+ (0,10)
+ (1,1)
+ (1,2)
+ (1,3)
+ (1,4)
+ (1,5)
+(15 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid < '(0,0)';
+            QUERY PLAN             
+-----------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: (ctid < '(0,0)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid < '(0,0)';
+ ctid 
+------
+(0 rows)
+
+-- range scans with lower bound
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid > '(2,8)';
+            QUERY PLAN             
+-----------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: (ctid > '(2,8)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid > '(2,8)';
+  ctid  
+--------
+ (2,9)
+ (2,10)
+(2 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE '(2,8)' < ctid;
+            QUERY PLAN             
+-----------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: ('(2,8)'::tid < ctid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE '(2,8)' < ctid;
+  ctid  
+--------
+ (2,9)
+ (2,10)
+(2 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid >= '(2,8)';
+             QUERY PLAN             
+------------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: (ctid >= '(2,8)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid >= '(2,8)';
+  ctid  
+--------
+ (2,8)
+ (2,9)
+ (2,10)
+(3 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid >= '(100,0)';
+              QUERY PLAN              
+--------------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: (ctid >= '(100,0)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid >= '(100,0)';
+ ctid 
+------
+(0 rows)
+
+-- range scans with both bounds
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid > '(1,4)' AND '(1,7)' >= ctid;
+                           QUERY PLAN                           
+----------------------------------------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: ((ctid > '(1,4)'::tid) AND ('(1,7)'::tid >= ctid))
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid > '(1,4)' AND '(1,7)' >= ctid;
+ ctid  
+-------
+ (1,5)
+ (1,6)
+ (1,7)
+(3 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE '(1,7)' >= ctid AND ctid > '(1,4)';
+                           QUERY PLAN                           
+----------------------------------------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: (('(1,7)'::tid >= ctid) AND (ctid > '(1,4)'::tid))
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE '(1,7)' >= ctid AND ctid > '(1,4)';
+ ctid  
+-------
+ (1,5)
+ (1,6)
+ (1,7)
+(3 rows)
+
+-- extreme offsets
+SELECT ctid FROM tidrangescan WHERE ctid > '(0,65535)' AND ctid < '(1,0)' LIMIT 1;
+ ctid 
+------
+(0 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid < '(0,0)' LIMIT 1;
+ ctid 
+------
+(0 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid > '(4294967295,65535)';
+ ctid 
+------
+(0 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid < '(0,0)';
+ ctid 
+------
+(0 rows)
+
+-- NULLs in the range cannot return tuples
+SELECT ctid FROM tidrangescan WHERE ctid >= (SELECT NULL::tid);
+ ctid 
+------
+(0 rows)
+
+-- empty table
+CREATE TABLE tidrangescan_empty(id integer, data text);
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan_empty WHERE ctid < '(1, 0)';
+              QUERY PLAN              
+--------------------------------------
+ Tid Range Scan on tidrangescan_empty
+   TID Cond: (ctid < '(1,0)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan_empty WHERE ctid < '(1, 0)';
+ ctid 
+------
+(0 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan_empty WHERE ctid > '(9, 0)';
+              QUERY PLAN              
+--------------------------------------
+ Tid Range Scan on tidrangescan_empty
+   TID Cond: (ctid > '(9,0)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan_empty WHERE ctid > '(9, 0)';
+ ctid 
+------
+(0 rows)
+
+-- rescans
+EXPLAIN (COSTS OFF)
+SELECT t.ctid,t2.c FROM tidrangescan t,
+LATERAL (SELECT count(*) c FROM tidrangescan t2 WHERE t2.ctid <= t.ctid) t2
+WHERE t.ctid < '(1,0)';
+                  QUERY PLAN                   
+-----------------------------------------------
+ Nested Loop
+   ->  Tid Range Scan on tidrangescan t
+         TID Cond: (ctid < '(1,0)'::tid)
+   ->  Aggregate
+         ->  Tid Range Scan on tidrangescan t2
+               TID Cond: (ctid <= t.ctid)
+(6 rows)
+
+SELECT t.ctid,t2.c FROM tidrangescan t,
+LATERAL (SELECT count(*) c FROM tidrangescan t2 WHERE t2.ctid <= t.ctid) t2
+WHERE t.ctid < '(1,0)';
+  ctid  | c  
+--------+----
+ (0,1)  |  1
+ (0,2)  |  2
+ (0,3)  |  3
+ (0,4)  |  4
+ (0,5)  |  5
+ (0,6)  |  6
+ (0,7)  |  7
+ (0,8)  |  8
+ (0,9)  |  9
+ (0,10) | 10
+(10 rows)
+
+-- cursors
+-- Ensure we get a TID Range scan without a Materialize node.
+EXPLAIN (COSTS OFF)
+DECLARE c SCROLL CURSOR FOR SELECT ctid FROM tidrangescan WHERE ctid < '(1,0)';
+            QUERY PLAN             
+-----------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: (ctid < '(1,0)'::tid)
+(2 rows)
+
+BEGIN;
+DECLARE c SCROLL CURSOR FOR SELECT ctid FROM tidrangescan WHERE ctid < '(1,0)';
+FETCH NEXT c;
+ ctid  
+-------
+ (0,1)
+(1 row)
+
+FETCH NEXT c;
+ ctid  
+-------
+ (0,2)
+(1 row)
+
+FETCH PRIOR c;
+ ctid  
+-------
+ (0,1)
+(1 row)
+
+FETCH FIRST c;
+ ctid  
+-------
+ (0,1)
+(1 row)
+
+FETCH LAST c;
+  ctid  
+--------
+ (0,10)
+(1 row)
+
+COMMIT;
+DROP TABLE tidrangescan;
+DROP TABLE tidrangescan_empty;
+RESET enable_seqscan;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index e0e1ef71dd..2b9763a869 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -80,7 +80,7 @@ test: brin gin gist spgist privileges init_privs security_label collate matview
 # ----------
 # Another group of parallel tests
 # ----------
-test: create_table_like alter_generic alter_operator misc async dbsize misc_functions sysviews tsrf tid tidscan collate.icu.utf8 incremental_sort
+test: create_table_like alter_generic alter_operator misc async dbsize misc_functions sysviews tsrf tid tidscan tidrangescan collate.icu.utf8 incremental_sort
 
 # rules cannot run concurrently with any test that creates
 # a view or rule in the public schema
diff --git a/src/test/regress/sql/tidrangescan.sql b/src/test/regress/sql/tidrangescan.sql
new file mode 100644
index 0000000000..2da35807ff
--- /dev/null
+++ b/src/test/regress/sql/tidrangescan.sql
@@ -0,0 +1,104 @@
+-- tests for tidrangescans
+
+SET enable_seqscan TO off;
+CREATE TABLE tidrangescan(id integer, data text);
+
+-- insert enough tuples to fill at least two pages
+INSERT INTO tidrangescan SELECT i,repeat('x', 100) FROM generate_series(1,200) AS s(i);
+
+-- remove all tuples after the 10th tuple on each page.  Trying to ensure
+-- we get the same layout with all CPU architectures and smaller than standard
+-- page sizes.
+DELETE FROM tidrangescan
+WHERE substring(ctid::text FROM ',(\d+)\)')::integer > 10 OR substring(ctid::text FROM '\((\d+),')::integer > 2;
+VACUUM tidrangescan;
+
+-- range scans with upper bound
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid < '(1,0)';
+SELECT ctid FROM tidrangescan WHERE ctid < '(1,0)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid <= '(1,5)';
+SELECT ctid FROM tidrangescan WHERE ctid <= '(1,5)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid < '(0,0)';
+SELECT ctid FROM tidrangescan WHERE ctid < '(0,0)';
+
+-- range scans with lower bound
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid > '(2,8)';
+SELECT ctid FROM tidrangescan WHERE ctid > '(2,8)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE '(2,8)' < ctid;
+SELECT ctid FROM tidrangescan WHERE '(2,8)' < ctid;
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid >= '(2,8)';
+SELECT ctid FROM tidrangescan WHERE ctid >= '(2,8)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid >= '(100,0)';
+SELECT ctid FROM tidrangescan WHERE ctid >= '(100,0)';
+
+-- range scans with both bounds
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid > '(1,4)' AND '(1,7)' >= ctid;
+SELECT ctid FROM tidrangescan WHERE ctid > '(1,4)' AND '(1,7)' >= ctid;
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE '(1,7)' >= ctid AND ctid > '(1,4)';
+SELECT ctid FROM tidrangescan WHERE '(1,7)' >= ctid AND ctid > '(1,4)';
+
+-- extreme offsets
+SELECT ctid FROM tidrangescan WHERE ctid > '(0,65535)' AND ctid < '(1,0)' LIMIT 1;
+SELECT ctid FROM tidrangescan WHERE ctid < '(0,0)' LIMIT 1;
+
+SELECT ctid FROM tidrangescan WHERE ctid > '(4294967295,65535)';
+SELECT ctid FROM tidrangescan WHERE ctid < '(0,0)';
+
+-- NULLs in the range cannot return tuples
+SELECT ctid FROM tidrangescan WHERE ctid >= (SELECT NULL::tid);
+
+-- empty table
+CREATE TABLE tidrangescan_empty(id integer, data text);
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan_empty WHERE ctid < '(1, 0)';
+SELECT ctid FROM tidrangescan_empty WHERE ctid < '(1, 0)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan_empty WHERE ctid > '(9, 0)';
+SELECT ctid FROM tidrangescan_empty WHERE ctid > '(9, 0)';
+
+-- rescans
+EXPLAIN (COSTS OFF)
+SELECT t.ctid,t2.c FROM tidrangescan t,
+LATERAL (SELECT count(*) c FROM tidrangescan t2 WHERE t2.ctid <= t.ctid) t2
+WHERE t.ctid < '(1,0)';
+
+SELECT t.ctid,t2.c FROM tidrangescan t,
+LATERAL (SELECT count(*) c FROM tidrangescan t2 WHERE t2.ctid <= t.ctid) t2
+WHERE t.ctid < '(1,0)';
+
+-- cursors
+
+-- Ensure we get a TID Range scan without a Materialize node.
+EXPLAIN (COSTS OFF)
+DECLARE c SCROLL CURSOR FOR SELECT ctid FROM tidrangescan WHERE ctid < '(1,0)';
+
+BEGIN;
+DECLARE c SCROLL CURSOR FOR SELECT ctid FROM tidrangescan WHERE ctid < '(1,0)';
+FETCH NEXT c;
+FETCH NEXT c;
+FETCH PRIOR c;
+FETCH FIRST c;
+FETCH LAST c;
+COMMIT;
+
+DROP TABLE tidrangescan;
+DROP TABLE tidrangescan_empty;
+
+RESET enable_seqscan;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 721b230bf2..c3cb2e8b05 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2529,8 +2529,13 @@ TextPositionState
 TheLexeme
 TheSubstitute
 TidExpr
+TidExprType
 TidHashKey
+TidOpExpr
 TidPath
+TidRangePath
+TidRangeScan
+TidRangeScanState
 TidScan
 TidScanState
 TimeADT
-- 
2.27.0

#83

Zhihong Yu

zyu@yugabyte.com

almost 5 years ago

In reply to: David Rowley (#82)

Re: Tid scan improvements

Hi,

bq. within this range. Table AMs where scanning ranges of TIDs does not make
sense or is difficult to implement efficiently may choose to not implement

Is there a criterion for judging efficiency?

+       if (tidopexpr->exprtype == TIDEXPR_LOWER_BOUND)
...
+       if (tidopexpr->exprtype == TIDEXPR_UPPER_BOUND)

The if statement for the upper bound should be prefixed with 'else', right?

+ * TidRecheck -- access method routine to recheck a tuple in EvalPlanQual
...
+TidRangeRecheck(TidRangeScanState *node, TupleTableSlot *slot)

The method name in the comment doesn't match the real method name.

+ *     ExecTidRangeScan(node)
...
+ExecTidRangeScan(PlanState *pstate)

Parameter names don't match.

Cheers

On Mon, Jan 25, 2021 at 5:23 PM David Rowley <dgrowleyml@gmail.com> wrote:

On Thu, 21 Jan 2021 at 18:16, David Rowley <dgrowleyml@gmail.com> wrote:

I've implemented this in the attached.

The bug fix in 0001 is now committed, so I'm just attaching the 0002
patch again after having rebased... This is mostly just to keep the
CFbot happy.

David

#84

David Rowley

dgrowleyml@gmail.com

almost 5 years ago

In reply to: Zhihong Yu (#83)

Re: Tid scan improvements

Thanks for having a look at this.

On Tue, 26 Jan 2021 at 15:48, Zhihong Yu <zyu@yugabyte.com> wrote:

bq. within this range. Table AMs where scanning ranges of TIDs does not make
sense or is difficult to implement efficiently may choose to not implement

Is there a criterion for judging efficiency?

For example, if the AM had no way to index such a column and the
method needed to scan the entire table to find TIDs in that range, the
planner may as well just pick a SeqScan. If that's the case, then the
table AM may as well not bother implementing that function.
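
As a minimal sketch of how that choice reaches the planner (this is not code from the patch): the names AMFLAG_HAS_TID_RANGE and scan_getnextslot_inrange come from the patch, while every Mock* type and mock_* function below is a hypothetical stand-in for the real PostgreSQL structures. An AM that leaves the callback NULL never gets the flag set, so a TID range scan would not be considered for it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Same bit the patch defines in pathnodes.h */
#define AMFLAG_HAS_TID_RANGE (1 << 0)

/* Hypothetical, heavily reduced stand-in for TableAmRoutine */
typedef struct MockTableAm
{
	/* NULL when the AM chooses not to support TID range scans */
	bool		(*scan_getnextslot_inrange) (void);
} MockTableAm;

/* Placeholder callback: pretends the range contains no tuples */
static bool
mock_heap_inrange(void)
{
	return false;
}

/* One AM that implements the optional callback, one that does not */
static const MockTableAm mock_heap_am = {mock_heap_inrange};
static const MockTableAm mock_other_am = {NULL};

/* Mirrors the check the patch adds to get_relation_info() */
static int
mock_compute_amflags(const MockTableAm *am)
{
	int			amflags = 0;

	if (am != NULL && am->scan_getnextslot_inrange != NULL)
		amflags |= AMFLAG_HAS_TID_RANGE;

	return amflags;
}

/* Assumed planner-side test before building a TID range path */
static bool
mock_can_use_tid_range_scan(int amflags)
{
	return (amflags & AMFLAG_HAS_TID_RANGE) != 0;
}
```

With these stand-ins, mock_heap_am yields a flag value that permits a TID range scan, while mock_other_am leaves the planner to fall back to something like a SeqScan.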

+       if (tidopexpr->exprtype == TIDEXPR_LOWER_BOUND)
...
+       if (tidopexpr->exprtype == TIDEXPR_UPPER_BOUND)
The if statement for the upper bound should be prefixed with 'else', right?

Yeah, thanks.
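
For illustration only, the shape of that change can be sketched as below. TIDEXPR_LOWER_BOUND, TIDEXPR_UPPER_BOUND and the exprtype field are names from the patch; the Mock* types, the bound field and the mock_* functions are assumptions made up for this example, and the sketch treats every bound as inclusive rather than reproducing how the patch handles strict operators.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Enum values named as in the patch; the type name is hypothetical */
typedef enum MockTidExprType
{
	TIDEXPR_UPPER_BOUND,
	TIDEXPR_LOWER_BOUND
} MockTidExprType;

/* Hypothetical stand-in for ItemPointerData */
typedef struct MockTid
{
	uint32_t	block;
	uint16_t	offset;
} MockTid;

/* Hypothetical, reduced stand-in for TidOpExpr */
typedef struct MockTidOpExpr
{
	MockTidExprType exprtype;
	MockTid		bound;
} MockTidOpExpr;

/* Orders TIDs by block number first, then by offset */
static int
mock_tid_cmp(MockTid a, MockTid b)
{
	if (a.block != b.block)
		return (a.block < b.block) ? -1 : 1;
	if (a.offset != b.offset)
		return (a.offset < b.offset) ? -1 : 1;
	return 0;
}

/*
 * Narrow the range [*mintid, *maxtid] by one bound expression.  An
 * expression is either a lower or an upper bound, never both, so the
 * second test is chained with 'else'.
 */
static void
mock_apply_bound(const MockTidOpExpr *expr, MockTid *mintid, MockTid *maxtid)
{
	if (expr->exprtype == TIDEXPR_LOWER_BOUND)
	{
		if (mock_tid_cmp(expr->bound, *mintid) > 0)
			*mintid = expr->bound;
	}
	else if (expr->exprtype == TIDEXPR_UPPER_BOUND)
	{
		if (mock_tid_cmp(expr->bound, *maxtid) < 0)
			*maxtid = expr->bound;
	}
}
```

Starting from the widest possible range and applying a lower bound of (1,5) followed by an upper bound of (1,7) leaves mintid at (1,5) and maxtid at (1,7).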

+ * TidRecheck -- access method routine to recheck a tuple in EvalPlanQual
...
+TidRangeRecheck(TidRangeScanState *node, TupleTableSlot *slot)
The method name in the comment doesn't match the real method name.

Well noticed.

+ *     ExecTidRangeScan(node)
...
+ExecTidRangeScan(PlanState *pstate)

Parameter names don't match.

Hmm. Looking around, it seems there are lots of other places that do
this. I think the comment is really just indicating that the
function is taking an executor node state as a parameter.

Have a look at: git grep -E "\s\*.*\(node\)$" that shows the other
places. Some of these have the parameter named "node", and many others
have some other name.

I've made the two changes locally. Since the two issues were mostly
cosmetic, I'll post an updated patch after some bigger changes are
required.

Thanks again for looking at this.

David

#85

Andres Freund

andres@anarazel.de

almost 5 years ago

In reply to: David Rowley (#82)

Re: Tid scan improvements

Hi,

On 2021-01-26 14:22:42 +1300, David Rowley wrote:

diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 33bffb6815..d1c608b176 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -325,6 +325,26 @@ typedef struct TableAmRoutine
ScanDirection direction,
TupleTableSlot *slot);

+	/*
+	 * Return next tuple from `scan` where TID is within the defined range.
+	 * This behaves like scan_getnextslot but only returns tuples from the
+	 * given range of TIDs.  Ranges are inclusive.  This function is optional
+	 * and may be set to NULL if TID range scans are not supported by the AM.
+	 *
+	 * Implementations of this function must themselves handle ItemPointers
+	 * of any value. i.e, they must handle each of the following:
+	 *
+	 * 1) mintid or maxtid is beyond the end of the table; and
+	 * 2) mintid is above maxtid; and
+	 * 3) item offset for mintid or maxtid is beyond the maximum offset
+	 * allowed by the AM.
+	 */
+	bool		(*scan_getnextslot_inrange) (TableScanDesc scan,
+											 ScanDirection direction,
+											 TupleTableSlot *slot,
+											 ItemPointer mintid,
+											 ItemPointer maxtid);
+

/* ------------------------------------------------------------------------
* Parallel table scan related functions.
@@ -1015,6 +1035,36 @@ table_scan_getnextslot(TableScanDesc sscan, ScanDirection direction, TupleTableS
return sscan->rs_rd->rd_tableam->scan_getnextslot(sscan, direction, slot);
}

+/*
+ * Return next tuple from defined TID range from `scan` and store in slot.
+ */
+static inline bool
+table_scan_getnextslot_inrange(TableScanDesc sscan, ScanDirection direction,
+							   TupleTableSlot *slot, ItemPointer mintid,
+							   ItemPointer maxtid)
+{
+	/*
+	 * The planner should never make a plan which uses this function when the
+	 * table AM has not defined any function for this callback.
+	 */
+	Assert(sscan->rs_rd->rd_tableam->scan_getnextslot_inrange != NULL);
+
+	slot->tts_tableOid = RelationGetRelid(sscan->rs_rd);
+
+	/*
+	 * We don't expect direct calls to table_scan_getnextslot_inrange with
+	 * valid CheckXidAlive for catalog or regular tables.  See detailed
+	 * comments in xact.c where these variables are declared.
+	 */
+	if (unlikely(TransactionIdIsValid(CheckXidAlive) && !bsysscan))
+		elog(ERROR, "unexpected table_scan_getnextslot_inrange call during logical decoding");
+
+	return sscan->rs_rd->rd_tableam->scan_getnextslot_inrange(sscan,
+															  direction,
+															  slot,
+															  mintid,
+															  maxtid);
+}

I don't really like that this API relies on mintid/maxtid to stay the
same across multiple scan_getnextslot_inrange() calls. I think we'd at
least need to document that that's required and what needs to be done to
scan a different set of min/max tid (or re-scan the same min/max from
scratch).

Perhaps something like

typedef struct TableScanTidRange TableScanTidRange;

TableScanTidRange* table_scan_tid_range_start(TableScanDesc sscan, ItemPointer mintid, ItemPointer maxtid);
bool table_scan_tid_range_nextslot(TableScanDesc sscan, TableScanTidRange *range, TupleTableSlot *slot);
void table_scan_tid_range_end(TableScanDesc sscan, TableScanTidRange* range);

would work better? That'd allow an AM to have arbitrarily large state
for a tid range scan, would make it clear what the lifetime of the
ItemPointer mintid, ItemPointer maxtid are etc. Wouldn't, on the API
level, prevent multiple tid range scans from being in progress at the
same times though :(. Perhaps we could add a TableScanTidRange* pointer
to TableScanDesc which'd be checked/set by tableam.h which'd prevent that?
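
To make the shape of that proposal concrete, here is a minimal sketch. It is not PostgreSQL code: DemoTid, DemoScan, DemoScanTidRange and the demo_* functions are hypothetical stand-ins for ItemPointerData, TableScanDesc and the three proposed functions, and the "table" is just a sorted array of TIDs. It assumes the range object copies mintid/maxtid at start time, and it uses a pointer in the scan descriptor to reject a second range scan while one is still in progress.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for ItemPointerData: a (block, offset) pair. */
typedef struct DemoTid
{
	uint32_t	block;
	uint16_t	offset;
} DemoTid;

typedef struct DemoScanTidRange DemoScanTidRange;

/* Hypothetical stand-in for TableScanDesc: TIDs sorted in ascending order. */
typedef struct DemoScan
{
	const DemoTid *tids;
	size_t		ntids;
	DemoScanTidRange *active;	/* non-NULL while a range scan is in progress */
} DemoScan;

/* Per-range state; an AM could make this as large as it needs. */
struct DemoScanTidRange
{
	DemoTid		mintid;			/* copied, so the caller's storage may go away */
	DemoTid		maxtid;
	size_t		next;			/* index of the next TID to examine */
};

static int
demo_tid_cmp(const DemoTid *a, const DemoTid *b)
{
	if (a->block != b->block)
		return a->block < b->block ? -1 : 1;
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}

static DemoScanTidRange *
demo_tid_range_start(DemoScan *scan, const DemoTid *mintid, const DemoTid *maxtid)
{
	DemoScanTidRange *range;

	assert(scan->active == NULL);	/* only one range scan at a time */
	range = malloc(sizeof(DemoScanTidRange));
	assert(range != NULL);
	range->mintid = *mintid;
	range->maxtid = *maxtid;
	range->next = 0;
	scan->active = range;
	return range;
}

static bool
demo_tid_range_nextslot(DemoScan *scan, DemoScanTidRange *range, DemoTid *out)
{
	assert(scan->active == range);
	while (range->next < scan->ntids)
	{
		const DemoTid *tid = &scan->tids[range->next++];

		if (demo_tid_cmp(tid, &range->mintid) < 0)
			continue;			/* not yet inside the inclusive range */
		if (demo_tid_cmp(tid, &range->maxtid) > 0)
		{
			range->next = scan->ntids;	/* sorted input: nothing more to find */
			break;
		}
		*out = *tid;
		return true;
	}
	return false;
}

static void
demo_tid_range_end(DemoScan *scan, DemoScanTidRange *range)
{
	assert(scan->active == range);
	scan->active = NULL;
	free(range);
}

/* Drive one complete range scan and count the TIDs it returns. */
static int
demo_count_in_range(DemoScan *scan, DemoTid mintid, DemoTid maxtid)
{
	DemoScanTidRange *range = demo_tid_range_start(scan, &mintid, &maxtid);
	DemoTid		tid;
	int			count = 0;

	while (demo_tid_range_nextslot(scan, range, &tid))
		count++;
	demo_tid_range_end(scan, range);
	return count;
}
```

Scanning a different range, or the same range again from scratch, is then just another start/end pair. The lifetime of the bounds is tied to the range object rather than to whatever the caller passed in.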

Greetings,

Andres Freund

#86

David Rowley

dgrowleyml@gmail.com

almost 5 years ago

In reply to: Andres Freund (#85)

Re: Tid scan improvements

Thanks for looking at this.

On Thu, 4 Feb 2021 at 10:19, Andres Freund <andres@anarazel.de> wrote:

Perhaps something like

typedef struct TableScanTidRange TableScanTidRange;

TableScanTidRange* table_scan_tid_range_start(TableScanDesc sscan, ItemPointer mintid, ItemPointer maxtid);
bool table_scan_tid_range_nextslot(TableScanDesc sscan, TableScanTidRange *range, TupleTableSlot *slot);
void table_scan_tid_range_end(TableScanDesc sscan, TableScanTidRange* range);

would work better? That'd allow an AM to have arbitrarily large state
for a tid range scan, would make it clear what the lifetime of the
ItemPointer mintid, ItemPointer maxtid are etc. Wouldn't, on the API
level, prevent multiple tid range scans from being in progress at the
same times though :(. Perhaps we could add a TableScanTidRange* pointer
to TableScanDesc which'd be checked/set by tableam.h which'd prevent that?

Maybe the TableScanTidRange can just have a field to store the
TableScanDesc. That way table_scan_tid_range_nextslot and
table_scan_tid_range_end can just pass the TableScanTidRange pointer.

That way it seems like it would be ok for multiple scans to be going
on concurrently as nobody should be reusing the TableScanDesc.

Does that seem ok?

David

#87

David Rowley

dgrowleyml@gmail.com

almost 5 years ago

In reply to: David Rowley (#86)

1 attachment(s)

Re: Tid scan improvements

On Thu, 4 Feb 2021 at 10:31, David Rowley <dgrowleyml@gmail.com> wrote:

Thanks for looking at this.

On Thu, 4 Feb 2021 at 10:19, Andres Freund <andres@anarazel.de> wrote:

Perhaps something like

typedef struct TableScanTidRange TableScanTidRange;

TableScanTidRange* table_scan_tid_range_start(TableScanDesc sscan, ItemPointer mintid, ItemPointer maxtid);
bool table_scan_tid_range_nextslot(TableScanDesc sscan, TableScanTidRange *range, TupleTableSlot *slot);
void table_scan_tid_range_end(TableScanDesc sscan, TableScanTidRange* range);

would work better? That'd allow an AM to have arbitrarily large state
for a tid range scan, would make it clear what the lifetime of the
ItemPointer mintid, ItemPointer maxtid are etc. Wouldn't, on the API
level, prevent multiple tid range scans from being in progress at the
same times though :(. Perhaps we could add a TableScanTidRange* pointer
to TableScanDesc which'd be checked/set by tableam.h which'd prevent that?

Maybe the TableScanTidRange can just have a field to store the
TableScanDesc. That way table_scan_tid_range_nextslot and
table_scan_tid_range_end can just pass the TableScanTidRange pointer.

That way it seems like it would be ok for multiple scans to be going
on concurrently as nobody should be reusing the TableScanDesc.

I ended up adding just two new API functions to table AM.

void (*scan_set_tid_range) (TableScanDesc sscan,
ItemPointer mintid,
ItemPointer maxtid);

and
bool (*scan_tid_range_nextslot) (TableScanDesc sscan,
ScanDirection direction,
TupleTableSlot *slot);

I added an additional function in tableam.h that does not have a
corresponding API function:

static inline TableScanDesc
table_tid_range_start(Relation rel, Snapshot snapshot,
ItemPointer mintid,
ItemPointer maxtid)

This just calls the standard scan_begin then calls scan_set_tid_range
setting the specified mintid and maxtid.

I also added 2 new fields to TableScanDesc:

ItemPointerData rs_mintid;
ItemPointerData rs_maxtid;

I didn't quite see a need to have a new start and end scan API function.
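
As a rough illustration of what the set-range step does for heap, the sketch below clamps the requested bounds to the relation and derives the block limits. It is a simplified model rather than the patch's code: DemoTid, DemoLimits, demo_set_tid_range and DEMO_MAX_OFFSET are hypothetical names, the value used for the maximum offset is an assumption, and the result is returned in a struct with an "empty" flag instead of being written into the scan descriptor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_MAX_OFFSET 2048	/* assumed stand-in for MaxOffsetNumber */

/* Hypothetical stand-in for ItemPointerData. */
typedef struct DemoTid
{
	uint32_t	block;
	uint16_t	offset;
} DemoTid;

/* What the set-range step would record for the later nextslot calls. */
typedef struct DemoLimits
{
	bool		empty;			/* true if no tuple can be in range */
	uint32_t	start_block;	/* first block to scan */
	uint32_t	num_blocks;		/* number of blocks to scan */
	DemoTid		mintid;			/* clamped inclusive lower bound */
	DemoTid		maxtid;			/* clamped inclusive upper bound */
} DemoLimits;

static int
demo_tid_cmp(const DemoTid *a, const DemoTid *b)
{
	if (a->block != b->block)
		return a->block < b->block ? -1 : 1;
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}

static DemoLimits
demo_set_tid_range(uint32_t nblocks, DemoTid mintid, DemoTid maxtid)
{
	DemoLimits	lim = {true, 0, 0, {0, 0}, {0, 0}};
	DemoTid		lowest = {0, 1};	/* first possible tuple in the relation */
	DemoTid		highest;

	/* A relation without pages has nothing to scan. */
	if (nblocks == 0)
		return lim;

	/* Last possible tuple in the relation. */
	highest.block = nblocks - 1;
	highest.offset = DEMO_MAX_OFFSET;

	/* Narrow the bounds only when the request is tighter than the relation. */
	if (demo_tid_cmp(&maxtid, &highest) < 0)
		highest = maxtid;
	if (demo_tid_cmp(&mintid, &lowest) > 0)
		lowest = mintid;

	/* mintid above maxtid (after clamping) means an empty range. */
	if (demo_tid_cmp(&highest, &lowest) < 0)
		return lim;

	lim.empty = false;
	lim.start_block = lowest.block;
	lim.num_blocks = highest.block - lowest.block + 1;
	lim.mintid = lowest;
	lim.maxtid = highest;

	assert(lim.num_blocks <= nblocks);
	return lim;
}
```

The block limits only bound which pages are read. The nextslot step still has to compare each tuple's TID against the clamped mintid and maxtid, because the first and last pages can hold tuples outside the range.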

Updated patch attached.

David

Attachments:

v13-0001-Add-TID-Range-Scans-to-support-efficient-scannin.patch (text/plain; charset=US-ASCII)

From 36c34e3089d2ad224e7606df14209d8cc199aa67 Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Thu, 21 Jan 2021 16:48:15 +1300
Subject: [PATCH v13] Add TID Range Scans to support efficient scanning ranges
 of TIDs

This adds a new node type named TID Range Scan.  The query planner will
generate paths for TID Range scans when quals are discovered on base
relations which search for ranges of ctid.  These ranges may be open at
either end.

To support this, two new optional callback functions have been added to table
AM: scan_set_tid_range, which accepts a minimum and maximum ItemPointer and
restricts the scan to that range, and scan_tid_range_nextslot, which returns
the next tuple within it.  Table AMs where scanning ranges of TIDs does not
make sense or is difficult to implement efficiently may choose to not
implement these functions.

Author: Edmund Horner and David Rowley
Discussion: https://postgr.es/m/CAMyN-kB-nFTkF=VA_JPwFNo08S0d-Yk0F741S2B7LDmYAi8eyA@mail.gmail.com
---
 src/backend/access/heap/heapam.c           | 147 ++++++++
 src/backend/access/heap/heapam_handler.c   |   3 +
 src/backend/commands/explain.c             |  23 ++
 src/backend/executor/Makefile              |   1 +
 src/backend/executor/execAmi.c             |   6 +
 src/backend/executor/execProcnode.c        |  10 +
 src/backend/executor/nodeTidrangescan.c    | 413 +++++++++++++++++++++
 src/backend/nodes/copyfuncs.c              |  24 ++
 src/backend/nodes/outfuncs.c               |  13 +
 src/backend/optimizer/README               |   1 +
 src/backend/optimizer/path/costsize.c      |  95 +++++
 src/backend/optimizer/path/tidpath.c       | 117 +++++-
 src/backend/optimizer/plan/createplan.c    |  98 +++++
 src/backend/optimizer/plan/setrefs.c       |  16 +
 src/backend/optimizer/plan/subselect.c     |   6 +
 src/backend/optimizer/util/pathnode.c      |  29 ++
 src/backend/optimizer/util/plancat.c       |   6 +
 src/backend/optimizer/util/relnode.c       |   3 +
 src/backend/storage/page/itemptr.c         |  58 +++
 src/include/access/heapam.h                |   6 +-
 src/include/access/relscan.h               |   4 +
 src/include/access/tableam.h               |  84 ++++-
 src/include/catalog/pg_operator.dat        |   6 +-
 src/include/executor/nodeTidrangescan.h    |  23 ++
 src/include/nodes/execnodes.h              |  18 +
 src/include/nodes/nodes.h                  |   3 +
 src/include/nodes/pathnodes.h              |  18 +
 src/include/nodes/plannodes.h              |  13 +
 src/include/optimizer/cost.h               |   3 +
 src/include/optimizer/pathnode.h           |   4 +
 src/include/storage/itemptr.h              |   2 +
 src/test/regress/expected/tidrangescan.out | 302 +++++++++++++++
 src/test/regress/parallel_schedule         |   2 +-
 src/test/regress/serial_schedule           |   1 +
 src/test/regress/sql/tidrangescan.sql      | 104 ++++++
 src/tools/pgindent/typedefs.list           |   5 +
 36 files changed, 1646 insertions(+), 21 deletions(-)
 create mode 100644 src/backend/executor/nodeTidrangescan.c
 create mode 100644 src/include/executor/nodeTidrangescan.h
 create mode 100644 src/test/regress/expected/tidrangescan.out
 create mode 100644 src/test/regress/sql/tidrangescan.sql

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 9926e2bd54..71e2762761 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -1391,6 +1391,153 @@ heap_getnextslot(TableScanDesc sscan, ScanDirection direction, TupleTableSlot *s
 	return true;
 }
 
+void
+heap_set_tid_range(TableScanDesc sscan, ItemPointer mintid,
+				   ItemPointer maxtid)
+{
+	HeapScanDesc scan = (HeapScanDesc) sscan;
+	BlockNumber startBlk;
+	BlockNumber numBlks;
+	ItemPointerData highestItem;
+	ItemPointerData lowestItem;
+
+	/*
+	 * For relations without any pages, we can simply leave the TID range
+	 * unset.  There will be no tuples to scan, therefore no tuples outside
+	 * the given TID range.
+	 */
+	if (scan->rs_nblocks == 0)
+		return;
+
+	/*
+	 * Set up some ItemPointers which point to the first and last possible
+	 * tuples in the heap.
+	 */
+	ItemPointerSet(&highestItem, scan->rs_nblocks - 1, MaxOffsetNumber);
+	ItemPointerSet(&lowestItem, 0, FirstOffsetNumber);
+
+	/*
+	 * If the given maximum TID is below the highest possible TID in the
+	 * relation, then restrict the range to that, otherwise we scan to the
+	 * end of the relation.
+	 */
+	if (ItemPointerCompare(maxtid, &highestItem) < 0)
+		ItemPointerCopy(maxtid, &highestItem);
+
+	/*
+	 * If the given minimum TID is above the lowest possible TID in the
+	 * relation, then restrict the range to only scan for TIDs above that.
+	 */
+	if (ItemPointerCompare(mintid, &lowestItem) > 0)
+		ItemPointerCopy(mintid, &lowestItem);
+
+	/*
+	 * Check for an empty range and protect from would be negative results
+	 * from the numBlks calculation below.
+	 */
+	if (ItemPointerCompare(&highestItem, &lowestItem) < 0)
+	{
+		/* Set an empty range of blocks to scan */
+		heap_setscanlimits(sscan, 0, 0);
+		return;
+	}
+
+	/*
+	 * Calculate the first block and the number of blocks we must scan.
+	 * We could be more aggressive here and perform some more validation
+	 * to try and further narrow the scope of blocks to scan by checking
+	 * if the lowestItem has an offset above MaxOffsetNumber.  In this
+	 * case, we could advance startBlk by one.  Likewise if highestItem
+	 * has an offset of 0 we could scan one fewer blocks.  However, such
+	 * an optimization does not seem worth troubling over, currently.
+	 */
+	startBlk = ItemPointerGetBlockNumberNoCheck(&lowestItem);
+
+	numBlks = ItemPointerGetBlockNumberNoCheck(&highestItem) -
+				ItemPointerGetBlockNumberNoCheck(&lowestItem) + 1;
+
+	/* Set the start block and number of blocks to scan */
+	heap_setscanlimits(sscan, startBlk, numBlks);
+
+	/* Finally, set the TID range in sscan */
+	ItemPointerCopy(&lowestItem, &sscan->rs_mintid);
+	ItemPointerCopy(&highestItem, &sscan->rs_maxtid);
+}
+
+bool
+heap_tid_range_nextslot(TableScanDesc sscan, ScanDirection direction,
+						TupleTableSlot *slot)
+{
+	HeapScanDesc scan = (HeapScanDesc) sscan;
+	ItemPointer mintid = &sscan->rs_mintid;
+	ItemPointer maxtid = &sscan->rs_maxtid;
+
+	/* Note: no locking manipulations needed */
+	for (;;)
+	{
+		if (sscan->rs_flags & SO_ALLOW_PAGEMODE)
+			heapgettup_pagemode(scan, direction, sscan->rs_nkeys, sscan->rs_key);
+		else
+			heapgettup(scan, direction, sscan->rs_nkeys, sscan->rs_key);
+
+		if (scan->rs_ctup.t_data == NULL)
+		{
+			ExecClearTuple(slot);
+			return false;
+		}
+
+		/*
+		 * heap_set_tid_range will have used heap_setscanlimits to limit the
+		 * range of pages we scan to only ones that can contain the TID range
+		 * we're scanning for.  Here we must filter out any tuples from these
+		 * pages that are outwith that range.
+		 */
+		if (ItemPointerCompare(&scan->rs_ctup.t_self, mintid) < 0)
+		{
+			ExecClearTuple(slot);
+
+			/*
+			 * When scanning backwards, the TIDs will be in descending order.
+			 * Future tuples in this direction will be lower still, so we can
+			 * just return false to indicate there will be no more tuples.
+			 */
+			if (ScanDirectionIsBackward(direction))
+				return false;
+
+			continue;
+		}
+
+		/*
+		 * Likewise for the final page, we must filter out tids greater than
+		 * maxtid.
+		 */
+		if (ItemPointerCompare(&scan->rs_ctup.t_self, maxtid) > 0)
+		{
+			ExecClearTuple(slot);
+
+			/*
+			 * When scanning forward, the TIDs will be in ascending order.
+			 * Future tuples in this direction will be higher still, so we can
+			 * just return false to indicate there will be no more tuples.
+			 */
+			if (ScanDirectionIsForward(direction))
+				return false;
+			continue;
+		}
+
+		break;
+	}
+
+	/*
+	 * if we get here it means we have a new current scan tuple, so point to
+	 * the proper return buffer and return the tuple.
+	 */
+	pgstat_count_heap_getnext(scan->rs_base.rs_rd);
+
+	ExecStoreBufferHeapTuple(&scan->rs_ctup, slot, scan->rs_cbuf);
+	return true;
+}
+
 /*
  *	heap_fetch		- retrieve tuple with given tid
  *
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 4a70e20a14..792ebc926f 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -2542,6 +2542,9 @@ static const TableAmRoutine heapam_methods = {
 	.scan_rescan = heap_rescan,
 	.scan_getnextslot = heap_getnextslot,
 
+	.scan_set_tid_range = heap_set_tid_range,
+	.scan_tid_range_nextslot = heap_tid_range_nextslot,
+
 	.parallelscan_estimate = table_block_parallelscan_estimate,
 	.parallelscan_initialize = table_block_parallelscan_initialize,
 	.parallelscan_reinitialize = table_block_parallelscan_reinitialize,
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 5d7eb3574c..3f2ebd3b72 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -1057,6 +1057,7 @@ ExplainPreScanNode(PlanState *planstate, Bitmapset **rels_used)
 		case T_IndexOnlyScan:
 		case T_BitmapHeapScan:
 		case T_TidScan:
+		case T_TidRangeScan:
 		case T_SubqueryScan:
 		case T_FunctionScan:
 		case T_TableFuncScan:
@@ -1223,6 +1224,9 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_TidScan:
 			pname = sname = "Tid Scan";
 			break;
+		case T_TidRangeScan:
+			pname = sname = "Tid Range Scan";
+			break;
 		case T_SubqueryScan:
 			pname = sname = "Subquery Scan";
 			break;
@@ -1417,6 +1421,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_SampleScan:
 		case T_BitmapHeapScan:
 		case T_TidScan:
+		case T_TidRangeScan:
 		case T_SubqueryScan:
 		case T_FunctionScan:
 		case T_TableFuncScan:
@@ -1871,6 +1876,23 @@ ExplainNode(PlanState *planstate, List *ancestors,
 											   planstate, es);
 			}
 			break;
+		case T_TidRangeScan:
+			{
+				/*
+				 * The tidrangequals list has AND semantics, so be sure to
+				 * show it as an AND condition.
+				 */
+				List	   *tidquals = ((TidRangeScan *) plan)->tidrangequals;
+
+				if (list_length(tidquals) > 1)
+					tidquals = list_make1(make_andclause(tidquals));
+				show_scan_qual(tidquals, "TID Cond", planstate, ancestors, es);
+				show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
+				if (plan->qual)
+					show_instrumentation_count("Rows Removed by Filter", 1,
+											   planstate, es);
+			}
+			break;
 		case T_ForeignScan:
 			show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
 			if (plan->qual)
@@ -3558,6 +3580,7 @@ ExplainTargetRel(Plan *plan, Index rti, ExplainState *es)
 		case T_IndexOnlyScan:
 		case T_BitmapHeapScan:
 		case T_TidScan:
+		case T_TidRangeScan:
 		case T_ForeignScan:
 		case T_CustomScan:
 		case T_ModifyTable:
diff --git a/src/backend/executor/Makefile b/src/backend/executor/Makefile
index f990c6473a..74ac59faa1 100644
--- a/src/backend/executor/Makefile
+++ b/src/backend/executor/Makefile
@@ -67,6 +67,7 @@ OBJS = \
 	nodeSubplan.o \
 	nodeSubqueryscan.o \
 	nodeTableFuncscan.o \
+	nodeTidrangescan.o \
 	nodeTidscan.o \
 	nodeUnique.o \
 	nodeValuesscan.o \
diff --git a/src/backend/executor/execAmi.c b/src/backend/executor/execAmi.c
index 23bdb53cd1..4543ac79ed 100644
--- a/src/backend/executor/execAmi.c
+++ b/src/backend/executor/execAmi.c
@@ -51,6 +51,7 @@
 #include "executor/nodeSubplan.h"
 #include "executor/nodeSubqueryscan.h"
 #include "executor/nodeTableFuncscan.h"
+#include "executor/nodeTidrangescan.h"
 #include "executor/nodeTidscan.h"
 #include "executor/nodeUnique.h"
 #include "executor/nodeValuesscan.h"
@@ -197,6 +198,10 @@ ExecReScan(PlanState *node)
 			ExecReScanTidScan((TidScanState *) node);
 			break;
 
+		case T_TidRangeScanState:
+			ExecReScanTidRangeScan((TidRangeScanState *) node);
+			break;
+
 		case T_SubqueryScanState:
 			ExecReScanSubqueryScan((SubqueryScanState *) node);
 			break;
@@ -562,6 +567,7 @@ ExecSupportsBackwardScan(Plan *node)
 
 		case T_SeqScan:
 		case T_TidScan:
+		case T_TidRangeScan:
 		case T_FunctionScan:
 		case T_ValuesScan:
 		case T_CteScan:
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index 414df50a05..29766d8196 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -109,6 +109,7 @@
 #include "executor/nodeSubplan.h"
 #include "executor/nodeSubqueryscan.h"
 #include "executor/nodeTableFuncscan.h"
+#include "executor/nodeTidrangescan.h"
 #include "executor/nodeTidscan.h"
 #include "executor/nodeUnique.h"
 #include "executor/nodeValuesscan.h"
@@ -238,6 +239,11 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
 												   estate, eflags);
 			break;
 
+		case T_TidRangeScan:
+			result = (PlanState *) ExecInitTidRangeScan((TidRangeScan *) node,
+														estate, eflags);
+			break;
+
 		case T_SubqueryScan:
 			result = (PlanState *) ExecInitSubqueryScan((SubqueryScan *) node,
 														estate, eflags);
@@ -637,6 +643,10 @@ ExecEndNode(PlanState *node)
 			ExecEndTidScan((TidScanState *) node);
 			break;
 
+		case T_TidRangeScanState:
+			ExecEndTidRangeScan((TidRangeScanState *) node);
+			break;
+
 		case T_SubqueryScanState:
 			ExecEndSubqueryScan((SubqueryScanState *) node);
 			break;
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
new file mode 100644
index 0000000000..7aba549255
--- /dev/null
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -0,0 +1,413 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeTidrangescan.c
+ *	  Routines to support tid range scans of relations
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/executor/nodeTidrangescan.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/relscan.h"
+#include "access/sysattr.h"
+#include "access/tableam.h"
+#include "catalog/pg_operator.h"
+#include "executor/execdebug.h"
+#include "executor/nodeTidrangescan.h"
+#include "nodes/nodeFuncs.h"
+#include "storage/bufmgr.h"
+#include "utils/rel.h"
+
+
+#define IsCTIDVar(node)  \
+	((node) != NULL && \
+	 IsA((node), Var) && \
+	 ((Var *) (node))->varattno == SelfItemPointerAttributeNumber && \
+	 ((Var *) (node))->varlevelsup == 0)
+
+typedef enum
+{
+	TIDEXPR_UPPER_BOUND,
+	TIDEXPR_LOWER_BOUND
+} TidExprType;
+
+/* Upper or lower range bound for scan */
+typedef struct TidOpExpr
+{
+	TidExprType exprtype;		/* type of op; lower or upper */
+	ExprState  *exprstate;		/* ExprState for a TID-yielding subexpr */
+	bool		inclusive;		/* whether op is inclusive */
+} TidOpExpr;
+
+/*
+ * For the given 'expr', build and return an appropriate TidOpExpr taking into
+ * account the expr's operator and operand order.
+ */
+static TidOpExpr *
+MakeTidOpExpr(OpExpr *expr, TidRangeScanState *tidstate)
+{
+	Node	   *arg1 = get_leftop((Expr *) expr);
+	Node	   *arg2 = get_rightop((Expr *) expr);
+	ExprState  *exprstate = NULL;
+	bool		invert = false;
+	TidOpExpr  *tidopexpr;
+
+	if (IsCTIDVar(arg1))
+		exprstate = ExecInitExpr((Expr *) arg2, &tidstate->ss.ps);
+	else if (IsCTIDVar(arg2))
+	{
+		exprstate = ExecInitExpr((Expr *) arg1, &tidstate->ss.ps);
+		invert = true;
+	}
+	else
+		elog(ERROR, "could not identify CTID variable");
+
+	tidopexpr = (TidOpExpr *) palloc(sizeof(TidOpExpr));
+	tidopexpr->inclusive = false;		/* for now */
+
+	switch (expr->opno)
+	{
+		case TIDLessEqOperator:
+			tidopexpr->inclusive = true;
+			/* fall through */
+		case TIDLessOperator:
+			tidopexpr->exprtype = invert ? TIDEXPR_LOWER_BOUND : TIDEXPR_UPPER_BOUND;
+			break;
+		case TIDGreaterEqOperator:
+			tidopexpr->inclusive = true;
+			/* fall through */
+		case TIDGreaterOperator:
+			tidopexpr->exprtype = invert ? TIDEXPR_UPPER_BOUND : TIDEXPR_LOWER_BOUND;
+			break;
+		default:
+			elog(ERROR, "could not identify CTID operator");
+	}
+
+	tidopexpr->exprstate = exprstate;
+
+	return tidopexpr;
+}
+
+/*
+ * Extract the qual subexpressions that yield TIDs to search for,
+ * and compile them into ExprStates if they're ordinary expressions.
+ */
+static void
+TidExprListCreate(TidRangeScanState *tidrangestate)
+{
+	TidRangeScan *node = (TidRangeScan *) tidrangestate->ss.ps.plan;
+	List	   *tidexprs = NIL;
+	ListCell   *l;
+
+	foreach(l, node->tidrangequals)
+	{
+		OpExpr	   *opexpr = lfirst(l);
+		TidOpExpr  *tidopexpr;
+
+		if (!IsA(opexpr, OpExpr))
+			elog(ERROR, "could not identify CTID expression");
+
+		tidopexpr = MakeTidOpExpr(opexpr, tidrangestate);
+		tidexprs = lappend(tidexprs, tidopexpr);
+	}
+
+	tidrangestate->trss_tidexprs = tidexprs;
+}
+
+/* ----------------------------------------------------------------
+ *		TidRangeEval
+ *
+ *		Compute and set node's block and offset range to scan by evaluating
+ *		the trss_tidexprs.  Returns false if we detect the range cannot
+ *		contain any tuples.  Returns true if it's possible for the range to
+ *		contain tuples.
+ * ----------------------------------------------------------------
+ */
+static bool
+TidRangeEval(TidRangeScanState *node)
+{
+	ExprContext *econtext = node->ss.ps.ps_ExprContext;
+	ItemPointerData lowerBound;
+	ItemPointerData upperBound;
+	ListCell   *l;
+
+	/*
+	 * Set the upper and lower bounds to the absolute limits of the range of
+	 * the ItemPointer type.  Below we'll try to narrow this range on either
+	 * side by looking at the TidOpExprs.
+	 */
+	ItemPointerSet(&lowerBound, 0, 0);
+	ItemPointerSet(&upperBound, InvalidBlockNumber, PG_UINT16_MAX);
+
+	foreach(l, node->trss_tidexprs)
+	{
+		TidOpExpr  *tidopexpr = (TidOpExpr *) lfirst(l);
+		ItemPointer itemptr;
+		bool		isNull;
+
+		/* Evaluate this bound. */
+		itemptr = (ItemPointer)
+			DatumGetPointer(ExecEvalExprSwitchContext(tidopexpr->exprstate,
+													  econtext,
+													  &isNull));
+
+		/* If the bound is NULL, *nothing* matches the qual. */
+		if (isNull)
+			return false;
+
+		if (tidopexpr->exprtype == TIDEXPR_LOWER_BOUND)
+		{
+			ItemPointerData lb;
+
+			ItemPointerCopy(itemptr, &lb);
+
+			/*
+			 * Normalize non-inclusive ranges to become inclusive.  The
+			 * resulting ItemPointer here may not be a valid item pointer.
+			 */
+			if (!tidopexpr->inclusive)
+				ItemPointerInc(&lb);
+
+			/* Check if we can narrow the range using this qual */
+			if (ItemPointerCompare(&lb, &lowerBound) > 0)
+				ItemPointerCopy(&lb, &lowerBound);
+		}
+
+		else if (tidopexpr->exprtype == TIDEXPR_UPPER_BOUND)
+		{
+			ItemPointerData ub;
+
+			ItemPointerCopy(itemptr, &ub);
+
+			/*
+			 * Normalize non-inclusive ranges to become inclusive.  The
+			 * resulting ItemPointer here may not be a valid item pointer.
+			 */
+			if (!tidopexpr->inclusive)
+				ItemPointerDec(&ub);
+
+			/* Check if we can narrow the range using this qual */
+			if (ItemPointerCompare(&ub, &upperBound) < 0)
+				ItemPointerCopy(&ub, &upperBound);
+		}
+	}
+
+	ItemPointerCopy(&lowerBound, &node->trss_mintid);
+	ItemPointerCopy(&upperBound, &node->trss_maxtid);
+
+	return true;
+}
+
+/* ----------------------------------------------------------------
+ *		TidRangeNext
+ *
+ *		Retrieve a tuple from the TidRangeScan node's currentRelation
+ *		using the tids in the TidRangeScanState information.
+ *
+ * ----------------------------------------------------------------
+ */
+static TupleTableSlot *
+TidRangeNext(TidRangeScanState *node)
+{
+	TableScanDesc scandesc;
+	EState	   *estate;
+	ScanDirection direction;
+	TupleTableSlot *slot;
+
+	/*
+	 * extract necessary information from tid scan node
+	 */
+	scandesc = node->ss.ss_currentScanDesc;
+	estate = node->ss.ps.state;
+	slot = node->ss.ss_ScanTupleSlot;
+	direction = estate->es_direction;
+
+	if (!node->trss_inScan)
+	{
+		/* First time through, compute TID range to scan */
+		if (!TidRangeEval(node))
+			return NULL;
+
+		if (scandesc == NULL)
+		{
+			scandesc = table_tid_range_start(node->ss.ss_currentRelation,
+											 estate->es_snapshot,
+											 &node->trss_mintid,
+											 &node->trss_maxtid);
+			node->ss.ss_currentScanDesc = scandesc;
+		}
+		else
+			table_set_tid_range(scandesc, &node->trss_mintid,
+								&node->trss_maxtid);
+
+		node->trss_inScan = true;
+	}
+
+	/* Fetch the next tuple. */
+	if (!table_tid_range_nextslot(scandesc, direction, slot))
+	{
+		node->trss_inScan = false;
+		ExecClearTuple(slot);
+	}
+
+	return slot;
+}
+
+/*
+ * TidRangeRecheck -- access method routine to recheck a tuple in EvalPlanQual
+ */
+static bool
+TidRangeRecheck(TidRangeScanState *node, TupleTableSlot *slot)
+{
+	return true;
+}
+
+/* ----------------------------------------------------------------
+ *		ExecTidRangeScan(node)
+ *
+ *		Scans the relation using tids and returns the next qualifying tuple.
+ *		We call the ExecScan() routine and pass it the appropriate
+ *		access method functions.
+ *
+ *		Conditions:
+ *		  -- the "cursor" maintained by the AMI is positioned at the tuple
+ *			 returned previously.
+ *
+ *		Initial States:
+ *		  -- the relation indicated is opened for scanning so that the
+ *			 "cursor" is positioned before the first qualifying tuple.
+ *		  -- trss_startBlock is InvalidBlockNumber
+ * ----------------------------------------------------------------
+ */
+static TupleTableSlot *
+ExecTidRangeScan(PlanState *pstate)
+{
+	TidRangeScanState *node = castNode(TidRangeScanState, pstate);
+
+	return ExecScan(&node->ss,
+					(ExecScanAccessMtd) TidRangeNext,
+					(ExecScanRecheckMtd) TidRangeRecheck);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecReScanTidRangeScan(node)
+ * ----------------------------------------------------------------
+ */
+void
+ExecReScanTidRangeScan(TidRangeScanState *node)
+{
+	TableScanDesc scan = node->ss.ss_currentScanDesc;
+
+	if (scan != NULL)
+		table_rescan(scan, NULL);
+
+	/* mark scan as not in progress, and tid range list as not computed yet */
+	node->trss_inScan = false;
+
+	ExecScanReScan(&node->ss);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecEndTidRangeScan
+ *
+ *		Releases any storage allocated through C routines.
+ *		Returns nothing.
+ * ----------------------------------------------------------------
+ */
+void
+ExecEndTidRangeScan(TidRangeScanState *node)
+{
+	TableScanDesc scan = node->ss.ss_currentScanDesc;
+
+	if (scan != NULL)
+		table_endscan(scan);
+
+	/*
+	 * Free the exprcontext
+	 */
+	ExecFreeExprContext(&node->ss.ps);
+
+	/*
+	 * clear out tuple table slots
+	 */
+	if (node->ss.ps.ps_ResultTupleSlot)
+		ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
+	ExecClearTuple(node->ss.ss_ScanTupleSlot);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecInitTidRangeScan
+ *
+ *		Initializes the tid range scan's state information, creates
+ *		scan keys, and opens the base and tid relations.
+ *
+ *		Parameters:
+ *		  node: TidRangeScan node produced by the planner.
+ *		  estate: the execution state initialized in InitPlan.
+ * ----------------------------------------------------------------
+ */
+TidRangeScanState *
+ExecInitTidRangeScan(TidRangeScan *node, EState *estate, int eflags)
+{
+	TidRangeScanState *tidrangestate;
+	Relation	currentRelation;
+
+	/*
+	 * create state structure
+	 */
+	tidrangestate = makeNode(TidRangeScanState);
+	tidrangestate->ss.ps.plan = (Plan *) node;
+	tidrangestate->ss.ps.state = estate;
+	tidrangestate->ss.ps.ExecProcNode = ExecTidRangeScan;
+
+	/*
+	 * Miscellaneous initialization
+	 *
+	 * create expression context for node
+	 */
+	ExecAssignExprContext(estate, &tidrangestate->ss.ps);
+
+	/*
+	 * mark scan as not in progress, and tid range as not computed yet
+	 */
+	tidrangestate->trss_inScan = false;
+
+	/*
+	 * open the scan relation
+	 */
+	currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+
+	tidrangestate->ss.ss_currentRelation = currentRelation;
+	tidrangestate->ss.ss_currentScanDesc = NULL;	/* no table scan here */
+
+	/*
+	 * get the scan type from the relation descriptor.
+	 */
+	ExecInitScanTupleSlot(estate, &tidrangestate->ss,
+						  RelationGetDescr(currentRelation),
+						  table_slot_callbacks(currentRelation));
+
+	/*
+	 * Initialize result type and projection.
+	 */
+	ExecInitResultTypeTL(&tidrangestate->ss.ps);
+	ExecAssignScanProjectionInfo(&tidrangestate->ss);
+
+	/*
+	 * initialize child expressions
+	 */
+	tidrangestate->ss.ps.qual =
+		ExecInitQual(node->scan.plan.qual, (PlanState *) tidrangestate);
+
+	TidExprListCreate(tidrangestate);
+
+	/*
+	 * all done.
+	 */
+	return tidrangestate;
+}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 65bbc18ecb..aaba1ec2c4 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -585,6 +585,27 @@ _copyTidScan(const TidScan *from)
 	return newnode;
 }
 
+/*
+ * _copyTidRangeScan
+ */
+static TidRangeScan *
+_copyTidRangeScan(const TidRangeScan *from)
+{
+	TidRangeScan *newnode = makeNode(TidRangeScan);
+
+	/*
+	 * copy node superclass fields
+	 */
+	CopyScanFields((const Scan *) from, (Scan *) newnode);
+
+	/*
+	 * copy remainder of node
+	 */
+	COPY_NODE_FIELD(tidrangequals);
+
+	return newnode;
+}
+
 /*
  * _copySubqueryScan
  */
@@ -4938,6 +4959,9 @@ copyObjectImpl(const void *from)
 		case T_TidScan:
 			retval = _copyTidScan(from);
 			break;
+		case T_TidRangeScan:
+			retval = _copyTidRangeScan(from);
+			break;
 		case T_SubqueryScan:
 			retval = _copySubqueryScan(from);
 			break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index f5dcedf6e8..3347a7cc0f 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -608,6 +608,16 @@ _outTidScan(StringInfo str, const TidScan *node)
 	WRITE_NODE_FIELD(tidquals);
 }
 
+static void
+_outTidRangeScan(StringInfo str, const TidRangeScan *node)
+{
+	WRITE_NODE_TYPE("TIDRANGESCAN");
+
+	_outScanInfo(str, (const Scan *) node);
+
+	WRITE_NODE_FIELD(tidrangequals);
+}
+
 static void
 _outSubqueryScan(StringInfo str, const SubqueryScan *node)
 {
@@ -3810,6 +3820,9 @@ outNode(StringInfo str, const void *obj)
 			case T_TidScan:
 				_outTidScan(str, obj);
 				break;
+			case T_TidRangeScan:
+				_outTidRangeScan(str, obj);
+				break;
 			case T_SubqueryScan:
 				_outSubqueryScan(str, obj);
 				break;
diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README
index efb52858c8..4a6c348162 100644
--- a/src/backend/optimizer/README
+++ b/src/backend/optimizer/README
@@ -374,6 +374,7 @@ RelOptInfo      - a relation or joined relations
   IndexPath     - index scan
   BitmapHeapPath - top of a bitmapped index scan
   TidPath       - scan by CTID
+  TidRangePath  - scan a contiguous range of CTIDs
   SubqueryScanPath - scan a subquery-in-FROM
   ForeignPath   - scan a foreign table, foreign join or foreign upper-relation
   CustomPath    - for custom scan providers
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index aab06c7d21..744a9aed3e 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -1283,6 +1283,101 @@ cost_tidscan(Path *path, PlannerInfo *root,
 	path->total_cost = startup_cost + run_cost;
 }
 
+/*
+ * cost_tidrangescan
+ *	  Determines and sets the costs of scanning a relation using a range of
+ *	  TIDs for 'path'
+ *
+ * 'baserel' is the relation to be scanned
+ * 'tidrangequals' is the list of TID-checkable range quals
+ * 'param_info' is the ParamPathInfo if this is a parameterized path, else NULL
+ */
+void
+cost_tidrangescan(Path *path, PlannerInfo *root,
+				  RelOptInfo *baserel, List *tidrangequals,
+				  ParamPathInfo *param_info)
+{
+	Selectivity selectivity;
+	double		pages;
+	Cost		startup_cost = 0;
+	Cost		run_cost = 0;
+	QualCost	qpqual_cost;
+	Cost		cpu_per_tuple;
+	QualCost	tid_qual_cost;
+	double		ntuples;
+	double		nseqpages;
+	double		spc_random_page_cost;
+	double		spc_seq_page_cost;
+
+	/* Should only be applied to base relations */
+	Assert(baserel->relid > 0);
+	Assert(baserel->rtekind == RTE_RELATION);
+
+	/* Mark the path with the correct row estimate */
+	if (param_info)
+		path->rows = param_info->ppi_rows;
+	else
+		path->rows = baserel->rows;
+
+	/* Count how many tuples and pages we expect to scan */
+	selectivity = clauselist_selectivity(root, tidrangequals, baserel->relid,
+										 JOIN_INNER, NULL);
+	pages = ceil(selectivity * baserel->pages);
+
+	if (pages <= 0.0)
+		pages = 1.0;
+
+	/*
+	 * The first page in a range requires a random seek, but each subsequent
+	 * page is just a normal sequential page read. NOTE: it's desirable for
+	 * Tid Range Scans to cost more than the equivalent Sequential Scans,
+	 * because Seq Scans have some performance advantages such as scan
+	 * synchronization and parallelizability, and we'd prefer one of them to
+	 * be picked unless a Tid Range Scan really is better.
+	 */
+	ntuples = selectivity * baserel->tuples;
+	nseqpages = pages - 1.0;
+
+	if (!enable_tidscan)
+		startup_cost += disable_cost;
+
+	/*
+	 * The TID qual expressions will be computed once; any other baserestrict
+	 * quals will be evaluated once per retrieved tuple.
+	 */
+	cost_qual_eval(&tid_qual_cost, tidrangequals, root);
+
+	/* fetch estimated page cost for tablespace containing table */
+	get_tablespace_page_costs(baserel->reltablespace,
+							  &spc_random_page_cost,
+							  &spc_seq_page_cost);
+
+	/* disk costs; 1 random page and the remainder as seq pages */
+	run_cost += spc_random_page_cost + spc_seq_page_cost * nseqpages;
+
+	/* Add scanning CPU costs */
+	get_restriction_qual_cost(root, baserel, param_info, &qpqual_cost);
+
+	/*
+	 * XXX currently we assume TID quals are a subset of qpquals at this
+	 * point; they will be removed (if possible) when we create the plan, so
+	 * we subtract their cost from the total qpqual cost.  (If the TID quals
+	 * can't be removed, this is a mistake and we're going to underestimate
+	 * the CPU cost a bit.)
+	 */
+	startup_cost += qpqual_cost.startup + tid_qual_cost.per_tuple;
+	cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple -
+		tid_qual_cost.per_tuple;
+	run_cost += cpu_per_tuple * ntuples;
+
+	/* tlist eval costs are paid per output row, not per tuple scanned */
+	startup_cost += path->pathtarget->cost.startup;
+	run_cost += path->pathtarget->cost.per_tuple * path->rows;
+
+	path->startup_cost = startup_cost;
+	path->total_cost = startup_cost + run_cost;
+}
+
 /*
  * cost_subqueryscan
  *	  Determines and returns the cost of scanning a subquery RTE.
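To make the arithmetic in cost_tidrangescan() easier to check, here is a minimal standalone sketch of only the disk and per-tuple CPU terms. It is not the patch's code: the name sketch_tidrange_run_cost and its flat parameter list are made up for illustration, ceil() is replaced by hand rounding so the snippet needs no libm, and the qual startup, tlist and disable_cost terms are left out.

```c
/*
 * Hypothetical, simplified model of the run cost in cost_tidrangescan():
 * one random page fetch for the first page of the range, sequential
 * fetches for the remaining pages, plus a per-tuple CPU charge.
 */
static double
sketch_tidrange_run_cost(double selectivity,
                         double rel_pages,
                         double rel_tuples,
                         double random_page_cost,
                         double seq_page_cost,
                         double cpu_per_tuple)
{
    double      raw_pages = selectivity * rel_pages;
    double      pages = (double) (long long) raw_pages;
    double      ntuples;
    double      nseqpages;

    /* Round up by hand (stands in for ceil() in the patch) */
    if (pages < raw_pages)
        pages += 1.0;

    /* Always charge for at least one page */
    if (pages <= 0.0)
        pages = 1.0;

    ntuples = selectivity * rel_tuples;
    nseqpages = pages - 1.0;

    /* 1 random page, the remainder as sequential pages, then CPU */
    return random_page_cost + seq_page_cost * nseqpages +
        cpu_per_tuple * ntuples;
}
```

With a selectivity of 0.25 over 1000 pages and 100000 tuples, and page costs of 4.0 (random), 1.0 (sequential) and 0.01 per tuple, this works out to 4.0 + 249 + 250 = 503. A range that selects nothing is still charged one random page.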
diff --git a/src/backend/optimizer/path/tidpath.c b/src/backend/optimizer/path/tidpath.c
index 0845b460e2..41d86e42e0 100644
--- a/src/backend/optimizer/path/tidpath.c
+++ b/src/backend/optimizer/path/tidpath.c
@@ -2,9 +2,9 @@
  *
  * tidpath.c
  *	  Routines to determine which TID conditions are usable for scanning
- *	  a given relation, and create TidPaths accordingly.
+ *	  a given relation, and create TidPaths and TidRangePaths accordingly.
  *
- * What we are looking for here is WHERE conditions of the form
+ * For TidPaths, we look for WHERE conditions of the form
  * "CTID = pseudoconstant", which can be implemented by just fetching
  * the tuple directly via heap_fetch().  We can also handle OR'd conditions
  * such as (CTID = const1) OR (CTID = const2), as well as ScalarArrayOpExpr
@@ -23,6 +23,9 @@
  * a function, but in practice it works better to keep the special node
  * representation all the way through to execution.
  *
+ * Additionally, TidRangePaths may be created for conditions of the form
+ * "CTID relop pseudoconstant", where relop is one of >,>=,<,<=, and
+ * AND-clauses composed of such conditions.
  *
  * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
@@ -63,14 +66,14 @@ IsCTIDVar(Var *var, RelOptInfo *rel)
 
 /*
  * Check to see if a RestrictInfo is of the form
- *		CTID = pseudoconstant
+ *		CTID OP pseudoconstant
  * or
- *		pseudoconstant = CTID
- * where the CTID Var belongs to relation "rel", and nothing on the
- * other side of the clause does.
+ *		pseudoconstant OP CTID
+ * where OP is a binary operation, the CTID Var belongs to relation "rel",
+ * and nothing on the other side of the clause does.
  */
 static bool
-IsTidEqualClause(RestrictInfo *rinfo, RelOptInfo *rel)
+IsBinaryTidClause(RestrictInfo *rinfo, RelOptInfo *rel)
 {
 	OpExpr	   *node;
 	Node	   *arg1,
@@ -83,10 +86,9 @@ IsTidEqualClause(RestrictInfo *rinfo, RelOptInfo *rel)
 		return false;
 	node = (OpExpr *) rinfo->clause;
 
-	/* Operator must be tideq */
-	if (node->opno != TIDEqualOperator)
+	/* OpExpr must have two arguments */
+	if (list_length(node->args) != 2)
 		return false;
-	Assert(list_length(node->args) == 2);
 	arg1 = linitial(node->args);
 	arg2 = lsecond(node->args);
 
@@ -116,6 +118,50 @@ IsTidEqualClause(RestrictInfo *rinfo, RelOptInfo *rel)
 	return true;				/* success */
 }
 
+/*
+ * Check to see if a RestrictInfo is of the form
+ *		CTID = pseudoconstant
+ * or
+ *		pseudoconstant = CTID
+ * where the CTID Var belongs to relation "rel", and nothing on the
+ * other side of the clause does.
+ */
+static bool
+IsTidEqualClause(RestrictInfo *rinfo, RelOptInfo *rel)
+{
+	if (!IsBinaryTidClause(rinfo, rel))
+		return false;
+
+	if (((OpExpr *) rinfo->clause)->opno == TIDEqualOperator)
+		return true;
+
+	return false;
+}
+
+/*
+ * Check to see if a RestrictInfo is of the form
+ *		CTID OP pseudoconstant
+ * or
+ *		pseudoconstant OP CTID
+ * where OP is a range operator such as <, <=, >, or >=, the CTID Var belongs
+ * to relation "rel", and nothing on the other side of the clause does.
+ */
+static bool
+IsTidRangeClause(RestrictInfo *rinfo, RelOptInfo *rel)
+{
+	Oid			opno;
+
+	if (!IsBinaryTidClause(rinfo, rel))
+		return false;
+	opno = ((OpExpr *) rinfo->clause)->opno;
+
+	if (opno == TIDLessOperator || opno == TIDLessEqOperator ||
+		opno == TIDGreaterOperator || opno == TIDGreaterEqOperator)
+		return true;
+
+	return false;
+}
+
 /*
  * Check to see if a RestrictInfo is of the form
  *		CTID = ANY (pseudoconstant_array)
@@ -222,7 +268,7 @@ TidQualFromRestrictInfo(PlannerInfo *root, RestrictInfo *rinfo, RelOptInfo *rel)
  *
  * Returns a List of CTID qual RestrictInfos for the specified rel (with
  * implicit OR semantics across the list), or NIL if there are no usable
- * conditions.
+ * equality conditions.
  *
  * This function is just concerned with handling AND/OR recursion.
  */
@@ -301,6 +347,34 @@ TidQualFromRestrictInfoList(PlannerInfo *root, List *rlist, RelOptInfo *rel)
 	return rlst;
 }
 
+/*
+ * Extract a set of CTID range conditions from implicit-AND List of RestrictInfos
+ *
+ * Returns a List of CTID range qual RestrictInfos for the specified rel
+ * (with implicit AND semantics across the list), or NIL if there are no
+ * usable range conditions or if the rel's table AM does not support TID range
+ * scans.
+ */
+static List *
+TidRangeQualFromRestrictInfoList(List *rlist, RelOptInfo *rel)
+{
+	List	   *rlst = NIL;
+	ListCell   *l;
+
+	if ((rel->amflags & AMFLAG_HAS_TID_RANGE) == 0)
+		return NIL;
+
+	foreach(l, rlist)
+	{
+		RestrictInfo *rinfo = lfirst_node(RestrictInfo, l);
+
+		if (IsTidRangeClause(rinfo, rel))
+			rlst = lappend(rlst, rinfo);
+	}
+
+	return rlst;
+}
+
 /*
  * Given a list of join clauses involving our rel, create a parameterized
  * TidPath for each one that is a suitable TidEqual clause.
@@ -385,6 +459,7 @@ void
 create_tidscan_paths(PlannerInfo *root, RelOptInfo *rel)
 {
 	List	   *tidquals;
+	List	   *tidrangequals;
 
 	/*
 	 * If any suitable quals exist in the rel's baserestrict list, generate a
@@ -404,6 +479,26 @@ create_tidscan_paths(PlannerInfo *root, RelOptInfo *rel)
 												   required_outer));
 	}
 
+	/*
+	 * If there are range quals in the baserestrict list, generate a
+	 * TidRangePath.
+	 */
+	tidrangequals = TidRangeQualFromRestrictInfoList(rel->baserestrictinfo,
+													 rel);
+
+	if (tidrangequals)
+	{
+		/*
+		 * This path uses no join clauses, but it could still have required
+		 * parameterization due to LATERAL refs in its tlist.
+		 */
+		Relids		required_outer = rel->lateral_relids;
+
+		add_path(rel, (Path *) create_tidrangescan_path(root, rel,
+														tidrangequals,
+														required_outer));
+	}
+
 	/*
 	 * Try to generate parameterized TidPaths using equality clauses extracted
 	 * from EquivalenceClasses.  (This is important since simple "t1.ctid =
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 6c8305c977..906cab7053 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -129,6 +129,10 @@ static Plan *create_bitmap_subplan(PlannerInfo *root, Path *bitmapqual,
 static void bitmap_subplan_mark_shared(Plan *plan);
 static TidScan *create_tidscan_plan(PlannerInfo *root, TidPath *best_path,
 									List *tlist, List *scan_clauses);
+static TidRangeScan *create_tidrangescan_plan(PlannerInfo *root,
+											  TidRangePath *best_path,
+											  List *tlist,
+											  List *scan_clauses);
 static SubqueryScan *create_subqueryscan_plan(PlannerInfo *root,
 											  SubqueryScanPath *best_path,
 											  List *tlist, List *scan_clauses);
@@ -193,6 +197,8 @@ static BitmapHeapScan *make_bitmap_heapscan(List *qptlist,
 											Index scanrelid);
 static TidScan *make_tidscan(List *qptlist, List *qpqual, Index scanrelid,
 							 List *tidquals);
+static TidRangeScan *make_tidrangescan(List *qptlist, List *qpqual,
+									   Index scanrelid, List *tidrangequals);
 static SubqueryScan *make_subqueryscan(List *qptlist,
 									   List *qpqual,
 									   Index scanrelid,
@@ -384,6 +390,7 @@ create_plan_recurse(PlannerInfo *root, Path *best_path, int flags)
 		case T_IndexOnlyScan:
 		case T_BitmapHeapScan:
 		case T_TidScan:
+		case T_TidRangeScan:
 		case T_SubqueryScan:
 		case T_FunctionScan:
 		case T_TableFuncScan:
@@ -679,6 +686,13 @@ create_scan_plan(PlannerInfo *root, Path *best_path, int flags)
 												scan_clauses);
 			break;
 
+		case T_TidRangeScan:
+			plan = (Plan *) create_tidrangescan_plan(root,
+													 (TidRangePath *) best_path,
+													 tlist,
+													 scan_clauses);
+			break;
+
 		case T_SubqueryScan:
 			plan = (Plan *) create_subqueryscan_plan(root,
 													 (SubqueryScanPath *) best_path,
@@ -3436,6 +3450,71 @@ create_tidscan_plan(PlannerInfo *root, TidPath *best_path,
 	return scan_plan;
 }
 
+/*
+ * create_tidrangescan_plan
+ *	 Returns a tidrangescan plan for the base relation scanned by 'best_path'
+ *	 with restriction clauses 'scan_clauses' and targetlist 'tlist'.
+ */
+static TidRangeScan *
+create_tidrangescan_plan(PlannerInfo *root, TidRangePath *best_path,
+						 List *tlist, List *scan_clauses)
+{
+	TidRangeScan *scan_plan;
+	Index		scan_relid = best_path->path.parent->relid;
+	List	   *tidrangequals = best_path->tidrangequals;
+
+	/* it should be a base rel... */
+	Assert(scan_relid > 0);
+	Assert(best_path->path.parent->rtekind == RTE_RELATION);
+
+	/*
+	 * The qpqual list must contain all restrictions not enforced by the
+	 * tidrangequals list.  tidrangequals has AND semantics, so we can simply
+	 * remove any qual that appears in it.
+	 */
+	{
+		List	   *qpqual = NIL;
+		ListCell   *l;
+
+		foreach(l, scan_clauses)
+		{
+			RestrictInfo *rinfo = lfirst_node(RestrictInfo, l);
+
+			if (rinfo->pseudoconstant)
+				continue;		/* we may drop pseudoconstants here */
+			if (list_member_ptr(tidrangequals, rinfo))
+				continue;		/* simple duplicate */
+			qpqual = lappend(qpqual, rinfo);
+		}
+		scan_clauses = qpqual;
+	}
+
+	/* Sort clauses into best execution order */
+	scan_clauses = order_qual_clauses(root, scan_clauses);
+
+	/* Reduce RestrictInfo lists to bare expressions; ignore pseudoconstants */
+	tidrangequals = extract_actual_clauses(tidrangequals, false);
+	scan_clauses = extract_actual_clauses(scan_clauses, false);
+
+	/* Replace any outer-relation variables with nestloop params */
+	if (best_path->path.param_info)
+	{
+		tidrangequals = (List *)
+			replace_nestloop_params(root, (Node *) tidrangequals);
+		scan_clauses = (List *)
+			replace_nestloop_params(root, (Node *) scan_clauses);
+	}
+
+	scan_plan = make_tidrangescan(tlist,
+								  scan_clauses,
+								  scan_relid,
+								  tidrangequals);
+
+	copy_generic_path_info(&scan_plan->scan.plan, &best_path->path);
+
+	return scan_plan;
+}
+
 /*
  * create_subqueryscan_plan
  *	 Returns a subqueryscan plan for the base relation scanned by 'best_path'
@@ -5369,6 +5448,25 @@ make_tidscan(List *qptlist,
 	return node;
 }
 
+static TidRangeScan *
+make_tidrangescan(List *qptlist,
+				  List *qpqual,
+				  Index scanrelid,
+				  List *tidrangequals)
+{
+	TidRangeScan *node = makeNode(TidRangeScan);
+	Plan	   *plan = &node->scan.plan;
+
+	plan->targetlist = qptlist;
+	plan->qual = qpqual;
+	plan->lefttree = NULL;
+	plan->righttree = NULL;
+	node->scan.scanrelid = scanrelid;
+	node->tidrangequals = tidrangequals;
+
+	return node;
+}
+
 static SubqueryScan *
 make_subqueryscan(List *qptlist,
 				  List *qpqual,
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index c3c36be13e..42f088ad71 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -619,6 +619,22 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 								  rtoffset, 1);
 			}
 			break;
+		case T_TidRangeScan:
+			{
+				TidRangeScan *splan = (TidRangeScan *) plan;
+
+				splan->scan.scanrelid += rtoffset;
+				splan->scan.plan.targetlist =
+					fix_scan_list(root, splan->scan.plan.targetlist,
+								  rtoffset, NUM_EXEC_TLIST(plan));
+				splan->scan.plan.qual =
+					fix_scan_list(root, splan->scan.plan.qual,
+								  rtoffset, NUM_EXEC_QUAL(plan));
+				splan->tidrangequals =
+					fix_scan_list(root, splan->tidrangequals,
+								  rtoffset, 1);
+			}
+			break;
 		case T_SubqueryScan:
 			/* Needs special treatment, see comments below */
 			return set_subqueryscan_references(root,
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 54ef61bfb3..f3e46e0959 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -2367,6 +2367,12 @@ finalize_plan(PlannerInfo *root, Plan *plan,
 			context.paramids = bms_add_members(context.paramids, scan_params);
 			break;
 
+		case T_TidRangeScan:
+			finalize_primnode((Node *) ((TidRangeScan *) plan)->tidrangequals,
+							  &context);
+			context.paramids = bms_add_members(context.paramids, scan_params);
+			break;
+
 		case T_SubqueryScan:
 			{
 				SubqueryScan *sscan = (SubqueryScan *) plan;
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 9be0c4a6af..6a66e23351 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1203,6 +1203,35 @@ create_tidscan_path(PlannerInfo *root, RelOptInfo *rel, List *tidquals,
 	return pathnode;
 }
 
+/*
+ * create_tidrangescan_path
+ *	  Creates a path corresponding to a scan by a range of TIDs, returning
+ *	  the pathnode.
+ */
+TidRangePath *
+create_tidrangescan_path(PlannerInfo *root, RelOptInfo *rel,
+						 List *tidrangequals, Relids required_outer)
+{
+	TidRangePath *pathnode = makeNode(TidRangePath);
+
+	pathnode->path.pathtype = T_TidRangeScan;
+	pathnode->path.parent = rel;
+	pathnode->path.pathtarget = rel->reltarget;
+	pathnode->path.param_info = get_baserel_parampathinfo(root, rel,
+														  required_outer);
+	pathnode->path.parallel_aware = false;
+	pathnode->path.parallel_safe = rel->consider_parallel;
+	pathnode->path.parallel_workers = 0;
+	pathnode->path.pathkeys = NIL;	/* always unordered */
+
+	pathnode->tidrangequals = tidrangequals;
+
+	cost_tidrangescan(&pathnode->path, root, rel, tidrangequals,
+					  pathnode->path.param_info);
+
+	return pathnode;
+}
+
 /*
  * create_append_path
  *	  Creates a path corresponding to an Append plan, returning the
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 177e6e336a..c4262bf697 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -467,6 +467,12 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 	/* Collect info about relation's foreign keys, if relevant */
 	get_relation_foreign_keys(root, rel, relation, inhparent);
 
+	/* Collect info about functions implemented by the rel's table AM. */
+	if (relation->rd_tableam &&
+		relation->rd_tableam->scan_set_tid_range != NULL &&
+		relation->rd_tableam->scan_tid_range_nextslot != NULL)
+		rel->amflags |= AMFLAG_HAS_TID_RANGE;
+
 	/*
 	 * Collect info about relation's partitioning scheme, if any. Only
 	 * inheritance parents may be partitioned.
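The capability check above, together with the early return in TidRangeQualFromRestrictInfoList(), amounts to a small flag-based gate. The following standalone sketch shows that pattern in isolation. All Sketch* and sketch_* names are hypothetical, and the callback signatures are simplified placeholders rather than the real TableAmRoutine prototypes.

```c
#include <stddef.h>

/* Hypothetical mirror of AMFLAG_HAS_TID_RANGE */
#define SKETCH_AMFLAG_HAS_TID_RANGE (1 << 0)

/* Simplified stand-in for the two optional TableAmRoutine callbacks */
typedef struct SketchTableAm
{
    void        (*scan_set_tid_range) (void *scan);
    int         (*scan_tid_range_nextslot) (void *scan);
} SketchTableAm;

/* Dummy callbacks so the example AMs below have something to point at */
static void
sketch_set_range(void *scan)
{
    (void) scan;
}

static int
sketch_nextslot(void *scan)
{
    (void) scan;
    return 0;
}

/* An AM providing both callbacks, one providing neither, one only half */
static const SketchTableAm sketch_am_full = {sketch_set_range, sketch_nextslot};
static const SketchTableAm sketch_am_none = {NULL, NULL};
static const SketchTableAm sketch_am_half = {sketch_set_range, NULL};

/* Planner side: record the capability once per relation */
static int
sketch_collect_amflags(const SketchTableAm *am)
{
    int         flags = 0;

    if (am != NULL &&
        am->scan_set_tid_range != NULL &&
        am->scan_tid_range_nextslot != NULL)
        flags |= SKETCH_AMFLAG_HAS_TID_RANGE;

    return flags;
}

/* Path generation side: only consider range quals when the flag is set */
static int
sketch_can_use_tid_range(int amflags)
{
    return (amflags & SKETCH_AMFLAG_HAS_TID_RANGE) != 0;
}
```

An AM that supplies only one of the two callbacks is treated the same as one that supplies neither, which matches the "both or neither" rule stated in the tableam.h comment.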
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 731ff708b9..345c877aeb 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -234,6 +234,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
 	rel->subroot = NULL;
 	rel->subplan_params = NIL;
 	rel->rel_parallel_workers = -1; /* set up in get_relation_info */
+	rel->amflags = 0;
 	rel->serverid = InvalidOid;
 	rel->userid = rte->checkAsUser;
 	rel->useridiscurrent = false;
@@ -646,6 +647,7 @@ build_join_rel(PlannerInfo *root,
 	joinrel->subroot = NULL;
 	joinrel->subplan_params = NIL;
 	joinrel->rel_parallel_workers = -1;
+	joinrel->amflags = 0;
 	joinrel->serverid = InvalidOid;
 	joinrel->userid = InvalidOid;
 	joinrel->useridiscurrent = false;
@@ -826,6 +828,7 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
 	joinrel->eclass_indexes = NULL;
 	joinrel->subroot = NULL;
 	joinrel->subplan_params = NIL;
+	joinrel->amflags = 0;
 	joinrel->serverid = InvalidOid;
 	joinrel->userid = InvalidOid;
 	joinrel->useridiscurrent = false;
diff --git a/src/backend/storage/page/itemptr.c b/src/backend/storage/page/itemptr.c
index 55759c383b..c7aec7dbd9 100644
--- a/src/backend/storage/page/itemptr.c
+++ b/src/backend/storage/page/itemptr.c
@@ -71,3 +71,61 @@ ItemPointerCompare(ItemPointer arg1, ItemPointer arg2)
 	else
 		return 0;
 }
+
+/*
+ * ItemPointerInc
+ *		Increment 'pointer' by 1 only paying attention to the ItemPointer's
+ *		type's range limits and not MaxOffsetNumber and FirstOffsetNumber.
+ *		This may result in 'pointer' becoming !OffsetNumberIsValid.
+ *
+ * If the pointer is already at the maximum value permitted by the range of
+ * the ItemPointer's types, then do nothing.
+ */
+void
+ItemPointerInc(ItemPointer pointer)
+{
+	BlockNumber blk = ItemPointerGetBlockNumberNoCheck(pointer);
+	OffsetNumber off = ItemPointerGetOffsetNumberNoCheck(pointer);
+
+	if (off == PG_UINT16_MAX)
+	{
+		if (blk != InvalidBlockNumber)
+		{
+			off = 0;
+			blk++;
+		}
+	}
+	else
+		off++;
+
+	ItemPointerSet(pointer, blk, off);
+}
+
+/*
+ * ItemPointerDec
+ *		Decrement 'pointer' by 1 only paying attention to the ItemPointer's
+ *		type's range limits and not MaxOffsetNumber and FirstOffsetNumber.
+ *		This may result in 'pointer' becoming !OffsetNumberIsValid.
+ *
+ * If the pointer is already at the minimum value permitted by the range of
+ * the ItemPointer's types, then do nothing.
+ */
+void
+ItemPointerDec(ItemPointer pointer)
+{
+	BlockNumber blk = ItemPointerGetBlockNumberNoCheck(pointer);
+	OffsetNumber off = ItemPointerGetOffsetNumberNoCheck(pointer);
+
+	if (off == 0)
+	{
+		if (blk != 0)
+		{
+			off = PG_UINT16_MAX;
+			blk--;
+		}
+	}
+	else
+		off--;
+
+	ItemPointerSet(pointer, blk, off);
+}
\ No newline at end of file
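One plausible use of ItemPointerInc() and ItemPointerDec() is to turn exclusive bounds (ctid > x, ctid < y) into the inclusive mintid/maxtid pair that the scan works with. The executor code that does this is not part of this excerpt, so treat the sketch below as an assumption about the intent, not a copy of it. SketchTid and the sketch_* functions are hypothetical stand-ins for ItemPointerData and the functions above, written to take and return values so they are easy to test on their own.

```c
#include <stdint.h>

/* Hypothetical stand-in for ItemPointerData: block number plus offset */
typedef struct SketchTid
{
    uint32_t    blk;
    uint16_t    off;
} SketchTid;

/* Assumed to mirror InvalidBlockNumber, the largest uint32 value */
#define SKETCH_INVALID_BLOCK UINT32_MAX

static SketchTid
sketch_tid_make(uint32_t blk, uint16_t off)
{
    SketchTid   t;

    t.blk = blk;
    t.off = off;
    return t;
}

/* Return the TID after 't'; saturates at the largest representable TID */
static SketchTid
sketch_tid_inc(SketchTid t)
{
    if (t.off == UINT16_MAX)
    {
        if (t.blk != SKETCH_INVALID_BLOCK)
        {
            t.off = 0;
            t.blk++;
        }
    }
    else
        t.off++;
    return t;
}

/* Return the TID before 't'; saturates at (0,0) */
static SketchTid
sketch_tid_dec(SketchTid t)
{
    if (t.off == 0)
    {
        if (t.blk != 0)
        {
            t.off = UINT16_MAX;
            t.blk--;
        }
    }
    else
        t.off--;
    return t;
}

/* "ctid > bound" becomes an inclusive lower bound one step above it */
static SketchTid
sketch_lower_from_gt(SketchTid bound)
{
    return sketch_tid_inc(bound);
}

/* "ctid < bound" becomes an inclusive upper bound one step below it */
static SketchTid
sketch_upper_from_lt(SketchTid bound)
{
    return sketch_tid_dec(bound);
}
```

For the query at the top of the thread, ctid < '(2000,0)' would become an inclusive upper bound of (1999,65535). That offset is not a valid item offset, but it still compares correctly against real TIDs, which is why the comments above warn that the result may be !OffsetNumberIsValid.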
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index d96a47b1ce..63cae778f1 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -121,7 +121,11 @@ extern void heap_endscan(TableScanDesc scan);
 extern HeapTuple heap_getnext(TableScanDesc scan, ScanDirection direction);
 extern bool heap_getnextslot(TableScanDesc sscan,
 							 ScanDirection direction, struct TupleTableSlot *slot);
-
+extern void heap_set_tid_range(TableScanDesc sscan, ItemPointer mintid,
+							   ItemPointer maxtid);
+extern bool heap_tid_range_nextslot(TableScanDesc sscan,
+									ScanDirection direction,
+									TupleTableSlot *slot);
 extern bool heap_fetch(Relation relation, Snapshot snapshot,
 					   HeapTuple tuple, Buffer *userbuf);
 extern bool heap_hot_search_buffer(ItemPointer tid, Relation relation,
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index 005f3fdd2b..22611eaa98 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -36,6 +36,10 @@ typedef struct TableScanDescData
 	int			rs_nkeys;		/* number of scan keys */
 	struct ScanKeyData *rs_key; /* array of scan key descriptors */
 
+	/* Range of ItemPointers to scan. */
+	ItemPointerData rs_mintid;
+	ItemPointerData rs_maxtid;
+
 	/*
 	 * Information about type and behaviour of the scan, a bitmask of members
 	 * of the ScanOptions enum (see tableam.h).
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 33bffb6815..11c6dabe86 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -49,18 +49,19 @@ typedef enum ScanOptions
 	SO_TYPE_BITMAPSCAN = 1 << 1,
 	SO_TYPE_SAMPLESCAN = 1 << 2,
 	SO_TYPE_TIDSCAN = 1 << 3,
-	SO_TYPE_ANALYZE = 1 << 4,
+	SO_TYPE_TIDRANGESCAN = 1 << 4,
+	SO_TYPE_ANALYZE = 1 << 5,
 
 	/* several of SO_ALLOW_* may be specified */
 	/* allow or disallow use of access strategy */
-	SO_ALLOW_STRAT = 1 << 5,
+	SO_ALLOW_STRAT = 1 << 6,
 	/* report location to syncscan logic? */
-	SO_ALLOW_SYNC = 1 << 6,
+	SO_ALLOW_SYNC = 1 << 7,
 	/* verify visibility page-at-a-time? */
-	SO_ALLOW_PAGEMODE = 1 << 7,
+	SO_ALLOW_PAGEMODE = 1 << 8,
 
 	/* unregister snapshot at scan end? */
-	SO_TEMP_SNAPSHOT = 1 << 8
+	SO_TEMP_SNAPSHOT = 1 << 9
 } ScanOptions;
 
 /*
@@ -325,6 +326,26 @@ typedef struct TableAmRoutine
 									 ScanDirection direction,
 									 TupleTableSlot *slot);
 
+	/*
+	 * Optional functions to provide scanning for ranges of ItemPointers.
+	 * Implementations must either provide both of these functions, or neither
+	 * of them.
+	 *
+	 * Implementations of this function must themselves handle ItemPointers
+	 * of any value, i.e., they must handle each of the following:
+	 *
+	 * 1) mintid or maxtid is beyond the end of the table; and
+	 * 2) mintid is above maxtid; and
+	 * 3) item offset for mintid or maxtid is beyond the maximum offset
+	 * allowed by the AM.
+	 */
+	void		(*scan_set_tid_range) (TableScanDesc sscan,
+									   ItemPointer mintid,
+									   ItemPointer maxtid);
+
+	bool		(*scan_tid_range_nextslot) (TableScanDesc sscan,
+											ScanDirection direction,
+											TupleTableSlot *slot);
 
 	/* ------------------------------------------------------------------------
 	 * Parallel table scan related functions.
@@ -1015,6 +1036,59 @@ table_scan_getnextslot(TableScanDesc sscan, ScanDirection direction, TupleTableS
 	return sscan->rs_rd->rd_tableam->scan_getnextslot(sscan, direction, slot);
 }
 
+/* ----------------------------------------------------------------------------
+ * TID Range scanning related functions.
+ * ----------------------------------------------------------------------------
+ */
+
+/*
+ * table_tid_range_start is the entry point for setting up a TableScanDesc for
+ * a Tid Range scan.
+ */
+static inline TableScanDesc
+table_tid_range_start(Relation rel, Snapshot snapshot,
+					  ItemPointer mintid,
+					  ItemPointer maxtid)
+{
+	TableScanDesc sscan;
+	uint32 flags = SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
+
+	sscan = rel->rd_tableam->scan_begin(rel, snapshot, 0, NULL, NULL, flags);
+
+	/* Set the range of TIDs to scan */
+	sscan->rs_rd->rd_tableam->scan_set_tid_range(sscan, mintid, maxtid);
+
+	return sscan;
+}
+
+/*
+ * table_set_tid_range resets the minimum and maximum TID range to scan for a
+ * TableScanDesc created by table_tid_range_start.
+ */
+static inline void
+table_set_tid_range(TableScanDesc sscan, ItemPointer mintid,
+					ItemPointer maxtid)
+{
+	/* Ensure table_tid_range_start() was used. */
+	Assert((sscan->rs_flags & SO_TYPE_TIDRANGESCAN) != 0);
+
+	sscan->rs_rd->rd_tableam->scan_set_tid_range(sscan, mintid, maxtid);
+}
+/*
+ * Fetch the next tuple from `sscan` for a TID range scan.  Stores the tuple
+ * in `slot` and returns true, or returns false if no more tuples exist.
+ */
+static inline bool
+table_tid_range_nextslot(TableScanDesc sscan, ScanDirection direction,
+						 TupleTableSlot *slot)
+{
+	/* Ensure table_tid_range_start() was used. */
+	Assert((sscan->rs_flags & SO_TYPE_TIDRANGESCAN) != 0);
+
+	return sscan->rs_rd->rd_tableam->scan_tid_range_nextslot(sscan, direction,
+															 slot);
+}
+
 
 /* ----------------------------------------------------------------------------
  * Parallel table scan related functions.
diff --git a/src/include/catalog/pg_operator.dat b/src/include/catalog/pg_operator.dat
index 0d4eac8f96..85395a81ee 100644
--- a/src/include/catalog/pg_operator.dat
+++ b/src/include/catalog/pg_operator.dat
@@ -237,15 +237,15 @@
   oprname => '<', oprleft => 'tid', oprright => 'tid', oprresult => 'bool',
   oprcom => '>(tid,tid)', oprnegate => '>=(tid,tid)', oprcode => 'tidlt',
   oprrest => 'scalarltsel', oprjoin => 'scalarltjoinsel' },
-{ oid => '2800', descr => 'greater than',
+{ oid => '2800', oid_symbol => 'TIDGreaterOperator', descr => 'greater than',
   oprname => '>', oprleft => 'tid', oprright => 'tid', oprresult => 'bool',
   oprcom => '<(tid,tid)', oprnegate => '<=(tid,tid)', oprcode => 'tidgt',
   oprrest => 'scalargtsel', oprjoin => 'scalargtjoinsel' },
-{ oid => '2801', descr => 'less than or equal',
+{ oid => '2801', oid_symbol => 'TIDLessEqOperator', descr => 'less than or equal',
   oprname => '<=', oprleft => 'tid', oprright => 'tid', oprresult => 'bool',
   oprcom => '>=(tid,tid)', oprnegate => '>(tid,tid)', oprcode => 'tidle',
   oprrest => 'scalarlesel', oprjoin => 'scalarlejoinsel' },
-{ oid => '2802', descr => 'greater than or equal',
+{ oid => '2802', oid_symbol => 'TIDGreaterEqOperator', descr => 'greater than or equal',
   oprname => '>=', oprleft => 'tid', oprright => 'tid', oprresult => 'bool',
   oprcom => '<=(tid,tid)', oprnegate => '<(tid,tid)', oprcode => 'tidge',
   oprrest => 'scalargesel', oprjoin => 'scalargejoinsel' },
diff --git a/src/include/executor/nodeTidrangescan.h b/src/include/executor/nodeTidrangescan.h
new file mode 100644
index 0000000000..e53783a3bf
--- /dev/null
+++ b/src/include/executor/nodeTidrangescan.h
@@ -0,0 +1,23 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeTidrangescan.h
+ *
+ *
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ *
+ * src/include/executor/nodeTidrangescan.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef NODETIDRANGESCAN_H
+#define NODETIDRANGESCAN_H
+
+#include "nodes/execnodes.h"
+
+extern TidRangeScanState *ExecInitTidRangeScan(TidRangeScan *node,
+											   EState *estate, int eflags);
+extern void ExecEndTidRangeScan(TidRangeScanState *node);
+extern void ExecReScanTidRangeScan(TidRangeScanState *node);
+
+#endif							/* NODETIDRANGESCAN_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index d65099c94a..dba1cea745 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1617,6 +1617,24 @@ typedef struct TidScanState
 	HeapTupleData tss_htup;
 } TidScanState;
 
+/* ----------------
+ *	 TidRangeScanState information
+ *
+ *		trss_tidexprs		list of TidOpExpr structs (see nodeTidrangescan.c)
+ *		trss_mintid			the lowest TID in the scan range
+ *		trss_maxtid			the highest TID in the scan range
+ *		trss_inScan			is a scan currently in progress?
+ * ----------------
+ */
+typedef struct TidRangeScanState
+{
+	ScanState	ss;				/* its first field is NodeTag */
+	List	   *trss_tidexprs;
+	ItemPointerData trss_mintid;
+	ItemPointerData trss_maxtid;
+	bool		trss_inScan;
+} TidRangeScanState;
+
 /* ----------------
  *	 SubqueryScanState information
  *
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 40ae489c23..e22df890ef 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -59,6 +59,7 @@ typedef enum NodeTag
 	T_BitmapIndexScan,
 	T_BitmapHeapScan,
 	T_TidScan,
+	T_TidRangeScan,
 	T_SubqueryScan,
 	T_FunctionScan,
 	T_ValuesScan,
@@ -116,6 +117,7 @@ typedef enum NodeTag
 	T_BitmapIndexScanState,
 	T_BitmapHeapScanState,
 	T_TidScanState,
+	T_TidRangeScanState,
 	T_SubqueryScanState,
 	T_FunctionScanState,
 	T_TableFuncScanState,
@@ -229,6 +231,7 @@ typedef enum NodeTag
 	T_BitmapAndPath,
 	T_BitmapOrPath,
 	T_TidPath,
+	T_TidRangePath,
 	T_SubqueryScanPath,
 	T_ForeignPath,
 	T_CustomPath,
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 0ec93e648c..9a1c873861 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -621,6 +621,10 @@ typedef struct PartitionSchemeData *PartitionScheme;
  * to simplify matching join clauses to those lists.
  *----------
  */
+
+/* Bitmask of flags supported by table AMs */
+#define AMFLAG_HAS_TID_RANGE (1 << 0)
+
 typedef enum RelOptKind
 {
 	RELOPT_BASEREL,
@@ -710,6 +714,8 @@ typedef struct RelOptInfo
 	PlannerInfo *subroot;		/* if subquery */
 	List	   *subplan_params; /* if subquery */
 	int			rel_parallel_workers;	/* wanted number of parallel workers */
+	int			amflags;		/* Bitmask of optional features supported by
+								 * the table AM */
 
 	/* Information about foreign tables and foreign joins */
 	Oid			serverid;		/* identifies server for the table or join */
@@ -1323,6 +1329,18 @@ typedef struct TidPath
 	List	   *tidquals;		/* qual(s) involving CTID = something */
 } TidPath;
 
+/*
+ * TidRangePath represents a scan by a contiguous range of TIDs
+ *
+ * tidrangequals is an implicitly AND'ed list of qual expressions of the form
+ * "CTID relop pseudoconstant", where relop is one of >,>=,<,<=.
+ */
+typedef struct TidRangePath
+{
+	Path		path;
+	List	   *tidrangequals;
+} TidRangePath;
+
 /*
  * SubqueryScanPath represents a scan of an unflattened subquery-in-FROM
  *
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 43160439f0..6e62104d0b 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -485,6 +485,19 @@ typedef struct TidScan
 	List	   *tidquals;		/* qual(s) involving CTID = something */
 } TidScan;
 
+/* ----------------
+ *		tid range scan node
+ *
+ * tidrangequals is an implicitly AND'ed list of qual expressions of the form
+ * "CTID relop pseudoconstant", where relop is one of >,>=,<,<=.
+ * ----------------
+ */
+typedef struct TidRangeScan
+{
+	Scan		scan;
+	List	   *tidrangequals;	/* qual(s) involving CTID op something */
+} TidRangeScan;
+
 /* ----------------
  *		subquery scan node
  *
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index ed2e4af4be..1be93be098 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -83,6 +83,9 @@ extern void cost_bitmap_or_node(BitmapOrPath *path, PlannerInfo *root);
 extern void cost_bitmap_tree_node(Path *path, Cost *cost, Selectivity *selec);
 extern void cost_tidscan(Path *path, PlannerInfo *root,
 						 RelOptInfo *baserel, List *tidquals, ParamPathInfo *param_info);
+extern void cost_tidrangescan(Path *path, PlannerInfo *root,
+							  RelOptInfo *baserel, List *tidrangequals,
+							  ParamPathInfo *param_info);
 extern void cost_subqueryscan(SubqueryScanPath *path, PlannerInfo *root,
 							  RelOptInfo *baserel, ParamPathInfo *param_info);
 extern void cost_functionscan(Path *path, PlannerInfo *root,
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 8dfc36a4e1..54f4b782fc 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -63,6 +63,10 @@ extern BitmapOrPath *create_bitmap_or_path(PlannerInfo *root,
 										   List *bitmapquals);
 extern TidPath *create_tidscan_path(PlannerInfo *root, RelOptInfo *rel,
 									List *tidquals, Relids required_outer);
+extern TidRangePath *create_tidrangescan_path(PlannerInfo *root,
+											  RelOptInfo *rel,
+											  List *tidrangequals,
+											  Relids required_outer);
 extern AppendPath *create_append_path(PlannerInfo *root, RelOptInfo *rel,
 									  List *subpaths, List *partial_subpaths,
 									  List *pathkeys, Relids required_outer,
diff --git a/src/include/storage/itemptr.h b/src/include/storage/itemptr.h
index 0e6990140b..cd4b8fbacb 100644
--- a/src/include/storage/itemptr.h
+++ b/src/include/storage/itemptr.h
@@ -202,5 +202,7 @@ typedef ItemPointerData *ItemPointer;
 
 extern bool ItemPointerEquals(ItemPointer pointer1, ItemPointer pointer2);
 extern int32 ItemPointerCompare(ItemPointer arg1, ItemPointer arg2);
+extern void ItemPointerInc(ItemPointer pointer);
+extern void ItemPointerDec(ItemPointer pointer);
 
 #endif							/* ITEMPTR_H */
diff --git a/src/test/regress/expected/tidrangescan.out b/src/test/regress/expected/tidrangescan.out
new file mode 100644
index 0000000000..0384304c7f
--- /dev/null
+++ b/src/test/regress/expected/tidrangescan.out
@@ -0,0 +1,302 @@
+-- tests for tidrangescans
+SET enable_seqscan TO off;
+CREATE TABLE tidrangescan(id integer, data text);
+-- insert enough tuples to fill at least two pages
+INSERT INTO tidrangescan SELECT i,repeat('x', 100) FROM generate_series(1,200) AS s(i);
+-- remove all tuples after the 10th tuple on each page.  Trying to ensure
+-- we get the same layout with all CPU architectures and smaller than standard
+-- page sizes.
+DELETE FROM tidrangescan
+WHERE substring(ctid::text FROM ',(\d+)\)')::integer > 10 OR substring(ctid::text FROM '\((\d+),')::integer > 2;
+VACUUM tidrangescan;
+-- range scans with upper bound
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid < '(1,0)';
+            QUERY PLAN             
+-----------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: (ctid < '(1,0)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid < '(1,0)';
+  ctid  
+--------
+ (0,1)
+ (0,2)
+ (0,3)
+ (0,4)
+ (0,5)
+ (0,6)
+ (0,7)
+ (0,8)
+ (0,9)
+ (0,10)
+(10 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid <= '(1,5)';
+             QUERY PLAN             
+------------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: (ctid <= '(1,5)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid <= '(1,5)';
+  ctid  
+--------
+ (0,1)
+ (0,2)
+ (0,3)
+ (0,4)
+ (0,5)
+ (0,6)
+ (0,7)
+ (0,8)
+ (0,9)
+ (0,10)
+ (1,1)
+ (1,2)
+ (1,3)
+ (1,4)
+ (1,5)
+(15 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid < '(0,0)';
+            QUERY PLAN             
+-----------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: (ctid < '(0,0)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid < '(0,0)';
+ ctid 
+------
+(0 rows)
+
+-- range scans with lower bound
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid > '(2,8)';
+            QUERY PLAN             
+-----------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: (ctid > '(2,8)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid > '(2,8)';
+  ctid  
+--------
+ (2,9)
+ (2,10)
+(2 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE '(2,8)' < ctid;
+            QUERY PLAN             
+-----------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: ('(2,8)'::tid < ctid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE '(2,8)' < ctid;
+  ctid  
+--------
+ (2,9)
+ (2,10)
+(2 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid >= '(2,8)';
+             QUERY PLAN             
+------------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: (ctid >= '(2,8)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid >= '(2,8)';
+  ctid  
+--------
+ (2,8)
+ (2,9)
+ (2,10)
+(3 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid >= '(100,0)';
+              QUERY PLAN              
+--------------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: (ctid >= '(100,0)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid >= '(100,0)';
+ ctid 
+------
+(0 rows)
+
+-- range scans with both bounds
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid > '(1,4)' AND '(1,7)' >= ctid;
+                           QUERY PLAN                           
+----------------------------------------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: ((ctid > '(1,4)'::tid) AND ('(1,7)'::tid >= ctid))
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid > '(1,4)' AND '(1,7)' >= ctid;
+ ctid  
+-------
+ (1,5)
+ (1,6)
+ (1,7)
+(3 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE '(1,7)' >= ctid AND ctid > '(1,4)';
+                           QUERY PLAN                           
+----------------------------------------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: (('(1,7)'::tid >= ctid) AND (ctid > '(1,4)'::tid))
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE '(1,7)' >= ctid AND ctid > '(1,4)';
+ ctid  
+-------
+ (1,5)
+ (1,6)
+ (1,7)
+(3 rows)
+
+-- extreme offsets
+SELECT ctid FROM tidrangescan WHERE ctid > '(0,65535)' AND ctid < '(1,0)' LIMIT 1;
+ ctid 
+------
+(0 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid < '(0,0)' LIMIT 1;
+ ctid 
+------
+(0 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid > '(4294967295,65535)';
+ ctid 
+------
+(0 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid < '(0,0)';
+ ctid 
+------
+(0 rows)
+
+-- NULLs in the range cannot return tuples
+SELECT ctid FROM tidrangescan WHERE ctid >= (SELECT NULL::tid);
+ ctid 
+------
+(0 rows)
+
+-- empty table
+CREATE TABLE tidrangescan_empty(id integer, data text);
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan_empty WHERE ctid < '(1, 0)';
+              QUERY PLAN              
+--------------------------------------
+ Tid Range Scan on tidrangescan_empty
+   TID Cond: (ctid < '(1,0)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan_empty WHERE ctid < '(1, 0)';
+ ctid 
+------
+(0 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan_empty WHERE ctid > '(9, 0)';
+              QUERY PLAN              
+--------------------------------------
+ Tid Range Scan on tidrangescan_empty
+   TID Cond: (ctid > '(9,0)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan_empty WHERE ctid > '(9, 0)';
+ ctid 
+------
+(0 rows)
+
+-- rescans
+EXPLAIN (COSTS OFF)
+SELECT t.ctid,t2.c FROM tidrangescan t,
+LATERAL (SELECT count(*) c FROM tidrangescan t2 WHERE t2.ctid <= t.ctid) t2
+WHERE t.ctid < '(1,0)';
+                  QUERY PLAN                   
+-----------------------------------------------
+ Nested Loop
+   ->  Tid Range Scan on tidrangescan t
+         TID Cond: (ctid < '(1,0)'::tid)
+   ->  Aggregate
+         ->  Tid Range Scan on tidrangescan t2
+               TID Cond: (ctid <= t.ctid)
+(6 rows)
+
+SELECT t.ctid,t2.c FROM tidrangescan t,
+LATERAL (SELECT count(*) c FROM tidrangescan t2 WHERE t2.ctid <= t.ctid) t2
+WHERE t.ctid < '(1,0)';
+  ctid  | c  
+--------+----
+ (0,1)  |  1
+ (0,2)  |  2
+ (0,3)  |  3
+ (0,4)  |  4
+ (0,5)  |  5
+ (0,6)  |  6
+ (0,7)  |  7
+ (0,8)  |  8
+ (0,9)  |  9
+ (0,10) | 10
+(10 rows)
+
+-- cursors
+-- Ensure we get a TID Range scan without a Materialize node.
+EXPLAIN (COSTS OFF)
+DECLARE c SCROLL CURSOR FOR SELECT ctid FROM tidrangescan WHERE ctid < '(1,0)';
+            QUERY PLAN             
+-----------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: (ctid < '(1,0)'::tid)
+(2 rows)
+
+BEGIN;
+DECLARE c SCROLL CURSOR FOR SELECT ctid FROM tidrangescan WHERE ctid < '(1,0)';
+FETCH NEXT c;
+ ctid  
+-------
+ (0,1)
+(1 row)
+
+FETCH NEXT c;
+ ctid  
+-------
+ (0,2)
+(1 row)
+
+FETCH PRIOR c;
+ ctid  
+-------
+ (0,1)
+(1 row)
+
+FETCH FIRST c;
+ ctid  
+-------
+ (0,1)
+(1 row)
+
+FETCH LAST c;
+  ctid  
+--------
+ (0,10)
+(1 row)
+
+COMMIT;
+DROP TABLE tidrangescan;
+DROP TABLE tidrangescan_empty;
+RESET enable_seqscan;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 12bb67e491..c77b0d7342 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -80,7 +80,7 @@ test: brin gin gist spgist privileges init_privs security_label collate matview
 # ----------
 # Another group of parallel tests
 # ----------
-test: create_table_like alter_generic alter_operator misc async dbsize misc_functions sysviews tsrf tid tidscan collate.icu.utf8 incremental_sort
+test: create_table_like alter_generic alter_operator misc async dbsize misc_functions sysviews tsrf tid tidscan tidrangescan collate.icu.utf8 incremental_sort
 
 # rules cannot run concurrently with any test that creates
 # a view or rule in the public schema
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 59b416fd80..0264a97324 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -138,6 +138,7 @@ test: sysviews
 test: tsrf
 test: tid
 test: tidscan
+test: tidrangescan
 test: collate.icu.utf8
 test: rules
 test: psql
diff --git a/src/test/regress/sql/tidrangescan.sql b/src/test/regress/sql/tidrangescan.sql
new file mode 100644
index 0000000000..2da35807ff
--- /dev/null
+++ b/src/test/regress/sql/tidrangescan.sql
@@ -0,0 +1,104 @@
+-- tests for tidrangescans
+
+SET enable_seqscan TO off;
+CREATE TABLE tidrangescan(id integer, data text);
+
+-- insert enough tuples to fill at least two pages
+INSERT INTO tidrangescan SELECT i,repeat('x', 100) FROM generate_series(1,200) AS s(i);
+
+-- remove all tuples after the 10th tuple on each page.  Trying to ensure
+-- we get the same layout with all CPU architectures and smaller than standard
+-- page sizes.
+DELETE FROM tidrangescan
+WHERE substring(ctid::text FROM ',(\d+)\)')::integer > 10 OR substring(ctid::text FROM '\((\d+),')::integer > 2;
+VACUUM tidrangescan;
+
+-- range scans with upper bound
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid < '(1,0)';
+SELECT ctid FROM tidrangescan WHERE ctid < '(1,0)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid <= '(1,5)';
+SELECT ctid FROM tidrangescan WHERE ctid <= '(1,5)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid < '(0,0)';
+SELECT ctid FROM tidrangescan WHERE ctid < '(0,0)';
+
+-- range scans with lower bound
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid > '(2,8)';
+SELECT ctid FROM tidrangescan WHERE ctid > '(2,8)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE '(2,8)' < ctid;
+SELECT ctid FROM tidrangescan WHERE '(2,8)' < ctid;
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid >= '(2,8)';
+SELECT ctid FROM tidrangescan WHERE ctid >= '(2,8)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid >= '(100,0)';
+SELECT ctid FROM tidrangescan WHERE ctid >= '(100,0)';
+
+-- range scans with both bounds
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid > '(1,4)' AND '(1,7)' >= ctid;
+SELECT ctid FROM tidrangescan WHERE ctid > '(1,4)' AND '(1,7)' >= ctid;
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE '(1,7)' >= ctid AND ctid > '(1,4)';
+SELECT ctid FROM tidrangescan WHERE '(1,7)' >= ctid AND ctid > '(1,4)';
+
+-- extreme offsets
+SELECT ctid FROM tidrangescan WHERE ctid > '(0,65535)' AND ctid < '(1,0)' LIMIT 1;
+SELECT ctid FROM tidrangescan WHERE ctid < '(0,0)' LIMIT 1;
+
+SELECT ctid FROM tidrangescan WHERE ctid > '(4294967295,65535)';
+SELECT ctid FROM tidrangescan WHERE ctid < '(0,0)';
+
+-- NULLs in the range cannot return tuples
+SELECT ctid FROM tidrangescan WHERE ctid >= (SELECT NULL::tid);
+
+-- empty table
+CREATE TABLE tidrangescan_empty(id integer, data text);
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan_empty WHERE ctid < '(1, 0)';
+SELECT ctid FROM tidrangescan_empty WHERE ctid < '(1, 0)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan_empty WHERE ctid > '(9, 0)';
+SELECT ctid FROM tidrangescan_empty WHERE ctid > '(9, 0)';
+
+-- rescans
+EXPLAIN (COSTS OFF)
+SELECT t.ctid,t2.c FROM tidrangescan t,
+LATERAL (SELECT count(*) c FROM tidrangescan t2 WHERE t2.ctid <= t.ctid) t2
+WHERE t.ctid < '(1,0)';
+
+SELECT t.ctid,t2.c FROM tidrangescan t,
+LATERAL (SELECT count(*) c FROM tidrangescan t2 WHERE t2.ctid <= t.ctid) t2
+WHERE t.ctid < '(1,0)';
+
+-- cursors
+
+-- Ensure we get a TID Range scan without a Materialize node.
+EXPLAIN (COSTS OFF)
+DECLARE c SCROLL CURSOR FOR SELECT ctid FROM tidrangescan WHERE ctid < '(1,0)';
+
+BEGIN;
+DECLARE c SCROLL CURSOR FOR SELECT ctid FROM tidrangescan WHERE ctid < '(1,0)';
+FETCH NEXT c;
+FETCH NEXT c;
+FETCH PRIOR c;
+FETCH FIRST c;
+FETCH LAST c;
+COMMIT;
+
+DROP TABLE tidrangescan;
+DROP TABLE tidrangescan_empty;
+
+RESET enable_seqscan;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 1d540fe489..0c876a6efc 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2530,8 +2530,13 @@ TextPositionState
 TheLexeme
 TheSubstitute
 TidExpr
+TidExprType
 TidHashKey
+TidOpExpr
 TidPath
+TidRangePath
+TidRangeScan
+TidRangeScanState
 TidScan
 TidScanState
 TimeADT
-- 
2.27.0
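
The regression tests in the patch above write the same bound with ctid on either side of the operator (for example ctid > '(2,8)' and '(2,8)' < ctid) and expect identical results. A minimal standalone sketch of that classification follows. The names BoundType, Bound and classify_bound are hypothetical stand-ins rather than the patch's own TidExprType and TidOpExpr, and operators are passed as strings here instead of operator OIDs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the patch's TidExprType / TidOpExpr. */
typedef enum
{
	BOUND_UPPER,
	BOUND_LOWER
} BoundType;

typedef struct
{
	BoundType	type;			/* lower or upper bound of the range */
	bool		inclusive;		/* true for <= and >= */
} Bound;

/*
 * Classify "ctid op const" (ctid_on_left = true) or "const op ctid"
 * (ctid_on_left = false).  Returns false for an unsupported operator.
 */
bool
classify_bound(const char *op, bool ctid_on_left, Bound *out)
{
	bool		is_less;

	assert(op != NULL && out != NULL);

	if (strcmp(op, "<") == 0 || strcmp(op, "<=") == 0)
		is_less = true;
	else if (strcmp(op, ">") == 0 || strcmp(op, ">=") == 0)
		is_less = false;
	else
		return false;

	/* Only the two-character operators end in '='. */
	out->inclusive = (op[1] == '=');

	/* Swapping the operands flips which end of the range is bounded. */
	if (is_less)
		out->type = ctid_on_left ? BOUND_UPPER : BOUND_LOWER;
	else
		out->type = ctid_on_left ? BOUND_LOWER : BOUND_UPPER;

	return true;
}
```

Under these assumptions, classify_bound("<", false, &b) reports an exclusive lower bound, which is how '(2,8)' < ctid ends up returning only (2,9) and (2,10) in the expected output.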

#88

David Rowley

dgrowleyml@gmail.com

almost 5 years ago

In reply to: David Rowley (#87)

1 attachment(s)

Re: Tid scan improvements

On Thu, 4 Feb 2021 at 23:51, David Rowley <dgrowleyml@gmail.com> wrote:

> Updated patch attached.

I made another pass over this patch and did a bit of renaming work
around the heap_* functions and the tableam functions. I think the new
names are a bit more aligned to the existing names.

I don't really see anything else that I'm unhappy with about this
patch, so pending any objections or last-minute reviews, I plan to
push it later this week.

David
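
For readers skimming the attachment, the range clamping done by heap_set_tidrange in the v14 patch can be sketched as a standalone program. This is a simplified illustration only: Tid, MAX_OFFSET, tid_compare and clamp_tid_range are hypothetical names, MAX_OFFSET assumes the default 8kB block size, and an empty relation is reported here as an empty range whereas the patch simply leaves the scan limits unset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified TID: (block, offset), ordered lexicographically. */
typedef struct
{
	uint32_t	block;
	uint16_t	offset;
} Tid;

/* Assumption: stand-in for MaxOffsetNumber with 8kB blocks. */
#define MAX_OFFSET 2048

int
tid_compare(Tid a, Tid b)
{
	if (a.block != b.block)
		return a.block < b.block ? -1 : 1;
	if (a.offset != b.offset)
		return a.offset < b.offset ? -1 : 1;
	return 0;
}

/*
 * Clamp the inclusive range [mintid, maxtid] to a relation of nblocks
 * blocks.  On success, store the first block and the number of blocks to
 * scan and return true.  Return false if no block can hold a match.
 */
bool
clamp_tid_range(uint32_t nblocks, Tid mintid, Tid maxtid,
				uint32_t *start_blk, uint32_t *num_blks)
{
	Tid			lowest = {0, 1};	/* first possible tuple */
	Tid			highest;

	assert(start_blk != NULL && num_blks != NULL);
	*start_blk = 0;
	*num_blks = 0;

	if (nblocks == 0)
		return false;

	/* Last possible tuple in the relation. */
	highest.block = nblocks - 1;
	highest.offset = MAX_OFFSET;

	/* Narrow each end only when the requested bound is tighter. */
	if (tid_compare(maxtid, highest) < 0)
		highest = maxtid;
	if (tid_compare(mintid, lowest) > 0)
		lowest = mintid;

	/* Empty range; also guards the subtraction below. */
	if (tid_compare(highest, lowest) < 0)
		return false;

	*start_blk = lowest.block;
	*num_blks = highest.block - lowest.block + 1;
	return true;
}
```

With the three-page table built by the regression test, a range of (1,5) to (1,7) yields start block 1 and one block to scan, while a lower bound of (100,0) yields an empty range. Tuples on the first and last block that fall outside the range still have to be filtered one by one, which is what heap_getnextslot_tidrange does in the patch.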

Attachments:

v14-0001-Add-TID-Range-Scans-to-support-efficient-scannin.patch (text/plain; charset=US-ASCII)

From db31de966787b2d5bc0fac38a584710a091c3b3b Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Thu, 21 Jan 2021 16:48:15 +1300
Subject: [PATCH v14] Add TID Range Scans to support efficient scanning ranges
 of TIDs

This adds a new node type named TID Range Scan.  The query planner will
generate paths for TID Range scans when quals are discovered on base
relations which search for ranges of ctid.  These ranges may be open at
either end.

To support this, two new optional callback functions have been added to
table AM: scan_set_tidrange, which accepts a minimum and maximum
ItemPointer, and scan_getnextslot_tidrange, which retrieves the tuples
within that range.  Table AMs where scanning ranges of TIDs does not make
sense or is difficult to implement efficiently may choose to not implement
these functions.

Author: Edmund Horner and David Rowley
Discussion: https://postgr.es/m/CAMyN-kB-nFTkF=VA_JPwFNo08S0d-Yk0F741S2B7LDmYAi8eyA@mail.gmail.com
---
 src/backend/access/heap/heapam.c           | 147 ++++++++
 src/backend/access/heap/heapam_handler.c   |   3 +
 src/backend/commands/explain.c             |  23 ++
 src/backend/executor/Makefile              |   1 +
 src/backend/executor/execAmi.c             |   6 +
 src/backend/executor/execProcnode.c        |  10 +
 src/backend/executor/nodeTidrangescan.c    | 411 +++++++++++++++++++++
 src/backend/nodes/copyfuncs.c              |  24 ++
 src/backend/nodes/outfuncs.c               |  14 +
 src/backend/optimizer/README               |   1 +
 src/backend/optimizer/path/costsize.c      |  95 +++++
 src/backend/optimizer/path/tidpath.c       | 117 +++++-
 src/backend/optimizer/plan/createplan.c    |  98 +++++
 src/backend/optimizer/plan/setrefs.c       |  16 +
 src/backend/optimizer/plan/subselect.c     |   6 +
 src/backend/optimizer/util/pathnode.c      |  29 ++
 src/backend/optimizer/util/plancat.c       |   6 +
 src/backend/optimizer/util/relnode.c       |   3 +
 src/backend/storage/page/itemptr.c         |  58 +++
 src/include/access/heapam.h                |   6 +-
 src/include/access/relscan.h               |   4 +
 src/include/access/tableam.h               |  91 ++++-
 src/include/catalog/pg_operator.dat        |   6 +-
 src/include/executor/nodeTidrangescan.h    |  23 ++
 src/include/nodes/execnodes.h              |  18 +
 src/include/nodes/nodes.h                  |   3 +
 src/include/nodes/pathnodes.h              |  18 +
 src/include/nodes/plannodes.h              |  13 +
 src/include/optimizer/cost.h               |   3 +
 src/include/optimizer/pathnode.h           |   4 +
 src/include/storage/itemptr.h              |   2 +
 src/test/regress/expected/tidrangescan.out | 302 +++++++++++++++
 src/test/regress/parallel_schedule         |   2 +-
 src/test/regress/serial_schedule           |   1 +
 src/test/regress/sql/tidrangescan.sql      | 104 ++++++
 src/tools/pgindent/typedefs.list           |   5 +
 36 files changed, 1652 insertions(+), 21 deletions(-)
 create mode 100644 src/backend/executor/nodeTidrangescan.c
 create mode 100644 src/include/executor/nodeTidrangescan.h
 create mode 100644 src/test/regress/expected/tidrangescan.out
 create mode 100644 src/test/regress/sql/tidrangescan.sql
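
The commit message above describes the TID range support in the table AM as optional, and the heapam_handler.c hunk in this patch wires up scan_set_tidrange and scan_getnextslot_tidrange for heap. One way a planner could turn "are the callbacks present?" into the AMFLAG_HAS_TID_RANGE bit is sketched below. MiniTableAm, am_flags and the dummy callbacks are hypothetical, the callback signatures are reduced to a bare pointer argument, and the derivation of the flag is an assumption, since the plancat.c hunk is not shown in this part of the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define AMFLAG_HAS_TID_RANGE (1 << 0)	/* same bit as in pathnodes.h */

/* Hypothetical, heavily reduced stand-in for TableAmRoutine. */
typedef struct
{
	bool		(*scan_getnextslot) (void *scan);	/* required */
	void		(*scan_set_tidrange) (void *scan);	/* optional */
	bool		(*scan_getnextslot_tidrange) (void *scan);	/* optional */
} MiniTableAm;

/*
 * Report which optional features an AM supports.  Assumption: TID range
 * scans are only usable when both optional callbacks are provided.
 */
int
am_flags(const MiniTableAm *am)
{
	int			flags = 0;

	assert(am != NULL && am->scan_getnextslot != NULL);

	if (am->scan_set_tidrange != NULL &&
		am->scan_getnextslot_tidrange != NULL)
		flags |= AMFLAG_HAS_TID_RANGE;

	return flags;
}

/* Dummy callbacks, present only so the sketch can be exercised. */
static bool
dummy_next(void *scan)
{
	(void) scan;
	return false;
}

static void
dummy_set_range(void *scan)
{
	(void) scan;
}
```

A planner following this pattern would skip generating TID range paths for any relation whose flags lack AMFLAG_HAS_TID_RANGE, so an AM that leaves the two callbacks NULL never sees a TID Range Scan node.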

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 9926e2bd54..2171a12e0e 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -1391,6 +1391,153 @@ heap_getnextslot(TableScanDesc sscan, ScanDirection direction, TupleTableSlot *s
 	return true;
 }
 
+void
+heap_set_tidrange(TableScanDesc sscan, ItemPointer mintid,
+				  ItemPointer maxtid)
+{
+	HeapScanDesc scan = (HeapScanDesc) sscan;
+	BlockNumber startBlk;
+	BlockNumber numBlks;
+	ItemPointerData highestItem;
+	ItemPointerData lowestItem;
+
+	/*
+	 * For relations without any pages, we can simply leave the TID range
+	 * unset.  There will be no tuples to scan, therefore no tuples outside
+	 * the given TID range.
+	 */
+	if (scan->rs_nblocks == 0)
+		return;
+
+	/*
+	 * Set up some ItemPointers which point to the first and last possible
+	 * tuples in the heap.
+	 */
+	ItemPointerSet(&highestItem, scan->rs_nblocks - 1, MaxOffsetNumber);
+	ItemPointerSet(&lowestItem, 0, FirstOffsetNumber);
+
+	/*
+	 * If the given maximum TID is below the highest possible TID in the
+	 * relation, then restrict the range to that, otherwise we scan to the end
+	 * of the relation.
+	 */
+	if (ItemPointerCompare(maxtid, &highestItem) < 0)
+		ItemPointerCopy(maxtid, &highestItem);
+
+	/*
+	 * If the given minimum TID is above the lowest possible TID in the
+	 * relation, then restrict the range to only scan for TIDs above that.
+	 */
+	if (ItemPointerCompare(mintid, &lowestItem) > 0)
+		ItemPointerCopy(mintid, &lowestItem);
+
+	/*
+	 * Check for an empty range and protect from would-be negative results
+	 * from the numBlks calculation below.
+	 */
+	if (ItemPointerCompare(&highestItem, &lowestItem) < 0)
+	{
+		/* Set an empty range of blocks to scan */
+		heap_setscanlimits(sscan, 0, 0);
+		return;
+	}
+
+	/*
+	 * Calculate the first block and the number of blocks we must scan. We
+	 * could be more aggressive here and perform some more validation to try
+	 * and further narrow the scope of blocks to scan by checking if the
+	 * lowestItem has an offset above MaxOffsetNumber.  In this case, we could
+	 * advance startBlk by one.  Likewise if highestItem has an offset of 0 we
+	 * could scan one fewer blocks.  However, such an optimization does not
+	 * seem worth troubling over, currently.
+	 */
+	startBlk = ItemPointerGetBlockNumberNoCheck(&lowestItem);
+
+	numBlks = ItemPointerGetBlockNumberNoCheck(&highestItem) -
+		ItemPointerGetBlockNumberNoCheck(&lowestItem) + 1;
+
+	/* Set the start block and number of blocks to scan */
+	heap_setscanlimits(sscan, startBlk, numBlks);
+
+	/* Finally, set the TID range in sscan */
+	ItemPointerCopy(&lowestItem, &sscan->rs_mintid);
+	ItemPointerCopy(&highestItem, &sscan->rs_maxtid);
+}
+
+bool
+heap_getnextslot_tidrange(TableScanDesc sscan, ScanDirection direction,
+						  TupleTableSlot *slot)
+{
+	HeapScanDesc scan = (HeapScanDesc) sscan;
+	ItemPointer mintid = &sscan->rs_mintid;
+	ItemPointer maxtid = &sscan->rs_maxtid;
+
+	/* Note: no locking manipulations needed */
+	for (;;)
+	{
+		if (sscan->rs_flags & SO_ALLOW_PAGEMODE)
+			heapgettup_pagemode(scan, direction, sscan->rs_nkeys, sscan->rs_key);
+		else
+			heapgettup(scan, direction, sscan->rs_nkeys, sscan->rs_key);
+
+		if (scan->rs_ctup.t_data == NULL)
+		{
+			ExecClearTuple(slot);
+			return false;
+		}
+
+		/*
+		 * heap_set_tidrange will have used heap_setscanlimits to limit the
+		 * range of pages we scan to only ones that can contain the TID range
+		 * we're scanning for.  Here we must filter out any tuples from these
+		 * pages that are outwith that range.
+		 */
+		if (ItemPointerCompare(&scan->rs_ctup.t_self, mintid) < 0)
+		{
+			ExecClearTuple(slot);
+
+			/*
+			 * When scanning backwards, the TIDs will be in descending order.
+			 * Future tuples in this direction will be lower still, so we can
+			 * just return false to indicate there will be no more tuples.
+			 */
+			if (ScanDirectionIsBackward(direction))
+				return false;
+
+			continue;
+		}
+
+		/*
+		 * Likewise for the final page, we must filter out TIDs greater than
+		 * maxtid.
+		 */
+		if (ItemPointerCompare(&scan->rs_ctup.t_self, maxtid) > 0)
+		{
+			ExecClearTuple(slot);
+
+			/*
+			 * When scanning forward, the TIDs will be in ascending order.
+			 * Future tuples in this direction will be higher still, so we can
+			 * just return false to indicate there will be no more tuples.
+			 */
+			if (ScanDirectionIsForward(direction))
+				return false;
+			continue;
+		}
+
+		break;
+	}
+
+	/*
+	 * if we get here it means we have a new current scan tuple, so point to
+	 * the proper return buffer and return the tuple.
+	 */
+	pgstat_count_heap_getnext(scan->rs_base.rs_rd);
+
+	ExecStoreBufferHeapTuple(&scan->rs_ctup, slot, scan->rs_cbuf);
+	return true;
+}
+
 /*
  *	heap_fetch		- retrieve tuple with given tid
  *
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 4a70e20a14..bd5faf0c1f 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -2542,6 +2542,9 @@ static const TableAmRoutine heapam_methods = {
 	.scan_rescan = heap_rescan,
 	.scan_getnextslot = heap_getnextslot,
 
+	.scan_set_tidrange = heap_set_tidrange,
+	.scan_getnextslot_tidrange = heap_getnextslot_tidrange,
+
 	.parallelscan_estimate = table_block_parallelscan_estimate,
 	.parallelscan_initialize = table_block_parallelscan_initialize,
 	.parallelscan_reinitialize = table_block_parallelscan_reinitialize,
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index f80e379973..afc45429ba 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -1057,6 +1057,7 @@ ExplainPreScanNode(PlanState *planstate, Bitmapset **rels_used)
 		case T_IndexOnlyScan:
 		case T_BitmapHeapScan:
 		case T_TidScan:
+		case T_TidRangeScan:
 		case T_SubqueryScan:
 		case T_FunctionScan:
 		case T_TableFuncScan:
@@ -1223,6 +1224,9 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_TidScan:
 			pname = sname = "Tid Scan";
 			break;
+		case T_TidRangeScan:
+			pname = sname = "Tid Range Scan";
+			break;
 		case T_SubqueryScan:
 			pname = sname = "Subquery Scan";
 			break;
@@ -1417,6 +1421,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_SampleScan:
 		case T_BitmapHeapScan:
 		case T_TidScan:
+		case T_TidRangeScan:
 		case T_SubqueryScan:
 		case T_FunctionScan:
 		case T_TableFuncScan:
@@ -1871,6 +1876,23 @@ ExplainNode(PlanState *planstate, List *ancestors,
 											   planstate, es);
 			}
 			break;
+		case T_TidRangeScan:
+			{
+				/*
+				 * The tidrangequals list has AND semantics, so be sure to
+				 * show it as an AND condition.
+				 */
+				List	   *tidquals = ((TidRangeScan *) plan)->tidrangequals;
+
+				if (list_length(tidquals) > 1)
+					tidquals = list_make1(make_andclause(tidquals));
+				show_scan_qual(tidquals, "TID Cond", planstate, ancestors, es);
+				show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
+				if (plan->qual)
+					show_instrumentation_count("Rows Removed by Filter", 1,
+											   planstate, es);
+			}
+			break;
 		case T_ForeignScan:
 			show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
 			if (plan->qual)
@@ -3558,6 +3580,7 @@ ExplainTargetRel(Plan *plan, Index rti, ExplainState *es)
 		case T_IndexOnlyScan:
 		case T_BitmapHeapScan:
 		case T_TidScan:
+		case T_TidRangeScan:
 		case T_ForeignScan:
 		case T_CustomScan:
 		case T_ModifyTable:
diff --git a/src/backend/executor/Makefile b/src/backend/executor/Makefile
index f990c6473a..74ac59faa1 100644
--- a/src/backend/executor/Makefile
+++ b/src/backend/executor/Makefile
@@ -67,6 +67,7 @@ OBJS = \
 	nodeSubplan.o \
 	nodeSubqueryscan.o \
 	nodeTableFuncscan.o \
+	nodeTidrangescan.o \
 	nodeTidscan.o \
 	nodeUnique.o \
 	nodeValuesscan.o \
diff --git a/src/backend/executor/execAmi.c b/src/backend/executor/execAmi.c
index 23bdb53cd1..4543ac79ed 100644
--- a/src/backend/executor/execAmi.c
+++ b/src/backend/executor/execAmi.c
@@ -51,6 +51,7 @@
 #include "executor/nodeSubplan.h"
 #include "executor/nodeSubqueryscan.h"
 #include "executor/nodeTableFuncscan.h"
+#include "executor/nodeTidrangescan.h"
 #include "executor/nodeTidscan.h"
 #include "executor/nodeUnique.h"
 #include "executor/nodeValuesscan.h"
@@ -197,6 +198,10 @@ ExecReScan(PlanState *node)
 			ExecReScanTidScan((TidScanState *) node);
 			break;
 
+		case T_TidRangeScanState:
+			ExecReScanTidRangeScan((TidRangeScanState *) node);
+			break;
+
 		case T_SubqueryScanState:
 			ExecReScanSubqueryScan((SubqueryScanState *) node);
 			break;
@@ -562,6 +567,7 @@ ExecSupportsBackwardScan(Plan *node)
 
 		case T_SeqScan:
 		case T_TidScan:
+		case T_TidRangeScan:
 		case T_FunctionScan:
 		case T_ValuesScan:
 		case T_CteScan:
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index 414df50a05..29766d8196 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -109,6 +109,7 @@
 #include "executor/nodeSubplan.h"
 #include "executor/nodeSubqueryscan.h"
 #include "executor/nodeTableFuncscan.h"
+#include "executor/nodeTidrangescan.h"
 #include "executor/nodeTidscan.h"
 #include "executor/nodeUnique.h"
 #include "executor/nodeValuesscan.h"
@@ -238,6 +239,11 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
 												   estate, eflags);
 			break;
 
+		case T_TidRangeScan:
+			result = (PlanState *) ExecInitTidRangeScan((TidRangeScan *) node,
+														estate, eflags);
+			break;
+
 		case T_SubqueryScan:
 			result = (PlanState *) ExecInitSubqueryScan((SubqueryScan *) node,
 														estate, eflags);
@@ -637,6 +643,10 @@ ExecEndNode(PlanState *node)
 			ExecEndTidScan((TidScanState *) node);
 			break;
 
+		case T_TidRangeScanState:
+			ExecEndTidRangeScan((TidRangeScanState *) node);
+			break;
+
 		case T_SubqueryScanState:
 			ExecEndSubqueryScan((SubqueryScanState *) node);
 			break;
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
new file mode 100644
index 0000000000..add87cdd82
--- /dev/null
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -0,0 +1,411 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeTidrangescan.c
+ *	  Routines to support TID range scans of relations
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/executor/nodeTidrangescan.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/relscan.h"
+#include "access/sysattr.h"
+#include "access/tableam.h"
+#include "catalog/pg_operator.h"
+#include "executor/execdebug.h"
+#include "executor/nodeTidrangescan.h"
+#include "nodes/nodeFuncs.h"
+#include "storage/bufmgr.h"
+#include "utils/rel.h"
+
+
+#define IsCTIDVar(node)  \
+	((node) != NULL && \
+	 IsA((node), Var) && \
+	 ((Var *) (node))->varattno == SelfItemPointerAttributeNumber && \
+	 ((Var *) (node))->varlevelsup == 0)
+
+typedef enum
+{
+	TIDEXPR_UPPER_BOUND,
+	TIDEXPR_LOWER_BOUND
+} TidExprType;
+
+/* Upper or lower range bound for scan */
+typedef struct TidOpExpr
+{
+	TidExprType exprtype;		/* type of op; lower or upper */
+	ExprState  *exprstate;		/* ExprState for a TID-yielding subexpr */
+	bool		inclusive;		/* whether op is inclusive */
+} TidOpExpr;
+
+/*
+ * For the given 'expr', build and return an appropriate TidOpExpr taking into
+ * account the expr's operator and operand order.
+ */
+static TidOpExpr *
+MakeTidOpExpr(OpExpr *expr, TidRangeScanState *tidstate)
+{
+	Node	   *arg1 = get_leftop((Expr *) expr);
+	Node	   *arg2 = get_rightop((Expr *) expr);
+	ExprState  *exprstate = NULL;
+	bool		invert = false;
+	TidOpExpr  *tidopexpr;
+
+	if (IsCTIDVar(arg1))
+		exprstate = ExecInitExpr((Expr *) arg2, &tidstate->ss.ps);
+	else if (IsCTIDVar(arg2))
+	{
+		exprstate = ExecInitExpr((Expr *) arg1, &tidstate->ss.ps);
+		invert = true;
+	}
+	else
+		elog(ERROR, "could not identify CTID variable");
+
+	tidopexpr = (TidOpExpr *) palloc(sizeof(TidOpExpr));
+	tidopexpr->inclusive = false;	/* for now */
+
+	switch (expr->opno)
+	{
+		case TIDLessEqOperator:
+			tidopexpr->inclusive = true;
+			/* fall through */
+		case TIDLessOperator:
+			tidopexpr->exprtype = invert ? TIDEXPR_LOWER_BOUND : TIDEXPR_UPPER_BOUND;
+			break;
+		case TIDGreaterEqOperator:
+			tidopexpr->inclusive = true;
+			/* fall through */
+		case TIDGreaterOperator:
+			tidopexpr->exprtype = invert ? TIDEXPR_UPPER_BOUND : TIDEXPR_LOWER_BOUND;
+			break;
+		default:
+			elog(ERROR, "could not identify CTID operator");
+	}
+
+	tidopexpr->exprstate = exprstate;
+
+	return tidopexpr;
+}
+
+/*
+ * Extract the qual subexpressions that yield TIDs to search for,
+ * and compile them into ExprStates if they're ordinary expressions.
+ */
+static void
+TidExprListCreate(TidRangeScanState *tidrangestate)
+{
+	TidRangeScan *node = (TidRangeScan *) tidrangestate->ss.ps.plan;
+	List	   *tidexprs = NIL;
+	ListCell   *l;
+
+	foreach(l, node->tidrangequals)
+	{
+		OpExpr	   *opexpr = lfirst(l);
+		TidOpExpr  *tidopexpr;
+
+		if (!IsA(opexpr, OpExpr))
+			elog(ERROR, "could not identify CTID expression");
+
+		tidopexpr = MakeTidOpExpr(opexpr, tidrangestate);
+		tidexprs = lappend(tidexprs, tidopexpr);
+	}
+
+	tidrangestate->trss_tidexprs = tidexprs;
+}
+
+/* ----------------------------------------------------------------
+ *		TidRangeEval
+ *
+ *		Compute and set node's block and offset range to scan by evaluating
+ *		the trss_tidexprs.  Returns false if we detect the range cannot
+ *		contain any tuples.  Returns true if it's possible for the range to
+ *		contain tuples.
+ * ----------------------------------------------------------------
+ */
+static bool
+TidRangeEval(TidRangeScanState *node)
+{
+	ExprContext *econtext = node->ss.ps.ps_ExprContext;
+	ItemPointerData lowerBound;
+	ItemPointerData upperBound;
+	ListCell   *l;
+
+	/*
+	 * Set the upper and lower bounds to the absolute limits of the range of
+	 * the ItemPointer type.  Below we'll try to narrow this range on either
+	 * side by looking at the TidOpExprs.
+	 */
+	ItemPointerSet(&lowerBound, 0, 0);
+	ItemPointerSet(&upperBound, InvalidBlockNumber, PG_UINT16_MAX);
+
+	foreach(l, node->trss_tidexprs)
+	{
+		TidOpExpr  *tidopexpr = (TidOpExpr *) lfirst(l);
+		ItemPointer itemptr;
+		bool		isNull;
+
+		/* Evaluate this bound. */
+		itemptr = (ItemPointer)
+			DatumGetPointer(ExecEvalExprSwitchContext(tidopexpr->exprstate,
+													  econtext,
+													  &isNull));
+
+		/* If the bound is NULL, *nothing* matches the qual. */
+		if (isNull)
+			return false;
+
+		if (tidopexpr->exprtype == TIDEXPR_LOWER_BOUND)
+		{
+			ItemPointerData lb;
+
+			ItemPointerCopy(itemptr, &lb);
+
+			/*
+			 * Normalize non-inclusive ranges to become inclusive.  The
+			 * resulting ItemPointer here may not be a valid item pointer.
+			 */
+			if (!tidopexpr->inclusive)
+				ItemPointerInc(&lb);
+
+			/* Check if we can narrow the range using this qual */
+			if (ItemPointerCompare(&lb, &lowerBound) > 0)
+				ItemPointerCopy(&lb, &lowerBound);
+		}
+
+		else if (tidopexpr->exprtype == TIDEXPR_UPPER_BOUND)
+		{
+			ItemPointerData ub;
+
+			ItemPointerCopy(itemptr, &ub);
+
+			/*
+			 * Normalize non-inclusive ranges to become inclusive.  The
+			 * resulting ItemPointer here may not be a valid item pointer.
+			 */
+			if (!tidopexpr->inclusive)
+				ItemPointerDec(&ub);
+
+			/* Check if we can narrow the range using this qual */
+			if (ItemPointerCompare(&ub, &upperBound) < 0)
+				ItemPointerCopy(&ub, &upperBound);
+		}
+	}
+
+	ItemPointerCopy(&lowerBound, &node->trss_mintid);
+	ItemPointerCopy(&upperBound, &node->trss_maxtid);
+
+	return true;
+}
+
+/* ----------------------------------------------------------------
+ *		TidRangeNext
+ *
+ *		Retrieve a tuple from the TidRangeScan node's currentRelation
+ *		using the tids in the TidRangeScanState information.
+ *
+ * ----------------------------------------------------------------
+ */
+static TupleTableSlot *
+TidRangeNext(TidRangeScanState *node)
+{
+	TableScanDesc scandesc;
+	EState	   *estate;
+	ScanDirection direction;
+	TupleTableSlot *slot;
+
+	/*
+	 * extract necessary information from TID range scan node
+	 */
+	scandesc = node->ss.ss_currentScanDesc;
+	estate = node->ss.ps.state;
+	slot = node->ss.ss_ScanTupleSlot;
+	direction = estate->es_direction;
+
+	if (!node->trss_inScan)
+	{
+		/* First time through, compute TID range to scan */
+		if (!TidRangeEval(node))
+			return NULL;
+
+		if (scandesc == NULL)
+		{
+			scandesc = table_beginscan_tidrange(node->ss.ss_currentRelation,
+												estate->es_snapshot,
+												&node->trss_mintid,
+												&node->trss_maxtid);
+			node->ss.ss_currentScanDesc = scandesc;
+		}
+		else
+			table_set_tidrange(scandesc, &node->trss_mintid,
+							   &node->trss_maxtid);
+
+		node->trss_inScan = true;
+	}
+
+	/* Fetch the next tuple. */
+	if (!table_scan_getnextslot_tidrange(scandesc, direction, slot))
+	{
+		node->trss_inScan = false;
+		ExecClearTuple(slot);
+	}
+
+	return slot;
+}
+
+/*
+ * TidRangeRecheck -- access method routine to recheck a tuple in EvalPlanQual
+ */
+static bool
+TidRangeRecheck(TidRangeScanState *node, TupleTableSlot *slot)
+{
+	return true;
+}
+
+/* ----------------------------------------------------------------
+ *		ExecTidRangeScan(node)
+ *
+ *		Scans the relation using tids and returns the next qualifying tuple.
+ *		We call the ExecScan() routine and pass it the appropriate
+ *		access method functions.
+ *
+ *		Conditions:
+ *		  -- the "cursor" maintained by the AMI is positioned at the tuple
+ *			 returned previously.
+ *
+ *		Initial States:
+ *		  -- the relation indicated is opened for TID range scanning.
+ * ----------------------------------------------------------------
+ */
+static TupleTableSlot *
+ExecTidRangeScan(PlanState *pstate)
+{
+	TidRangeScanState *node = castNode(TidRangeScanState, pstate);
+
+	return ExecScan(&node->ss,
+					(ExecScanAccessMtd) TidRangeNext,
+					(ExecScanRecheckMtd) TidRangeRecheck);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecReScanTidRangeScan(node)
+ * ----------------------------------------------------------------
+ */
+void
+ExecReScanTidRangeScan(TidRangeScanState *node)
+{
+	TableScanDesc scan = node->ss.ss_currentScanDesc;
+
+	if (scan != NULL)
+		table_rescan(scan, NULL);
+
+	/* mark scan as not in progress, and TID range as not computed yet */
+	node->trss_inScan = false;
+
+	ExecScanReScan(&node->ss);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecEndTidRangeScan
+ *
+ *		Releases any storage allocated through C routines.
+ *		Returns nothing.
+ * ----------------------------------------------------------------
+ */
+void
+ExecEndTidRangeScan(TidRangeScanState *node)
+{
+	TableScanDesc scan = node->ss.ss_currentScanDesc;
+
+	if (scan != NULL)
+		table_endscan(scan);
+
+	/*
+	 * Free the exprcontext
+	 */
+	ExecFreeExprContext(&node->ss.ps);
+
+	/*
+	 * clear out tuple table slots
+	 */
+	if (node->ss.ps.ps_ResultTupleSlot)
+		ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
+	ExecClearTuple(node->ss.ss_ScanTupleSlot);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecInitTidRangeScan
+ *
+ *		Initializes the tid range scan's state information, creates
+ *		scan keys, and opens the scan relation.
+ *
+ *		Parameters:
+ *		  node: TidRangeScan node produced by the planner.
+ *		  estate: the execution state initialized in InitPlan.
+ * ----------------------------------------------------------------
+ */
+TidRangeScanState *
+ExecInitTidRangeScan(TidRangeScan *node, EState *estate, int eflags)
+{
+	TidRangeScanState *tidrangestate;
+	Relation	currentRelation;
+
+	/*
+	 * create state structure
+	 */
+	tidrangestate = makeNode(TidRangeScanState);
+	tidrangestate->ss.ps.plan = (Plan *) node;
+	tidrangestate->ss.ps.state = estate;
+	tidrangestate->ss.ps.ExecProcNode = ExecTidRangeScan;
+
+	/*
+	 * Miscellaneous initialization
+	 *
+	 * create expression context for node
+	 */
+	ExecAssignExprContext(estate, &tidrangestate->ss.ps);
+
+	/*
+	 * mark scan as not in progress, and TID range as not computed yet
+	 */
+	tidrangestate->trss_inScan = false;
+
+	/*
+	 * open the scan relation
+	 */
+	currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+
+	tidrangestate->ss.ss_currentRelation = currentRelation;
+	tidrangestate->ss.ss_currentScanDesc = NULL;	/* set up lazily in TidRangeNext */
+
+	/*
+	 * get the scan type from the relation descriptor.
+	 */
+	ExecInitScanTupleSlot(estate, &tidrangestate->ss,
+						  RelationGetDescr(currentRelation),
+						  table_slot_callbacks(currentRelation));
+
+	/*
+	 * Initialize result type and projection.
+	 */
+	ExecInitResultTypeTL(&tidrangestate->ss.ps);
+	ExecAssignScanProjectionInfo(&tidrangestate->ss);
+
+	/*
+	 * initialize child expressions
+	 */
+	tidrangestate->ss.ps.qual =
+		ExecInitQual(node->scan.plan.qual, (PlanState *) tidrangestate);
+
+	TidExprListCreate(tidrangestate);
+
+	/*
+	 * all done.
+	 */
+	return tidrangestate;
+}
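Reviewer note: the bound handling in TidRangeEval above can be tried outside the backend with the rough sketch below. It is only an approximation of the idea (start from the widest possible range, turn exclusive bounds into inclusive ones, then narrow). Tid, TidRange, tid_cmp, tid_inc, tid_dec, full_range, apply_lower and apply_upper are hypothetical names made up for this note; they are not PostgreSQL's ItemPointerData or the functions added by the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for ItemPointerData: (block number, offset). */
typedef struct { uint32_t blk; uint16_t off; } Tid;
typedef struct { Tid lo; Tid hi; } TidRange;

/* Order by block first, then by offset within the block. */
static int tid_cmp(Tid a, Tid b)
{
    if (a.blk != b.blk) return a.blk < b.blk ? -1 : 1;
    if (a.off != b.off) return a.off < b.off ? -1 : 1;
    return 0;
}

/* Step forward by one, doing nothing at the type's maximum. */
static Tid tid_inc(Tid t)
{
    if (t.off == UINT16_MAX) {
        if (t.blk != UINT32_MAX) { t.blk++; t.off = 0; }
    } else
        t.off++;
    return t;
}

/* Step backward by one, doing nothing at the type's minimum. */
static Tid tid_dec(Tid t)
{
    if (t.off == 0) {
        if (t.blk != 0) { t.blk--; t.off = UINT16_MAX; }
    } else
        t.off--;
    return t;
}

/* Widest range the type can express; quals can only narrow it. */
static TidRange full_range(void)
{
    TidRange r = { {0, 0}, {UINT32_MAX, UINT16_MAX} };
    return r;
}

/* "ctid > b" is treated as "ctid >= b + 1" before narrowing. */
static void apply_lower(TidRange *r, Tid bound, bool inclusive)
{
    if (!inclusive) bound = tid_inc(bound);
    if (tid_cmp(bound, r->lo) > 0) r->lo = bound;
}

/* "ctid < b" is treated as "ctid <= b - 1" before narrowing. */
static void apply_upper(TidRange *r, Tid bound, bool inclusive)
{
    if (!inclusive) bound = tid_dec(bound);
    if (tid_cmp(bound, r->hi) < 0) r->hi = bound;
}
```

Under these assumptions, the quals ctid >= '(1000,0)' AND ctid < '(2000,0)' narrow the range to (1000,0) through (1999,65535). The upper end is not a valid item pointer, which matches the caveat in the comments of TidRangeEval.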
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 65bbc18ecb..aaba1ec2c4 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -585,6 +585,27 @@ _copyTidScan(const TidScan *from)
 	return newnode;
 }
 
+/*
+ * _copyTidRangeScan
+ */
+static TidRangeScan *
+_copyTidRangeScan(const TidRangeScan *from)
+{
+	TidRangeScan *newnode = makeNode(TidRangeScan);
+
+	/*
+	 * copy node superclass fields
+	 */
+	CopyScanFields((const Scan *) from, (Scan *) newnode);
+
+	/*
+	 * copy remainder of node
+	 */
+	COPY_NODE_FIELD(tidrangequals);
+
+	return newnode;
+}
+
 /*
  * _copySubqueryScan
  */
@@ -4938,6 +4959,9 @@ copyObjectImpl(const void *from)
 		case T_TidScan:
 			retval = _copyTidScan(from);
 			break;
+		case T_TidRangeScan:
+			retval = _copyTidRangeScan(from);
+			break;
 		case T_SubqueryScan:
 			retval = _copySubqueryScan(from);
 			break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index f5dcedf6e8..8fc432bfe1 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -608,6 +608,16 @@ _outTidScan(StringInfo str, const TidScan *node)
 	WRITE_NODE_FIELD(tidquals);
 }
 
+static void
+_outTidRangeScan(StringInfo str, const TidRangeScan *node)
+{
+	WRITE_NODE_TYPE("TIDRANGESCAN");
+
+	_outScanInfo(str, (const Scan *) node);
+
+	WRITE_NODE_FIELD(tidrangequals);
+}
+
 static void
 _outSubqueryScan(StringInfo str, const SubqueryScan *node)
 {
@@ -2314,6 +2324,7 @@ _outRelOptInfo(StringInfo str, const RelOptInfo *node)
 	WRITE_NODE_FIELD(subroot);
 	WRITE_NODE_FIELD(subplan_params);
 	WRITE_INT_FIELD(rel_parallel_workers);
+	WRITE_UINT_FIELD(amflags);
 	WRITE_OID_FIELD(serverid);
 	WRITE_OID_FIELD(userid);
 	WRITE_BOOL_FIELD(useridiscurrent);
@@ -3810,6 +3821,9 @@ outNode(StringInfo str, const void *obj)
 			case T_TidScan:
 				_outTidScan(str, obj);
 				break;
+			case T_TidRangeScan:
+				_outTidRangeScan(str, obj);
+				break;
 			case T_SubqueryScan:
 				_outSubqueryScan(str, obj);
 				break;
diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README
index efb52858c8..4a6c348162 100644
--- a/src/backend/optimizer/README
+++ b/src/backend/optimizer/README
@@ -374,6 +374,7 @@ RelOptInfo      - a relation or joined relations
   IndexPath     - index scan
   BitmapHeapPath - top of a bitmapped index scan
   TidPath       - scan by CTID
+  TidRangePath  - scan a contiguous range of CTIDs
   SubqueryScanPath - scan a subquery-in-FROM
   ForeignPath   - scan a foreign table, foreign join or foreign upper-relation
   CustomPath    - for custom scan providers
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index aab06c7d21..a25b674a19 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -1283,6 +1283,101 @@ cost_tidscan(Path *path, PlannerInfo *root,
 	path->total_cost = startup_cost + run_cost;
 }
 
+/*
+ * cost_tidrangescan
+ *	  Determines and sets the costs of scanning a relation using a range of
+ *	  TIDs for 'path'
+ *
+ * 'baserel' is the relation to be scanned
+ * 'tidrangequals' is the list of TID-checkable range quals
+ * 'param_info' is the ParamPathInfo if this is a parameterized path, else NULL
+ */
+void
+cost_tidrangescan(Path *path, PlannerInfo *root,
+				  RelOptInfo *baserel, List *tidrangequals,
+				  ParamPathInfo *param_info)
+{
+	Selectivity selectivity;
+	double		pages;
+	Cost		startup_cost = 0;
+	Cost		run_cost = 0;
+	QualCost	qpqual_cost;
+	Cost		cpu_per_tuple;
+	QualCost	tid_qual_cost;
+	double		ntuples;
+	double		nseqpages;
+	double		spc_random_page_cost;
+	double		spc_seq_page_cost;
+
+	/* Should only be applied to base relations */
+	Assert(baserel->relid > 0);
+	Assert(baserel->rtekind == RTE_RELATION);
+
+	/* Mark the path with the correct row estimate */
+	if (param_info)
+		path->rows = param_info->ppi_rows;
+	else
+		path->rows = baserel->rows;
+
+	/* Count how many tuples and pages we expect to scan */
+	selectivity = clauselist_selectivity(root, tidrangequals, baserel->relid,
+										 JOIN_INNER, NULL);
+	pages = ceil(selectivity * baserel->pages);
+
+	if (pages <= 0.0)
+		pages = 1.0;
+
+	/*
+	 * The first page in a range requires a random seek, but each subsequent
+	 * page is just a normal sequential page read. NOTE: it's desirable for
+	 * TID Range Scans to cost more than the equivalent Sequential Scans,
+	 * because Seq Scans have some performance advantages such as scan
+	 * synchronization and parallelizability, and we'd prefer one of them to
+	 * be picked unless a TID Range Scan really is better.
+	 */
+	ntuples = selectivity * baserel->tuples;
+	nseqpages = pages - 1.0;
+
+	if (!enable_tidscan)
+		startup_cost += disable_cost;
+
+	/*
+	 * The TID qual expressions will be computed once, any other baserestrict
+	 * quals once per retrieved tuple.
+	 */
+	cost_qual_eval(&tid_qual_cost, tidrangequals, root);
+
+	/* fetch estimated page cost for tablespace containing table */
+	get_tablespace_page_costs(baserel->reltablespace,
+							  &spc_random_page_cost,
+							  &spc_seq_page_cost);
+
+	/* disk costs; 1 random page and the remainder as seq pages */
+	run_cost += spc_random_page_cost + spc_seq_page_cost * nseqpages;
+
+	/* Add scanning CPU costs */
+	get_restriction_qual_cost(root, baserel, param_info, &qpqual_cost);
+
+	/*
+	 * XXX currently we assume TID quals are a subset of qpquals at this
+	 * point; they will be removed (if possible) when we create the plan, so
+	 * we subtract their cost from the total qpqual cost.  (If the TID quals
+	 * can't be removed, this is a mistake and we're going to underestimate
+	 * the CPU cost a bit.)
+	 */
+	startup_cost += qpqual_cost.startup + tid_qual_cost.per_tuple;
+	cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple -
+		tid_qual_cost.per_tuple;
+	run_cost += cpu_per_tuple * ntuples;
+
+	/* tlist eval costs are paid per output row, not per tuple scanned */
+	startup_cost += path->pathtarget->cost.startup;
+	run_cost += path->pathtarget->cost.per_tuple * path->rows;
+
+	path->startup_cost = startup_cost;
+	path->total_cost = startup_cost + run_cost;
+}
+
 /*
  * cost_subqueryscan
  *	  Determines and returns the cost of scanning a subquery RTE.
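Reviewer note: to get a feel for the numbers cost_tidrangescan produces, here is a simplified sketch of its disk and per-tuple CPU arithmetic. It deliberately leaves out the qual evaluation costs, the target list costs and the enable_tidscan penalty, so it is not the function from the patch. RangeCostInput, round_up_pages and tid_range_run_cost are hypothetical names for this note, and round_up_pages stands in for ceil() only so that the sketch builds without libm.

```c
/* Hypothetical inputs; in the patch these come from planner structures. */
typedef struct {
    double rel_pages;        /* pages in the relation */
    double rel_tuples;       /* tuples in the relation */
    double selectivity;      /* fraction selected by the TID range quals */
    double random_page_cost; /* cost of the first, randomly fetched page */
    double seq_page_cost;    /* cost of each following page */
    double cpu_tuple_cost;   /* CPU cost per scanned tuple */
} RangeCostInput;

/* Round a non-negative page estimate up to a whole number of pages. */
static double round_up_pages(double x)
{
    double whole = (double) (long long) x;

    return (whole < x) ? whole + 1.0 : whole;
}

/* Disk cost: one random page plus sequential reads for the rest. */
static double tid_range_run_cost(const RangeCostInput *in)
{
    double pages = round_up_pages(in->selectivity * in->rel_pages);

    if (pages <= 0.0)
        pages = 1.0;            /* always charge for at least one page */

    double ntuples = in->selectivity * in->rel_tuples;
    double io_cost = in->random_page_cost +
                     in->seq_page_cost * (pages - 1.0);

    return io_cost + in->cpu_tuple_cost * ntuples;
}
```

With a 1000-page, 100000-tuple table, a selectivity of 0.25, and page and tuple costs of 4.0, 1.0 and 0.01, this comes to roughly 4 + 249 + 250 = 503. For the same range, a plain sequential-scan style estimate would charge 250 sequential pages instead of 4 + 249, so the range scan stays slightly more expensive, which is the preference the NOTE in the comment describes.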
diff --git a/src/backend/optimizer/path/tidpath.c b/src/backend/optimizer/path/tidpath.c
index 0845b460e2..41d86e42e0 100644
--- a/src/backend/optimizer/path/tidpath.c
+++ b/src/backend/optimizer/path/tidpath.c
@@ -2,9 +2,9 @@
  *
  * tidpath.c
  *	  Routines to determine which TID conditions are usable for scanning
- *	  a given relation, and create TidPaths accordingly.
+ *	  a given relation, and create TidPaths and TidRangePaths accordingly.
  *
- * What we are looking for here is WHERE conditions of the form
+ * For TidPaths, we look for WHERE conditions of the form
  * "CTID = pseudoconstant", which can be implemented by just fetching
  * the tuple directly via heap_fetch().  We can also handle OR'd conditions
  * such as (CTID = const1) OR (CTID = const2), as well as ScalarArrayOpExpr
@@ -23,6 +23,9 @@
  * a function, but in practice it works better to keep the special node
  * representation all the way through to execution.
  *
+ * Additionally, TidRangePaths may be created for conditions of the form
+ * "CTID relop pseudoconstant", where relop is one of >, >=, <, <=, and
+ * AND-clauses composed of such conditions.
  *
  * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
@@ -63,14 +66,14 @@ IsCTIDVar(Var *var, RelOptInfo *rel)
 
 /*
  * Check to see if a RestrictInfo is of the form
- *		CTID = pseudoconstant
+ *		CTID OP pseudoconstant
  * or
- *		pseudoconstant = CTID
- * where the CTID Var belongs to relation "rel", and nothing on the
- * other side of the clause does.
+ *		pseudoconstant OP CTID
+ * where OP is a binary operation, the CTID Var belongs to relation "rel",
+ * and nothing on the other side of the clause does.
  */
 static bool
-IsTidEqualClause(RestrictInfo *rinfo, RelOptInfo *rel)
+IsBinaryTidClause(RestrictInfo *rinfo, RelOptInfo *rel)
 {
 	OpExpr	   *node;
 	Node	   *arg1,
@@ -83,10 +86,9 @@ IsTidEqualClause(RestrictInfo *rinfo, RelOptInfo *rel)
 		return false;
 	node = (OpExpr *) rinfo->clause;
 
-	/* Operator must be tideq */
-	if (node->opno != TIDEqualOperator)
+	/* OpExpr must have two arguments */
+	if (list_length(node->args) != 2)
 		return false;
-	Assert(list_length(node->args) == 2);
 	arg1 = linitial(node->args);
 	arg2 = lsecond(node->args);
 
@@ -116,6 +118,50 @@ IsTidEqualClause(RestrictInfo *rinfo, RelOptInfo *rel)
 	return true;				/* success */
 }
 
+/*
+ * Check to see if a RestrictInfo is of the form
+ *		CTID = pseudoconstant
+ * or
+ *		pseudoconstant = CTID
+ * where the CTID Var belongs to relation "rel", and nothing on the
+ * other side of the clause does.
+ */
+static bool
+IsTidEqualClause(RestrictInfo *rinfo, RelOptInfo *rel)
+{
+	if (!IsBinaryTidClause(rinfo, rel))
+		return false;
+
+	if (((OpExpr *) rinfo->clause)->opno == TIDEqualOperator)
+		return true;
+
+	return false;
+}
+
+/*
+ * Check to see if a RestrictInfo is of the form
+ *		CTID OP pseudoconstant
+ * or
+ *		pseudoconstant OP CTID
+ * where OP is a range operator such as <, <=, >, or >=, the CTID Var belongs
+ * to relation "rel", and nothing on the other side of the clause does.
+ */
+static bool
+IsTidRangeClause(RestrictInfo *rinfo, RelOptInfo *rel)
+{
+	Oid			opno;
+
+	if (!IsBinaryTidClause(rinfo, rel))
+		return false;
+	opno = ((OpExpr *) rinfo->clause)->opno;
+
+	if (opno == TIDLessOperator || opno == TIDLessEqOperator ||
+		opno == TIDGreaterOperator || opno == TIDGreaterEqOperator)
+		return true;
+
+	return false;
+}
+
 /*
  * Check to see if a RestrictInfo is of the form
  *		CTID = ANY (pseudoconstant_array)
@@ -222,7 +268,7 @@ TidQualFromRestrictInfo(PlannerInfo *root, RestrictInfo *rinfo, RelOptInfo *rel)
  *
  * Returns a List of CTID qual RestrictInfos for the specified rel (with
  * implicit OR semantics across the list), or NIL if there are no usable
- * conditions.
+ * equality conditions.
  *
  * This function is just concerned with handling AND/OR recursion.
  */
@@ -301,6 +347,34 @@ TidQualFromRestrictInfoList(PlannerInfo *root, List *rlist, RelOptInfo *rel)
 	return rlst;
 }
 
+/*
+ * Extract a set of CTID range conditions from implicit-AND List of RestrictInfos
+ *
+ * Returns a List of CTID range qual RestrictInfos for the specified rel
+ * (with implicit AND semantics across the list), or NIL if there are no
+ * usable range conditions or if the rel's table AM does not support TID range
+ * scans.
+ */
+static List *
+TidRangeQualFromRestrictInfoList(List *rlist, RelOptInfo *rel)
+{
+	List	   *rlst = NIL;
+	ListCell   *l;
+
+	if ((rel->amflags & AMFLAG_HAS_TID_RANGE) == 0)
+		return NIL;
+
+	foreach(l, rlist)
+	{
+		RestrictInfo *rinfo = lfirst_node(RestrictInfo, l);
+
+		if (IsTidRangeClause(rinfo, rel))
+			rlst = lappend(rlst, rinfo);
+	}
+
+	return rlst;
+}
+
 /*
  * Given a list of join clauses involving our rel, create a parameterized
  * TidPath for each one that is a suitable TidEqual clause.
@@ -385,6 +459,7 @@ void
 create_tidscan_paths(PlannerInfo *root, RelOptInfo *rel)
 {
 	List	   *tidquals;
+	List	   *tidrangequals;
 
 	/*
 	 * If any suitable quals exist in the rel's baserestrict list, generate a
@@ -404,6 +479,26 @@ create_tidscan_paths(PlannerInfo *root, RelOptInfo *rel)
 												   required_outer));
 	}
 
+	/*
+	 * If there are range quals in the baserestrict list, generate a
+	 * TidRangePath.
+	 */
+	tidrangequals = TidRangeQualFromRestrictInfoList(rel->baserestrictinfo,
+													 rel);
+
+	if (tidrangequals)
+	{
+		/*
+		 * This path uses no join clauses, but it could still have required
+		 * parameterization due to LATERAL refs in its tlist.
+		 */
+		Relids		required_outer = rel->lateral_relids;
+
+		add_path(rel, (Path *) create_tidrangescan_path(root, rel,
+														tidrangequals,
+														required_outer));
+	}
+
 	/*
 	 * Try to generate parameterized TidPaths using equality clauses extracted
 	 * from EquivalenceClasses.  (This is important since simple "t1.ctid =
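Reviewer note: IsTidRangeClause accepts the CTID Var on either side of the operator, so the executor has to flip the sense of the bound when CTID is the right-hand operand (the "invert" flag in MakeTidOpExpr). A small standalone sketch of that mapping follows. RelOp, BoundKind, Bound and classify_bound are hypothetical names for this note and stand in for the TID*Operator OIDs and TidOpExpr used by the patch.

```c
#include <stdbool.h>

/* Hypothetical operator tags standing in for the TID comparison operators. */
typedef enum { OP_LT, OP_LE, OP_GT, OP_GE } RelOp;
typedef enum { BOUND_LOWER, BOUND_UPPER } BoundKind;

typedef struct {
    BoundKind kind;      /* which end of the range the clause constrains */
    bool      inclusive; /* true for <= and >= */
} Bound;

/*
 * ctid_on_left is true for "ctid OP value" and false for "value OP ctid".
 */
static Bound classify_bound(RelOp op, bool ctid_on_left)
{
    Bound b;
    bool  is_less = (op == OP_LT || op == OP_LE);

    b.inclusive = (op == OP_LE || op == OP_GE);

    /* "ctid < v" caps ctid from above; "v < ctid" caps it from below. */
    b.kind = (is_less == ctid_on_left) ? BOUND_UPPER : BOUND_LOWER;
    return b;
}
```

For example, ctid >= X and X <= ctid both classify as an inclusive lower bound, while X < ctid is an exclusive lower bound.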
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 6c8305c977..906cab7053 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -129,6 +129,10 @@ static Plan *create_bitmap_subplan(PlannerInfo *root, Path *bitmapqual,
 static void bitmap_subplan_mark_shared(Plan *plan);
 static TidScan *create_tidscan_plan(PlannerInfo *root, TidPath *best_path,
 									List *tlist, List *scan_clauses);
+static TidRangeScan *create_tidrangescan_plan(PlannerInfo *root,
+											  TidRangePath *best_path,
+											  List *tlist,
+											  List *scan_clauses);
 static SubqueryScan *create_subqueryscan_plan(PlannerInfo *root,
 											  SubqueryScanPath *best_path,
 											  List *tlist, List *scan_clauses);
@@ -193,6 +197,8 @@ static BitmapHeapScan *make_bitmap_heapscan(List *qptlist,
 											Index scanrelid);
 static TidScan *make_tidscan(List *qptlist, List *qpqual, Index scanrelid,
 							 List *tidquals);
+static TidRangeScan *make_tidrangescan(List *qptlist, List *qpqual,
+									   Index scanrelid, List *tidrangequals);
 static SubqueryScan *make_subqueryscan(List *qptlist,
 									   List *qpqual,
 									   Index scanrelid,
@@ -384,6 +390,7 @@ create_plan_recurse(PlannerInfo *root, Path *best_path, int flags)
 		case T_IndexOnlyScan:
 		case T_BitmapHeapScan:
 		case T_TidScan:
+		case T_TidRangeScan:
 		case T_SubqueryScan:
 		case T_FunctionScan:
 		case T_TableFuncScan:
@@ -679,6 +686,13 @@ create_scan_plan(PlannerInfo *root, Path *best_path, int flags)
 												scan_clauses);
 			break;
 
+		case T_TidRangeScan:
+			plan = (Plan *) create_tidrangescan_plan(root,
+													 (TidRangePath *) best_path,
+													 tlist,
+													 scan_clauses);
+			break;
+
 		case T_SubqueryScan:
 			plan = (Plan *) create_subqueryscan_plan(root,
 													 (SubqueryScanPath *) best_path,
@@ -3436,6 +3450,71 @@ create_tidscan_plan(PlannerInfo *root, TidPath *best_path,
 	return scan_plan;
 }
 
+/*
+ * create_tidrangescan_plan
+ *	 Returns a tidrangescan plan for the base relation scanned by 'best_path'
+ *	 with restriction clauses 'scan_clauses' and targetlist 'tlist'.
+ */
+static TidRangeScan *
+create_tidrangescan_plan(PlannerInfo *root, TidRangePath *best_path,
+						 List *tlist, List *scan_clauses)
+{
+	TidRangeScan *scan_plan;
+	Index		scan_relid = best_path->path.parent->relid;
+	List	   *tidrangequals = best_path->tidrangequals;
+
+	/* it should be a base rel... */
+	Assert(scan_relid > 0);
+	Assert(best_path->path.parent->rtekind == RTE_RELATION);
+
+	/*
+	 * The qpqual list must contain all restrictions not enforced by the
+	 * tidrangequals list.  tidrangequals has AND semantics, so we can simply
+	 * remove any qual that appears in it.
+	 */
+	{
+		List	   *qpqual = NIL;
+		ListCell   *l;
+
+		foreach(l, scan_clauses)
+		{
+			RestrictInfo *rinfo = lfirst_node(RestrictInfo, l);
+
+			if (rinfo->pseudoconstant)
+				continue;		/* we may drop pseudoconstants here */
+			if (list_member_ptr(tidrangequals, rinfo))
+				continue;		/* simple duplicate */
+			qpqual = lappend(qpqual, rinfo);
+		}
+		scan_clauses = qpqual;
+	}
+
+	/* Sort clauses into best execution order */
+	scan_clauses = order_qual_clauses(root, scan_clauses);
+
+	/* Reduce RestrictInfo lists to bare expressions; ignore pseudoconstants */
+	tidrangequals = extract_actual_clauses(tidrangequals, false);
+	scan_clauses = extract_actual_clauses(scan_clauses, false);
+
+	/* Replace any outer-relation variables with nestloop params */
+	if (best_path->path.param_info)
+	{
+		tidrangequals = (List *)
+			replace_nestloop_params(root, (Node *) tidrangequals);
+		scan_clauses = (List *)
+			replace_nestloop_params(root, (Node *) scan_clauses);
+	}
+
+	scan_plan = make_tidrangescan(tlist,
+								  scan_clauses,
+								  scan_relid,
+								  tidrangequals);
+
+	copy_generic_path_info(&scan_plan->scan.plan, &best_path->path);
+
+	return scan_plan;
+}
+
 /*
  * create_subqueryscan_plan
  *	 Returns a subqueryscan plan for the base relation scanned by 'best_path'
@@ -5369,6 +5448,25 @@ make_tidscan(List *qptlist,
 	return node;
 }
 
+static TidRangeScan *
+make_tidrangescan(List *qptlist,
+				  List *qpqual,
+				  Index scanrelid,
+				  List *tidrangequals)
+{
+	TidRangeScan *node = makeNode(TidRangeScan);
+	Plan	   *plan = &node->scan.plan;
+
+	plan->targetlist = qptlist;
+	plan->qual = qpqual;
+	plan->lefttree = NULL;
+	plan->righttree = NULL;
+	node->scan.scanrelid = scanrelid;
+	node->tidrangequals = tidrangequals;
+
+	return node;
+}
+
 static SubqueryScan *
 make_subqueryscan(List *qptlist,
 				  List *qpqual,
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index c3c36be13e..42f088ad71 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -619,6 +619,22 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 								  rtoffset, 1);
 			}
 			break;
+		case T_TidRangeScan:
+			{
+				TidRangeScan *splan = (TidRangeScan *) plan;
+
+				splan->scan.scanrelid += rtoffset;
+				splan->scan.plan.targetlist =
+					fix_scan_list(root, splan->scan.plan.targetlist,
+								  rtoffset, NUM_EXEC_TLIST(plan));
+				splan->scan.plan.qual =
+					fix_scan_list(root, splan->scan.plan.qual,
+								  rtoffset, NUM_EXEC_QUAL(plan));
+				splan->tidrangequals =
+					fix_scan_list(root, splan->tidrangequals,
+								  rtoffset, 1);
+			}
+			break;
 		case T_SubqueryScan:
 			/* Needs special treatment, see comments below */
 			return set_subqueryscan_references(root,
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 54ef61bfb3..f3e46e0959 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -2367,6 +2367,12 @@ finalize_plan(PlannerInfo *root, Plan *plan,
 			context.paramids = bms_add_members(context.paramids, scan_params);
 			break;
 
+		case T_TidRangeScan:
+			finalize_primnode((Node *) ((TidRangeScan *) plan)->tidrangequals,
+							  &context);
+			context.paramids = bms_add_members(context.paramids, scan_params);
+			break;
+
 		case T_SubqueryScan:
 			{
 				SubqueryScan *sscan = (SubqueryScan *) plan;
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 9be0c4a6af..6a66e23351 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1203,6 +1203,35 @@ create_tidscan_path(PlannerInfo *root, RelOptInfo *rel, List *tidquals,
 	return pathnode;
 }
 
+/*
+ * create_tidrangescan_path
+ *	  Creates a path corresponding to a scan by a range of TIDs, returning
+ *	  the pathnode.
+ */
+TidRangePath *
+create_tidrangescan_path(PlannerInfo *root, RelOptInfo *rel,
+						 List *tidrangequals, Relids required_outer)
+{
+	TidRangePath *pathnode = makeNode(TidRangePath);
+
+	pathnode->path.pathtype = T_TidRangeScan;
+	pathnode->path.parent = rel;
+	pathnode->path.pathtarget = rel->reltarget;
+	pathnode->path.param_info = get_baserel_parampathinfo(root, rel,
+														  required_outer);
+	pathnode->path.parallel_aware = false;
+	pathnode->path.parallel_safe = rel->consider_parallel;
+	pathnode->path.parallel_workers = 0;
+	pathnode->path.pathkeys = NIL;	/* always unordered */
+
+	pathnode->tidrangequals = tidrangequals;
+
+	cost_tidrangescan(&pathnode->path, root, rel, tidrangequals,
+					  pathnode->path.param_info);
+
+	return pathnode;
+}
+
 /*
  * create_append_path
  *	  Creates a path corresponding to an Append plan, returning the
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 177e6e336a..c5947fa418 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -467,6 +467,12 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 	/* Collect info about relation's foreign keys, if relevant */
 	get_relation_foreign_keys(root, rel, relation, inhparent);
 
+	/* Collect info about functions implemented by the rel's table AM. */
+	if (relation->rd_tableam &&
+		relation->rd_tableam->scan_set_tidrange != NULL &&
+		relation->rd_tableam->scan_getnextslot_tidrange != NULL)
+		rel->amflags |= AMFLAG_HAS_TID_RANGE;
+
 	/*
 	 * Collect info about relation's partitioning scheme, if any. Only
 	 * inheritance parents may be partitioned.
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 731ff708b9..345c877aeb 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -234,6 +234,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
 	rel->subroot = NULL;
 	rel->subplan_params = NIL;
 	rel->rel_parallel_workers = -1; /* set up in get_relation_info */
+	rel->amflags = 0;
 	rel->serverid = InvalidOid;
 	rel->userid = rte->checkAsUser;
 	rel->useridiscurrent = false;
@@ -646,6 +647,7 @@ build_join_rel(PlannerInfo *root,
 	joinrel->subroot = NULL;
 	joinrel->subplan_params = NIL;
 	joinrel->rel_parallel_workers = -1;
+	joinrel->amflags = 0;
 	joinrel->serverid = InvalidOid;
 	joinrel->userid = InvalidOid;
 	joinrel->useridiscurrent = false;
@@ -826,6 +828,7 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
 	joinrel->eclass_indexes = NULL;
 	joinrel->subroot = NULL;
 	joinrel->subplan_params = NIL;
+	joinrel->amflags = 0;
 	joinrel->serverid = InvalidOid;
 	joinrel->userid = InvalidOid;
 	joinrel->useridiscurrent = false;
diff --git a/src/backend/storage/page/itemptr.c b/src/backend/storage/page/itemptr.c
index 55759c383b..4e3644e3ab 100644
--- a/src/backend/storage/page/itemptr.c
+++ b/src/backend/storage/page/itemptr.c
@@ -71,3 +71,61 @@ ItemPointerCompare(ItemPointer arg1, ItemPointer arg2)
 	else
 		return 0;
 }
+
+/*
+ * ItemPointerInc
+ *		Increment 'pointer' by 1 only paying attention to the ItemPointer's
+ *		type's range limits and not MaxOffsetNumber and FirstOffsetNumber.
+ *		This may result in 'pointer' becoming !OffsetNumberIsValid.
+ *
+ * If the pointer is already at the maximum possible value permitted by the
+ * range of the ItemPointer's types, then do nothing.
+ */
+void
+ItemPointerInc(ItemPointer pointer)
+{
+	BlockNumber blk = ItemPointerGetBlockNumberNoCheck(pointer);
+	OffsetNumber off = ItemPointerGetOffsetNumberNoCheck(pointer);
+
+	if (off == PG_UINT16_MAX)
+	{
+		if (blk != InvalidBlockNumber)
+		{
+			off = 0;
+			blk++;
+		}
+	}
+	else
+		off++;
+
+	ItemPointerSet(pointer, blk, off);
+}
+
+/*
+ * ItemPointerDec
+ *		Decrement 'pointer' by 1 only paying attention to the ItemPointer's
+ *		type's range limits and not MaxOffsetNumber and FirstOffsetNumber.
+ *		This may result in 'pointer' becoming !OffsetNumberIsValid.
+ *
+ * If the pointer is already at the minimum possible value permitted by the
+ * range of the ItemPointer's types, then do nothing.
+ */
+void
+ItemPointerDec(ItemPointer pointer)
+{
+	BlockNumber blk = ItemPointerGetBlockNumberNoCheck(pointer);
+	OffsetNumber off = ItemPointerGetOffsetNumberNoCheck(pointer);
+
+	if (off == 0)
+	{
+		if (blk != 0)
+		{
+			off = PG_UINT16_MAX;
+			blk--;
+		}
+	}
+	else
+		off--;
+
+	ItemPointerSet(pointer, blk, off);
+}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 60e5cd3109..bc0936bc2d 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -121,7 +121,11 @@ extern void heap_endscan(TableScanDesc scan);
 extern HeapTuple heap_getnext(TableScanDesc scan, ScanDirection direction);
 extern bool heap_getnextslot(TableScanDesc sscan,
 							 ScanDirection direction, struct TupleTableSlot *slot);
-
+extern void heap_set_tidrange(TableScanDesc sscan, ItemPointer mintid,
+							  ItemPointer maxtid);
+extern bool heap_getnextslot_tidrange(TableScanDesc sscan,
+									  ScanDirection direction,
+									  TupleTableSlot *slot);
 extern bool heap_fetch(Relation relation, Snapshot snapshot,
 					   HeapTuple tuple, Buffer *userbuf);
 extern bool heap_hot_search_buffer(ItemPointer tid, Relation relation,
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index 005f3fdd2b..0ef6d8edf7 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -36,6 +36,10 @@ typedef struct TableScanDescData
 	int			rs_nkeys;		/* number of scan keys */
 	struct ScanKeyData *rs_key; /* array of scan key descriptors */
 
+	/* Range of ItemPointers for table_scan_getnextslot_tidrange() to scan. */
+	ItemPointerData rs_mintid;
+	ItemPointerData rs_maxtid;
+
 	/*
 	 * Information about type and behaviour of the scan, a bitmask of members
 	 * of the ScanOptions enum (see tableam.h).
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 33bffb6815..767d2f86de 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -49,18 +49,19 @@ typedef enum ScanOptions
 	SO_TYPE_BITMAPSCAN = 1 << 1,
 	SO_TYPE_SAMPLESCAN = 1 << 2,
 	SO_TYPE_TIDSCAN = 1 << 3,
-	SO_TYPE_ANALYZE = 1 << 4,
+	SO_TYPE_TIDRANGESCAN = 1 << 4,
+	SO_TYPE_ANALYZE = 1 << 5,
 
 	/* several of SO_ALLOW_* may be specified */
 	/* allow or disallow use of access strategy */
-	SO_ALLOW_STRAT = 1 << 5,
+	SO_ALLOW_STRAT = 1 << 6,
 	/* report location to syncscan logic? */
-	SO_ALLOW_SYNC = 1 << 6,
+	SO_ALLOW_SYNC = 1 << 7,
 	/* verify visibility page-at-a-time? */
-	SO_ALLOW_PAGEMODE = 1 << 7,
+	SO_ALLOW_PAGEMODE = 1 << 8,
 
 	/* unregister snapshot at scan end? */
-	SO_TEMP_SNAPSHOT = 1 << 8
+	SO_TEMP_SNAPSHOT = 1 << 9
 } ScanOptions;
 
 /*
@@ -325,6 +326,30 @@ typedef struct TableAmRoutine
 									 ScanDirection direction,
 									 TupleTableSlot *slot);
 
+	/*-----------
+	 * Optional functions to provide scanning for ranges of ItemPointers.
+	 * Implementations must either provide both of these functions, or neither
+	 * of them.
+	 *
+	 * Implementations of scan_set_tidrange must themselves handle
+	 * ItemPointers of any value, i.e., they must handle each of the following:
+	 *
+	 * 1) mintid or maxtid is beyond the end of the table; and
+	 * 2) mintid is above maxtid; and
+	 * 3) item offset for mintid or maxtid is beyond the maximum offset
+	 * allowed by the AM.
+	 */
+	void		(*scan_set_tidrange) (TableScanDesc scan,
+									  ItemPointer mintid,
+									  ItemPointer maxtid);
+
+	/*
+	 * Return next tuple from `scan` that's in the range of TIDs defined by
+	 * scan_set_tidrange.
+	 */
+	bool		(*scan_getnextslot_tidrange) (TableScanDesc scan,
+											  ScanDirection direction,
+											  TupleTableSlot *slot);
 
 	/* ------------------------------------------------------------------------
 	 * Parallel table scan related functions.
@@ -1015,6 +1040,62 @@ table_scan_getnextslot(TableScanDesc sscan, ScanDirection direction, TupleTableS
 	return sscan->rs_rd->rd_tableam->scan_getnextslot(sscan, direction, slot);
 }
 
+/* ----------------------------------------------------------------------------
+ * TID Range scanning related functions.
+ * ----------------------------------------------------------------------------
+ */
+
+/*
+ * table_beginscan_tidrange is the entry point for setting up a TableScanDesc
+ * for a TID range scan.
+ */
+static inline TableScanDesc
+table_beginscan_tidrange(Relation rel, Snapshot snapshot,
+						 ItemPointer mintid,
+						 ItemPointer maxtid)
+{
+	TableScanDesc sscan;
+	uint32		flags = SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
+
+	sscan = rel->rd_tableam->scan_begin(rel, snapshot, 0, NULL, NULL, flags);
+
+	/* Set the range of TIDs to scan */
+	sscan->rs_rd->rd_tableam->scan_set_tidrange(sscan, mintid, maxtid);
+
+	return sscan;
+}
+
+/*
+ * table_set_tidrange resets the minimum and maximum TID range to scan for a
+ * TableScanDesc created by table_beginscan_tidrange.
+ */
+static inline void
+table_set_tidrange(TableScanDesc sscan, ItemPointer mintid,
+				   ItemPointer maxtid)
+{
+	/* Ensure table_beginscan_tidrange() was used. */
+	Assert((sscan->rs_flags & SO_TYPE_TIDRANGESCAN) != 0);
+
+	sscan->rs_rd->rd_tableam->scan_set_tidrange(sscan, mintid, maxtid);
+}
+
+/*
+ * Fetch the next tuple from `sscan` for a TID range scan created by
+ * table_beginscan_tidrange().  Stores the tuple in `slot` and returns true,
+ * or returns false if no more tuples exist in the range.
+ */
+static inline bool
+table_scan_getnextslot_tidrange(TableScanDesc sscan, ScanDirection direction,
+								TupleTableSlot *slot)
+{
+	/* Ensure the TID range was properly set */
+	Assert((sscan->rs_flags & SO_TYPE_TIDRANGESCAN) != 0);
+
+	return sscan->rs_rd->rd_tableam->scan_getnextslot_tidrange(sscan,
+															   direction,
+															   slot);
+}
+
 
 /* ----------------------------------------------------------------------------
  * Parallel table scan related functions.
diff --git a/src/include/catalog/pg_operator.dat b/src/include/catalog/pg_operator.dat
index 0d4eac8f96..85395a81ee 100644
--- a/src/include/catalog/pg_operator.dat
+++ b/src/include/catalog/pg_operator.dat
@@ -237,15 +237,15 @@
   oprname => '<', oprleft => 'tid', oprright => 'tid', oprresult => 'bool',
   oprcom => '>(tid,tid)', oprnegate => '>=(tid,tid)', oprcode => 'tidlt',
   oprrest => 'scalarltsel', oprjoin => 'scalarltjoinsel' },
-{ oid => '2800', descr => 'greater than',
+{ oid => '2800', oid_symbol => 'TIDGreaterOperator', descr => 'greater than',
   oprname => '>', oprleft => 'tid', oprright => 'tid', oprresult => 'bool',
   oprcom => '<(tid,tid)', oprnegate => '<=(tid,tid)', oprcode => 'tidgt',
   oprrest => 'scalargtsel', oprjoin => 'scalargtjoinsel' },
-{ oid => '2801', descr => 'less than or equal',
+{ oid => '2801', oid_symbol => 'TIDLessEqOperator', descr => 'less than or equal',
   oprname => '<=', oprleft => 'tid', oprright => 'tid', oprresult => 'bool',
   oprcom => '>=(tid,tid)', oprnegate => '>(tid,tid)', oprcode => 'tidle',
   oprrest => 'scalarlesel', oprjoin => 'scalarlejoinsel' },
-{ oid => '2802', descr => 'greater than or equal',
+{ oid => '2802', oid_symbol => 'TIDGreaterEqOperator', descr => 'greater than or equal',
   oprname => '>=', oprleft => 'tid', oprright => 'tid', oprresult => 'bool',
   oprcom => '<=(tid,tid)', oprnegate => '<(tid,tid)', oprcode => 'tidge',
   oprrest => 'scalargesel', oprjoin => 'scalargejoinsel' },
diff --git a/src/include/executor/nodeTidrangescan.h b/src/include/executor/nodeTidrangescan.h
new file mode 100644
index 0000000000..e53783a3bf
--- /dev/null
+++ b/src/include/executor/nodeTidrangescan.h
@@ -0,0 +1,23 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeTidrangescan.h
+ *
+ *
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ *
+ * src/include/executor/nodeTidrangescan.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef NODETIDRANGESCAN_H
+#define NODETIDRANGESCAN_H
+
+#include "nodes/execnodes.h"
+
+extern TidRangeScanState *ExecInitTidRangeScan(TidRangeScan *node,
+											   EState *estate, int eflags);
+extern void ExecEndTidRangeScan(TidRangeScanState *node);
+extern void ExecReScanTidRangeScan(TidRangeScanState *node);
+
+#endif							/* NODETIDRANGESCAN_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 943931f65d..e31ad6204e 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1624,6 +1624,24 @@ typedef struct TidScanState
 	HeapTupleData tss_htup;
 } TidScanState;
 
+/* ----------------
+ *	 TidRangeScanState information
+ *
+ *		trss_tidexprs		list of TidOpExpr structs (see nodeTidrangescan.c)
+ *		trss_mintid			the lowest TID in the scan range
+ *		trss_maxtid			the highest TID in the scan range
+ *		trss_inScan			is a scan currently in progress?
+ * ----------------
+ */
+typedef struct TidRangeScanState
+{
+	ScanState	ss;				/* its first field is NodeTag */
+	List	   *trss_tidexprs;
+	ItemPointerData trss_mintid;
+	ItemPointerData trss_maxtid;
+	bool		trss_inScan;
+} TidRangeScanState;
+
 /* ----------------
  *	 SubqueryScanState information
  *
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 40ae489c23..e22df890ef 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -59,6 +59,7 @@ typedef enum NodeTag
 	T_BitmapIndexScan,
 	T_BitmapHeapScan,
 	T_TidScan,
+	T_TidRangeScan,
 	T_SubqueryScan,
 	T_FunctionScan,
 	T_ValuesScan,
@@ -116,6 +117,7 @@ typedef enum NodeTag
 	T_BitmapIndexScanState,
 	T_BitmapHeapScanState,
 	T_TidScanState,
+	T_TidRangeScanState,
 	T_SubqueryScanState,
 	T_FunctionScanState,
 	T_TableFuncScanState,
@@ -229,6 +231,7 @@ typedef enum NodeTag
 	T_BitmapAndPath,
 	T_BitmapOrPath,
 	T_TidPath,
+	T_TidRangePath,
 	T_SubqueryScanPath,
 	T_ForeignPath,
 	T_CustomPath,
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 0ec93e648c..b8a6e0fc9f 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -621,6 +621,10 @@ typedef struct PartitionSchemeData *PartitionScheme;
  * to simplify matching join clauses to those lists.
  *----------
  */
+
+/* Bitmask of flags supported by table AMs */
+#define AMFLAG_HAS_TID_RANGE (1 << 0)
+
 typedef enum RelOptKind
 {
 	RELOPT_BASEREL,
@@ -710,6 +714,8 @@ typedef struct RelOptInfo
 	PlannerInfo *subroot;		/* if subquery */
 	List	   *subplan_params; /* if subquery */
 	int			rel_parallel_workers;	/* wanted number of parallel workers */
+	uint32		amflags;		/* Bitmask of optional features supported by
+								 * the table AM */
 
 	/* Information about foreign tables and foreign joins */
 	Oid			serverid;		/* identifies server for the table or join */
@@ -1323,6 +1329,18 @@ typedef struct TidPath
 	List	   *tidquals;		/* qual(s) involving CTID = something */
 } TidPath;
 
+/*
+ * TidRangePath represents a scan by a contiguous range of TIDs
+ *
+ * tidrangequals is an implicitly AND'ed list of qual expressions of the form
+ * "CTID relop pseudoconstant", where relop is one of >,>=,<,<=.
+ */
+typedef struct TidRangePath
+{
+	Path		path;
+	List	   *tidrangequals;
+} TidRangePath;
+
 /*
  * SubqueryScanPath represents a scan of an unflattened subquery-in-FROM
  *
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 43160439f0..6e62104d0b 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -485,6 +485,19 @@ typedef struct TidScan
 	List	   *tidquals;		/* qual(s) involving CTID = something */
 } TidScan;
 
+/* ----------------
+ *		tid range scan node
+ *
+ * tidrangequals is an implicitly AND'ed list of qual expressions of the form
+ * "CTID relop pseudoconstant", where relop is one of >,>=,<,<=.
+ * ----------------
+ */
+typedef struct TidRangeScan
+{
+	Scan		scan;
+	List	   *tidrangequals;	/* qual(s) involving CTID op something */
+} TidRangeScan;
+
 /* ----------------
  *		subquery scan node
  *
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index ed2e4af4be..1be93be098 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -83,6 +83,9 @@ extern void cost_bitmap_or_node(BitmapOrPath *path, PlannerInfo *root);
 extern void cost_bitmap_tree_node(Path *path, Cost *cost, Selectivity *selec);
 extern void cost_tidscan(Path *path, PlannerInfo *root,
 						 RelOptInfo *baserel, List *tidquals, ParamPathInfo *param_info);
+extern void cost_tidrangescan(Path *path, PlannerInfo *root,
+							  RelOptInfo *baserel, List *tidrangequals,
+							  ParamPathInfo *param_info);
 extern void cost_subqueryscan(SubqueryScanPath *path, PlannerInfo *root,
 							  RelOptInfo *baserel, ParamPathInfo *param_info);
 extern void cost_functionscan(Path *path, PlannerInfo *root,
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 8dfc36a4e1..54f4b782fc 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -63,6 +63,10 @@ extern BitmapOrPath *create_bitmap_or_path(PlannerInfo *root,
 										   List *bitmapquals);
 extern TidPath *create_tidscan_path(PlannerInfo *root, RelOptInfo *rel,
 									List *tidquals, Relids required_outer);
+extern TidRangePath *create_tidrangescan_path(PlannerInfo *root,
+											  RelOptInfo *rel,
+											  List *tidrangequals,
+											  Relids required_outer);
 extern AppendPath *create_append_path(PlannerInfo *root, RelOptInfo *rel,
 									  List *subpaths, List *partial_subpaths,
 									  List *pathkeys, Relids required_outer,
diff --git a/src/include/storage/itemptr.h b/src/include/storage/itemptr.h
index 0e6990140b..cd4b8fbacb 100644
--- a/src/include/storage/itemptr.h
+++ b/src/include/storage/itemptr.h
@@ -202,5 +202,7 @@ typedef ItemPointerData *ItemPointer;
 
 extern bool ItemPointerEquals(ItemPointer pointer1, ItemPointer pointer2);
 extern int32 ItemPointerCompare(ItemPointer arg1, ItemPointer arg2);
+extern void ItemPointerInc(ItemPointer pointer);
+extern void ItemPointerDec(ItemPointer pointer);
 
 #endif							/* ITEMPTR_H */
diff --git a/src/test/regress/expected/tidrangescan.out b/src/test/regress/expected/tidrangescan.out
new file mode 100644
index 0000000000..0384304c7f
--- /dev/null
+++ b/src/test/regress/expected/tidrangescan.out
@@ -0,0 +1,302 @@
+-- tests for tidrangescans
+SET enable_seqscan TO off;
+CREATE TABLE tidrangescan(id integer, data text);
+-- insert enough tuples to fill at least two pages
+INSERT INTO tidrangescan SELECT i,repeat('x', 100) FROM generate_series(1,200) AS s(i);
+-- remove all tuples after the 10th tuple on each page.  Trying to ensure
+-- we get the same layout with all CPU architectures and smaller than standard
+-- page sizes.
+DELETE FROM tidrangescan
+WHERE substring(ctid::text FROM ',(\d+)\)')::integer > 10 OR substring(ctid::text FROM '\((\d+),')::integer > 2;
+VACUUM tidrangescan;
+-- range scans with upper bound
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid < '(1,0)';
+            QUERY PLAN             
+-----------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: (ctid < '(1,0)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid < '(1,0)';
+  ctid  
+--------
+ (0,1)
+ (0,2)
+ (0,3)
+ (0,4)
+ (0,5)
+ (0,6)
+ (0,7)
+ (0,8)
+ (0,9)
+ (0,10)
+(10 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid <= '(1,5)';
+             QUERY PLAN             
+------------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: (ctid <= '(1,5)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid <= '(1,5)';
+  ctid  
+--------
+ (0,1)
+ (0,2)
+ (0,3)
+ (0,4)
+ (0,5)
+ (0,6)
+ (0,7)
+ (0,8)
+ (0,9)
+ (0,10)
+ (1,1)
+ (1,2)
+ (1,3)
+ (1,4)
+ (1,5)
+(15 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid < '(0,0)';
+            QUERY PLAN             
+-----------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: (ctid < '(0,0)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid < '(0,0)';
+ ctid 
+------
+(0 rows)
+
+-- range scans with lower bound
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid > '(2,8)';
+            QUERY PLAN             
+-----------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: (ctid > '(2,8)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid > '(2,8)';
+  ctid  
+--------
+ (2,9)
+ (2,10)
+(2 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE '(2,8)' < ctid;
+            QUERY PLAN             
+-----------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: ('(2,8)'::tid < ctid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE '(2,8)' < ctid;
+  ctid  
+--------
+ (2,9)
+ (2,10)
+(2 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid >= '(2,8)';
+             QUERY PLAN             
+------------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: (ctid >= '(2,8)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid >= '(2,8)';
+  ctid  
+--------
+ (2,8)
+ (2,9)
+ (2,10)
+(3 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid >= '(100,0)';
+              QUERY PLAN              
+--------------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: (ctid >= '(100,0)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid >= '(100,0)';
+ ctid 
+------
+(0 rows)
+
+-- range scans with both bounds
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid > '(1,4)' AND '(1,7)' >= ctid;
+                           QUERY PLAN                           
+----------------------------------------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: ((ctid > '(1,4)'::tid) AND ('(1,7)'::tid >= ctid))
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid > '(1,4)' AND '(1,7)' >= ctid;
+ ctid  
+-------
+ (1,5)
+ (1,6)
+ (1,7)
+(3 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE '(1,7)' >= ctid AND ctid > '(1,4)';
+                           QUERY PLAN                           
+----------------------------------------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: (('(1,7)'::tid >= ctid) AND (ctid > '(1,4)'::tid))
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE '(1,7)' >= ctid AND ctid > '(1,4)';
+ ctid  
+-------
+ (1,5)
+ (1,6)
+ (1,7)
+(3 rows)
+
+-- extreme offsets
+SELECT ctid FROM tidrangescan WHERE ctid > '(0,65535)' AND ctid < '(1,0)' LIMIT 1;
+ ctid 
+------
+(0 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid < '(0,0)' LIMIT 1;
+ ctid 
+------
+(0 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid > '(4294967295,65535)';
+ ctid 
+------
+(0 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid < '(0,0)';
+ ctid 
+------
+(0 rows)
+
+-- NULLs in the range cannot return tuples
+SELECT ctid FROM tidrangescan WHERE ctid >= (SELECT NULL::tid);
+ ctid 
+------
+(0 rows)
+
+-- empty table
+CREATE TABLE tidrangescan_empty(id integer, data text);
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan_empty WHERE ctid < '(1, 0)';
+              QUERY PLAN              
+--------------------------------------
+ Tid Range Scan on tidrangescan_empty
+   TID Cond: (ctid < '(1,0)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan_empty WHERE ctid < '(1, 0)';
+ ctid 
+------
+(0 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan_empty WHERE ctid > '(9, 0)';
+              QUERY PLAN              
+--------------------------------------
+ Tid Range Scan on tidrangescan_empty
+   TID Cond: (ctid > '(9,0)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan_empty WHERE ctid > '(9, 0)';
+ ctid 
+------
+(0 rows)
+
+-- rescans
+EXPLAIN (COSTS OFF)
+SELECT t.ctid,t2.c FROM tidrangescan t,
+LATERAL (SELECT count(*) c FROM tidrangescan t2 WHERE t2.ctid <= t.ctid) t2
+WHERE t.ctid < '(1,0)';
+                  QUERY PLAN                   
+-----------------------------------------------
+ Nested Loop
+   ->  Tid Range Scan on tidrangescan t
+         TID Cond: (ctid < '(1,0)'::tid)
+   ->  Aggregate
+         ->  Tid Range Scan on tidrangescan t2
+               TID Cond: (ctid <= t.ctid)
+(6 rows)
+
+SELECT t.ctid,t2.c FROM tidrangescan t,
+LATERAL (SELECT count(*) c FROM tidrangescan t2 WHERE t2.ctid <= t.ctid) t2
+WHERE t.ctid < '(1,0)';
+  ctid  | c  
+--------+----
+ (0,1)  |  1
+ (0,2)  |  2
+ (0,3)  |  3
+ (0,4)  |  4
+ (0,5)  |  5
+ (0,6)  |  6
+ (0,7)  |  7
+ (0,8)  |  8
+ (0,9)  |  9
+ (0,10) | 10
+(10 rows)
+
+-- cursors
+-- Ensure we get a TID Range scan without a Materialize node.
+EXPLAIN (COSTS OFF)
+DECLARE c SCROLL CURSOR FOR SELECT ctid FROM tidrangescan WHERE ctid < '(1,0)';
+            QUERY PLAN             
+-----------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: (ctid < '(1,0)'::tid)
+(2 rows)
+
+BEGIN;
+DECLARE c SCROLL CURSOR FOR SELECT ctid FROM tidrangescan WHERE ctid < '(1,0)';
+FETCH NEXT c;
+ ctid  
+-------
+ (0,1)
+(1 row)
+
+FETCH NEXT c;
+ ctid  
+-------
+ (0,2)
+(1 row)
+
+FETCH PRIOR c;
+ ctid  
+-------
+ (0,1)
+(1 row)
+
+FETCH FIRST c;
+ ctid  
+-------
+ (0,1)
+(1 row)
+
+FETCH LAST c;
+  ctid  
+--------
+ (0,10)
+(1 row)
+
+COMMIT;
+DROP TABLE tidrangescan;
+DROP TABLE tidrangescan_empty;
+RESET enable_seqscan;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 12bb67e491..c77b0d7342 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -80,7 +80,7 @@ test: brin gin gist spgist privileges init_privs security_label collate matview
 # ----------
 # Another group of parallel tests
 # ----------
-test: create_table_like alter_generic alter_operator misc async dbsize misc_functions sysviews tsrf tid tidscan collate.icu.utf8 incremental_sort
+test: create_table_like alter_generic alter_operator misc async dbsize misc_functions sysviews tsrf tid tidscan tidrangescan collate.icu.utf8 incremental_sort
 
 # rules cannot run concurrently with any test that creates
 # a view or rule in the public schema
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 59b416fd80..0264a97324 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -138,6 +138,7 @@ test: sysviews
 test: tsrf
 test: tid
 test: tidscan
+test: tidrangescan
 test: collate.icu.utf8
 test: rules
 test: psql
diff --git a/src/test/regress/sql/tidrangescan.sql b/src/test/regress/sql/tidrangescan.sql
new file mode 100644
index 0000000000..2da35807ff
--- /dev/null
+++ b/src/test/regress/sql/tidrangescan.sql
@@ -0,0 +1,104 @@
+-- tests for tidrangescans
+
+SET enable_seqscan TO off;
+CREATE TABLE tidrangescan(id integer, data text);
+
+-- insert enough tuples to fill at least two pages
+INSERT INTO tidrangescan SELECT i,repeat('x', 100) FROM generate_series(1,200) AS s(i);
+
+-- remove all tuples after the 10th tuple on each page.  Trying to ensure
+-- we get the same layout with all CPU architectures and smaller than standard
+-- page sizes.
+DELETE FROM tidrangescan
+WHERE substring(ctid::text FROM ',(\d+)\)')::integer > 10 OR substring(ctid::text FROM '\((\d+),')::integer > 2;
+VACUUM tidrangescan;
+
+-- range scans with upper bound
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid < '(1,0)';
+SELECT ctid FROM tidrangescan WHERE ctid < '(1,0)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid <= '(1,5)';
+SELECT ctid FROM tidrangescan WHERE ctid <= '(1,5)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid < '(0,0)';
+SELECT ctid FROM tidrangescan WHERE ctid < '(0,0)';
+
+-- range scans with lower bound
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid > '(2,8)';
+SELECT ctid FROM tidrangescan WHERE ctid > '(2,8)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE '(2,8)' < ctid;
+SELECT ctid FROM tidrangescan WHERE '(2,8)' < ctid;
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid >= '(2,8)';
+SELECT ctid FROM tidrangescan WHERE ctid >= '(2,8)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid >= '(100,0)';
+SELECT ctid FROM tidrangescan WHERE ctid >= '(100,0)';
+
+-- range scans with both bounds
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid > '(1,4)' AND '(1,7)' >= ctid;
+SELECT ctid FROM tidrangescan WHERE ctid > '(1,4)' AND '(1,7)' >= ctid;
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE '(1,7)' >= ctid AND ctid > '(1,4)';
+SELECT ctid FROM tidrangescan WHERE '(1,7)' >= ctid AND ctid > '(1,4)';
+
+-- extreme offsets
+SELECT ctid FROM tidrangescan WHERE ctid > '(0,65535)' AND ctid < '(1,0)' LIMIT 1;
+SELECT ctid FROM tidrangescan WHERE ctid < '(0,0)' LIMIT 1;
+
+SELECT ctid FROM tidrangescan WHERE ctid > '(4294967295,65535)';
+SELECT ctid FROM tidrangescan WHERE ctid < '(0,0)';
+
+-- NULLs in the range cannot return tuples
+SELECT ctid FROM tidrangescan WHERE ctid >= (SELECT NULL::tid);
+
+-- empty table
+CREATE TABLE tidrangescan_empty(id integer, data text);
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan_empty WHERE ctid < '(1, 0)';
+SELECT ctid FROM tidrangescan_empty WHERE ctid < '(1, 0)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan_empty WHERE ctid > '(9, 0)';
+SELECT ctid FROM tidrangescan_empty WHERE ctid > '(9, 0)';
+
+-- rescans
+EXPLAIN (COSTS OFF)
+SELECT t.ctid,t2.c FROM tidrangescan t,
+LATERAL (SELECT count(*) c FROM tidrangescan t2 WHERE t2.ctid <= t.ctid) t2
+WHERE t.ctid < '(1,0)';
+
+SELECT t.ctid,t2.c FROM tidrangescan t,
+LATERAL (SELECT count(*) c FROM tidrangescan t2 WHERE t2.ctid <= t.ctid) t2
+WHERE t.ctid < '(1,0)';
+
+-- cursors
+
+-- Ensure we get a TID Range scan without a Materialize node.
+EXPLAIN (COSTS OFF)
+DECLARE c SCROLL CURSOR FOR SELECT ctid FROM tidrangescan WHERE ctid < '(1,0)';
+
+BEGIN;
+DECLARE c SCROLL CURSOR FOR SELECT ctid FROM tidrangescan WHERE ctid < '(1,0)';
+FETCH NEXT c;
+FETCH NEXT c;
+FETCH PRIOR c;
+FETCH FIRST c;
+FETCH LAST c;
+COMMIT;
+
+DROP TABLE tidrangescan;
+DROP TABLE tidrangescan_empty;
+
+RESET enable_seqscan;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index bab4f3adb3..c57682ba15 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2530,8 +2530,13 @@ TextPositionState
 TheLexeme
 TheSubstitute
 TidExpr
+TidExprType
 TidHashKey
+TidOpExpr
 TidPath
+TidRangePath
+TidRangeScan
+TidRangeScanState
 TidScan
 TidScanState
 TimeADT
-- 
2.27.0
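
For readers skimming the patch: the ItemPointerInc and ItemPointerDec helpers added to itemptr.c step a TID by one position within the raw uint32/uint16 ranges and do nothing once the extreme value is reached. The standalone sketch below models that behaviour. ToyTid, toy_tid_inc and toy_tid_dec are hypothetical names used only for illustration; they are not part of the patch or of PostgreSQL.

```c
#include <stdint.h>

/* Hypothetical stand-in for ItemPointerData: a block number plus an offset. */
typedef struct
{
	uint32_t	block;
	uint16_t	offset;
} ToyTid;

/* Models InvalidBlockNumber, the largest value a block number can hold. */
#define TOY_INVALID_BLOCK UINT32_MAX

/* Step to the next TID; leave (UINT32_MAX, UINT16_MAX) unchanged. */
void
toy_tid_inc(ToyTid *tid)
{
	if (tid->offset == UINT16_MAX)
	{
		if (tid->block != TOY_INVALID_BLOCK)
		{
			tid->offset = 0;
			tid->block++;
		}
	}
	else
		tid->offset++;
}

/* Step to the previous TID; leave (0, 0) unchanged. */
void
toy_tid_dec(ToyTid *tid)
{
	if (tid->offset == 0)
	{
		if (tid->block != 0)
		{
			tid->offset = UINT16_MAX;
			tid->block--;
		}
	}
	else
		tid->offset--;
}
```

A step like this is one way to turn an exclusive bound such as ctid > '(0,65535)' into an inclusive bound starting at (1,0). That reading is consistent with the "extreme offsets" regression queries in the patch returning no rows, though the sketch does not claim to reproduce the executor code.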

#89

David Fetter

david@fetter.org

almost 5 years ago

In reply to: David Rowley (#88)

Re: Tid scan improvements

On Tue, Feb 16, 2021 at 10:22:41PM +1300, David Rowley wrote:

On Thu, 4 Feb 2021 at 23:51, David Rowley <dgrowleyml@gmail.com> wrote:

Updated patch attached.

I made another pass over this patch and did a bit of renaming work
around the heap_* functions and the tableam functions. I think the new
names are a bit more aligned to the existing names.

Thanks! I'm looking forward to making use of this :)

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

#90

Andres Freund

andres@anarazel.de

almost 5 years ago

In reply to: David Rowley (#88)

Re: Tid scan improvements

Hi,

On 2021-02-04 23:51:39 +1300, David Rowley wrote:

I ended up adding just two new API functions to table AM.

void (*scan_set_tid_range) (TableScanDesc sscan,
ItemPointer mintid,
ItemPointer maxtid);

and
bool (*scan_tid_range_nextslot) (TableScanDesc sscan,
ScanDirection direction,
TupleTableSlot *slot);

I added an additional function in tableam.h that does not have a
corresponding API function:

static inline TableScanDesc
table_tid_range_start(Relation rel, Snapshot snapshot,
ItemPointer mintid,
ItemPointer maxtid)

This just calls the standard scan_begin then calls scan_set_tid_range
setting the specified mintid and maxtid.

Hm. But that means we can't rescan?

I also added 2 new fields to TableScanDesc:

ItemPointerData rs_mintid;
ItemPointerData rs_maxtid;

I didn't quite see a need to have a new start and end scan API function.

Yea. I guess they're not that large. Avoiding that was one of the two
reasons to have a separate scan state somewhere. The other was that it
seemed like it'd possibly be a bit cleaner API-wise to deal with rescan.
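
To make the rescan question concrete, here is a minimal standalone sketch of the "begin, then set range" shape, in which a rescan is just another call to the range setter on the same descriptor. It uses plain sorted integers in place of TIDs. ToyScan, toy_scan_begin, toy_scan_set_range and toy_scan_next are hypothetical names and are not the table AM API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical scan descriptor; sorted integer keys stand in for TIDs. */
typedef struct ToyScan
{
	const int  *keys;
	size_t		nkeys;
	size_t		pos;			/* next array index to examine */
	int			minkey;			/* inclusive lower bound */
	int			maxkey;			/* inclusive upper bound */
} ToyScan;

/* Set (or reset) the range and rewind, so the descriptor can be reused. */
void
toy_scan_set_range(ToyScan *scan, int minkey, int maxkey)
{
	scan->minkey = minkey;
	scan->maxkey = maxkey;
	scan->pos = 0;
}

/* "Begin" is the generic setup followed by the first range assignment. */
ToyScan
toy_scan_begin(const int *keys, size_t nkeys, int minkey, int maxkey)
{
	ToyScan		scan = {keys, nkeys, 0, 0, 0};

	toy_scan_set_range(&scan, minkey, maxkey);
	return scan;
}

/* Return the next key inside the range, or false when the range is done. */
bool
toy_scan_next(ToyScan *scan, int *out)
{
	while (scan->pos < scan->nkeys)
	{
		int			key = scan->keys[scan->pos++];

		if (key < scan->minkey)
			continue;			/* not yet in range */
		if (key > scan->maxkey)
			return false;		/* keys are sorted, nothing more to find */
		*out = key;
		return true;
	}
	return false;
}
```

In this model a rescan with new bounds needs no separate begin/end pair: the caller invokes toy_scan_set_range again on the existing descriptor. Whether that is the cleanest shape for the real API is exactly the point under discussion here.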

+bool
+heap_getnextslot_tidrange(TableScanDesc sscan, ScanDirection direction,
+						  TupleTableSlot *slot)
+{
+	HeapScanDesc scan = (HeapScanDesc) sscan;
+	ItemPointer mintid = &sscan->rs_mintid;
+	ItemPointer maxtid = &sscan->rs_maxtid;
+
+	/* Note: no locking manipulations needed */
+	for (;;)
+	{
+		if (sscan->rs_flags & SO_ALLOW_PAGEMODE)
+			heapgettup_pagemode(scan, direction, sscan->rs_nkeys, sscan->rs_key);
+		else
+			heapgettup(scan, direction, sscan->rs_nkeys, sscan->rs_key);
+
+		if (scan->rs_ctup.t_data == NULL)
+		{
+			ExecClearTuple(slot);
+			return false;
+		}
+
+		/*
+		 * heap_set_tidrange will have used heap_setscanlimits to limit the
+		 * range of pages we scan to only ones that can contain the TID range
+		 * we're scanning for.  Here we must filter out any tuples from these
+		 * pages that are outwith that range.
+		 */
+		if (ItemPointerCompare(&scan->rs_ctup.t_self, mintid) < 0)
+		{
+			ExecClearTuple(slot);
+
+			/*
+			 * When scanning backwards, the TIDs will be in descending order.
+			 * Future tuples in this direction will be lower still, so we can
+			 * just return false to indicate there will be no more tuples.
+			 */
+			if (ScanDirectionIsBackward(direction))
+				return false;
+
+			continue;
+		}
+
+		/*
+		 * Likewise for the final page, we must filter out TIDs greater than
+		 * maxtid.
+		 */
+		if (ItemPointerCompare(&scan->rs_ctup.t_self, maxtid) > 0)
+		{
+			ExecClearTuple(slot);
+
+			/*
+			 * When scanning forward, the TIDs will be in ascending order.
+			 * Future tuples in this direction will be higher still, so we can
+			 * just return false to indicate there will be no more tuples.
+			 */
+			if (ScanDirectionIsForward(direction))
+				return false;
+			continue;
+		}
+
+		break;
+	}
+
+	/*
+	 * if we get here it means we have a new current scan tuple, so point to
+	 * the proper return buffer and return the tuple.
+	 */
+	pgstat_count_heap_getnext(scan->rs_base.rs_rd);

I wonder if there's an argument for counting the misses above via
pgstat_count_heap_fetch()? Probably not, right?

+#define IsCTIDVar(node)  \
+	((node) != NULL && \
+	 IsA((node), Var) && \
+	 ((Var *) (node))->varattno == SelfItemPointerAttributeNumber && \
+	 ((Var *) (node))->varlevelsup == 0)
+
+typedef enum
+{
+	TIDEXPR_UPPER_BOUND,
+	TIDEXPR_LOWER_BOUND
+} TidExprType;
+
+/* Upper or lower range bound for scan */
+typedef struct TidOpExpr
+{
+	TidExprType exprtype;		/* type of op; lower or upper */
+	ExprState  *exprstate;		/* ExprState for a TID-yielding subexpr */
+	bool		inclusive;		/* whether op is inclusive */
+} TidOpExpr;
+
+/*
+ * For the given 'expr', build and return an appropriate TidOpExpr taking into
+ * account the expr's operator and operand order.
+ */
+static TidOpExpr *
+MakeTidOpExpr(OpExpr *expr, TidRangeScanState *tidstate)
+{
+	Node	   *arg1 = get_leftop((Expr *) expr);
+	Node	   *arg2 = get_rightop((Expr *) expr);
+	ExprState  *exprstate = NULL;
+	bool		invert = false;
+	TidOpExpr  *tidopexpr;
+
+	if (IsCTIDVar(arg1))
+		exprstate = ExecInitExpr((Expr *) arg2, &tidstate->ss.ps);
+	else if (IsCTIDVar(arg2))
+	{
+		exprstate = ExecInitExpr((Expr *) arg1, &tidstate->ss.ps);
+		invert = true;
+	}
+	else
+		elog(ERROR, "could not identify CTID variable");
+
+	tidopexpr = (TidOpExpr *) palloc(sizeof(TidOpExpr));
+	tidopexpr->inclusive = false;	/* for now */
+
+	switch (expr->opno)
+	{
+		case TIDLessEqOperator:
+			tidopexpr->inclusive = true;
+			/* fall through */
+		case TIDLessOperator:
+			tidopexpr->exprtype = invert ? TIDEXPR_LOWER_BOUND : TIDEXPR_UPPER_BOUND;
+			break;
+		case TIDGreaterEqOperator:
+			tidopexpr->inclusive = true;
+			/* fall through */
+		case TIDGreaterOperator:
+			tidopexpr->exprtype = invert ? TIDEXPR_UPPER_BOUND : TIDEXPR_LOWER_BOUND;
+			break;
+		default:
+			elog(ERROR, "could not identify CTID operator");
+	}
+
+	tidopexpr->exprstate = exprstate;
+
+	return tidopexpr;
+}
+
+/*
+ * Extract the qual subexpressions that yield TIDs to search for,
+ * and compile them into ExprStates if they're ordinary expressions.
+ */
+static void
+TidExprListCreate(TidRangeScanState *tidrangestate)
+{
+	TidRangeScan *node = (TidRangeScan *) tidrangestate->ss.ps.plan;
+	List	   *tidexprs = NIL;
+	ListCell   *l;
+
+	foreach(l, node->tidrangequals)
+	{
+		OpExpr	   *opexpr = lfirst(l);
+		TidOpExpr  *tidopexpr;
+
+		if (!IsA(opexpr, OpExpr))
+			elog(ERROR, "could not identify CTID expression");
+
+		tidopexpr = MakeTidOpExpr(opexpr, tidrangestate);
+		tidexprs = lappend(tidexprs, tidopexpr);
+	}
+
+	tidrangestate->trss_tidexprs = tidexprs;
+}

Architecturally it feels like this is something that really belongs more
into plan time?

+/*
+ * table_set_tidrange resets the minimum and maximum TID range to scan for a
+ * TableScanDesc created by table_beginscan_tidrange.
+ */
+static inline void
+table_set_tidrange(TableScanDesc sscan, ItemPointer mintid,
+				   ItemPointer maxtid)
+{
+	/* Ensure table_beginscan_tidrange() was used. */
+	Assert((sscan->rs_flags & SO_TYPE_TIDRANGESCAN) != 0);
+
+	sscan->rs_rd->rd_tableam->scan_set_tidrange(sscan, mintid, maxtid);
+}

How does this interact with rescans?

Greetings,

Andres Freund

#91

David Rowley

dgrowleyml@gmail.com

almost 5 years ago

In reply to: Andres Freund (#90)

Re: Tid scan improvements

Thanks for having a look at this.

On Wed, 17 Feb 2021 at 11:05, Andres Freund <andres@anarazel.de> wrote:

On 2021-02-04 23:51:39 +1300, David Rowley wrote:

and
bool (*scan_tid_range_nextslot) (TableScanDesc sscan,
ScanDirection direction,
TupleTableSlot *slot);

I added an additional function in tableam.h that does not have a
corresponding API function:

static inline TableScanDesc
table_tid_range_start(Relation rel, Snapshot snapshot,
ItemPointer mintid,
ItemPointer maxtid)

This just calls the standard scan_begin then calls scan_set_tid_range
setting the specified mintid and maxtid.

Hm. But that means we can't rescan?

It might not be perfect, but to rescan, we must call table_rescan()
then table_set_tidrange() before calling the
table_scan_getnextslot_tidrange() function.

+bool
+heap_getnextslot_tidrange(TableScanDesc sscan, ScanDirection direction,
+                                               TupleTableSlot *slot)
+{
+     HeapScanDesc scan = (HeapScanDesc) sscan;
+     ItemPointer mintid = &sscan->rs_mintid;
+     ItemPointer maxtid = &sscan->rs_maxtid;
+
+     /* Note: no locking manipulations needed */
+     for (;;)
+     {
+             if (sscan->rs_flags & SO_ALLOW_PAGEMODE)
+                     heapgettup_pagemode(scan, direction, sscan->rs_nkeys, sscan->rs_key);
+             else
+                     heapgettup(scan, direction, sscan->rs_nkeys, sscan->rs_key);
+
+             if (scan->rs_ctup.t_data == NULL)
+             {
+                     ExecClearTuple(slot);
+                     return false;
+             }
+
+             /*
+              * heap_set_tidrange will have used heap_setscanlimits to limit the
+              * range of pages we scan to only ones that can contain the TID range
+              * we're scanning for.  Here we must filter out any tuples from these
+              * pages that are outwith that range.
+              */
+             if (ItemPointerCompare(&scan->rs_ctup.t_self, mintid) < 0)
+             {
+                     ExecClearTuple(slot);
+
+                     /*
+                      * When scanning backwards, the TIDs will be in descending order.
+                      * Future tuples in this direction will be lower still, so we can
+                      * just return false to indicate there will be no more tuples.
+                      */
+                     if (ScanDirectionIsBackward(direction))
+                             return false;
+
+                     continue;
+             }
+
+             /*
+              * Likewise for the final page, we must filter out TIDs greater than
+              * maxtid.
+              */
+             if (ItemPointerCompare(&scan->rs_ctup.t_self, maxtid) > 0)
+             {
+                     ExecClearTuple(slot);
+
+                     /*
+                      * When scanning forward, the TIDs will be in ascending order.
+                      * Future tuples in this direction will be higher still, so we can
+                      * just return false to indicate there will be no more tuples.
+                      */
+                     if (ScanDirectionIsForward(direction))
+                             return false;
+                     continue;
+             }
+
+             break;
+     }
+
+     /*
+      * if we get here it means we have a new current scan tuple, so point to
+      * the proper return buffer and return the tuple.
+      */
+     pgstat_count_heap_getnext(scan->rs_base.rs_rd);

I wonder if there's an argument for counting the misses above via
pgstat_count_heap_fetch()? Probably not, right?

I'm a bit undecided about that. In theory, we're doing the heap
fetches of tuples on the target page which are outside of the range so
we should maybe count them. On the other hand, it might be a little
confusing for very observant users if they see the fetches going up
for the tuples we skip over in TID Range scans.
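
To put a number on what "misses" means here, below is a standalone toy
model of the forward-scan filter, not PostgreSQL code. ToyTid and the
toy_* functions are made-up names for illustration; the tuple array stands
in for the tuples on the pages that the scan limits selected. The skipped
counter is what would be added to the stats if we chose to count them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for ItemPointerData: (block, offset).  Hypothetical type. */
typedef struct ToyTid
{
	uint32_t	block;
	uint16_t	offset;
} ToyTid;

/* Three-way comparison, in the spirit of ItemPointerCompare(). */
static int
toy_tid_compare(const ToyTid *a, const ToyTid *b)
{
	if (a->block != b->block)
		return (a->block < b->block) ? -1 : 1;
	if (a->offset != b->offset)
		return (a->offset < b->offset) ? -1 : 1;
	return 0;
}

/*
 * Walk 'tuples' in ascending TID order, as a forward scan would visit them,
 * and return how many fall inside [mintid, maxtid].  *skipped receives the
 * number of tuples that were visited but filtered out.
 */
static size_t
toy_forward_range_scan(const ToyTid *tuples, size_t ntuples,
					   const ToyTid *mintid, const ToyTid *maxtid,
					   size_t *skipped)
{
	size_t		returned = 0;

	assert(skipped != NULL);
	*skipped = 0;

	for (size_t i = 0; i < ntuples; i++)
	{
		if (toy_tid_compare(&tuples[i], mintid) < 0)
		{
			(*skipped)++;		/* below the range: keep looking */
			continue;
		}
		if (toy_tid_compare(&tuples[i], maxtid) > 0)
		{
			(*skipped)++;		/* above the range: later TIDs are higher still */
			break;
		}
		returned++;
	}
	return returned;
}
```

In this model the misses are bounded by the tuples on the first page below
mintid plus a single tuple past maxtid, since the forward scan stops at the
first TID above the range.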

+#define IsCTIDVar(node)  \
+     ((node) != NULL && \
+      IsA((node), Var) && \
+      ((Var *) (node))->varattno == SelfItemPointerAttributeNumber && \
+      ((Var *) (node))->varlevelsup == 0)
+
+typedef enum
+{
+     TIDEXPR_UPPER_BOUND,
+     TIDEXPR_LOWER_BOUND
+} TidExprType;
+
+/* Upper or lower range bound for scan */
+typedef struct TidOpExpr
+{
+     TidExprType exprtype;           /* type of op; lower or upper */
+     ExprState  *exprstate;          /* ExprState for a TID-yielding subexpr */
+     bool            inclusive;              /* whether op is inclusive */
+} TidOpExpr;
+
+/*
+ * For the given 'expr', build and return an appropriate TidOpExpr taking into
+ * account the expr's operator and operand order.
+ */
+static TidOpExpr *
+MakeTidOpExpr(OpExpr *expr, TidRangeScanState *tidstate)
+{
+     Node       *arg1 = get_leftop((Expr *) expr);
+     Node       *arg2 = get_rightop((Expr *) expr);
+     ExprState  *exprstate = NULL;
+     bool            invert = false;
+     TidOpExpr  *tidopexpr;
+
+     if (IsCTIDVar(arg1))
+             exprstate = ExecInitExpr((Expr *) arg2, &tidstate->ss.ps);
+     else if (IsCTIDVar(arg2))
+     {
+             exprstate = ExecInitExpr((Expr *) arg1, &tidstate->ss.ps);
+             invert = true;
+     }
+     else
+             elog(ERROR, "could not identify CTID variable");
+
+     tidopexpr = (TidOpExpr *) palloc(sizeof(TidOpExpr));
+     tidopexpr->inclusive = false;   /* for now */
+
+     switch (expr->opno)
+     {
+             case TIDLessEqOperator:
+                     tidopexpr->inclusive = true;
+                     /* fall through */
+             case TIDLessOperator:
+                     tidopexpr->exprtype = invert ? TIDEXPR_LOWER_BOUND : TIDEXPR_UPPER_BOUND;
+                     break;
+             case TIDGreaterEqOperator:
+                     tidopexpr->inclusive = true;
+                     /* fall through */
+             case TIDGreaterOperator:
+                     tidopexpr->exprtype = invert ? TIDEXPR_UPPER_BOUND : TIDEXPR_LOWER_BOUND;
+                     break;
+             default:
+                     elog(ERROR, "could not identify CTID operator");
+     }
+
+     tidopexpr->exprstate = exprstate;
+
+     return tidopexpr;
+}
+
+/*
+ * Extract the qual subexpressions that yield TIDs to search for,
+ * and compile them into ExprStates if they're ordinary expressions.
+ */
+static void
+TidExprListCreate(TidRangeScanState *tidrangestate)
+{
+     TidRangeScan *node = (TidRangeScan *) tidrangestate->ss.ps.plan;
+     List       *tidexprs = NIL;
+     ListCell   *l;
+
+     foreach(l, node->tidrangequals)
+     {
+             OpExpr     *opexpr = lfirst(l);
+             TidOpExpr  *tidopexpr;
+
+             if (!IsA(opexpr, OpExpr))
+                     elog(ERROR, "could not identify CTID expression");
+
+             tidopexpr = MakeTidOpExpr(opexpr, tidrangestate);
+             tidexprs = lappend(tidexprs, tidopexpr);
+     }
+
+     tidrangestate->trss_tidexprs = tidexprs;
+}

Architecturally it feels like this is something that really belongs more
into plan time?

Possibly. It would mean TidOpExpr would have to become a Node type.
TID Range scan is really just following the lead set by TID Scan here.

I'm not sure if it's worth the trouble making these Node types for the
small gains there'd be in the performance of having the planner make
them once for prepared queries rather than them having to be built
during each execution.

Do you think it is?
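
For reference, leaving aside the ExecInitExpr() call, the work
MakeTidOpExpr does depends only on the operator and on which side ctid
appears, so that is the part that could move to plan time. A standalone
sketch of just that mapping follows. The Toy* names are hypothetical and
only mirror the shape of TidOpExpr; this is not the patch code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the TID comparison operators. */
typedef enum ToyOp
{
	TOY_LT,
	TOY_LE,
	TOY_GT,
	TOY_GE
} ToyOp;

typedef enum ToyBoundType
{
	TOY_UPPER_BOUND,
	TOY_LOWER_BOUND
} ToyBoundType;

typedef struct ToyBound
{
	ToyBoundType type;			/* lower or upper bound */
	bool		inclusive;		/* whether the bound itself is in range */
} ToyBound;

/*
 * Classify "ctid OP value" (ctid_on_left = true) or "value OP ctid"
 * (ctid_on_left = false) as an upper or lower bound on ctid.
 */
static ToyBound
toy_classify_bound(ToyOp op, bool ctid_on_left)
{
	ToyBound	b = {TOY_UPPER_BOUND, false};
	bool		invert = !ctid_on_left;

	switch (op)
	{
		case TOY_LE:
			b.inclusive = true;
			/* fall through */
		case TOY_LT:
			b.type = invert ? TOY_LOWER_BOUND : TOY_UPPER_BOUND;
			break;
		case TOY_GE:
			b.inclusive = true;
			/* fall through */
		case TOY_GT:
			b.type = invert ? TOY_UPPER_BOUND : TOY_LOWER_BOUND;
			break;
		default:
			assert(!"could not identify operator");
	}
	return b;
}
```

So, for example, '(5,0)' < ctid comes out as an exclusive lower bound, the
same as ctid > '(5,0)'.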

+/*
+ * table_set_tidrange resets the minimum and maximum TID range to scan for a
+ * TableScanDesc created by table_beginscan_tidrange.
+ */
+static inline void
+table_set_tidrange(TableScanDesc sscan, ItemPointer mintid,
+                                ItemPointer maxtid)
+{
+     /* Ensure table_beginscan_tidrange() was used. */
+     Assert((sscan->rs_flags & SO_TYPE_TIDRANGESCAN) != 0);
+
+     sscan->rs_rd->rd_tableam->scan_set_tidrange(sscan, mintid, maxtid);
+}

How does this interact with rescans?

We must call table_rescan() before calling table_set_tidrange() again.
That perhaps could be documented better. I'm just unsure if that
should be documented in tableam.h or if it's a restriction that only
needs to exist in heapam.c

David

#92

David Rowley

dgrowleyml@gmail.com

almost 5 years ago

In reply to: David Rowley (#91)

1 attachment(s)

Re: Tid scan improvements

On Thu, 18 Feb 2021 at 09:45, David Rowley <dgrowleyml@gmail.com> wrote:

On Wed, 17 Feb 2021 at 11:05, Andres Freund <andres@anarazel.de> wrote:

Architecturally it feels like this is something that really belongs more
into plan time?

Possibly. It would mean TidOpExpr would have to become a Node type.
TID Range scan is really just following the lead set by TID Scan here.

I'm not sure if it's worth the trouble making these Node types for the
small gains there'd be in the performance of having the planner make
them once for prepared queries rather than them having to be built
during each execution.

I changed the code around and added a new Node type to the planner and
made it create the TidRangeExpr during planning.

However, I'm pretty much set on this being pretty horrible and I ended
up ripping it back out again. The reason is that there's quite a bit
of extra boilerplate code that goes with the new node type, e.g. it
must be handled in setrefs.c. EXPLAIN also needs to know about the
new Node. That either means teaching the deparse code about
TidRangeExprs or having the Plan node carry along the OpExprs just so
we can make EXPLAIN work. Translating between the two might be
possible but it just seemed too much code and I started feeling pretty
bad about the whole idea.

+/*
+ * table_set_tidrange resets the minimum and maximum TID range to scan for a
+ * TableScanDesc created by table_beginscan_tidrange.
+ */
+static inline void
+table_set_tidrange(TableScanDesc sscan, ItemPointer mintid,
+                                ItemPointer maxtid)
+{
+     /* Ensure table_beginscan_tidrange() was used. */
+     Assert((sscan->rs_flags & SO_TYPE_TIDRANGESCAN) != 0);
+
+     sscan->rs_rd->rd_tableam->scan_set_tidrange(sscan, mintid, maxtid);
+}

How does this interact with rescans?

We must call table_rescan() before calling table_set_tidrange() again.
That perhaps could be documented better. I'm just unsure if that
should be documented in tableam.h or if it's a restriction that only
needs to exist in heapam.c

I've changed things around so that we no longer explicitly call
table_rescan() in nodeTidrangescan.c. Instead table_set_tidrange()
does a rescan call. I also adjusted the documentation to mention that
changing the tid range starts the scan again. This does mean we'll do
a ->scan_rescan() the first time we do table_set_tidrange(). I'm not
all that sure that matters.

v15 attached.

David

Attachments:

v15-0001-Add-TID-Range-Scans-to-support-efficient-scannin.patch (text/plain; charset=US-ASCII)

From a466597c7b71a004f905b2c4dc5de2735365c992 Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Thu, 21 Jan 2021 16:48:15 +1300
Subject: [PATCH v15] Add TID Range Scans to support efficient scanning ranges
 of TIDs

This adds a new node type named TID Range Scan.  The query planner will
generate paths for TID Range scans when quals are discovered on base
relations which search for ranges of ctid.  These ranges may be open at
either end.

To support this, two new optional callback functions have been added to
table AM: scan_set_tidrange, which accepts a minimum and maximum
ItemPointer and restricts the scan to that range, and
scan_getnextslot_tidrange, which fetches the next tuple within it.  Table
AMs where scanning ranges of TIDs does not make sense or is difficult to
implement efficiently may choose to not implement these functions.

Author: Edmund Horner and David Rowley
Discussion: https://postgr.es/m/CAMyN-kB-nFTkF=VA_JPwFNo08S0d-Yk0F741S2B7LDmYAi8eyA@mail.gmail.com
---
 src/backend/access/heap/heapam.c           | 147 ++++++++
 src/backend/access/heap/heapam_handler.c   |   3 +
 src/backend/commands/explain.c             |  23 ++
 src/backend/executor/Makefile              |   1 +
 src/backend/executor/execAmi.c             |   6 +
 src/backend/executor/execProcnode.c        |  10 +
 src/backend/executor/nodeTidrangescan.c    | 413 +++++++++++++++++++++
 src/backend/nodes/copyfuncs.c              |  24 ++
 src/backend/nodes/outfuncs.c               |  14 +
 src/backend/optimizer/README               |   1 +
 src/backend/optimizer/path/costsize.c      |  95 +++++
 src/backend/optimizer/path/tidpath.c       | 117 +++++-
 src/backend/optimizer/plan/createplan.c    |  98 +++++
 src/backend/optimizer/plan/setrefs.c       |  16 +
 src/backend/optimizer/plan/subselect.c     |   6 +
 src/backend/optimizer/util/pathnode.c      |  29 ++
 src/backend/optimizer/util/plancat.c       |   6 +
 src/backend/optimizer/util/relnode.c       |   3 +
 src/backend/storage/page/itemptr.c         |  58 +++
 src/include/access/heapam.h                |   6 +-
 src/include/access/relscan.h               |   4 +
 src/include/access/tableam.h               |  93 ++++-
 src/include/catalog/pg_operator.dat        |   6 +-
 src/include/executor/nodeTidrangescan.h    |  23 ++
 src/include/nodes/execnodes.h              |  18 +
 src/include/nodes/nodes.h                  |   3 +
 src/include/nodes/pathnodes.h              |  18 +
 src/include/nodes/plannodes.h              |  13 +
 src/include/optimizer/cost.h               |   3 +
 src/include/optimizer/pathnode.h           |   4 +
 src/include/storage/itemptr.h              |   2 +
 src/test/regress/expected/tidrangescan.out | 302 +++++++++++++++
 src/test/regress/parallel_schedule         |   2 +-
 src/test/regress/serial_schedule           |   1 +
 src/test/regress/sql/tidrangescan.sql      | 104 ++++++
 src/tools/pgindent/typedefs.list           |   5 +
 36 files changed, 1656 insertions(+), 21 deletions(-)
 create mode 100644 src/backend/executor/nodeTidrangescan.c
 create mode 100644 src/include/executor/nodeTidrangescan.h
 create mode 100644 src/test/regress/expected/tidrangescan.out
 create mode 100644 src/test/regress/sql/tidrangescan.sql

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 9926e2bd54..2171a12e0e 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -1391,6 +1391,153 @@ heap_getnextslot(TableScanDesc sscan, ScanDirection direction, TupleTableSlot *s
 	return true;
 }
 
+void
+heap_set_tidrange(TableScanDesc sscan, ItemPointer mintid,
+				  ItemPointer maxtid)
+{
+	HeapScanDesc scan = (HeapScanDesc) sscan;
+	BlockNumber startBlk;
+	BlockNumber numBlks;
+	ItemPointerData highestItem;
+	ItemPointerData lowestItem;
+
+	/*
+	 * For relations without any pages, we can simply leave the TID range
+	 * unset.  There will be no tuples to scan, therefore no tuples outside
+	 * the given TID range.
+	 */
+	if (scan->rs_nblocks == 0)
+		return;
+
+	/*
+	 * Set up some ItemPointers which point to the first and last possible
+	 * tuples in the heap.
+	 */
+	ItemPointerSet(&highestItem, scan->rs_nblocks - 1, MaxOffsetNumber);
+	ItemPointerSet(&lowestItem, 0, FirstOffsetNumber);
+
+	/*
+	 * If the given maximum TID is below the highest possible TID in the
+	 * relation, then restrict the range to that, otherwise we scan to the end
+	 * of the relation.
+	 */
+	if (ItemPointerCompare(maxtid, &highestItem) < 0)
+		ItemPointerCopy(maxtid, &highestItem);
+
+	/*
+	 * If the given minimum TID is above the lowest possible TID in the
+	 * relation, then restrict the range to only scan for TIDs above that.
+	 */
+	if (ItemPointerCompare(mintid, &lowestItem) > 0)
+		ItemPointerCopy(mintid, &lowestItem);
+
+	/*
+	 * Check for an empty range and protect from would be negative results
+	 * from the numBlks calculation below.
+	 */
+	if (ItemPointerCompare(&highestItem, &lowestItem) < 0)
+	{
+		/* Set an empty range of blocks to scan */
+		heap_setscanlimits(sscan, 0, 0);
+		return;
+	}
+
+	/*
+	 * Calculate the first block and the number of blocks we must scan. We
+	 * could be more aggressive here and perform some more validation to try
+	 * and further narrow the scope of blocks to scan by checking if the
+	 * lowestItem has an offset above MaxOffsetNumber.  In this case, we could
+	 * advance startBlk by one.  Likewise if highestItem has an offset of 0 we
+	 * could scan one fewer blocks.  However, such an optimization does not
+	 * seem worth troubling over, currently.
+	 */
+	startBlk = ItemPointerGetBlockNumberNoCheck(&lowestItem);
+
+	numBlks = ItemPointerGetBlockNumberNoCheck(&highestItem) -
+		ItemPointerGetBlockNumberNoCheck(&lowestItem) + 1;
+
+	/* Set the start block and number of blocks to scan */
+	heap_setscanlimits(sscan, startBlk, numBlks);
+
+	/* Finally, set the TID range in sscan */
+	ItemPointerCopy(&lowestItem, &sscan->rs_mintid);
+	ItemPointerCopy(&highestItem, &sscan->rs_maxtid);
+}
+
+bool
+heap_getnextslot_tidrange(TableScanDesc sscan, ScanDirection direction,
+						  TupleTableSlot *slot)
+{
+	HeapScanDesc scan = (HeapScanDesc) sscan;
+	ItemPointer mintid = &sscan->rs_mintid;
+	ItemPointer maxtid = &sscan->rs_maxtid;
+
+	/* Note: no locking manipulations needed */
+	for (;;)
+	{
+		if (sscan->rs_flags & SO_ALLOW_PAGEMODE)
+			heapgettup_pagemode(scan, direction, sscan->rs_nkeys, sscan->rs_key);
+		else
+			heapgettup(scan, direction, sscan->rs_nkeys, sscan->rs_key);
+
+		if (scan->rs_ctup.t_data == NULL)
+		{
+			ExecClearTuple(slot);
+			return false;
+		}
+
+		/*
+		 * heap_set_tidrange will have used heap_setscanlimits to limit the
+		 * range of pages we scan to only ones that can contain the TID range
+		 * we're scanning for.  Here we must filter out any tuples from these
+		 * pages that are outwith that range.
+		 */
+		if (ItemPointerCompare(&scan->rs_ctup.t_self, mintid) < 0)
+		{
+			ExecClearTuple(slot);
+
+			/*
+			 * When scanning backwards, the TIDs will be in descending order.
+			 * Future tuples in this direction will be lower still, so we can
+			 * just return false to indicate there will be no more tuples.
+			 */
+			if (ScanDirectionIsBackward(direction))
+				return false;
+
+			continue;
+		}
+
+		/*
+		 * Likewise for the final page, we must filter out TIDs greater than
+		 * maxtid.
+		 */
+		if (ItemPointerCompare(&scan->rs_ctup.t_self, maxtid) > 0)
+		{
+			ExecClearTuple(slot);
+
+			/*
+			 * When scanning forward, the TIDs will be in ascending order.
+			 * Future tuples in this direction will be higher still, so we can
+			 * just return false to indicate there will be no more tuples.
+			 */
+			if (ScanDirectionIsForward(direction))
+				return false;
+			continue;
+		}
+
+		break;
+	}
+
+	/*
+	 * if we get here it means we have a new current scan tuple, so point to
+	 * the proper return buffer and return the tuple.
+	 */
+	pgstat_count_heap_getnext(scan->rs_base.rs_rd);
+
+	ExecStoreBufferHeapTuple(&scan->rs_ctup, slot, scan->rs_cbuf);
+	return true;
+}
+
 /*
  *	heap_fetch		- retrieve tuple with given tid
  *
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 4a70e20a14..bd5faf0c1f 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -2542,6 +2542,9 @@ static const TableAmRoutine heapam_methods = {
 	.scan_rescan = heap_rescan,
 	.scan_getnextslot = heap_getnextslot,
 
+	.scan_set_tidrange = heap_set_tidrange,
+	.scan_getnextslot_tidrange = heap_getnextslot_tidrange,
+
 	.parallelscan_estimate = table_block_parallelscan_estimate,
 	.parallelscan_initialize = table_block_parallelscan_initialize,
 	.parallelscan_reinitialize = table_block_parallelscan_reinitialize,
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index f80e379973..afc45429ba 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -1057,6 +1057,7 @@ ExplainPreScanNode(PlanState *planstate, Bitmapset **rels_used)
 		case T_IndexOnlyScan:
 		case T_BitmapHeapScan:
 		case T_TidScan:
+		case T_TidRangeScan:
 		case T_SubqueryScan:
 		case T_FunctionScan:
 		case T_TableFuncScan:
@@ -1223,6 +1224,9 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_TidScan:
 			pname = sname = "Tid Scan";
 			break;
+		case T_TidRangeScan:
+			pname = sname = "Tid Range Scan";
+			break;
 		case T_SubqueryScan:
 			pname = sname = "Subquery Scan";
 			break;
@@ -1417,6 +1421,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_SampleScan:
 		case T_BitmapHeapScan:
 		case T_TidScan:
+		case T_TidRangeScan:
 		case T_SubqueryScan:
 		case T_FunctionScan:
 		case T_TableFuncScan:
@@ -1871,6 +1876,23 @@ ExplainNode(PlanState *planstate, List *ancestors,
 											   planstate, es);
 			}
 			break;
+		case T_TidRangeScan:
+			{
+				/*
+				 * The tidrangequals list has AND semantics, so be sure to
+				 * show it as an AND condition.
+				 */
+				List	   *tidquals = ((TidRangeScan *) plan)->tidrangequals;
+
+				if (list_length(tidquals) > 1)
+					tidquals = list_make1(make_andclause(tidquals));
+				show_scan_qual(tidquals, "TID Cond", planstate, ancestors, es);
+				show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
+				if (plan->qual)
+					show_instrumentation_count("Rows Removed by Filter", 1,
+											   planstate, es);
+			}
+			break;
 		case T_ForeignScan:
 			show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
 			if (plan->qual)
@@ -3558,6 +3580,7 @@ ExplainTargetRel(Plan *plan, Index rti, ExplainState *es)
 		case T_IndexOnlyScan:
 		case T_BitmapHeapScan:
 		case T_TidScan:
+		case T_TidRangeScan:
 		case T_ForeignScan:
 		case T_CustomScan:
 		case T_ModifyTable:
diff --git a/src/backend/executor/Makefile b/src/backend/executor/Makefile
index f990c6473a..74ac59faa1 100644
--- a/src/backend/executor/Makefile
+++ b/src/backend/executor/Makefile
@@ -67,6 +67,7 @@ OBJS = \
 	nodeSubplan.o \
 	nodeSubqueryscan.o \
 	nodeTableFuncscan.o \
+	nodeTidrangescan.o \
 	nodeTidscan.o \
 	nodeUnique.o \
 	nodeValuesscan.o \
diff --git a/src/backend/executor/execAmi.c b/src/backend/executor/execAmi.c
index 23bdb53cd1..4543ac79ed 100644
--- a/src/backend/executor/execAmi.c
+++ b/src/backend/executor/execAmi.c
@@ -51,6 +51,7 @@
 #include "executor/nodeSubplan.h"
 #include "executor/nodeSubqueryscan.h"
 #include "executor/nodeTableFuncscan.h"
+#include "executor/nodeTidrangescan.h"
 #include "executor/nodeTidscan.h"
 #include "executor/nodeUnique.h"
 #include "executor/nodeValuesscan.h"
@@ -197,6 +198,10 @@ ExecReScan(PlanState *node)
 			ExecReScanTidScan((TidScanState *) node);
 			break;
 
+		case T_TidRangeScanState:
+			ExecReScanTidRangeScan((TidRangeScanState *) node);
+			break;
+
 		case T_SubqueryScanState:
 			ExecReScanSubqueryScan((SubqueryScanState *) node);
 			break;
@@ -562,6 +567,7 @@ ExecSupportsBackwardScan(Plan *node)
 
 		case T_SeqScan:
 		case T_TidScan:
+		case T_TidRangeScan:
 		case T_FunctionScan:
 		case T_ValuesScan:
 		case T_CteScan:
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index 414df50a05..29766d8196 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -109,6 +109,7 @@
 #include "executor/nodeSubplan.h"
 #include "executor/nodeSubqueryscan.h"
 #include "executor/nodeTableFuncscan.h"
+#include "executor/nodeTidrangescan.h"
 #include "executor/nodeTidscan.h"
 #include "executor/nodeUnique.h"
 #include "executor/nodeValuesscan.h"
@@ -238,6 +239,11 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
 												   estate, eflags);
 			break;
 
+		case T_TidRangeScan:
+			result = (PlanState *) ExecInitTidRangeScan((TidRangeScan *) node,
+														estate, eflags);
+			break;
+
 		case T_SubqueryScan:
 			result = (PlanState *) ExecInitSubqueryScan((SubqueryScan *) node,
 														estate, eflags);
@@ -637,6 +643,10 @@ ExecEndNode(PlanState *node)
 			ExecEndTidScan((TidScanState *) node);
 			break;
 
+		case T_TidRangeScanState:
+			ExecEndTidRangeScan((TidRangeScanState *) node);
+			break;
+
 		case T_SubqueryScanState:
 			ExecEndSubqueryScan((SubqueryScanState *) node);
 			break;
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
new file mode 100644
index 0000000000..da9e4c998b
--- /dev/null
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -0,0 +1,413 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeTidrangescan.c
+ *	  Routines to support TID range scans of relations
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/executor/nodeTidrangescan.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/relscan.h"
+#include "access/sysattr.h"
+#include "access/tableam.h"
+#include "catalog/pg_operator.h"
+#include "executor/execdebug.h"
+#include "executor/nodeTidrangescan.h"
+#include "nodes/nodeFuncs.h"
+#include "storage/bufmgr.h"
+#include "utils/rel.h"
+
+
+#define IsCTIDVar(node)  \
+	((node) != NULL && \
+	 IsA((node), Var) && \
+	 ((Var *) (node))->varattno == SelfItemPointerAttributeNumber && \
+	 ((Var *) (node))->varlevelsup == 0)
+
+typedef enum
+{
+	TIDEXPR_UPPER_BOUND,
+	TIDEXPR_LOWER_BOUND
+} TidExprType;
+
+/* Upper or lower range bound for scan */
+typedef struct TidOpExpr
+{
+	TidExprType exprtype;		/* type of op; lower or upper */
+	ExprState  *exprstate;		/* ExprState for a TID-yielding subexpr */
+	bool		inclusive;		/* whether op is inclusive */
+} TidOpExpr;
+
+/*
+ * For the given 'expr', build and return an appropriate TidOpExpr taking into
+ * account the expr's operator and operand order.
+ */
+static TidOpExpr *
+MakeTidOpExpr(OpExpr *expr, TidRangeScanState *tidstate)
+{
+	Node	   *arg1 = get_leftop((Expr *) expr);
+	Node	   *arg2 = get_rightop((Expr *) expr);
+	ExprState  *exprstate = NULL;
+	bool		invert = false;
+	TidOpExpr  *tidopexpr;
+
+	if (IsCTIDVar(arg1))
+		exprstate = ExecInitExpr((Expr *) arg2, &tidstate->ss.ps);
+	else if (IsCTIDVar(arg2))
+	{
+		exprstate = ExecInitExpr((Expr *) arg1, &tidstate->ss.ps);
+		invert = true;
+	}
+	else
+		elog(ERROR, "could not identify CTID variable");
+
+	tidopexpr = (TidOpExpr *) palloc(sizeof(TidOpExpr));
+	tidopexpr->inclusive = false;	/* for now */
+
+	switch (expr->opno)
+	{
+		case TIDLessEqOperator:
+			tidopexpr->inclusive = true;
+			/* fall through */
+		case TIDLessOperator:
+			tidopexpr->exprtype = invert ? TIDEXPR_LOWER_BOUND : TIDEXPR_UPPER_BOUND;
+			break;
+		case TIDGreaterEqOperator:
+			tidopexpr->inclusive = true;
+			/* fall through */
+		case TIDGreaterOperator:
+			tidopexpr->exprtype = invert ? TIDEXPR_UPPER_BOUND : TIDEXPR_LOWER_BOUND;
+			break;
+		default:
+			elog(ERROR, "could not identify CTID operator");
+	}
+
+	tidopexpr->exprstate = exprstate;
+
+	return tidopexpr;
+}
+
+/*
+ * Extract the qual subexpressions that yield TIDs to search for,
+ * and compile them into ExprStates if they're ordinary expressions.
+ */
+static void
+TidExprListCreate(TidRangeScanState *tidrangestate)
+{
+	TidRangeScan *node = (TidRangeScan *) tidrangestate->ss.ps.plan;
+	List	   *tidexprs = NIL;
+	ListCell   *l;
+
+	foreach(l, node->tidrangequals)
+	{
+		OpExpr	   *opexpr = lfirst(l);
+		TidOpExpr  *tidopexpr;
+
+		if (!IsA(opexpr, OpExpr))
+			elog(ERROR, "could not identify CTID expression");
+
+		tidopexpr = MakeTidOpExpr(opexpr, tidrangestate);
+		tidexprs = lappend(tidexprs, tidopexpr);
+	}
+
+	tidrangestate->trss_tidexprs = tidexprs;
+}
+
+/* ----------------------------------------------------------------
+ *		TidRangeEval
+ *
+ *		Compute and set node's block and offset range to scan by evaluating
+ *		the trss_tidexprs.  Returns false if we detect the range cannot
+ *		contain any tuples.  Returns true if it's possible for the range to
+ *		contain tuples.
+ * ----------------------------------------------------------------
+ */
+static bool
+TidRangeEval(TidRangeScanState *node)
+{
+	ExprContext *econtext = node->ss.ps.ps_ExprContext;
+	ItemPointerData lowerBound;
+	ItemPointerData upperBound;
+	ListCell   *l;
+
+	/*
+	 * Set the upper and lower bounds to the absolute limits of the range of
+	 * the ItemPointer type.  Below we'll try to narrow this range on either
+	 * side by looking at the TidOpExprs.
+	 */
+	ItemPointerSet(&lowerBound, 0, 0);
+	ItemPointerSet(&upperBound, InvalidBlockNumber, PG_UINT16_MAX);
+
+	foreach(l, node->trss_tidexprs)
+	{
+		TidOpExpr  *tidopexpr = (TidOpExpr *) lfirst(l);
+		ItemPointer itemptr;
+		bool		isNull;
+
+		/* Evaluate this bound. */
+		itemptr = (ItemPointer)
+			DatumGetPointer(ExecEvalExprSwitchContext(tidopexpr->exprstate,
+													  econtext,
+													  &isNull));
+
+		/* If the bound is NULL, *nothing* matches the qual. */
+		if (isNull)
+			return false;
+
+		if (tidopexpr->exprtype == TIDEXPR_LOWER_BOUND)
+		{
+			ItemPointerData lb;
+
+			ItemPointerCopy(itemptr, &lb);
+
+			/*
+			 * Normalize non-inclusive ranges to become inclusive.  The
+			 * resulting ItemPointer here may not be a valid item pointer.
+			 */
+			if (!tidopexpr->inclusive)
+				ItemPointerInc(&lb);
+
+			/* Check if we can narrow the range using this qual */
+			if (ItemPointerCompare(&lb, &lowerBound) > 0)
+				ItemPointerCopy(&lb, &lowerBound);
+		}
+
+		else if (tidopexpr->exprtype == TIDEXPR_UPPER_BOUND)
+		{
+			ItemPointerData ub;
+
+			ItemPointerCopy(itemptr, &ub);
+
+			/*
+			 * Normalize non-inclusive ranges to become inclusive.  The
+			 * resulting ItemPointer here may not be a valid item pointer.
+			 */
+			if (!tidopexpr->inclusive)
+				ItemPointerDec(&ub);
+
+			/* Check if we can narrow the range using this qual */
+			if (ItemPointerCompare(&ub, &upperBound) < 0)
+				ItemPointerCopy(&ub, &upperBound);
+		}
+	}
+
+	ItemPointerCopy(&lowerBound, &node->trss_mintid);
+	ItemPointerCopy(&upperBound, &node->trss_maxtid);
+
+	return true;
+}
+
+/* ----------------------------------------------------------------
+ *		TidRangeNext
+ *
+ *		Retrieve a tuple from the TidRangeScan node's currentRelation
+ *		using the tids in the TidRangeScanState information.
+ *
+ * ----------------------------------------------------------------
+ */
+static TupleTableSlot *
+TidRangeNext(TidRangeScanState *node)
+{
+	TableScanDesc scandesc;
+	EState	   *estate;
+	ScanDirection direction;
+	TupleTableSlot *slot;
+
+	/*
+	 * extract necessary information from tid scan node
+	 */
+	scandesc = node->ss.ss_currentScanDesc;
+	estate = node->ss.ps.state;
+	slot = node->ss.ss_ScanTupleSlot;
+	direction = estate->es_direction;
+
+	if (!node->trss_inScan)
+	{
+		/* First time through, compute TID range to scan */
+		if (!TidRangeEval(node))
+			return NULL;
+
+		if (scandesc == NULL)
+		{
+			scandesc = table_beginscan_tidrange(node->ss.ss_currentRelation,
+												estate->es_snapshot,
+												&node->trss_mintid,
+												&node->trss_maxtid);
+			node->ss.ss_currentScanDesc = scandesc;
+		}
+		else
+		{
+			/* set the new TID range and rescan */
+			table_set_tidrange(scandesc, &node->trss_mintid,
+							   &node->trss_maxtid);
+		}
+
+		node->trss_inScan = true;
+	}
+
+	/* Fetch the next tuple. */
+	if (!table_scan_getnextslot_tidrange(scandesc, direction, slot))
+	{
+		node->trss_inScan = false;
+		ExecClearTuple(slot);
+	}
+
+	return slot;
+}
+
+/*
+ * TidRangeRecheck -- access method routine to recheck a tuple in EvalPlanQual
+ */
+static bool
+TidRangeRecheck(TidRangeScanState *node, TupleTableSlot *slot)
+{
+	return true;
+}
+
+/* ----------------------------------------------------------------
+ *		ExecTidRangeScan(node)
+ *
+ *		Scans the relation using tids and returns the next qualifying tuple.
+ *		We call the ExecScan() routine and pass it the appropriate
+ *		access method functions.
+ *
+ *		Conditions:
+ *		  -- the "cursor" maintained by the AMI is positioned at the tuple
+ *			 returned previously.
+ *
+ *		Initial States:
+ *		  -- the relation indicated is opened for TID range scanning.
+ * ----------------------------------------------------------------
+ */
+static TupleTableSlot *
+ExecTidRangeScan(PlanState *pstate)
+{
+	TidRangeScanState *node = castNode(TidRangeScanState, pstate);
+
+	return ExecScan(&node->ss,
+					(ExecScanAccessMtd) TidRangeNext,
+					(ExecScanRecheckMtd) TidRangeRecheck);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecReScanTidRangeScan(node)
+ * ----------------------------------------------------------------
+ */
+void
+ExecReScanTidRangeScan(TidRangeScanState *node)
+{
+	/* mark scan as not in progress, and TID range as not computed yet */
+	node->trss_inScan = false;
+
+	/*
+	 * The table_rescan() will be performed when we call table_set_tidrange()
+	 * again.
+	 */
+	ExecScanReScan(&node->ss);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecEndTidRangeScan
+ *
+ *		Releases any storage allocated through C routines.
+ *		Returns nothing.
+ * ----------------------------------------------------------------
+ */
+void
+ExecEndTidRangeScan(TidRangeScanState *node)
+{
+	TableScanDesc scan = node->ss.ss_currentScanDesc;
+
+	if (scan != NULL)
+		table_endscan(scan);
+
+	/*
+	 * Free the exprcontext
+	 */
+	ExecFreeExprContext(&node->ss.ps);
+
+	/*
+	 * clear out tuple table slots
+	 */
+	if (node->ss.ps.ps_ResultTupleSlot)
+		ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
+	ExecClearTuple(node->ss.ss_ScanTupleSlot);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecInitTidRangeScan
+ *
+ *		Initializes the tid range scan's state information, creates
+ *		scan keys, and opens the scan relation.
+ *
+ *		Parameters:
+ *		  node: TidRangeScan node produced by the planner.
+ *		  estate: the execution state initialized in InitPlan.
+ * ----------------------------------------------------------------
+ */
+TidRangeScanState *
+ExecInitTidRangeScan(TidRangeScan *node, EState *estate, int eflags)
+{
+	TidRangeScanState *tidrangestate;
+	Relation	currentRelation;
+
+	/*
+	 * create state structure
+	 */
+	tidrangestate = makeNode(TidRangeScanState);
+	tidrangestate->ss.ps.plan = (Plan *) node;
+	tidrangestate->ss.ps.state = estate;
+	tidrangestate->ss.ps.ExecProcNode = ExecTidRangeScan;
+
+	/*
+	 * Miscellaneous initialization
+	 *
+	 * create expression context for node
+	 */
+	ExecAssignExprContext(estate, &tidrangestate->ss.ps);
+
+	/*
+	 * mark scan as not in progress, and TID range as not computed yet
+	 */
+	tidrangestate->trss_inScan = false;
+
+	/*
+	 * open the scan relation
+	 */
+	currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+
+	tidrangestate->ss.ss_currentRelation = currentRelation;
+	tidrangestate->ss.ss_currentScanDesc = NULL;	/* no table scan here */
+
+	/*
+	 * get the scan type from the relation descriptor.
+	 */
+	ExecInitScanTupleSlot(estate, &tidrangestate->ss,
+						  RelationGetDescr(currentRelation),
+						  table_slot_callbacks(currentRelation));
+
+	/*
+	 * Initialize result type and projection.
+	 */
+	ExecInitResultTypeTL(&tidrangestate->ss.ps);
+	ExecAssignScanProjectionInfo(&tidrangestate->ss);
+
+	/*
+	 * initialize child expressions
+	 */
+	tidrangestate->ss.ps.qual =
+		ExecInitQual(node->scan.plan.qual, (PlanState *) tidrangestate);
+
+	TidExprListCreate(tidrangestate);
+
+	/*
+	 * all done.
+	 */
+	return tidrangestate;
+}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 65bbc18ecb..aaba1ec2c4 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -585,6 +585,27 @@ _copyTidScan(const TidScan *from)
 	return newnode;
 }
 
+/*
+ * _copyTidRangeScan
+ */
+static TidRangeScan *
+_copyTidRangeScan(const TidRangeScan *from)
+{
+	TidRangeScan *newnode = makeNode(TidRangeScan);
+
+	/*
+	 * copy node superclass fields
+	 */
+	CopyScanFields((const Scan *) from, (Scan *) newnode);
+
+	/*
+	 * copy remainder of node
+	 */
+	COPY_NODE_FIELD(tidrangequals);
+
+	return newnode;
+}
+
 /*
  * _copySubqueryScan
  */
@@ -4938,6 +4959,9 @@ copyObjectImpl(const void *from)
 		case T_TidScan:
 			retval = _copyTidScan(from);
 			break;
+		case T_TidRangeScan:
+			retval = _copyTidRangeScan(from);
+			break;
 		case T_SubqueryScan:
 			retval = _copySubqueryScan(from);
 			break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index f5dcedf6e8..8fc432bfe1 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -608,6 +608,16 @@ _outTidScan(StringInfo str, const TidScan *node)
 	WRITE_NODE_FIELD(tidquals);
 }
 
+static void
+_outTidRangeScan(StringInfo str, const TidRangeScan *node)
+{
+	WRITE_NODE_TYPE("TIDRANGESCAN");
+
+	_outScanInfo(str, (const Scan *) node);
+
+	WRITE_NODE_FIELD(tidrangequals);
+}
+
 static void
 _outSubqueryScan(StringInfo str, const SubqueryScan *node)
 {
@@ -2314,6 +2324,7 @@ _outRelOptInfo(StringInfo str, const RelOptInfo *node)
 	WRITE_NODE_FIELD(subroot);
 	WRITE_NODE_FIELD(subplan_params);
 	WRITE_INT_FIELD(rel_parallel_workers);
+	WRITE_UINT_FIELD(amflags);
 	WRITE_OID_FIELD(serverid);
 	WRITE_OID_FIELD(userid);
 	WRITE_BOOL_FIELD(useridiscurrent);
@@ -3810,6 +3821,9 @@ outNode(StringInfo str, const void *obj)
 			case T_TidScan:
 				_outTidScan(str, obj);
 				break;
+			case T_TidRangeScan:
+				_outTidRangeScan(str, obj);
+				break;
 			case T_SubqueryScan:
 				_outSubqueryScan(str, obj);
 				break;
diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README
index efb52858c8..4a6c348162 100644
--- a/src/backend/optimizer/README
+++ b/src/backend/optimizer/README
@@ -374,6 +374,7 @@ RelOptInfo      - a relation or joined relations
   IndexPath     - index scan
   BitmapHeapPath - top of a bitmapped index scan
   TidPath       - scan by CTID
+  TidRangePath  - scan a contiguous range of CTIDs
   SubqueryScanPath - scan a subquery-in-FROM
   ForeignPath   - scan a foreign table, foreign join or foreign upper-relation
   CustomPath    - for custom scan providers
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index aab06c7d21..a25b674a19 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -1283,6 +1283,101 @@ cost_tidscan(Path *path, PlannerInfo *root,
 	path->total_cost = startup_cost + run_cost;
 }
 
+/*
+ * cost_tidrangescan
+ *	  Determines and sets the costs of scanning a relation using a range of
+ *	  TIDs for 'path'
+ *
+ * 'baserel' is the relation to be scanned
+ * 'tidrangequals' is the list of TID-checkable range quals
+ * 'param_info' is the ParamPathInfo if this is a parameterized path, else NULL
+ */
+void
+cost_tidrangescan(Path *path, PlannerInfo *root,
+				  RelOptInfo *baserel, List *tidrangequals,
+				  ParamPathInfo *param_info)
+{
+	Selectivity selectivity;
+	double		pages;
+	Cost		startup_cost = 0;
+	Cost		run_cost = 0;
+	QualCost	qpqual_cost;
+	Cost		cpu_per_tuple;
+	QualCost	tid_qual_cost;
+	double		ntuples;
+	double		nseqpages;
+	double		spc_random_page_cost;
+	double		spc_seq_page_cost;
+
+	/* Should only be applied to base relations */
+	Assert(baserel->relid > 0);
+	Assert(baserel->rtekind == RTE_RELATION);
+
+	/* Mark the path with the correct row estimate */
+	if (param_info)
+		path->rows = param_info->ppi_rows;
+	else
+		path->rows = baserel->rows;
+
+	/* Count how many tuples and pages we expect to scan */
+	selectivity = clauselist_selectivity(root, tidrangequals, baserel->relid,
+										 JOIN_INNER, NULL);
+	pages = ceil(selectivity * baserel->pages);
+
+	if (pages <= 0.0)
+		pages = 1.0;
+
+	/*
+	 * The first page in a range requires a random seek, but each subsequent
+	 * page is just a normal sequential page read. NOTE: it's desirable for
+	 * TID Range Scans to cost more than the equivalent Sequential Scans,
+	 * because Seq Scans have some performance advantages such as scan
+	 * synchronization and parallelizability, and we'd prefer one of them to
+	 * be picked unless a TID Range Scan really is better.
+	 */
+	ntuples = selectivity * baserel->tuples;
+	nseqpages = pages - 1.0;
+
+	if (!enable_tidscan)
+		startup_cost += disable_cost;
+
+	/*
+	 * The TID qual expressions will be computed once, any other baserestrict
+	 * quals once per retrieved tuple.
+	 */
+	cost_qual_eval(&tid_qual_cost, tidrangequals, root);
+
+	/* fetch estimated page cost for tablespace containing table */
+	get_tablespace_page_costs(baserel->reltablespace,
+							  &spc_random_page_cost,
+							  &spc_seq_page_cost);
+
+	/* disk costs; 1 random page and the remainder as seq pages */
+	run_cost += spc_random_page_cost + spc_seq_page_cost * nseqpages;
+
+	/* Add scanning CPU costs */
+	get_restriction_qual_cost(root, baserel, param_info, &qpqual_cost);
+
+	/*
+	 * XXX currently we assume TID quals are a subset of qpquals at this
+	 * point; they will be removed (if possible) when we create the plan, so
+	 * we subtract their cost from the total qpqual cost.  (If the TID quals
+	 * can't be removed, this is a mistake and we're going to underestimate
+	 * the CPU cost a bit.)
+	 */
+	startup_cost += qpqual_cost.startup + tid_qual_cost.per_tuple;
+	cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple -
+		tid_qual_cost.per_tuple;
+	run_cost += cpu_per_tuple * ntuples;
+
+	/* tlist eval costs are paid per output row, not per tuple scanned */
+	startup_cost += path->pathtarget->cost.startup;
+	run_cost += path->pathtarget->cost.per_tuple * path->rows;
+
+	path->startup_cost = startup_cost;
+	path->total_cost = startup_cost + run_cost;
+}
+
 /*
  * cost_subqueryscan
  *	  Determines and returns the cost of scanning a subquery RTE.
diff --git a/src/backend/optimizer/path/tidpath.c b/src/backend/optimizer/path/tidpath.c
index 0845b460e2..41d86e42e0 100644
--- a/src/backend/optimizer/path/tidpath.c
+++ b/src/backend/optimizer/path/tidpath.c
@@ -2,9 +2,9 @@
  *
  * tidpath.c
  *	  Routines to determine which TID conditions are usable for scanning
- *	  a given relation, and create TidPaths accordingly.
+ *	  a given relation, and create TidPaths and TidRangePaths accordingly.
  *
- * What we are looking for here is WHERE conditions of the form
+ * For TidPaths, we look for WHERE conditions of the form
  * "CTID = pseudoconstant", which can be implemented by just fetching
  * the tuple directly via heap_fetch().  We can also handle OR'd conditions
  * such as (CTID = const1) OR (CTID = const2), as well as ScalarArrayOpExpr
@@ -23,6 +23,9 @@
  * a function, but in practice it works better to keep the special node
  * representation all the way through to execution.
  *
+ * Additionally, TidRangePaths may be created for conditions of the form
+ * "CTID relop pseudoconstant", where relop is one of >,>=,<,<=, and
+ * AND-clauses composed of such conditions.
  *
  * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
@@ -63,14 +66,14 @@ IsCTIDVar(Var *var, RelOptInfo *rel)
 
 /*
  * Check to see if a RestrictInfo is of the form
- *		CTID = pseudoconstant
+ *		CTID OP pseudoconstant
  * or
- *		pseudoconstant = CTID
- * where the CTID Var belongs to relation "rel", and nothing on the
- * other side of the clause does.
+ *		pseudoconstant OP CTID
+ * where OP is a binary operation, the CTID Var belongs to relation "rel",
+ * and nothing on the other side of the clause does.
  */
 static bool
-IsTidEqualClause(RestrictInfo *rinfo, RelOptInfo *rel)
+IsBinaryTidClause(RestrictInfo *rinfo, RelOptInfo *rel)
 {
 	OpExpr	   *node;
 	Node	   *arg1,
@@ -83,10 +86,9 @@ IsTidEqualClause(RestrictInfo *rinfo, RelOptInfo *rel)
 		return false;
 	node = (OpExpr *) rinfo->clause;
 
-	/* Operator must be tideq */
-	if (node->opno != TIDEqualOperator)
+	/* OpExpr must have two arguments */
+	if (list_length(node->args) != 2)
 		return false;
-	Assert(list_length(node->args) == 2);
 	arg1 = linitial(node->args);
 	arg2 = lsecond(node->args);
 
@@ -116,6 +118,50 @@ IsTidEqualClause(RestrictInfo *rinfo, RelOptInfo *rel)
 	return true;				/* success */
 }
 
+/*
+ * Check to see if a RestrictInfo is of the form
+ *		CTID = pseudoconstant
+ * or
+ *		pseudoconstant = CTID
+ * where the CTID Var belongs to relation "rel", and nothing on the
+ * other side of the clause does.
+ */
+static bool
+IsTidEqualClause(RestrictInfo *rinfo, RelOptInfo *rel)
+{
+	if (!IsBinaryTidClause(rinfo, rel))
+		return false;
+
+	if (((OpExpr *) rinfo->clause)->opno == TIDEqualOperator)
+		return true;
+
+	return false;
+}
+
+/*
+ * Check to see if a RestrictInfo is of the form
+ *		CTID OP pseudoconstant
+ * or
+ *		pseudoconstant OP CTID
+ * where OP is a range operator such as <, <=, >, or >=, the CTID Var belongs
+ * to relation "rel", and nothing on the other side of the clause does.
+ */
+static bool
+IsTidRangeClause(RestrictInfo *rinfo, RelOptInfo *rel)
+{
+	Oid			opno;
+
+	if (!IsBinaryTidClause(rinfo, rel))
+		return false;
+	opno = ((OpExpr *) rinfo->clause)->opno;
+
+	if (opno == TIDLessOperator || opno == TIDLessEqOperator ||
+		opno == TIDGreaterOperator || opno == TIDGreaterEqOperator)
+		return true;
+
+	return false;
+}
+
 /*
  * Check to see if a RestrictInfo is of the form
  *		CTID = ANY (pseudoconstant_array)
@@ -222,7 +268,7 @@ TidQualFromRestrictInfo(PlannerInfo *root, RestrictInfo *rinfo, RelOptInfo *rel)
  *
  * Returns a List of CTID qual RestrictInfos for the specified rel (with
  * implicit OR semantics across the list), or NIL if there are no usable
- * conditions.
+ * equality conditions.
  *
  * This function is just concerned with handling AND/OR recursion.
  */
@@ -301,6 +347,34 @@ TidQualFromRestrictInfoList(PlannerInfo *root, List *rlist, RelOptInfo *rel)
 	return rlst;
 }
 
+/*
+ * Extract a set of CTID range conditions from implicit-AND List of RestrictInfos
+ *
+ * Returns a List of CTID range qual RestrictInfos for the specified rel
+ * (with implicit AND semantics across the list), or NIL if there are no
+ * usable range conditions or if the rel's table AM does not support TID range
+ * scans.
+ */
+static List *
+TidRangeQualFromRestrictInfoList(List *rlist, RelOptInfo *rel)
+{
+	List	   *rlst = NIL;
+	ListCell   *l;
+
+	if ((rel->amflags & AMFLAG_HAS_TID_RANGE) == 0)
+		return NIL;
+
+	foreach(l, rlist)
+	{
+		RestrictInfo *rinfo = lfirst_node(RestrictInfo, l);
+
+		if (IsTidRangeClause(rinfo, rel))
+			rlst = lappend(rlst, rinfo);
+	}
+
+	return rlst;
+}
+
 /*
  * Given a list of join clauses involving our rel, create a parameterized
  * TidPath for each one that is a suitable TidEqual clause.
@@ -385,6 +459,7 @@ void
 create_tidscan_paths(PlannerInfo *root, RelOptInfo *rel)
 {
 	List	   *tidquals;
+	List	   *tidrangequals;
 
 	/*
 	 * If any suitable quals exist in the rel's baserestrict list, generate a
@@ -404,6 +479,26 @@ create_tidscan_paths(PlannerInfo *root, RelOptInfo *rel)
 												   required_outer));
 	}
 
+	/*
+	 * If there are range quals in the baserestrict list, generate a
+	 * TidRangePath.
+	 */
+	tidrangequals = TidRangeQualFromRestrictInfoList(rel->baserestrictinfo,
+													 rel);
+
+	if (tidrangequals)
+	{
+		/*
+		 * This path uses no join clauses, but it could still have required
+		 * parameterization due to LATERAL refs in its tlist.
+		 */
+		Relids		required_outer = rel->lateral_relids;
+
+		add_path(rel, (Path *) create_tidrangescan_path(root, rel,
+														tidrangequals,
+														required_outer));
+	}
+
 	/*
 	 * Try to generate parameterized TidPaths using equality clauses extracted
 	 * from EquivalenceClasses.  (This is important since simple "t1.ctid =
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 6c8305c977..906cab7053 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -129,6 +129,10 @@ static Plan *create_bitmap_subplan(PlannerInfo *root, Path *bitmapqual,
 static void bitmap_subplan_mark_shared(Plan *plan);
 static TidScan *create_tidscan_plan(PlannerInfo *root, TidPath *best_path,
 									List *tlist, List *scan_clauses);
+static TidRangeScan *create_tidrangescan_plan(PlannerInfo *root,
+											  TidRangePath *best_path,
+											  List *tlist,
+											  List *scan_clauses);
 static SubqueryScan *create_subqueryscan_plan(PlannerInfo *root,
 											  SubqueryScanPath *best_path,
 											  List *tlist, List *scan_clauses);
@@ -193,6 +197,8 @@ static BitmapHeapScan *make_bitmap_heapscan(List *qptlist,
 											Index scanrelid);
 static TidScan *make_tidscan(List *qptlist, List *qpqual, Index scanrelid,
 							 List *tidquals);
+static TidRangeScan *make_tidrangescan(List *qptlist, List *qpqual,
+									   Index scanrelid, List *tidrangequals);
 static SubqueryScan *make_subqueryscan(List *qptlist,
 									   List *qpqual,
 									   Index scanrelid,
@@ -384,6 +390,7 @@ create_plan_recurse(PlannerInfo *root, Path *best_path, int flags)
 		case T_IndexOnlyScan:
 		case T_BitmapHeapScan:
 		case T_TidScan:
+		case T_TidRangeScan:
 		case T_SubqueryScan:
 		case T_FunctionScan:
 		case T_TableFuncScan:
@@ -679,6 +686,13 @@ create_scan_plan(PlannerInfo *root, Path *best_path, int flags)
 												scan_clauses);
 			break;
 
+		case T_TidRangeScan:
+			plan = (Plan *) create_tidrangescan_plan(root,
+													 (TidRangePath *) best_path,
+													 tlist,
+													 scan_clauses);
+			break;
+
 		case T_SubqueryScan:
 			plan = (Plan *) create_subqueryscan_plan(root,
 													 (SubqueryScanPath *) best_path,
@@ -3436,6 +3450,71 @@ create_tidscan_plan(PlannerInfo *root, TidPath *best_path,
 	return scan_plan;
 }
 
+/*
+ * create_tidrangescan_plan
+ *	 Returns a tidrangescan plan for the base relation scanned by 'best_path'
+ *	 with restriction clauses 'scan_clauses' and targetlist 'tlist'.
+ */
+static TidRangeScan *
+create_tidrangescan_plan(PlannerInfo *root, TidRangePath *best_path,
+						 List *tlist, List *scan_clauses)
+{
+	TidRangeScan *scan_plan;
+	Index		scan_relid = best_path->path.parent->relid;
+	List	   *tidrangequals = best_path->tidrangequals;
+
+	/* it should be a base rel... */
+	Assert(scan_relid > 0);
+	Assert(best_path->path.parent->rtekind == RTE_RELATION);
+
+	/*
+	 * The qpqual list must contain all restrictions not enforced by the
+	 * tidrangequals list.  tidrangequals has AND semantics, so we can simply
+	 * remove any qual that appears in it.
+	 */
+	{
+		List	   *qpqual = NIL;
+		ListCell   *l;
+
+		foreach(l, scan_clauses)
+		{
+			RestrictInfo *rinfo = lfirst_node(RestrictInfo, l);
+
+			if (rinfo->pseudoconstant)
+				continue;		/* we may drop pseudoconstants here */
+			if (list_member_ptr(tidrangequals, rinfo))
+				continue;		/* simple duplicate */
+			qpqual = lappend(qpqual, rinfo);
+		}
+		scan_clauses = qpqual;
+	}
+
+	/* Sort clauses into best execution order */
+	scan_clauses = order_qual_clauses(root, scan_clauses);
+
+	/* Reduce RestrictInfo lists to bare expressions; ignore pseudoconstants */
+	tidrangequals = extract_actual_clauses(tidrangequals, false);
+	scan_clauses = extract_actual_clauses(scan_clauses, false);
+
+	/* Replace any outer-relation variables with nestloop params */
+	if (best_path->path.param_info)
+	{
+		tidrangequals = (List *)
+			replace_nestloop_params(root, (Node *) tidrangequals);
+		scan_clauses = (List *)
+			replace_nestloop_params(root, (Node *) scan_clauses);
+	}
+
+	scan_plan = make_tidrangescan(tlist,
+								  scan_clauses,
+								  scan_relid,
+								  tidrangequals);
+
+	copy_generic_path_info(&scan_plan->scan.plan, &best_path->path);
+
+	return scan_plan;
+}
+
 /*
  * create_subqueryscan_plan
  *	 Returns a subqueryscan plan for the base relation scanned by 'best_path'
@@ -5369,6 +5448,25 @@ make_tidscan(List *qptlist,
 	return node;
 }
 
+static TidRangeScan *
+make_tidrangescan(List *qptlist,
+				  List *qpqual,
+				  Index scanrelid,
+				  List *tidrangequals)
+{
+	TidRangeScan *node = makeNode(TidRangeScan);
+	Plan	   *plan = &node->scan.plan;
+
+	plan->targetlist = qptlist;
+	plan->qual = qpqual;
+	plan->lefttree = NULL;
+	plan->righttree = NULL;
+	node->scan.scanrelid = scanrelid;
+	node->tidrangequals = tidrangequals;
+
+	return node;
+}
+
 static SubqueryScan *
 make_subqueryscan(List *qptlist,
 				  List *qpqual,
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index c3c36be13e..42f088ad71 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -619,6 +619,22 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 								  rtoffset, 1);
 			}
 			break;
+		case T_TidRangeScan:
+			{
+				TidRangeScan *splan = (TidRangeScan *) plan;
+
+				splan->scan.scanrelid += rtoffset;
+				splan->scan.plan.targetlist =
+					fix_scan_list(root, splan->scan.plan.targetlist,
+								  rtoffset, NUM_EXEC_TLIST(plan));
+				splan->scan.plan.qual =
+					fix_scan_list(root, splan->scan.plan.qual,
+								  rtoffset, NUM_EXEC_QUAL(plan));
+				splan->tidrangequals =
+					fix_scan_list(root, splan->tidrangequals,
+								  rtoffset, 1);
+			}
+			break;
 		case T_SubqueryScan:
 			/* Needs special treatment, see comments below */
 			return set_subqueryscan_references(root,
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 54ef61bfb3..f3e46e0959 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -2367,6 +2367,12 @@ finalize_plan(PlannerInfo *root, Plan *plan,
 			context.paramids = bms_add_members(context.paramids, scan_params);
 			break;
 
+		case T_TidRangeScan:
+			finalize_primnode((Node *) ((TidRangeScan *) plan)->tidrangequals,
+							  &context);
+			context.paramids = bms_add_members(context.paramids, scan_params);
+			break;
+
 		case T_SubqueryScan:
 			{
 				SubqueryScan *sscan = (SubqueryScan *) plan;
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 9be0c4a6af..6a66e23351 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1203,6 +1203,35 @@ create_tidscan_path(PlannerInfo *root, RelOptInfo *rel, List *tidquals,
 	return pathnode;
 }
 
+/*
+ * create_tidrangescan_path
+ *	  Creates a path corresponding to a scan by a range of TIDs, returning
+ *	  the pathnode.
+ */
+TidRangePath *
+create_tidrangescan_path(PlannerInfo *root, RelOptInfo *rel,
+						 List *tidrangequals, Relids required_outer)
+{
+	TidRangePath *pathnode = makeNode(TidRangePath);
+
+	pathnode->path.pathtype = T_TidRangeScan;
+	pathnode->path.parent = rel;
+	pathnode->path.pathtarget = rel->reltarget;
+	pathnode->path.param_info = get_baserel_parampathinfo(root, rel,
+														  required_outer);
+	pathnode->path.parallel_aware = false;
+	pathnode->path.parallel_safe = rel->consider_parallel;
+	pathnode->path.parallel_workers = 0;
+	pathnode->path.pathkeys = NIL;	/* always unordered */
+
+	pathnode->tidrangequals = tidrangequals;
+
+	cost_tidrangescan(&pathnode->path, root, rel, tidrangequals,
+					  pathnode->path.param_info);
+
+	return pathnode;
+}
+
 /*
  * create_append_path
  *	  Creates a path corresponding to an Append plan, returning the
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 177e6e336a..c5947fa418 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -467,6 +467,12 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 	/* Collect info about relation's foreign keys, if relevant */
 	get_relation_foreign_keys(root, rel, relation, inhparent);
 
+	/* Collect info about functions implemented by the rel's table AM. */
+	if (relation->rd_tableam &&
+		relation->rd_tableam->scan_set_tidrange != NULL &&
+		relation->rd_tableam->scan_getnextslot_tidrange != NULL)
+		rel->amflags |= AMFLAG_HAS_TID_RANGE;
+
 	/*
 	 * Collect info about relation's partitioning scheme, if any. Only
 	 * inheritance parents may be partitioned.
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 731ff708b9..345c877aeb 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -234,6 +234,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
 	rel->subroot = NULL;
 	rel->subplan_params = NIL;
 	rel->rel_parallel_workers = -1; /* set up in get_relation_info */
+	rel->amflags = 0;
 	rel->serverid = InvalidOid;
 	rel->userid = rte->checkAsUser;
 	rel->useridiscurrent = false;
@@ -646,6 +647,7 @@ build_join_rel(PlannerInfo *root,
 	joinrel->subroot = NULL;
 	joinrel->subplan_params = NIL;
 	joinrel->rel_parallel_workers = -1;
+	joinrel->amflags = 0;
 	joinrel->serverid = InvalidOid;
 	joinrel->userid = InvalidOid;
 	joinrel->useridiscurrent = false;
@@ -826,6 +828,7 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
 	joinrel->eclass_indexes = NULL;
 	joinrel->subroot = NULL;
 	joinrel->subplan_params = NIL;
+	joinrel->amflags = 0;
 	joinrel->serverid = InvalidOid;
 	joinrel->userid = InvalidOid;
 	joinrel->useridiscurrent = false;
diff --git a/src/backend/storage/page/itemptr.c b/src/backend/storage/page/itemptr.c
index 55759c383b..4e3644e3ab 100644
--- a/src/backend/storage/page/itemptr.c
+++ b/src/backend/storage/page/itemptr.c
@@ -71,3 +71,61 @@ ItemPointerCompare(ItemPointer arg1, ItemPointer arg2)
 	else
 		return 0;
 }
+
+/*
+ * ItemPointerInc
+ *		Increment 'pointer' by 1 only paying attention to the ItemPointer's
+ *		type's range limits and not MaxOffsetNumber and FirstOffsetNumber.
+ *		This may result in 'pointer' becoming !OffsetNumberIsValid.
+ *
+ * If the pointer already holds the maximum value permitted by the range of
+ * the ItemPointer's types, then do nothing.
+ */
+void
+ItemPointerInc(ItemPointer pointer)
+{
+	BlockNumber blk = ItemPointerGetBlockNumberNoCheck(pointer);
+	OffsetNumber off = ItemPointerGetOffsetNumberNoCheck(pointer);
+
+	if (off == PG_UINT16_MAX)
+	{
+		if (blk != InvalidBlockNumber)
+		{
+			off = 0;
+			blk++;
+		}
+	}
+	else
+		off++;
+
+	ItemPointerSet(pointer, blk, off);
+}
+
+/*
+ * ItemPointerDec
+ *		Decrement 'pointer' by 1 only paying attention to the ItemPointer's
+ *		type's range limits and not MaxOffsetNumber and FirstOffsetNumber.
+ *		This may result in 'pointer' becoming !OffsetNumberIsValid.
+ *
+ * If the pointer already holds the minimum value permitted by the range of
+ * the ItemPointer's types, then do nothing.
+ */
+void
+ItemPointerDec(ItemPointer pointer)
+{
+	BlockNumber blk = ItemPointerGetBlockNumberNoCheck(pointer);
+	OffsetNumber off = ItemPointerGetOffsetNumberNoCheck(pointer);
+
+	if (off == 0)
+	{
+		if (blk != 0)
+		{
+			off = PG_UINT16_MAX;
+			blk--;
+		}
+	}
+	else
+		off--;
+
+	ItemPointerSet(pointer, blk, off);
+}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index d96a47b1ce..50e6158537 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -121,7 +121,11 @@ extern void heap_endscan(TableScanDesc scan);
 extern HeapTuple heap_getnext(TableScanDesc scan, ScanDirection direction);
 extern bool heap_getnextslot(TableScanDesc sscan,
 							 ScanDirection direction, struct TupleTableSlot *slot);
-
+extern void heap_set_tidrange(TableScanDesc sscan, ItemPointer mintid,
+							  ItemPointer maxtid);
+extern bool heap_getnextslot_tidrange(TableScanDesc sscan,
+									  ScanDirection direction,
+									  TupleTableSlot *slot);
 extern bool heap_fetch(Relation relation, Snapshot snapshot,
 					   HeapTuple tuple, Buffer *userbuf);
 extern bool heap_hot_search_buffer(ItemPointer tid, Relation relation,
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index 005f3fdd2b..0ef6d8edf7 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -36,6 +36,10 @@ typedef struct TableScanDescData
 	int			rs_nkeys;		/* number of scan keys */
 	struct ScanKeyData *rs_key; /* array of scan key descriptors */
 
+	/* Range of ItemPointers for table_scan_getnextslot_tidrange() to scan. */
+	ItemPointerData rs_mintid;
+	ItemPointerData rs_maxtid;
+
 	/*
 	 * Information about type and behaviour of the scan, a bitmask of members
 	 * of the ScanOptions enum (see tableam.h).
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 33bffb6815..ef2873311e 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -49,18 +49,19 @@ typedef enum ScanOptions
 	SO_TYPE_BITMAPSCAN = 1 << 1,
 	SO_TYPE_SAMPLESCAN = 1 << 2,
 	SO_TYPE_TIDSCAN = 1 << 3,
-	SO_TYPE_ANALYZE = 1 << 4,
+	SO_TYPE_TIDRANGESCAN = 1 << 4,
+	SO_TYPE_ANALYZE = 1 << 5,
 
 	/* several of SO_ALLOW_* may be specified */
 	/* allow or disallow use of access strategy */
-	SO_ALLOW_STRAT = 1 << 5,
+	SO_ALLOW_STRAT = 1 << 6,
 	/* report location to syncscan logic? */
-	SO_ALLOW_SYNC = 1 << 6,
+	SO_ALLOW_SYNC = 1 << 7,
 	/* verify visibility page-at-a-time? */
-	SO_ALLOW_PAGEMODE = 1 << 7,
+	SO_ALLOW_PAGEMODE = 1 << 8,
 
 	/* unregister snapshot at scan end? */
-	SO_TEMP_SNAPSHOT = 1 << 8
+	SO_TEMP_SNAPSHOT = 1 << 9
 } ScanOptions;
 
 /*
@@ -325,6 +326,30 @@ typedef struct TableAmRoutine
 									 ScanDirection direction,
 									 TupleTableSlot *slot);
 
+	/*-----------
+	 * Optional functions to provide scanning for ranges of ItemPointers.
+	 * Implementations must either provide both of these functions, or neither
+	 * of them.
+	 *
+	 * Implementations of scan_set_tidrange must themselves handle
+	 * ItemPointers of any value, i.e., they must handle each of the following:
+	 *
+	 * 1) mintid or maxtid is beyond the end of the table; and
+	 * 2) mintid is above maxtid; and
+	 * 3) item offset for mintid or maxtid is beyond the maximum offset
+	 * allowed by the AM.
+	 */
+	void		(*scan_set_tidrange) (TableScanDesc scan,
+									  ItemPointer mintid,
+									  ItemPointer maxtid);
+
+	/*
+	 * Return next tuple from `scan` that's in the range of TIDs defined by
+	 * scan_set_tidrange.
+	 */
+	bool		(*scan_getnextslot_tidrange) (TableScanDesc scan,
+											  ScanDirection direction,
+											  TupleTableSlot *slot);
 
 	/* ------------------------------------------------------------------------
 	 * Parallel table scan related functions.
@@ -1015,6 +1040,64 @@ table_scan_getnextslot(TableScanDesc sscan, ScanDirection direction, TupleTableS
 	return sscan->rs_rd->rd_tableam->scan_getnextslot(sscan, direction, slot);
 }
 
+/* ----------------------------------------------------------------------------
+ * TID Range scanning related functions.
+ * ----------------------------------------------------------------------------
+ */
+
+/*
+ * table_beginscan_tidrange is the entry point for setting up a TableScanDesc
+ * for a TID range scan.
+ */
+static inline TableScanDesc
+table_beginscan_tidrange(Relation rel, Snapshot snapshot,
+						 ItemPointer mintid,
+						 ItemPointer maxtid)
+{
+	TableScanDesc sscan;
+	uint32		flags = SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
+
+	sscan = rel->rd_tableam->scan_begin(rel, snapshot, 0, NULL, NULL, flags);
+
+	/* Set the range of TIDs to scan */
+	sscan->rs_rd->rd_tableam->scan_set_tidrange(sscan, mintid, maxtid);
+
+	return sscan;
+}
+
+/*
+ * table_set_tidrange resets the scan position and sets the minimum and
+ * maximum TID range to scan for a TableScanDesc created by
+ * table_beginscan_tidrange.
+ */
+static inline void
+table_set_tidrange(TableScanDesc sscan, ItemPointer mintid,
+				   ItemPointer maxtid)
+{
+	/* Ensure table_beginscan_tidrange() was used. */
+	Assert((sscan->rs_flags & SO_TYPE_TIDRANGESCAN) != 0);
+
+	sscan->rs_rd->rd_tableam->scan_rescan(sscan, NULL, false, false, false, false);
+	sscan->rs_rd->rd_tableam->scan_set_tidrange(sscan, mintid, maxtid);
+}
+
+/*
+ * Fetch the next tuple from `sscan` for a TID range scan created by
+ * table_beginscan_tidrange().  Stores the tuple in `slot` and returns true,
+ * or returns false if no more tuples exist in the range.
+ */
+static inline bool
+table_scan_getnextslot_tidrange(TableScanDesc sscan, ScanDirection direction,
+								TupleTableSlot *slot)
+{
+	/* Ensure table_beginscan_tidrange() was used. */
+	Assert((sscan->rs_flags & SO_TYPE_TIDRANGESCAN) != 0);
+
+	return sscan->rs_rd->rd_tableam->scan_getnextslot_tidrange(sscan,
+															   direction,
+															   slot);
+}
+
 
 /* ----------------------------------------------------------------------------
  * Parallel table scan related functions.
diff --git a/src/include/catalog/pg_operator.dat b/src/include/catalog/pg_operator.dat
index 0d4eac8f96..85395a81ee 100644
--- a/src/include/catalog/pg_operator.dat
+++ b/src/include/catalog/pg_operator.dat
@@ -237,15 +237,15 @@
   oprname => '<', oprleft => 'tid', oprright => 'tid', oprresult => 'bool',
   oprcom => '>(tid,tid)', oprnegate => '>=(tid,tid)', oprcode => 'tidlt',
   oprrest => 'scalarltsel', oprjoin => 'scalarltjoinsel' },
-{ oid => '2800', descr => 'greater than',
+{ oid => '2800', oid_symbol => 'TIDGreaterOperator', descr => 'greater than',
   oprname => '>', oprleft => 'tid', oprright => 'tid', oprresult => 'bool',
   oprcom => '<(tid,tid)', oprnegate => '<=(tid,tid)', oprcode => 'tidgt',
   oprrest => 'scalargtsel', oprjoin => 'scalargtjoinsel' },
-{ oid => '2801', descr => 'less than or equal',
+{ oid => '2801', oid_symbol => 'TIDLessEqOperator', descr => 'less than or equal',
   oprname => '<=', oprleft => 'tid', oprright => 'tid', oprresult => 'bool',
   oprcom => '>=(tid,tid)', oprnegate => '>(tid,tid)', oprcode => 'tidle',
   oprrest => 'scalarlesel', oprjoin => 'scalarlejoinsel' },
-{ oid => '2802', descr => 'greater than or equal',
+{ oid => '2802', oid_symbol => 'TIDGreaterEqOperator', descr => 'greater than or equal',
   oprname => '>=', oprleft => 'tid', oprright => 'tid', oprresult => 'bool',
   oprcom => '<=(tid,tid)', oprnegate => '<(tid,tid)', oprcode => 'tidge',
   oprrest => 'scalargesel', oprjoin => 'scalargejoinsel' },
diff --git a/src/include/executor/nodeTidrangescan.h b/src/include/executor/nodeTidrangescan.h
new file mode 100644
index 0000000000..e53783a3bf
--- /dev/null
+++ b/src/include/executor/nodeTidrangescan.h
@@ -0,0 +1,23 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeTidrangescan.h
+ *
+ *
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ *
+ * src/include/executor/nodeTidrangescan.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef NODETIDRANGESCAN_H
+#define NODETIDRANGESCAN_H
+
+#include "nodes/execnodes.h"
+
+extern TidRangeScanState *ExecInitTidRangeScan(TidRangeScan *node,
+											   EState *estate, int eflags);
+extern void ExecEndTidRangeScan(TidRangeScanState *node);
+extern void ExecReScanTidRangeScan(TidRangeScanState *node);
+
+#endif							/* NODETIDRANGESCAN_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index b6a88ff76b..3234b87ea2 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1621,6 +1621,24 @@ typedef struct TidScanState
 	HeapTupleData tss_htup;
 } TidScanState;
 
+/* ----------------
+ *	 TidRangeScanState information
+ *
+ *		trss_tidexprs		list of TidOpExpr structs (see nodeTidrangescan.c)
+ *		trss_mintid			the lowest TID in the scan range
+ *		trss_maxtid			the highest TID in the scan range
+ *		trss_inScan			is a scan currently in progress?
+ * ----------------
+ */
+typedef struct TidRangeScanState
+{
+	ScanState	ss;				/* its first field is NodeTag */
+	List	   *trss_tidexprs;
+	ItemPointerData trss_mintid;
+	ItemPointerData trss_maxtid;
+	bool		trss_inScan;
+} TidRangeScanState;
+
 /* ----------------
  *	 SubqueryScanState information
  *
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 40ae489c23..e22df890ef 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -59,6 +59,7 @@ typedef enum NodeTag
 	T_BitmapIndexScan,
 	T_BitmapHeapScan,
 	T_TidScan,
+	T_TidRangeScan,
 	T_SubqueryScan,
 	T_FunctionScan,
 	T_ValuesScan,
@@ -116,6 +117,7 @@ typedef enum NodeTag
 	T_BitmapIndexScanState,
 	T_BitmapHeapScanState,
 	T_TidScanState,
+	T_TidRangeScanState,
 	T_SubqueryScanState,
 	T_FunctionScanState,
 	T_TableFuncScanState,
@@ -229,6 +231,7 @@ typedef enum NodeTag
 	T_BitmapAndPath,
 	T_BitmapOrPath,
 	T_TidPath,
+	T_TidRangePath,
 	T_SubqueryScanPath,
 	T_ForeignPath,
 	T_CustomPath,
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 0ec93e648c..b8a6e0fc9f 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -621,6 +621,10 @@ typedef struct PartitionSchemeData *PartitionScheme;
  * to simplify matching join clauses to those lists.
  *----------
  */
+
+/* Bitmask of flags supported by table AMs */
+#define AMFLAG_HAS_TID_RANGE (1 << 0)
+
 typedef enum RelOptKind
 {
 	RELOPT_BASEREL,
@@ -710,6 +714,8 @@ typedef struct RelOptInfo
 	PlannerInfo *subroot;		/* if subquery */
 	List	   *subplan_params; /* if subquery */
 	int			rel_parallel_workers;	/* wanted number of parallel workers */
+	uint32		amflags;		/* Bitmask of optional features supported by
+								 * the table AM */
 
 	/* Information about foreign tables and foreign joins */
 	Oid			serverid;		/* identifies server for the table or join */
@@ -1323,6 +1329,18 @@ typedef struct TidPath
 	List	   *tidquals;		/* qual(s) involving CTID = something */
 } TidPath;
 
+/*
+ * TidRangePath represents a scan by a contiguous range of TIDs
+ *
+ * tidrangequals is an implicitly AND'ed list of qual expressions of the form
+ * "CTID relop pseudoconstant", where relop is one of >,>=,<,<=.
+ */
+typedef struct TidRangePath
+{
+	Path		path;
+	List	   *tidrangequals;
+} TidRangePath;
+
 /*
  * SubqueryScanPath represents a scan of an unflattened subquery-in-FROM
  *
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 43160439f0..6e62104d0b 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -485,6 +485,19 @@ typedef struct TidScan
 	List	   *tidquals;		/* qual(s) involving CTID = something */
 } TidScan;
 
+/* ----------------
+ *		tid range scan node
+ *
+ * tidrangequals is an implicitly AND'ed list of qual expressions of the form
+ * "CTID relop pseudoconstant", where relop is one of >,>=,<,<=.
+ * ----------------
+ */
+typedef struct TidRangeScan
+{
+	Scan		scan;
+	List	   *tidrangequals;	/* qual(s) involving CTID op something */
+} TidRangeScan;
+
 /* ----------------
  *		subquery scan node
  *
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index ed2e4af4be..1be93be098 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -83,6 +83,9 @@ extern void cost_bitmap_or_node(BitmapOrPath *path, PlannerInfo *root);
 extern void cost_bitmap_tree_node(Path *path, Cost *cost, Selectivity *selec);
 extern void cost_tidscan(Path *path, PlannerInfo *root,
 						 RelOptInfo *baserel, List *tidquals, ParamPathInfo *param_info);
+extern void cost_tidrangescan(Path *path, PlannerInfo *root,
+							  RelOptInfo *baserel, List *tidrangequals,
+							  ParamPathInfo *param_info);
 extern void cost_subqueryscan(SubqueryScanPath *path, PlannerInfo *root,
 							  RelOptInfo *baserel, ParamPathInfo *param_info);
 extern void cost_functionscan(Path *path, PlannerInfo *root,
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 8dfc36a4e1..54f4b782fc 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -63,6 +63,10 @@ extern BitmapOrPath *create_bitmap_or_path(PlannerInfo *root,
 										   List *bitmapquals);
 extern TidPath *create_tidscan_path(PlannerInfo *root, RelOptInfo *rel,
 									List *tidquals, Relids required_outer);
+extern TidRangePath *create_tidrangescan_path(PlannerInfo *root,
+											  RelOptInfo *rel,
+											  List *tidrangequals,
+											  Relids required_outer);
 extern AppendPath *create_append_path(PlannerInfo *root, RelOptInfo *rel,
 									  List *subpaths, List *partial_subpaths,
 									  List *pathkeys, Relids required_outer,
diff --git a/src/include/storage/itemptr.h b/src/include/storage/itemptr.h
index 0e6990140b..cd4b8fbacb 100644
--- a/src/include/storage/itemptr.h
+++ b/src/include/storage/itemptr.h
@@ -202,5 +202,7 @@ typedef ItemPointerData *ItemPointer;
 
 extern bool ItemPointerEquals(ItemPointer pointer1, ItemPointer pointer2);
 extern int32 ItemPointerCompare(ItemPointer arg1, ItemPointer arg2);
+extern void ItemPointerInc(ItemPointer pointer);
+extern void ItemPointerDec(ItemPointer pointer);
 
 #endif							/* ITEMPTR_H */
diff --git a/src/test/regress/expected/tidrangescan.out b/src/test/regress/expected/tidrangescan.out
new file mode 100644
index 0000000000..0384304c7f
--- /dev/null
+++ b/src/test/regress/expected/tidrangescan.out
@@ -0,0 +1,302 @@
+-- tests for tidrangescans
+SET enable_seqscan TO off;
+CREATE TABLE tidrangescan(id integer, data text);
+-- insert enough tuples to fill at least two pages
+INSERT INTO tidrangescan SELECT i,repeat('x', 100) FROM generate_series(1,200) AS s(i);
+-- remove all tuples after the 10th tuple on each page.  Trying to ensure
+-- we get the same layout with all CPU architectures and smaller than standard
+-- page sizes.
+DELETE FROM tidrangescan
+WHERE substring(ctid::text FROM ',(\d+)\)')::integer > 10 OR substring(ctid::text FROM '\((\d+),')::integer > 2;
+VACUUM tidrangescan;
+-- range scans with upper bound
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid < '(1,0)';
+            QUERY PLAN             
+-----------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: (ctid < '(1,0)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid < '(1,0)';
+  ctid  
+--------
+ (0,1)
+ (0,2)
+ (0,3)
+ (0,4)
+ (0,5)
+ (0,6)
+ (0,7)
+ (0,8)
+ (0,9)
+ (0,10)
+(10 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid <= '(1,5)';
+             QUERY PLAN             
+------------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: (ctid <= '(1,5)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid <= '(1,5)';
+  ctid  
+--------
+ (0,1)
+ (0,2)
+ (0,3)
+ (0,4)
+ (0,5)
+ (0,6)
+ (0,7)
+ (0,8)
+ (0,9)
+ (0,10)
+ (1,1)
+ (1,2)
+ (1,3)
+ (1,4)
+ (1,5)
+(15 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid < '(0,0)';
+            QUERY PLAN             
+-----------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: (ctid < '(0,0)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid < '(0,0)';
+ ctid 
+------
+(0 rows)
+
+-- range scans with lower bound
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid > '(2,8)';
+            QUERY PLAN             
+-----------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: (ctid > '(2,8)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid > '(2,8)';
+  ctid  
+--------
+ (2,9)
+ (2,10)
+(2 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE '(2,8)' < ctid;
+            QUERY PLAN             
+-----------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: ('(2,8)'::tid < ctid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE '(2,8)' < ctid;
+  ctid  
+--------
+ (2,9)
+ (2,10)
+(2 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid >= '(2,8)';
+             QUERY PLAN             
+------------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: (ctid >= '(2,8)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid >= '(2,8)';
+  ctid  
+--------
+ (2,8)
+ (2,9)
+ (2,10)
+(3 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid >= '(100,0)';
+              QUERY PLAN              
+--------------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: (ctid >= '(100,0)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid >= '(100,0)';
+ ctid 
+------
+(0 rows)
+
+-- range scans with both bounds
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid > '(1,4)' AND '(1,7)' >= ctid;
+                           QUERY PLAN                           
+----------------------------------------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: ((ctid > '(1,4)'::tid) AND ('(1,7)'::tid >= ctid))
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid > '(1,4)' AND '(1,7)' >= ctid;
+ ctid  
+-------
+ (1,5)
+ (1,6)
+ (1,7)
+(3 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE '(1,7)' >= ctid AND ctid > '(1,4)';
+                           QUERY PLAN                           
+----------------------------------------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: (('(1,7)'::tid >= ctid) AND (ctid > '(1,4)'::tid))
+(2 rows)
+
+SELECT ctid FROM tidrangescan WHERE '(1,7)' >= ctid AND ctid > '(1,4)';
+ ctid  
+-------
+ (1,5)
+ (1,6)
+ (1,7)
+(3 rows)
+
+-- extreme offsets
+SELECT ctid FROM tidrangescan WHERE ctid > '(0,65535)' AND ctid < '(1,0)' LIMIT 1;
+ ctid 
+------
+(0 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid < '(0,0)' LIMIT 1;
+ ctid 
+------
+(0 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid > '(4294967295,65535)';
+ ctid 
+------
+(0 rows)
+
+SELECT ctid FROM tidrangescan WHERE ctid < '(0,0)';
+ ctid 
+------
+(0 rows)
+
+-- NULLs in the range cannot return tuples
+SELECT ctid FROM tidrangescan WHERE ctid >= (SELECT NULL::tid);
+ ctid 
+------
+(0 rows)
+
+-- empty table
+CREATE TABLE tidrangescan_empty(id integer, data text);
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan_empty WHERE ctid < '(1, 0)';
+              QUERY PLAN              
+--------------------------------------
+ Tid Range Scan on tidrangescan_empty
+   TID Cond: (ctid < '(1,0)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan_empty WHERE ctid < '(1, 0)';
+ ctid 
+------
+(0 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan_empty WHERE ctid > '(9, 0)';
+              QUERY PLAN              
+--------------------------------------
+ Tid Range Scan on tidrangescan_empty
+   TID Cond: (ctid > '(9,0)'::tid)
+(2 rows)
+
+SELECT ctid FROM tidrangescan_empty WHERE ctid > '(9, 0)';
+ ctid 
+------
+(0 rows)
+
+-- rescans
+EXPLAIN (COSTS OFF)
+SELECT t.ctid,t2.c FROM tidrangescan t,
+LATERAL (SELECT count(*) c FROM tidrangescan t2 WHERE t2.ctid <= t.ctid) t2
+WHERE t.ctid < '(1,0)';
+                  QUERY PLAN                   
+-----------------------------------------------
+ Nested Loop
+   ->  Tid Range Scan on tidrangescan t
+         TID Cond: (ctid < '(1,0)'::tid)
+   ->  Aggregate
+         ->  Tid Range Scan on tidrangescan t2
+               TID Cond: (ctid <= t.ctid)
+(6 rows)
+
+SELECT t.ctid,t2.c FROM tidrangescan t,
+LATERAL (SELECT count(*) c FROM tidrangescan t2 WHERE t2.ctid <= t.ctid) t2
+WHERE t.ctid < '(1,0)';
+  ctid  | c  
+--------+----
+ (0,1)  |  1
+ (0,2)  |  2
+ (0,3)  |  3
+ (0,4)  |  4
+ (0,5)  |  5
+ (0,6)  |  6
+ (0,7)  |  7
+ (0,8)  |  8
+ (0,9)  |  9
+ (0,10) | 10
+(10 rows)
+
+-- cursors
+-- Ensure we get a TID Range scan without a Materialize node.
+EXPLAIN (COSTS OFF)
+DECLARE c SCROLL CURSOR FOR SELECT ctid FROM tidrangescan WHERE ctid < '(1,0)';
+            QUERY PLAN             
+-----------------------------------
+ Tid Range Scan on tidrangescan
+   TID Cond: (ctid < '(1,0)'::tid)
+(2 rows)
+
+BEGIN;
+DECLARE c SCROLL CURSOR FOR SELECT ctid FROM tidrangescan WHERE ctid < '(1,0)';
+FETCH NEXT c;
+ ctid  
+-------
+ (0,1)
+(1 row)
+
+FETCH NEXT c;
+ ctid  
+-------
+ (0,2)
+(1 row)
+
+FETCH PRIOR c;
+ ctid  
+-------
+ (0,1)
+(1 row)
+
+FETCH FIRST c;
+ ctid  
+-------
+ (0,1)
+(1 row)
+
+FETCH LAST c;
+  ctid  
+--------
+ (0,10)
+(1 row)
+
+COMMIT;
+DROP TABLE tidrangescan;
+DROP TABLE tidrangescan_empty;
+RESET enable_seqscan;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 12bb67e491..c77b0d7342 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -80,7 +80,7 @@ test: brin gin gist spgist privileges init_privs security_label collate matview
 # ----------
 # Another group of parallel tests
 # ----------
-test: create_table_like alter_generic alter_operator misc async dbsize misc_functions sysviews tsrf tid tidscan collate.icu.utf8 incremental_sort
+test: create_table_like alter_generic alter_operator misc async dbsize misc_functions sysviews tsrf tid tidscan tidrangescan collate.icu.utf8 incremental_sort
 
 # rules cannot run concurrently with any test that creates
 # a view or rule in the public schema
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 59b416fd80..0264a97324 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -138,6 +138,7 @@ test: sysviews
 test: tsrf
 test: tid
 test: tidscan
+test: tidrangescan
 test: collate.icu.utf8
 test: rules
 test: psql
diff --git a/src/test/regress/sql/tidrangescan.sql b/src/test/regress/sql/tidrangescan.sql
new file mode 100644
index 0000000000..2da35807ff
--- /dev/null
+++ b/src/test/regress/sql/tidrangescan.sql
@@ -0,0 +1,104 @@
+-- tests for tidrangescans
+
+SET enable_seqscan TO off;
+CREATE TABLE tidrangescan(id integer, data text);
+
+-- insert enough tuples to fill at least two pages
+INSERT INTO tidrangescan SELECT i,repeat('x', 100) FROM generate_series(1,200) AS s(i);
+
+-- remove all tuples after the 10th tuple on each page.  Trying to ensure
+-- we get the same layout with all CPU architectures and smaller than standard
+-- page sizes.
+DELETE FROM tidrangescan
+WHERE substring(ctid::text FROM ',(\d+)\)')::integer > 10 OR substring(ctid::text FROM '\((\d+),')::integer > 2;
+VACUUM tidrangescan;
+
+-- range scans with upper bound
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid < '(1,0)';
+SELECT ctid FROM tidrangescan WHERE ctid < '(1,0)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid <= '(1,5)';
+SELECT ctid FROM tidrangescan WHERE ctid <= '(1,5)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid < '(0,0)';
+SELECT ctid FROM tidrangescan WHERE ctid < '(0,0)';
+
+-- range scans with lower bound
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid > '(2,8)';
+SELECT ctid FROM tidrangescan WHERE ctid > '(2,8)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE '(2,8)' < ctid;
+SELECT ctid FROM tidrangescan WHERE '(2,8)' < ctid;
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid >= '(2,8)';
+SELECT ctid FROM tidrangescan WHERE ctid >= '(2,8)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid >= '(100,0)';
+SELECT ctid FROM tidrangescan WHERE ctid >= '(100,0)';
+
+-- range scans with both bounds
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE ctid > '(1,4)' AND '(1,7)' >= ctid;
+SELECT ctid FROM tidrangescan WHERE ctid > '(1,4)' AND '(1,7)' >= ctid;
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan WHERE '(1,7)' >= ctid AND ctid > '(1,4)';
+SELECT ctid FROM tidrangescan WHERE '(1,7)' >= ctid AND ctid > '(1,4)';
+
+-- extreme offsets
+SELECT ctid FROM tidrangescan WHERE ctid > '(0,65535)' AND ctid < '(1,0)' LIMIT 1;
+SELECT ctid FROM tidrangescan WHERE ctid < '(0,0)' LIMIT 1;
+
+SELECT ctid FROM tidrangescan WHERE ctid > '(4294967295,65535)';
+SELECT ctid FROM tidrangescan WHERE ctid < '(0,0)';
+
+-- NULLs in the range cannot return tuples
+SELECT ctid FROM tidrangescan WHERE ctid >= (SELECT NULL::tid);
+
+-- empty table
+CREATE TABLE tidrangescan_empty(id integer, data text);
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan_empty WHERE ctid < '(1, 0)';
+SELECT ctid FROM tidrangescan_empty WHERE ctid < '(1, 0)';
+
+EXPLAIN (COSTS OFF)
+SELECT ctid FROM tidrangescan_empty WHERE ctid > '(9, 0)';
+SELECT ctid FROM tidrangescan_empty WHERE ctid > '(9, 0)';
+
+-- rescans
+EXPLAIN (COSTS OFF)
+SELECT t.ctid,t2.c FROM tidrangescan t,
+LATERAL (SELECT count(*) c FROM tidrangescan t2 WHERE t2.ctid <= t.ctid) t2
+WHERE t.ctid < '(1,0)';
+
+SELECT t.ctid,t2.c FROM tidrangescan t,
+LATERAL (SELECT count(*) c FROM tidrangescan t2 WHERE t2.ctid <= t.ctid) t2
+WHERE t.ctid < '(1,0)';
+
+-- cursors
+
+-- Ensure we get a TID Range scan without a Materialize node.
+EXPLAIN (COSTS OFF)
+DECLARE c SCROLL CURSOR FOR SELECT ctid FROM tidrangescan WHERE ctid < '(1,0)';
+
+BEGIN;
+DECLARE c SCROLL CURSOR FOR SELECT ctid FROM tidrangescan WHERE ctid < '(1,0)';
+FETCH NEXT c;
+FETCH NEXT c;
+FETCH PRIOR c;
+FETCH FIRST c;
+FETCH LAST c;
+COMMIT;
+
+DROP TABLE tidrangescan;
+DROP TABLE tidrangescan_empty;
+
+RESET enable_seqscan;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index bab4f3adb3..c57682ba15 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2530,8 +2530,13 @@ TextPositionState
 TheLexeme
 TheSubstitute
 TidExpr
+TidExprType
 TidHashKey
+TidOpExpr
 TidPath
+TidRangePath
+TidRangeScan
+TidRangeScanState
 TidScan
 TidScanState
 TimeADT
-- 
2.21.0.windows.1
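
The saturating behaviour of ItemPointerDec() in the patch above can be checked in isolation with a small standalone model. This is only a sketch: ToyTid and toy_tid_dec are hypothetical names standing in for ItemPointerData and ItemPointerDec(), and the code does not use any PostgreSQL headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for ItemPointerData: a (block, offset) pair. */
typedef struct ToyTid
{
    uint32_t    block;
    uint16_t    offset;
} ToyTid;

/*
 * Step the TID back by one position.  Mirrors the branch structure of the
 * patch's ItemPointerDec(): borrow from the block number when the offset is
 * zero, and do nothing when the TID is already (0,0).
 */
void
toy_tid_dec(ToyTid *tid)
{
    assert(tid != NULL);

    if (tid->offset == 0)
    {
        if (tid->block != 0)
        {
            tid->offset = UINT16_MAX;
            tid->block--;
        }
    }
    else
        tid->offset--;
}
```

Under this model, decrementing (1,0) gives (0,65535), decrementing (2,8) gives (2,7), and decrementing (0,0) leaves it unchanged.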

#93

David Rowley

dgrowleyml@gmail.com

almost 5 years ago

In reply to: David Rowley (#92)

Re: Tid scan improvements

On Fri, 19 Feb 2021 at 20:37, David Rowley <dgrowleyml@gmail.com> wrote:

> On Thu, 18 Feb 2021 at 09:45, David Rowley <dgrowleyml@gmail.com> wrote:
>
>> On Wed, 17 Feb 2021 at 11:05, Andres Freund <andres@anarazel.de> wrote:
>>
>>> How does this interact with rescans?
>>
>> We must call table_rescan() before calling table_set_tidrange() again.
>> That perhaps could be documented better. I'm just unsure if that
>> should be documented in tableam.h or if it's a restriction that only
>> needs to exist in heapam.c
>
> I've changed things around so that we no longer explicitly call
> table_rescan() in nodeTidrangescan.c. Instead table_set_tidrange()
> does a rescan call. I also adjusted the documentation to mention that
> changing the tid range starts the scan again. This does mean we'll do
> a ->scan_rescan() the first time we do table_set_tidrange(). I'm not
> all that sure that matters.

I've pushed this now. I did end up changing the function name in
tableam.h so that we no longer expose the table_set_tidrange().
Instead, the range is set by either table_beginscan_tidrange() or
table_rescan_tidrange(). There's no need to worry about what would
happen if someone were to change the TID range mid-scan.
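
As a rough illustration of why that shape is convenient, here is a toy model in which the range can only be installed together with a reset of the scan position. It is a sketch under assumptions, not the committed code: ToyTid, ToyScan and the toy_* functions are hypothetical, the "table" is just a sorted in-memory array, and only the names table_beginscan_tidrange() and table_rescan_tidrange() come from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for ItemPointerData. */
typedef struct ToyTid
{
    uint32_t    block;
    uint16_t    offset;
} ToyTid;

/* Hypothetical scan descriptor over an array of TIDs sorted ascending. */
typedef struct ToyScan
{
    const ToyTid *tuples;
    size_t      ntuples;
    size_t      next;           /* index of the next tuple to examine */
    ToyTid      mintid;         /* inclusive lower bound */
    ToyTid      maxtid;         /* inclusive upper bound */
} ToyScan;

/* Order TIDs by block number first, then by offset. */
int
toy_tid_cmp(ToyTid a, ToyTid b)
{
    if (a.block != b.block)
        return a.block < b.block ? -1 : 1;
    if (a.offset != b.offset)
        return a.offset < b.offset ? -1 : 1;
    return 0;
}

/* Modelled on table_beginscan_tidrange(): create the scan with its range. */
ToyScan
toy_beginscan_tidrange(const ToyTid *tuples, size_t ntuples,
                       ToyTid mintid, ToyTid maxtid)
{
    ToyScan     scan = {tuples, ntuples, 0, mintid, maxtid};

    return scan;
}

/* Modelled on table_rescan_tidrange(): reset position and range together. */
void
toy_rescan_tidrange(ToyScan *scan, ToyTid mintid, ToyTid maxtid)
{
    assert(scan != NULL);
    scan->next = 0;
    scan->mintid = mintid;
    scan->maxtid = maxtid;
}

/* Return the next TID inside [mintid, maxtid], or false when exhausted. */
bool
toy_getnext_tidrange(ToyScan *scan, ToyTid *out)
{
    while (scan->next < scan->ntuples)
    {
        ToyTid      tid = scan->tuples[scan->next++];

        if (toy_tid_cmp(tid, scan->mintid) < 0)
            continue;           /* still before the range */
        if (toy_tid_cmp(tid, scan->maxtid) > 0)
        {
            scan->next = scan->ntuples; /* past the range: finish the scan */
            break;
        }
        *out = tid;
        return true;
    }
    return false;
}

/* Convenience helper: drain the scan and count what it returns. */
size_t
toy_count_remaining(ToyScan *scan)
{
    ToyTid      tid;
    size_t      n = 0;

    while (toy_getnext_tidrange(scan, &tid))
        n++;
    return n;
}
```

Because there is no separate "set range" entry point in this model, a caller cannot end up with a half-consumed scan whose bounds have moved underneath it. A range with mintid above maxtid simply returns no tuples.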

Apart from that, I adjusted a few comments and changed the regression
tests a little to get rid of the tidrangescan_empty table. This was
created to ensure empty tables work correctly. Instead, I just did
those tests before populating the tidrangescan table. This just makes
the test run a little faster since we're creating and dropping 1 less
table.

David

#94

Peter Eisentraut

peter.eisentraut@enterprisedb.com

over 4 years ago

In reply to: David Rowley (#93)

1 attachment(s)

Re: Tid scan improvements

This patch didn't add _outTidRangePath() even though we have outNode()
coverage for most/all path nodes. Was this just forgotten? See
attached patch.
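
For context, the pattern at issue is a tag-based dispatch: every node type needs both an output routine and a case in the switch that calls it. The sketch below is a toy model only. ToyNodeTag, ToyNode and toy_out_node are hypothetical, and the output format is simplified rather than the real outfuncs.c format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, much reduced stand-in for NodeTag. */
typedef enum ToyNodeTag
{
    TOY_TidPath,
    TOY_TidRangePath,
    TOY_SomethingElse
} ToyNodeTag;

/* Hypothetical node: a tag plus one printable field. */
typedef struct ToyNode
{
    ToyNodeTag  tag;
    const char *quals;
} ToyNode;

/*
 * Write a textual form of the node into buf.  Returns true when the tag has
 * a case in the switch, false when it falls through to the default branch,
 * which is what happens to any node type whose case was forgotten.
 */
bool
toy_out_node(char *buf, size_t buflen, const ToyNode *node)
{
    assert(buf != NULL && node != NULL);

    switch (node->tag)
    {
        case TOY_TidPath:
            snprintf(buf, buflen, "{TIDPATH :tidquals %s}", node->quals);
            return true;
        case TOY_TidRangePath:
            snprintf(buf, buflen, "{TIDRANGEPATH :tidrangequals %s}",
                     node->quals);
            return true;
        default:
            snprintf(buf, buflen, "{unrecognized node type %d}",
                     (int) node->tag);
            return false;
    }
}
```

Removing the TOY_TidRangePath case from this model would send such nodes to the default branch, which is the kind of gap the attached patch closes for T_TidRangePath.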

Attachments:

0001-Add-_outTidRangePath.patch (text/plain; charset=UTF-8)

From 3c696f812d4c6f8c66bc75105c3c1af79c3b2922 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Mon, 7 Jun 2021 12:04:49 +0200
Subject: [PATCH] Add _outTidRangePath()

We have outNode() coverage for all path nodes, but this one was
missed when it was added.
---
 src/backend/nodes/outfuncs.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 04696f613c..e32b92e299 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -1859,6 +1859,16 @@ _outTidPath(StringInfo str, const TidPath *node)
 	WRITE_NODE_FIELD(tidquals);
 }
 
+static void
+_outTidRangePath(StringInfo str, const TidRangePath *node)
+{
+	WRITE_NODE_TYPE("TIDRANGEPATH");
+
+	_outPathInfo(str, (const Path *) node);
+
+	WRITE_NODE_FIELD(tidrangequals);
+}
+
 static void
 _outSubqueryScanPath(StringInfo str, const SubqueryScanPath *node)
 {
@@ -4166,6 +4176,9 @@ outNode(StringInfo str, const void *obj)
 			case T_TidPath:
 				_outTidPath(str, obj);
 				break;
+			case T_TidRangePath:
+				_outTidRangePath(str, obj);
+				break;
 			case T_SubqueryScanPath:
 				_outSubqueryScanPath(str, obj);
 				break;
-- 
2.31.1

#95

Edmund Horner

ejrh00@gmail.com

over 4 years ago

In reply to: Peter Eisentraut (#94)

Re: Tid scan improvements

On Mon, 7 Jun 2021 at 22:11, Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:

> This patch didn't add _outTidRangePath() even though we have outNode()
> coverage for most/all path nodes. Was this just forgotten? See
> attached patch.

Yes, it looks like an omission. Thanks for spotting it. Patch looks good
to me.

Edmund

#96

David Rowley

dgrowleyml@gmail.com

over 4 years ago

In reply to: Edmund Horner (#95)

Re: Tid scan improvements

On Mon, 7 Jun 2021 at 23:46, Edmund Horner <ejrh00@gmail.com> wrote:

> On Mon, 7 Jun 2021 at 22:11, Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:
>
>> This patch didn't add _outTidRangePath() even though we have outNode()
>> coverage for most/all path nodes. Was this just forgotten? See
>> attached patch.
>
> Yes, it looks like an omission. Thanks for spotting it. Patch looks good to me.

Yeah. That was forgotten. Patch also looks fine to me. Do you want
to push it, Peter?

David

#97

Peter Eisentraut

peter.eisentraut@enterprisedb.com

over 4 years ago

In reply to: David Rowley (#96)

Re: Tid scan improvements

On 07.06.21 13:50, David Rowley wrote:

> On Mon, 7 Jun 2021 at 23:46, Edmund Horner <ejrh00@gmail.com> wrote:
>
>> On Mon, 7 Jun 2021 at 22:11, Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:
>>
>>> This patch didn't add _outTidRangePath() even though we have outNode()
>>> coverage for most/all path nodes. Was this just forgotten? See
>>> attached patch.
>>
>> Yes, it looks like an omission. Thanks for spotting it. Patch looks good to me.
>
> Yeah. That was forgotten. Patch also looks fine to me. Do you want
> to push it, Peter?

done