Qual push down to table AM

Started by Julien Tachoires · 5 months ago · 15 messages
#1 Julien Tachoires
julien@tachoires.me
6 attachment(s)

Hi,

Please find attached a proposed patch set implementing WHERE clause
(qual) push down to the underlying table AM during table/sequential
scan execution.

The primary goal of this project is to convert quals to ScanKeys and
pass them to the table AM. The table AM is then allowed to apply early
tuple filtering during table (sequential) scans. Applying filtering at
the table storage level is necessary for non-row-oriented table storage
such as columnar storage; index-organized tables are another kind of
table storage that would need qual push down.
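
To make the table AM side more concrete, here is a rough sketch (not
part of the attached patches, names are mine) of how an AM could apply
the pushed-down ScanKeys for the simple OpExpr case before handing a
tuple back to the executor; for heap this is essentially what
HeapKeyTest() already does:

#include "postgres.h"
#include "access/skey.h"
#include "executor/tuptable.h"
#include "fmgr.h"

/*
 * Return true if the tuple in "slot" satisfies every pushed-down key.
 * ScanKeyData fields are used as defined in access/skey.h.
 */
static bool
my_am_keys_match(TupleTableSlot *slot, int nkeys, ScanKey keys)
{
    for (int i = 0; i < nkeys; i++)
    {
        ScanKey     key = &keys[i];
        bool        isnull;
        Datum       value;

        /* A NULL comparison value never matches (strict operators only) */
        if (key->sk_flags & SK_ISNULL)
            return false;

        value = slot_getattr(slot, key->sk_attno, &isnull);
        if (isnull)
            return false;

        /* sk_func holds the operator's comparison procedure */
        if (!DatumGetBool(FunctionCall2Coll(&key->sk_func,
                                            key->sk_collation,
                                            value,
                                            key->sk_argument)))
            return false;
    }

    return true;
}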

AFAIK, CustomScan is currently the only way to get table scans with
quals pushed down, and each table AM must implement its own mechanism
for it. IMHO, having this feature available in core would help the
development of new table AMs. For heap, some performance testing
(detailed at the end of this message) shows a 45% to 60% improvement in
seq scan execution time when only one tuple is returned from the table.

Only a few expressions are supported: OpExpr (<key> <operator> <value>),
ScalarArrayOpExpr (<key> <operator> ANY|ALL(ARRAY[...])), and NullTest.
Row comparison is not yet supported, as this part is still not clear to
me. On the right side of the expression, we support: constant, variable,
function call, and subquery (InitPlan only).

In terms of security, we check that the function behind the operator is
not user-defined: only functions from the system catalogs are supported.
We also check that the function is "leakproof".
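
For reference, in v1-0002 this check in fix_tablequal_references()
boils down to the following (the helper name is mine; the logic is
taken from the patch):

#include "postgres.h"
#include "access/transam.h"
#include "nodes/primnodes.h"
#include "utils/lsyscache.h"

/*
 * A qual is only eligible for push down if its operator is a built-in
 * one (OID below FirstNormalObjectId) and the underlying function is
 * marked leakproof.
 */
static bool
opexpr_is_pushdown_safe(OpExpr *opexpr)
{
    if (opexpr->opno >= FirstNormalObjectId)
        return false;           /* user-defined operator */

    if (!get_func_leakproof(opexpr->opfuncid))
        return false;           /* function is not leakproof */

    return true;
}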

Pushing down quals does not guarantee to the executor that the tuples
returned during a table scan satisfy a qual, as we don't know whether
the table AM (potentially implemented in an extension) has actually
applied the filtering. To ensure the WHERE clause is answered
correctly, pushed-down quals are therefore executed twice per returned
tuple: once by the table AM, and once by the executor. This causes a
performance regression (15-17%) when almost the entire table is
returned (see the perf. test results at the end of this message). This
could be optimized by flagging the tuples already filtered by the table
AM, so that the pushed-down quals are not re-executed by the executor.

Details about the patch files

v1-0001-Pass-the-number-of-ScanKeys-to-scan_rescan.patch: This patch
adds the number of ScanKeys as a new argument to scan_rescan(). The
number of ScanKeys was previously only passed to the table AM via
scan_begin(), not via scan_rescan().

v1-0002-Simple-quals-push-down-to-table-AMs.patch: Core of the feature,
this patch adds qual push down support for OpExpr expressions.

v1-0003-Add-the-table-reloption-quals_push_down.patch: Adds a new
reloption, quals_push_down, used to enable/disable qual push down for a
table. Disabled by default.

v1-0004-Add-tests-for-quals-push-down-to-table-AM.patch: Regression
tests.

v1-0005-Push-down-IN-NOT-IN-array-quals-to-table-AMs.patch:
ScalarArrayOpExpr support.

v1-0006-Push-down-IS-IS-NOT-NULL-quals-to-table-AMs.patch: NullTest
support.

Performance testing

Head:
CREATE TABLE t (i INTEGER);

Patch:
CREATE TABLE t (i INTEGER) WITH (quals_push_down = on);

n=1M:
INSERT INTO t SELECT generate_series(1, 1000000);
VACUUM t;

n=10M:
TRUNCATE t;
INSERT INTO t SELECT generate_series(1, 10000000);
VACUUM t;

n=100M:
TRUNCATE t;
INSERT INTO t SELECT generate_series(1, 100000000);
VACUUM t;

Case #1: SELECT COUNT(*) FROM t WHERE i = 50000;

        |       n=1M      |        n=10M      |         n=100M
        +--------+--------+---------+---------+----------+---------
        |  Head  |  Patch |  Head   |  Patch  |  Head    |  Patch
--------+--------+--------+---------+---------+----------+---------
Test #1 | 38.903 | 21.308 | 365.707 | 155.429 | 3939.937 | 1564.182
Test #2 | 39.239 | 21.271 | 364.206 | 153.127 | 3872.370 | 1527.988
Test #3 | 39.015 | 21.958 | 365.434 | 154.498 | 3812.382 | 1525.535
--------+--------+--------+---------+---------+----------+---------
Average | 39.052 | 21.512 | 365.116 | 154.351 | 3874.896 | 1539.235
Std dev |  0.171 |  0.386 |   0.800 |   1.158 |   63.815 |   21.640
--------+--------+--------+---------+---------+----------+---------
Gain    |     44.91%      |      57.73%       |       60.28%

Case #2: SELECT COUNT(*) FROM t WHERE i >= 2;

        |       n=1M      |        n=10M      |         n=100M
        +--------+--------+---------+---------+----------+---------
        |  Head  |  Patch |  Head   |  Patch  |  Head    |  Patch
--------+--------+--------+---------+---------+----------+---------
Test #1 | 68.422 | 81.233 | 674.397 | 778.427 | 6845.165 | 8071.627
Test #2 | 69.237 | 80.868 | 682.976 | 774.417 | 6533.091 | 7668.477
Test #3 | 69.579 | 80.418 | 676.072 | 791.465 | 6917.964 | 7916.182
--------+--------+--------+---------+---------+----------+---------
Average | 69.079 | 80.840 | 677.815 | 781.436 | 6765.407 | 7885.429
Std dev |  0.594 |  0.408 |   4.547 |   8.914 |  204.457 |  203.327
--------+--------+--------+---------+---------+----------+---------
Gain    |    -17.02%      |     -15.29%       |      -16.56%

Thoughts?

Best regards,

--
Julien Tachoires

Attachments:

v1-0001-Pass-the-number-of-ScanKeys-to-scan_rescan.patch (text/x-diff; charset=us-ascii)
From 7d33449cc51f2713213093208f98720ef3adf3ad Mon Sep 17 00:00:00 2001
From: Julien Tachoires <julien@tachoires.me>
Date: Mon, 25 Aug 2025 17:01:57 +0200
Subject: [PATCH 1/6] Pass the number of ScanKeys to scan_rescan()

The number of ScanKeys passed to the table AM API routine scan_rescan()
was not specified, forcing the table AM to keep in memory the initial
number of ScanKeys passed via scan_begin(). Currently, there isn't any
real use of the ScanKeys during a table scan, so, this is not an issue,
but it could become a blocking point in the future if we want to
implement quals push down - as ScanKeys - to the table AM. Due to
runtime keys evaluation, this number of ScanKeys can vary between the
initial call to scan_begin() and a potential further call
to scan_rescan().

table_rescan() is modified in order to reflect the changes on
scan_rescan().

table_beginscan_parallel()'s signature is slightly modified in order to
pass any ScanKeys and their number to scan_begin().

table_rescan_set_params() now takes the number of ScanKeys as a new
argument.
---
 src/backend/access/brin/brin.c            |  3 ++-
 src/backend/access/gin/gininsert.c        |  3 ++-
 src/backend/access/heap/heapam.c          |  2 +-
 src/backend/access/nbtree/nbtsort.c       |  3 ++-
 src/backend/access/table/tableam.c        |  5 +++--
 src/backend/executor/execReplication.c    |  4 ++--
 src/backend/executor/nodeBitmapHeapscan.c |  2 +-
 src/backend/executor/nodeSamplescan.c     |  2 +-
 src/backend/executor/nodeSeqscan.c        |  5 +++--
 src/backend/executor/nodeTidscan.c        |  2 +-
 src/include/access/heapam.h               |  5 +++--
 src/include/access/tableam.h              | 24 +++++++++++++----------
 12 files changed, 35 insertions(+), 25 deletions(-)

diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 7ff7467e462..5995bd1243e 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -2828,7 +2828,8 @@ _brin_parallel_scan_and_build(BrinBuildState *state,
 	indexInfo->ii_Concurrent = brinshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
-									ParallelTableScanFromBrinShared(brinshared));
+									ParallelTableScanFromBrinShared(brinshared),
+									0, NULL);
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, true,
 									   brinbuildCallbackParallel, state, scan);
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index e9d4b27427e..deaa42cffa4 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -2030,7 +2030,8 @@ _gin_parallel_scan_and_build(GinBuildState *state,
 	indexInfo->ii_Concurrent = ginshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
-									ParallelTableScanFromGinBuildShared(ginshared));
+									ParallelTableScanFromGinBuildShared(ginshared),
+									0, NULL);
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, progress,
 									   ginBuildCallbackParallel, state, scan);
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 7491cc3cb93..a5c74d8948e 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -1251,7 +1251,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
 }
 
 void
-heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
+heap_rescan(TableScanDesc sscan, int nkeys, ScanKey key, bool set_params,
 			bool allow_strat, bool allow_sync, bool allow_pagemode)
 {
 	HeapScanDesc scan = (HeapScanDesc) sscan;
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index 8828a7a8f89..d576ba3a762 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -1927,7 +1927,8 @@ _bt_parallel_scan_and_sort(BTSpool *btspool, BTSpool *btspool2,
 	indexInfo = BuildIndexInfo(btspool->index);
 	indexInfo->ii_Concurrent = btshared->isconcurrent;
 	scan = table_beginscan_parallel(btspool->heap,
-									ParallelTableScanFromBTShared(btshared));
+									ParallelTableScanFromBTShared(btshared),
+									0, NULL);
 	reltuples = table_index_build_scan(btspool->heap, btspool->index, indexInfo,
 									   true, progress, _bt_build_callback,
 									   &buildstate, scan);
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index a56c5eceb14..46bed1614f0 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -163,7 +163,8 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
 }
 
 TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan,
+						 int nkeys, struct ScanKeyData *key)
 {
 	Snapshot	snapshot;
 	uint32		flags = SO_TYPE_SEQSCAN |
@@ -184,7 +185,7 @@ table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
 		snapshot = SnapshotAny;
 	}
 
-	return relation->rd_tableam->scan_begin(relation, snapshot, 0, NULL,
+	return relation->rd_tableam->scan_begin(relation, snapshot, nkeys, key,
 											pscan, flags);
 }
 
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index b409d4ecbf5..1b0c97243a5 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -388,7 +388,7 @@ RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
 retry:
 	found = false;
 
-	table_rescan(scan, NULL);
+	table_rescan(scan, 0, NULL);
 
 	/* Try to find the tuple */
 	while (table_scan_getnextslot(scan, ForwardScanDirection, scanslot))
@@ -604,7 +604,7 @@ RelationFindDeletedTupleInfoSeq(Relation rel, TupleTableSlot *searchslot,
 	scan = table_beginscan(rel, SnapshotAny, 0, NULL);
 	scanslot = table_slot_create(rel, NULL);
 
-	table_rescan(scan, NULL);
+	table_rescan(scan, 0, NULL);
 
 	/* Try to find the tuple */
 	while (table_scan_getnextslot(scan, ForwardScanDirection, scanslot))
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index bf24f3d7fe0..fb778e0ae3b 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -239,7 +239,7 @@ ExecReScanBitmapHeapScan(BitmapHeapScanState *node)
 			tbm_end_iterate(&scan->st.rs_tbmiterator);
 
 		/* rescan to release any page pin */
-		table_rescan(node->ss.ss_currentScanDesc, NULL);
+		table_rescan(node->ss.ss_currentScanDesc, 0, NULL);
 	}
 
 	/* release bitmaps and buffers if any */
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index 6b3db7548ed..a7e172d83a4 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -301,7 +301,7 @@ tablesample_init(SampleScanState *scanstate)
 	}
 	else
 	{
-		table_rescan_set_params(scanstate->ss.ss_currentScanDesc, NULL,
+		table_rescan_set_params(scanstate->ss.ss_currentScanDesc, 0, NULL,
 								scanstate->use_bulkread,
 								allow_sync,
 								scanstate->use_pagemode);
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 94047d29430..c89aa6c6616 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -326,6 +326,7 @@ ExecReScanSeqScan(SeqScanState *node)
 
 	if (scan != NULL)
 		table_rescan(scan,		/* scan desc */
+					 0,			/* number of scan keys */
 					 NULL);		/* new scan keys */
 
 	ExecScanReScan((ScanState *) node);
@@ -374,7 +375,7 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0, NULL);
 }
 
 /* ----------------------------------------------------------------
@@ -407,5 +408,5 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0, NULL);
 }
diff --git a/src/backend/executor/nodeTidscan.c b/src/backend/executor/nodeTidscan.c
index 5e56e29a15f..6b37d1fcb74 100644
--- a/src/backend/executor/nodeTidscan.c
+++ b/src/backend/executor/nodeTidscan.c
@@ -454,7 +454,7 @@ ExecReScanTidScan(TidScanState *node)
 
 	/* not really necessary, but seems good form */
 	if (node->ss.ss_currentScanDesc)
-		table_rescan(node->ss.ss_currentScanDesc, NULL);
+		table_rescan(node->ss.ss_currentScanDesc, 0, NULL);
 
 	ExecScanReScan(&node->ss);
 }
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index a2bd5a897f8..252f5e661c1 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -293,8 +293,9 @@ extern TableScanDesc heap_beginscan(Relation relation, Snapshot snapshot,
 extern void heap_setscanlimits(TableScanDesc sscan, BlockNumber startBlk,
 							   BlockNumber numBlks);
 extern void heap_prepare_pagescan(TableScanDesc sscan);
-extern void heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
-						bool allow_strat, bool allow_sync, bool allow_pagemode);
+extern void heap_rescan(TableScanDesc sscan, int nkeys, ScanKey key,
+						bool set_params, bool allow_strat, bool allow_sync,
+						bool allow_pagemode);
 extern void heap_endscan(TableScanDesc sscan);
 extern HeapTuple heap_getnext(TableScanDesc sscan, ScanDirection direction);
 extern bool heap_getnextslot(TableScanDesc sscan,
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 1c9e802a6b1..6fa0fa55a33 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -334,9 +334,10 @@ typedef struct TableAmRoutine
 	 * Restart relation scan.  If set_params is set to true, allow_{strat,
 	 * sync, pagemode} (see scan_begin) changes should be taken into account.
 	 */
-	void		(*scan_rescan) (TableScanDesc scan, struct ScanKeyData *key,
-								bool set_params, bool allow_strat,
-								bool allow_sync, bool allow_pagemode);
+	void		(*scan_rescan) (TableScanDesc scan, int nkeys,
+								struct ScanKeyData *key, bool set_params,
+								bool allow_strat, bool allow_sync,
+								bool allow_pagemode);
 
 	/*
 	 * Return next tuple from `scan`, store in slot.
@@ -985,10 +986,10 @@ table_endscan(TableScanDesc scan)
  * Restart a relation scan.
  */
 static inline void
-table_rescan(TableScanDesc scan,
-			 struct ScanKeyData *key)
+table_rescan(TableScanDesc scan, int nkeys, struct ScanKeyData *key)
 {
-	scan->rs_rd->rd_tableam->scan_rescan(scan, key, false, false, false, false);
+	scan->rs_rd->rd_tableam->scan_rescan(scan, nkeys, key, false, false, false,
+										 false);
 }
 
 /*
@@ -1000,10 +1001,10 @@ table_rescan(TableScanDesc scan,
  * previously selected startblock will be kept.
  */
 static inline void
-table_rescan_set_params(TableScanDesc scan, struct ScanKeyData *key,
+table_rescan_set_params(TableScanDesc scan, int nkeys, struct ScanKeyData *key,
 						bool allow_strat, bool allow_sync, bool allow_pagemode)
 {
-	scan->rs_rd->rd_tableam->scan_rescan(scan, key, true,
+	scan->rs_rd->rd_tableam->scan_rescan(scan, nkeys, key, true,
 										 allow_strat, allow_sync,
 										 allow_pagemode);
 }
@@ -1068,7 +1069,8 @@ table_rescan_tidrange(TableScanDesc sscan, ItemPointer mintid,
 	/* Ensure table_beginscan_tidrange() was used. */
 	Assert((sscan->rs_flags & SO_TYPE_TIDRANGESCAN) != 0);
 
-	sscan->rs_rd->rd_tableam->scan_rescan(sscan, NULL, false, false, false, false);
+	sscan->rs_rd->rd_tableam->scan_rescan(sscan, 0, NULL, false, false, false,
+										  false);
 	sscan->rs_rd->rd_tableam->scan_set_tidrange(sscan, mintid, maxtid);
 }
 
@@ -1123,7 +1125,9 @@ extern void table_parallelscan_initialize(Relation rel,
  * Caller must hold a suitable lock on the relation.
  */
 extern TableScanDesc table_beginscan_parallel(Relation relation,
-											  ParallelTableScanDesc pscan);
+											  ParallelTableScanDesc pscan,
+											  int nkeys,
+											  struct ScanKeyData *key);
 
 /*
  * Restart a parallel scan.  Call this in the leader process.  Caller is
-- 
2.39.5

v1-0002-Simple-quals-push-down-to-table-AMs.patch (text/x-diff; charset=us-ascii)
From 7988159e44d7786769bc2cb4fc7ab243e32a31a5 Mon Sep 17 00:00:00 2001
From: Julien Tachoires <julien@tachoires.me>
Date: Mon, 25 Aug 2025 18:23:19 +0200
Subject: [PATCH 2/6] Simple quals push down to table AMs

Simple quals like: <column> <op> <const|func|var|subquery> are now
converted to ScanKeys and then passed to the underlying layer via the
table AM API. During the execution of sequential scans, the table AM can
use the given ScanKeys to filter out the tuples not satisfying the
condition before returning them to the executor. Doing this kind of
early tuple filtering speeds up sequential scan execution when a
large portion of the table must be excluded from the final result.

The query planner, via fix_tablequal_references(), is in charge of
pre-processing the quals and excluding those that cannot be used as
ScanKeys. Pre-processing quals consists of making sure that the key is
on the left side of the expression, the value is on the right, and
neither side is relabeled.

Non-constant values are registered as run-time keys and then evaluated
and converted to ScanKeys when a rescan is requested via
ExecReScanSeqScan(). InitPlan (sub-SELECT executed only once) is the
only type of SubQuery supported for now.

A new instrumentation counter is added in order to distinguish between
the tuples excluded by the table AM and those excluded by the executor.
The EXPLAIN output is adjusted accordingly.
---
 .../postgres_fdw/expected/postgres_fdw.out    |   4 +-
 src/backend/access/heap/heapam.c              |  30 +-
 src/backend/commands/explain.c                |  51 ++-
 src/backend/executor/instrument.c             |   1 +
 src/backend/executor/nodeSeqscan.c            | 356 +++++++++++++++++-
 src/backend/optimizer/plan/createplan.c       | 213 ++++++++++-
 src/include/access/relscan.h                  |   1 +
 src/include/executor/instrument.h             |   7 +-
 src/include/executor/nodeSeqscan.h            |   3 +
 src/include/nodes/execnodes.h                 |  40 +-
 src/include/nodes/plannodes.h                 |   2 +
 src/test/isolation/expected/stats.out         |  26 +-
 src/test/regress/expected/memoize.out         |  21 +-
 src/test/regress/expected/merge.out           |   2 +-
 src/test/regress/expected/partition_prune.out |  28 +-
 src/test/regress/expected/select_parallel.out |   4 +-
 src/test/regress/expected/updatable_views.out |   3 -
 src/test/regress/sql/partition_prune.sql      |   2 +-
 18 files changed, 701 insertions(+), 93 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index d3323b04676..bc7242835df 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -11930,7 +11930,7 @@ SELECT * FROM local_tbl, async_pt WHERE local_tbl.a = async_pt.a AND local_tbl.c
  Nested Loop (actual rows=1.00 loops=1)
    ->  Seq Scan on local_tbl (actual rows=1.00 loops=1)
          Filter: (c = 'bar'::text)
-         Rows Removed by Filter: 1
+         Rows Removed In Table AM by Filter: 1
    ->  Append (actual rows=1.00 loops=1)
          ->  Async Foreign Scan on async_p1 async_pt_1 (never executed)
          ->  Async Foreign Scan on async_p2 async_pt_2 (actual rows=1.00 loops=1)
@@ -12225,7 +12225,7 @@ SELECT * FROM async_pt t1 WHERE t1.b === 505 LIMIT 1;
                Filter: (b === 505)
          ->  Seq Scan on async_p3 t1_3 (actual rows=1.00 loops=1)
                Filter: (b === 505)
-               Rows Removed by Filter: 101
+               Rows Removed In Executor by Filter: 101
 (9 rows)
 
 SELECT * FROM async_pt t1 WHERE t1.b === 505 LIMIT 1;
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index a5c74d8948e..71d8e06d8dd 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -347,7 +347,8 @@ bitmapheap_stream_read_next(ReadStream *pgsr, void *private_data,
  * ----------------
  */
 static void
-initscan(HeapScanDesc scan, ScanKey key, bool keep_startblock)
+initscan(HeapScanDesc scan, int nkeys, ScanKey keys, bool keep_startblock,
+		 bool update_stats)
 {
 	ParallelBlockTableScanDesc bpscan = NULL;
 	bool		allow_strat;
@@ -456,17 +457,20 @@ initscan(HeapScanDesc scan, ScanKey key, bool keep_startblock)
 	/* page-at-a-time fields are always invalid when not rs_inited */
 
 	/*
-	 * copy the scan key, if appropriate
+	 * copy the scan keys, if appropriate
 	 */
-	if (key != NULL && scan->rs_base.rs_nkeys > 0)
-		memcpy(scan->rs_base.rs_key, key, scan->rs_base.rs_nkeys * sizeof(ScanKeyData));
+	if (keys != NULL && nkeys > 0)
+	{
+		scan->rs_base.rs_nkeys = nkeys;
+		memcpy(scan->rs_base.rs_key, keys, nkeys * sizeof(ScanKeyData));
+	}
 
 	/*
 	 * Currently, we only have a stats counter for sequential heap scans (but
 	 * e.g for bitmap scans the underlying bitmap index scans will be counted,
 	 * and for sample scans we update stats for tuple fetches).
 	 */
-	if (scan->rs_base.rs_flags & SO_TYPE_SEQSCAN)
+	if (update_stats && (scan->rs_base.rs_flags & SO_TYPE_SEQSCAN))
 		pgstat_count_heap_scan(scan->rs_base.rs_rd);
 }
 
@@ -964,7 +968,10 @@ continue_page:
 			if (key != NULL &&
 				!HeapKeyTest(tuple, RelationGetDescr(scan->rs_base.rs_rd),
 							 nkeys, key))
+			{
+				scan->rs_base.rs_nskip++;
 				continue;
+			}
 
 			LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);
 			scan->rs_coffset = lineoff;
@@ -1072,7 +1079,10 @@ continue_page:
 			if (key != NULL &&
 				!HeapKeyTest(tuple, RelationGetDescr(scan->rs_base.rs_rd),
 							 nkeys, key))
+			{
+				scan->rs_base.rs_nskip++;
 				continue;
+			}
 
 			scan->rs_cindex = lineindex;
 			return;
@@ -1098,7 +1108,7 @@ continue_page:
 
 TableScanDesc
 heap_beginscan(Relation relation, Snapshot snapshot,
-			   int nkeys, ScanKey key,
+			   int nkeys, ScanKey keys,
 			   ParallelTableScanDesc parallel_scan,
 			   uint32 flags)
 {
@@ -1132,6 +1142,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
 	scan->rs_base.rs_rd = relation;
 	scan->rs_base.rs_snapshot = snapshot;
 	scan->rs_base.rs_nkeys = nkeys;
+	scan->rs_base.rs_nskip = 0;
 	scan->rs_base.rs_flags = flags;
 	scan->rs_base.rs_parallel = parallel_scan;
 	scan->rs_strategy = NULL;	/* set in initscan */
@@ -1199,7 +1210,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
 	else
 		scan->rs_base.rs_key = NULL;
 
-	initscan(scan, key, false);
+	initscan(scan, nkeys, keys, false, true);
 
 	scan->rs_read_stream = NULL;
 
@@ -1251,7 +1262,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
 }
 
 void
-heap_rescan(TableScanDesc sscan, int nkeys, ScanKey key, bool set_params,
+heap_rescan(TableScanDesc sscan, int nkeys, ScanKey keys, bool set_params,
 			bool allow_strat, bool allow_sync, bool allow_pagemode)
 {
 	HeapScanDesc scan = (HeapScanDesc) sscan;
@@ -1297,10 +1308,11 @@ heap_rescan(TableScanDesc sscan, int nkeys, ScanKey key, bool set_params,
 	if (scan->rs_read_stream)
 		read_stream_reset(scan->rs_read_stream);
 
+
 	/*
 	 * reinitialize scan descriptor
 	 */
-	initscan(scan, key, true);
+	initscan(scan, nkeys, keys, true, false);
 }
 
 void
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 8345bc0264b..a3c8889632a 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -1963,7 +1963,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
 						   "Order By", planstate, ancestors, es);
 			show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
 			if (plan->qual)
-				show_instrumentation_count("Rows Removed by Filter", 1,
+				show_instrumentation_count("Rows Removed In Executor by Filter", 1,
 										   planstate, es);
 			show_indexsearches_info(planstate, es);
 			break;
@@ -1977,7 +1977,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
 						   "Order By", planstate, ancestors, es);
 			show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
 			if (plan->qual)
-				show_instrumentation_count("Rows Removed by Filter", 1,
+				show_instrumentation_count("Rows Removed In Executor by Filter", 1,
 										   planstate, es);
 			if (es->analyze)
 				ExplainPropertyFloat("Heap Fetches", NULL,
@@ -1997,7 +1997,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
 										   planstate, es);
 			show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
 			if (plan->qual)
-				show_instrumentation_count("Rows Removed by Filter", 1,
+				show_instrumentation_count("Rows Removed In Executor by Filter", 1,
 										   planstate, es);
 			show_tidbitmap_info((BitmapHeapScanState *) planstate, es);
 			break;
@@ -2007,6 +2007,15 @@ ExplainNode(PlanState *planstate, List *ancestors,
 			/* fall through to print additional fields the same as SeqScan */
 			/* FALLTHROUGH */
 		case T_SeqScan:
+			show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
+			if (plan->qual)
+			{
+				show_instrumentation_count("Rows Removed In Table AM by Filter", 3,
+										   planstate, es);
+				show_instrumentation_count("Rows Removed In Executor by Filter", 1,
+										   planstate, es);
+			}
+			break;
 		case T_ValuesScan:
 		case T_CteScan:
 		case T_NamedTuplestoreScan:
@@ -2014,7 +2023,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_SubqueryScan:
 			show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
 			if (plan->qual)
-				show_instrumentation_count("Rows Removed by Filter", 1,
+				show_instrumentation_count("Rows Removed In Executor by Filter", 1,
 										   planstate, es);
 			if (IsA(plan, CteScan))
 				show_ctescan_info(castNode(CteScanState, planstate), es);
@@ -2025,7 +2034,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
 
 				show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
 				if (plan->qual)
-					show_instrumentation_count("Rows Removed by Filter", 1,
+					show_instrumentation_count("Rows Removed In Executor by Filter", 1,
 											   planstate, es);
 				ExplainPropertyInteger("Workers Planned", NULL,
 									   gather->num_workers, es);
@@ -2049,7 +2058,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
 
 				show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
 				if (plan->qual)
-					show_instrumentation_count("Rows Removed by Filter", 1,
+					show_instrumentation_count("Rows Removed In Executor by Filter", 1,
 											   planstate, es);
 				ExplainPropertyInteger("Workers Planned", NULL,
 									   gm->num_workers, es);
@@ -2083,7 +2092,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
 			}
 			show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
 			if (plan->qual)
-				show_instrumentation_count("Rows Removed by Filter", 1,
+				show_instrumentation_count("Rows Removed In Executor by Filter", 1,
 										   planstate, es);
 			break;
 		case T_TableFuncScan:
@@ -2097,7 +2106,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
 			}
 			show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
 			if (plan->qual)
-				show_instrumentation_count("Rows Removed by Filter", 1,
+				show_instrumentation_count("Rows Removed In Executor by Filter", 1,
 										   planstate, es);
 			show_table_func_scan_info(castNode(TableFuncScanState,
 											   planstate), es);
@@ -2115,7 +2124,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
 				show_scan_qual(tidquals, "TID Cond", planstate, ancestors, es);
 				show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
 				if (plan->qual)
-					show_instrumentation_count("Rows Removed by Filter", 1,
+					show_instrumentation_count("Rows Removed In Executor by Filter", 1,
 											   planstate, es);
 			}
 			break;
@@ -2132,14 +2141,14 @@ ExplainNode(PlanState *planstate, List *ancestors,
 				show_scan_qual(tidquals, "TID Cond", planstate, ancestors, es);
 				show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
 				if (plan->qual)
-					show_instrumentation_count("Rows Removed by Filter", 1,
+					show_instrumentation_count("Rows Removed In Executor by Filter", 1,
 											   planstate, es);
 			}
 			break;
 		case T_ForeignScan:
 			show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
 			if (plan->qual)
-				show_instrumentation_count("Rows Removed by Filter", 1,
+				show_instrumentation_count("Rows Removed In Executor by Filter", 1,
 										   planstate, es);
 			show_foreignscan_info((ForeignScanState *) planstate, es);
 			break;
@@ -2149,7 +2158,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
 
 				show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
 				if (plan->qual)
-					show_instrumentation_count("Rows Removed by Filter", 1,
+					show_instrumentation_count("Rows Removed In Executor by Filter", 1,
 											   planstate, es);
 				if (css->methods->ExplainCustomScan)
 					css->methods->ExplainCustomScan(css, ancestors, es);
@@ -2163,7 +2172,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
 										   planstate, es);
 			show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
 			if (plan->qual)
-				show_instrumentation_count("Rows Removed by Filter", 2,
+				show_instrumentation_count("Rows Removed In Executor by Filter", 2,
 										   planstate, es);
 			break;
 		case T_MergeJoin:
@@ -2176,7 +2185,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
 										   planstate, es);
 			show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
 			if (plan->qual)
-				show_instrumentation_count("Rows Removed by Filter", 2,
+				show_instrumentation_count("Rows Removed In Executor by Filter", 2,
 										   planstate, es);
 			break;
 		case T_HashJoin:
@@ -2189,7 +2198,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
 										   planstate, es);
 			show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
 			if (plan->qual)
-				show_instrumentation_count("Rows Removed by Filter", 2,
+				show_instrumentation_count("Rows Removed In Executor by Filter", 2,
 										   planstate, es);
 			break;
 		case T_Agg:
@@ -2197,7 +2206,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
 			show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
 			show_hashagg_info((AggState *) planstate, es);
 			if (plan->qual)
-				show_instrumentation_count("Rows Removed by Filter", 1,
+				show_instrumentation_count("Rows Removed In Executor by Filter", 1,
 										   planstate, es);
 			break;
 		case T_WindowAgg:
@@ -2206,7 +2215,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
 							"Run Condition", planstate, ancestors, es);
 			show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
 			if (plan->qual)
-				show_instrumentation_count("Rows Removed by Filter", 1,
+				show_instrumentation_count("Rows Removed In Executor by Filter", 1,
 										   planstate, es);
 			show_windowagg_info(castNode(WindowAggState, planstate), es);
 			break;
@@ -2214,7 +2223,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
 			show_group_keys(castNode(GroupState, planstate), ancestors, es);
 			show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
 			if (plan->qual)
-				show_instrumentation_count("Rows Removed by Filter", 1,
+				show_instrumentation_count("Rows Removed In Executor by Filter", 1,
 										   planstate, es);
 			break;
 		case T_Sort:
@@ -2236,7 +2245,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
 							"One-Time Filter", planstate, ancestors, es);
 			show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
 			if (plan->qual)
-				show_instrumentation_count("Rows Removed by Filter", 1,
+				show_instrumentation_count("Rows Removed In Executor by Filter", 1,
 										   planstate, es);
 			break;
 		case T_ModifyTable:
@@ -3990,7 +3999,9 @@ show_instrumentation_count(const char *qlabel, int which,
 	if (!es->analyze || !planstate->instrument)
 		return;
 
-	if (which == 2)
+	if (which == 3)
+		nfiltered = planstate->instrument->nfiltered3;
+	else if (which == 2)
 		nfiltered = planstate->instrument->nfiltered2;
 	else
 		nfiltered = planstate->instrument->nfiltered1;
diff --git a/src/backend/executor/instrument.c b/src/backend/executor/instrument.c
index 56e635f4700..e9ddf39c42a 100644
--- a/src/backend/executor/instrument.c
+++ b/src/backend/executor/instrument.c
@@ -186,6 +186,7 @@ InstrAggNode(Instrumentation *dst, Instrumentation *add)
 	dst->nloops += add->nloops;
 	dst->nfiltered1 += add->nfiltered1;
 	dst->nfiltered2 += add->nfiltered2;
+	dst->nfiltered3 += add->nfiltered3;
 
 	/* Add delta of buffer usage since entry to node's totals */
 	if (dst->need_bufusage)
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index c89aa6c6616..0562377c42a 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -29,9 +29,12 @@
 
 #include "access/relscan.h"
 #include "access/tableam.h"
+#include "executor/execExpr.h"
 #include "executor/execScan.h"
 #include "executor/executor.h"
 #include "executor/nodeSeqscan.h"
+#include "nodes/nodeFuncs.h"
+#include "utils/lsyscache.h"
 #include "utils/rel.h"
 
 static TupleTableSlot *SeqNext(SeqScanState *node);
@@ -41,6 +44,146 @@ static TupleTableSlot *SeqNext(SeqScanState *node);
  * ----------------------------------------------------------------
  */
 
+/* ----------------------------------------------------------------
+ *		ExecSeqBuildScanKeys
+ *
+ *		Builds the scan keys pushed to the table AM API. Scan keys
+ *		are used to filter out tuples before returning them to the
+ *		executor, based on the quals list.
+ * ----------------------------------------------------------------
+ */
+static void
+ExecSeqBuildScanKeys(PlanState *planstate, List *quals, int *numScanKeys,
+					 ScanKey *scanKeys, SeqScanRuntimeKeyInfo * *runtimeKeys,
+					 int *numRuntimeKeys)
+{
+	ListCell   *qual_cell;
+	ScanKey		scan_keys;
+	int			n_scan_keys = 0;
+	int			n_quals;
+	SeqScanRuntimeKeyInfo *runtime_keys;
+	int			n_runtime_keys;
+	int			max_runtime_keys;
+
+	n_quals = list_length(quals);
+
+	/*
+	 * If quals list is empty we have nothing to do.
+	 */
+	if (n_quals == 0)
+		return;
+
+	/*
+	 * Allocate an array of ScanKeyData structs: one per qual.
+	 *
+	 * Note: when we cannot convert all the quals to ScanKeys, then we waste
+	 * some memory but this avoids memory reallocation on the fly.
+	 */
+	scan_keys = (ScanKey) palloc(n_quals * sizeof(ScanKeyData));
+
+	/*
+	 * run-time_keys array is dynamically resized as needed. Caller must be
+	 * sure to pass in NULL/0 for first call.
+	 */
+	runtime_keys = *runtimeKeys;
+	n_runtime_keys = max_runtime_keys = *numRuntimeKeys;
+
+	foreach(qual_cell, quals)
+	{
+		Expr	   *clause = (Expr *) lfirst(qual_cell);
+		ScanKey		this_scan_key = &scan_keys[n_scan_keys];
+		RegProcedure opfuncid;	/* operator proc id used in scan */
+		Expr	   *leftop;		/* expr on lhs of operator */
+		Expr	   *rightop;	/* expr on rhs ... */
+		AttrNumber	varattno;	/* att number used in scan */
+
+		/*
+		 * Simple qual case: <leftop> <op> <rightop>
+		 */
+		if (IsA(clause, OpExpr))
+		{
+			int			flags = 0;
+			Datum		scanvalue;
+
+			opfuncid = ((OpExpr *) clause)->opfuncid;
+
+			/*
+			 * leftop and rightop are not relabeled and can be used as they
+			 * are because they have been pre-computed by
+			 * fix_tablequal_references(), so, the key Var is always on the
+			 * left.
+			 */
+			leftop = (Expr *) get_leftop(clause);
+			rightop = (Expr *) get_rightop(clause);
+
+			varattno = ((Var *) leftop)->varattno;
+
+			if (IsA(rightop, Const))
+			{
+				/*
+				 * OK, simple constant comparison value
+				 */
+				scanvalue = ((Const *) rightop)->constvalue;
+				if (((Const *) rightop)->constisnull)
+					flags |= SK_ISNULL;
+			}
+			else
+			{
+				/* Need to treat this one as a run-time key */
+				if (n_runtime_keys >= max_runtime_keys)
+				{
+					if (max_runtime_keys == 0)
+					{
+						max_runtime_keys = 8;
+						runtime_keys = (SeqScanRuntimeKeyInfo *)
+							palloc(max_runtime_keys * sizeof(SeqScanRuntimeKeyInfo));
+					}
+					else
+					{
+						max_runtime_keys *= 2;
+						runtime_keys = (SeqScanRuntimeKeyInfo *)
+							repalloc(runtime_keys,
+									 max_runtime_keys * sizeof(SeqScanRuntimeKeyInfo));
+					}
+				}
+				runtime_keys[n_runtime_keys].scan_key = this_scan_key;
+				runtime_keys[n_runtime_keys].key_expr =
+					ExecInitExpr(rightop, planstate);
+				runtime_keys[n_runtime_keys].key_toastable =
+					TypeIsToastable(((Var *) leftop)->vartype);
+				n_runtime_keys++;
+				scanvalue = (Datum) 0;
+			}
+
+			n_scan_keys++;
+
+			ScanKeyEntryInitialize(this_scan_key,
+								   flags,
+								   varattno,
+								   InvalidStrategy, /* no strategy */
+								   InvalidOid,	/* no subtype */
+								   ((OpExpr *) clause)->inputcollid,
+								   opfuncid,
+								   scanvalue);
+		}
+		else
+		{
+			/*
+			 * Unsupported qual, then do not push it to the table AM.
+			 */
+			continue;
+		}
+	}
+
+	/*
+	 * Return info to our caller.
+	 */
+	*scanKeys = scan_keys;
+	*numScanKeys = n_scan_keys;
+	*runtimeKeys = runtime_keys;
+	*numRuntimeKeys = n_runtime_keys;
+}
+
 /* ----------------------------------------------------------------
  *		SeqNext
  *
@@ -71,15 +214,47 @@ SeqNext(SeqScanState *node)
 		 */
 		scandesc = table_beginscan(node->ss.ss_currentRelation,
 								   estate->es_snapshot,
-								   0, NULL);
+								   node->sss_NumScanKeys,
+								   node->sss_ScanKeys);
 		node->ss.ss_currentScanDesc = scandesc;
+
+		/*
+		 * If no run-time key to calculate or if they are ready to use, go
+		 * ahead and pass the ScanKeys to the table AM.
+		 */
+		if (node->sss_NumRuntimeKeys == 0 || node->sss_RuntimeKeysReady)
+			table_rescan(node->ss.ss_currentScanDesc, node->sss_NumScanKeys,
+						 node->sss_ScanKeys);
 	}
 
 	/*
 	 * get the next tuple from the table
 	 */
 	if (table_scan_getnextslot(scandesc, direction, slot))
+	{
+		/*
+		 * Update the instrumentation counter in charge of tracking the number
+		 * of tuples skipped during table/seq scan.
+		 *
+		 * Note: it seems necessary to do it after getting each tuple only
+		 * when the table scan is executed by the postgres_fdw. In all other
+		 * cases, we can update the counter only once when there is no next
+		 * tuple to return.
+		 */
+		InstrCountFiltered3(node, scandesc->rs_nskip);
+
+		/*
+		 * We have to reset the local counter once the instrumentation counter
+		 * has been updated.
+		 */
+		scandesc->rs_nskip = 0;
+
 		return slot;
+	}
+
+	InstrCountFiltered3(node, scandesc->rs_nskip);
+	scandesc->rs_nskip = 0;
+
 	return NULL;
 }
 
@@ -115,6 +290,15 @@ ExecSeqScan(PlanState *pstate)
 	Assert(pstate->qual == NULL);
 	Assert(pstate->ps_ProjInfo == NULL);
 
+	/*
+	 * If we have run-time keys and they've not already been set up, do it
+	 * now.
+	 */
+	if (node->sss_NumRuntimeKeys != 0 && !node->sss_RuntimeKeysReady)
+	{
+		ExecReScan((PlanState *) node);
+	}
+
 	return ExecScanExtended(&node->ss,
 							(ExecScanAccessMtd) SeqNext,
 							(ExecScanRecheckMtd) SeqRecheck,
@@ -139,6 +323,15 @@ ExecSeqScanWithQual(PlanState *pstate)
 	pg_assume(pstate->qual != NULL);
 	Assert(pstate->ps_ProjInfo == NULL);
 
+	/*
+	 * If we have run-time keys and they've not already been set up, do it
+	 * now.
+	 */
+	if (node->sss_NumRuntimeKeys != 0 && !node->sss_RuntimeKeysReady)
+	{
+		ExecReScan((PlanState *) node);
+	}
+
 	return ExecScanExtended(&node->ss,
 							(ExecScanAccessMtd) SeqNext,
 							(ExecScanRecheckMtd) SeqRecheck,
@@ -159,6 +352,15 @@ ExecSeqScanWithProject(PlanState *pstate)
 	Assert(pstate->qual == NULL);
 	pg_assume(pstate->ps_ProjInfo != NULL);
 
+	/*
+	 * If we have run-time keys and they've not already been set up, do it
+	 * now.
+	 */
+	if (node->sss_NumRuntimeKeys != 0 && !node->sss_RuntimeKeysReady)
+	{
+		ExecReScan((PlanState *) node);
+	}
+
 	return ExecScanExtended(&node->ss,
 							(ExecScanAccessMtd) SeqNext,
 							(ExecScanRecheckMtd) SeqRecheck,
@@ -180,6 +382,15 @@ ExecSeqScanWithQualProject(PlanState *pstate)
 	pg_assume(pstate->qual != NULL);
 	pg_assume(pstate->ps_ProjInfo != NULL);
 
+	/*
+	 * If we have run-time keys and they've not already been set up, do it
+	 * now.
+	 */
+	if (node->sss_NumRuntimeKeys != 0 && !node->sss_RuntimeKeysReady)
+	{
+		ExecReScan((PlanState *) node);
+	}
+
 	return ExecScanExtended(&node->ss,
 							(ExecScanAccessMtd) SeqNext,
 							(ExecScanRecheckMtd) SeqRecheck,
@@ -198,6 +409,15 @@ ExecSeqScanEPQ(PlanState *pstate)
 {
 	SeqScanState *node = castNode(SeqScanState, pstate);
 
+	/*
+	 * If we have run-time keys and they've not already been set up, do it
+	 * now.
+	 */
+	if (node->sss_NumRuntimeKeys != 0 && !node->sss_RuntimeKeysReady)
+	{
+		ExecReScan((PlanState *) node);
+	}
+
 	return ExecScan(&node->ss,
 					(ExecScanAccessMtd) SeqNext,
 					(ExecScanRecheckMtd) SeqRecheck);
@@ -225,6 +445,11 @@ ExecInitSeqScan(SeqScan *node, EState *estate, int eflags)
 	scanstate = makeNode(SeqScanState);
 	scanstate->ss.ps.plan = (Plan *) node;
 	scanstate->ss.ps.state = estate;
+	scanstate->sss_ScanKeys = NULL;
+	scanstate->sss_NumScanKeys = 0;
+	scanstate->sss_RuntimeKeysReady = false;
+	scanstate->sss_RuntimeKeys = NULL;
+	scanstate->sss_NumRuntimeKeys = 0;
 
 	/*
 	 * Miscellaneous initialization
@@ -258,6 +483,14 @@ ExecInitSeqScan(SeqScan *node, EState *estate, int eflags)
 	scanstate->ss.ps.qual =
 		ExecInitQual(node->scan.plan.qual, (PlanState *) scanstate);
 
+	/* Build sequential scan keys */
+	ExecSeqBuildScanKeys((PlanState *) scanstate,
+						 node->tablequal,
+						 &scanstate->sss_NumScanKeys,
+						 &scanstate->sss_ScanKeys,
+						 &scanstate->sss_RuntimeKeys,
+						 &scanstate->sss_NumRuntimeKeys);
+
 	/*
 	 * When EvalPlanQual() is not in use, assign ExecProcNode for this node
 	 * based on the presence of qual and projection. Each ExecSeqScan*()
@@ -280,6 +513,24 @@ ExecInitSeqScan(SeqScan *node, EState *estate, int eflags)
 			scanstate->ss.ps.ExecProcNode = ExecSeqScanWithQualProject;
 	}
 
+	/*
+	 * If we have runtime keys, we need an ExprContext to evaluate them. The
+	 * node's standard context won't do because we want to reset that context
+	 * for every tuple.  So, build another context just like the other one...
+	 */
+	if (scanstate->sss_NumRuntimeKeys != 0)
+	{
+		ExprContext *stdecontext = scanstate->ss.ps.ps_ExprContext;
+
+		ExecAssignExprContext(estate, &scanstate->ss.ps);
+		scanstate->sss_RuntimeContext = scanstate->ss.ps.ps_ExprContext;
+		scanstate->ss.ps.ps_ExprContext = stdecontext;
+	}
+	else
+	{
+		scanstate->sss_RuntimeContext = NULL;
+	}
+
 	return scanstate;
 }
 
@@ -322,16 +573,91 @@ ExecReScanSeqScan(SeqScanState *node)
 {
 	TableScanDesc scan;
 
+	/*
+	 * If we are doing runtime key calculations (ie, any of the scan key
+	 * values weren't simple Consts), compute the new key values.  But first,
+	 * reset the context so we don't leak memory as each outer tuple is
+	 * scanned.  Note this assumes that we will recalculate *all* runtime keys
+	 * on each call.
+	 */
+	if (node->sss_NumRuntimeKeys != 0)
+	{
+		ExprContext *econtext = node->sss_RuntimeContext;
+
+		ResetExprContext(econtext);
+		ExecSeqScanEvalRuntimeKeys(econtext,
+								   node->sss_RuntimeKeys,
+								   node->sss_NumRuntimeKeys);
+	}
+	node->sss_RuntimeKeysReady = true;
+
+
 	scan = node->ss.ss_currentScanDesc;
 
 	if (scan != NULL)
 		table_rescan(scan,		/* scan desc */
-					 0,			/* number of scan keys */
-					 NULL);		/* new scan keys */
+					 node->sss_NumScanKeys, /* number of scan keys */
+					 node->sss_ScanKeys);	/* scan keys */
 
 	ExecScanReScan((ScanState *) node);
 }
 
+/* ----------------------------------------------------------------
+ * 		ExecSeqScanEvalRuntimeKeys
+ *
+ * 		Evaluate any run-time key values, and update the scankeys.
+ * ----------------------------------------------------------------
+ */
+void
+ExecSeqScanEvalRuntimeKeys(ExprContext *econtext,
+						   SeqScanRuntimeKeyInfo * runtimeKeys,
+						   int numRuntimeKeys)
+{
+	int			j;
+	MemoryContext oldContext;
+
+	/* We want to keep the key values in per-tuple memory */
+	oldContext = MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);
+
+	for (j = 0; j < numRuntimeKeys; j++)
+	{
+		ScanKey		scan_key = runtimeKeys[j].scan_key;
+		ExprState  *key_expr = runtimeKeys[j].key_expr;
+		Datum		scanvalue;
+		bool		isNull;
+
+		/*
+		 * For each run-time key, extract the run-time expression and evaluate
+		 * it with respect to the current context.  We then stick the result
+		 * into the proper scan key.
+		 *
+		 * Note: the result of the eval could be a pass-by-ref value that's
+		 * stored in some outer scan's tuple, not in
+		 * econtext->ecxt_per_tuple_memory.  We assume that the outer tuple
+		 * will stay put throughout our scan.  If this is wrong, we could copy
+		 * the result into our context explicitly, but I think that's not
+		 * necessary.
+		 */
+		scanvalue = ExecEvalExpr(key_expr,
+								 econtext,
+								 &isNull);
+		if (isNull)
+		{
+			scan_key->sk_argument = scanvalue;
+			scan_key->sk_flags |= SK_ISNULL;
+		}
+		else
+		{
+			if (runtimeKeys[j].key_toastable)
+				scanvalue = PointerGetDatum(PG_DETOAST_DATUM(scanvalue));
+			scan_key->sk_argument = scanvalue;
+			scan_key->sk_flags &= ~SK_ISNULL;
+		}
+	}
+
+	MemoryContextSwitchTo(oldContext);
+}
+
 /* ----------------------------------------------------------------
  *						Parallel Scan Support
  * ----------------------------------------------------------------
@@ -375,7 +701,17 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0, NULL);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan,
+								 node->sss_NumScanKeys,
+								 node->sss_ScanKeys);
+
+	/*
+	 * If no run-time keys to calculate or they are ready, go ahead and pass
+	 * the scankeys to the table AM.
+	 */
+	if (node->sss_NumRuntimeKeys == 0 || node->sss_RuntimeKeysReady)
+		table_rescan(node->ss.ss_currentScanDesc, node->sss_NumScanKeys,
+					 node->sss_ScanKeys);
 }
 
 /* ----------------------------------------------------------------
@@ -408,5 +744,15 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0, NULL);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan,
+								 node->sss_NumScanKeys,
+								 node->sss_ScanKeys);
+
+	/*
+	 * If no run-time keys to calculate or they are ready, go ahead and pass
+	 * the scankeys to the table AM.
+	 */
+	if (node->sss_NumRuntimeKeys == 0 || node->sss_RuntimeKeysReady)
+		table_rescan(node->ss.ss_currentScanDesc, node->sss_NumScanKeys,
+					 node->sss_ScanKeys);
 }
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 6791cbeb416..bb0856ac0bc 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -20,6 +20,7 @@
 
 #include "access/sysattr.h"
 #include "catalog/pg_class.h"
+#include "executor/executor.h"
 #include "foreign/fdwapi.h"
 #include "miscadmin.h"
 #include "nodes/extensible.h"
@@ -40,8 +41,10 @@
 #include "optimizer/tlist.h"
 #include "parser/parse_clause.h"
 #include "parser/parsetree.h"
+#include "parser/parse_relation.h"
 #include "partitioning/partprune.h"
 #include "tcop/tcopprot.h"
+#include "utils/acl.h"
 #include "utils/lsyscache.h"
 
 
@@ -170,6 +173,8 @@ static Node *fix_indexqual_clause(PlannerInfo *root,
 								  IndexOptInfo *index, int indexcol,
 								  Node *clause, List *indexcolnos);
 static Node *fix_indexqual_operand(Node *node, IndexOptInfo *index, int indexcol);
+static void fix_tablequal_references(PlannerInfo *root, Path *best_path,
+									 List *scan_clauses, List **fixed_tablequals_p);
 static List *get_switched_clauses(List *clauses, Relids outerrelids);
 static List *order_qual_clauses(PlannerInfo *root, List *clauses);
 static void copy_generic_path_info(Plan *dest, Path *src);
@@ -178,7 +183,7 @@ static void label_sort_with_costsize(PlannerInfo *root, Sort *plan,
 									 double limit_tuples);
 static void label_incrementalsort_with_costsize(PlannerInfo *root, IncrementalSort *plan,
 												List *pathkeys, double limit_tuples);
-static SeqScan *make_seqscan(List *qptlist, List *qpqual, Index scanrelid);
+static SeqScan *make_seqscan(List *qptlist, List *qpqual, Index scanrelid, List *tablequal);
 static SampleScan *make_samplescan(List *qptlist, List *qpqual, Index scanrelid,
 								   TableSampleClause *tsc);
 static IndexScan *make_indexscan(List *qptlist, List *qpqual, Index scanrelid,
@@ -2760,10 +2765,16 @@ create_seqscan_plan(PlannerInfo *root, Path *best_path,
 {
 	SeqScan    *scan_plan;
 	Index		scan_relid = best_path->parent->relid;
+	List	   *fixed_tablequals = NIL;
+	RangeTblEntry *rte;
+	RTEPermissionInfo *perminfo;
+	bool		do_fix_tablequal_ref = true;
 
 	/* it should be a base rel... */
 	Assert(scan_relid > 0);
+	rte = planner_rt_fetch(scan_relid, root);
 	Assert(best_path->parent->rtekind == RTE_RELATION);
+	Assert(rte->rtekind == RTE_RELATION);
 
 	/* Sort clauses into best execution order */
 	scan_clauses = order_qual_clauses(root, scan_clauses);
@@ -2778,9 +2789,25 @@ create_seqscan_plan(PlannerInfo *root, Path *best_path,
 			replace_nestloop_params(root, (Node *) scan_clauses);
 	}
 
+	/*
+	 * Check relation permission before doing any preliminary work on quals.
+	 * If the permissions can't be checked, then we won't do unnecessary work
+	 * related to quals push down.
+	 */
+	if (rte->perminfoindex != 0)
+	{
+		perminfo = getRTEPermissionInfo(root->parse->rteperminfos, rte);
+		if (!ExecCheckOneRelPerms(perminfo))
+			do_fix_tablequal_ref = false;
+	}
+
+	if (do_fix_tablequal_ref)
+		fix_tablequal_references(root, best_path, scan_clauses, &fixed_tablequals);
+
 	scan_plan = make_seqscan(tlist,
 							 scan_clauses,
-							 scan_relid);
+							 scan_relid,
+							 fixed_tablequals);
 
 	copy_generic_path_info(&scan_plan->scan.plan, best_path);
 
@@ -5172,6 +5199,184 @@ fix_indexqual_operand(Node *node, IndexOptInfo *index, int indexcol)
 	return NULL;				/* keep compiler quiet */
 }
 
+/*
+ * Check if the right part of a qual can be used in a ScanKey that will
+ * later be pushed down during sequential scan.
+ */
+static bool inline
+check_tablequal_rightop(Expr *rightop)
+{
+	switch (nodeTag((Node *) rightop))
+	{
+			/* Supported nodes */
+		case T_Const:
+		case T_Param:
+			break;
+
+			/*
+			 * In case of function expression, make sure function args do not
+			 * contain any reference to the table being scanned.
+			 */
+		case T_FuncExpr:
+			{
+				FuncExpr   *func = (FuncExpr *) rightop;
+				ListCell   *temp;
+
+				foreach(temp, func->args)
+				{
+					Node	   *arg = lfirst(temp);
+
+					if (IsA(arg, Var) && ((Var *) arg)->varattno > 0)
+						return false;
+				}
+
+				break;
+			}
+
+			/*
+			 * In case of Var, check if this is an attribute of a relation,
+			 * which is not supported.
+			 */
+		case T_Var:
+			{
+				if (((Var *) rightop)->varattno > 0)
+					return false;
+				break;
+			}
+			/* Unsupported nodes */
+		default:
+			return false;
+			break;
+	}
+
+	return true;
+}
+
+/*
+ * fix_tablequal_references
+ *    Precompute scan clauses in order to pass them ready to be pushed down by
+ *    the executor during table scan.
+ *
+ * We do left/right commutation if needed because we want to keep the scan key
+ * on left.
+ */
+static void
+fix_tablequal_references(PlannerInfo *root, Path *best_path,
+						 List *scan_clauses, List **fixed_tablequals_p)
+{
+	List	   *fixed_tablequals;
+	ListCell   *lc;
+
+	fixed_tablequals = NIL;
+
+	scan_clauses = (List *) replace_nestloop_params(root, (Node *) scan_clauses);
+
+	foreach(lc, scan_clauses)
+	{
+		/*
+		 * Let work with a "deep" copy of the original scan clause in order to
+		 * avoid any update on the initial scan clause.
+		 */
+		Expr	   *clause = (Expr *) copyObject(lfirst(lc));
+
+		switch (nodeTag((Node *) clause))
+		{
+				/*
+				 * Simple qual case: <leftop> <op> <rightop>
+				 */
+			case T_OpExpr:
+				{
+					OpExpr	   *opexpr = (OpExpr *) clause;
+					Expr	   *leftop;
+					Expr	   *rightop;
+
+					leftop = (Expr *) get_leftop(clause);
+					rightop = (Expr *) get_rightop(clause);
+
+					if (leftop && IsA(leftop, RelabelType))
+						leftop = ((RelabelType *) leftop)->arg;
+
+					if (rightop && IsA(rightop, RelabelType))
+						rightop = ((RelabelType *) rightop)->arg;
+
+					if (leftop == NULL || rightop == NULL)
+						continue;
+
+					/*
+					 * Ignore qual if the operator is user defined
+					 */
+					if (opexpr->opno >= FirstNormalObjectId)
+						continue;
+
+					/*
+					 * Ignore qual if the function is not leakproof
+					 */
+					if (!get_func_leakproof(opexpr->opfuncid))
+						continue;
+
+					/*
+					 * Commute left and right if needed and reflect those
+					 * changes on the clause, this way, the executor won't
+					 * have to check positions of Var and Const/other: Var is
+					 * always on the left while Const/other is on the right.
+					 */
+					if (IsA(rightop, Var) && !IsA(leftop, Var)
+						&& ((Var *) rightop)->varattno > 0)
+					{
+						Expr	   *tmpop = leftop;
+						Oid			commutator;
+
+						leftop = rightop;
+						rightop = tmpop;
+
+						commutator = get_commutator(opexpr->opno);
+
+						if (OidIsValid(commutator))
+						{
+							opexpr->opno = commutator;
+							opexpr->opfuncid = get_opcode(opexpr->opno);
+						}
+						else
+						{
+							/*
+							 * If we don't have any commutator function
+							 * available for this operator, then ignore the
+							 * qual because we cannot commute it.
+							 */
+							continue;
+						}
+					}
+
+					/*
+					 * Make sure our left part is a Var referencing an
+					 * attribute.
+					 */
+					if (!(IsA(leftop, Var) && ((Var *) leftop)->varattno > 0))
+						continue;
+
+					if (!check_tablequal_rightop(rightop))
+						continue;
+
+					/*
+					 * Even if there is no left/right commutation, update the
+					 * clause in order to avoid unnecessary checks by the
+					 * executor.
+					 */
+					list_free(opexpr->args);
+					opexpr->args = list_make2(leftop, rightop);
+
+					/* Append the modified clause to fixed_tablequals */
+					fixed_tablequals = lappend(fixed_tablequals, clause);
+					break;
+				}
+			default:
+				continue;
+		}
+	}
+
+	*fixed_tablequals_p = fixed_tablequals;
+}
+
 /*
  * get_switched_clauses
  *	  Given a list of merge or hash joinclauses (as RestrictInfo nodes),
@@ -5484,7 +5689,8 @@ bitmap_subplan_mark_shared(Plan *plan)
 static SeqScan *
 make_seqscan(List *qptlist,
 			 List *qpqual,
-			 Index scanrelid)
+			 Index scanrelid,
+			 List *tablequal)
 {
 	SeqScan    *node = makeNode(SeqScan);
 	Plan	   *plan = &node->scan.plan;
@@ -5494,6 +5700,7 @@ make_seqscan(List *qptlist,
 	plan->lefttree = NULL;
 	plan->righttree = NULL;
 	node->scan.scanrelid = scanrelid;
+	node->tablequal = tablequal;
 
 	return node;
 }
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index b5e0fb386c0..9549bc29f38 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -65,6 +65,7 @@ typedef struct TableScanDescData
 
 	struct ParallelTableScanDescData *rs_parallel;	/* parallel scan
 													 * information */
+	uint64		rs_nskip;		/* number of tuples skipped during table scan */
 } TableScanDescData;
 typedef struct TableScanDescData *TableScanDesc;
 
diff --git a/src/include/executor/instrument.h b/src/include/executor/instrument.h
index 03653ab6c6c..8e07a57a767 100644
--- a/src/include/executor/instrument.h
+++ b/src/include/executor/instrument.h
@@ -87,8 +87,11 @@ typedef struct Instrumentation
 	double		ntuples;		/* total tuples produced */
 	double		ntuples2;		/* secondary node-specific tuple counter */
 	double		nloops;			/* # of run cycles for this node */
-	double		nfiltered1;		/* # of tuples removed by scanqual or joinqual */
-	double		nfiltered2;		/* # of tuples removed by "other" quals */
+	double		nfiltered1;		/* # of tuples in executor removed by scanqual
+								 * or joinqual */
+	double		nfiltered2;		/* # of tuples in executor removed by "other"
+								 * quals */
+	double		nfiltered3;		/* # of tuples in table AM removed by quals */
 	BufferUsage bufusage;		/* total buffer usage */
 	WalUsage	walusage;		/* total WAL usage */
 } Instrumentation;
diff --git a/src/include/executor/nodeSeqscan.h b/src/include/executor/nodeSeqscan.h
index 3adad8b585b..6285ebc0e58 100644
--- a/src/include/executor/nodeSeqscan.h
+++ b/src/include/executor/nodeSeqscan.h
@@ -20,6 +20,9 @@
 extern SeqScanState *ExecInitSeqScan(SeqScan *node, EState *estate, int eflags);
 extern void ExecEndSeqScan(SeqScanState *node);
 extern void ExecReScanSeqScan(SeqScanState *node);
+extern void ExecSeqScanEvalRuntimeKeys(ExprContext *econtext,
+									   SeqScanRuntimeKeyInfo * runtimeKeys,
+									   int numRuntimeKeys);
 
 /* parallel scan support */
 extern void ExecSeqScanEstimate(SeqScanState *node, ParallelContext *pcxt);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index de782014b2d..ec2d42c57b4 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1273,6 +1273,11 @@ typedef struct PlanState
 		if (((PlanState *)(node))->instrument) \
 			((PlanState *)(node))->instrument->nfiltered2 += (delta); \
 	} while(0)
+#define InstrCountFiltered3(node, delta) \
+	do { \
+		if (((PlanState *)(node))->instrument) \
+			((PlanState *)(node))->instrument->nfiltered3 += (delta); \
+	} while(0)
 
 /*
  * EPQState is state for executing an EvalPlanQual recheck on a candidate
@@ -1621,14 +1626,38 @@ typedef struct ScanState
 	TupleTableSlot *ss_ScanTupleSlot;
 } ScanState;
 
+typedef struct RuntimeKeyInfo
+{
+	struct ScanKeyData *scan_key;	/* scankey to put value into */
+	ExprState  *key_expr;		/* expr to evaluate to get value */
+	bool		key_toastable;	/* is expr's result a toastable datatype? */
+}			RuntimeKeyInfo;
+
+typedef struct RuntimeKeyInfo SeqScanRuntimeKeyInfo;
+
 /* ----------------
  *	 SeqScanState information
+ *
+ *		ss					its first field is NodeTag
+ *		pscan_len			size of parallel heap scan descriptor
+ *		sss_ScanKeys		Skeys array used to push down quals
+ *		sss_NumScanKeys		number of Skeys
+ *		sss_RuntimeKeys		info about Skeys that must be evaluated at runtime
+ *		sss_NumRuntimeKeys	number of RuntimeKeys
+ *		sss_RuntimeKeysReady true if runtime Skeys have been computed
+ *		sss_RuntimeContext	expr context for evaling runtime Skeys
  * ----------------
  */
 typedef struct SeqScanState
 {
-	ScanState	ss;				/* its first field is NodeTag */
-	Size		pscan_len;		/* size of parallel heap scan descriptor */
+	ScanState	ss;
+	Size		pscan_len;
+	struct ScanKeyData *sss_ScanKeys;
+	int			sss_NumScanKeys;
+	SeqScanRuntimeKeyInfo *sss_RuntimeKeys;
+	int			sss_NumRuntimeKeys;
+	bool		sss_RuntimeKeysReady;
+	ExprContext *sss_RuntimeContext;
 } SeqScanState;
 
 /* ----------------
@@ -1657,12 +1686,7 @@ typedef struct SampleScanState
  * constant right-hand sides.  See comments for ExecIndexBuildScanKeys()
  * for discussion.
  */
-typedef struct
-{
-	struct ScanKeyData *scan_key;	/* scankey to put value into */
-	ExprState  *key_expr;		/* expr to evaluate to get value */
-	bool		key_toastable;	/* is expr's result a toastable datatype? */
-} IndexRuntimeKeyInfo;
+typedef struct RuntimeKeyInfo IndexRuntimeKeyInfo;
 
 typedef struct
 {
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 29d7732d6a0..595bb7f5e5a 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -501,6 +501,8 @@ typedef struct Scan
 typedef struct SeqScan
 {
 	Scan		scan;
+	/* list of quals (usually OpExprs) pushed down to the table AM */
+	List	   *tablequal;
 } SeqScan;
 
 /* ----------------
diff --git a/src/test/isolation/expected/stats.out b/src/test/isolation/expected/stats.out
index 8c7fe60217e..0064c0c8df0 100644
--- a/src/test/isolation/expected/stats.out
+++ b/src/test/isolation/expected/stats.out
@@ -2414,7 +2414,7 @@ step s1_table_stats:
 
 seq_scan|seq_tup_read|n_tup_ins|n_tup_upd|n_tup_del|n_live_tup|n_dead_tup|vacuum_count
 --------+------------+---------+---------+---------+----------+----------+------------
-       3|           6|        1|        1|        0|         1|         1|           0
+       3|           5|        1|        1|        0|         1|         1|           0
 (1 row)
 
 
@@ -2476,7 +2476,7 @@ step s1_table_stats:
 
 seq_scan|seq_tup_read|n_tup_ins|n_tup_upd|n_tup_del|n_live_tup|n_dead_tup|vacuum_count
 --------+------------+---------+---------+---------+----------+----------+------------
-       3|           5|        2|        0|        1|         1|         1|           0
+       3|           4|        2|        0|        1|         1|         1|           0
 (1 row)
 
 step s1_table_select: SELECT * FROM test_stat_tab ORDER BY key, value;
@@ -2508,7 +2508,7 @@ step s1_table_stats:
 
 seq_scan|seq_tup_read|n_tup_ins|n_tup_upd|n_tup_del|n_live_tup|n_dead_tup|vacuum_count
 --------+------------+---------+---------+---------+----------+----------+------------
-       5|           9|        2|        1|        1|         1|         2|           0
+       5|           7|        2|        1|        1|         1|         2|           0
 (1 row)
 
 
@@ -2571,7 +2571,7 @@ step s1_table_stats:
 
 seq_scan|seq_tup_read|n_tup_ins|n_tup_upd|n_tup_del|n_live_tup|n_dead_tup|vacuum_count
 --------+------------+---------+---------+---------+----------+----------+------------
-       9|          31|        4|        5|        1|         3|         6|           0
+       9|          13|        4|        5|        1|         3|         6|           0
 (1 row)
 
 
@@ -2640,7 +2640,7 @@ step s1_table_stats:
 
 seq_scan|seq_tup_read|n_tup_ins|n_tup_upd|n_tup_del|n_live_tup|n_dead_tup|vacuum_count
 --------+------------+---------+---------+---------+----------+----------+------------
-       9|          31|        4|        5|        1|         3|         6|           0
+       9|          13|        4|        5|        1|         3|         6|           0
 (1 row)
 
 
@@ -2701,7 +2701,7 @@ step s1_table_stats:
 
 seq_scan|seq_tup_read|n_tup_ins|n_tup_upd|n_tup_del|n_live_tup|n_dead_tup|vacuum_count
 --------+------------+---------+---------+---------+----------+----------+------------
-       9|          29|        4|        5|        1|         1|         8|           0
+       9|          11|        4|        5|        1|         1|         8|           0
 (1 row)
 
 
@@ -2768,7 +2768,7 @@ step s1_table_stats:
 
 seq_scan|seq_tup_read|n_tup_ins|n_tup_upd|n_tup_del|n_live_tup|n_dead_tup|vacuum_count
 --------+------------+---------+---------+---------+----------+----------+------------
-       9|          29|        4|        5|        1|         1|         8|           0
+       9|          11|        4|        5|        1|         1|         8|           0
 (1 row)
 
 
@@ -2808,7 +2808,7 @@ step s1_table_stats:
 
 seq_scan|seq_tup_read|n_tup_ins|n_tup_upd|n_tup_del|n_live_tup|n_dead_tup|vacuum_count
 --------+------------+---------+---------+---------+----------+----------+------------
-       3|           9|        5|        1|        0|         1|         1|           0
+       3|           3|        5|        1|        0|         1|         1|           0
 (1 row)
 
 
@@ -2854,7 +2854,7 @@ step s1_table_stats:
 
 seq_scan|seq_tup_read|n_tup_ins|n_tup_upd|n_tup_del|n_live_tup|n_dead_tup|vacuum_count
 --------+------------+---------+---------+---------+----------+----------+------------
-       3|           9|        5|        1|        0|         1|         1|           0
+       3|           3|        5|        1|        0|         1|         1|           0
 (1 row)
 
 
@@ -2894,7 +2894,7 @@ step s1_table_stats:
 
 seq_scan|seq_tup_read|n_tup_ins|n_tup_upd|n_tup_del|n_live_tup|n_dead_tup|vacuum_count
 --------+------------+---------+---------+---------+----------+----------+------------
-       3|           9|        4|        2|        0|         4|         2|           0
+       3|           3|        4|        2|        0|         4|         2|           0
 (1 row)
 
 
@@ -2940,7 +2940,7 @@ step s1_table_stats:
 
 seq_scan|seq_tup_read|n_tup_ins|n_tup_upd|n_tup_del|n_live_tup|n_dead_tup|vacuum_count
 --------+------------+---------+---------+---------+----------+----------+------------
-       3|           9|        4|        2|        0|         4|         2|           0
+       3|           3|        4|        2|        0|         4|         2|           0
 (1 row)
 
 
@@ -2981,7 +2981,7 @@ step s1_table_stats:
 
 seq_scan|seq_tup_read|n_tup_ins|n_tup_upd|n_tup_del|n_live_tup|n_dead_tup|vacuum_count
 --------+------------+---------+---------+---------+----------+----------+------------
-       4|          16|        5|        3|        1|         4|         4|           0
+       4|           4|        5|        3|        1|         4|         4|           0
 (1 row)
 
 
@@ -3028,7 +3028,7 @@ step s1_table_stats:
 
 seq_scan|seq_tup_read|n_tup_ins|n_tup_upd|n_tup_del|n_live_tup|n_dead_tup|vacuum_count
 --------+------------+---------+---------+---------+----------+----------+------------
-       4|          16|        5|        3|        1|         4|         4|           0
+       4|           4|        5|        3|        1|         4|         4|           0
 (1 row)
 
 
diff --git a/src/test/regress/expected/memoize.out b/src/test/regress/expected/memoize.out
index 150dc1b44cf..f2a92d1fdfd 100644
--- a/src/test/regress/expected/memoize.out
+++ b/src/test/regress/expected/memoize.out
@@ -43,7 +43,7 @@ WHERE t2.unique1 < 1000;', false);
    ->  Nested Loop (actual rows=1000.00 loops=N)
          ->  Seq Scan on tenk1 t2 (actual rows=1000.00 loops=N)
                Filter: (unique1 < 1000)
-               Rows Removed by Filter: 9000
+               Rows Removed In Table AM by Filter: 9000
          ->  Memoize (actual rows=1.00 loops=N)
                Cache Key: t2.twenty
                Cache Mode: logical
@@ -75,7 +75,7 @@ WHERE t1.unique1 < 1000;', false);
    ->  Nested Loop (actual rows=1000.00 loops=N)
          ->  Seq Scan on tenk1 t1 (actual rows=1000.00 loops=N)
                Filter: (unique1 < 1000)
-               Rows Removed by Filter: 9000
+               Rows Removed In Table AM by Filter: 9000
          ->  Memoize (actual rows=1.00 loops=N)
                Cache Key: t1.twenty
                Cache Mode: binary
@@ -117,7 +117,7 @@ WHERE t1.unique1 < 10;', false);
                Hits: 8  Misses: 2  Evictions: Zero  Overflows: 0  Memory Usage: NkB
                ->  Subquery Scan on t2 (actual rows=2.00 loops=N)
                      Filter: (t1.two = t2.two)
-                     Rows Removed by Filter: 2
+                     Rows Removed In Executor by Filter: 2
                      ->  Index Scan using tenk1_unique1 on tenk1 t2_1 (actual rows=4.00 loops=N)
                            Index Cond: (unique1 < 4)
                            Index Searches: N
@@ -146,14 +146,14 @@ WHERE s.c1 = s.c2 AND t1.unique1 < 1000;', false);
    ->  Nested Loop (actual rows=1000.00 loops=N)
          ->  Seq Scan on tenk1 t1 (actual rows=1000.00 loops=N)
                Filter: (unique1 < 1000)
-               Rows Removed by Filter: 9000
+               Rows Removed In Table AM by Filter: 9000
          ->  Memoize (actual rows=1.00 loops=N)
                Cache Key: (t1.two + 1)
                Cache Mode: binary
                Hits: 998  Misses: 2  Evictions: Zero  Overflows: 0  Memory Usage: NkB
                ->  Index Only Scan using tenk1_unique1 on tenk1 t2 (actual rows=1.00 loops=N)
                      Filter: ((t1.two + 1) = unique1)
-                     Rows Removed by Filter: 9999
+                     Rows Removed In Executor by Filter: 9999
                      Heap Fetches: N
                      Index Searches: N
 (14 rows)
@@ -179,15 +179,16 @@ WHERE s.c1 = s.c2 AND t1.unique1 < 1000;', false);
    ->  Nested Loop (actual rows=1000.00 loops=N)
          ->  Seq Scan on tenk1 t1 (actual rows=1000.00 loops=N)
                Filter: (unique1 < 1000)
-               Rows Removed by Filter: 9000
+               Rows Removed In Table AM by Filter: 9000
          ->  Memoize (actual rows=1.00 loops=N)
                Cache Key: t1.two, t1.twenty
                Cache Mode: binary
                Hits: 980  Misses: 20  Evictions: Zero  Overflows: 0  Memory Usage: NkB
                ->  Seq Scan on tenk1 t2 (actual rows=1.00 loops=N)
                      Filter: ((t1.twenty = unique1) AND (t1.two = two))
-                     Rows Removed by Filter: 9999
-(12 rows)
+                     Rows Removed In Table AM by Filter: 5000
+                     Rows Removed In Executor by Filter: 4999
+(13 rows)
 
 -- And check we get the expected results.
 SELECT COUNT(*), AVG(t1.twenty) FROM tenk1 t1 LEFT JOIN
@@ -246,7 +247,7 @@ WHERE t2.unique1 < 1200;', true);
    ->  Nested Loop (actual rows=1200.00 loops=N)
          ->  Seq Scan on tenk1 t2 (actual rows=1200.00 loops=N)
                Filter: (unique1 < 1200)
-               Rows Removed by Filter: 8800
+               Rows Removed In Table AM by Filter: 8800
          ->  Memoize (actual rows=1.00 loops=N)
                Cache Key: t2.thousand
                Cache Mode: logical
@@ -522,7 +523,7 @@ WHERE t2.a IS NULL;', false);
                Hits: 97  Misses: 3  Evictions: Zero  Overflows: 0  Memory Usage: NkB
                ->  Subquery Scan on t2 (actual rows=0.67 loops=N)
                      Filter: ((t1.a + 1) = t2.a)
-                     Rows Removed by Filter: 2
+                     Rows Removed In Executor by Filter: 2
                      ->  Unique (actual rows=2.67 loops=N)
                            ->  Sort (actual rows=67.33 loops=N)
                                  Sort Key: t2_1.a
diff --git a/src/test/regress/expected/merge.out b/src/test/regress/expected/merge.out
index cf2219df754..f8b9172df20 100644
--- a/src/test/regress/expected/merge.out
+++ b/src/test/regress/expected/merge.out
@@ -1801,7 +1801,7 @@ WHEN MATCHED AND t.a < 10 THEN
                Sort Method: quicksort  Memory: xxx
                ->  Seq Scan on ex_mtarget t (actual rows=0.00 loops=1)
                      Filter: (a < '-1000'::integer)
-                     Rows Removed by Filter: 54
+                     Rows Removed In Table AM by Filter: 54
          ->  Sort (never executed)
                Sort Key: s.a
                ->  Seq Scan on ex_msource s (never executed)
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index d1966cd7d82..c633e7089ce 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -2333,16 +2333,16 @@ explain (analyze, costs off, summary off, timing off, buffers off) select * from
  Append (actual rows=0.00 loops=1)
    ->  Seq Scan on list_part1 list_part_1 (actual rows=0.00 loops=1)
          Filter: (a = (list_part_fn(1) + a))
-         Rows Removed by Filter: 1
+         Rows Removed In Executor by Filter: 1
    ->  Seq Scan on list_part2 list_part_2 (actual rows=0.00 loops=1)
          Filter: (a = (list_part_fn(1) + a))
-         Rows Removed by Filter: 1
+         Rows Removed In Executor by Filter: 1
    ->  Seq Scan on list_part3 list_part_3 (actual rows=0.00 loops=1)
          Filter: (a = (list_part_fn(1) + a))
-         Rows Removed by Filter: 1
+         Rows Removed In Executor by Filter: 1
    ->  Seq Scan on list_part4 list_part_4 (actual rows=0.00 loops=1)
          Filter: (a = (list_part_fn(1) + a))
-         Rows Removed by Filter: 1
+         Rows Removed In Executor by Filter: 1
 (13 rows)
 
 rollback;
@@ -2368,7 +2368,7 @@ begin
     loop
         ln := regexp_replace(ln, 'Workers Launched: \d+', 'Workers Launched: N');
         ln := regexp_replace(ln, 'actual rows=\d+(?:\.\d+)? loops=\d+', 'actual rows=N loops=N');
-        ln := regexp_replace(ln, 'Rows Removed by Filter: \d+', 'Rows Removed by Filter: N');
+        ln := regexp_replace(ln, 'Rows Removed In Executor by Filter: \d+', 'Rows Removed In Executor by Filter: N');
         perform regexp_matches(ln, 'Index Searches: \d+');
         if found then
           continue;
@@ -2610,7 +2610,7 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                ->  Nested Loop (actual rows=N loops=N)
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{1,0,0}'::integer[]))
-                           Rows Removed by Filter: N
+                           Rows Removed In Executor by Filter: N
                      ->  Append (actual rows=N loops=N)
                            ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
                                  Index Cond: (a = a.a)
@@ -2644,7 +2644,7 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                ->  Nested Loop (actual rows=N loops=N)
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{1,0,0}'::integer[]))
-                           Rows Removed by Filter: N
+                           Rows Removed In Executor by Filter: N
                      ->  Append (actual rows=N loops=N)
                            ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (never executed)
                                  Index Cond: (a = a.a)
@@ -2866,7 +2866,7 @@ explain (analyze, costs off, summary off, timing off, buffers off) execute ab_q6
          Filter: ((a = $1) AND (b = (InitPlan 1).col1))
    ->  Seq Scan on xy_1 (actual rows=0.00 loops=1)
          Filter: ((x = $1) AND (y = (InitPlan 1).col1))
-         Rows Removed by Filter: 1
+         Rows Removed In Table AM by Filter: 1
    ->  Seq Scan on ab_a1_b1 ab_4 (never executed)
          Filter: ((a = $1) AND (b = (InitPlan 1).col1))
    ->  Seq Scan on ab_a1_b2 ab_5 (never executed)
@@ -3529,7 +3529,7 @@ select * from boolp where a = (select value from boolvalues where value);
    InitPlan 1
      ->  Seq Scan on boolvalues (actual rows=1.00 loops=1)
            Filter: value
-           Rows Removed by Filter: 1
+           Rows Removed In Executor by Filter: 1
    ->  Seq Scan on boolp_f boolp_1 (never executed)
          Filter: (a = (InitPlan 1).col1)
    ->  Seq Scan on boolp_t boolp_2 (actual rows=0.00 loops=1)
@@ -3544,7 +3544,7 @@ select * from boolp where a = (select value from boolvalues where not value);
    InitPlan 1
      ->  Seq Scan on boolvalues (actual rows=1.00 loops=1)
            Filter: (NOT value)
-           Rows Removed by Filter: 1
+           Rows Removed In Executor by Filter: 1
    ->  Seq Scan on boolp_f boolp_1 (actual rows=0.00 loops=1)
          Filter: (a = (InitPlan 1).col1)
    ->  Seq Scan on boolp_t boolp_2 (never executed)
@@ -3573,11 +3573,11 @@ explain (analyze, costs off, summary off, timing off, buffers off) execute mt_q1
    Subplans Removed: 1
    ->  Index Scan using ma_test_p2_b_idx on ma_test_p2 ma_test_1 (actual rows=1.00 loops=1)
          Filter: ((a >= $1) AND ((a % 10) = 5))
-         Rows Removed by Filter: 9
+         Rows Removed In Executor by Filter: 9
          Index Searches: 1
    ->  Index Scan using ma_test_p3_b_idx on ma_test_p3 ma_test_2 (actual rows=1.00 loops=1)
          Filter: ((a >= $1) AND ((a % 10) = 5))
-         Rows Removed by Filter: 9
+         Rows Removed In Executor by Filter: 9
          Index Searches: 1
 (11 rows)
 
@@ -3596,7 +3596,7 @@ explain (analyze, costs off, summary off, timing off, buffers off) execute mt_q1
    Subplans Removed: 2
    ->  Index Scan using ma_test_p3_b_idx on ma_test_p3 ma_test_1 (actual rows=1.00 loops=1)
          Filter: ((a >= $1) AND ((a % 10) = 5))
-         Rows Removed by Filter: 9
+         Rows Removed In Executor by Filter: 9
          Index Searches: 1
 (7 rows)
 
@@ -4096,7 +4096,7 @@ select * from listp where a = (select 2) and b <> 10;
  Seq Scan on listp1 listp (actual rows=0.00 loops=1)
    Filter: ((b <> 10) AND (a = (InitPlan 1).col1))
    InitPlan 1
-     ->  Result (never executed)
+     ->  Result (actual rows=1.00 loops=1)
 (4 rows)
 
 --
diff --git a/src/test/regress/expected/select_parallel.out b/src/test/regress/expected/select_parallel.out
index 0185ef661b1..572337c7b77 100644
--- a/src/test/regress/expected/select_parallel.out
+++ b/src/test/regress/expected/select_parallel.out
@@ -589,13 +589,13 @@ explain (analyze, timing off, summary off, costs off, buffers off)
    ->  Nested Loop (actual rows=98000.00 loops=1)
          ->  Seq Scan on tenk2 (actual rows=10.00 loops=1)
                Filter: (thousand = 0)
-               Rows Removed by Filter: 9990
+               Rows Removed In Table AM by Filter: 9990
          ->  Gather (actual rows=9800.00 loops=10)
                Workers Planned: 4
                Workers Launched: 4
                ->  Parallel Seq Scan on tenk1 (actual rows=1960.00 loops=50)
                      Filter: (hundred > 1)
-                     Rows Removed by Filter: 40
+                     Rows Removed In Table AM by Filter: 40
 (11 rows)
 
 alter table tenk2 reset (parallel_workers);
diff --git a/src/test/regress/expected/updatable_views.out b/src/test/regress/expected/updatable_views.out
index 095df0a670c..8d513926b3b 100644
--- a/src/test/regress/expected/updatable_views.out
+++ b/src/test/regress/expected/updatable_views.out
@@ -2931,7 +2931,6 @@ $$
 LANGUAGE plpgsql STRICT IMMUTABLE LEAKPROOF;
 SELECT * FROM rw_view1 WHERE snoop(person);
 NOTICE:  snooped value: Tom
-NOTICE:  snooped value: Dick
 NOTICE:  snooped value: Harry
  person 
 --------
@@ -2941,10 +2940,8 @@ NOTICE:  snooped value: Harry
 
 UPDATE rw_view1 SET person=person WHERE snoop(person);
 NOTICE:  snooped value: Tom
-NOTICE:  snooped value: Dick
 NOTICE:  snooped value: Harry
 DELETE FROM rw_view1 WHERE NOT snoop(person);
-NOTICE:  snooped value: Dick
 NOTICE:  snooped value: Tom
 NOTICE:  snooped value: Harry
 ALTER VIEW rw_view1 SET (security_barrier = true);
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index d93c0c03bab..b939d725e91 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -587,7 +587,7 @@ begin
     loop
         ln := regexp_replace(ln, 'Workers Launched: \d+', 'Workers Launched: N');
         ln := regexp_replace(ln, 'actual rows=\d+(?:\.\d+)? loops=\d+', 'actual rows=N loops=N');
-        ln := regexp_replace(ln, 'Rows Removed by Filter: \d+', 'Rows Removed by Filter: N');
+        ln := regexp_replace(ln, 'Rows Removed In Executor by Filter: \d+', 'Rows Removed In Executor by Filter: N');
         perform regexp_matches(ln, 'Index Searches: \d+');
         if found then
           continue;
-- 
2.39.5

v1-0003-Add-the-table-reloption-quals_push_down.patch
From b8cf0b028d2202d503a39f18b62a106d0ad45906 Mon Sep 17 00:00:00 2001
From: Julien Tachoires <julien@tachoires.me>
Date: Mon, 25 Aug 2025 18:43:57 +0200
Subject: [PATCH 3/6] Add the table reloption quals_push_down

The reloption quals_push_down enables or disables the transformation of
qualifiers into ScanKeys and their push down to the table access method
during table scan execution.

The default value is off, so the quals push down feature is disabled
unless explicitly enabled.
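
For illustration, the new reloption behaves like any other boolean storage
parameter and can be toggled on an existing table (t1 is just a hypothetical
example table here):

ALTER TABLE t1 SET (quals_push_down = on);
ALTER TABLE t1 SET (quals_push_down = off);
ALTER TABLE t1 RESET (quals_push_down);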
---
 .../postgres_fdw/expected/postgres_fdw.out    |  2 +-
 doc/src/sgml/ref/create_table.sgml            | 19 ++++++++++++++
 src/backend/access/common/reloptions.c        | 13 +++++++++-
 src/backend/executor/nodeSeqscan.c            | 21 ++++++++++-----
 src/bin/psql/tab-complete.in.c                |  1 +
 src/include/utils/rel.h                       |  9 +++++++
 src/test/isolation/expected/stats.out         | 26 +++++++++----------
 src/test/regress/expected/memoize.out         | 15 +++++------
 src/test/regress/expected/merge.out           |  2 +-
 src/test/regress/expected/partition_prune.out |  4 +--
 src/test/regress/expected/select_parallel.out |  4 +--
 src/test/regress/expected/updatable_views.out |  3 +++
 12 files changed, 84 insertions(+), 35 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index bc7242835df..c5e3761f648 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -11930,7 +11930,7 @@ SELECT * FROM local_tbl, async_pt WHERE local_tbl.a = async_pt.a AND local_tbl.c
  Nested Loop (actual rows=1.00 loops=1)
    ->  Seq Scan on local_tbl (actual rows=1.00 loops=1)
          Filter: (c = 'bar'::text)
-         Rows Removed In Table AM by Filter: 1
+         Rows Removed In Executor by Filter: 1
    ->  Append (actual rows=1.00 loops=1)
          ->  Async Foreign Scan on async_p1 async_pt_1 (never executed)
          ->  Async Foreign Scan on async_p2 async_pt_2 (actual rows=1.00 loops=1)
diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index dc000e913c1..7cc52852fc0 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -1997,6 +1997,25 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
     </listitem>
    </varlistentry>
 
+   <varlistentry id="reloption-quals-push-down" xreflabel="quals_push_down">
+    <term><literal>quals_push_down</literal> (<type>boolean</type>)
+    <indexterm>
+     <primary><varname>quals_push_down</varname> storage parameter</primary>
+    </indexterm>
+    </term>
+    <listitem>
+     <para>
+     Enables or disables pushing qualifiers (the <literal>WHERE</literal>
+     clause) down to the table access method. When enabled, the table access
+     method can apply early tuple filtering during table scan execution and
+     return only the tuples satisfying the qualifiers. This option is
+     disabled by default, in which case the table access method returns all
+     visible tuples and the query executor alone is in charge of filtering
+     tuples based on the qualifiers.
+     </para>
+    </listitem>
+   </varlistentry>
+
    </variablelist>
 
   </refsect2>
diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
index 0af3fea68fa..dd57599b080 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -166,6 +166,15 @@ static relopt_bool boolRelOpts[] =
 		},
 		true
 	},
+	{
+		{
+			"quals_push_down",
+			"Enables pushing query qualifiers down to the table access method during table scans",
+			RELOPT_KIND_HEAP,
+			AccessExclusiveLock
+		},
+		false
+	},
 	/* list terminator */
 	{{NULL}}
 };
@@ -1915,7 +1924,9 @@ default_reloptions(Datum reloptions, bool validate, relopt_kind kind)
 		{"vacuum_truncate", RELOPT_TYPE_BOOL,
 		offsetof(StdRdOptions, vacuum_truncate), offsetof(StdRdOptions, vacuum_truncate_set)},
 		{"vacuum_max_eager_freeze_failure_rate", RELOPT_TYPE_REAL,
-		offsetof(StdRdOptions, vacuum_max_eager_freeze_failure_rate)}
+		offsetof(StdRdOptions, vacuum_max_eager_freeze_failure_rate)},
+		{"quals_push_down", RELOPT_TYPE_BOOL,
+		offsetof(StdRdOptions, quals_push_down)}
 	};
 
 	return (bytea *) build_reloptions(reloptions, validate, kind,
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 0562377c42a..f134ff591c3 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -483,13 +483,20 @@ ExecInitSeqScan(SeqScan *node, EState *estate, int eflags)
 	scanstate->ss.ps.qual =
 		ExecInitQual(node->scan.plan.qual, (PlanState *) scanstate);
 
-	/* Build sequential scan keys */
-	ExecSeqBuildScanKeys((PlanState *) scanstate,
-						 node->tablequal,
-						 &scanstate->sss_NumScanKeys,
-						 &scanstate->sss_ScanKeys,
-						 &scanstate->sss_RuntimeKeys,
-						 &scanstate->sss_NumRuntimeKeys);
+	/*
+	 * Build and push the ScanKeys only if the relation's reloption
+	 * quals_push_down is on.
+	 */
+	if (RelationGetQualsPushDown(scanstate->ss.ss_currentRelation))
+	{
+		/* Build sequential scan keys */
+		ExecSeqBuildScanKeys((PlanState *) scanstate,
+							 node->tablequal,
+							 &scanstate->sss_NumScanKeys,
+							 &scanstate->sss_ScanKeys,
+							 &scanstate->sss_RuntimeKeys,
+							 &scanstate->sss_NumRuntimeKeys);
+	}
 
 	/*
 	 * When EvalPlanQual() is not in use, assign ExecProcNode for this node
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 8b10f2313f3..0827679649b 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -1412,6 +1412,7 @@ static const char *const table_storage_parameters[] = {
 	"fillfactor",
 	"log_autovacuum_min_duration",
 	"parallel_workers",
+	"quals_push_down",
 	"toast.autovacuum_enabled",
 	"toast.autovacuum_freeze_max_age",
 	"toast.autovacuum_freeze_min_age",
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index b552359915f..8907d53a4ca 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -354,6 +354,7 @@ typedef struct StdRdOptions
 	 * to freeze. 0 if disabled, -1 if unspecified.
 	 */
 	double		vacuum_max_eager_freeze_failure_rate;
+	bool		quals_push_down;	/* enable quals push down to the table AM */
 } StdRdOptions;
 
 #define HEAP_MIN_FILLFACTOR			10
@@ -409,6 +410,14 @@ typedef struct StdRdOptions
 	((relation)->rd_options ? \
 	 ((StdRdOptions *) (relation)->rd_options)->parallel_workers : (defaultpw))
 
+/*
+ * RelationGetQualsPushDown
+ *		Returns the relation's quals_push_down reloption setting.
+ */
+#define RelationGetQualsPushDown(relation) \
+	((relation)->rd_options ? \
+	 ((StdRdOptions *) (relation)->rd_options)->quals_push_down : false)
+
 /* ViewOptions->check_option values */
 typedef enum ViewOptCheckOption
 {
diff --git a/src/test/isolation/expected/stats.out b/src/test/isolation/expected/stats.out
index 0064c0c8df0..8c7fe60217e 100644
--- a/src/test/isolation/expected/stats.out
+++ b/src/test/isolation/expected/stats.out
@@ -2414,7 +2414,7 @@ step s1_table_stats:
 
 seq_scan|seq_tup_read|n_tup_ins|n_tup_upd|n_tup_del|n_live_tup|n_dead_tup|vacuum_count
 --------+------------+---------+---------+---------+----------+----------+------------
-       3|           5|        1|        1|        0|         1|         1|           0
+       3|           6|        1|        1|        0|         1|         1|           0
 (1 row)
 
 
@@ -2476,7 +2476,7 @@ step s1_table_stats:
 
 seq_scan|seq_tup_read|n_tup_ins|n_tup_upd|n_tup_del|n_live_tup|n_dead_tup|vacuum_count
 --------+------------+---------+---------+---------+----------+----------+------------
-       3|           4|        2|        0|        1|         1|         1|           0
+       3|           5|        2|        0|        1|         1|         1|           0
 (1 row)
 
 step s1_table_select: SELECT * FROM test_stat_tab ORDER BY key, value;
@@ -2508,7 +2508,7 @@ step s1_table_stats:
 
 seq_scan|seq_tup_read|n_tup_ins|n_tup_upd|n_tup_del|n_live_tup|n_dead_tup|vacuum_count
 --------+------------+---------+---------+---------+----------+----------+------------
-       5|           7|        2|        1|        1|         1|         2|           0
+       5|           9|        2|        1|        1|         1|         2|           0
 (1 row)
 
 
@@ -2571,7 +2571,7 @@ step s1_table_stats:
 
 seq_scan|seq_tup_read|n_tup_ins|n_tup_upd|n_tup_del|n_live_tup|n_dead_tup|vacuum_count
 --------+------------+---------+---------+---------+----------+----------+------------
-       9|          13|        4|        5|        1|         3|         6|           0
+       9|          31|        4|        5|        1|         3|         6|           0
 (1 row)
 
 
@@ -2640,7 +2640,7 @@ step s1_table_stats:
 
 seq_scan|seq_tup_read|n_tup_ins|n_tup_upd|n_tup_del|n_live_tup|n_dead_tup|vacuum_count
 --------+------------+---------+---------+---------+----------+----------+------------
-       9|          13|        4|        5|        1|         3|         6|           0
+       9|          31|        4|        5|        1|         3|         6|           0
 (1 row)
 
 
@@ -2701,7 +2701,7 @@ step s1_table_stats:
 
 seq_scan|seq_tup_read|n_tup_ins|n_tup_upd|n_tup_del|n_live_tup|n_dead_tup|vacuum_count
 --------+------------+---------+---------+---------+----------+----------+------------
-       9|          11|        4|        5|        1|         1|         8|           0
+       9|          29|        4|        5|        1|         1|         8|           0
 (1 row)
 
 
@@ -2768,7 +2768,7 @@ step s1_table_stats:
 
 seq_scan|seq_tup_read|n_tup_ins|n_tup_upd|n_tup_del|n_live_tup|n_dead_tup|vacuum_count
 --------+------------+---------+---------+---------+----------+----------+------------
-       9|          11|        4|        5|        1|         1|         8|           0
+       9|          29|        4|        5|        1|         1|         8|           0
 (1 row)
 
 
@@ -2808,7 +2808,7 @@ step s1_table_stats:
 
 seq_scan|seq_tup_read|n_tup_ins|n_tup_upd|n_tup_del|n_live_tup|n_dead_tup|vacuum_count
 --------+------------+---------+---------+---------+----------+----------+------------
-       3|           3|        5|        1|        0|         1|         1|           0
+       3|           9|        5|        1|        0|         1|         1|           0
 (1 row)
 
 
@@ -2854,7 +2854,7 @@ step s1_table_stats:
 
 seq_scan|seq_tup_read|n_tup_ins|n_tup_upd|n_tup_del|n_live_tup|n_dead_tup|vacuum_count
 --------+------------+---------+---------+---------+----------+----------+------------
-       3|           3|        5|        1|        0|         1|         1|           0
+       3|           9|        5|        1|        0|         1|         1|           0
 (1 row)
 
 
@@ -2894,7 +2894,7 @@ step s1_table_stats:
 
 seq_scan|seq_tup_read|n_tup_ins|n_tup_upd|n_tup_del|n_live_tup|n_dead_tup|vacuum_count
 --------+------------+---------+---------+---------+----------+----------+------------
-       3|           3|        4|        2|        0|         4|         2|           0
+       3|           9|        4|        2|        0|         4|         2|           0
 (1 row)
 
 
@@ -2940,7 +2940,7 @@ step s1_table_stats:
 
 seq_scan|seq_tup_read|n_tup_ins|n_tup_upd|n_tup_del|n_live_tup|n_dead_tup|vacuum_count
 --------+------------+---------+---------+---------+----------+----------+------------
-       3|           3|        4|        2|        0|         4|         2|           0
+       3|           9|        4|        2|        0|         4|         2|           0
 (1 row)
 
 
@@ -2981,7 +2981,7 @@ step s1_table_stats:
 
 seq_scan|seq_tup_read|n_tup_ins|n_tup_upd|n_tup_del|n_live_tup|n_dead_tup|vacuum_count
 --------+------------+---------+---------+---------+----------+----------+------------
-       4|           4|        5|        3|        1|         4|         4|           0
+       4|          16|        5|        3|        1|         4|         4|           0
 (1 row)
 
 
@@ -3028,7 +3028,7 @@ step s1_table_stats:
 
 seq_scan|seq_tup_read|n_tup_ins|n_tup_upd|n_tup_del|n_live_tup|n_dead_tup|vacuum_count
 --------+------------+---------+---------+---------+----------+----------+------------
-       4|           4|        5|        3|        1|         4|         4|           0
+       4|          16|        5|        3|        1|         4|         4|           0
 (1 row)
 
 
diff --git a/src/test/regress/expected/memoize.out b/src/test/regress/expected/memoize.out
index f2a92d1fdfd..4af6bb4ce0f 100644
--- a/src/test/regress/expected/memoize.out
+++ b/src/test/regress/expected/memoize.out
@@ -43,7 +43,7 @@ WHERE t2.unique1 < 1000;', false);
    ->  Nested Loop (actual rows=1000.00 loops=N)
          ->  Seq Scan on tenk1 t2 (actual rows=1000.00 loops=N)
                Filter: (unique1 < 1000)
-               Rows Removed In Table AM by Filter: 9000
+               Rows Removed In Executor by Filter: 9000
          ->  Memoize (actual rows=1.00 loops=N)
                Cache Key: t2.twenty
                Cache Mode: logical
@@ -75,7 +75,7 @@ WHERE t1.unique1 < 1000;', false);
    ->  Nested Loop (actual rows=1000.00 loops=N)
          ->  Seq Scan on tenk1 t1 (actual rows=1000.00 loops=N)
                Filter: (unique1 < 1000)
-               Rows Removed In Table AM by Filter: 9000
+               Rows Removed In Executor by Filter: 9000
          ->  Memoize (actual rows=1.00 loops=N)
                Cache Key: t1.twenty
                Cache Mode: binary
@@ -146,7 +146,7 @@ WHERE s.c1 = s.c2 AND t1.unique1 < 1000;', false);
    ->  Nested Loop (actual rows=1000.00 loops=N)
          ->  Seq Scan on tenk1 t1 (actual rows=1000.00 loops=N)
                Filter: (unique1 < 1000)
-               Rows Removed In Table AM by Filter: 9000
+               Rows Removed In Executor by Filter: 9000
          ->  Memoize (actual rows=1.00 loops=N)
                Cache Key: (t1.two + 1)
                Cache Mode: binary
@@ -179,16 +179,15 @@ WHERE s.c1 = s.c2 AND t1.unique1 < 1000;', false);
    ->  Nested Loop (actual rows=1000.00 loops=N)
          ->  Seq Scan on tenk1 t1 (actual rows=1000.00 loops=N)
                Filter: (unique1 < 1000)
-               Rows Removed In Table AM by Filter: 9000
+               Rows Removed In Executor by Filter: 9000
          ->  Memoize (actual rows=1.00 loops=N)
                Cache Key: t1.two, t1.twenty
                Cache Mode: binary
                Hits: 980  Misses: 20  Evictions: Zero  Overflows: 0  Memory Usage: NkB
                ->  Seq Scan on tenk1 t2 (actual rows=1.00 loops=N)
                      Filter: ((t1.twenty = unique1) AND (t1.two = two))
-                     Rows Removed In Table AM by Filter: 5000
-                     Rows Removed In Executor by Filter: 4999
-(13 rows)
+                     Rows Removed In Executor by Filter: 9999
+(12 rows)
 
 -- And check we get the expected results.
 SELECT COUNT(*), AVG(t1.twenty) FROM tenk1 t1 LEFT JOIN
@@ -247,7 +246,7 @@ WHERE t2.unique1 < 1200;', true);
    ->  Nested Loop (actual rows=1200.00 loops=N)
          ->  Seq Scan on tenk1 t2 (actual rows=1200.00 loops=N)
                Filter: (unique1 < 1200)
-               Rows Removed In Table AM by Filter: 8800
+               Rows Removed In Executor by Filter: 8800
          ->  Memoize (actual rows=1.00 loops=N)
                Cache Key: t2.thousand
                Cache Mode: logical
diff --git a/src/test/regress/expected/merge.out b/src/test/regress/expected/merge.out
index f8b9172df20..3029bb6ba10 100644
--- a/src/test/regress/expected/merge.out
+++ b/src/test/regress/expected/merge.out
@@ -1801,7 +1801,7 @@ WHEN MATCHED AND t.a < 10 THEN
                Sort Method: quicksort  Memory: xxx
                ->  Seq Scan on ex_mtarget t (actual rows=0.00 loops=1)
                      Filter: (a < '-1000'::integer)
-                     Rows Removed In Table AM by Filter: 54
+                     Rows Removed In Executor by Filter: 54
          ->  Sort (never executed)
                Sort Key: s.a
                ->  Seq Scan on ex_msource s (never executed)
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index c633e7089ce..dbbd7b05e11 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -2866,7 +2866,7 @@ explain (analyze, costs off, summary off, timing off, buffers off) execute ab_q6
          Filter: ((a = $1) AND (b = (InitPlan 1).col1))
    ->  Seq Scan on xy_1 (actual rows=0.00 loops=1)
          Filter: ((x = $1) AND (y = (InitPlan 1).col1))
-         Rows Removed In Table AM by Filter: 1
+         Rows Removed In Executor by Filter: 1
    ->  Seq Scan on ab_a1_b1 ab_4 (never executed)
          Filter: ((a = $1) AND (b = (InitPlan 1).col1))
    ->  Seq Scan on ab_a1_b2 ab_5 (never executed)
@@ -4096,7 +4096,7 @@ select * from listp where a = (select 2) and b <> 10;
  Seq Scan on listp1 listp (actual rows=0.00 loops=1)
    Filter: ((b <> 10) AND (a = (InitPlan 1).col1))
    InitPlan 1
-     ->  Result (actual rows=1.00 loops=1)
+     ->  Result (never executed)
 (4 rows)
 
 --
diff --git a/src/test/regress/expected/select_parallel.out b/src/test/regress/expected/select_parallel.out
index 572337c7b77..b1e6f5681ac 100644
--- a/src/test/regress/expected/select_parallel.out
+++ b/src/test/regress/expected/select_parallel.out
@@ -589,13 +589,13 @@ explain (analyze, timing off, summary off, costs off, buffers off)
    ->  Nested Loop (actual rows=98000.00 loops=1)
          ->  Seq Scan on tenk2 (actual rows=10.00 loops=1)
                Filter: (thousand = 0)
-               Rows Removed In Table AM by Filter: 9990
+               Rows Removed In Executor by Filter: 9990
          ->  Gather (actual rows=9800.00 loops=10)
                Workers Planned: 4
                Workers Launched: 4
                ->  Parallel Seq Scan on tenk1 (actual rows=1960.00 loops=50)
                      Filter: (hundred > 1)
-                     Rows Removed In Table AM by Filter: 40
+                     Rows Removed In Executor by Filter: 40
 (11 rows)
 
 alter table tenk2 reset (parallel_workers);
diff --git a/src/test/regress/expected/updatable_views.out b/src/test/regress/expected/updatable_views.out
index 8d513926b3b..095df0a670c 100644
--- a/src/test/regress/expected/updatable_views.out
+++ b/src/test/regress/expected/updatable_views.out
@@ -2931,6 +2931,7 @@ $$
 LANGUAGE plpgsql STRICT IMMUTABLE LEAKPROOF;
 SELECT * FROM rw_view1 WHERE snoop(person);
 NOTICE:  snooped value: Tom
+NOTICE:  snooped value: Dick
 NOTICE:  snooped value: Harry
  person 
 --------
@@ -2940,8 +2941,10 @@ NOTICE:  snooped value: Harry
 
 UPDATE rw_view1 SET person=person WHERE snoop(person);
 NOTICE:  snooped value: Tom
+NOTICE:  snooped value: Dick
 NOTICE:  snooped value: Harry
 DELETE FROM rw_view1 WHERE NOT snoop(person);
+NOTICE:  snooped value: Dick
 NOTICE:  snooped value: Tom
 NOTICE:  snooped value: Harry
 ALTER VIEW rw_view1 SET (security_barrier = true);
-- 
2.39.5

v1-0004-Add-tests-for-quals-push-down-to-table-AM.patch
From 60abe9a8a9aec7bb618d1f7113941a7f4b65e164 Mon Sep 17 00:00:00 2001
From: Julien Tachoires <julien@tachoires.me>
Date: Mon, 25 Aug 2025 18:46:06 +0200
Subject: [PATCH 4/6] Add tests for quals push down to table AM

Using the EXPLAIN command, we check whether the rows are filtered out by
the executor or by the table AM. We also make sure that, by default,
quals are not pushed down.
---
 src/test/regress/expected/qual_pushdown.out | 253 ++++++++++++++++++++
 src/test/regress/parallel_schedule          |   2 +-
 src/test/regress/sql/qual_pushdown.sql      |  48 ++++
 3 files changed, 302 insertions(+), 1 deletion(-)
 create mode 100644 src/test/regress/expected/qual_pushdown.out
 create mode 100644 src/test/regress/sql/qual_pushdown.sql

diff --git a/src/test/regress/expected/qual_pushdown.out b/src/test/regress/expected/qual_pushdown.out
new file mode 100644
index 00000000000..5b43553c945
--- /dev/null
+++ b/src/test/regress/expected/qual_pushdown.out
@@ -0,0 +1,253 @@
+DROP TABLE IF EXISTS qa;
+NOTICE:  table "qa" does not exist, skipping
+DROP TABLE IF EXISTS qb;
+NOTICE:  table "qb" does not exist, skipping
+CREATE TABLE qa (i INTEGER, ii INTEGER);
+CREATE TABLE qb (j INTEGER);
+INSERT INTO qa SELECT n, n * n  FROM generate_series(1, 1000) as n;
+INSERT INTO qb SELECT n FROM generate_series(1, 1000) as n;
+ANALYZE qa;
+ANALYZE qb;
+-- By default, the quals are not pushed down. The tuples are filtered out by
+-- the executor.
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = 100;
+                QUERY PLAN                 
+-------------------------------------------
+ Seq Scan on qa (actual rows=1.00 loops=1)
+   Filter: (i = 100)
+   Rows Removed In Executor by Filter: 999
+(3 rows)
+
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i < 10;
+                QUERY PLAN                 
+-------------------------------------------
+ Seq Scan on qa (actual rows=9.00 loops=1)
+   Filter: (i < 10)
+   Rows Removed In Executor by Filter: 991
+(3 rows)
+
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE 100 = i;
+                QUERY PLAN                 
+-------------------------------------------
+ Seq Scan on qa (actual rows=1.00 loops=1)
+   Filter: (100 = i)
+   Rows Removed In Executor by Filter: 999
+(3 rows)
+
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE 10 > i;
+                QUERY PLAN                 
+-------------------------------------------
+ Seq Scan on qa (actual rows=9.00 loops=1)
+   Filter: (10 > i)
+   Rows Removed In Executor by Filter: 991
+(3 rows)
+
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = SQRT(25)::INT;
+                QUERY PLAN                 
+-------------------------------------------
+ Seq Scan on qa (actual rows=1.00 loops=1)
+   Filter: (i = 5)
+   Rows Removed In Executor by Filter: 999
+(3 rows)
+
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = (SELECT 100);
+                QUERY PLAN                 
+-------------------------------------------
+ Seq Scan on qa (actual rows=1.00 loops=1)
+   Filter: (i = (InitPlan 1).col1)
+   Rows Removed In Executor by Filter: 999
+   InitPlan 1
+     ->  Result (actual rows=1.00 loops=1)
+(5 rows)
+
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = (SELECT SQRT(j)::INT FROM qb WHERE j = 100);
+                    QUERY PLAN                     
+---------------------------------------------------
+ Seq Scan on qa (actual rows=1.00 loops=1)
+   Filter: (i = (InitPlan 1).col1)
+   Rows Removed In Executor by Filter: 999
+   InitPlan 1
+     ->  Seq Scan on qb (actual rows=1.00 loops=1)
+           Filter: (j = 100)
+           Rows Removed In Executor by Filter: 999
+(7 rows)
+
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa JOIN qb ON (qa.i = qb.j) WHERE j = 100;
+                   QUERY PLAN                    
+-------------------------------------------------
+ Nested Loop (actual rows=1.00 loops=1)
+   ->  Seq Scan on qa (actual rows=1.00 loops=1)
+         Filter: (i = 100)
+         Rows Removed In Executor by Filter: 999
+   ->  Seq Scan on qb (actual rows=1.00 loops=1)
+         Filter: (j = 100)
+         Rows Removed In Executor by Filter: 999
+(7 rows)
+
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = ii AND ii < 10;
+                QUERY PLAN                 
+-------------------------------------------
+ Seq Scan on qa (actual rows=1.00 loops=1)
+   Filter: ((ii < 10) AND (i = ii))
+   Rows Removed In Executor by Filter: 999
+(3 rows)
+
+-- Enable quals push down
+ALTER TABLE qa SET (quals_push_down=on);
+ALTER TABLE qb SET (quals_push_down=on);
+-- Now, we expect to see the tuples being filtered out by the table AM
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = 100;
+                QUERY PLAN                 
+-------------------------------------------
+ Seq Scan on qa (actual rows=1.00 loops=1)
+   Filter: (i = 100)
+   Rows Removed In Table AM by Filter: 999
+(3 rows)
+
+SELECT ii FROM qa WHERE i = 100;
+  ii   
+-------
+ 10000
+(1 row)
+
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i < 10;
+                QUERY PLAN                 
+-------------------------------------------
+ Seq Scan on qa (actual rows=9.00 loops=1)
+   Filter: (i < 10)
+   Rows Removed In Table AM by Filter: 991
+(3 rows)
+
+SELECT ii FROM qa WHERE i < 10;
+ ii 
+----
+  1
+  4
+  9
+ 16
+ 25
+ 36
+ 49
+ 64
+ 81
+(9 rows)
+
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE 100 = i;
+                QUERY PLAN                 
+-------------------------------------------
+ Seq Scan on qa (actual rows=1.00 loops=1)
+   Filter: (100 = i)
+   Rows Removed In Table AM by Filter: 999
+(3 rows)
+
+SELECT ii FROM qa WHERE 100 = i;
+  ii   
+-------
+ 10000
+(1 row)
+
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE 10 > i;
+                QUERY PLAN                 
+-------------------------------------------
+ Seq Scan on qa (actual rows=9.00 loops=1)
+   Filter: (10 > i)
+   Rows Removed In Table AM by Filter: 991
+(3 rows)
+
+SELECT ii FROM qa WHERE 10 > i;
+ ii 
+----
+  1
+  4
+  9
+ 16
+ 25
+ 36
+ 49
+ 64
+ 81
+(9 rows)
+
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = SQRT(25)::INT;
+                QUERY PLAN                 
+-------------------------------------------
+ Seq Scan on qa (actual rows=1.00 loops=1)
+   Filter: (i = 5)
+   Rows Removed In Table AM by Filter: 999
+(3 rows)
+
+SELECT ii FROM qa WHERE i = SQRT(25)::INT;
+ ii 
+----
+ 25
+(1 row)
+
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = (SELECT 100);
+                QUERY PLAN                 
+-------------------------------------------
+ Seq Scan on qa (actual rows=1.00 loops=1)
+   Filter: (i = (InitPlan 1).col1)
+   Rows Removed In Table AM by Filter: 999
+   InitPlan 1
+     ->  Result (actual rows=1.00 loops=1)
+(5 rows)
+
+SELECT ii FROM qa WHERE i = (SELECT 100);
+  ii   
+-------
+ 10000
+(1 row)
+
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = (SELECT SQRT(j)::INT FROM qb WHERE j = 100);
+                    QUERY PLAN                     
+---------------------------------------------------
+ Seq Scan on qa (actual rows=1.00 loops=1)
+   Filter: (i = (InitPlan 1).col1)
+   Rows Removed In Table AM by Filter: 999
+   InitPlan 1
+     ->  Seq Scan on qb (actual rows=1.00 loops=1)
+           Filter: (j = 100)
+           Rows Removed In Table AM by Filter: 999
+(7 rows)
+
+SELECT ii FROM qa WHERE i = (SELECT SQRT(j)::INT FROM qb WHERE j = 100);
+ ii  
+-----
+ 100
+(1 row)
+
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa JOIN qb ON (qa.i = qb.j) WHERE j = 100;
+                   QUERY PLAN                    
+-------------------------------------------------
+ Nested Loop (actual rows=1.00 loops=1)
+   ->  Seq Scan on qa (actual rows=1.00 loops=1)
+         Filter: (i = 100)
+         Rows Removed In Table AM by Filter: 999
+   ->  Seq Scan on qb (actual rows=1.00 loops=1)
+         Filter: (j = 100)
+         Rows Removed In Table AM by Filter: 999
+(7 rows)
+
+SELECT ii FROM qa JOIN qb ON (qa.i = qb.j) WHERE j = 100;
+  ii   
+-------
+ 10000
+(1 row)
+
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = ii AND ii < 10;
+                QUERY PLAN                 
+-------------------------------------------
+ Seq Scan on qa (actual rows=1.00 loops=1)
+   Filter: ((ii < 10) AND (i = ii))
+   Rows Removed In Table AM by Filter: 997
+   Rows Removed In Executor by Filter: 2
+(4 rows)
+
+SELECT ii FROM qa WHERE i = ii AND ii < 10;
+ ii 
+----
+  1
+(1 row)
+
+DROP TABLE IF EXISTS qa;
+DROP TABLE IF EXISTS qb;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index fbffc67ae60..21291f38db3 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -123,7 +123,7 @@ test: plancache limit plpgsql copy2 temp domain rangefuncs prepare conversion tr
 # The stats test resets stats, so nothing else needing stats access can be in
 # this group.
 # ----------
-test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression compression_lz4 memoize stats predicate numa
+test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression compression_lz4 memoize stats predicate numa qual_pushdown
 
 # event_trigger depends on create_am and cannot run concurrently with
 # any test that runs DDL
diff --git a/src/test/regress/sql/qual_pushdown.sql b/src/test/regress/sql/qual_pushdown.sql
new file mode 100644
index 00000000000..0f0410cd1d5
--- /dev/null
+++ b/src/test/regress/sql/qual_pushdown.sql
@@ -0,0 +1,48 @@
+DROP TABLE IF EXISTS qa;
+DROP TABLE IF EXISTS qb;
+
+CREATE TABLE qa (i INTEGER, ii INTEGER);
+CREATE TABLE qb (j INTEGER);
+INSERT INTO qa SELECT n, n * n  FROM generate_series(1, 1000) as n;
+INSERT INTO qb SELECT n FROM generate_series(1, 1000) as n;
+ANALYZE qa;
+ANALYZE qb;
+
+-- By default, the quals are not pushed down. The tuples are filtered out by
+-- the executor.
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = 100;
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i < 10;
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE 100 = i;
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE 10 > i;
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = SQRT(25)::INT;
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = (SELECT 100);
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = (SELECT SQRT(j)::INT FROM qb WHERE j = 100);
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa JOIN qb ON (qa.i = qb.j) WHERE j = 100;
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = ii AND ii < 10;
+
+-- Enable quals push down
+ALTER TABLE qa SET (quals_push_down=on);
+ALTER TABLE qb SET (quals_push_down=on);
+
+-- Now, we expect to see the tuples being filtered out by the table AM
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = 100;
+SELECT ii FROM qa WHERE i = 100;
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i < 10;
+SELECT ii FROM qa WHERE i < 10;
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE 100 = i;
+SELECT ii FROM qa WHERE 100 = i;
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE 10 > i;
+SELECT ii FROM qa WHERE 10 > i;
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = SQRT(25)::INT;
+SELECT ii FROM qa WHERE i = SQRT(25)::INT;
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = (SELECT 100);
+SELECT ii FROM qa WHERE i = (SELECT 100);
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = (SELECT SQRT(j)::INT FROM qb WHERE j = 100);
+SELECT ii FROM qa WHERE i = (SELECT SQRT(j)::INT FROM qb WHERE j = 100);
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa JOIN qb ON (qa.i = qb.j) WHERE j = 100;
+SELECT ii FROM qa JOIN qb ON (qa.i = qb.j) WHERE j = 100;
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = ii AND ii < 10;
+SELECT ii FROM qa WHERE i = ii AND ii < 10;
+
+DROP TABLE IF EXISTS qa;
+DROP TABLE IF EXISTS qb;
-- 
2.39.5

v1-0005-Push-down-IN-NOT-IN-array-quals-to-table-AMs.patchtext/x-diff; charset=us-asciiDownload
From c8c7d2460af8e3b2d580d1c44101319eb2f90646 Mon Sep 17 00:00:00 2001
From: Julien Tachoires <julien@tachoires.me>
Date: Mon, 25 Aug 2025 20:16:47 +0200
Subject: [PATCH 5/6] Push down IN/NOT IN <array> quals to table AMs

In order to allow table AMs to apply key filtering against scalar array
values, when such a qualifier is found the executor is in charge of
collecting the information required to later build a hash table. The
table AM can then create a simple hash table and use it to check the
presence or absence of the key in the given array in O(1) fashion.

The new structure ScanKeyHashInfoData stores the hash information that
is passed to the table AM via the new ScanKey field sk_hashinfo.
In the index scan case, this field is set to NULL and left unused.
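
As an illustration, here is the kind of qual this patch targets. This is
only a sketch: the table and values are hypothetical, and the
quals_push_down reloption comes from patch 0003.

CREATE TABLE t (i INTEGER) WITH (quals_push_down = on);
INSERT INTO t SELECT generate_series(1, 1000);
-- An IN list over constants is planned as a ScalarArrayOpExpr; with this
-- patch it becomes a ScanKey flagged SK_SEARCHARRAY, and the table AM can
-- filter tuples using the hash table built from the array.
SELECT i FROM t WHERE i IN (1, 2, 3);
-- NOT IN goes through the same path, using the operator's negator.
SELECT i FROM t WHERE i NOT IN (1, 2, 3);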
---
 src/backend/access/common/scankey.c         |   9 +
 src/backend/access/heap/Makefile            |   1 +
 src/backend/access/heap/heapam.c            |   1 -
 src/backend/access/heap/heapam_valid.c      | 290 ++++++++++++++++++++
 src/backend/access/heap/meson.build         |   1 +
 src/backend/executor/nodeSeqscan.c          | 171 +++++++++++-
 src/backend/optimizer/plan/createplan.c     |  60 ++++
 src/include/access/heapam.h                 |   2 +
 src/include/access/skey.h                   |  34 +++
 src/include/access/valid.h                  |  58 ----
 src/test/regress/expected/qual_pushdown.out | 126 +++++++++
 src/test/regress/sql/qual_pushdown.sql      |  12 +
 12 files changed, 694 insertions(+), 71 deletions(-)
 create mode 100644 src/backend/access/heap/heapam_valid.c
 delete mode 100644 src/include/access/valid.h

diff --git a/src/backend/access/common/scankey.c b/src/backend/access/common/scankey.c
index 2d65ab02dd3..0d34bab755c 100644
--- a/src/backend/access/common/scankey.c
+++ b/src/backend/access/common/scankey.c
@@ -44,6 +44,7 @@ ScanKeyEntryInitialize(ScanKey entry,
 	entry->sk_subtype = subtype;
 	entry->sk_collation = collation;
 	entry->sk_argument = argument;
+	entry->sk_hashinfo = NULL;
 	if (RegProcedureIsValid(procedure))
 	{
 		fmgr_info(procedure, &entry->sk_func);
@@ -85,6 +86,7 @@ ScanKeyInit(ScanKey entry,
 	entry->sk_subtype = InvalidOid;
 	entry->sk_collation = C_COLLATION_OID;
 	entry->sk_argument = argument;
+	entry->sk_hashinfo = NULL;
 	fmgr_info(procedure, &entry->sk_func);
 }
 
@@ -113,5 +115,12 @@ ScanKeyEntryInitializeWithInfo(ScanKey entry,
 	entry->sk_subtype = subtype;
 	entry->sk_collation = collation;
 	entry->sk_argument = argument;
+	entry->sk_hashinfo = NULL;
 	fmgr_info_copy(&entry->sk_func, finfo, CurrentMemoryContext);
 }
+
+void
+ScanKeyEntrySetHashInfo(ScanKey entry, ScanKeyHashInfo hashinfo)
+{
+	entry->sk_hashinfo = hashinfo;
+}
diff --git a/src/backend/access/heap/Makefile b/src/backend/access/heap/Makefile
index 394534172fa..b796a4ccdff 100644
--- a/src/backend/access/heap/Makefile
+++ b/src/backend/access/heap/Makefile
@@ -15,6 +15,7 @@ include $(top_builddir)/src/Makefile.global
 OBJS = \
 	heapam.o \
 	heapam_handler.o \
+	heapam_valid.o \
 	heapam_visibility.o \
 	heapam_xlog.o \
 	heaptoast.o \
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 71d8e06d8dd..ad19804b5e1 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -37,7 +37,6 @@
 #include "access/multixact.h"
 #include "access/subtrans.h"
 #include "access/syncscan.h"
-#include "access/valid.h"
 #include "access/visibilitymap.h"
 #include "access/xloginsert.h"
 #include "catalog/pg_database.h"
diff --git a/src/backend/access/heap/heapam_valid.c b/src/backend/access/heap/heapam_valid.c
new file mode 100644
index 00000000000..7261723e378
--- /dev/null
+++ b/src/backend/access/heap/heapam_valid.c
@@ -0,0 +1,290 @@
+/*-------------------------------------------------------------------------
+ *
+ * heapam_valid.c
+ *	  Heap tuple qualification validity definitions
+ *
+ * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/heap/heapam_valid.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "utils/array.h"
+#include "utils/lsyscache.h"
+#include "access/heapam.h"
+#include "access/htup.h"
+#include "access/htup_details.h"
+#include "access/skey.h"
+#include "access/tupdesc.h"
+
+/*
+ * SearchArrayHashEntry - Hash table entry type used by SK_SEARCHARRAY
+ */
+typedef struct SearchArrayHashEntry
+{
+	Datum		key;
+	uint32		status;			/* hash status */
+	uint32		hash;			/* hash value (cached) */
+}			SearchArrayHashEntry;
+
+#define SH_PREFIX searcharray
+#define SH_ELEMENT_TYPE SearchArrayHashEntry
+#define SH_KEY_TYPE Datum
+#define SH_SCOPE static inline
+#define SH_DECLARE
+#include "lib/simplehash.h"
+
+static bool searcharray_hash_element_match(struct searcharray_hash *tb, Datum key1,
+										   Datum key2);
+static uint32 searcharray_element_hash(struct searcharray_hash *tb, Datum key);
+
+/*
+ * SearchArrayHashTable - Hash table for SK_SEARCHARRAY
+ */
+typedef struct SearchArrayHashTable
+{
+	searcharray_hash *tab;		/* underlying hash table */
+	FmgrInfo	hash_finfo;		/* hash function */
+	FunctionCallInfo hash_fcinfo;	/* arguments etc */
+	FmgrInfo	match_finfo;	/* comparison function */
+	FunctionCallInfo match_fcinfo;	/* arguments etc */
+	bool		has_nulls;
+}			SearchArrayHashTable;
+
+/* Define parameters for SearchArray hash table code generation. */
+#define SH_PREFIX searcharray
+#define SH_ELEMENT_TYPE SearchArrayHashEntry
+#define SH_KEY_TYPE Datum
+#define SH_KEY key
+#define SH_HASH_KEY(tb, key) searcharray_element_hash(tb, key)
+#define SH_EQUAL(tb, a, b) searcharray_hash_element_match(tb, a, b)
+#define SH_SCOPE static inline
+#define SH_STORE_HASH
+#define SH_GET_HASH(tb, a) a->hash
+#define SH_DEFINE
+#include "lib/simplehash.h"
+
+/*
+ * Hash function for scalar array hash op elements.
+ *
+ * We use the element type's default hash opclass, and the column collation
+ * if the type is collation-sensitive.
+ */
+static uint32
+searcharray_element_hash(struct searcharray_hash *tb, Datum key)
+{
+	SearchArrayHashTable *elements_tab = (SearchArrayHashTable *) tb->private_data;
+	FunctionCallInfo fcinfo = elements_tab->hash_fcinfo;
+	Datum		hash;
+
+	fcinfo->args[0].value = key;
+	fcinfo->args[0].isnull = false;
+
+	hash = elements_tab->hash_finfo.fn_addr(fcinfo);
+
+	return DatumGetUInt32(hash);
+}
+
+/*
+ * Matching function for scalar array hash op elements, to be used in hashtable
+ * lookups.
+ */
+static bool
+searcharray_hash_element_match(struct searcharray_hash *tb, Datum key1, Datum key2)
+{
+	Datum		result;
+
+	SearchArrayHashTable *elements_tab = (SearchArrayHashTable *) tb->private_data;
+	FunctionCallInfo fcinfo = elements_tab->match_fcinfo;
+
+	fcinfo->args[0].value = key1;
+	fcinfo->args[0].isnull = false;
+	fcinfo->args[1].value = key2;
+	fcinfo->args[1].isnull = false;
+
+	result = elements_tab->match_finfo.fn_addr(fcinfo);
+
+	return DatumGetBool(result);
+}
+
+/*
+ *		HeapKeyTest
+ *
+ *		Test a heap tuple to see if it satisfies a scan key.
+ */
+bool
+HeapKeyTest(HeapTuple tuple, TupleDesc tupdesc, int nkeys, ScanKey keys)
+{
+	int			cur_nkeys = nkeys;
+	ScanKey		cur_key = keys;
+
+	for (; cur_nkeys--; cur_key++)
+	{
+		Datum		atp;
+		bool		isnull;
+		Datum		test;
+
+		if (cur_key->sk_flags & SK_ISNULL)
+			return false;
+
+		atp = heap_getattr(tuple, cur_key->sk_attno, tupdesc, &isnull);
+
+		/* Case when the rightop was a scalar array */
+		if (cur_key->sk_flags & SK_SEARCHARRAY)
+		{
+			bool		hashfound;
+			ScanKeyHashInfo hashinfo = cur_key->sk_hashinfo;
+			SearchArrayHashTable *hashtab;
+
+			/*
+			 * Build the hash table on the first call if needed
+			 */
+			if (hashinfo->hashtab == NULL)
+			{
+				ArrayType  *arr;
+				int16		typlen;
+				bool		typbyval;
+				char		typalign;
+				int			nitems;
+				bool		has_nulls = false;
+				char	   *s;
+				bits8	   *bitmap;
+				int			bitmask;
+
+				arr = DatumGetArrayTypeP(cur_key->sk_argument);
+				nitems = ArrayGetNItems(ARR_NDIM(arr), ARR_DIMS(arr));
+
+				get_typlenbyvalalign(ARR_ELEMTYPE(arr),
+									 &typlen,
+									 &typbyval,
+									 &typalign);
+
+				hashtab = (SearchArrayHashTable *)
+					palloc0(sizeof(SearchArrayHashTable));
+
+				hashtab->hash_finfo = hashinfo->hash_finfo;
+				hashtab->match_finfo = hashinfo->match_finfo;
+				hashtab->hash_fcinfo = hashinfo->hash_fcinfo;
+				hashtab->match_fcinfo = hashinfo->match_fcinfo;
+
+				/*
+				 * Create the hash table sizing it according to the number of
+				 * elements in the array.  This does assume that the array has
+				 * no duplicates. If the array happens to contain many
+				 * duplicate values then it'll just mean that we sized the
+				 * table a bit on the large side.
+				 */
+				hashtab->tab = searcharray_create(CurrentMemoryContext,
+												  nitems,
+												  hashtab);
+
+
+				s = (char *) ARR_DATA_PTR(arr);
+				bitmap = ARR_NULLBITMAP(arr);
+				bitmask = 1;
+				for (int i = 0; i < nitems; i++)
+				{
+					/* Get array element, checking for NULL. */
+					if (bitmap && (*bitmap & bitmask) == 0)
+					{
+						has_nulls = true;
+					}
+					else
+					{
+						Datum		element;
+
+						element = fetch_att(s, typbyval, typlen);
+						s = att_addlength_pointer(s, typlen, s);
+						s = (char *) att_align_nominal(s, typalign);
+
+						searcharray_insert(hashtab->tab, element,
+										   &hashfound);
+					}
+
+					/* Advance bitmap pointer if any. */
+					if (bitmap)
+					{
+						bitmask <<= 1;
+						if (bitmask == 0x100)
+						{
+							bitmap++;
+							bitmask = 1;
+						}
+					}
+				}
+
+				/*
+				 * Remember if we had any nulls so that we know if we need to
+				 * execute non-strict functions with a null lhs value if no
+				 * match is found.
+				 */
+				hashtab->has_nulls = has_nulls;
+
+				/* Link the hash table to the current ScanKey */
+				hashinfo->hashtab = hashtab;
+			}
+			else
+				hashtab = (SearchArrayHashTable *) hashinfo->hashtab;
+
+			/* Check the hash to see if we have a match. */
+			hashfound = NULL != searcharray_lookup(hashtab->tab, atp);
+
+			/* IN case */
+			if (hashinfo->inclause && hashfound)
+				return true;
+			/* NOT IN case */
+			if (!hashinfo->inclause && !hashfound)
+				return true;
+
+			if (!hashfound && hashtab->has_nulls)
+			{
+				if (!hashtab->match_finfo.fn_strict)
+				{
+					Datum		result;
+
+					/*
+					 * Execute the function with a null rhs just once.
+					 */
+					hashtab->match_fcinfo->args[0].value = atp;
+					hashtab->match_fcinfo->args[0].isnull = isnull;
+					hashtab->match_fcinfo->args[1].value = (Datum) 0;
+					hashtab->match_fcinfo->args[1].isnull = true;
+
+					result = hashtab->match_finfo.fn_addr(hashtab->match_fcinfo);
+
+					/*
+					 * Reverse the result for NOT IN clauses since the above
+					 * function is the equality function and we need
+					 * not-equals.
+					 */
+					if (!hashinfo->inclause)
+						result = !result;
+
+					if (result)
+						return true;
+				}
+			}
+
+			return false;
+		}
+		else
+		{
+			if (isnull)
+				return false;
+
+			test = FunctionCall2Coll(&cur_key->sk_func,
+									 cur_key->sk_collation,
+									 atp, cur_key->sk_argument);
+
+			if (!DatumGetBool(test))
+				return false;
+		}
+	}
+
+	return true;
+}
diff --git a/src/backend/access/heap/meson.build b/src/backend/access/heap/meson.build
index 2637b24112f..2e23ca9a586 100644
--- a/src/backend/access/heap/meson.build
+++ b/src/backend/access/heap/meson.build
@@ -3,6 +3,7 @@
 backend_sources += files(
   'heapam.c',
   'heapam_handler.c',
+  'heapam_valid.c',
   'heapam_visibility.c',
   'heapam_xlog.c',
   'heaptoast.c',
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index f134ff591c3..1b181c8f254 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -96,16 +96,18 @@ ExecSeqBuildScanKeys(PlanState *planstate, List *quals, int *numScanKeys,
 		Expr	   *leftop;		/* expr on lhs of operator */
 		Expr	   *rightop;	/* expr on rhs ... */
 		AttrNumber	varattno;	/* att number used in scan */
+		int			flags = 0;
+		Datum		scanvalue;
+		Oid			collationid = InvalidOid;
+		ScanKeyHashInfo skeyhashinfo = NULL;
 
 		/*
 		 * Simple qual case: <leftop> <op> <rightop>
 		 */
 		if (IsA(clause, OpExpr))
 		{
-			int			flags = 0;
-			Datum		scanvalue;
-
 			opfuncid = ((OpExpr *) clause)->opfuncid;
+			collationid = ((OpExpr *) clause)->inputcollid;
 
 			/*
 			 * leftop and rightop are not relabeled and can be used as they
@@ -154,17 +156,149 @@ ExecSeqBuildScanKeys(PlanState *planstate, List *quals, int *numScanKeys,
 				n_runtime_keys++;
 				scanvalue = (Datum) 0;
 			}
+		}
+		/* <leftop> <op> ANY/ALL (array-expression) */
+		else if (IsA(clause, ScalarArrayOpExpr))
+		{
+			ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+			Oid			cmpfuncid;
+			Oid			hashfuncid;
+			Oid			negfuncid;
+
+			opfuncid = saop->opfuncid;
+			collationid = saop->inputcollid;
+
+			leftop = (Expr *) linitial(saop->args);
+			rightop = (Expr *) lsecond(saop->args);
+
+			varattno = ((Var *) leftop)->varattno;
+
+			flags |= SK_SEARCHARRAY;
+
+			if (IsA(rightop, Const))
+			{
+				/*
+				 * OK, simple constant comparison value
+				 */
+				scanvalue = ((Const *) rightop)->constvalue;
+				if (((Const *) rightop)->constisnull)
+					flags |= SK_ISNULL;
+			}
+			else
+			{
+				/* Need to treat this one as a run-time key */
+				if (n_runtime_keys >= max_runtime_keys)
+				{
+					if (max_runtime_keys == 0)
+					{
+						max_runtime_keys = 8;
+						runtime_keys = (SeqScanRuntimeKeyInfo *)
+							palloc(max_runtime_keys * sizeof(SeqScanRuntimeKeyInfo));
+					}
+					else
+					{
+						max_runtime_keys *= 2;
+						runtime_keys = (SeqScanRuntimeKeyInfo *)
+							repalloc(runtime_keys,
+									 max_runtime_keys * sizeof(SeqScanRuntimeKeyInfo));
+					}
+				}
+				runtime_keys[n_runtime_keys].scan_key = this_scan_key;
+				runtime_keys[n_runtime_keys].key_expr =
+					ExecInitExpr(rightop, planstate);
+				runtime_keys[n_runtime_keys].key_toastable =
+					TypeIsToastable(((Var *) leftop)->vartype);
+				n_runtime_keys++;
+				scanvalue = (Datum) 0;
+			}
+
+			hashfuncid = saop->hashfuncid;
+			negfuncid = saop->negfuncid;
+
+			/*
+			 * If there is no hash function attached to the expression, then
+			 * we need to look one up ourselves.
+			 *
+			 * One reason why there is no hash function attached is that the
+			 * scalar array is too small. In this case, the executor assumes
+			 * that using a hash table for a small array is not worth it. But
+			 * in our case, we want to handle all arrays in the same way,
+			 * whatever the array size.
+			 *
+			 * Another reason is that the right op. is not a constant and
+			 * needs runtime evaluation.
+			 */
+			if (!OidIsValid(hashfuncid))
+			{
+				Oid			lefthashfunc;
+				Oid			righthashfunc;
+
+				if (saop->useOr)
+				{
+					if (get_op_hash_functions(saop->opno, &lefthashfunc, &righthashfunc) &&
+						lefthashfunc == righthashfunc)
+						hashfuncid = lefthashfunc;
+				}
+				else
+				{
+					Oid			negator = get_negator(saop->opno);
+
+					if (OidIsValid(negator) &&
+						get_op_hash_functions(negator, &lefthashfunc, &righthashfunc) &&
+						lefthashfunc == righthashfunc)
+					{
+						hashfuncid = lefthashfunc;
+						negfuncid = get_opcode(negator);
+					}
+				}
+			}
 
-			n_scan_keys++;
+			/*
+			 * If no hash function can be found, it means that we cannot use a
+			 * hash table to handle the array search because the operator does
+			 * not support hashing.
+			 *
+			 * TODO: use an alternative to a hash table in this case. For now,
+			 * we just ignore this qual and don't push it down, letting the
+			 * executor handle it for us.
+			 */
+			if (!OidIsValid(hashfuncid))
+				continue;
 
-			ScanKeyEntryInitialize(this_scan_key,
-								   flags,
-								   varattno,
-								   InvalidStrategy, /* no strategy */
-								   InvalidOid,	/* no subtype */
-								   ((OpExpr *) clause)->inputcollid,
-								   opfuncid,
-								   scanvalue);
+			/*
+			 * If we have a negator function set, use it as the comparison
+			 * function because we are handling a NOT IN case.
+			 */
+			if (OidIsValid(negfuncid))
+				cmpfuncid = negfuncid;
+			else
+				cmpfuncid = saop->opfuncid;
+
+			skeyhashinfo = (ScanKeyHashInfo) palloc0(sizeof(ScanKeyHashInfoData));
+
+			/* IN or NOT IN */
+			skeyhashinfo->inclause = saop->useOr;
+			skeyhashinfo->hash_fcinfo = palloc0(SizeForFunctionCallInfo(1));
+			skeyhashinfo->match_fcinfo = palloc0(SizeForFunctionCallInfo(2));
+
+			fmgr_info(hashfuncid, &skeyhashinfo->hash_finfo);
+			fmgr_info_set_expr((Node *) saop, &skeyhashinfo->hash_finfo);
+			fmgr_info(cmpfuncid, &skeyhashinfo->match_finfo);
+			fmgr_info_set_expr((Node *) saop, &skeyhashinfo->match_finfo);
+
+			InitFunctionCallInfoData(*skeyhashinfo->hash_fcinfo,
+									 &skeyhashinfo->hash_finfo,
+									 1,
+									 saop->inputcollid,
+									 NULL,
+									 NULL);
+
+			InitFunctionCallInfoData(*skeyhashinfo->match_fcinfo,
+									 &skeyhashinfo->match_finfo,
+									 2,
+									 saop->inputcollid,
+									 NULL,
+									 NULL);
 		}
 		else
 		{
@@ -173,6 +307,19 @@ ExecSeqBuildScanKeys(PlanState *planstate, List *quals, int *numScanKeys,
 			 */
 			continue;
 		}
+
+		n_scan_keys++;
+
+		ScanKeyEntryInitialize(this_scan_key,
+							   flags,
+							   varattno,
+							   InvalidStrategy, /* no strategy */
+							   InvalidOid,	/* no subtype */
+							   collationid,
+							   opfuncid,
+							   scanvalue);
+
+		ScanKeyEntrySetHashInfo(this_scan_key, skeyhashinfo);
 	}
 
 	/*
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index bb0856ac0bc..d301dc22661 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -5369,6 +5369,66 @@ fix_tablequal_references(PlannerInfo *root, Path *best_path,
 					fixed_tablequals = lappend(fixed_tablequals, clause);
 					break;
 				}
+
+				/*
+				 * ScalarArrayOpExpr case: <leftop> <op> ANY(ARRAY(..))
+				 */
+			case T_ScalarArrayOpExpr:
+				{
+					ScalarArrayOpExpr *saopexpr = (ScalarArrayOpExpr *) clause;
+					Expr	   *leftop;
+					Expr	   *rightop;
+
+					leftop = (Expr *) get_leftop(clause);
+					rightop = (Expr *) get_rightop(clause);
+
+					if (leftop && IsA(leftop, RelabelType))
+						leftop = ((RelabelType *) leftop)->arg;
+
+					if (rightop && IsA(rightop, RelabelType))
+						rightop = ((RelabelType *) rightop)->arg;
+
+					if (leftop == NULL || rightop == NULL)
+						continue;
+
+					if (saopexpr->opno >= FirstNormalObjectId)
+						continue;
+
+					if (!get_func_leakproof(saopexpr->opfuncid))
+						continue;
+
+					if (IsA(rightop, Var) && !IsA(leftop, Var)
+						&& ((Var *) rightop)->varattno > 0)
+					{
+						Expr	   *tmpop = leftop;
+						Oid			commutator;
+
+						leftop = rightop;
+						rightop = tmpop;
+
+						commutator = get_commutator(saopexpr->opno);
+
+						if (OidIsValid(commutator))
+						{
+							saopexpr->opno = commutator;
+							saopexpr->opfuncid = get_opcode(saopexpr->opno);
+						}
+						else
+							continue;
+					}
+
+					if (!(IsA(leftop, Var) && ((Var *) leftop)->varattno > 0))
+						continue;
+
+					if (!check_tablequal_rightop(rightop))
+						continue;
+
+					list_free(saopexpr->args);
+					saopexpr->args = list_make2(leftop, rightop);
+
+					fixed_tablequals = lappend(fixed_tablequals, clause);
+					break;
+				}
 			default:
 				continue;
 		}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 252f5e661c1..091aa1ff11a 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -429,6 +429,8 @@ extern bool ResolveCminCmaxDuringDecoding(struct HTAB *tuplecid_data,
 extern void HeapCheckForSerializableConflictOut(bool visible, Relation relation, HeapTuple tuple,
 												Buffer buffer, Snapshot snapshot);
 
+extern bool HeapKeyTest(HeapTuple tuple, TupleDesc tupdesc, int nkeys, ScanKey keys);
+
 /*
  * heap_execute_freeze_tuple
  *		Execute the prepared freezing of a tuple with caller's freeze plan.
diff --git a/src/include/access/skey.h b/src/include/access/skey.h
index e650c2e7baf..0f5a556df4c 100644
--- a/src/include/access/skey.h
+++ b/src/include/access/skey.h
@@ -18,6 +18,38 @@
 #include "access/stratnum.h"
 #include "fmgr.h"
 
+/*
+ * A ScanKeyHashInfoData contains the information required to apply tuple
+ * filtering during a table/heap scan when the condition is "column op
+ * ANY(ARRAY[...])".
+ *
+ * This structure is only used when pushing down quals to the table access
+ * method layer in a table/heap scan context. In this case, the table AM can
+ * use it to filter out tuples based on a hash table.
+ *
+ * hashtab is a void pointer used to store the reference to the hash table
+ * that will be created later, during the table scan.
+ *
+ * inclause indicates whether the qual is an IN clause; if false, it is a
+ * NOT IN clause.
+ *
+ * hash_finfo and hash_fcinfo hold the function and call info in charge of
+ * hashing a value.
+ *
+ * match_finfo and match_fcinfo hold the function and call info in charge of
+ * comparing two values.
+ */
+typedef struct ScanKeyHashInfoData
+{
+	void	   *hashtab;
+	bool		inclause;
+	FmgrInfo	hash_finfo;
+	FmgrInfo	match_finfo;
+	FunctionCallInfo hash_fcinfo;
+	FunctionCallInfo match_fcinfo;
+}			ScanKeyHashInfoData;
+
+typedef ScanKeyHashInfoData * ScanKeyHashInfo;
 
 /*
  * A ScanKey represents the application of a comparison operator between
@@ -70,6 +102,7 @@ typedef struct ScanKeyData
 	Oid			sk_collation;	/* collation to use, if needed */
 	FmgrInfo	sk_func;		/* lookup info for function to call */
 	Datum		sk_argument;	/* data to compare */
+	ScanKeyHashInfo sk_hashinfo;	/* hash table information */
 } ScanKeyData;
 
 typedef ScanKeyData *ScanKey;
@@ -147,5 +180,6 @@ extern void ScanKeyEntryInitializeWithInfo(ScanKey entry,
 										   Oid collation,
 										   FmgrInfo *finfo,
 										   Datum argument);
+extern void ScanKeyEntrySetHashInfo(ScanKey entry, ScanKeyHashInfo hashinfo);
 
 #endif							/* SKEY_H */
diff --git a/src/include/access/valid.h b/src/include/access/valid.h
deleted file mode 100644
index 8b33089dac4..00000000000
--- a/src/include/access/valid.h
+++ /dev/null
@@ -1,58 +0,0 @@
-/*-------------------------------------------------------------------------
- *
- * valid.h
- *	  POSTGRES tuple qualification validity definitions.
- *
- *
- * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
- * Portions Copyright (c) 1994, Regents of the University of California
- *
- * src/include/access/valid.h
- *
- *-------------------------------------------------------------------------
- */
-#ifndef VALID_H
-#define VALID_H
-
-#include "access/htup.h"
-#include "access/htup_details.h"
-#include "access/skey.h"
-#include "access/tupdesc.h"
-
-/*
- *		HeapKeyTest
- *
- *		Test a heap tuple to see if it satisfies a scan key.
- */
-static inline bool
-HeapKeyTest(HeapTuple tuple, TupleDesc tupdesc, int nkeys, ScanKey keys)
-{
-	int			cur_nkeys = nkeys;
-	ScanKey		cur_key = keys;
-
-	for (; cur_nkeys--; cur_key++)
-	{
-		Datum		atp;
-		bool		isnull;
-		Datum		test;
-
-		if (cur_key->sk_flags & SK_ISNULL)
-			return false;
-
-		atp = heap_getattr(tuple, cur_key->sk_attno, tupdesc, &isnull);
-
-		if (isnull)
-			return false;
-
-		test = FunctionCall2Coll(&cur_key->sk_func,
-								 cur_key->sk_collation,
-								 atp, cur_key->sk_argument);
-
-		if (!DatumGetBool(test))
-			return false;
-	}
-
-	return true;
-}
-
-#endif							/* VALID_H */
diff --git a/src/test/regress/expected/qual_pushdown.out b/src/test/regress/expected/qual_pushdown.out
index 5b43553c945..965102b146c 100644
--- a/src/test/regress/expected/qual_pushdown.out
+++ b/src/test/regress/expected/qual_pushdown.out
@@ -92,6 +92,43 @@ EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FRO
    Rows Removed In Executor by Filter: 999
 (3 rows)
 
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = ANY('{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}'::INT[]);
+                        QUERY PLAN                         
+-----------------------------------------------------------
+ Seq Scan on qa (actual rows=10.00 loops=1)
+   Filter: (i = ANY ('{1,2,3,4,5,6,7,8,9,10}'::integer[]))
+   Rows Removed In Executor by Filter: 990
+(3 rows)
+
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = ANY('{1, 2}'::INT[]);
+                QUERY PLAN                 
+-------------------------------------------
+ Seq Scan on qa (actual rows=2.00 loops=1)
+   Filter: (i = ANY ('{1,2}'::integer[]))
+   Rows Removed In Executor by Filter: 998
+(3 rows)
+
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE NOT (i <> ALL('{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}'::INT[]));
+                        QUERY PLAN                         
+-----------------------------------------------------------
+ Seq Scan on qa (actual rows=10.00 loops=1)
+   Filter: (i = ANY ('{1,2,3,4,5,6,7,8,9,10}'::integer[]))
+   Rows Removed In Executor by Filter: 990
+(3 rows)
+
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = ANY((SELECT array_agg(j) FROM qb WHERE j > 50 AND j <= 60)::int[]);
+                        QUERY PLAN                        
+----------------------------------------------------------
+ Seq Scan on qa (actual rows=10.00 loops=1)
+   Filter: (i = ANY ((InitPlan 1).col1))
+   Rows Removed In Executor by Filter: 990
+   InitPlan 1
+     ->  Aggregate (actual rows=1.00 loops=1)
+           ->  Seq Scan on qb (actual rows=10.00 loops=1)
+                 Filter: ((j > 50) AND (j <= 60))
+                 Rows Removed In Executor by Filter: 990
+(8 rows)
+
 -- Enable quals push down
 ALTER TABLE qa SET (quals_push_down=on);
 ALTER TABLE qb SET (quals_push_down=on);
@@ -249,5 +286,94 @@ SELECT ii FROM qa WHERE i = ii AND ii < 10;
   1
 (1 row)
 
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = ANY('{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}'::INT[]);
+                        QUERY PLAN                         
+-----------------------------------------------------------
+ Seq Scan on qa (actual rows=10.00 loops=1)
+   Filter: (i = ANY ('{1,2,3,4,5,6,7,8,9,10}'::integer[]))
+   Rows Removed In Table AM by Filter: 990
+(3 rows)
+
+SELECT ii FROM qa WHERE i = ANY('{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}'::INT[]);
+ ii  
+-----
+   1
+   4
+   9
+  16
+  25
+  36
+  49
+  64
+  81
+ 100
+(10 rows)
+
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = ANY('{1, 2}'::INT[]);
+                QUERY PLAN                 
+-------------------------------------------
+ Seq Scan on qa (actual rows=2.00 loops=1)
+   Filter: (i = ANY ('{1,2}'::integer[]))
+   Rows Removed In Table AM by Filter: 998
+(3 rows)
+
+SELECT ii FROM qa WHERE i = ANY('{1, 2}'::INT[]);
+ ii 
+----
+  1
+  4
+(2 rows)
+
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE NOT (i <> ALL('{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}'::INT[]));
+                        QUERY PLAN                         
+-----------------------------------------------------------
+ Seq Scan on qa (actual rows=10.00 loops=1)
+   Filter: (i = ANY ('{1,2,3,4,5,6,7,8,9,10}'::integer[]))
+   Rows Removed In Table AM by Filter: 990
+(3 rows)
+
+SELECT ii FROM qa WHERE NOT (i <> ALL('{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}'::INT[]));
+ ii  
+-----
+   1
+   4
+   9
+  16
+  25
+  36
+  49
+  64
+  81
+ 100
+(10 rows)
+
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = ANY((SELECT array_agg(j) FROM qb WHERE j > 50 AND j <= 60)::INT[]);
+                        QUERY PLAN                        
+----------------------------------------------------------
+ Seq Scan on qa (actual rows=10.00 loops=1)
+   Filter: (i = ANY ((InitPlan 1).col1))
+   Rows Removed In Table AM by Filter: 990
+   InitPlan 1
+     ->  Aggregate (actual rows=1.00 loops=1)
+           ->  Seq Scan on qb (actual rows=10.00 loops=1)
+                 Filter: ((j > 50) AND (j <= 60))
+                 Rows Removed In Table AM by Filter: 990
+(8 rows)
+
+SELECT ii FROM qa WHERE i = ANY((SELECT array_agg(j) FROM qb WHERE j > 50 AND j <= 60)::INT[]);
+  ii  
+------
+ 2601
+ 2704
+ 2809
+ 2916
+ 3025
+ 3136
+ 3249
+ 3364
+ 3481
+ 3600
+(10 rows)
+
 DROP TABLE IF EXISTS qa;
 DROP TABLE IF EXISTS qb;
diff --git a/src/test/regress/sql/qual_pushdown.sql b/src/test/regress/sql/qual_pushdown.sql
index 0f0410cd1d5..38e88a50c33 100644
--- a/src/test/regress/sql/qual_pushdown.sql
+++ b/src/test/regress/sql/qual_pushdown.sql
@@ -19,6 +19,10 @@ EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FRO
 EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = (SELECT SQRT(j)::INT FROM qb WHERE j = 100);
 EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa JOIN qb ON (qa.i = qb.j) WHERE j = 100;
 EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = ii AND ii < 10;
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = ANY('{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}'::INT[]);
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = ANY('{1, 2}'::INT[]);
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE NOT (i <> ALL('{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}'::INT[]));
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = ANY((SELECT array_agg(j) FROM qb WHERE j > 50 AND j <= 60)::int[]);
 
 -- Enable quals push down
 ALTER TABLE qa SET (quals_push_down=on);
@@ -43,6 +47,14 @@ EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FRO
 SELECT ii FROM qa JOIN qb ON (qa.i = qb.j) WHERE j = 100;
 EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = ii AND ii < 10;
 SELECT ii FROM qa WHERE i = ii AND ii < 10;
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = ANY('{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}'::INT[]);
+SELECT ii FROM qa WHERE i = ANY('{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}'::INT[]);
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = ANY('{1, 2}'::INT[]);
+SELECT ii FROM qa WHERE i = ANY('{1, 2}'::INT[]);
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE NOT (i <> ALL('{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}'::INT[]));
+SELECT ii FROM qa WHERE NOT (i <> ALL('{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}'::INT[]));
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = ANY((SELECT array_agg(j) FROM qb WHERE j > 50 AND j <= 60)::INT[]);
+SELECT ii FROM qa WHERE i = ANY((SELECT array_agg(j) FROM qb WHERE j > 50 AND j <= 60)::INT[]);
 
 DROP TABLE IF EXISTS qa;
 DROP TABLE IF EXISTS qb;
-- 
2.39.5

v1-0006-Push-down-IS-IS-NOT-NULL-quals-to-table-AMs.patchtext/x-diff; charset=us-asciiDownload
From fbce0efb501feb2b85a68e6da789ed619641a12b Mon Sep 17 00:00:00 2001
From: Julien Tachoires <julien@tachoires.me>
Date: Tue, 26 Aug 2025 08:49:59 +0200
Subject: [PATCH 6/6] Push down IS/IS NOT NULL quals to table AMs

This commit adds support for pushing down IS NULL/IS NOT NULL WHERE
clauses to table AMs during table scans.
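
As an illustration (a sketch only: the table is hypothetical and the
quals_push_down reloption comes from patch 0003), the following NullTest
quals can now be turned into ScanKeys and filtered by the table AM:

CREATE TABLE t (i INTEGER, ii INTEGER) WITH (quals_push_down = on);
INSERT INTO t VALUES (1, 10), (2, NULL);
-- IS NULL is pushed down with the flags SK_ISNULL | SK_SEARCHNULL
SELECT i FROM t WHERE ii IS NULL;
-- IS NOT NULL is pushed down with the flags SK_ISNULL | SK_SEARCHNOTNULL
SELECT i FROM t WHERE ii IS NOT NULL;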
---
 src/backend/access/heap/heapam_valid.c      |  16 ++-
 src/backend/executor/nodeSeqscan.c          |  77 +++++++++--
 src/backend/optimizer/plan/createplan.c     |  30 +++++
 src/test/regress/expected/qual_pushdown.out | 141 +++++++++++++-------
 src/test/regress/sql/qual_pushdown.sql      |   7 +
 5 files changed, 209 insertions(+), 62 deletions(-)

diff --git a/src/backend/access/heap/heapam_valid.c b/src/backend/access/heap/heapam_valid.c
index 7261723e378..a05738a9144 100644
--- a/src/backend/access/heap/heapam_valid.c
+++ b/src/backend/access/heap/heapam_valid.c
@@ -129,7 +129,12 @@ HeapKeyTest(HeapTuple tuple, TupleDesc tupdesc, int nkeys, ScanKey keys)
 		bool		isnull;
 		Datum		test;
 
-		if (cur_key->sk_flags & SK_ISNULL)
+		/*
+		 * Fail the key test if the SK_ISNULL flag is set but we are not
+		 * handling an IS NULL/IS NOT NULL test.
+		 */
+		if ((cur_key->sk_flags & SK_ISNULL)
+			&& !(cur_key->sk_flags & (SK_SEARCHNULL | SK_SEARCHNOTNULL)))
 			return false;
 
 		atp = heap_getattr(tuple, cur_key->sk_attno, tupdesc, &isnull);
@@ -272,6 +277,15 @@ HeapKeyTest(HeapTuple tuple, TupleDesc tupdesc, int nkeys, ScanKey keys)
 
 			return false;
 		}
+		/* IS/IS NOT NULL case */
+		else if ((cur_key->sk_flags & SK_ISNULL)
+				 && (cur_key->sk_flags & (SK_SEARCHNULL | SK_SEARCHNOTNULL)))
+		{
+			if ((cur_key->sk_flags & SK_SEARCHNULL) && !isnull)
+				return false;
+			if ((cur_key->sk_flags & SK_SEARCHNOTNULL) && isnull)
+				return false;
+		}
 		else
 		{
 			if (isnull)
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 1b181c8f254..210d4cb84e0 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -156,6 +156,18 @@ ExecSeqBuildScanKeys(PlanState *planstate, List *quals, int *numScanKeys,
 				n_runtime_keys++;
 				scanvalue = (Datum) 0;
 			}
+
+			n_scan_keys++;
+
+			ScanKeyEntryInitialize(this_scan_key,
+								   flags,
+								   varattno,
+								   InvalidStrategy, /* no strategy */
+								   InvalidOid,	/* no subtype */
+								   collationid,
+								   opfuncid,
+								   scanvalue);
+
 		}
 		/* <leftop> <op> ANY/ALL (array-expression) */
 		else if (IsA(clause, ScalarArrayOpExpr))
@@ -299,6 +311,58 @@ ExecSeqBuildScanKeys(PlanState *planstate, List *quals, int *numScanKeys,
 									 saop->inputcollid,
 									 NULL,
 									 NULL);
+
+			n_scan_keys++;
+
+			ScanKeyEntryInitialize(this_scan_key,
+								   flags,
+								   varattno,
+								   InvalidStrategy, /* no strategy */
+								   InvalidOid,	/* no subtype */
+								   collationid,
+								   opfuncid,
+								   scanvalue);
+
+			ScanKeyEntrySetHashInfo(this_scan_key, skeyhashinfo);
+
+		}
+		/* <leftop> IS/IS NOT NULL */
+		else if (IsA(clause, NullTest))
+		{
+			NullTest   *ntest = (NullTest *) clause;
+
+			leftop = ntest->arg;
+			collationid = InvalidOid;
+
+			varattno = ((Var *) leftop)->varattno;
+
+			/*
+			 * initialize the scan key's fields appropriately
+			 */
+			switch (ntest->nulltesttype)
+			{
+				case IS_NULL:
+					flags = SK_ISNULL | SK_SEARCHNULL;
+					break;
+				case IS_NOT_NULL:
+					flags = SK_ISNULL | SK_SEARCHNOTNULL;
+					break;
+				default:
+					elog(ERROR, "unrecognized nulltesttype: %d",
+						 (int) ntest->nulltesttype);
+					break;
+			}
+
+			n_scan_keys++;
+
+			ScanKeyEntryInitialize(this_scan_key,
+								   flags,
+								   varattno,
+								   InvalidStrategy, /* no strategy */
+								   InvalidOid,	/* no subtype */
+								   InvalidOid,	/* no collation */
+								   InvalidOid,	/* no reg proc for this */
+								   (Datum) 0);	/* constant */
 		}
 		else
 		{
@@ -307,19 +371,6 @@ ExecSeqBuildScanKeys(PlanState *planstate, List *quals, int *numScanKeys,
 			 */
 			continue;
 		}
-
-		n_scan_keys++;
-
-		ScanKeyEntryInitialize(this_scan_key,
-							   flags,
-							   varattno,
-							   InvalidStrategy, /* no strategy */
-							   InvalidOid,	/* no subtype */
-							   collationid,
-							   opfuncid,
-							   scanvalue);
-
-		ScanKeyEntrySetHashInfo(this_scan_key, skeyhashinfo);
 	}
 
 	/*
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index d301dc22661..e868ac2e7b1 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -5429,6 +5429,36 @@ fix_tablequal_references(PlannerInfo *root, Path *best_path,
 					fixed_tablequals = lappend(fixed_tablequals, clause);
 					break;
 				}
+
+				/*
+				 * NullTest: <leftop> IS/IS NOT NULL
+				 */
+			case T_NullTest:
+				{
+					NullTest   *nt = (NullTest *) clause;
+					Expr	   *leftop;
+
+					leftop = (Expr *) nt->arg;
+
+					/*
+					 * Handle relabeling and make sure our left part is a
+					 * column name.
+					 */
+					if (leftop && IsA(leftop, RelabelType))
+						leftop = ((RelabelType *) leftop)->arg;
+
+					if (leftop == NULL)
+						continue;
+
+					if (!(IsA(leftop, Var) && ((Var *) leftop)->varattno > 0))
+						continue;
+
+					/* Override Null test arg in case of relabeling */
+					nt->arg = leftop;
+
+					fixed_tablequals = lappend(fixed_tablequals, clause);
+					break;
+				}
 			default:
 				continue;
 		}
diff --git a/src/test/regress/expected/qual_pushdown.out b/src/test/regress/expected/qual_pushdown.out
index 965102b146c..bc7050b5708 100644
--- a/src/test/regress/expected/qual_pushdown.out
+++ b/src/test/regress/expected/qual_pushdown.out
@@ -6,16 +6,17 @@ CREATE TABLE qa (i INTEGER, ii INTEGER);
 CREATE TABLE qb (j INTEGER);
 INSERT INTO qa SELECT n, n * n  FROM generate_series(1, 1000) as n;
 INSERT INTO qb SELECT n FROM generate_series(1, 1000) as n;
+INSERT INTO qa VALUES (1001, NULL);
 ANALYZE qa;
 ANALYZE qb;
 -- By default, the quals are not pushed down. The tuples are filtered out by
 -- the executor.
 EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = 100;
-                QUERY PLAN                 
--------------------------------------------
+                 QUERY PLAN                 
+--------------------------------------------
  Seq Scan on qa (actual rows=1.00 loops=1)
    Filter: (i = 100)
-   Rows Removed In Executor by Filter: 999
+   Rows Removed In Executor by Filter: 1000
 (3 rows)
 
 EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i < 10;
@@ -23,15 +24,15 @@ EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FRO
 -------------------------------------------
  Seq Scan on qa (actual rows=9.00 loops=1)
    Filter: (i < 10)
-   Rows Removed In Executor by Filter: 991
+   Rows Removed In Executor by Filter: 992
 (3 rows)
 
 EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE 100 = i;
-                QUERY PLAN                 
--------------------------------------------
+                 QUERY PLAN                 
+--------------------------------------------
  Seq Scan on qa (actual rows=1.00 loops=1)
    Filter: (100 = i)
-   Rows Removed In Executor by Filter: 999
+   Rows Removed In Executor by Filter: 1000
 (3 rows)
 
 EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE 10 > i;
@@ -39,23 +40,23 @@ EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FRO
 -------------------------------------------
  Seq Scan on qa (actual rows=9.00 loops=1)
    Filter: (10 > i)
-   Rows Removed In Executor by Filter: 991
+   Rows Removed In Executor by Filter: 992
 (3 rows)
 
 EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = SQRT(25)::INT;
-                QUERY PLAN                 
--------------------------------------------
+                 QUERY PLAN                 
+--------------------------------------------
  Seq Scan on qa (actual rows=1.00 loops=1)
    Filter: (i = 5)
-   Rows Removed In Executor by Filter: 999
+   Rows Removed In Executor by Filter: 1000
 (3 rows)
 
 EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = (SELECT 100);
-                QUERY PLAN                 
--------------------------------------------
+                 QUERY PLAN                 
+--------------------------------------------
  Seq Scan on qa (actual rows=1.00 loops=1)
    Filter: (i = (InitPlan 1).col1)
-   Rows Removed In Executor by Filter: 999
+   Rows Removed In Executor by Filter: 1000
    InitPlan 1
      ->  Result (actual rows=1.00 loops=1)
 (5 rows)
@@ -65,7 +66,7 @@ EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FRO
 ---------------------------------------------------
  Seq Scan on qa (actual rows=1.00 loops=1)
    Filter: (i = (InitPlan 1).col1)
-   Rows Removed In Executor by Filter: 999
+   Rows Removed In Executor by Filter: 1000
    InitPlan 1
      ->  Seq Scan on qb (actual rows=1.00 loops=1)
            Filter: (j = 100)
@@ -73,23 +74,23 @@ EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FRO
 (7 rows)
 
 EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa JOIN qb ON (qa.i = qb.j) WHERE j = 100;
-                   QUERY PLAN                    
--------------------------------------------------
+                    QUERY PLAN                    
+--------------------------------------------------
  Nested Loop (actual rows=1.00 loops=1)
    ->  Seq Scan on qa (actual rows=1.00 loops=1)
          Filter: (i = 100)
-         Rows Removed In Executor by Filter: 999
+         Rows Removed In Executor by Filter: 1000
    ->  Seq Scan on qb (actual rows=1.00 loops=1)
          Filter: (j = 100)
          Rows Removed In Executor by Filter: 999
 (7 rows)
 
 EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = ii AND ii < 10;
-                QUERY PLAN                 
--------------------------------------------
+                 QUERY PLAN                 
+--------------------------------------------
  Seq Scan on qa (actual rows=1.00 loops=1)
    Filter: ((ii < 10) AND (i = ii))
-   Rows Removed In Executor by Filter: 999
+   Rows Removed In Executor by Filter: 1000
 (3 rows)
 
 EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = ANY('{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}'::INT[]);
@@ -97,7 +98,7 @@ EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FRO
 -----------------------------------------------------------
  Seq Scan on qa (actual rows=10.00 loops=1)
    Filter: (i = ANY ('{1,2,3,4,5,6,7,8,9,10}'::integer[]))
-   Rows Removed In Executor by Filter: 990
+   Rows Removed In Executor by Filter: 991
 (3 rows)
 
 EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = ANY('{1, 2}'::INT[]);
@@ -105,7 +106,7 @@ EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FRO
 -------------------------------------------
  Seq Scan on qa (actual rows=2.00 loops=1)
    Filter: (i = ANY ('{1,2}'::integer[]))
-   Rows Removed In Executor by Filter: 998
+   Rows Removed In Executor by Filter: 999
 (3 rows)
 
 EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE NOT (i <> ALL('{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}'::INT[]));
@@ -113,7 +114,7 @@ EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FRO
 -----------------------------------------------------------
  Seq Scan on qa (actual rows=10.00 loops=1)
    Filter: (i = ANY ('{1,2,3,4,5,6,7,8,9,10}'::integer[]))
-   Rows Removed In Executor by Filter: 990
+   Rows Removed In Executor by Filter: 991
 (3 rows)
 
 EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = ANY((SELECT array_agg(j) FROM qb WHERE j > 50 AND j <= 60)::int[]);
@@ -121,7 +122,7 @@ EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FRO
 ----------------------------------------------------------
  Seq Scan on qa (actual rows=10.00 loops=1)
    Filter: (i = ANY ((InitPlan 1).col1))
-   Rows Removed In Executor by Filter: 990
+   Rows Removed In Executor by Filter: 991
    InitPlan 1
      ->  Aggregate (actual rows=1.00 loops=1)
            ->  Seq Scan on qb (actual rows=10.00 loops=1)
@@ -129,16 +130,32 @@ EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FRO
                  Rows Removed In Executor by Filter: 990
 (8 rows)
 
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT i, ii FROM qa WHERE ii IS NULL;
+                 QUERY PLAN                 
+--------------------------------------------
+ Seq Scan on qa (actual rows=1.00 loops=1)
+   Filter: (ii IS NULL)
+   Rows Removed In Executor by Filter: 1000
+(3 rows)
+
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT i, ii FROM qa WHERE ii IS NOT NULL AND i >= 1000;
+                  QUERY PLAN                  
+----------------------------------------------
+ Seq Scan on qa (actual rows=1.00 loops=1)
+   Filter: ((ii IS NOT NULL) AND (i >= 1000))
+   Rows Removed In Executor by Filter: 1000
+(3 rows)
+
 -- Enable quals push down
 ALTER TABLE qa SET (quals_push_down=on);
 ALTER TABLE qb SET (quals_push_down=on);
 -- Now, we expect to see the tuples being filtered out by the table AM
 EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = 100;
-                QUERY PLAN                 
--------------------------------------------
+                 QUERY PLAN                 
+--------------------------------------------
  Seq Scan on qa (actual rows=1.00 loops=1)
    Filter: (i = 100)
-   Rows Removed In Table AM by Filter: 999
+   Rows Removed In Table AM by Filter: 1000
 (3 rows)
 
 SELECT ii FROM qa WHERE i = 100;
@@ -152,7 +169,7 @@ EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FRO
 -------------------------------------------
  Seq Scan on qa (actual rows=9.00 loops=1)
    Filter: (i < 10)
-   Rows Removed In Table AM by Filter: 991
+   Rows Removed In Table AM by Filter: 992
 (3 rows)
 
 SELECT ii FROM qa WHERE i < 10;
@@ -170,11 +187,11 @@ SELECT ii FROM qa WHERE i < 10;
 (9 rows)
 
 EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE 100 = i;
-                QUERY PLAN                 
--------------------------------------------
+                 QUERY PLAN                 
+--------------------------------------------
  Seq Scan on qa (actual rows=1.00 loops=1)
    Filter: (100 = i)
-   Rows Removed In Table AM by Filter: 999
+   Rows Removed In Table AM by Filter: 1000
 (3 rows)
 
 SELECT ii FROM qa WHERE 100 = i;
@@ -188,7 +205,7 @@ EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FRO
 -------------------------------------------
  Seq Scan on qa (actual rows=9.00 loops=1)
    Filter: (10 > i)
-   Rows Removed In Table AM by Filter: 991
+   Rows Removed In Table AM by Filter: 992
 (3 rows)
 
 SELECT ii FROM qa WHERE 10 > i;
@@ -206,11 +223,11 @@ SELECT ii FROM qa WHERE 10 > i;
 (9 rows)
 
 EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = SQRT(25)::INT;
-                QUERY PLAN                 
--------------------------------------------
+                 QUERY PLAN                 
+--------------------------------------------
  Seq Scan on qa (actual rows=1.00 loops=1)
    Filter: (i = 5)
-   Rows Removed In Table AM by Filter: 999
+   Rows Removed In Table AM by Filter: 1000
 (3 rows)
 
 SELECT ii FROM qa WHERE i = SQRT(25)::INT;
@@ -220,11 +237,11 @@ SELECT ii FROM qa WHERE i = SQRT(25)::INT;
 (1 row)
 
 EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = (SELECT 100);
-                QUERY PLAN                 
--------------------------------------------
+                 QUERY PLAN                 
+--------------------------------------------
  Seq Scan on qa (actual rows=1.00 loops=1)
    Filter: (i = (InitPlan 1).col1)
-   Rows Removed In Table AM by Filter: 999
+   Rows Removed In Table AM by Filter: 1000
    InitPlan 1
      ->  Result (actual rows=1.00 loops=1)
 (5 rows)
@@ -240,7 +257,7 @@ EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FRO
 ---------------------------------------------------
  Seq Scan on qa (actual rows=1.00 loops=1)
    Filter: (i = (InitPlan 1).col1)
-   Rows Removed In Table AM by Filter: 999
+   Rows Removed In Table AM by Filter: 1000
    InitPlan 1
      ->  Seq Scan on qb (actual rows=1.00 loops=1)
            Filter: (j = 100)
@@ -254,12 +271,12 @@ SELECT ii FROM qa WHERE i = (SELECT SQRT(j)::INT FROM qb WHERE j = 100);
 (1 row)
 
 EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa JOIN qb ON (qa.i = qb.j) WHERE j = 100;
-                   QUERY PLAN                    
--------------------------------------------------
+                    QUERY PLAN                    
+--------------------------------------------------
  Nested Loop (actual rows=1.00 loops=1)
    ->  Seq Scan on qa (actual rows=1.00 loops=1)
          Filter: (i = 100)
-         Rows Removed In Table AM by Filter: 999
+         Rows Removed In Table AM by Filter: 1000
    ->  Seq Scan on qb (actual rows=1.00 loops=1)
          Filter: (j = 100)
          Rows Removed In Table AM by Filter: 999
@@ -276,7 +293,7 @@ EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FRO
 -------------------------------------------
  Seq Scan on qa (actual rows=1.00 loops=1)
    Filter: ((ii < 10) AND (i = ii))
-   Rows Removed In Table AM by Filter: 997
+   Rows Removed In Table AM by Filter: 998
    Rows Removed In Executor by Filter: 2
 (4 rows)
 
@@ -291,7 +308,7 @@ EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FRO
 -----------------------------------------------------------
  Seq Scan on qa (actual rows=10.00 loops=1)
    Filter: (i = ANY ('{1,2,3,4,5,6,7,8,9,10}'::integer[]))
-   Rows Removed In Table AM by Filter: 990
+   Rows Removed In Table AM by Filter: 991
 (3 rows)
 
 SELECT ii FROM qa WHERE i = ANY('{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}'::INT[]);
@@ -314,7 +331,7 @@ EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FRO
 -------------------------------------------
  Seq Scan on qa (actual rows=2.00 loops=1)
    Filter: (i = ANY ('{1,2}'::integer[]))
-   Rows Removed In Table AM by Filter: 998
+   Rows Removed In Table AM by Filter: 999
 (3 rows)
 
 SELECT ii FROM qa WHERE i = ANY('{1, 2}'::INT[]);
@@ -329,7 +346,7 @@ EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FRO
 -----------------------------------------------------------
  Seq Scan on qa (actual rows=10.00 loops=1)
    Filter: (i = ANY ('{1,2,3,4,5,6,7,8,9,10}'::integer[]))
-   Rows Removed In Table AM by Filter: 990
+   Rows Removed In Table AM by Filter: 991
 (3 rows)
 
 SELECT ii FROM qa WHERE NOT (i <> ALL('{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}'::INT[]));
@@ -352,7 +369,7 @@ EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FRO
 ----------------------------------------------------------
  Seq Scan on qa (actual rows=10.00 loops=1)
    Filter: (i = ANY ((InitPlan 1).col1))
-   Rows Removed In Table AM by Filter: 990
+   Rows Removed In Table AM by Filter: 991
    InitPlan 1
      ->  Aggregate (actual rows=1.00 loops=1)
            ->  Seq Scan on qb (actual rows=10.00 loops=1)
@@ -375,5 +392,33 @@ SELECT ii FROM qa WHERE i = ANY((SELECT array_agg(j) FROM qb WHERE j > 50 AND j
  3600
 (10 rows)
 
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT i, ii FROM qa WHERE ii IS NULL;
+                 QUERY PLAN                 
+--------------------------------------------
+ Seq Scan on qa (actual rows=1.00 loops=1)
+   Filter: (ii IS NULL)
+   Rows Removed In Table AM by Filter: 1000
+(3 rows)
+
+SELECT i, ii FROM qa WHERE ii IS NULL;
+  i   | ii 
+------+----
+ 1001 |   
+(1 row)
+
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT i, ii FROM qa WHERE ii IS NOT NULL AND i >= 1000;
+                  QUERY PLAN                  
+----------------------------------------------
+ Seq Scan on qa (actual rows=1.00 loops=1)
+   Filter: ((ii IS NOT NULL) AND (i >= 1000))
+   Rows Removed In Table AM by Filter: 1000
+(3 rows)
+
+SELECT i, ii FROM qa WHERE ii IS NOT NULL AND i >= 1000;
+  i   |   ii    
+------+---------
+ 1000 | 1000000
+(1 row)
+
 DROP TABLE IF EXISTS qa;
 DROP TABLE IF EXISTS qb;
diff --git a/src/test/regress/sql/qual_pushdown.sql b/src/test/regress/sql/qual_pushdown.sql
index 38e88a50c33..50d6f9b316a 100644
--- a/src/test/regress/sql/qual_pushdown.sql
+++ b/src/test/regress/sql/qual_pushdown.sql
@@ -5,6 +5,7 @@ CREATE TABLE qa (i INTEGER, ii INTEGER);
 CREATE TABLE qb (j INTEGER);
 INSERT INTO qa SELECT n, n * n  FROM generate_series(1, 1000) as n;
 INSERT INTO qb SELECT n FROM generate_series(1, 1000) as n;
+INSERT INTO qa VALUES (1001, NULL);
 ANALYZE qa;
 ANALYZE qb;
 
@@ -23,6 +24,8 @@ EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FRO
 EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = ANY('{1, 2}'::INT[]);
 EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE NOT (i <> ALL('{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}'::INT[]));
 EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = ANY((SELECT array_agg(j) FROM qb WHERE j > 50 AND j <= 60)::int[]);
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT i, ii FROM qa WHERE ii IS NULL;
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT i, ii FROM qa WHERE ii IS NOT NULL AND i >= 1000;
 
 -- Enable quals push down
 ALTER TABLE qa SET (quals_push_down=on);
@@ -55,6 +58,10 @@ EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FRO
 SELECT ii FROM qa WHERE NOT (i <> ALL('{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}'::INT[]));
 EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = ANY((SELECT array_agg(j) FROM qb WHERE j > 50 AND j <= 60)::INT[]);
 SELECT ii FROM qa WHERE i = ANY((SELECT array_agg(j) FROM qb WHERE j > 50 AND j <= 60)::INT[]);
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT i, ii FROM qa WHERE ii IS NULL;
+SELECT i, ii FROM qa WHERE ii IS NULL;
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT i, ii FROM qa WHERE ii IS NOT NULL AND i >= 1000;
+SELECT i, ii FROM qa WHERE ii IS NOT NULL AND i >= 1000;
 
 DROP TABLE IF EXISTS qa;
 DROP TABLE IF EXISTS qb;
-- 
2.39.5

#2Andres Freund
andres@anarazel.de
In reply to: Julien Tachoires (#1)
Re: Qual push down to table AM

On 2025-08-27 22:27:37 +0200, Julien Tachoires wrote:

Please find attached a patch set proposal intended to implement WHERE
clauses (qual) push down to the underlying table AM during
table/sequential scan execution.

The primary goal of this project is to convert quals to ScanKeys and
pass them to the table AMs. Table AMs are then allowed to apply early
tuple filtering during table (sequential) scans. Applying filtering at
the table storage level is something necessary for non row-oriented
table storage like columnar storage. Index organized table is another
table storage that would need quals push down.

AFAIK, CustomScan is the one and only way to go for having table scan
using quals pushed down, but each table AM must implement its own
mechanism. IMHO, having this feature available in core would help the
development of new table AMs. About Heap, some performance testing
(detailed at the end of this message) shows between 45% and 60%
improvement in seq scan execution time when only one tuple is returned
from the table.

One problem with doing that in the case of heapam is that you're evaluating
scan keys with the buffer lock held - with basically arbitrary expressions
being evaluated. That's an easy path to undetected deadlocks. You'd have to
redesign the relevant mechanism to filter outside of the lock...

Greetings,

Andres Freund

#3Kirill Reshke
reshkekirill@gmail.com
In reply to: Julien Tachoires (#1)
Re: Qual push down to table AM

On Thu, 28 Aug 2025 at 01:27, Julien Tachoires <julien@tachoires.me> wrote:

Hi,

Please find attached a patch set proposal intended to implement WHERE
clauses (qual) push down to the underlying table AM during
table/sequential scan execution.

The primary goal of this project is to convert quals to ScanKeys and
pass them to the table AMs. Table AMs are then allowed to apply early
tuple filtering during table (sequential) scans. Applying filtering at
the table storage level is something necessary for non row-oriented
table storage like columnar storage. Index organized table is another
table storage that would need quals push down.

AFAIK, CustomScan is the one and only way to go for having table scan
using quals pushed down, but each table AM must implement its own
mechanism. IMHO, having this feature available in core would help the
development of new table AMs. About Heap, some performance testing
(detailed at the end of this message) shows between 45% and 60%
improvement in seq scan execution time when only one tuple is returned
from the table.

Only a few expressions are supported: OpExpr (<key> <operator> <value>),
ScalarArrayOpExpr (<key> <operator> ANY|ALL(ARRAY[...]), and NullTest.
Row comparison is not yet supported as this part is still not clear to
me. On the right part of the expression, we support: constant, variable,
function call, and subquery (InitPlan only).

In terms of security, we check if the function related to the operator
is not user defined: only functions from the catalog are supported. We
also check that the function is "leakproof".

Pushing down quals does not guaranty to the executor that the tuples
returned during table scan satisfy a qual, as we don't know if the table
AM (potentially implemented via an extension) has applied tuple
filtering. In order to ensure to produce the right response to the where
clause, pushed down quals are executed twice per tuple returned: once by
the table AM, and once by the executor. This produces a performance
regression (15-17%) where almost the entire table is returned (see perf.
test results at the end of the message). This could be optimized by
flagging the tuples filtered by the table AM, this way we could avoid
the re-execution of the pushed down quals.

Details about the patch files

v1-0001-Pass-the-number-of-ScanKeys-to-scan_rescan.patch: This patch
adds the number of ScanKeys passed via scan_rescan() as a new argument.
The number of ScanKeys was only passed to the table AM via begin_scan(),
but not in scan_rescan().

v1-0002-Simple-quals-push-down-to-table-AMs.patch: Core of the feature,
this patch adds qual push down support for OpExpr expressions.

v1-0003-Add-the-table-reloption-quals_push_down.patch: Adds a new
reloption: quals_push_down used to enable/disable qual push down for a
table. Disabled by default.

v1-0004-Add-tests-for-quals-push-down-to-table-AM.patch: Regression
tests.

v1-0005-Push-down-IN-NOT-IN-array-quals-to-table-AMs.patch:
ScalarArrayOpExpr support.

v1-0006-Push-down-IS-IS-NOT-NULL-quals-to-table-AMs.patch: NullTest
support.

Performance testing

Head:
CREATE TABLE t (i INTEGER);

Patch:
CREATE TABLE t (i INTEGER) WITH (quals_push_down = on);

n=1M:
INSERT INTO t SELECT generate_series(1, 1000000);
VACUUM t;

n=10M:
TRUNCATE t;
INSERT INTO t SELECT generate_series(1, 10000000);
VACUUM t;

n=100M:
TRUNCATE t;
INSERT INTO t SELECT generate_series(1, 100000000);
VACUUM t;

Case #1: SELECT COUNT(*) FROM t WHERE i = 50000;

|       n=1M      |        n=10M      |         n=100M
+--------+--------+---------+---------+----------+---------
|  Head  |  Patch |  Head   |  Patch  |  Head    |  Patch
--------+--------+--------+---------+---------+----------+---------
Test #1 | 38.903 | 21.308 | 365.707 | 155.429 | 3939.937 | 1564.182
Test #2 | 39.239 | 21.271 | 364.206 | 153.127 | 3872.370 | 1527.988
Test #3 | 39.015 | 21.958 | 365.434 | 154.498 | 3812.382 | 1525.535
--------+--------+--------+---------+---------+----------+---------

--------+--------+--------+---------+---------+----------+---------
Average | 39.052 | 21.512 | 365.116 | 154.351 | 3874.896 | 1539.235
Std dev | 0.171 | 0.386 | 0.800 | 1.158 | 63.815 | 21.640
--------+--------+--------+---------+---------+----------+---------
Gain | 44.91% | 57.73% | 60.28%

Case #2: SELECT COUNT(*) FROM t WHERE i >= 2;

|       n=1M      |        n=10M      |         n=100M
+--------+--------+---------+---------+----------+---------
|  Head  |  Patch |  Head   |  Patch  |  Head    |  Patch
--------+--------+--------+---------+---------+----------+---------
Test #1 | 68.422 | 81.233 | 674.397 | 778.427 | 6845.165 | 8071.627
Test #2 | 69.237 | 80.868 | 682.976 | 774.417 | 6533.091 | 7668.477
Test #3 | 69.579 | 80.418 | 676.072 | 791.465 | 6917.964 | 7916.182
--------+--------+--------+---------+---------+----------+---------

--------+--------+--------+---------+---------+----------+---------
Average | 69.079 | 80.840 | 677.815 | 781.436 | 6765.407 | 7885.429
Std dev | 0.594 | 0.408 | 4.547 | 8.914 | 204.457 | 203.327
--------+--------+--------+---------+---------+----------+---------
Gain | -17.02% | -15.29% | -16.56%

Thoughts?

Best regards,

--
Julien Tachoires

Hi!
I have also been wondering whether something like qual push down could
be implemented in Postgres. It is indeed very beneficial for
column-based processing in MPP databases, Greenplum and Cloudberry to
name a few. I did my own micro-research a while ago (while working on
some Cloudberry features), so here are my thoughts on the subject.

What this patchset does is pass ScanKeys directly to the table AM
somewhat blindly, which speeds up the execution phase. While I do not
have strong objections against this approach, I suspect it breaks some
layers of abstraction and some *silent* (or maybe documented)
agreements about what the responsibilities of table AM functions are.
As far as I can tell, passing ScanKeys directly to the table AM is only
used on HEAD for catalog access. Correct me if I'm wrong. For all other
types of relations each query is planned, which includes:

(1) building data access paths through the various data access methods (indexes)
(2) deciding for each qual which indexes can be used to satisfy it
(3) using the cost model to pick the best options

None of this can be done with your approach, can it?

The cost model can give hints to the optimizer that a given table AM
will process some qual much faster than any index-based access. A smart
cost model/optimizer can realise that selecting only a few of all the
attributes from a column-oriented relation, plus filtering with SIMD
etc., can be really cheap.

So maybe the right shape for this patch would be something that can
choose between a seqscan and an indexscan at planning time?

--
Best regards,
Kirill Reshke

#4Julien Tachoires
julien@tachoires.me
In reply to: Andres Freund (#2)
Re: Qual push down to table AM

Hi,

On Wed, Aug 27, 2025 at 05:50:01PM -0400, Andres Freund wrote:

On 2025-08-27 22:27:37 +0200, Julien Tachoires wrote:

Please find attached a patch set proposal intended to implement WHERE
clauses (qual) push down to the underlying table AM during
table/sequential scan execution.

The primary goal of this project is to convert quals to ScanKeys and
pass them to the table AMs. Table AMs are then allowed to apply early
tuple filtering during table (sequential) scans. Applying filtering at
the table storage level is something necessary for non row-oriented
table storage like columnar storage. Index organized table is another
table storage that would need quals push down.

AFAIK, CustomScan is the one and only way to go for having table scan
using quals pushed down, but each table AM must implement its own
mechanism. IMHO, having this feature available in core would help the
development of new table AMs. About Heap, some performance testing
(detailed at the end of this message) shows between 45% and 60%
improvement in seq scan execution time when only one tuple is returned
from the table.

One problem with doing that in the case of heapam is that you're evaluating
scan keys with the buffer lock held - with basically arbitrary expressions
being evaluated. That's an easy path to undetected deadlocks. You'd have to
redesign the relevant mechanism to filter outside of the lock...

Thank you for this quick feedback.

One potential approach to solve this in heapgettup() would be (rough
sketch below):
1. hold the buffer lock
2. get the tuple from the buffer
3. if the tuple is not visible, move to the next tuple (back to step 2)
4. release the buffer lock
5. if the tuple does not satisfy the scan keys, re-take the buffer lock
and move to the next tuple (back to step 2)
6. return the tuple
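
In pseudo-C, that would look roughly like this (a simplified sketch,
not the actual heapam code; next_tuple_on_page() stands in for the
existing line pointer iteration):

    /* buf = scan->rs_cbuf, rel = scan->rs_base.rs_rd, key/nkeys as in heapgettup() */
    LockBuffer(buf, BUFFER_LOCK_SHARE);                 /* 1. */
    while ((tuple = next_tuple_on_page(scan)) != NULL)  /* 2. */
    {
        if (!HeapTupleSatisfiesVisibility(tuple, snapshot, buf))
            continue;                                   /* 3. lock still held */

        LockBuffer(buf, BUFFER_LOCK_UNLOCK);            /* 4. pin still held */

        if (key != NULL &&
            !HeapKeyTest(tuple, RelationGetDescr(rel), nkeys, key))
        {
            LockBuffer(buf, BUFFER_LOCK_SHARE);         /* 5. re-lock */
            continue;
        }

        return;                                         /* 6. tuple found */
    }
    LockBuffer(buf, BUFFER_LOCK_UNLOCK);                /* page exhausted */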

Do you see something fundamentally wrong here?

In practice, I might be wrong, but I think this problem affects
heapgettup() only: heapgettup_pagemode() does not hold the buffer lock
when HeapKeyTest() is called. To reach the problematic part we need two
conditions: a non-NULL ScanKey array and not being in page-at-a-time
mode (pagemode). The only table_beginscan_something() function able to
meet these conditions before calling the scan_begin() API is
table_beginscan_sampling(), where the caller can choose whether to use
pagemode. The only place where table_beginscan_sampling() is called is
tablesample_init(), and in that case the ScanKey array is NULL, so we
cannot reach the problematic part of heapgettup(). There is only one
other case where we disable pagemode in heapam: when the snapshot is
non-MVCC.

Do you know of any other code path/scenario I missed that could lead to
reaching this problematic part?

Best regards,

--
Julien Tachoires

#5Julien Tachoires
julien@tachoires.me
In reply to: Kirill Reshke (#3)
Re: Qual push down to table AM

Hi,

On Thu, Aug 28, 2025 at 02:57:02AM +0500, Kirill Reshke wrote:

Hi!
I was also always wondering if something like quals pushing can be
implemented in Postgres. It is indeed very beneficial for Column-based
processing in MPP databases, Greenplum and Cloudberry to name a few. I
did my own micro-research a while ago (while working on some
Cloudberry features), so here are my thoughts on the subject.

What this patchset is doing, is passing ScanKeys directly to tableam
somewhat blindly. In speedups processing execution-phase. While I do
not have strong objections against this approach, I suspect this
breaks some layers of abstractions and *silent* (or maybe documented)
agreements of what are responsibilities of TableAM functions. So,
passing ScanKeys directly to TAM is used on HEAD for catalog-access
only. Correct me if I'm wrong. For all other types of relation each
query is planned, which includes

(1) building data access patch thought various data access methods (indexes)
(2) Decide for each Qual which indexes can be used to satisfy this qual
(3) Using Cost Model for filtering best options

All of this can not be done with your approach?

Cost model can give hints to the optimizer that this TAM will process
some qual much faster than any by-index access. Smart cost
model/optimizer can realise that selecting only few of all attributes
from column-orietired relation + filter when using SIMD etc can be
really cheap.

So maybe the good shape of this patch would be something that could
choose between seqscan and indexscan in planner time?

Thank you for your quick feedback.

Exactly: this patch does not add or reduce any cost when some quals are
planned to be pushed down. I agree with you that it would be nice
(necessary?) to have this. I think the table AM API should provide, via
new callbacks, cost estimation for table scans that takes into account
the cost of evaluating the pushed-down quals, if any.
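
For instance (purely hypothetical, nothing like this exists in the
patch set or in core, and the name and signature are invented here),
the table AM could expose a callback along the lines of:

    #include "nodes/pathnodes.h"	/* PlannerInfo, RelOptInfo, List, Cost */

    /*
     * Let the table AM estimate the cost of a sequential scan on "rel"
     * when it applies the given pushed-down quals itself, so that the
     * planner can weigh such a scan against index paths.
     */
    typedef void (*scan_cost_estimate_function) (PlannerInfo *root,
                                                 RelOptInfo *rel,
                                                 List *pushed_quals,
                                                 Cost *startup_cost,
                                                 Cost *total_cost);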

Best regards,

--
Julien Tachoires

#6Mark Dilger
mark.dilger@enterprisedb.com
In reply to: Julien Tachoires (#1)
1 attachment(s)
Re: Qual push down to table AM

Thanks for the patchset, Julien.

v1-0001:
All current callers of scan_rescan() pass NULL for the `key` parameter,
making it unclear to Table AM authors whether this is supposed to be a
pointer to a single key or an array. If an array, how is it terminated?
None of this is addressed in the current code comments, and given that
nobody uses this field, the intent is undiscoverable. By adding `int
nkeys` to the parameter list, your patch makes the intention clearer.
Perhaps you could also update the documentation for these functions?
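
For instance, something along these lines (just an illustration of one
possible comment wording, not text from your patch):

    /*
     * Restart relation scan.  `keys`, if not NULL, points to an array
     * of exactly `nkeys` ScanKeyData entries; there is no terminating
     * sentinel.  If set_params is true, allow_{strat, sync, pagemode}
     * (see scan_begin) changes should be taken into account.
     */
    void        (*scan_rescan) (TableScanDesc scan, int nkeys,
                                ScanKeyData *keys, bool set_params,
                                bool allow_strat, bool allow_sync,
                                bool allow_pagemode);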

v1-0002:
Changing the EXPLAIN output to include where the exclusion happened is
quite nice!

Out of curiosity, why did you wait until this patch to add the `int
nkeys` parameter to initscan()? It seems more on-topic for v1-0001.
Likewise, renaming `key` as `keys` helps, but could have been done in
v1-0001.

As for Andres' concern upthread, I am including a work-in-progress
(WIP) patch to check that no lightweight locks are held during qual
evaluation. I don't intend this for commit so much as for discussion.
It seems to me fairly clear that your patch does not evaluate the quals
while holding such locks, but I might have misunderstood the concern.

On Wed, Aug 27, 2025 at 1:28 PM Julien Tachoires <julien@tachoires.me>
wrote:

Hi,

Please find attached a patch set proposal intended to implement WHERE
clauses (qual) push down to the underlying table AM during
table/sequential scan execution.

The primary goal of this project is to convert quals to ScanKeys and
pass them to the table AMs. Table AMs are then allowed to apply early
tuple filtering during table (sequential) scans. Applying filtering at
the table storage level is something necessary for non row-oriented
table storage like columnar storage. Index organized table is another
table storage that would need quals push down.

AFAIK, CustomScan is the one and only way to go for having table scan
using quals pushed down, but each table AM must implement its own
mechanism. IMHO, having this feature available in core would help the
development of new table AMs. About Heap, some performance testing
(detailed at the end of this message) shows between 45% and 60%
improvement in seq scan execution time when only one tuple is returned
from the table.

Only a few expressions are supported: OpExpr (<key> <operator> <value>),
ScalarArrayOpExpr (<key> <operator> ANY|ALL(ARRAY[...]), and NullTest.
Row comparison is not yet supported as this part is still not clear to
me. On the right part of the expression, we support: constant, variable,
function call, and subquery (InitPlan only).

In terms of security, we check if the function related to the operator
is not user defined: only functions from the catalog are supported. We
also check that the function is "leakproof".

Pushing down quals does not guaranty to the executor that the tuples
returned during table scan satisfy a qual, as we don't know if the table
AM (potentially implemented via an extension) has applied tuple
filtering. In order to ensure to produce the right response to the where
clause, pushed down quals are executed twice per tuple returned: once by
the table AM, and once by the executor. This produces a performance
regression (15-17%) where almost the entire table is returned (see perf.
test results at the end of the message). This could be optimized by
flagging the tuples filtered by the table AM, this way we could avoid
the re-execution of the pushed down quals.

Details about the patch files

v1-0001-Pass-the-number-of-ScanKeys-to-scan_rescan.patch: This patch
adds the number of ScanKeys passed via scan_rescan() as a new argument.
The number of ScanKeys was only passed to the table AM via begin_scan(),
but not in scan_rescan().

v1-0002-Simple-quals-push-down-to-table-AMs.patch: Core of the feature,
this patch adds qual push down support for OpExpr expressions.

v1-0003-Add-the-table-reloption-quals_push_down.patch: Adds a new
reloption: quals_push_down used to enable/disable qual push down for a
table. Disabled by default.

v1-0004-Add-tests-for-quals-push-down-to-table-AM.patch: Regression
tests.

v1-0005-Push-down-IN-NOT-IN-array-quals-to-table-AMs.patch:
ScalarArrayOpExpr support.

v1-0006-Push-down-IS-IS-NOT-NULL-quals-to-table-AMs.patch: NullTest
support.

Performance testing

Head:
CREATE TABLE t (i INTEGER);

Patch:
CREATE TABLE t (i INTEGER) WITH (quals_push_down = on);

n=1M:
INSERT INTO t SELECT generate_series(1, 1000000);
VACUUM t;

n=10M:
TRUNCATE t;
INSERT INTO t SELECT generate_series(1, 10000000);
VACUUM t;

n=100M:
TRUNCATE t;
INSERT INTO t SELECT generate_series(1, 100000000);
VACUUM t;

Case #1: SELECT COUNT(*) FROM t WHERE i = 50000;

|       n=1M      |        n=10M      |         n=100M
+--------+--------+---------+---------+----------+---------
|  Head  |  Patch |  Head   |  Patch  |  Head    |  Patch
--------+--------+--------+---------+---------+----------+---------
Test #1 | 38.903 | 21.308 | 365.707 | 155.429 | 3939.937 | 1564.182
Test #2 | 39.239 | 21.271 | 364.206 | 153.127 | 3872.370 | 1527.988
Test #3 | 39.015 | 21.958 | 365.434 | 154.498 | 3812.382 | 1525.535
--------+--------+--------+---------+---------+----------+---------

--------+--------+--------+---------+---------+----------+---------
Average | 39.052 | 21.512 | 365.116 | 154.351 | 3874.896 | 1539.235
Std dev | 0.171 | 0.386 | 0.800 | 1.158 | 63.815 | 21.640
--------+--------+--------+---------+---------+----------+---------
Gain | 44.91% | 57.73% | 60.28%

Case #2: SELECT COUNT(*) FROM t WHERE i >= 2;

|       n=1M      |        n=10M      |         n=100M
+--------+--------+---------+---------+----------+---------
|  Head  |  Patch |  Head   |  Patch  |  Head    |  Patch
--------+--------+--------+---------+---------+----------+---------
Test #1 | 68.422 | 81.233 | 674.397 | 778.427 | 6845.165 | 8071.627
Test #2 | 69.237 | 80.868 | 682.976 | 774.417 | 6533.091 | 7668.477
Test #3 | 69.579 | 80.418 | 676.072 | 791.465 | 6917.964 | 7916.182
--------+--------+--------+---------+---------+----------+---------

--------+--------+--------+---------+---------+----------+---------
Average | 69.079 | 80.840 | 677.815 | 781.436 | 6765.407 | 7885.429
Std dev | 0.594 | 0.408 | 4.547 | 8.914 | 204.457 | 203.327
--------+--------+--------+---------+---------+----------+---------
Gain | -17.02% | -15.29% | -16.56%

Thoughts?

Best regards,

--
Julien Tachoires

--

Mark Dilger

Attachments:

Assert-lwlocks-not-held-during-qual-evaluation.patch.WIP (application/octet-stream)
From 63bead18fbfb71e46b4193ac4464740f63fa4295 Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Tue, 7 Oct 2025 16:49:22 -0700
Subject: [PATCH v2] Assert lwlocks not held during qual evaluation

Add assertions in the new push-down quals code that no lwlocks are
held, including lwlocks on buffers, during the evaluation of the
quals.
---
 src/backend/access/heap/heapam_valid.c |  2 ++
 src/backend/executor/nodeSeqscan.c     | 16 ++++++++++++++++
 src/backend/storage/lmgr/lwlock.c      |  9 +++++++++
 src/include/storage/lwlock.h           |  6 ++++++
 4 files changed, 33 insertions(+)

diff --git a/src/backend/access/heap/heapam_valid.c b/src/backend/access/heap/heapam_valid.c
index a05738a9144..00936331109 100644
--- a/src/backend/access/heap/heapam_valid.c
+++ b/src/backend/access/heap/heapam_valid.c
@@ -123,6 +123,8 @@ HeapKeyTest(HeapTuple tuple, TupleDesc tupdesc, int nkeys, ScanKey keys)
 	int			cur_nkeys = nkeys;
 	ScanKey		cur_key = keys;
 
+	AssertNoLWLockHeldByMe("HeapKeyTest");
+
 	for (; cur_nkeys--; cur_key++)
 	{
 		Datum		atp;
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 210d4cb84e0..ea80208b4ee 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -65,6 +65,8 @@ ExecSeqBuildScanKeys(PlanState *planstate, List *quals, int *numScanKeys,
 	int			n_runtime_keys;
 	int			max_runtime_keys;
 
+	AssertNoLWLockHeldByMe("ExecSeqBuildScanKeys");
+
 	n_quals = list_length(quals);
 
 	/*
@@ -396,6 +398,8 @@ SeqNext(SeqScanState *node)
 	ScanDirection direction;
 	TupleTableSlot *slot;
 
+	AssertNoLWLockHeldByMe("SeqNext");
+
 	/*
 	 * get information from the estate and scan state
 	 */
@@ -778,6 +782,8 @@ ExecReScanSeqScan(SeqScanState *node)
 {
 	TableScanDesc scan;
 
+	AssertNoLWLockHeldByMe("ExecReScanSeqScan");
+
 	/*
 	 * If we are doing runtime key calculations (ie, any of the scan key
 	 * values weren't simple Consts), compute the new key values.  But first,
@@ -821,6 +827,8 @@ ExecSeqScanEvalRuntimeKeys(ExprContext *econtext,
 	int			j;
 	MemoryContext oldContext;
 
+	AssertNoLWLockHeldByMe("ExecSeqScanEvalRuntimeKeys");
+
 	/* We want to keep the key values in per-tuple memory */
 	oldContext = MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);
 
@@ -881,6 +889,8 @@ ExecSeqScanEstimate(SeqScanState *node,
 {
 	EState	   *estate = node->ss.ps.state;
 
+	AssertNoLWLockHeldByMe("ExecSeqScanEstimate");
+
 	node->pscan_len = table_parallelscan_estimate(node->ss.ss_currentRelation,
 												  estate->es_snapshot);
 	shm_toc_estimate_chunk(&pcxt->estimator, node->pscan_len);
@@ -900,6 +910,8 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 	EState	   *estate = node->ss.ps.state;
 	ParallelTableScanDesc pscan;
 
+	AssertNoLWLockHeldByMe("ExecSeqScanInitializeDSM");
+
 	pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
 	table_parallelscan_initialize(node->ss.ss_currentRelation,
 								  pscan,
@@ -931,6 +943,8 @@ ExecSeqScanReInitializeDSM(SeqScanState *node,
 {
 	ParallelTableScanDesc pscan;
 
+	AssertNoLWLockHeldByMe("ExecSeqScanReInitializeDSM");
+
 	pscan = node->ss.ss_currentScanDesc->rs_parallel;
 	table_parallelscan_reinitialize(node->ss.ss_currentRelation, pscan);
 }
@@ -947,6 +961,8 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 {
 	ParallelTableScanDesc pscan;
 
+	AssertNoLWLockHeldByMe("ExecSeqScanInitializeWorker");
+
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel(node->ss.ss_currentRelation, pscan,
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index ec9c345ffdf..3aed1e3de4c 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -743,6 +743,15 @@ GetLWLockIdentifier(uint32 classId, uint16 eventId)
 	return GetLWTrancheName(eventId);
 }
 
+#ifdef USE_ASSERT_CHECKING
+void
+AssertNoLWLockHeldByMe(const char *context)
+{
+	if (num_held_lwlocks > 0)
+		elog(ERROR, "In %s: holding %d lightweight locks", context, num_held_lwlocks);
+}
+#endif
+
 /*
  * Internal function that tries to atomically acquire the lwlock in the passed
  * in mode.
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index 5e717765764..f786b84ba91 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -146,6 +146,12 @@ extern void InitLWLockAccess(void);
 
 extern const char *GetLWLockIdentifier(uint32 classId, uint16 eventId);
 
+#ifdef USE_ASSERT_CHECKING
+extern void AssertNoLWLockHeldByMe(const char *context);
+#else
+#define AssertNoLWLockHeldByMe(context)
+#endif
+
 /*
  * Extensions (or core code) can obtain an LWLocks by calling
  * RequestNamedLWLockTranche() during postmaster startup.  Subsequently,
-- 
2.39.5 (Apple Git-154)

#7Julien Tachoires
julien@tachoires.me
In reply to: Mark Dilger (#6)
7 attachment(s)
Re: Qual push down to table AM

Hi Mark,

On Tue, Oct 07, 2025 at 04:55:53PM -0700, Mark Dilger wrote:

v1-0001:
All current callers of scan_rescan() pass NULL for the `key` parameter,
making it unclear to Table AM authors whether this is supposed to be a
pointer to a single key or an array. If an array, how is it terminated?
None of this is addressed in the current code comments, and given that
nobody uses this field, the intent is undiscoverable. By adding `int
nkeys` to the parameter list, your patch makes the intention clearer.
Perhaps you could also update the documentation for these functions?

v1-0002:
Changing the EXPLAIN output to include where the exclusion happened is
quite nice!

Out of curiosity, why did you wait until this patch to add the `int
nkeys` parameter to initscan()? It seems more on-topic for v1-0001.
Likewise, renaming `key` as `keys` helps, but could have been done in
v1-0001.

Thanks for your review. Please find a new version attached that addresses
these points.

As for Andres' concern upthread, I am including a work-in-progress
(WIP) patch to check that no lightweight locks are held during qual
evaluation. I don't intend this for commit so much as for discussion.
It seems to me fairly clear that your patch does not evaluate the quals
while holding such locks, but I might have misunderstood the concern.

Thanks, it helps confirm that, due to the absence of ScanKeys,
HeapKeyTest() is not called from heapgettup(), at least when running the
regression tests.

In order to guarantee that the buffer lock is not held while evaluating
a ScanKey, v4-0001-Release-buffer-lock-before-scan-key-evaluation
releases the lock before calling HeapKeyTest().

Regards,

--
Julien Tachoires

Attachments:

v4-0001-Release-buffer-lock-before-scan-key-evaluation.patch (text/x-diff; charset=us-ascii)
From efeda879b4de167a54b49b143998d958f7473887 Mon Sep 17 00:00:00 2001
From: Julien Tachoires <julien@tachoires.me>
Date: Tue, 2 Dec 2025 10:38:45 +0100
Subject: [PATCH 1/7] Release buffer lock before scan key evaluation

heapgettup() holds a buffer lock to examine tuple visibility.
When the tuple is visible, the buffer lock can be released before
calling HeapKeyTest() in order to avoid arbitrary code execution
due to scan key evaluation while the buffer lock is held. Holding
the buffer pin is enough to access the tuple's data.
---
 src/backend/access/heap/heapam.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 4d382a04338..1d9f9efa4fd 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -975,13 +975,27 @@ continue_page:
 			if (!visible)
 				continue;
 
+			/*
+			 * If the tuple is visible, we can release the buffer lock before
+			 * evaluating the scan keys in order to avoid arbitrary code
+			 * execution while we hold the lock.
+			 */
+			LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);
+
 			/* skip any tuples that don't match the scan key */
 			if (key != NULL &&
 				!HeapKeyTest(tuple, RelationGetDescr(scan->rs_base.rs_rd),
 							 nkeys, key))
+			{
+				/*
+				 * When the tuple is visible but does not satisfy the scan
+				 * keys, we have to re-acquire the buffer lock to examine the
+				 * next tuple's visibility.
+				 */
+				LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
 				continue;
+			}
 
-			LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);
 			scan->rs_coffset = lineoff;
 			return;
 		}
-- 
2.39.5

v4-0002-Pass-the-number-of-ScanKeys-to-scan_rescan.patch (text/x-diff; charset=us-ascii)
From 3c6ef1cd01fa177c2d2439dfd8dcec5392f5b881 Mon Sep 17 00:00:00 2001
From: Julien Tachoires <julien@tachoires.me>
Date: Tue, 2 Dec 2025 10:40:55 +0100
Subject: [PATCH 2/7] Pass the number of ScanKeys to scan_rescan()

The number of ScanKeys passed to the table AM API routine scan_rescan()
was not specified, forcing the table AM to keep in memory the initial
number of ScanKeys passed via scan_begin(). Currently, there isn't any
real use of the ScanKeys during a table scan, so this is not an issue,
but it could become a blocking point in the future if we want to
implement quals push down - as ScanKeys - to the table AM. Due to
runtime key evaluation, the number of ScanKeys can vary between the
initial call to scan_begin() and a potential later call to
scan_rescan().

table_rescan() is modified in order to reflect the changes to
scan_rescan().

table_beginscan_parallel()'s signature is slightly modified in order to
pass any ScanKeys and their number on to scan_begin().

We also rename the variable "key" to "keys" in multiple places in order
to make it clear that this is an array of ScanKeys.
---
 src/backend/access/brin/brin.c            |  3 +-
 src/backend/access/gin/gininsert.c        |  3 +-
 src/backend/access/heap/heapam.c          | 22 +++++++-----
 src/backend/access/nbtree/nbtsort.c       |  3 +-
 src/backend/access/table/tableam.c        |  9 ++---
 src/backend/executor/execReplication.c    |  4 +--
 src/backend/executor/nodeBitmapHeapscan.c |  2 +-
 src/backend/executor/nodeSamplescan.c     |  2 +-
 src/backend/executor/nodeSeqscan.c        |  5 +--
 src/backend/executor/nodeTidscan.c        |  2 +-
 src/include/access/heapam.h               |  7 ++--
 src/include/access/tableam.h              | 41 ++++++++++++-----------
 12 files changed, 59 insertions(+), 44 deletions(-)

diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index cb3331921cb..0a43f67f919 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -2842,7 +2842,8 @@ _brin_parallel_scan_and_build(BrinBuildState *state,
 	indexInfo->ii_Concurrent = brinshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
-									ParallelTableScanFromBrinShared(brinshared));
+									ParallelTableScanFromBrinShared(brinshared),
+									0, NULL);
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, true,
 									   brinbuildCallbackParallel, state, scan);
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index f87c60a230c..9f7a55f3647 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -2058,7 +2058,8 @@ _gin_parallel_scan_and_build(GinBuildState *state,
 	indexInfo->ii_Concurrent = ginshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
-									ParallelTableScanFromGinBuildShared(ginshared));
+									ParallelTableScanFromGinBuildShared(ginshared),
+									0, NULL);
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, progress,
 									   ginBuildCallbackParallel, state, scan);
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 1d9f9efa4fd..25bc941a815 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -346,10 +346,13 @@ bitmapheap_stream_read_next(ReadStream *pgsr, void *private_data,
 
 /* ----------------
  *		initscan - scan code common to heap_beginscan and heap_rescan
+ *
+ * Note: in order to pass on the ScanKeys to the scan, this function takes as
+ * arguments the number of ScanKeys and one array of ScanKeys.
  * ----------------
  */
 static void
-initscan(HeapScanDesc scan, ScanKey key, bool keep_startblock)
+initscan(HeapScanDesc scan, int nkeys, ScanKey keys, bool keep_startblock)
 {
 	ParallelBlockTableScanDesc bpscan = NULL;
 	bool		allow_strat;
@@ -471,10 +474,13 @@ initscan(HeapScanDesc scan, ScanKey key, bool keep_startblock)
 	/* page-at-a-time fields are always invalid when not rs_inited */
 
 	/*
-	 * copy the scan key, if appropriate
+	 * copy the scan keys, if appropriate
 	 */
-	if (key != NULL && scan->rs_base.rs_nkeys > 0)
-		memcpy(scan->rs_base.rs_key, key, scan->rs_base.rs_nkeys * sizeof(ScanKeyData));
+	if (keys != NULL && nkeys > 0)
+	{
+		scan->rs_base.rs_nkeys = nkeys;
+		memcpy(scan->rs_base.rs_key, keys, nkeys * sizeof(ScanKeyData));
+	}
 
 	/*
 	 * Currently, we only have a stats counter for sequential heap scans (but
@@ -1127,7 +1133,7 @@ continue_page:
 
 TableScanDesc
 heap_beginscan(Relation relation, Snapshot snapshot,
-			   int nkeys, ScanKey key,
+			   int nkeys, ScanKey keys,
 			   ParallelTableScanDesc parallel_scan,
 			   uint32 flags)
 {
@@ -1228,7 +1234,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
 	else
 		scan->rs_base.rs_key = NULL;
 
-	initscan(scan, key, false);
+	initscan(scan, nkeys, keys, false);
 
 	scan->rs_read_stream = NULL;
 
@@ -1280,7 +1286,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
 }
 
 void
-heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
+heap_rescan(TableScanDesc sscan, int nkeys, ScanKey keys, bool set_params,
 			bool allow_strat, bool allow_sync, bool allow_pagemode)
 {
 	HeapScanDesc scan = (HeapScanDesc) sscan;
@@ -1329,7 +1335,7 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
 	/*
 	 * reinitialize scan descriptor
 	 */
-	initscan(scan, key, true);
+	initscan(scan, nkeys, keys, true);
 }
 
 void
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index 454adaee7dc..9c757a1ee95 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -1925,7 +1925,8 @@ _bt_parallel_scan_and_sort(BTSpool *btspool, BTSpool *btspool2,
 	indexInfo = BuildIndexInfo(btspool->index);
 	indexInfo->ii_Concurrent = btshared->isconcurrent;
 	scan = table_beginscan_parallel(btspool->heap,
-									ParallelTableScanFromBTShared(btshared));
+									ParallelTableScanFromBTShared(btshared),
+									0, NULL);
 	reltuples = table_index_build_scan(btspool->heap, btspool->index, indexInfo,
 									   true, progress, _bt_build_callback,
 									   &buildstate, scan);
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index 1e099febdc8..0efa564f7b2 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -110,14 +110,14 @@ table_slot_create(Relation relation, List **reglist)
  */
 
 TableScanDesc
-table_beginscan_catalog(Relation relation, int nkeys, ScanKeyData *key)
+table_beginscan_catalog(Relation relation, int nkeys, ScanKeyData *keys)
 {
 	uint32		flags = SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE | SO_TEMP_SNAPSHOT;
 	Oid			relid = RelationGetRelid(relation);
 	Snapshot	snapshot = RegisterSnapshot(GetCatalogSnapshot(relid));
 
-	return relation->rd_tableam->scan_begin(relation, snapshot, nkeys, key,
+	return relation->rd_tableam->scan_begin(relation, snapshot, nkeys, keys,
 											NULL, flags);
 }
 
@@ -163,7 +163,8 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
 }
 
 TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan,
+						 int nkeys, ScanKeyData *keys)
 {
 	Snapshot	snapshot;
 	uint32		flags = SO_TYPE_SEQSCAN |
@@ -184,7 +185,7 @@ table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
 		snapshot = SnapshotAny;
 	}
 
-	return relation->rd_tableam->scan_begin(relation, snapshot, 0, NULL,
+	return relation->rd_tableam->scan_begin(relation, snapshot, nkeys, keys,
 											pscan, flags);
 }
 
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index def32774c90..b7d45daa55a 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -388,7 +388,7 @@ RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
 retry:
 	found = false;
 
-	table_rescan(scan, NULL);
+	table_rescan(scan, 0, NULL);
 
 	/* Try to find the tuple */
 	while (table_scan_getnextslot(scan, ForwardScanDirection, scanslot))
@@ -604,7 +604,7 @@ RelationFindDeletedTupleInfoSeq(Relation rel, TupleTableSlot *searchslot,
 	scan = table_beginscan(rel, SnapshotAny, 0, NULL);
 	scanslot = table_slot_create(rel, NULL);
 
-	table_rescan(scan, NULL);
+	table_rescan(scan, 0, NULL);
 
 	/* Try to find the tuple */
 	while (table_scan_getnextslot(scan, ForwardScanDirection, scanslot))
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index bf24f3d7fe0..fb778e0ae3b 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -239,7 +239,7 @@ ExecReScanBitmapHeapScan(BitmapHeapScanState *node)
 			tbm_end_iterate(&scan->st.rs_tbmiterator);
 
 		/* rescan to release any page pin */
-		table_rescan(node->ss.ss_currentScanDesc, NULL);
+		table_rescan(node->ss.ss_currentScanDesc, 0, NULL);
 	}
 
 	/* release bitmaps and buffers if any */
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index 6b3db7548ed..a7e172d83a4 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -301,7 +301,7 @@ tablesample_init(SampleScanState *scanstate)
 	}
 	else
 	{
-		table_rescan_set_params(scanstate->ss.ss_currentScanDesc, NULL,
+		table_rescan_set_params(scanstate->ss.ss_currentScanDesc, 0, NULL,
 								scanstate->use_bulkread,
 								allow_sync,
 								scanstate->use_pagemode);
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 94047d29430..454d4ee0499 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -326,6 +326,7 @@ ExecReScanSeqScan(SeqScanState *node)
 
 	if (scan != NULL)
 		table_rescan(scan,		/* scan desc */
+					 0,			/* number of new scan keys */
 					 NULL);		/* new scan keys */
 
 	ExecScanReScan((ScanState *) node);
@@ -374,7 +375,7 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0, NULL);
 }
 
 /* ----------------------------------------------------------------
@@ -407,5 +408,5 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0, NULL);
 }
diff --git a/src/backend/executor/nodeTidscan.c b/src/backend/executor/nodeTidscan.c
index d50c6600358..0ab70d891c4 100644
--- a/src/backend/executor/nodeTidscan.c
+++ b/src/backend/executor/nodeTidscan.c
@@ -465,7 +465,7 @@ ExecReScanTidScan(TidScanState *node)
 
 	/* not really necessary, but seems good form */
 	if (node->ss.ss_currentScanDesc)
-		table_rescan(node->ss.ss_currentScanDesc, NULL);
+		table_rescan(node->ss.ss_currentScanDesc, 0, NULL);
 
 	ExecScanReScan(&node->ss);
 }
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 632c4332a8c..7dd5e0bcd78 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -325,14 +325,15 @@ typedef struct PruneFreezeResult
 
 
 extern TableScanDesc heap_beginscan(Relation relation, Snapshot snapshot,
-									int nkeys, ScanKey key,
+									int nkeys, ScanKey keys,
 									ParallelTableScanDesc parallel_scan,
 									uint32 flags);
 extern void heap_setscanlimits(TableScanDesc sscan, BlockNumber startBlk,
 							   BlockNumber numBlks);
 extern void heap_prepare_pagescan(TableScanDesc sscan);
-extern void heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
-						bool allow_strat, bool allow_sync, bool allow_pagemode);
+extern void heap_rescan(TableScanDesc sscan, int nkeys, ScanKey keys,
+						bool set_params, bool allow_strat, bool allow_sync,
+						bool allow_pagemode);
 extern void heap_endscan(TableScanDesc sscan);
 extern HeapTuple heap_getnext(TableScanDesc sscan, ScanDirection direction);
 extern bool heap_getnextslot(TableScanDesc sscan,
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 2fa790b6bf5..848aa114de1 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -326,7 +326,7 @@ typedef struct TableAmRoutine
 	 */
 	TableScanDesc (*scan_begin) (Relation rel,
 								 Snapshot snapshot,
-								 int nkeys, ScanKeyData *key,
+								 int nkeys, ScanKeyData *keys,
 								 ParallelTableScanDesc pscan,
 								 uint32 flags);
 
@@ -340,9 +340,10 @@ typedef struct TableAmRoutine
 	 * Restart relation scan.  If set_params is set to true, allow_{strat,
 	 * sync, pagemode} (see scan_begin) changes should be taken into account.
 	 */
-	void		(*scan_rescan) (TableScanDesc scan, ScanKeyData *key,
-								bool set_params, bool allow_strat,
-								bool allow_sync, bool allow_pagemode);
+	void		(*scan_rescan) (TableScanDesc scan, int nkeys,
+								ScanKeyData *keys, bool set_params,
+								bool allow_strat, bool allow_sync,
+								bool allow_pagemode);
 
 	/*
 	 * Return next tuple from `scan`, store in slot.
@@ -874,12 +875,12 @@ extern TupleTableSlot *table_slot_create(Relation relation, List **reglist);
  */
 static inline TableScanDesc
 table_beginscan(Relation rel, Snapshot snapshot,
-				int nkeys, ScanKeyData *key)
+				int nkeys, ScanKeyData *keys)
 {
 	uint32		flags = SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
-	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
+	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, keys, NULL, flags);
 }
 
 /*
@@ -887,7 +888,7 @@ table_beginscan(Relation rel, Snapshot snapshot,
  * snapshot appropriate for scanning catalog relations.
  */
 extern TableScanDesc table_beginscan_catalog(Relation relation, int nkeys,
-											 ScanKeyData *key);
+											 ScanKeyData *keys);
 
 /*
  * Like table_beginscan(), but table_beginscan_strat() offers an extended API
@@ -898,7 +899,7 @@ extern TableScanDesc table_beginscan_catalog(Relation relation, int nkeys,
  */
 static inline TableScanDesc
 table_beginscan_strat(Relation rel, Snapshot snapshot,
-					  int nkeys, ScanKeyData *key,
+					  int nkeys, ScanKeyData *keys,
 					  bool allow_strat, bool allow_sync)
 {
 	uint32		flags = SO_TYPE_SEQSCAN | SO_ALLOW_PAGEMODE;
@@ -908,7 +909,7 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
 	if (allow_sync)
 		flags |= SO_ALLOW_SYNC;
 
-	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
+	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, keys, NULL, flags);
 }
 
 /*
@@ -919,11 +920,11 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
  */
 static inline TableScanDesc
 table_beginscan_bm(Relation rel, Snapshot snapshot,
-				   int nkeys, ScanKeyData *key)
+				   int nkeys, ScanKeyData *keys)
 {
 	uint32		flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
 
-	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key,
+	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, keys,
 									   NULL, flags);
 }
 
@@ -936,7 +937,7 @@ table_beginscan_bm(Relation rel, Snapshot snapshot,
  */
 static inline TableScanDesc
 table_beginscan_sampling(Relation rel, Snapshot snapshot,
-						 int nkeys, ScanKeyData *key,
+						 int nkeys, ScanKeyData *keys,
 						 bool allow_strat, bool allow_sync,
 						 bool allow_pagemode)
 {
@@ -949,7 +950,7 @@ table_beginscan_sampling(Relation rel, Snapshot snapshot,
 	if (allow_pagemode)
 		flags |= SO_ALLOW_PAGEMODE;
 
-	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
+	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, keys, NULL, flags);
 }
 
 /*
@@ -991,9 +992,9 @@ table_endscan(TableScanDesc scan)
  * Restart a relation scan.
  */
 static inline void
-table_rescan(TableScanDesc scan, ScanKeyData *key)
+table_rescan(TableScanDesc scan, int nkeys, ScanKeyData *keys)
 {
-	scan->rs_rd->rd_tableam->scan_rescan(scan, key, false, false, false, false);
+	scan->rs_rd->rd_tableam->scan_rescan(scan, nkeys, keys, false, false, false, false);
 }
 
 /*
@@ -1005,10 +1006,10 @@ table_rescan(TableScanDesc scan, ScanKeyData *key)
  * previously selected startblock will be kept.
  */
 static inline void
-table_rescan_set_params(TableScanDesc scan, ScanKeyData *key,
+table_rescan_set_params(TableScanDesc scan, int nkeys, ScanKeyData *keys,
 						bool allow_strat, bool allow_sync, bool allow_pagemode)
 {
-	scan->rs_rd->rd_tableam->scan_rescan(scan, key, true,
+	scan->rs_rd->rd_tableam->scan_rescan(scan, nkeys, keys, true,
 										 allow_strat, allow_sync,
 										 allow_pagemode);
 }
@@ -1073,7 +1074,8 @@ table_rescan_tidrange(TableScanDesc sscan, ItemPointer mintid,
 	/* Ensure table_beginscan_tidrange() was used. */
 	Assert((sscan->rs_flags & SO_TYPE_TIDRANGESCAN) != 0);
 
-	sscan->rs_rd->rd_tableam->scan_rescan(sscan, NULL, false, false, false, false);
+	sscan->rs_rd->rd_tableam->scan_rescan(sscan, 0, NULL, false, false, false,
+										  false);
 	sscan->rs_rd->rd_tableam->scan_set_tidrange(sscan, mintid, maxtid);
 }
 
@@ -1128,7 +1130,8 @@ extern void table_parallelscan_initialize(Relation rel,
  * Caller must hold a suitable lock on the relation.
  */
 extern TableScanDesc table_beginscan_parallel(Relation relation,
-											  ParallelTableScanDesc pscan);
+											  ParallelTableScanDesc pscan,
+											  int nkeys, ScanKeyData *keys);
 
 /*
  * Begin a parallel tid range scan. `pscan` needs to have been initialized
-- 
2.39.5

v4-0003-Simple-quals-push-down-to-table-AMs.patch (text/x-diff; charset=us-ascii)
From 7a7793109764bff1ae7633caf5a3c5b401be3b84 Mon Sep 17 00:00:00 2001
From: Julien Tachoires <julien@tachoires.me>
Date: Tue, 2 Dec 2025 10:42:38 +0100
Subject: [PATCH 3/7] Simple quals push down to table AMs

Simple quals like <column> <op> <const|func|var|subquery> are now
converted to ScanKeys and then passed to the underlying layer via the
table AM API. During the execution of sequential scans, the table AM can
use the given ScanKeys to filter out the tuples not satisfying the
condition before returning them to the executor. This kind of early
tuple filtering speeds up sequential scan execution when a large portion
of the table must be excluded from the final result.

The query planner, via fix_tablequal_references(), is in charge of
pre-processing the quals and excluding those that cannot be used as
ScanKeys. Pre-processing a qual consists of making sure that the key is
on the left side of the expression, that the value is on the right, and
that neither side is relabeled.

Non-constant values are registered as run-time keys and then evaluated
and converted to ScanKeys when a rescan is requested via
ExecReScanSeqScan(). InitPlan (sub-SELECT executed only once) is the
only type of SubQuery supported for now.

A new instrumentation counter is added in order to distinguish the
tuples excluded by the table AM from those excluded by the executor.
The EXPLAIN output is modified accordingly.
---
 .../postgres_fdw/expected/postgres_fdw.out    |   4 +-
 src/backend/access/heap/heapam.c              |  15 +-
 src/backend/commands/explain.c                |  51 ++-
 src/backend/executor/instrument.c             |   1 +
 src/backend/executor/nodeSeqscan.c            | 358 +++++++++++++++++-
 src/backend/optimizer/plan/createplan.c       | 221 ++++++++++-
 src/include/access/relscan.h                  |   1 +
 src/include/executor/instrument.h             |   7 +-
 src/include/executor/nodeSeqscan.h            |   3 +
 src/include/nodes/execnodes.h                 |  40 +-
 src/include/nodes/plannodes.h                 |   2 +
 src/test/regress/expected/memoize.out         |  21 +-
 src/test/regress/expected/merge.out           |   2 +-
 src/test/regress/expected/partition_prune.out |  28 +-
 src/test/regress/expected/select_parallel.out |   4 +-
 src/test/regress/sql/partition_prune.sql      |   2 +-
 16 files changed, 688 insertions(+), 72 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 48e3185b227..5747606ff90 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -11925,7 +11925,7 @@ SELECT * FROM local_tbl, async_pt WHERE local_tbl.a = async_pt.a AND local_tbl.c
  Nested Loop (actual rows=1.00 loops=1)
    ->  Seq Scan on local_tbl (actual rows=1.00 loops=1)
          Filter: (c = 'bar'::text)
-         Rows Removed by Filter: 1
+         Rows Removed In Executor by Filter: 1
    ->  Append (actual rows=1.00 loops=1)
          ->  Async Foreign Scan on async_p1 async_pt_1 (never executed)
          ->  Async Foreign Scan on async_p2 async_pt_2 (actual rows=1.00 loops=1)
@@ -12220,7 +12220,7 @@ SELECT * FROM async_pt t1 WHERE t1.b === 505 LIMIT 1;
                Filter: (b === 505)
          ->  Seq Scan on async_p3 t1_3 (actual rows=1.00 loops=1)
                Filter: (b === 505)
-               Rows Removed by Filter: 101
+               Rows Removed In Executor by Filter: 101
 (9 rows)
 
 SELECT * FROM async_pt t1 WHERE t1.b === 505 LIMIT 1;
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 25bc941a815..f3c4dc91e54 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -352,7 +352,8 @@ bitmapheap_stream_read_next(ReadStream *pgsr, void *private_data,
  * ----------------
  */
 static void
-initscan(HeapScanDesc scan, int nkeys, ScanKey keys, bool keep_startblock)
+initscan(HeapScanDesc scan, int nkeys, ScanKey keys, bool keep_startblock,
+		 bool update_stats)
 {
 	ParallelBlockTableScanDesc bpscan = NULL;
 	bool		allow_strat;
@@ -487,7 +488,7 @@ initscan(HeapScanDesc scan, int nkeys, ScanKey keys, bool keep_startblock)
 	 * e.g for bitmap scans the underlying bitmap index scans will be counted,
 	 * and for sample scans we update stats for tuple fetches).
 	 */
-	if (scan->rs_base.rs_flags & SO_TYPE_SEQSCAN)
+	if (update_stats && (scan->rs_base.rs_flags & SO_TYPE_SEQSCAN))
 		pgstat_count_heap_scan(scan->rs_base.rs_rd);
 }
 
@@ -993,6 +994,8 @@ continue_page:
 				!HeapKeyTest(tuple, RelationGetDescr(scan->rs_base.rs_rd),
 							 nkeys, key))
 			{
+				scan->rs_base.rs_nskip++;
+
 				/*
 				 * When the tuple is visible but does not satisfy the scan
 				 * keys, we have to re-acquire the buffer lock to examine the
@@ -1107,7 +1110,10 @@ continue_page:
 			if (key != NULL &&
 				!HeapKeyTest(tuple, RelationGetDescr(scan->rs_base.rs_rd),
 							 nkeys, key))
+			{
+				scan->rs_base.rs_nskip++;
 				continue;
+			}
 
 			scan->rs_cindex = lineindex;
 			return;
@@ -1167,6 +1173,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
 	scan->rs_base.rs_rd = relation;
 	scan->rs_base.rs_snapshot = snapshot;
 	scan->rs_base.rs_nkeys = nkeys;
+	scan->rs_base.rs_nskip = 0;
 	scan->rs_base.rs_flags = flags;
 	scan->rs_base.rs_parallel = parallel_scan;
 	scan->rs_strategy = NULL;	/* set in initscan */
@@ -1234,7 +1241,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
 	else
 		scan->rs_base.rs_key = NULL;
 
-	initscan(scan, nkeys, keys, false);
+	initscan(scan, nkeys, keys, false, true);
 
 	scan->rs_read_stream = NULL;
 
@@ -1335,7 +1342,7 @@ heap_rescan(TableScanDesc sscan, int nkeys, ScanKey keys, bool set_params,
 	/*
 	 * reinitialize scan descriptor
 	 */
-	initscan(scan, nkeys, keys, true);
+	initscan(scan, nkeys, keys, true, false);
 }
 
 void
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 7e699f8595e..2b0df2eb256 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -1968,7 +1968,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
 						   "Order By", planstate, ancestors, es);
 			show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
 			if (plan->qual)
-				show_instrumentation_count("Rows Removed by Filter", 1,
+				show_instrumentation_count("Rows Removed In Executor by Filter", 1,
 										   planstate, es);
 			show_indexsearches_info(planstate, es);
 			break;
@@ -1982,7 +1982,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
 						   "Order By", planstate, ancestors, es);
 			show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
 			if (plan->qual)
-				show_instrumentation_count("Rows Removed by Filter", 1,
+				show_instrumentation_count("Rows Removed In Executor by Filter", 1,
 										   planstate, es);
 			if (es->analyze)
 				ExplainPropertyFloat("Heap Fetches", NULL,
@@ -2002,7 +2002,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
 										   planstate, es);
 			show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
 			if (plan->qual)
-				show_instrumentation_count("Rows Removed by Filter", 1,
+				show_instrumentation_count("Rows Removed In Executor by Filter", 1,
 										   planstate, es);
 			show_tidbitmap_info((BitmapHeapScanState *) planstate, es);
 			break;
@@ -2012,6 +2012,15 @@ ExplainNode(PlanState *planstate, List *ancestors,
 			/* fall through to print additional fields the same as SeqScan */
 			/* FALLTHROUGH */
 		case T_SeqScan:
+			show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
+			if (plan->qual)
+			{
+				show_instrumentation_count("Rows Removed In Table AM by Filter", 3,
+										   planstate, es);
+				show_instrumentation_count("Rows Removed In Executor by Filter", 1,
+										   planstate, es);
+			}
+			break;
 		case T_ValuesScan:
 		case T_CteScan:
 		case T_NamedTuplestoreScan:
@@ -2019,7 +2028,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_SubqueryScan:
 			show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
 			if (plan->qual)
-				show_instrumentation_count("Rows Removed by Filter", 1,
+				show_instrumentation_count("Rows Removed In Executor by Filter", 1,
 										   planstate, es);
 			if (IsA(plan, CteScan))
 				show_ctescan_info(castNode(CteScanState, planstate), es);
@@ -2030,7 +2039,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
 
 				show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
 				if (plan->qual)
-					show_instrumentation_count("Rows Removed by Filter", 1,
+					show_instrumentation_count("Rows Removed In Executor by Filter", 1,
 											   planstate, es);
 				ExplainPropertyInteger("Workers Planned", NULL,
 									   gather->num_workers, es);
@@ -2054,7 +2063,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
 
 				show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
 				if (plan->qual)
-					show_instrumentation_count("Rows Removed by Filter", 1,
+					show_instrumentation_count("Rows Removed In Executor by Filter", 1,
 											   planstate, es);
 				ExplainPropertyInteger("Workers Planned", NULL,
 									   gm->num_workers, es);
@@ -2088,7 +2097,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
 			}
 			show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
 			if (plan->qual)
-				show_instrumentation_count("Rows Removed by Filter", 1,
+				show_instrumentation_count("Rows Removed In Executor by Filter", 1,
 										   planstate, es);
 			break;
 		case T_TableFuncScan:
@@ -2102,7 +2111,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
 			}
 			show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
 			if (plan->qual)
-				show_instrumentation_count("Rows Removed by Filter", 1,
+				show_instrumentation_count("Rows Removed In Executor by Filter", 1,
 										   planstate, es);
 			show_table_func_scan_info(castNode(TableFuncScanState,
 											   planstate), es);
@@ -2120,7 +2129,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
 				show_scan_qual(tidquals, "TID Cond", planstate, ancestors, es);
 				show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
 				if (plan->qual)
-					show_instrumentation_count("Rows Removed by Filter", 1,
+					show_instrumentation_count("Rows Removed In Executor by Filter", 1,
 											   planstate, es);
 			}
 			break;
@@ -2137,14 +2146,14 @@ ExplainNode(PlanState *planstate, List *ancestors,
 				show_scan_qual(tidquals, "TID Cond", planstate, ancestors, es);
 				show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
 				if (plan->qual)
-					show_instrumentation_count("Rows Removed by Filter", 1,
+					show_instrumentation_count("Rows Removed In Executor by Filter", 1,
 											   planstate, es);
 			}
 			break;
 		case T_ForeignScan:
 			show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
 			if (plan->qual)
-				show_instrumentation_count("Rows Removed by Filter", 1,
+				show_instrumentation_count("Rows Removed In Executor by Filter", 1,
 										   planstate, es);
 			show_foreignscan_info((ForeignScanState *) planstate, es);
 			break;
@@ -2154,7 +2163,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
 
 				show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
 				if (plan->qual)
-					show_instrumentation_count("Rows Removed by Filter", 1,
+					show_instrumentation_count("Rows Removed In Executor by Filter", 1,
 											   planstate, es);
 				if (css->methods->ExplainCustomScan)
 					css->methods->ExplainCustomScan(css, ancestors, es);
@@ -2168,7 +2177,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
 										   planstate, es);
 			show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
 			if (plan->qual)
-				show_instrumentation_count("Rows Removed by Filter", 2,
+				show_instrumentation_count("Rows Removed In Executor by Filter", 2,
 										   planstate, es);
 			break;
 		case T_MergeJoin:
@@ -2181,7 +2190,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
 										   planstate, es);
 			show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
 			if (plan->qual)
-				show_instrumentation_count("Rows Removed by Filter", 2,
+				show_instrumentation_count("Rows Removed In Executor by Filter", 2,
 										   planstate, es);
 			break;
 		case T_HashJoin:
@@ -2194,7 +2203,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
 										   planstate, es);
 			show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
 			if (plan->qual)
-				show_instrumentation_count("Rows Removed by Filter", 2,
+				show_instrumentation_count("Rows Removed In Executor by Filter", 2,
 										   planstate, es);
 			break;
 		case T_Agg:
@@ -2202,7 +2211,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
 			show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
 			show_hashagg_info((AggState *) planstate, es);
 			if (plan->qual)
-				show_instrumentation_count("Rows Removed by Filter", 1,
+				show_instrumentation_count("Rows Removed In Executor by Filter", 1,
 										   planstate, es);
 			break;
 		case T_WindowAgg:
@@ -2211,7 +2220,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
 							"Run Condition", planstate, ancestors, es);
 			show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
 			if (plan->qual)
-				show_instrumentation_count("Rows Removed by Filter", 1,
+				show_instrumentation_count("Rows Removed In Executor by Filter", 1,
 										   planstate, es);
 			show_windowagg_info(castNode(WindowAggState, planstate), es);
 			break;
@@ -2219,7 +2228,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
 			show_group_keys(castNode(GroupState, planstate), ancestors, es);
 			show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
 			if (plan->qual)
-				show_instrumentation_count("Rows Removed by Filter", 1,
+				show_instrumentation_count("Rows Removed In Executor by Filter", 1,
 										   planstate, es);
 			break;
 		case T_Sort:
@@ -2242,7 +2251,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
 							"One-Time Filter", planstate, ancestors, es);
 			show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
 			if (plan->qual)
-				show_instrumentation_count("Rows Removed by Filter", 1,
+				show_instrumentation_count("Rows Removed In Executor by Filter", 1,
 										   planstate, es);
 			break;
 		case T_ModifyTable:
@@ -3996,7 +4005,9 @@ show_instrumentation_count(const char *qlabel, int which,
 	if (!es->analyze || !planstate->instrument)
 		return;
 
-	if (which == 2)
+	if (which == 3)
+		nfiltered = planstate->instrument->nfiltered3;
+	else if (which == 2)
 		nfiltered = planstate->instrument->nfiltered2;
 	else
 		nfiltered = planstate->instrument->nfiltered1;
diff --git a/src/backend/executor/instrument.c b/src/backend/executor/instrument.c
index 9e11c662a7c..5c669ca5262 100644
--- a/src/backend/executor/instrument.c
+++ b/src/backend/executor/instrument.c
@@ -186,6 +186,7 @@ InstrAggNode(Instrumentation *dst, Instrumentation *add)
 	dst->nloops += add->nloops;
 	dst->nfiltered1 += add->nfiltered1;
 	dst->nfiltered2 += add->nfiltered2;
+	dst->nfiltered3 += add->nfiltered3;
 
 	/* Add delta of buffer usage since entry to node's totals */
 	if (dst->need_bufusage)
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 454d4ee0499..7fc0ae5d97a 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -29,9 +29,12 @@
 
 #include "access/relscan.h"
 #include "access/tableam.h"
+#include "executor/execExpr.h"
 #include "executor/execScan.h"
 #include "executor/executor.h"
 #include "executor/nodeSeqscan.h"
+#include "nodes/nodeFuncs.h"
+#include "utils/lsyscache.h"
 #include "utils/rel.h"
 
 static TupleTableSlot *SeqNext(SeqScanState *node);
@@ -41,6 +44,157 @@ static TupleTableSlot *SeqNext(SeqScanState *node);
  * ----------------------------------------------------------------
  */
 
+/* ----------------------------------------------------------------
+ *		ExecSeqBuildScanKeys
+ *
+ *		Builds the scan keys to be pushed down to the table AM.  Based
+ *		on the quals list, the table AM can use them to filter out
+ *		tuples before returning them to the executor.
+ * ----------------------------------------------------------------
+ */
+static void
+ExecSeqBuildScanKeys(PlanState *planstate, List *quals, int *numScanKeys,
+					 ScanKey *scanKeys, SeqScanRuntimeKeyInfo * *runtimeKeys,
+					 int *numRuntimeKeys)
+{
+	ListCell   *qual_cell;
+	ScanKey		scan_keys;
+	int			n_scan_keys = 0;
+	int			n_quals;
+	SeqScanRuntimeKeyInfo *runtime_keys;
+	int			n_runtime_keys;
+	int			max_runtime_keys;
+
+	n_quals = list_length(quals);
+
+	/*
+	 * If quals list is empty we have nothing to do.
+	 */
+	if (n_quals == 0)
+		return;
+
+	/*
+	 * Allocate an array of ScanKeyData structs: one per qual.
+	 *
+	 * Note: if we cannot convert all the quals to ScanKeys, we waste some
+	 * memory, but this avoids reallocating the array on the fly.
+	 */
+	scan_keys = (ScanKey) palloc(n_quals * sizeof(ScanKeyData));
+
+	/*
+	 * runtime_keys array is dynamically resized as needed.  Caller must be
+	 * sure to pass in NULL/0 for first call.
+	 */
+	runtime_keys = *runtimeKeys;
+	n_runtime_keys = max_runtime_keys = *numRuntimeKeys;
+
+	foreach(qual_cell, quals)
+	{
+		Expr	   *clause = (Expr *) lfirst(qual_cell);
+		ScanKey		this_scan_key = &scan_keys[n_scan_keys];
+		RegProcedure opfuncid;	/* operator proc id used in scan */
+		Expr	   *leftop;		/* expr on lhs of operator */
+		Expr	   *rightop;	/* expr on rhs ... */
+		AttrNumber	varattno;	/* att number used in scan */
+
+		/*
+		 * Simple qual case: <leftop> <op> <rightop>
+		 */
+		if (IsA(clause, OpExpr))
+		{
+			int			flags = 0;
+			Datum		scanvalue;
+
+			opfuncid = ((OpExpr *) clause)->opfuncid;
+
+			/*
+			 * leftop and rightop are not relabeled and can be used as they
+			 * are because they have been pre-computed by
+			 * fix_tablequal_references(), which also guarantees that the key
+			 * Var is always on the left.
+			 */
+			leftop = (Expr *) get_leftop(clause);
+			rightop = (Expr *) get_rightop(clause);
+
+			/* Left and right are not null */
+			Assert(leftop != NULL);
+			Assert(rightop != NULL);
+			/* The operator shouldn't be user defined */
+			Assert(((OpExpr *) clause)->opno < FirstNormalObjectId);
+			/* Left part is a Var */
+			Assert(IsA(leftop, Var));
+			/* Datatype is not TOASTable */
+			Assert(!TypeIsToastable(((Var *) leftop)->vartype));
+			/* Operator's function is leakproof */
+			Assert(get_func_leakproof(opfuncid));
+
+			varattno = ((Var *) leftop)->varattno;
+
+			if (IsA(rightop, Const))
+			{
+				/*
+				 * OK, simple constant comparison value
+				 */
+				scanvalue = ((Const *) rightop)->constvalue;
+				if (((Const *) rightop)->constisnull)
+					flags |= SK_ISNULL;
+			}
+			else
+			{
+				/* Need to treat this one as a run-time key */
+				if (n_runtime_keys >= max_runtime_keys)
+				{
+					if (max_runtime_keys == 0)
+					{
+						max_runtime_keys = 8;
+						runtime_keys = (SeqScanRuntimeKeyInfo *)
+							palloc(max_runtime_keys * sizeof(SeqScanRuntimeKeyInfo));
+					}
+					else
+					{
+						max_runtime_keys *= 2;
+						runtime_keys = (SeqScanRuntimeKeyInfo *)
+							repalloc(runtime_keys,
+									 max_runtime_keys * sizeof(SeqScanRuntimeKeyInfo));
+					}
+				}
+				runtime_keys[n_runtime_keys].scan_key = this_scan_key;
+				runtime_keys[n_runtime_keys].key_expr =
+					ExecInitExpr(rightop, planstate);
+				runtime_keys[n_runtime_keys].key_toastable = false;
+				n_runtime_keys++;
+				scanvalue = (Datum) 0;
+			}
+
+			n_scan_keys++;
+
+			ScanKeyEntryInitialize(this_scan_key,
+								   flags,
+								   varattno,
+								   InvalidStrategy, /* no strategy */
+								   InvalidOid,	/* no subtype */
+								   ((OpExpr *) clause)->inputcollid,
+								   opfuncid,
+								   scanvalue);
+		}
+		else
+		{
+			/*
+			 * Unsupported qual; do not push it down to the table AM.
+			 */
+			continue;
+		}
+	}
+
+	/*
+	 * Return info to our caller.
+	 */
+	*scanKeys = scan_keys;
+	*numScanKeys = n_scan_keys;
+	*runtimeKeys = runtime_keys;
+	*numRuntimeKeys = n_runtime_keys;
+}
+
 /* ----------------------------------------------------------------
  *		SeqNext
  *
@@ -71,15 +225,47 @@ SeqNext(SeqScanState *node)
 		 */
 		scandesc = table_beginscan(node->ss.ss_currentRelation,
 								   estate->es_snapshot,
-								   0, NULL);
+								   node->sss_NumScanKeys,
+								   node->sss_ScanKeys);
 		node->ss.ss_currentScanDesc = scandesc;
+
+		/*
+		 * If there are no run-time keys to compute, or they have already
+		 * been computed, go ahead and pass the ScanKeys to the table AM.
+		 */
+		if (node->sss_NumRuntimeKeys == 0 || node->sss_RuntimeKeysReady)
+			table_rescan(node->ss.ss_currentScanDesc, node->sss_NumScanKeys,
+						 node->sss_ScanKeys);
 	}
 
 	/*
 	 * get the next tuple from the table
 	 */
 	if (table_scan_getnextslot(scandesc, direction, slot))
+	{
+		/*
+		 * Update the instrumentation counter that tracks the number of
+		 * tuples skipped by the table AM during the table/seq scan.
+		 *
+		 * Note: updating it after getting each tuple only seems necessary
+		 * when the table scan is executed by postgres_fdw.  In all other
+		 * cases, updating the counter once, when there is no next tuple to
+		 * return, would be enough.
+		 */
+		InstrCountFiltered3(node, scandesc->rs_nskip);
+
+		/*
+		 * We have to reset the local counter once the instrumentation counter
+		 * has been updated.
+		 */
+		scandesc->rs_nskip = 0;
+
 		return slot;
+	}
+
+	InstrCountFiltered3(node, scandesc->rs_nskip);
+	scandesc->rs_nskip = 0;
+
 	return NULL;
 }
 
@@ -115,6 +301,15 @@ ExecSeqScan(PlanState *pstate)
 	Assert(pstate->qual == NULL);
 	Assert(pstate->ps_ProjInfo == NULL);
 
+	/*
+	 * If we have run-time keys and they've not already been set up, do it
+	 * now.
+	 */
+	if (node->sss_NumRuntimeKeys != 0 && !node->sss_RuntimeKeysReady)
+	{
+		ExecReScan((PlanState *) node);
+	}
+
 	return ExecScanExtended(&node->ss,
 							(ExecScanAccessMtd) SeqNext,
 							(ExecScanRecheckMtd) SeqRecheck,
@@ -139,6 +334,15 @@ ExecSeqScanWithQual(PlanState *pstate)
 	pg_assume(pstate->qual != NULL);
 	Assert(pstate->ps_ProjInfo == NULL);
 
+	/*
+	 * If we have run-time keys and they've not already been set up, do it
+	 * now.
+	 */
+	if (node->sss_NumRuntimeKeys != 0 && !node->sss_RuntimeKeysReady)
+	{
+		ExecReScan((PlanState *) node);
+	}
+
 	return ExecScanExtended(&node->ss,
 							(ExecScanAccessMtd) SeqNext,
 							(ExecScanRecheckMtd) SeqRecheck,
@@ -159,6 +363,15 @@ ExecSeqScanWithProject(PlanState *pstate)
 	Assert(pstate->qual == NULL);
 	pg_assume(pstate->ps_ProjInfo != NULL);
 
+	/*
+	 * If we have run-time keys and they've not already been set up, do it
+	 * now.
+	 */
+	if (node->sss_NumRuntimeKeys != 0 && !node->sss_RuntimeKeysReady)
+	{
+		ExecReScan((PlanState *) node);
+	}
+
 	return ExecScanExtended(&node->ss,
 							(ExecScanAccessMtd) SeqNext,
 							(ExecScanRecheckMtd) SeqRecheck,
@@ -180,6 +393,15 @@ ExecSeqScanWithQualProject(PlanState *pstate)
 	pg_assume(pstate->qual != NULL);
 	pg_assume(pstate->ps_ProjInfo != NULL);
 
+	/*
+	 * If we have run-time keys and they've not already been set up, do it
+	 * now.
+	 */
+	if (node->sss_NumRuntimeKeys != 0 && !node->sss_RuntimeKeysReady)
+	{
+		ExecReScan((PlanState *) node);
+	}
+
 	return ExecScanExtended(&node->ss,
 							(ExecScanAccessMtd) SeqNext,
 							(ExecScanRecheckMtd) SeqRecheck,
@@ -198,6 +420,15 @@ ExecSeqScanEPQ(PlanState *pstate)
 {
 	SeqScanState *node = castNode(SeqScanState, pstate);
 
+	/*
+	 * If we have run-time keys and they've not already been set up, do it
+	 * now.
+	 */
+	if (node->sss_NumRuntimeKeys != 0 && !node->sss_RuntimeKeysReady)
+	{
+		ExecReScan((PlanState *) node);
+	}
+
 	return ExecScan(&node->ss,
 					(ExecScanAccessMtd) SeqNext,
 					(ExecScanRecheckMtd) SeqRecheck);
@@ -225,6 +456,11 @@ ExecInitSeqScan(SeqScan *node, EState *estate, int eflags)
 	scanstate = makeNode(SeqScanState);
 	scanstate->ss.ps.plan = (Plan *) node;
 	scanstate->ss.ps.state = estate;
+	scanstate->sss_ScanKeys = NULL;
+	scanstate->sss_NumScanKeys = 0;
+	scanstate->sss_RuntimeKeysReady = false;
+	scanstate->sss_RuntimeKeys = NULL;
+	scanstate->sss_NumRuntimeKeys = 0;
 
 	/*
 	 * Miscellaneous initialization
@@ -258,6 +494,14 @@ ExecInitSeqScan(SeqScan *node, EState *estate, int eflags)
 	scanstate->ss.ps.qual =
 		ExecInitQual(node->scan.plan.qual, (PlanState *) scanstate);
 
+	/* Build sequential scan keys */
+	ExecSeqBuildScanKeys((PlanState *) scanstate,
+						 node->tablequal,
+						 &scanstate->sss_NumScanKeys,
+						 &scanstate->sss_ScanKeys,
+						 &scanstate->sss_RuntimeKeys,
+						 &scanstate->sss_NumRuntimeKeys);
+
 	/*
 	 * When EvalPlanQual() is not in use, assign ExecProcNode for this node
 	 * based on the presence of qual and projection. Each ExecSeqScan*()
@@ -280,6 +524,24 @@ ExecInitSeqScan(SeqScan *node, EState *estate, int eflags)
 			scanstate->ss.ps.ExecProcNode = ExecSeqScanWithQualProject;
 	}
 
+	/*
+	 * If we have runtime keys, we need an ExprContext to evaluate them. The
+	 * node's standard context won't do because we want to reset that context
+	 * for every tuple.  So, build another context just like the other one...
+	 */
+	if (scanstate->sss_NumRuntimeKeys != 0)
+	{
+		ExprContext *stdecontext = scanstate->ss.ps.ps_ExprContext;
+
+		ExecAssignExprContext(estate, &scanstate->ss.ps);
+		scanstate->sss_RuntimeContext = scanstate->ss.ps.ps_ExprContext;
+		scanstate->ss.ps.ps_ExprContext = stdecontext;
+	}
+	else
+	{
+		scanstate->sss_RuntimeContext = NULL;
+	}
+
 	return scanstate;
 }
 
@@ -322,16 +584,82 @@ ExecReScanSeqScan(SeqScanState *node)
 {
 	TableScanDesc scan;
 
+	/*
+	 * If we are doing runtime key calculations (ie, any of the scan key
+	 * values weren't simple Consts), compute the new key values.  But first,
+	 * reset the context so we don't leak memory as each outer tuple is
+	 * scanned.  Note this assumes that we will recalculate *all* runtime keys
+	 * on each call.
+	 */
+	if (node->sss_NumRuntimeKeys != 0)
+	{
+		ExprContext *econtext = node->sss_RuntimeContext;
+
+		ResetExprContext(econtext);
+		ExecSeqScanEvalRuntimeKeys(econtext,
+								   node->sss_RuntimeKeys,
+								   node->sss_NumRuntimeKeys);
+	}
+	node->sss_RuntimeKeysReady = true;
+
 	scan = node->ss.ss_currentScanDesc;
 
 	if (scan != NULL)
 		table_rescan(scan,		/* scan desc */
-					 0,			/* number of new scan keys */
-					 NULL);		/* new scan keys */
+					 node->sss_NumScanKeys, /* number of scan keys */
+					 node->sss_ScanKeys);	/* scan keys */
 
 	ExecScanReScan((ScanState *) node);
 }
 
+/* ----------------------------------------------------------------
+ * 		ExecSeqScanEvalRuntimeKeys
+ *
+ * 		Evaluate any run-time key values, and update the scankeys.
+ * ----------------------------------------------------------------
+ */
+void
+ExecSeqScanEvalRuntimeKeys(ExprContext *econtext,
+						   SeqScanRuntimeKeyInfo * runtimeKeys,
+						   int numRuntimeKeys)
+{
+	int			j;
+	MemoryContext oldContext;
+
+	/* We want to keep the key values in per-tuple memory */
+	oldContext = MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);
+
+	for (j = 0; j < numRuntimeKeys; j++)
+	{
+		ScanKey		scan_key = runtimeKeys[j].scan_key;
+		ExprState  *key_expr = runtimeKeys[j].key_expr;
+		bool		isNull;
+
+		/*
+		 * For each run-time key, extract the run-time expression and evaluate
+		 * it with respect to the current context.  We then stick the result
+		 * into the proper scan key.
+		 *
+		 * Note: the result of the eval could be a pass-by-ref value that's
+		 * stored in some outer scan's tuple, not in
+		 * econtext->ecxt_per_tuple_memory.  We assume that the outer tuple
+		 * will stay put throughout our scan.  If this is wrong, we could copy
+		 * the result into our context explicitly, but I think that's not
+		 * necessary.
+		 */
+		scan_key->sk_argument = ExecEvalExpr(key_expr,
+											 econtext,
+											 &isNull);
+
+		if (isNull)
+			scan_key->sk_flags |= SK_ISNULL;
+		else
+			scan_key->sk_flags &= ~SK_ISNULL;
+	}
+
+	MemoryContextSwitchTo(oldContext);
+}
+
 /* ----------------------------------------------------------------
  *						Parallel Scan Support
  * ----------------------------------------------------------------
@@ -375,7 +703,17 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0, NULL);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan,
+								 node->sss_NumScanKeys,
+								 node->sss_ScanKeys);
+
+	/*
+	 * If there are no run-time keys to compute, or they are already
+	 * computed, go ahead and pass the ScanKeys to the table AM.
+	 */
+	if (node->sss_NumRuntimeKeys == 0 || node->sss_RuntimeKeysReady)
+		table_rescan(node->ss.ss_currentScanDesc, node->sss_NumScanKeys,
+					 node->sss_ScanKeys);
 }
 
 /* ----------------------------------------------------------------
@@ -408,5 +746,15 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0, NULL);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan,
+								 node->sss_NumScanKeys,
+								 node->sss_ScanKeys);
+
+	/*
+	 * If there are no run-time keys to compute, or they are already
+	 * computed, go ahead and pass the ScanKeys to the table AM.
+	 */
+	if (node->sss_NumRuntimeKeys == 0 || node->sss_RuntimeKeysReady)
+		table_rescan(node->ss.ss_currentScanDesc, node->sss_NumScanKeys,
+					 node->sss_ScanKeys);
 }
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 8af091ba647..b7adc512189 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -20,6 +20,7 @@
 
 #include "access/sysattr.h"
 #include "catalog/pg_class.h"
+#include "executor/executor.h"
 #include "foreign/fdwapi.h"
 #include "miscadmin.h"
 #include "nodes/extensible.h"
@@ -40,8 +41,10 @@
 #include "optimizer/tlist.h"
 #include "parser/parse_clause.h"
 #include "parser/parsetree.h"
+#include "parser/parse_relation.h"
 #include "partitioning/partprune.h"
 #include "tcop/tcopprot.h"
+#include "utils/acl.h"
 #include "utils/lsyscache.h"
 
 
@@ -171,6 +174,8 @@ static Node *fix_indexqual_clause(PlannerInfo *root,
 								  IndexOptInfo *index, int indexcol,
 								  Node *clause, List *indexcolnos);
 static Node *fix_indexqual_operand(Node *node, IndexOptInfo *index, int indexcol);
+static void fix_tablequal_references(PlannerInfo *root, Path *best_path,
+									 List *scan_clauses, List **fixed_tablequals_p);
 static List *get_switched_clauses(List *clauses, Relids outerrelids);
 static List *order_qual_clauses(PlannerInfo *root, List *clauses);
 static void copy_generic_path_info(Plan *dest, Path *src);
@@ -179,7 +184,7 @@ static void label_sort_with_costsize(PlannerInfo *root, Sort *plan,
 									 double limit_tuples);
 static void label_incrementalsort_with_costsize(PlannerInfo *root, IncrementalSort *plan,
 												List *pathkeys, double limit_tuples);
-static SeqScan *make_seqscan(List *qptlist, List *qpqual, Index scanrelid);
+static SeqScan *make_seqscan(List *qptlist, List *qpqual, Index scanrelid, List *tablequal);
 static SampleScan *make_samplescan(List *qptlist, List *qpqual, Index scanrelid,
 								   TableSampleClause *tsc);
 static IndexScan *make_indexscan(List *qptlist, List *qpqual, Index scanrelid,
@@ -2755,10 +2760,16 @@ create_seqscan_plan(PlannerInfo *root, Path *best_path,
 {
 	SeqScan    *scan_plan;
 	Index		scan_relid = best_path->parent->relid;
+	List	   *fixed_tablequals = NIL;
+	RangeTblEntry *rte;
+	RTEPermissionInfo *perminfo;
+	bool		do_fix_tablequal_ref = true;
 
 	/* it should be a base rel... */
 	Assert(scan_relid > 0);
+	rte = planner_rt_fetch(scan_relid, root);
 	Assert(best_path->parent->rtekind == RTE_RELATION);
+	Assert(rte->rtekind == RTE_RELATION);
 
 	/* Sort clauses into best execution order */
 	scan_clauses = order_qual_clauses(root, scan_clauses);
@@ -2773,9 +2784,25 @@ create_seqscan_plan(PlannerInfo *root, Path *best_path,
 			replace_nestloop_params(root, (Node *) scan_clauses);
 	}
 
+	/*
+	 * Check relation permissions before doing any preliminary work on the
+	 * quals: if the permission check fails, we skip the qual push down
+	 * preparation, as it would be unnecessary work.
+	 */
+	if (rte->perminfoindex != 0)
+	{
+		perminfo = getRTEPermissionInfo(root->parse->rteperminfos, rte);
+		if (!ExecCheckOneRelPerms(perminfo))
+			do_fix_tablequal_ref = false;
+	}
+
+	if (do_fix_tablequal_ref)
+		fix_tablequal_references(root, best_path, scan_clauses, &fixed_tablequals);
+
 	scan_plan = make_seqscan(tlist,
 							 scan_clauses,
-							 scan_relid);
+							 scan_relid,
+							 fixed_tablequals);
 
 	copy_generic_path_info(&scan_plan->scan.plan, best_path);
 
@@ -5168,6 +5195,192 @@ fix_indexqual_operand(Node *node, IndexOptInfo *index, int indexcol)
 	return NULL;				/* keep compiler quiet */
 }
 
+/*
+ * Check if the right part of a qual can be used in a ScanKey that will
+ * later be pushed down during sequential scan.
+ */
+static inline bool
+check_tablequal_rightop(Expr *rightop)
+{
+	switch (nodeTag((Node *) rightop))
+	{
+			/* Supported nodes */
+		case T_Const:
+		case T_Param:
+			break;
+
+			/*
+			 * In the case of a function expression, make sure the function's
+			 * arguments do not contain any reference to the table being scanned.
+			 */
+		case T_FuncExpr:
+			{
+				FuncExpr   *func = (FuncExpr *) rightop;
+				ListCell   *temp;
+
+				foreach(temp, func->args)
+				{
+					Node	   *arg = lfirst(temp);
+
+					if (IsA(arg, Var) && ((Var *) arg)->varattno > 0)
+						return false;
+				}
+
+				break;
+			}
+
+			/*
+			 * In the case of a Var, check whether it references a relation
+			 * attribute, which is not supported.
+			 */
+		case T_Var:
+			{
+				if (((Var *) rightop)->varattno > 0)
+					return false;
+				break;
+			}
+			/* Unsupported nodes */
+		default:
+			return false;
+			break;
+	}
+
+	return true;
+}
+
+/*
+ * fix_tablequal_references
+ *    Precompute scan clauses in order to pass them ready to be pushed down by
+ *    Precompute scan clauses so that they are ready to be pushed down by
+ *    the executor during table scan.
+ *
+ * We commute left and right operands when needed because we want to keep
+ * the scan key on the left.
+static void
+fix_tablequal_references(PlannerInfo *root, Path *best_path,
+						 List *scan_clauses, List **fixed_tablequals_p)
+{
+	List	   *fixed_tablequals;
+	ListCell   *lc;
+
+	fixed_tablequals = NIL;
+
+	scan_clauses = (List *) replace_nestloop_params(root, (Node *) scan_clauses);
+
+	foreach(lc, scan_clauses)
+	{
+		/*
+		 * Work with a "deep" copy of the original scan clause in order to
+		 * avoid modifying the initial scan clause.
+		 */
+		Expr	   *clause = (Expr *) copyObject(lfirst(lc));
+
+		switch (nodeTag((Node *) clause))
+		{
+				/*
+				 * Simple qual case: <leftop> <op> <rightop>
+				 */
+			case T_OpExpr:
+				{
+					OpExpr	   *opexpr = (OpExpr *) clause;
+					Expr	   *leftop;
+					Expr	   *rightop;
+
+					leftop = (Expr *) get_leftop(clause);
+					rightop = (Expr *) get_rightop(clause);
+
+					if (leftop && IsA(leftop, RelabelType))
+						leftop = ((RelabelType *) leftop)->arg;
+
+					if (rightop && IsA(rightop, RelabelType))
+						rightop = ((RelabelType *) rightop)->arg;
+
+					if (leftop == NULL || rightop == NULL)
+						continue;
+
+					/*
+					 * Ignore qual if the operator is user defined
+					 */
+					if (opexpr->opno >= FirstNormalObjectId)
+						continue;
+
+					/*
+					 * Ignore qual if the function is not leakproof
+					 */
+					if (!get_func_leakproof(opexpr->opfuncid))
+						continue;
+
+					/*
+					 * Commute left and right if needed and reflect those
+					 * changes in the clause.  This way, the executor won't
+					 * have to check the positions of Var and Const/other:
+					 * the Var is always on the left while the Const/other is
+					 * on the right.
+					 */
+					if (IsA(rightop, Var) && !IsA(leftop, Var)
+						&& ((Var *) rightop)->varattno > 0)
+					{
+						Expr	   *tmpop = leftop;
+						Oid			commutator;
+
+						leftop = rightop;
+						rightop = tmpop;
+
+						commutator = get_commutator(opexpr->opno);
+
+						if (OidIsValid(commutator))
+						{
+							opexpr->opno = commutator;
+							opexpr->opfuncid = get_opcode(opexpr->opno);
+						}
+						else
+						{
+							/*
+							 * If we don't have any commutator function
+							 * available for this operator, then ignore the
+							 * qual because we cannot commute it.
+							 */
+							continue;
+						}
+					}
+
+					/*
+					 * Make sure our left part is a Var referencing an
+					 * attribute.
+					 */
+					if (!(IsA(leftop, Var) && ((Var *) leftop)->varattno > 0))
+						continue;
+
+					/*
+					 * Make sure the var type is not TOASTable as we don't
+					 * want to deal with potentially TOASTed data when
+					 * evaluating the scan keys.
+					 */
+					if (TypeIsToastable(((Var *) leftop)->vartype))
+						continue;
+
+					if (!check_tablequal_rightop(rightop))
+						continue;
+
+					/*
+					 * Even if there is no left/right commutation, update the
+					 * clause in order to avoid unnecessary checks by the
+					 * executor.
+					 */
+					list_free(opexpr->args);
+					opexpr->args = list_make2(leftop, rightop);
+
+					/* Append the modified clause to fixed_tablequals */
+					fixed_tablequals = lappend(fixed_tablequals, clause);
+					break;
+				}
+			default:
+				continue;
+		}
+	}
+
+	*fixed_tablequals_p = fixed_tablequals;
+}
+
 /*
  * get_switched_clauses
  *	  Given a list of merge or hash joinclauses (as RestrictInfo nodes),
@@ -5480,7 +5693,8 @@ bitmap_subplan_mark_shared(Plan *plan)
 static SeqScan *
 make_seqscan(List *qptlist,
 			 List *qpqual,
-			 Index scanrelid)
+			 Index scanrelid,
+			 List *tablequal)
 {
 	SeqScan    *node = makeNode(SeqScan);
 	Plan	   *plan = &node->scan.plan;
@@ -5490,6 +5704,7 @@ make_seqscan(List *qptlist,
 	plan->lefttree = NULL;
 	plan->righttree = NULL;
 	node->scan.scanrelid = scanrelid;
+	node->tablequal = tablequal;
 
 	return node;
 }
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index 87a8be10461..a8dcc00c8a0 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -65,6 +65,7 @@ typedef struct TableScanDescData
 
 	struct ParallelTableScanDescData *rs_parallel;	/* parallel scan
 													 * information */
+	uint64		rs_nskip;		/* number of tuples skipped during table scan */
 } TableScanDescData;
 typedef struct TableScanDescData *TableScanDesc;
 
diff --git a/src/include/executor/instrument.h b/src/include/executor/instrument.h
index ffe470f2b84..48114f2d5df 100644
--- a/src/include/executor/instrument.h
+++ b/src/include/executor/instrument.h
@@ -88,8 +88,11 @@ typedef struct Instrumentation
 	double		ntuples;		/* total tuples produced */
 	double		ntuples2;		/* secondary node-specific tuple counter */
 	double		nloops;			/* # of run cycles for this node */
-	double		nfiltered1;		/* # of tuples removed by scanqual or joinqual */
-	double		nfiltered2;		/* # of tuples removed by "other" quals */
+	double		nfiltered1;		/* # of tuples in executor removed by scanqual
+								 * or joinqual */
+	double		nfiltered2;		/* # of tuples in executor removed by "other"
+								 * quals */
+	double		nfiltered3;		/* # of tuples in table AM removed by quals */
 	BufferUsage bufusage;		/* total buffer usage */
 	WalUsage	walusage;		/* total WAL usage */
 } Instrumentation;
diff --git a/src/include/executor/nodeSeqscan.h b/src/include/executor/nodeSeqscan.h
index 3adad8b585b..6285ebc0e58 100644
--- a/src/include/executor/nodeSeqscan.h
+++ b/src/include/executor/nodeSeqscan.h
@@ -20,6 +20,9 @@
 extern SeqScanState *ExecInitSeqScan(SeqScan *node, EState *estate, int eflags);
 extern void ExecEndSeqScan(SeqScanState *node);
 extern void ExecReScanSeqScan(SeqScanState *node);
+extern void ExecSeqScanEvalRuntimeKeys(ExprContext *econtext,
+									   SeqScanRuntimeKeyInfo * runtimeKeys,
+									   int numRuntimeKeys);
 
 /* parallel scan support */
 extern void ExecSeqScanEstimate(SeqScanState *node, ParallelContext *pcxt);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 64ff6996431..9b136ab43eb 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1276,6 +1276,11 @@ typedef struct PlanState
 		if (((PlanState *)(node))->instrument) \
 			((PlanState *)(node))->instrument->nfiltered2 += (delta); \
 	} while(0)
+#define InstrCountFiltered3(node, delta) \
+	do { \
+		if (((PlanState *)(node))->instrument) \
+			((PlanState *)(node))->instrument->nfiltered3 += (delta); \
+	} while(0)
 
 /*
  * EPQState is state for executing an EvalPlanQual recheck on a candidate
@@ -1624,14 +1629,38 @@ typedef struct ScanState
 	TupleTableSlot *ss_ScanTupleSlot;
 } ScanState;
 
+typedef struct RuntimeKeyInfo
+{
+	ScanKeyData *scan_key;		/* scankey to put value into */
+	ExprState  *key_expr;		/* expr to evaluate to get value */
+	bool		key_toastable;	/* is expr's result a toastable datatype? */
+}			RuntimeKeyInfo;
+
+typedef struct RuntimeKeyInfo SeqScanRuntimeKeyInfo;
+
 /* ----------------
  *	 SeqScanState information
+ *
+ *		ss					its first field is NodeTag
+ *		pscan_len			size of parallel heap scan descriptor
+ *		sss_ScanKeys		Skeys array used to push down quals
+ *		sss_NumScanKeys		number of Skeys
+ *		sss_RuntimeKeys		info about Skeys that must be evaluated at runtime
+ *		sss_NumRuntimeKeys	number of RuntimeKeys
+ *		sss_RuntimeKeysReady true if runtime Skeys have been computed
+ *		sss_RuntimeContext	expr context for evaling runtime Skeys
  * ----------------
  */
 typedef struct SeqScanState
 {
-	ScanState	ss;				/* its first field is NodeTag */
-	Size		pscan_len;		/* size of parallel heap scan descriptor */
+	ScanState	ss;
+	Size		pscan_len;
+	struct ScanKeyData *sss_ScanKeys;
+	int			sss_NumScanKeys;
+	SeqScanRuntimeKeyInfo *sss_RuntimeKeys;
+	int			sss_NumRuntimeKeys;
+	bool		sss_RuntimeKeysReady;
+	ExprContext *sss_RuntimeContext;
 } SeqScanState;
 
 /* ----------------
@@ -1660,12 +1689,7 @@ typedef struct SampleScanState
  * constant right-hand sides.  See comments for ExecIndexBuildScanKeys()
  * for discussion.
  */
-typedef struct
-{
-	ScanKeyData *scan_key;		/* scankey to put value into */
-	ExprState  *key_expr;		/* expr to evaluate to get value */
-	bool		key_toastable;	/* is expr's result a toastable datatype? */
-} IndexRuntimeKeyInfo;
+typedef struct RuntimeKeyInfo IndexRuntimeKeyInfo;
 
 typedef struct
 {
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index c4393a94321..fcbe482c770 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -530,6 +530,8 @@ typedef struct Scan
 typedef struct SeqScan
 {
 	Scan		scan;
+	/* list of quals (usually OpExprs) pushed down to the table AM */
+	List	   *tablequal;
 } SeqScan;
 
 /* ----------------
diff --git a/src/test/regress/expected/memoize.out b/src/test/regress/expected/memoize.out
index 00c30b91459..c116c3945ef 100644
--- a/src/test/regress/expected/memoize.out
+++ b/src/test/regress/expected/memoize.out
@@ -43,7 +43,7 @@ WHERE t2.unique1 < 1000;', false);
    ->  Nested Loop (actual rows=1000.00 loops=N)
          ->  Seq Scan on tenk1 t2 (actual rows=1000.00 loops=N)
                Filter: (unique1 < 1000)
-               Rows Removed by Filter: 9000
+               Rows Removed In Table AM by Filter: 9000
          ->  Memoize (actual rows=1.00 loops=N)
                Cache Key: t2.twenty
                Cache Mode: logical
@@ -75,7 +75,7 @@ WHERE t1.unique1 < 1000;', false);
    ->  Nested Loop (actual rows=1000.00 loops=N)
          ->  Seq Scan on tenk1 t1 (actual rows=1000.00 loops=N)
                Filter: (unique1 < 1000)
-               Rows Removed by Filter: 9000
+               Rows Removed In Table AM by Filter: 9000
          ->  Memoize (actual rows=1.00 loops=N)
                Cache Key: t1.twenty
                Cache Mode: binary
@@ -117,7 +117,7 @@ WHERE t1.unique1 < 10;', false);
                Hits: 8  Misses: 2  Evictions: Zero  Overflows: 0  Memory Usage: NkB
                ->  Subquery Scan on t2 (actual rows=2.00 loops=N)
                      Filter: (t1.two = t2.two)
-                     Rows Removed by Filter: 2
+                     Rows Removed In Executor by Filter: 2
                      ->  Index Scan using tenk1_unique1 on tenk1 t2_1 (actual rows=4.00 loops=N)
                            Index Cond: (unique1 < 4)
                            Index Searches: N
@@ -146,14 +146,14 @@ WHERE s.c1 = s.c2 AND t1.unique1 < 1000;', false);
    ->  Nested Loop (actual rows=1000.00 loops=N)
          ->  Seq Scan on tenk1 t1 (actual rows=1000.00 loops=N)
                Filter: (unique1 < 1000)
-               Rows Removed by Filter: 9000
+               Rows Removed In Table AM by Filter: 9000
          ->  Memoize (actual rows=1.00 loops=N)
                Cache Key: (t1.two + 1)
                Cache Mode: binary
                Hits: 998  Misses: 2  Evictions: Zero  Overflows: 0  Memory Usage: NkB
                ->  Index Only Scan using tenk1_unique1 on tenk1 t2 (actual rows=1.00 loops=N)
                      Filter: ((t1.two + 1) = unique1)
-                     Rows Removed by Filter: 9999
+                     Rows Removed In Executor by Filter: 9999
                      Heap Fetches: N
                      Index Searches: N
 (14 rows)
@@ -179,15 +179,16 @@ WHERE s.c1 = s.c2 AND t1.unique1 < 1000;', false);
    ->  Nested Loop (actual rows=1000.00 loops=N)
          ->  Seq Scan on tenk1 t1 (actual rows=1000.00 loops=N)
                Filter: (unique1 < 1000)
-               Rows Removed by Filter: 9000
+               Rows Removed In Table AM by Filter: 9000
          ->  Memoize (actual rows=1.00 loops=N)
                Cache Key: t1.two, t1.twenty
                Cache Mode: binary
                Hits: 980  Misses: 20  Evictions: Zero  Overflows: 0  Memory Usage: NkB
                ->  Seq Scan on tenk1 t2 (actual rows=1.00 loops=N)
                      Filter: ((t1.twenty = unique1) AND (t1.two = two))
-                     Rows Removed by Filter: 9999
-(12 rows)
+                     Rows Removed In Table AM by Filter: 5000
+                     Rows Removed In Executor by Filter: 4999
+(13 rows)
 
 -- And check we get the expected results.
 SELECT COUNT(*), AVG(t1.twenty) FROM tenk1 t1 LEFT JOIN
@@ -246,7 +247,7 @@ WHERE t2.unique1 < 1200;', true);
    ->  Nested Loop (actual rows=1200.00 loops=N)
          ->  Seq Scan on tenk1 t2 (actual rows=1200.00 loops=N)
                Filter: (unique1 < 1200)
-               Rows Removed by Filter: 8800
+               Rows Removed In Table AM by Filter: 8800
          ->  Memoize (actual rows=1.00 loops=N)
                Cache Key: t2.thousand
                Cache Mode: logical
@@ -522,7 +523,7 @@ WHERE t2.a IS NULL;', false);
                Hits: 97  Misses: 3  Evictions: Zero  Overflows: 0  Memory Usage: NkB
                ->  Subquery Scan on t2 (actual rows=0.67 loops=N)
                      Filter: ((t1.a + 1) = t2.a)
-                     Rows Removed by Filter: 2
+                     Rows Removed In Executor by Filter: 2
                      ->  Unique (actual rows=2.67 loops=N)
                            ->  Sort (actual rows=67.33 loops=N)
                                  Sort Key: t2_1.a
diff --git a/src/test/regress/expected/merge.out b/src/test/regress/expected/merge.out
index 9cb1d87066a..123b063716f 100644
--- a/src/test/regress/expected/merge.out
+++ b/src/test/regress/expected/merge.out
@@ -1801,7 +1801,7 @@ WHEN MATCHED AND t.a < 10 THEN
                Sort Method: quicksort  Memory: xxx
                ->  Seq Scan on ex_mtarget t (actual rows=0.00 loops=1)
                      Filter: (a < '-1000'::integer)
-                     Rows Removed by Filter: 54
+                     Rows Removed In Table AM by Filter: 54
          ->  Sort (never executed)
                Sort Key: s.a
                ->  Seq Scan on ex_msource s (never executed)
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index deacdd75807..943cc9131ae 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -2346,16 +2346,16 @@ explain (analyze, costs off, summary off, timing off, buffers off) select * from
  Append (actual rows=0.00 loops=1)
    ->  Seq Scan on list_part1 list_part_1 (actual rows=0.00 loops=1)
          Filter: (a = (list_part_fn(1) + a))
-         Rows Removed by Filter: 1
+         Rows Removed In Executor by Filter: 1
    ->  Seq Scan on list_part2 list_part_2 (actual rows=0.00 loops=1)
          Filter: (a = (list_part_fn(1) + a))
-         Rows Removed by Filter: 1
+         Rows Removed In Executor by Filter: 1
    ->  Seq Scan on list_part3 list_part_3 (actual rows=0.00 loops=1)
          Filter: (a = (list_part_fn(1) + a))
-         Rows Removed by Filter: 1
+         Rows Removed In Executor by Filter: 1
    ->  Seq Scan on list_part4 list_part_4 (actual rows=0.00 loops=1)
          Filter: (a = (list_part_fn(1) + a))
-         Rows Removed by Filter: 1
+         Rows Removed In Executor by Filter: 1
 (13 rows)
 
 rollback;
@@ -2381,7 +2381,7 @@ begin
     loop
         ln := regexp_replace(ln, 'Workers Launched: \d+', 'Workers Launched: N');
         ln := regexp_replace(ln, 'actual rows=\d+(?:\.\d+)? loops=\d+', 'actual rows=N loops=N');
-        ln := regexp_replace(ln, 'Rows Removed by Filter: \d+', 'Rows Removed by Filter: N');
+        ln := regexp_replace(ln, 'Rows Removed In Executor by Filter: \d+', 'Rows Removed In Executor by Filter: N');
         perform regexp_matches(ln, 'Index Searches: \d+');
         if found then
           continue;
@@ -2623,7 +2623,7 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                ->  Nested Loop (actual rows=N loops=N)
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{1,0,0}'::integer[]))
-                           Rows Removed by Filter: N
+                           Rows Removed In Executor by Filter: N
                      ->  Append (actual rows=N loops=N)
                            ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (actual rows=N loops=N)
                                  Index Cond: (a = a.a)
@@ -2657,7 +2657,7 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on
                ->  Nested Loop (actual rows=N loops=N)
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)
                            Filter: (a = ANY ('{1,0,0}'::integer[]))
-                           Rows Removed by Filter: N
+                           Rows Removed In Executor by Filter: N
                      ->  Append (actual rows=N loops=N)
                            ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 ab_1 (never executed)
                                  Index Cond: (a = a.a)
@@ -2879,7 +2879,7 @@ explain (analyze, costs off, summary off, timing off, buffers off) execute ab_q6
          Filter: ((a = $1) AND (b = (InitPlan expr_1).col1))
    ->  Seq Scan on xy_1 (actual rows=0.00 loops=1)
          Filter: ((x = $1) AND (y = (InitPlan expr_1).col1))
-         Rows Removed by Filter: 1
+         Rows Removed In Table AM by Filter: 1
    ->  Seq Scan on ab_a1_b1 ab_4 (never executed)
          Filter: ((a = $1) AND (b = (InitPlan expr_1).col1))
    ->  Seq Scan on ab_a1_b2 ab_5 (never executed)
@@ -3543,7 +3543,7 @@ select * from boolp where a = (select value from boolvalues where value);
    InitPlan expr_1
      ->  Seq Scan on boolvalues (actual rows=1.00 loops=1)
            Filter: value
-           Rows Removed by Filter: 1
+           Rows Removed In Executor by Filter: 1
    ->  Seq Scan on boolp_f boolp_1 (never executed)
          Filter: (a = (InitPlan expr_1).col1)
    ->  Seq Scan on boolp_t boolp_2 (actual rows=0.00 loops=1)
@@ -3558,7 +3558,7 @@ select * from boolp where a = (select value from boolvalues where not value);
    InitPlan expr_1
      ->  Seq Scan on boolvalues (actual rows=1.00 loops=1)
            Filter: (NOT value)
-           Rows Removed by Filter: 1
+           Rows Removed In Executor by Filter: 1
    ->  Seq Scan on boolp_f boolp_1 (actual rows=0.00 loops=1)
          Filter: (a = (InitPlan expr_1).col1)
    ->  Seq Scan on boolp_t boolp_2 (never executed)
@@ -3587,11 +3587,11 @@ explain (analyze, costs off, summary off, timing off, buffers off) execute mt_q1
    Subplans Removed: 1
    ->  Index Scan using ma_test_p2_b_idx on ma_test_p2 ma_test_1 (actual rows=1.00 loops=1)
          Filter: ((a >= $1) AND ((a % 10) = 5))
-         Rows Removed by Filter: 9
+         Rows Removed In Executor by Filter: 9
          Index Searches: 1
    ->  Index Scan using ma_test_p3_b_idx on ma_test_p3 ma_test_2 (actual rows=1.00 loops=1)
          Filter: ((a >= $1) AND ((a % 10) = 5))
-         Rows Removed by Filter: 9
+         Rows Removed In Executor by Filter: 9
          Index Searches: 1
 (11 rows)
 
@@ -3610,7 +3610,7 @@ explain (analyze, costs off, summary off, timing off, buffers off) execute mt_q1
    Subplans Removed: 2
    ->  Index Scan using ma_test_p3_b_idx on ma_test_p3 ma_test_1 (actual rows=1.00 loops=1)
          Filter: ((a >= $1) AND ((a % 10) = 5))
-         Rows Removed by Filter: 9
+         Rows Removed In Executor by Filter: 9
          Index Searches: 1
 (7 rows)
 
@@ -4115,7 +4115,7 @@ select * from listp where a = (select 2) and b <> 10;
  Seq Scan on listp1 listp (actual rows=0.00 loops=1)
    Filter: ((b <> 10) AND (a = (InitPlan expr_1).col1))
    InitPlan expr_1
-     ->  Result (never executed)
+     ->  Result (actual rows=1.00 loops=1)
 (4 rows)
 
 --
diff --git a/src/test/regress/expected/select_parallel.out b/src/test/regress/expected/select_parallel.out
index 933921d1860..3467feea2d3 100644
--- a/src/test/regress/expected/select_parallel.out
+++ b/src/test/regress/expected/select_parallel.out
@@ -589,13 +589,13 @@ explain (analyze, timing off, summary off, costs off, buffers off)
    ->  Nested Loop (actual rows=98000.00 loops=1)
          ->  Seq Scan on tenk2 (actual rows=10.00 loops=1)
                Filter: (thousand = 0)
-               Rows Removed by Filter: 9990
+               Rows Removed In Table AM by Filter: 9990
          ->  Gather (actual rows=9800.00 loops=10)
                Workers Planned: 4
                Workers Launched: 4
                ->  Parallel Seq Scan on tenk1 (actual rows=1960.00 loops=50)
                      Filter: (hundred > 1)
-                     Rows Removed by Filter: 40
+                     Rows Removed In Table AM by Filter: 40
 (11 rows)
 
 alter table tenk2 reset (parallel_workers);
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index d93c0c03bab..b939d725e91 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -587,7 +587,7 @@ begin
     loop
         ln := regexp_replace(ln, 'Workers Launched: \d+', 'Workers Launched: N');
         ln := regexp_replace(ln, 'actual rows=\d+(?:\.\d+)? loops=\d+', 'actual rows=N loops=N');
-        ln := regexp_replace(ln, 'Rows Removed by Filter: \d+', 'Rows Removed by Filter: N');
+        ln := regexp_replace(ln, 'Rows Removed In Executor by Filter: \d+', 'Rows Removed In Executor by Filter: N');
         perform regexp_matches(ln, 'Index Searches: \d+');
         if found then
           continue;
-- 
2.39.5

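For readers skimming only this part of the series: the AM-side code that actually consumes the ScanKeys is in the earlier patches and not quoted above. The sketch below is purely illustrative and shows what a table AM's scan_getnextslot callback could do with the pushed-down keys; ExampleScanDesc and example_fetch_next_visible() are made-up placeholders, while rs_nkeys, rs_key, the new rs_nskip counter and HeapKeyTest() are the existing pieces it builds on.

#include "access/relscan.h"
#include "access/tableam.h"
#include "access/valid.h"
#include "executor/tuptable.h"
#include "utils/rel.h"

/*
 * Illustrative sketch only: ExampleScanDesc and example_fetch_next_visible()
 * are hypothetical, not part of the patch set.
 */
static bool
example_scan_getnextslot(TableScanDesc sscan, ScanDirection direction,
						 TupleTableSlot *slot)
{
	ExampleScanDesc *scan = (ExampleScanDesc *) sscan;

	for (;;)
	{
		HeapTuple	tuple = example_fetch_next_visible(scan, direction);
		bool		valid = true;

		if (tuple == NULL)
			return false;		/* end of scan */

		/* Apply the ScanKeys passed via begin_scan()/scan_rescan(). */
		if (sscan->rs_nkeys > 0)
			HeapKeyTest(tuple, RelationGetDescr(sscan->rs_rd),
						sscan->rs_nkeys, sscan->rs_key, valid);

		if (valid)
		{
			ExecStoreHeapTuple(tuple, slot, false);
			return true;
		}

		/* Filtered out by a pushed-down qual: count it and continue. */
		sscan->rs_nskip++;
	}
}

The executor then picks up rs_nskip in SeqNext() and reports it through the new nfiltered3 counter, which is what the "Rows Removed In Table AM by Filter" EXPLAIN line shows.
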
Attachment: v4-0004-Add-the-table-reloption-quals_push_down.patch (text/x-diff; charset=us-ascii)
From 3961b56774e1c04a0446f1a121a5eccf15bf0f58 Mon Sep 17 00:00:00 2001
From: Julien Tachoires <julien@tachoires.me>
Date: Tue, 2 Dec 2025 10:43:52 +0100
Subject: [PATCH 4/7] Add the table reloption quals_push_down

The reloption quals_push_down enables or disables the transformation of
qualifiers into ScanKeys and their push down to the table access method
during table scan execution.

The default value is off, so the quals push down feature is disabled
unless explicitly enabled on a table.
---
 doc/src/sgml/ref/create_table.sgml            | 19 +++++++++++++++++
 src/backend/access/common/reloptions.c        | 13 +++++++++++-
 src/backend/executor/nodeSeqscan.c            | 21 ++++++++++++-------
 src/bin/psql/tab-complete.in.c                |  1 +
 src/include/utils/rel.h                       |  9 ++++++++
 src/test/regress/expected/memoize.out         | 15 +++++++------
 src/test/regress/expected/merge.out           |  2 +-
 src/test/regress/expected/partition_prune.out |  4 ++--
 src/test/regress/expected/select_parallel.out |  4 ++--
 9 files changed, 67 insertions(+), 21 deletions(-)

diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index 6557c5cffd8..29118575b06 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -2012,6 +2012,25 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
     </listitem>
    </varlistentry>
 
+   <varlistentry id="reloption-quals-push-down" xreflabel="quals_push_down">
+    <term><literal>quals_push_down</literal> (<type>boolean</type>)
+    <indexterm>
+     <primary><varname>quals_push_down</varname> storage parameter</primary>
+    </indexterm>
+    </term>
+    <listitem>
+     <para>
+     Enables or disables push down of qualifiers (<literal>WHERE</literal>
+     clause) to the table access method. When enabled, the table access
+     method can apply early tuple filtering during table scan execution and
+     return only the tuples satisfying the qualifiers. This option is
+     disabled by default, in which case the table access method returns all
+     visible tuples and leaves tuple filtering based on the qualifiers
+     entirely to the query executor.
+     </para>
+    </listitem>
+   </varlistentry>
+
    </variablelist>
 
   </refsect2>
diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
index 9e288dfecbf..196faddf55b 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -166,6 +166,15 @@ static relopt_bool boolRelOpts[] =
 		},
 		true
 	},
+	{
+		{
+			"quals_push_down",
+			"Enables push down of query qualifiers to the table access method during table scan",
+			RELOPT_KIND_HEAP,
+			AccessExclusiveLock
+		},
+		false
+	},
 	/* list terminator */
 	{{NULL}}
 };
@@ -1926,7 +1935,9 @@ default_reloptions(Datum reloptions, bool validate, relopt_kind kind)
 		{"vacuum_truncate", RELOPT_TYPE_BOOL,
 		offsetof(StdRdOptions, vacuum_truncate), offsetof(StdRdOptions, vacuum_truncate_set)},
 		{"vacuum_max_eager_freeze_failure_rate", RELOPT_TYPE_REAL,
-		offsetof(StdRdOptions, vacuum_max_eager_freeze_failure_rate)}
+		offsetof(StdRdOptions, vacuum_max_eager_freeze_failure_rate)},
+		{"quals_push_down", RELOPT_TYPE_BOOL,
+		offsetof(StdRdOptions, quals_push_down)}
 	};
 
 	return (bytea *) build_reloptions(reloptions, validate, kind,
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 7fc0ae5d97a..de9dafcd428 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -494,13 +494,20 @@ ExecInitSeqScan(SeqScan *node, EState *estate, int eflags)
 	scanstate->ss.ps.qual =
 		ExecInitQual(node->scan.plan.qual, (PlanState *) scanstate);
 
-	/* Build sequential scan keys */
-	ExecSeqBuildScanKeys((PlanState *) scanstate,
-						 node->tablequal,
-						 &scanstate->sss_NumScanKeys,
-						 &scanstate->sss_ScanKeys,
-						 &scanstate->sss_RuntimeKeys,
-						 &scanstate->sss_NumRuntimeKeys);
+	/*
+	 * Build and push the ScanKeys only if the relation's reloption
+	 * quals_push_down is enabled.
+	 */
+	if (RelationGetQualsPushDown(scanstate->ss.ss_currentRelation))
+	{
+		/* Build sequential scan keys */
+		ExecSeqBuildScanKeys((PlanState *) scanstate,
+							 node->tablequal,
+							 &scanstate->sss_NumScanKeys,
+							 &scanstate->sss_ScanKeys,
+							 &scanstate->sss_RuntimeKeys,
+							 &scanstate->sss_NumRuntimeKeys);
+	}
 
 	/*
 	 * When EvalPlanQual() is not in use, assign ExecProcNode for this node
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 20d7a65c614..e0282ab9a57 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -1434,6 +1434,7 @@ static const char *const table_storage_parameters[] = {
 	"log_autovacuum_min_duration",
 	"log_autoanalyze_min_duration",
 	"parallel_workers",
+	"quals_push_down",
 	"toast.autovacuum_enabled",
 	"toast.autovacuum_freeze_max_age",
 	"toast.autovacuum_freeze_min_age",
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index 80286076a11..dff085faa50 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -355,6 +355,7 @@ typedef struct StdRdOptions
 	 * to freeze. 0 if disabled, -1 if unspecified.
 	 */
 	double		vacuum_max_eager_freeze_failure_rate;
+	bool		quals_push_down;	/* enable quals push down to the table AM */
 } StdRdOptions;
 
 #define HEAP_MIN_FILLFACTOR			10
@@ -410,6 +411,14 @@ typedef struct StdRdOptions
 	((relation)->rd_options ? \
 	 ((StdRdOptions *) (relation)->rd_options)->parallel_workers : (defaultpw))
 
+/*
+ * RelationGetQualsPushDown
+ *		Returns the relation's quals_push_down reloption setting.
+ */
+#define RelationGetQualsPushDown(relation) \
+	((relation)->rd_options ? \
+	 ((StdRdOptions *) (relation)->rd_options)->quals_push_down : false)
+
 /* ViewOptions->check_option values */
 typedef enum ViewOptCheckOption
 {
diff --git a/src/test/regress/expected/memoize.out b/src/test/regress/expected/memoize.out
index c116c3945ef..aef4aec89ab 100644
--- a/src/test/regress/expected/memoize.out
+++ b/src/test/regress/expected/memoize.out
@@ -43,7 +43,7 @@ WHERE t2.unique1 < 1000;', false);
    ->  Nested Loop (actual rows=1000.00 loops=N)
          ->  Seq Scan on tenk1 t2 (actual rows=1000.00 loops=N)
                Filter: (unique1 < 1000)
-               Rows Removed In Table AM by Filter: 9000
+               Rows Removed In Executor by Filter: 9000
          ->  Memoize (actual rows=1.00 loops=N)
                Cache Key: t2.twenty
                Cache Mode: logical
@@ -75,7 +75,7 @@ WHERE t1.unique1 < 1000;', false);
    ->  Nested Loop (actual rows=1000.00 loops=N)
          ->  Seq Scan on tenk1 t1 (actual rows=1000.00 loops=N)
                Filter: (unique1 < 1000)
-               Rows Removed In Table AM by Filter: 9000
+               Rows Removed In Executor by Filter: 9000
          ->  Memoize (actual rows=1.00 loops=N)
                Cache Key: t1.twenty
                Cache Mode: binary
@@ -146,7 +146,7 @@ WHERE s.c1 = s.c2 AND t1.unique1 < 1000;', false);
    ->  Nested Loop (actual rows=1000.00 loops=N)
          ->  Seq Scan on tenk1 t1 (actual rows=1000.00 loops=N)
                Filter: (unique1 < 1000)
-               Rows Removed In Table AM by Filter: 9000
+               Rows Removed In Executor by Filter: 9000
          ->  Memoize (actual rows=1.00 loops=N)
                Cache Key: (t1.two + 1)
                Cache Mode: binary
@@ -179,16 +179,15 @@ WHERE s.c1 = s.c2 AND t1.unique1 < 1000;', false);
    ->  Nested Loop (actual rows=1000.00 loops=N)
          ->  Seq Scan on tenk1 t1 (actual rows=1000.00 loops=N)
                Filter: (unique1 < 1000)
-               Rows Removed In Table AM by Filter: 9000
+               Rows Removed In Executor by Filter: 9000
          ->  Memoize (actual rows=1.00 loops=N)
                Cache Key: t1.two, t1.twenty
                Cache Mode: binary
                Hits: 980  Misses: 20  Evictions: Zero  Overflows: 0  Memory Usage: NkB
                ->  Seq Scan on tenk1 t2 (actual rows=1.00 loops=N)
                      Filter: ((t1.twenty = unique1) AND (t1.two = two))
-                     Rows Removed In Table AM by Filter: 5000
-                     Rows Removed In Executor by Filter: 4999
-(13 rows)
+                     Rows Removed In Executor by Filter: 9999
+(12 rows)
 
 -- And check we get the expected results.
 SELECT COUNT(*), AVG(t1.twenty) FROM tenk1 t1 LEFT JOIN
@@ -247,7 +246,7 @@ WHERE t2.unique1 < 1200;', true);
    ->  Nested Loop (actual rows=1200.00 loops=N)
          ->  Seq Scan on tenk1 t2 (actual rows=1200.00 loops=N)
                Filter: (unique1 < 1200)
-               Rows Removed In Table AM by Filter: 8800
+               Rows Removed In Executor by Filter: 8800
          ->  Memoize (actual rows=1.00 loops=N)
                Cache Key: t2.thousand
                Cache Mode: logical
diff --git a/src/test/regress/expected/merge.out b/src/test/regress/expected/merge.out
index 123b063716f..daf5cc746d1 100644
--- a/src/test/regress/expected/merge.out
+++ b/src/test/regress/expected/merge.out
@@ -1801,7 +1801,7 @@ WHEN MATCHED AND t.a < 10 THEN
                Sort Method: quicksort  Memory: xxx
                ->  Seq Scan on ex_mtarget t (actual rows=0.00 loops=1)
                      Filter: (a < '-1000'::integer)
-                     Rows Removed In Table AM by Filter: 54
+                     Rows Removed In Executor by Filter: 54
          ->  Sort (never executed)
                Sort Key: s.a
                ->  Seq Scan on ex_msource s (never executed)
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index 943cc9131ae..f6bd2b1e735 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -2879,7 +2879,7 @@ explain (analyze, costs off, summary off, timing off, buffers off) execute ab_q6
          Filter: ((a = $1) AND (b = (InitPlan expr_1).col1))
    ->  Seq Scan on xy_1 (actual rows=0.00 loops=1)
          Filter: ((x = $1) AND (y = (InitPlan expr_1).col1))
-         Rows Removed In Table AM by Filter: 1
+         Rows Removed In Executor by Filter: 1
    ->  Seq Scan on ab_a1_b1 ab_4 (never executed)
          Filter: ((a = $1) AND (b = (InitPlan expr_1).col1))
    ->  Seq Scan on ab_a1_b2 ab_5 (never executed)
@@ -4115,7 +4115,7 @@ select * from listp where a = (select 2) and b <> 10;
  Seq Scan on listp1 listp (actual rows=0.00 loops=1)
    Filter: ((b <> 10) AND (a = (InitPlan expr_1).col1))
    InitPlan expr_1
-     ->  Result (actual rows=1.00 loops=1)
+     ->  Result (never executed)
 (4 rows)
 
 --
diff --git a/src/test/regress/expected/select_parallel.out b/src/test/regress/expected/select_parallel.out
index 3467feea2d3..90c24ace07d 100644
--- a/src/test/regress/expected/select_parallel.out
+++ b/src/test/regress/expected/select_parallel.out
@@ -589,13 +589,13 @@ explain (analyze, timing off, summary off, costs off, buffers off)
    ->  Nested Loop (actual rows=98000.00 loops=1)
          ->  Seq Scan on tenk2 (actual rows=10.00 loops=1)
                Filter: (thousand = 0)
-               Rows Removed In Table AM by Filter: 9990
+               Rows Removed In Executor by Filter: 9990
          ->  Gather (actual rows=9800.00 loops=10)
                Workers Planned: 4
                Workers Launched: 4
                ->  Parallel Seq Scan on tenk1 (actual rows=1960.00 loops=50)
                      Filter: (hundred > 1)
-                     Rows Removed In Table AM by Filter: 40
+                     Rows Removed In Executor by Filter: 40
 (11 rows)
 
 alter table tenk2 reset (parallel_workers);
-- 
2.39.5

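Before the tests themselves, here is the kind of session they automate; a minimal sketch, assuming a scratch table (qpd_demo is an arbitrary name), showing how the EXPLAIN labels change once the reloption is enabled:

CREATE TABLE qpd_demo (i integer);
INSERT INTO qpd_demo SELECT generate_series(1, 100000);
ANALYZE qpd_demo;

-- Default: filtering is done by the executor only.
EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, SUMMARY OFF, BUFFERS OFF)
  SELECT count(*) FROM qpd_demo WHERE i = 42;
--    Filter: (i = 42)
--    Rows Removed In Executor by Filter: 99999

-- Enable the reloption: the table AM now filters the tuples itself.
ALTER TABLE qpd_demo SET (quals_push_down = on);
EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, SUMMARY OFF, BUFFERS OFF)
  SELECT count(*) FROM qpd_demo WHERE i = 42;
--    Filter: (i = 42)
--    Rows Removed In Table AM by Filter: 99999
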
Attachment: v4-0005-Add-tests-for-quals-push-down-to-table-AM.patch (text/x-diff; charset=us-ascii)
From 90d91ef12f6943edef3c324b261e81e33b0af9eb Mon Sep 17 00:00:00 2001
From: Julien Tachoires <julien@tachoires.me>
Date: Tue, 2 Dec 2025 10:45:32 +0100
Subject: [PATCH 5/7] Add tests for quals push down to table AM

With the help of the EXPLAIN command, we check whether the rows are
filtered out by the executor or by the table AM. We also make sure that
quals are not pushed down by default.
---
 src/test/regress/expected/qual_pushdown.out | 253 ++++++++++++++++++++
 src/test/regress/parallel_schedule          |   2 +-
 src/test/regress/sql/qual_pushdown.sql      |  48 ++++
 3 files changed, 302 insertions(+), 1 deletion(-)
 create mode 100644 src/test/regress/expected/qual_pushdown.out
 create mode 100644 src/test/regress/sql/qual_pushdown.sql

diff --git a/src/test/regress/expected/qual_pushdown.out b/src/test/regress/expected/qual_pushdown.out
new file mode 100644
index 00000000000..7949fb949b5
--- /dev/null
+++ b/src/test/regress/expected/qual_pushdown.out
@@ -0,0 +1,253 @@
+DROP TABLE IF EXISTS qa;
+NOTICE:  table "qa" does not exist, skipping
+DROP TABLE IF EXISTS qb;
+NOTICE:  table "qb" does not exist, skipping
+CREATE TABLE qa (i INTEGER, ii INTEGER);
+CREATE TABLE qb (j INTEGER);
+INSERT INTO qa SELECT n, n * n  FROM generate_series(1, 1000) as n;
+INSERT INTO qb SELECT n FROM generate_series(1, 1000) as n;
+ANALYZE qa;
+ANALYZE qb;
+-- By default, the quals are not pushed down. The tuples are filtered out by
+-- the executor.
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = 100;
+                QUERY PLAN                 
+-------------------------------------------
+ Seq Scan on qa (actual rows=1.00 loops=1)
+   Filter: (i = 100)
+   Rows Removed In Executor by Filter: 999
+(3 rows)
+
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i < 10;
+                QUERY PLAN                 
+-------------------------------------------
+ Seq Scan on qa (actual rows=9.00 loops=1)
+   Filter: (i < 10)
+   Rows Removed In Executor by Filter: 991
+(3 rows)
+
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE 100 = i;
+                QUERY PLAN                 
+-------------------------------------------
+ Seq Scan on qa (actual rows=1.00 loops=1)
+   Filter: (100 = i)
+   Rows Removed In Executor by Filter: 999
+(3 rows)
+
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE 10 > i;
+                QUERY PLAN                 
+-------------------------------------------
+ Seq Scan on qa (actual rows=9.00 loops=1)
+   Filter: (10 > i)
+   Rows Removed In Executor by Filter: 991
+(3 rows)
+
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = SQRT(25)::INT;
+                QUERY PLAN                 
+-------------------------------------------
+ Seq Scan on qa (actual rows=1.00 loops=1)
+   Filter: (i = 5)
+   Rows Removed In Executor by Filter: 999
+(3 rows)
+
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = (SELECT 100);
+                QUERY PLAN                 
+-------------------------------------------
+ Seq Scan on qa (actual rows=1.00 loops=1)
+   Filter: (i = (InitPlan expr_1).col1)
+   Rows Removed In Executor by Filter: 999
+   InitPlan expr_1
+     ->  Result (actual rows=1.00 loops=1)
+(5 rows)
+
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = (SELECT SQRT(j)::INT FROM qb WHERE j = 100);
+                    QUERY PLAN                     
+---------------------------------------------------
+ Seq Scan on qa (actual rows=1.00 loops=1)
+   Filter: (i = (InitPlan expr_1).col1)
+   Rows Removed In Executor by Filter: 999
+   InitPlan expr_1
+     ->  Seq Scan on qb (actual rows=1.00 loops=1)
+           Filter: (j = 100)
+           Rows Removed In Executor by Filter: 999
+(7 rows)
+
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa JOIN qb ON (qa.i = qb.j) WHERE j = 100;
+                   QUERY PLAN                    
+-------------------------------------------------
+ Nested Loop (actual rows=1.00 loops=1)
+   ->  Seq Scan on qa (actual rows=1.00 loops=1)
+         Filter: (i = 100)
+         Rows Removed In Executor by Filter: 999
+   ->  Seq Scan on qb (actual rows=1.00 loops=1)
+         Filter: (j = 100)
+         Rows Removed In Executor by Filter: 999
+(7 rows)
+
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = ii AND ii < 10;
+                QUERY PLAN                 
+-------------------------------------------
+ Seq Scan on qa (actual rows=1.00 loops=1)
+   Filter: ((ii < 10) AND (i = ii))
+   Rows Removed In Executor by Filter: 999
+(3 rows)
+
+-- Enable quals push down
+ALTER TABLE qa SET (quals_push_down=on);
+ALTER TABLE qb SET (quals_push_down=on);
+-- Now, we expect to see the tuples being filtered out by the table AM
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = 100;
+                QUERY PLAN                 
+-------------------------------------------
+ Seq Scan on qa (actual rows=1.00 loops=1)
+   Filter: (i = 100)
+   Rows Removed In Table AM by Filter: 999
+(3 rows)
+
+SELECT ii FROM qa WHERE i = 100;
+  ii   
+-------
+ 10000
+(1 row)
+
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i < 10;
+                QUERY PLAN                 
+-------------------------------------------
+ Seq Scan on qa (actual rows=9.00 loops=1)
+   Filter: (i < 10)
+   Rows Removed In Table AM by Filter: 991
+(3 rows)
+
+SELECT ii FROM qa WHERE i < 10;
+ ii 
+----
+  1
+  4
+  9
+ 16
+ 25
+ 36
+ 49
+ 64
+ 81
+(9 rows)
+
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE 100 = i;
+                QUERY PLAN                 
+-------------------------------------------
+ Seq Scan on qa (actual rows=1.00 loops=1)
+   Filter: (100 = i)
+   Rows Removed In Table AM by Filter: 999
+(3 rows)
+
+SELECT ii FROM qa WHERE 100 = i;
+  ii   
+-------
+ 10000
+(1 row)
+
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE 10 > i;
+                QUERY PLAN                 
+-------------------------------------------
+ Seq Scan on qa (actual rows=9.00 loops=1)
+   Filter: (10 > i)
+   Rows Removed In Table AM by Filter: 991
+(3 rows)
+
+SELECT ii FROM qa WHERE 10 > i;
+ ii 
+----
+  1
+  4
+  9
+ 16
+ 25
+ 36
+ 49
+ 64
+ 81
+(9 rows)
+
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = SQRT(25)::INT;
+                QUERY PLAN                 
+-------------------------------------------
+ Seq Scan on qa (actual rows=1.00 loops=1)
+   Filter: (i = 5)
+   Rows Removed In Table AM by Filter: 999
+(3 rows)
+
+SELECT ii FROM qa WHERE i = SQRT(25)::INT;
+ ii 
+----
+ 25
+(1 row)
+
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = (SELECT 100);
+                QUERY PLAN                 
+-------------------------------------------
+ Seq Scan on qa (actual rows=1.00 loops=1)
+   Filter: (i = (InitPlan expr_1).col1)
+   Rows Removed In Table AM by Filter: 999
+   InitPlan expr_1
+     ->  Result (actual rows=1.00 loops=1)
+(5 rows)
+
+SELECT ii FROM qa WHERE i = (SELECT 100);
+  ii   
+-------
+ 10000
+(1 row)
+
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = (SELECT SQRT(j)::INT FROM qb WHERE j = 100);
+                    QUERY PLAN                     
+---------------------------------------------------
+ Seq Scan on qa (actual rows=1.00 loops=1)
+   Filter: (i = (InitPlan expr_1).col1)
+   Rows Removed In Table AM by Filter: 999
+   InitPlan expr_1
+     ->  Seq Scan on qb (actual rows=1.00 loops=1)
+           Filter: (j = 100)
+           Rows Removed In Table AM by Filter: 999
+(7 rows)
+
+SELECT ii FROM qa WHERE i = (SELECT SQRT(j)::INT FROM qb WHERE j = 100);
+ ii  
+-----
+ 100
+(1 row)
+
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa JOIN qb ON (qa.i = qb.j) WHERE j = 100;
+                   QUERY PLAN                    
+-------------------------------------------------
+ Nested Loop (actual rows=1.00 loops=1)
+   ->  Seq Scan on qa (actual rows=1.00 loops=1)
+         Filter: (i = 100)
+         Rows Removed In Table AM by Filter: 999
+   ->  Seq Scan on qb (actual rows=1.00 loops=1)
+         Filter: (j = 100)
+         Rows Removed In Table AM by Filter: 999
+(7 rows)
+
+SELECT ii FROM qa JOIN qb ON (qa.i = qb.j) WHERE j = 100;
+  ii   
+-------
+ 10000
+(1 row)
+
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = ii AND ii < 10;
+                QUERY PLAN                 
+-------------------------------------------
+ Seq Scan on qa (actual rows=1.00 loops=1)
+   Filter: ((ii < 10) AND (i = ii))
+   Rows Removed In Table AM by Filter: 997
+   Rows Removed In Executor by Filter: 2
+(4 rows)
+
+SELECT ii FROM qa WHERE i = ii AND ii < 10;
+ ii 
+----
+  1
+(1 row)
+
+DROP TABLE IF EXISTS qa;
+DROP TABLE IF EXISTS qb;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index cc6d799bcea..dc4703ec6b5 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -123,7 +123,7 @@ test: plancache limit plpgsql copy2 temp domain rangefuncs prepare conversion tr
 # The stats test resets stats, so nothing else needing stats access can be in
 # this group.
 # ----------
-test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression compression_lz4 memoize stats predicate numa eager_aggregate
+test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression compression_lz4 memoize stats predicate numa eager_aggregate qual_pushdown
 
 # event_trigger depends on create_am and cannot run concurrently with
 # any test that runs DDL
diff --git a/src/test/regress/sql/qual_pushdown.sql b/src/test/regress/sql/qual_pushdown.sql
new file mode 100644
index 00000000000..0f0410cd1d5
--- /dev/null
+++ b/src/test/regress/sql/qual_pushdown.sql
@@ -0,0 +1,48 @@
+DROP TABLE IF EXISTS qa;
+DROP TABLE IF EXISTS qb;
+
+CREATE TABLE qa (i INTEGER, ii INTEGER);
+CREATE TABLE qb (j INTEGER);
+INSERT INTO qa SELECT n, n * n  FROM generate_series(1, 1000) as n;
+INSERT INTO qb SELECT n FROM generate_series(1, 1000) as n;
+ANALYZE qa;
+ANALYZE qb;
+
+-- By default, the quals are not pushed down. The tuples are filtered out by
+-- the executor.
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = 100;
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i < 10;
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE 100 = i;
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE 10 > i;
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = SQRT(25)::INT;
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = (SELECT 100);
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = (SELECT SQRT(j)::INT FROM qb WHERE j = 100);
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa JOIN qb ON (qa.i = qb.j) WHERE j = 100;
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = ii AND ii < 10;
+
+-- Enable quals push down
+ALTER TABLE qa SET (quals_push_down=on);
+ALTER TABLE qb SET (quals_push_down=on);
+
+-- Now, we expect to see the tuples being filtered out by the table AM
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = 100;
+SELECT ii FROM qa WHERE i = 100;
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i < 10;
+SELECT ii FROM qa WHERE i < 10;
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE 100 = i;
+SELECT ii FROM qa WHERE 100 = i;
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE 10 > i;
+SELECT ii FROM qa WHERE 10 > i;
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = SQRT(25)::INT;
+SELECT ii FROM qa WHERE i = SQRT(25)::INT;
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = (SELECT 100);
+SELECT ii FROM qa WHERE i = (SELECT 100);
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = (SELECT SQRT(j)::INT FROM qb WHERE j = 100);
+SELECT ii FROM qa WHERE i = (SELECT SQRT(j)::INT FROM qb WHERE j = 100);
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa JOIN qb ON (qa.i = qb.j) WHERE j = 100;
+SELECT ii FROM qa JOIN qb ON (qa.i = qb.j) WHERE j = 100;
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = ii AND ii < 10;
+SELECT ii FROM qa WHERE i = ii AND ii < 10;
+
+DROP TABLE IF EXISTS qa;
+DROP TABLE IF EXISTS qb;
-- 
2.39.5

v4-0006-Push-down-IN-NOT-IN-array-quals-to-table-AMs.patch
From bdb97e10d6c6e5ccff52a009b6bd9a9fdd08cecc Mon Sep 17 00:00:00 2001
From: Julien Tachoires <julien@tachoires.me>
Date: Tue, 2 Dec 2025 10:47:22 +0100
Subject: [PATCH 6/7] Push down IN/NOT IN <array> quals to table AMs

In order to allow table AMs to apply key filtering against scalar array
values, when such a qualifier is found the executor collects the
information required to later build a hash table. The table AM can then
create a simple hash table and use it to check the presence or absence
of the key in the given array in O(1) fashion.

The new structure ScanKeyHashInfoData stores the hash information that
is passed to the table AM via the new ScanKey field: sk_hashinfo.

In the index scan case, this field is set to NULL and left unused.
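
As an example, the regression output added by this patch shows the array
qual being applied by the table AM once quals_push_down is enabled on
the table:

    EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off)
        SELECT ii FROM qa WHERE i = ANY('{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}'::INT[]);
                            QUERY PLAN
    -----------------------------------------------------------
     Seq Scan on qa (actual rows=10.00 loops=1)
       Filter: (i = ANY ('{1,2,3,4,5,6,7,8,9,10}'::integer[]))
       Rows Removed In Table AM by Filter: 990

For NOT IN, the hash functions of the operator's negator are used to
build the table, and a key qualifies when it is not found in it.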
---
 src/backend/access/common/scankey.c         |   9 +
 src/backend/access/heap/Makefile            |   1 +
 src/backend/access/heap/heapam.c            |   1 -
 src/backend/access/heap/heapam_valid.c      | 290 ++++++++++++++++++++
 src/backend/access/heap/meson.build         |   1 +
 src/backend/executor/nodeSeqscan.c          | 171 +++++++++++-
 src/backend/optimizer/plan/createplan.c     |  60 ++++
 src/include/access/heapam.h                 |   2 +
 src/include/access/skey.h                   |  34 +++
 src/include/access/valid.h                  |  58 ----
 src/test/regress/expected/qual_pushdown.out | 126 +++++++++
 src/test/regress/sql/qual_pushdown.sql      |  12 +
 12 files changed, 694 insertions(+), 71 deletions(-)
 create mode 100644 src/backend/access/heap/heapam_valid.c
 delete mode 100644 src/include/access/valid.h

diff --git a/src/backend/access/common/scankey.c b/src/backend/access/common/scankey.c
index 2d65ab02dd3..0d34bab755c 100644
--- a/src/backend/access/common/scankey.c
+++ b/src/backend/access/common/scankey.c
@@ -44,6 +44,7 @@ ScanKeyEntryInitialize(ScanKey entry,
 	entry->sk_subtype = subtype;
 	entry->sk_collation = collation;
 	entry->sk_argument = argument;
+	entry->sk_hashinfo = NULL;
 	if (RegProcedureIsValid(procedure))
 	{
 		fmgr_info(procedure, &entry->sk_func);
@@ -85,6 +86,7 @@ ScanKeyInit(ScanKey entry,
 	entry->sk_subtype = InvalidOid;
 	entry->sk_collation = C_COLLATION_OID;
 	entry->sk_argument = argument;
+	entry->sk_hashinfo = NULL;
 	fmgr_info(procedure, &entry->sk_func);
 }
 
@@ -113,5 +115,12 @@ ScanKeyEntryInitializeWithInfo(ScanKey entry,
 	entry->sk_subtype = subtype;
 	entry->sk_collation = collation;
 	entry->sk_argument = argument;
+	entry->sk_hashinfo = NULL;
 	fmgr_info_copy(&entry->sk_func, finfo, CurrentMemoryContext);
 }
+
+void
+ScanKeyEntrySetHashInfo(ScanKey entry, ScanKeyHashInfo hashinfo)
+{
+	entry->sk_hashinfo = hashinfo;
+}
diff --git a/src/backend/access/heap/Makefile b/src/backend/access/heap/Makefile
index 394534172fa..b796a4ccdff 100644
--- a/src/backend/access/heap/Makefile
+++ b/src/backend/access/heap/Makefile
@@ -15,6 +15,7 @@ include $(top_builddir)/src/Makefile.global
 OBJS = \
 	heapam.o \
 	heapam_handler.o \
+	heapam_valid.o \
 	heapam_visibility.o \
 	heapam_xlog.o \
 	heaptoast.o \
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index f3c4dc91e54..cbc4aa49cbe 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -37,7 +37,6 @@
 #include "access/multixact.h"
 #include "access/subtrans.h"
 #include "access/syncscan.h"
-#include "access/valid.h"
 #include "access/visibilitymap.h"
 #include "access/xloginsert.h"
 #include "catalog/pg_database.h"
diff --git a/src/backend/access/heap/heapam_valid.c b/src/backend/access/heap/heapam_valid.c
new file mode 100644
index 00000000000..7261723e378
--- /dev/null
+++ b/src/backend/access/heap/heapam_valid.c
@@ -0,0 +1,290 @@
+/*-------------------------------------------------------------------------
+ *
+ * heapam_valid.c
+ *	  Heap tuple qualification validity definitions
+ *
+ * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/heap/heapam_valid.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "utils/array.h"
+#include "utils/lsyscache.h"
+#include "access/heapam.h"
+#include "access/htup.h"
+#include "access/htup_details.h"
+#include "access/skey.h"
+#include "access/tupdesc.h"
+
+/*
+ * SearchArrayHashEntry - Hash table entry type used by SK_SEARCHARRAY
+ */
+typedef struct SearchArrayHashEntry
+{
+	Datum		key;
+	uint32		status;			/* hash status */
+	uint32		hash;			/* hash value (cached) */
+}			SearchArrayHashEntry;
+
+#define SH_PREFIX searcharray
+#define SH_ELEMENT_TYPE SearchArrayHashEntry
+#define SH_KEY_TYPE Datum
+#define SH_SCOPE static inline
+#define SH_DECLARE
+#include "lib/simplehash.h"
+
+static bool searcharray_hash_element_match(struct searcharray_hash *tb, Datum key1,
+										   Datum key2);
+static uint32 searcharray_element_hash(struct searcharray_hash *tb, Datum key);
+
+/*
+ * SearchArrayHashTable - Hash table for SK_SEARCHARRAY
+ */
+typedef struct SearchArrayHashTable
+{
+	searcharray_hash *tab;		/* underlying hash table */
+	FmgrInfo	hash_finfo;		/* hash function */
+	FunctionCallInfo hash_fcinfo;	/* arguments etc */
+	FmgrInfo	match_finfo;	/* comparison function */
+	FunctionCallInfo match_fcinfo;	/* arguments etc */
+	bool		has_nulls;
+}			SearchArrayHashTable;
+
+/* Define parameters for SearchArray hash table code generation. */
+#define SH_PREFIX searcharray
+#define SH_ELEMENT_TYPE SearchArrayHashEntry
+#define SH_KEY_TYPE Datum
+#define SH_KEY key
+#define SH_HASH_KEY(tb, key) searcharray_element_hash(tb, key)
+#define SH_EQUAL(tb, a, b) searcharray_hash_element_match(tb, a, b)
+#define SH_SCOPE static inline
+#define SH_STORE_HASH
+#define SH_GET_HASH(tb, a) a->hash
+#define SH_DEFINE
+#include "lib/simplehash.h"
+
+/*
+ * Hash function for scalar array hash op elements.
+ *
+ * We use the element type's default hash opclass, and the column collation
+ * if the type is collation-sensitive.
+ */
+static uint32
+searcharray_element_hash(struct searcharray_hash *tb, Datum key)
+{
+	SearchArrayHashTable *elements_tab = (SearchArrayHashTable *) tb->private_data;
+	FunctionCallInfo fcinfo = elements_tab->hash_fcinfo;
+	Datum		hash;
+
+	fcinfo->args[0].value = key;
+	fcinfo->args[0].isnull = false;
+
+	hash = elements_tab->hash_finfo.fn_addr(fcinfo);
+
+	return DatumGetUInt32(hash);
+}
+
+/*
+ * Matching function for scalar array hash op elements, to be used in hashtable
+ * lookups.
+ */
+static bool
+searcharray_hash_element_match(struct searcharray_hash *tb, Datum key1, Datum key2)
+{
+	Datum		result;
+
+	SearchArrayHashTable *elements_tab = (SearchArrayHashTable *) tb->private_data;
+	FunctionCallInfo fcinfo = elements_tab->match_fcinfo;
+
+	fcinfo->args[0].value = key1;
+	fcinfo->args[0].isnull = false;
+	fcinfo->args[1].value = key2;
+	fcinfo->args[1].isnull = false;
+
+	result = elements_tab->match_finfo.fn_addr(fcinfo);
+
+	return DatumGetBool(result);
+}
+
+/*
+ *		HeapKeyTest
+ *
+ *		Test a heap tuple to see if it satisfies a scan key.
+ */
+bool
+HeapKeyTest(HeapTuple tuple, TupleDesc tupdesc, int nkeys, ScanKey keys)
+{
+	int			cur_nkeys = nkeys;
+	ScanKey		cur_key = keys;
+
+	for (; cur_nkeys--; cur_key++)
+	{
+		Datum		atp;
+		bool		isnull;
+		Datum		test;
+
+		if (cur_key->sk_flags & SK_ISNULL)
+			return false;
+
+		atp = heap_getattr(tuple, cur_key->sk_attno, tupdesc, &isnull);
+
+		/* Case when the rightop was a scalar array */
+		if (cur_key->sk_flags & SK_SEARCHARRAY)
+		{
+			bool		hashfound;
+			ScanKeyHashInfo hashinfo = cur_key->sk_hashinfo;
+			SearchArrayHashTable *hashtab;
+
+			/*
+			 * Build the hash table on the first call if needed
+			 */
+			if (hashinfo->hashtab == NULL)
+			{
+				ArrayType  *arr;
+				int16		typlen;
+				bool		typbyval;
+				char		typalign;
+				int			nitems;
+				bool		has_nulls = false;
+				char	   *s;
+				bits8	   *bitmap;
+				int			bitmask;
+
+				arr = DatumGetArrayTypeP(cur_key->sk_argument);
+				nitems = ArrayGetNItems(ARR_NDIM(arr), ARR_DIMS(arr));
+
+				get_typlenbyvalalign(ARR_ELEMTYPE(arr),
+									 &typlen,
+									 &typbyval,
+									 &typalign);
+
+				hashtab = (SearchArrayHashTable *)
+					palloc0(sizeof(SearchArrayHashTable));
+
+				hashtab->hash_finfo = hashinfo->hash_finfo;
+				hashtab->match_finfo = hashinfo->match_finfo;
+				hashtab->hash_fcinfo = hashinfo->hash_fcinfo;
+				hashtab->match_fcinfo = hashinfo->match_fcinfo;
+
+				/*
+				 * Create the hash table sizing it according to the number of
+				 * elements in the array.  This does assume that the array has
+				 * no duplicates. If the array happens to contain many
+				 * duplicate values then it'll just mean that we sized the
+				 * table a bit on the large side.
+				 */
+				hashtab->tab = searcharray_create(CurrentMemoryContext,
+												  nitems,
+												  hashtab);
+
+
+				s = (char *) ARR_DATA_PTR(arr);
+				bitmap = ARR_NULLBITMAP(arr);
+				bitmask = 1;
+				for (int i = 0; i < nitems; i++)
+				{
+					/* Get array element, checking for NULL. */
+					if (bitmap && (*bitmap & bitmask) == 0)
+					{
+						has_nulls = true;
+					}
+					else
+					{
+						Datum		element;
+
+						element = fetch_att(s, typbyval, typlen);
+						s = att_addlength_pointer(s, typlen, s);
+						s = (char *) att_align_nominal(s, typalign);
+
+						searcharray_insert(hashtab->tab, element,
+										   &hashfound);
+					}
+
+					/* Advance bitmap pointer if any. */
+					if (bitmap)
+					{
+						bitmask <<= 1;
+						if (bitmask == 0x100)
+						{
+							bitmap++;
+							bitmask = 1;
+						}
+					}
+				}
+
+				/*
+				 * Remember if we had any nulls so that we know if we need to
+				 * execute non-strict functions with a null rhs value if no
+				 * match is found.
+				 */
+				hashtab->has_nulls = has_nulls;
+
+				/* Link the hash table to the current ScanKey */
+				hashinfo->hashtab = hashtab;
+			}
+			else
+				hashtab = (SearchArrayHashTable *) hashinfo->hashtab;
+
+			/* Check the hash to see if we have a match. */
+			hashfound = NULL != searcharray_lookup(hashtab->tab, atp);
+
+			/* IN case */
+			if (hashinfo->inclause && hashfound)
+				return true;
+			/* NOT IN case */
+			if (!hashinfo->inclause && !hashfound)
+				return true;
+
+			if (!hashfound && hashtab->has_nulls)
+			{
+				if (!hashtab->match_finfo.fn_strict)
+				{
+					Datum		result;
+
+					/*
+					 * Execute the function with a null rhs just once.
+					 */
+					hashtab->match_fcinfo->args[0].value = atp;
+					hashtab->match_fcinfo->args[0].isnull = isnull;
+					hashtab->match_fcinfo->args[1].value = (Datum) 0;
+					hashtab->match_fcinfo->args[1].isnull = true;
+
+					result = hashtab->match_finfo.fn_addr(hashtab->match_fcinfo);
+
+					/*
+					 * Reverse the result for NOT IN clauses since the above
+					 * function is the equality function and we need
+					 * not-equals.
+					 */
+					if (!hashinfo->inclause)
+						result = !result;
+
+					if (result)
+						return true;
+				}
+			}
+
+			return false;
+		}
+		else
+		{
+			if (isnull)
+				return false;
+
+			test = FunctionCall2Coll(&cur_key->sk_func,
+									 cur_key->sk_collation,
+									 atp, cur_key->sk_argument);
+
+			if (!DatumGetBool(test))
+				return false;
+		}
+	}
+
+	return true;
+}
diff --git a/src/backend/access/heap/meson.build b/src/backend/access/heap/meson.build
index 2637b24112f..2e23ca9a586 100644
--- a/src/backend/access/heap/meson.build
+++ b/src/backend/access/heap/meson.build
@@ -3,6 +3,7 @@
 backend_sources += files(
   'heapam.c',
   'heapam_handler.c',
+  'heapam_valid.c',
   'heapam_visibility.c',
   'heapam_xlog.c',
   'heaptoast.c',
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index de9dafcd428..b5f0d8c23b9 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -96,16 +96,18 @@ ExecSeqBuildScanKeys(PlanState *planstate, List *quals, int *numScanKeys,
 		Expr	   *leftop;		/* expr on lhs of operator */
 		Expr	   *rightop;	/* expr on rhs ... */
 		AttrNumber	varattno;	/* att number used in scan */
+		int			flags = 0;
+		Datum		scanvalue;
+		Oid			collationid = InvalidOid;
+		ScanKeyHashInfo skeyhashinfo = NULL;
 
 		/*
 		 * Simple qual case: <leftop> <op> <rightop>
 		 */
 		if (IsA(clause, OpExpr))
 		{
-			int			flags = 0;
-			Datum		scanvalue;
-
 			opfuncid = ((OpExpr *) clause)->opfuncid;
+			collationid = ((OpExpr *) clause)->inputcollid;
 
 			/*
 			 * leftop and rightop are not relabeled and can be used as they
@@ -165,17 +167,149 @@ ExecSeqBuildScanKeys(PlanState *planstate, List *quals, int *numScanKeys,
 				n_runtime_keys++;
 				scanvalue = (Datum) 0;
 			}
+		}
+		/* <leftop> <op> ANY/ALL (array-expression) */
+		else if (IsA(clause, ScalarArrayOpExpr))
+		{
+			ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+			Oid			cmpfuncid;
+			Oid			hashfuncid;
+			Oid			negfuncid;
 
-			n_scan_keys++;
+			opfuncid = saop->opfuncid;
+			collationid = saop->inputcollid;
 
-			ScanKeyEntryInitialize(this_scan_key,
-								   flags,
-								   varattno,
-								   InvalidStrategy, /* no strategy */
-								   InvalidOid,	/* no subtype */
-								   ((OpExpr *) clause)->inputcollid,
-								   opfuncid,
-								   scanvalue);
+			leftop = (Expr *) linitial(saop->args);
+			rightop = (Expr *) lsecond(saop->args);
+
+			varattno = ((Var *) leftop)->varattno;
+
+			flags |= SK_SEARCHARRAY;
+
+			if (IsA(rightop, Const))
+			{
+				/*
+				 * OK, simple constant comparison value
+				 */
+				scanvalue = ((Const *) rightop)->constvalue;
+				if (((Const *) rightop)->constisnull)
+					flags |= SK_ISNULL;
+			}
+			else
+			{
+				/* Need to treat this one as a run-time key */
+				if (n_runtime_keys >= max_runtime_keys)
+				{
+					if (max_runtime_keys == 0)
+					{
+						max_runtime_keys = 8;
+						runtime_keys = (SeqScanRuntimeKeyInfo *)
+							palloc(max_runtime_keys * sizeof(SeqScanRuntimeKeyInfo));
+					}
+					else
+					{
+						max_runtime_keys *= 2;
+						runtime_keys = (SeqScanRuntimeKeyInfo *)
+							repalloc(runtime_keys,
+									 max_runtime_keys * sizeof(SeqScanRuntimeKeyInfo));
+					}
+				}
+				runtime_keys[n_runtime_keys].scan_key = this_scan_key;
+				runtime_keys[n_runtime_keys].key_expr =
+					ExecInitExpr(rightop, planstate);
+				runtime_keys[n_runtime_keys].key_toastable =
+					TypeIsToastable(((Var *) leftop)->vartype);
+				n_runtime_keys++;
+				scanvalue = (Datum) 0;
+			}
+
+			hashfuncid = saop->hashfuncid;
+			negfuncid = saop->negfuncid;
+
+			/*
+			 * If there is no hash function attached to the expr., then we
+			 * need to force one.
+			 *
+			 * One reason why no hash function is attached is that the scalar
+			 * array is too small: the executor assumes that for a small
+			 * array, building a hash table is not worth it. But in our case,
+			 * we want to handle all arrays in the same way, whatever the
+			 * array size.
+			 *
+			 * Another reason is that the right op. is not a constant and
+			 * needs runtime evaluation.
+			 */
+			if (!OidIsValid(hashfuncid))
+			{
+				Oid			lefthashfunc;
+				Oid			righthashfunc;
+
+				if (saop->useOr)
+				{
+					if (get_op_hash_functions(saop->opno, &lefthashfunc, &righthashfunc) &&
+						lefthashfunc == righthashfunc)
+						hashfuncid = lefthashfunc;
+				}
+				else
+				{
+					Oid			negator = get_negator(saop->opno);
+
+					if (OidIsValid(negator) &&
+						get_op_hash_functions(negator, &lefthashfunc, &righthashfunc) &&
+						lefthashfunc == righthashfunc)
+					{
+						hashfuncid = lefthashfunc;
+						negfuncid = get_opcode(negator);
+					}
+				}
+			}
+
+			/*
+			 * If no hash function can be found, it means that we cannot use a
+			 * hash table to handle array search because the operator does not
+			 * support hashing.
+			 *
+			 * TODO: use an alternative to hash table in this case. For now,
+			 * we just ignore this qual and don't push it, so we let the
+			 * executor handle it for us.
+			 */
+			if (!OidIsValid(hashfuncid))
+				continue;
+
+			/*
+			 * If we have a negator function set, use it as the comparison
+			 * function because we are in a NOT IN case.
+			 */
+			if (OidIsValid(negfuncid))
+				cmpfuncid = negfuncid;
+			else
+				cmpfuncid = saop->opfuncid;
+
+			skeyhashinfo = (ScanKeyHashInfo) palloc0(sizeof(ScanKeyHashInfoData));
+
+			/* IN or NOT IN */
+			skeyhashinfo->inclause = saop->useOr;
+			skeyhashinfo->hash_fcinfo = palloc0(SizeForFunctionCallInfo(1));
+			skeyhashinfo->match_fcinfo = palloc0(SizeForFunctionCallInfo(2));
+
+			fmgr_info(hashfuncid, &skeyhashinfo->hash_finfo);
+			fmgr_info_set_expr((Node *) saop, &skeyhashinfo->hash_finfo);
+			fmgr_info(cmpfuncid, &skeyhashinfo->match_finfo);
+			fmgr_info_set_expr((Node *) saop, &skeyhashinfo->match_finfo);
+
+			InitFunctionCallInfoData(*skeyhashinfo->hash_fcinfo,
+									 &skeyhashinfo->hash_finfo,
+									 1,
+									 saop->inputcollid,
+									 NULL,
+									 NULL);
+
+			InitFunctionCallInfoData(*skeyhashinfo->match_fcinfo,
+									 &skeyhashinfo->match_finfo,
+									 2,
+									 saop->inputcollid,
+									 NULL,
+									 NULL);
 		}
 		else
 		{
@@ -184,6 +318,19 @@ ExecSeqBuildScanKeys(PlanState *planstate, List *quals, int *numScanKeys,
 			 */
 			continue;
 		}
+
+		n_scan_keys++;
+
+		ScanKeyEntryInitialize(this_scan_key,
+							   flags,
+							   varattno,
+							   InvalidStrategy, /* no strategy */
+							   InvalidOid,	/* no subtype */
+							   collationid,
+							   opfuncid,
+							   scanvalue);
+
+		ScanKeyEntrySetHashInfo(this_scan_key, skeyhashinfo);
 	}
 
 	/*
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index b7adc512189..982bdd08e74 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -5373,6 +5373,66 @@ fix_tablequal_references(PlannerInfo *root, Path *best_path,
 					fixed_tablequals = lappend(fixed_tablequals, clause);
 					break;
 				}
+
+				/*
+				 * ScalarArrayOpExpr case: <leftop> <op> ANY(ARRAY(..))
+				 */
+			case T_ScalarArrayOpExpr:
+				{
+					ScalarArrayOpExpr *saopexpr = (ScalarArrayOpExpr *) clause;
+					Expr	   *leftop;
+					Expr	   *rightop;
+
+					leftop = (Expr *) get_leftop(clause);
+					rightop = (Expr *) get_rightop(clause);
+
+					if (leftop && IsA(leftop, RelabelType))
+						leftop = ((RelabelType *) leftop)->arg;
+
+					if (rightop && IsA(rightop, RelabelType))
+						rightop = ((RelabelType *) rightop)->arg;
+
+					if (leftop == NULL || rightop == NULL)
+						continue;
+
+					if (saopexpr->opno >= FirstNormalObjectId)
+						continue;
+
+					if (!get_func_leakproof(saopexpr->opfuncid))
+						continue;
+
+					if (IsA(rightop, Var) && !IsA(leftop, Var)
+						&& ((Var *) rightop)->varattno > 0)
+					{
+						Expr	   *tmpop = leftop;
+						Oid			commutator;
+
+						leftop = rightop;
+						rightop = tmpop;
+
+						commutator = get_commutator(saopexpr->opno);
+
+						if (OidIsValid(commutator))
+						{
+							saopexpr->opno = commutator;
+							saopexpr->opfuncid = get_opcode(saopexpr->opno);
+						}
+						else
+							continue;
+					}
+
+					if (!(IsA(leftop, Var) && ((Var *) leftop)->varattno > 0))
+						continue;
+
+					if (!check_tablequal_rightop(rightop))
+						continue;
+
+					list_free(saopexpr->args);
+					saopexpr->args = list_make2(leftop, rightop);
+
+					fixed_tablequals = lappend(fixed_tablequals, clause);
+					break;
+				}
 			default:
 				continue;
 		}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 7dd5e0bcd78..01373600b01 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -463,6 +463,8 @@ extern bool ResolveCminCmaxDuringDecoding(struct HTAB *tuplecid_data,
 extern void HeapCheckForSerializableConflictOut(bool visible, Relation relation, HeapTuple tuple,
 												Buffer buffer, Snapshot snapshot);
 
+extern bool HeapKeyTest(HeapTuple tuple, TupleDesc tupdesc, int nkeys, ScanKey keys);
+
 /*
  * heap_execute_freeze_tuple
  *		Execute the prepared freezing of a tuple with caller's freeze plan.
diff --git a/src/include/access/skey.h b/src/include/access/skey.h
index e650c2e7baf..0f5a556df4c 100644
--- a/src/include/access/skey.h
+++ b/src/include/access/skey.h
@@ -18,6 +18,38 @@
 #include "access/stratnum.h"
 #include "fmgr.h"
 
+/*
+ * A ScanKeyHashInfoData contains the information required to apply
+ * tuple filtering in table/heap scan when the condition is "column op
+ * ANY(ARRAY[...])".
+ *
+ * This structure is only used when pushing down quals to the Table Access
+ * Method layer in a table/heap scan context. In this case, the Table AM can
+ * use it to filter out tuples, based on a hash table.
+ *
+ * hashtab is a void pointer that will be used to store the actual reference
+ * to the hash table that will be created later during table scan.
+ *
+ * inclause indicates whether the IN clause is involved. If not, the NOT IN
+ * clause is.
+ *
+ * hash_finfo and hash_fcinfo define the function and function call in charge
+ * of hashing a value.
+ *
+ * match_finfo and match_fcinfo define the function and function call in charge
+ * of making the comparison between two hashed values.
+ */
+typedef struct ScanKeyHashInfoData
+{
+	void	   *hashtab;
+	bool		inclause;
+	FmgrInfo	hash_finfo;
+	FmgrInfo	match_finfo;
+	FunctionCallInfo hash_fcinfo;
+	FunctionCallInfo match_fcinfo;
+}			ScanKeyHashInfoData;
+
+typedef ScanKeyHashInfoData * ScanKeyHashInfo;
 
 /*
  * A ScanKey represents the application of a comparison operator between
@@ -70,6 +102,7 @@ typedef struct ScanKeyData
 	Oid			sk_collation;	/* collation to use, if needed */
 	FmgrInfo	sk_func;		/* lookup info for function to call */
 	Datum		sk_argument;	/* data to compare */
+	ScanKeyHashInfo sk_hashinfo;	/* hash table information */
 } ScanKeyData;
 
 typedef ScanKeyData *ScanKey;
@@ -147,5 +180,6 @@ extern void ScanKeyEntryInitializeWithInfo(ScanKey entry,
 										   Oid collation,
 										   FmgrInfo *finfo,
 										   Datum argument);
+extern void ScanKeyEntrySetHashInfo(ScanKey entry, ScanKeyHashInfo hashinfo);
 
 #endif							/* SKEY_H */
diff --git a/src/include/access/valid.h b/src/include/access/valid.h
deleted file mode 100644
index 8b33089dac4..00000000000
--- a/src/include/access/valid.h
+++ /dev/null
@@ -1,58 +0,0 @@
-/*-------------------------------------------------------------------------
- *
- * valid.h
- *	  POSTGRES tuple qualification validity definitions.
- *
- *
- * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
- * Portions Copyright (c) 1994, Regents of the University of California
- *
- * src/include/access/valid.h
- *
- *-------------------------------------------------------------------------
- */
-#ifndef VALID_H
-#define VALID_H
-
-#include "access/htup.h"
-#include "access/htup_details.h"
-#include "access/skey.h"
-#include "access/tupdesc.h"
-
-/*
- *		HeapKeyTest
- *
- *		Test a heap tuple to see if it satisfies a scan key.
- */
-static inline bool
-HeapKeyTest(HeapTuple tuple, TupleDesc tupdesc, int nkeys, ScanKey keys)
-{
-	int			cur_nkeys = nkeys;
-	ScanKey		cur_key = keys;
-
-	for (; cur_nkeys--; cur_key++)
-	{
-		Datum		atp;
-		bool		isnull;
-		Datum		test;
-
-		if (cur_key->sk_flags & SK_ISNULL)
-			return false;
-
-		atp = heap_getattr(tuple, cur_key->sk_attno, tupdesc, &isnull);
-
-		if (isnull)
-			return false;
-
-		test = FunctionCall2Coll(&cur_key->sk_func,
-								 cur_key->sk_collation,
-								 atp, cur_key->sk_argument);
-
-		if (!DatumGetBool(test))
-			return false;
-	}
-
-	return true;
-}
-
-#endif							/* VALID_H */
diff --git a/src/test/regress/expected/qual_pushdown.out b/src/test/regress/expected/qual_pushdown.out
index 7949fb949b5..75fc9c93ad0 100644
--- a/src/test/regress/expected/qual_pushdown.out
+++ b/src/test/regress/expected/qual_pushdown.out
@@ -92,6 +92,43 @@ EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FRO
    Rows Removed In Executor by Filter: 999
 (3 rows)
 
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = ANY('{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}'::INT[]);
+                        QUERY PLAN                         
+-----------------------------------------------------------
+ Seq Scan on qa (actual rows=10.00 loops=1)
+   Filter: (i = ANY ('{1,2,3,4,5,6,7,8,9,10}'::integer[]))
+   Rows Removed In Executor by Filter: 990
+(3 rows)
+
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = ANY('{1, 2}'::INT[]);
+                QUERY PLAN                 
+-------------------------------------------
+ Seq Scan on qa (actual rows=2.00 loops=1)
+   Filter: (i = ANY ('{1,2}'::integer[]))
+   Rows Removed In Executor by Filter: 998
+(3 rows)
+
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE NOT (i <> ALL('{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}'::INT[]));
+                        QUERY PLAN                         
+-----------------------------------------------------------
+ Seq Scan on qa (actual rows=10.00 loops=1)
+   Filter: (i = ANY ('{1,2,3,4,5,6,7,8,9,10}'::integer[]))
+   Rows Removed In Executor by Filter: 990
+(3 rows)
+
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = ANY((SELECT array_agg(j) FROM qb WHERE j > 50 AND j <= 60)::int[]);
+                        QUERY PLAN                        
+----------------------------------------------------------
+ Seq Scan on qa (actual rows=10.00 loops=1)
+   Filter: (i = ANY ((InitPlan expr_1).col1))
+   Rows Removed In Executor by Filter: 990
+   InitPlan expr_1
+     ->  Aggregate (actual rows=1.00 loops=1)
+           ->  Seq Scan on qb (actual rows=10.00 loops=1)
+                 Filter: ((j > 50) AND (j <= 60))
+                 Rows Removed In Executor by Filter: 990
+(8 rows)
+
 -- Enable quals push down
 ALTER TABLE qa SET (quals_push_down=on);
 ALTER TABLE qb SET (quals_push_down=on);
@@ -249,5 +286,94 @@ SELECT ii FROM qa WHERE i = ii AND ii < 10;
   1
 (1 row)
 
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = ANY('{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}'::INT[]);
+                        QUERY PLAN                         
+-----------------------------------------------------------
+ Seq Scan on qa (actual rows=10.00 loops=1)
+   Filter: (i = ANY ('{1,2,3,4,5,6,7,8,9,10}'::integer[]))
+   Rows Removed In Table AM by Filter: 990
+(3 rows)
+
+SELECT ii FROM qa WHERE i = ANY('{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}'::INT[]);
+ ii  
+-----
+   1
+   4
+   9
+  16
+  25
+  36
+  49
+  64
+  81
+ 100
+(10 rows)
+
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = ANY('{1, 2}'::INT[]);
+                QUERY PLAN                 
+-------------------------------------------
+ Seq Scan on qa (actual rows=2.00 loops=1)
+   Filter: (i = ANY ('{1,2}'::integer[]))
+   Rows Removed In Table AM by Filter: 998
+(3 rows)
+
+SELECT ii FROM qa WHERE i = ANY('{1, 2}'::INT[]);
+ ii 
+----
+  1
+  4
+(2 rows)
+
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE NOT (i <> ALL('{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}'::INT[]));
+                        QUERY PLAN                         
+-----------------------------------------------------------
+ Seq Scan on qa (actual rows=10.00 loops=1)
+   Filter: (i = ANY ('{1,2,3,4,5,6,7,8,9,10}'::integer[]))
+   Rows Removed In Table AM by Filter: 990
+(3 rows)
+
+SELECT ii FROM qa WHERE NOT (i <> ALL('{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}'::INT[]));
+ ii  
+-----
+   1
+   4
+   9
+  16
+  25
+  36
+  49
+  64
+  81
+ 100
+(10 rows)
+
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = ANY((SELECT array_agg(j) FROM qb WHERE j > 50 AND j <= 60)::INT[]);
+                        QUERY PLAN                        
+----------------------------------------------------------
+ Seq Scan on qa (actual rows=10.00 loops=1)
+   Filter: (i = ANY ((InitPlan expr_1).col1))
+   Rows Removed In Table AM by Filter: 990
+   InitPlan expr_1
+     ->  Aggregate (actual rows=1.00 loops=1)
+           ->  Seq Scan on qb (actual rows=10.00 loops=1)
+                 Filter: ((j > 50) AND (j <= 60))
+                 Rows Removed In Table AM by Filter: 990
+(8 rows)
+
+SELECT ii FROM qa WHERE i = ANY((SELECT array_agg(j) FROM qb WHERE j > 50 AND j <= 60)::INT[]);
+  ii  
+------
+ 2601
+ 2704
+ 2809
+ 2916
+ 3025
+ 3136
+ 3249
+ 3364
+ 3481
+ 3600
+(10 rows)
+
 DROP TABLE IF EXISTS qa;
 DROP TABLE IF EXISTS qb;
diff --git a/src/test/regress/sql/qual_pushdown.sql b/src/test/regress/sql/qual_pushdown.sql
index 0f0410cd1d5..38e88a50c33 100644
--- a/src/test/regress/sql/qual_pushdown.sql
+++ b/src/test/regress/sql/qual_pushdown.sql
@@ -19,6 +19,10 @@ EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FRO
 EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = (SELECT SQRT(j)::INT FROM qb WHERE j = 100);
 EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa JOIN qb ON (qa.i = qb.j) WHERE j = 100;
 EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = ii AND ii < 10;
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = ANY('{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}'::INT[]);
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = ANY('{1, 2}'::INT[]);
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE NOT (i <> ALL('{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}'::INT[]));
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = ANY((SELECT array_agg(j) FROM qb WHERE j > 50 AND j <= 60)::int[]);
 
 -- Enable quals push down
 ALTER TABLE qa SET (quals_push_down=on);
@@ -43,6 +47,14 @@ EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FRO
 SELECT ii FROM qa JOIN qb ON (qa.i = qb.j) WHERE j = 100;
 EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = ii AND ii < 10;
 SELECT ii FROM qa WHERE i = ii AND ii < 10;
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = ANY('{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}'::INT[]);
+SELECT ii FROM qa WHERE i = ANY('{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}'::INT[]);
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = ANY('{1, 2}'::INT[]);
+SELECT ii FROM qa WHERE i = ANY('{1, 2}'::INT[]);
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE NOT (i <> ALL('{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}'::INT[]));
+SELECT ii FROM qa WHERE NOT (i <> ALL('{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}'::INT[]));
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = ANY((SELECT array_agg(j) FROM qb WHERE j > 50 AND j <= 60)::INT[]);
+SELECT ii FROM qa WHERE i = ANY((SELECT array_agg(j) FROM qb WHERE j > 50 AND j <= 60)::INT[]);
 
 DROP TABLE IF EXISTS qa;
 DROP TABLE IF EXISTS qb;
-- 
2.39.5

v4-0007-Push-down-IS-IS-NOT-NULL-quals-to-table-AMs.patch
From 13cd2db9ffcbb8482ea792fa1dc41aa4a234a278 Mon Sep 17 00:00:00 2001
From: Julien Tachoires <julien@tachoires.me>
Date: Tue, 2 Dec 2025 10:48:42 +0100
Subject: [PATCH 7/7] Push down IS/IS NOT NULL quals to table AMs

This commit adds support for pushing IS/IS NOT NULL WHERE clauses down
to table AMs during table scan.
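
The new regression tests cover, for instance:

    SELECT i, ii FROM qa WHERE ii IS NULL;
    SELECT i, ii FROM qa WHERE ii IS NOT NULL AND i >= 1000;

With quals_push_down enabled on the table, EXPLAIN ANALYZE is expected to
report the rows removed by these quals under "Rows Removed In Table AM by
Filter" rather than under the executor counter.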
---
 src/backend/access/heap/heapam_valid.c      |  16 ++-
 src/backend/executor/nodeSeqscan.c          |  77 +++++++++--
 src/backend/optimizer/plan/createplan.c     |  30 +++++
 src/test/regress/expected/qual_pushdown.out | 141 +++++++++++++-------
 src/test/regress/sql/qual_pushdown.sql      |   7 +
 5 files changed, 209 insertions(+), 62 deletions(-)

diff --git a/src/backend/access/heap/heapam_valid.c b/src/backend/access/heap/heapam_valid.c
index 7261723e378..a05738a9144 100644
--- a/src/backend/access/heap/heapam_valid.c
+++ b/src/backend/access/heap/heapam_valid.c
@@ -129,7 +129,12 @@ HeapKeyTest(HeapTuple tuple, TupleDesc tupdesc, int nkeys, ScanKey keys)
 		bool		isnull;
 		Datum		test;
 
-		if (cur_key->sk_flags & SK_ISNULL)
+		/*
+		 * When the SK_ISNULL flag is set but we are not handling the IS/IS
+		 * NOT NULL case, the key cannot match.
+		 */
+		if ((cur_key->sk_flags & SK_ISNULL)
+			&& !(cur_key->sk_flags & (SK_SEARCHNULL | SK_SEARCHNOTNULL)))
 			return false;
 
 		atp = heap_getattr(tuple, cur_key->sk_attno, tupdesc, &isnull);
@@ -272,6 +277,15 @@ HeapKeyTest(HeapTuple tuple, TupleDesc tupdesc, int nkeys, ScanKey keys)
 
 			return false;
 		}
+		/* IS/IS NOT NULL case */
+		else if ((cur_key->sk_flags & SK_ISNULL)
+				 && (cur_key->sk_flags & (SK_SEARCHNULL | SK_SEARCHNOTNULL)))
+		{
+			if ((cur_key->sk_flags & SK_SEARCHNULL) && !isnull)
+				return false;
+			if ((cur_key->sk_flags & SK_SEARCHNOTNULL) && isnull)
+				return false;
+		}
 		else
 		{
 			if (isnull)
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index b5f0d8c23b9..30a9b092267 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -167,6 +167,18 @@ ExecSeqBuildScanKeys(PlanState *planstate, List *quals, int *numScanKeys,
 				n_runtime_keys++;
 				scanvalue = (Datum) 0;
 			}
+
+			n_scan_keys++;
+
+			ScanKeyEntryInitialize(this_scan_key,
+								   flags,
+								   varattno,
+								   InvalidStrategy, /* no strategy */
+								   InvalidOid,	/* no subtype */
+								   collationid,
+								   opfuncid,
+								   scanvalue);
+
 		}
 		/* <leftop> <op> ANY/ALL (array-expression) */
 		else if (IsA(clause, ScalarArrayOpExpr))
@@ -310,6 +322,58 @@ ExecSeqBuildScanKeys(PlanState *planstate, List *quals, int *numScanKeys,
 									 saop->inputcollid,
 									 NULL,
 									 NULL);
+
+			n_scan_keys++;
+
+			ScanKeyEntryInitialize(this_scan_key,
+								   flags,
+								   varattno,
+								   InvalidStrategy, /* no strategy */
+								   InvalidOid,	/* no subtype */
+								   collationid,
+								   opfuncid,
+								   scanvalue);
+
+			ScanKeyEntrySetHashInfo(this_scan_key, skeyhashinfo);
+
+		}
+		/* <leftop> IS/IS NOT NULL */
+		else if (IsA(clause, NullTest))
+		{
+			NullTest   *ntest = (NullTest *) clause;
+
+			leftop = ntest->arg;
+			collationid = InvalidOid;
+
+			varattno = ((Var *) leftop)->varattno;
+
+			/*
+			 * initialize the scan key's fields appropriately
+			 */
+			switch (ntest->nulltesttype)
+			{
+				case IS_NULL:
+					flags = SK_ISNULL | SK_SEARCHNULL;
+					break;
+				case IS_NOT_NULL:
+					flags = SK_ISNULL | SK_SEARCHNOTNULL;
+					break;
+				default:
+					elog(ERROR, "unrecognized nulltesttype: %d",
+						 (int) ntest->nulltesttype);
+					break;
+			}
+
+			n_scan_keys++;
+
+			ScanKeyEntryInitialize(this_scan_key,
+								   flags,
+								   varattno,
+								   InvalidStrategy, /* no strategy */
+								   InvalidOid,	/* no subtype */
+								   InvalidOid,	/* no collation */
+								   InvalidOid,	/* no reg proc for this */
+								   (Datum) 0);	/* constant */
 		}
 		else
 		{
@@ -318,19 +382,6 @@ ExecSeqBuildScanKeys(PlanState *planstate, List *quals, int *numScanKeys,
 			 */
 			continue;
 		}
-
-		n_scan_keys++;
-
-		ScanKeyEntryInitialize(this_scan_key,
-							   flags,
-							   varattno,
-							   InvalidStrategy, /* no strategy */
-							   InvalidOid,	/* no subtype */
-							   collationid,
-							   opfuncid,
-							   scanvalue);
-
-		ScanKeyEntrySetHashInfo(this_scan_key, skeyhashinfo);
 	}
 
 	/*
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 982bdd08e74..1b6fba85ba7 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -5433,6 +5433,36 @@ fix_tablequal_references(PlannerInfo *root, Path *best_path,
 					fixed_tablequals = lappend(fixed_tablequals, clause);
 					break;
 				}
+
+				/*
+				 * NullTest: <leftop> IS/IS NOT NULL
+				 */
+			case T_NullTest:
+				{
+					NullTest   *nt = (NullTest *) clause;
+					Expr	   *leftop;
+
+					leftop = (Expr *) nt->arg;
+
+					/*
+					 * Handle relabeling and make sure our left part is a
+					 * column name.
+					 */
+					if (leftop && IsA(leftop, RelabelType))
+						leftop = ((RelabelType *) leftop)->arg;
+
+					if (leftop == NULL)
+						continue;
+
+					if (!(IsA(leftop, Var) && ((Var *) leftop)->varattno > 0))
+						continue;
+
+					/* Override Null test arg in case of relabeling */
+					nt->arg = leftop;
+
+					fixed_tablequals = lappend(fixed_tablequals, clause);
+					break;
+				}
 			default:
 				continue;
 		}
diff --git a/src/test/regress/expected/qual_pushdown.out b/src/test/regress/expected/qual_pushdown.out
index 75fc9c93ad0..60baa3f4275 100644
--- a/src/test/regress/expected/qual_pushdown.out
+++ b/src/test/regress/expected/qual_pushdown.out
@@ -6,16 +6,17 @@ CREATE TABLE qa (i INTEGER, ii INTEGER);
 CREATE TABLE qb (j INTEGER);
 INSERT INTO qa SELECT n, n * n  FROM generate_series(1, 1000) as n;
 INSERT INTO qb SELECT n FROM generate_series(1, 1000) as n;
+INSERT INTO qa VALUES (1001, NULL);
 ANALYZE qa;
 ANALYZE qb;
 -- By default, the quals are not pushed down. The tuples are filtered out by
 -- the executor.
 EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = 100;
-                QUERY PLAN                 
--------------------------------------------
+                 QUERY PLAN                 
+--------------------------------------------
  Seq Scan on qa (actual rows=1.00 loops=1)
    Filter: (i = 100)
-   Rows Removed In Executor by Filter: 999
+   Rows Removed In Executor by Filter: 1000
 (3 rows)
 
 EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i < 10;
@@ -23,15 +24,15 @@ EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FRO
 -------------------------------------------
  Seq Scan on qa (actual rows=9.00 loops=1)
    Filter: (i < 10)
-   Rows Removed In Executor by Filter: 991
+   Rows Removed In Executor by Filter: 992
 (3 rows)
 
 EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE 100 = i;
-                QUERY PLAN                 
--------------------------------------------
+                 QUERY PLAN                 
+--------------------------------------------
  Seq Scan on qa (actual rows=1.00 loops=1)
    Filter: (100 = i)
-   Rows Removed In Executor by Filter: 999
+   Rows Removed In Executor by Filter: 1000
 (3 rows)
 
 EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE 10 > i;
@@ -39,23 +40,23 @@ EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FRO
 -------------------------------------------
  Seq Scan on qa (actual rows=9.00 loops=1)
    Filter: (10 > i)
-   Rows Removed In Executor by Filter: 991
+   Rows Removed In Executor by Filter: 992
 (3 rows)
 
 EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = SQRT(25)::INT;
-                QUERY PLAN                 
--------------------------------------------
+                 QUERY PLAN                 
+--------------------------------------------
  Seq Scan on qa (actual rows=1.00 loops=1)
    Filter: (i = 5)
-   Rows Removed In Executor by Filter: 999
+   Rows Removed In Executor by Filter: 1000
 (3 rows)
 
 EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = (SELECT 100);
-                QUERY PLAN                 
--------------------------------------------
+                 QUERY PLAN                 
+--------------------------------------------
  Seq Scan on qa (actual rows=1.00 loops=1)
    Filter: (i = (InitPlan expr_1).col1)
-   Rows Removed In Executor by Filter: 999
+   Rows Removed In Executor by Filter: 1000
    InitPlan expr_1
      ->  Result (actual rows=1.00 loops=1)
 (5 rows)
@@ -65,7 +66,7 @@ EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FRO
 ---------------------------------------------------
  Seq Scan on qa (actual rows=1.00 loops=1)
    Filter: (i = (InitPlan expr_1).col1)
-   Rows Removed In Executor by Filter: 999
+   Rows Removed In Executor by Filter: 1000
    InitPlan expr_1
      ->  Seq Scan on qb (actual rows=1.00 loops=1)
            Filter: (j = 100)
@@ -73,23 +74,23 @@ EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FRO
 (7 rows)
 
 EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa JOIN qb ON (qa.i = qb.j) WHERE j = 100;
-                   QUERY PLAN                    
--------------------------------------------------
+                    QUERY PLAN                    
+--------------------------------------------------
  Nested Loop (actual rows=1.00 loops=1)
    ->  Seq Scan on qa (actual rows=1.00 loops=1)
          Filter: (i = 100)
-         Rows Removed In Executor by Filter: 999
+         Rows Removed In Executor by Filter: 1000
    ->  Seq Scan on qb (actual rows=1.00 loops=1)
          Filter: (j = 100)
          Rows Removed In Executor by Filter: 999
 (7 rows)
 
 EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = ii AND ii < 10;
-                QUERY PLAN                 
--------------------------------------------
+                 QUERY PLAN                 
+--------------------------------------------
  Seq Scan on qa (actual rows=1.00 loops=1)
    Filter: ((ii < 10) AND (i = ii))
-   Rows Removed In Executor by Filter: 999
+   Rows Removed In Executor by Filter: 1000
 (3 rows)
 
 EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = ANY('{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}'::INT[]);
@@ -97,7 +98,7 @@ EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FRO
 -----------------------------------------------------------
  Seq Scan on qa (actual rows=10.00 loops=1)
    Filter: (i = ANY ('{1,2,3,4,5,6,7,8,9,10}'::integer[]))
-   Rows Removed In Executor by Filter: 990
+   Rows Removed In Executor by Filter: 991
 (3 rows)
 
 EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = ANY('{1, 2}'::INT[]);
@@ -105,7 +106,7 @@ EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FRO
 -------------------------------------------
  Seq Scan on qa (actual rows=2.00 loops=1)
    Filter: (i = ANY ('{1,2}'::integer[]))
-   Rows Removed In Executor by Filter: 998
+   Rows Removed In Executor by Filter: 999
 (3 rows)
 
 EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE NOT (i <> ALL('{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}'::INT[]));
@@ -113,7 +114,7 @@ EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FRO
 -----------------------------------------------------------
  Seq Scan on qa (actual rows=10.00 loops=1)
    Filter: (i = ANY ('{1,2,3,4,5,6,7,8,9,10}'::integer[]))
-   Rows Removed In Executor by Filter: 990
+   Rows Removed In Executor by Filter: 991
 (3 rows)
 
 EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = ANY((SELECT array_agg(j) FROM qb WHERE j > 50 AND j <= 60)::int[]);
@@ -121,7 +122,7 @@ EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FRO
 ----------------------------------------------------------
  Seq Scan on qa (actual rows=10.00 loops=1)
    Filter: (i = ANY ((InitPlan expr_1).col1))
-   Rows Removed In Executor by Filter: 990
+   Rows Removed In Executor by Filter: 991
    InitPlan expr_1
      ->  Aggregate (actual rows=1.00 loops=1)
            ->  Seq Scan on qb (actual rows=10.00 loops=1)
@@ -129,16 +130,32 @@ EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FRO
                  Rows Removed In Executor by Filter: 990
 (8 rows)
 
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT i, ii FROM qa WHERE ii IS NULL;
+                 QUERY PLAN                 
+--------------------------------------------
+ Seq Scan on qa (actual rows=1.00 loops=1)
+   Filter: (ii IS NULL)
+   Rows Removed In Executor by Filter: 1000
+(3 rows)
+
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT i, ii FROM qa WHERE ii IS NOT NULL AND i >= 1000;;
+                  QUERY PLAN                  
+----------------------------------------------
+ Seq Scan on qa (actual rows=1.00 loops=1)
+   Filter: ((ii IS NOT NULL) AND (i >= 1000))
+   Rows Removed In Executor by Filter: 1000
+(3 rows)
+
 -- Enable quals push down
 ALTER TABLE qa SET (quals_push_down=on);
 ALTER TABLE qb SET (quals_push_down=on);
 -- Now, we expect to see the tuples being filtered out by the table AM
 EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = 100;
-                QUERY PLAN                 
--------------------------------------------
+                 QUERY PLAN                 
+--------------------------------------------
  Seq Scan on qa (actual rows=1.00 loops=1)
    Filter: (i = 100)
-   Rows Removed In Table AM by Filter: 999
+   Rows Removed In Table AM by Filter: 1000
 (3 rows)
 
 SELECT ii FROM qa WHERE i = 100;
@@ -152,7 +169,7 @@ EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FRO
 -------------------------------------------
  Seq Scan on qa (actual rows=9.00 loops=1)
    Filter: (i < 10)
-   Rows Removed In Table AM by Filter: 991
+   Rows Removed In Table AM by Filter: 992
 (3 rows)
 
 SELECT ii FROM qa WHERE i < 10;
@@ -170,11 +187,11 @@ SELECT ii FROM qa WHERE i < 10;
 (9 rows)
 
 EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE 100 = i;
-                QUERY PLAN                 
--------------------------------------------
+                 QUERY PLAN                 
+--------------------------------------------
  Seq Scan on qa (actual rows=1.00 loops=1)
    Filter: (100 = i)
-   Rows Removed In Table AM by Filter: 999
+   Rows Removed In Table AM by Filter: 1000
 (3 rows)
 
 SELECT ii FROM qa WHERE 100 = i;
@@ -188,7 +205,7 @@ EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FRO
 -------------------------------------------
  Seq Scan on qa (actual rows=9.00 loops=1)
    Filter: (10 > i)
-   Rows Removed In Table AM by Filter: 991
+   Rows Removed In Table AM by Filter: 992
 (3 rows)
 
 SELECT ii FROM qa WHERE 10 > i;
@@ -206,11 +223,11 @@ SELECT ii FROM qa WHERE 10 > i;
 (9 rows)
 
 EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = SQRT(25)::INT;
-                QUERY PLAN                 
--------------------------------------------
+                 QUERY PLAN                 
+--------------------------------------------
  Seq Scan on qa (actual rows=1.00 loops=1)
    Filter: (i = 5)
-   Rows Removed In Table AM by Filter: 999
+   Rows Removed In Table AM by Filter: 1000
 (3 rows)
 
 SELECT ii FROM qa WHERE i = SQRT(25)::INT;
@@ -220,11 +237,11 @@ SELECT ii FROM qa WHERE i = SQRT(25)::INT;
 (1 row)
 
 EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = (SELECT 100);
-                QUERY PLAN                 
--------------------------------------------
+                 QUERY PLAN                 
+--------------------------------------------
  Seq Scan on qa (actual rows=1.00 loops=1)
    Filter: (i = (InitPlan expr_1).col1)
-   Rows Removed In Table AM by Filter: 999
+   Rows Removed In Table AM by Filter: 1000
    InitPlan expr_1
      ->  Result (actual rows=1.00 loops=1)
 (5 rows)
@@ -240,7 +257,7 @@ EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FRO
 ---------------------------------------------------
  Seq Scan on qa (actual rows=1.00 loops=1)
    Filter: (i = (InitPlan expr_1).col1)
-   Rows Removed In Table AM by Filter: 999
+   Rows Removed In Table AM by Filter: 1000
    InitPlan expr_1
      ->  Seq Scan on qb (actual rows=1.00 loops=1)
            Filter: (j = 100)
@@ -254,12 +271,12 @@ SELECT ii FROM qa WHERE i = (SELECT SQRT(j)::INT FROM qb WHERE j = 100);
 (1 row)
 
 EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa JOIN qb ON (qa.i = qb.j) WHERE j = 100;
-                   QUERY PLAN                    
--------------------------------------------------
+                    QUERY PLAN                    
+--------------------------------------------------
  Nested Loop (actual rows=1.00 loops=1)
    ->  Seq Scan on qa (actual rows=1.00 loops=1)
          Filter: (i = 100)
-         Rows Removed In Table AM by Filter: 999
+         Rows Removed In Table AM by Filter: 1000
    ->  Seq Scan on qb (actual rows=1.00 loops=1)
          Filter: (j = 100)
          Rows Removed In Table AM by Filter: 999
@@ -276,7 +293,7 @@ EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FRO
 -------------------------------------------
  Seq Scan on qa (actual rows=1.00 loops=1)
    Filter: ((ii < 10) AND (i = ii))
-   Rows Removed In Table AM by Filter: 997
+   Rows Removed In Table AM by Filter: 998
    Rows Removed In Executor by Filter: 2
 (4 rows)
 
@@ -291,7 +308,7 @@ EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FRO
 -----------------------------------------------------------
  Seq Scan on qa (actual rows=10.00 loops=1)
    Filter: (i = ANY ('{1,2,3,4,5,6,7,8,9,10}'::integer[]))
-   Rows Removed In Table AM by Filter: 990
+   Rows Removed In Table AM by Filter: 991
 (3 rows)
 
 SELECT ii FROM qa WHERE i = ANY('{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}'::INT[]);
@@ -314,7 +331,7 @@ EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FRO
 -------------------------------------------
  Seq Scan on qa (actual rows=2.00 loops=1)
    Filter: (i = ANY ('{1,2}'::integer[]))
-   Rows Removed In Table AM by Filter: 998
+   Rows Removed In Table AM by Filter: 999
 (3 rows)
 
 SELECT ii FROM qa WHERE i = ANY('{1, 2}'::INT[]);
@@ -329,7 +346,7 @@ EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FRO
 -----------------------------------------------------------
  Seq Scan on qa (actual rows=10.00 loops=1)
    Filter: (i = ANY ('{1,2,3,4,5,6,7,8,9,10}'::integer[]))
-   Rows Removed In Table AM by Filter: 990
+   Rows Removed In Table AM by Filter: 991
 (3 rows)
 
 SELECT ii FROM qa WHERE NOT (i <> ALL('{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}'::INT[]));
@@ -352,7 +369,7 @@ EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FRO
 ----------------------------------------------------------
  Seq Scan on qa (actual rows=10.00 loops=1)
    Filter: (i = ANY ((InitPlan expr_1).col1))
-   Rows Removed In Table AM by Filter: 990
+   Rows Removed In Table AM by Filter: 991
    InitPlan expr_1
      ->  Aggregate (actual rows=1.00 loops=1)
            ->  Seq Scan on qb (actual rows=10.00 loops=1)
@@ -375,5 +392,33 @@ SELECT ii FROM qa WHERE i = ANY((SELECT array_agg(j) FROM qb WHERE j > 50 AND j
  3600
 (10 rows)
 
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT i, ii FROM qa WHERE ii IS NULL;
+                 QUERY PLAN                 
+--------------------------------------------
+ Seq Scan on qa (actual rows=1.00 loops=1)
+   Filter: (ii IS NULL)
+   Rows Removed In Table AM by Filter: 1000
+(3 rows)
+
+SELECT i, ii FROM qa WHERE ii IS NULL;
+  i   | ii 
+------+----
+ 1001 |   
+(1 row)
+
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT i, ii FROM qa WHERE ii IS NOT NULL AND i >= 1000;;
+                  QUERY PLAN                  
+----------------------------------------------
+ Seq Scan on qa (actual rows=1.00 loops=1)
+   Filter: ((ii IS NOT NULL) AND (i >= 1000))
+   Rows Removed In Table AM by Filter: 1000
+(3 rows)
+
+SELECT i, ii FROM qa WHERE ii IS NOT NULL AND i >= 1000;;
+  i   |   ii    
+------+---------
+ 1000 | 1000000
+(1 row)
+
 DROP TABLE IF EXISTS qa;
 DROP TABLE IF EXISTS qb;
diff --git a/src/test/regress/sql/qual_pushdown.sql b/src/test/regress/sql/qual_pushdown.sql
index 38e88a50c33..50d6f9b316a 100644
--- a/src/test/regress/sql/qual_pushdown.sql
+++ b/src/test/regress/sql/qual_pushdown.sql
@@ -5,6 +5,7 @@ CREATE TABLE qa (i INTEGER, ii INTEGER);
 CREATE TABLE qb (j INTEGER);
 INSERT INTO qa SELECT n, n * n  FROM generate_series(1, 1000) as n;
 INSERT INTO qb SELECT n FROM generate_series(1, 1000) as n;
+INSERT INTO qa VALUES (1001, NULL);
 ANALYZE qa;
 ANALYZE qb;
 
@@ -23,6 +24,8 @@ EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FRO
 EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = ANY('{1, 2}'::INT[]);
 EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE NOT (i <> ALL('{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}'::INT[]));
 EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = ANY((SELECT array_agg(j) FROM qb WHERE j > 50 AND j <= 60)::int[]);
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT i, ii FROM qa WHERE ii IS NULL;
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT i, ii FROM qa WHERE ii IS NOT NULL AND i >= 1000;;
 
 -- Enable quals push down
 ALTER TABLE qa SET (quals_push_down=on);
@@ -55,6 +58,10 @@ EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FRO
 SELECT ii FROM qa WHERE NOT (i <> ALL('{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}'::INT[]));
 EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT ii FROM qa WHERE i = ANY((SELECT array_agg(j) FROM qb WHERE j > 50 AND j <= 60)::INT[]);
 SELECT ii FROM qa WHERE i = ANY((SELECT array_agg(j) FROM qb WHERE j > 50 AND j <= 60)::INT[]);
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT i, ii FROM qa WHERE ii IS NULL;
+SELECT i, ii FROM qa WHERE ii IS NULL;
+EXPLAIN (ANALYZE, COSTS off, TIMING off, SUMMARY off, BUFFERS off) SELECT i, ii FROM qa WHERE ii IS NOT NULL AND i >= 1000;;
+SELECT i, ii FROM qa WHERE ii IS NOT NULL AND i >= 1000;;
 
 DROP TABLE IF EXISTS qa;
 DROP TABLE IF EXISTS qb;
-- 
2.39.5

#8Robert Haas
robertmhaas@gmail.com
In reply to: Julien Tachoires (#4)
Re: Qual push down to table AM

On Fri, Aug 29, 2025 at 4:38 AM Julien Tachoires <julien@tachoires.me> wrote:

Thank you for this quick feedback.

One potential approach to solve this in heapgettup() would be:
1. hold the buffer lock
2. get the tuple from the buffer
3. if the tuple is not visible, move to the next tuple, back to 2.
4. release the buffer lock
5. if the tuple does not satisfy the scan keys, take the buffer lock,
move to the next tuple, back to 2.
6. return the tuple

Do you see something fundamentally wrong here?

I spent a bit of time this afternoon looking at v4-0001. I noticed a
few spelling mistakes (abritrary x2, statisfied x1). As far as the
basic approach is concerned, I don't see how there can be a safety
problem here. If it's safe to release the buffer lock when we find a
tuple that matches the quals, for the purposes of returning that tuple
to the caller, then it seems like it must also be safe to release it
to evaluate a proposed qual.

Potentially, there could be a performance problem. Imagine that we
have some code right now that uses this code path and it's safe
because the qual that we're evaluating is something super-simple like
the integer less-than operator, so calling it under the buffer lock
doesn't create a stability hazard. Well, with the patch, we'd
potentially take and release the buffer lock a lot more times than we
do right now. Imagine that there are lots of tuples on each page but
only 1 or very few of them satisfy the qual: then we lock and unlock
the buffer a whole bunch of times instead of just once.

However, I don't think this really happens in practice. I believe it's
possible to take this code path if you set ignore_system_indexes=on,
because that turns index scans --- which, not surprisingly, have
scankeys --- into sequential scans which then end up also having
scankeys. Many of those scans use catalog snapshots so there's no
issue, but a little bit of debugging code seems to show that
systable_beginscan() can also be called with snapshot->snapshot_type
set to SNAPSHOT_ANY or SNAPSHOT_DIRTY. For example, see
GetNewOidWithIndex(). However, even if ignore_system_indexes=on gets a
little slower as a result of this or some other patch, I don't think
we really care, and without that setting, this code doesn't seem to
get exercised at all.

So, somewhat to my surprise, I think that v4-0001 might be basically
fine. I wonder if anyone else sees a problem that I'm missing?

--
Robert Haas
EDB: http://www.enterprisedb.com

#9Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#8)
Re: Qual push down to table AM

Hi,

On 2025-12-09 16:40:17 -0500, Robert Haas wrote:

On Fri, Aug 29, 2025 at 4:38 AM Julien Tachoires <julien@tachoires.me> wrote:
Potentially, there could be a performance problem

I think the big performance hazard with this is repeated deforming. The
scankey infrastructure deforms attributes one-by-one *and* it does not
"persist" the work of deforming for later accesses. So if you e.g. have
something like

SELECT sum(col_29) FROM tbl WHERE col_30 = common_value;
or
SELECT * FROM tbl WHERE col_30 = common_value;

we'll now deform col_30 in isolation for the ScanKey evaluation and then we'll
deform columns 1-29 in the slot (because we always deform all the leading
columns), during projection.
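
To make the duplicated work concrete, here is a minimal sketch (illustration
only, not executor code; the function name is invented, and the slot is
assumed to already hold the same tuple via ExecStoreBufferHeapTuple()):

#include "postgres.h"

#include "access/htup_details.h"
#include "executor/tuptable.h"

/*
 * Illustration of the deforming done twice when a ScanKey on col_30 is
 * checked against the raw heap tuple and col_29 is later projected through
 * the slot.  Invented name, not part of any patch.
 */
static Datum
double_deform_sketch(HeapTuple tuple, TupleDesc tupdesc, TupleTableSlot *slot)
{
	bool		isnull;

	/* ScanKey path: walks the tuple up to attribute 30; nothing is cached */
	(void) heap_getattr(tuple, 30, tupdesc, &isnull);

	/* Projection path: walks attributes 1..29 again, this time into the slot */
	slot_getsomeattrs(slot, 29);

	return slot->tts_values[28];	/* col_29, 0-based index into the slot */
}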

But even leaving the slot issue aside, I'd bet that you'll see overhead due to
*not* deforming multiple columns at once. If you have a ScanKey version of
something like
WHERE column_20 = common_val AND column_21 = some_val AND column_22 = another_val;

and there's a NULL or varlena value in one of the leading columns, we'll redo
a fair bit of work during the fastgetattr() for column_22.

I don't really see this being viable without first tackling two nontrivial
projects:

1) Make slot deforming for expressions & projections selective, i.e. don't
deform all the leading columns, but only ones that will eventually be
needed
2) Perform ScanKey evaluation in slot form, to be able to cache the deforming
and to make deforming of multiple columns sufficiently efficient.

So, somewhat to my surprise, I think that v4-0001 might be basically
fine. I wonder if anyone else sees a problem that I'm missing?

I doubt this would be safe as-is: ISTM that if you release the page lock
between tuples, things like the number of items on the page can change. But we
store stuff like that in registers / on the stack, which could change while
the lock is not held.

We could refetch the number of items on the page for every loop iteration, but
that'd probably not be free. OTOH, it's probably nothing compared to the cost
of relocking the page...

Greetings,

Andres Freund

#10Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#9)
Re: Qual push down to table AM

On Tue, Dec 9, 2025 at 6:08 PM Andres Freund <andres@anarazel.de> wrote:

On 2025-12-09 16:40:17 -0500, Robert Haas wrote:

On Fri, Aug 29, 2025 at 4:38 AM Julien Tachoires <julien@tachoires.me> wrote:
Potentially, there could be a performance problem

I think the big performance hazard with this is repeated deforming. The
scankey infrastructure deforms attributes one-by-one *and* it does not
"persist" the work of deforming for later accesses. So if you e.g. have
something like

SELECT sum(col_29) FROM tbl WHERE col_30 = common_value;
or
SELECT * FROM tbl WHERE col_30 = common_value;

we'll now deform col_30 in isolation for the ScanKey evaluation and then we'll
deform columns 1-29 in the slot (because we always deform all the leading
columns), during projection.

Hmm, this is a good point, and I agree that it's a huge challenge for
this patch set. Repeated tuple deforming is REALLY expensive, which is
why we've spent so much energy trying to use slots in as many
places as possible. I find it easy to believe that HeapKeyTest's loop
over heap_getattr() is going to be prohibitively painful and that this
code will need to somehow also be slot-ified for this to be a viable
project.

I don't really see this being viable without first tackling two nontrivial
projects:

1) Make slot deforming for expressions & projections selective, i.e. don't
deform all the leading columns, but only ones that will eventually be
needed
2) Perform ScanKey evaluation in slot form, to be able to cache the deforming
and to make deforming of multiple columns sufficiently efficient.

IOW, I agree that we probably need to do #2. I am not entirely sure
about #1. I'm a little afraid that trying to skip over columns without
deforming them will add a bunch of code complexity that doesn't really
pay off. You have to do the bookkeeping to know what to skip, and then
how much are you really gaining by skipping it? If you can skip over a
bunch of fixed-width columns, that's cool, but it's probably fairly
normal to have lots of varlena columns, and then I don't really see
that you're gaining much here. You still have to iterate through the
tuple, and not storing the pointer to the start of each column as you
find it doesn't seem like it will save much. What's your reasoning
behind thinking that #1 will be necessary?

So, somewhat to my surprise, I think that v4-0001 might be basically
fine. I wonder if anyone else sees a problem that I'm missing?

I doubt this would be safe as-is: ISTM that if you release the page lock
between tuples, things like the number of items on the page can change. But we
store stuff like that in registers / on the stack, which could change while
the lock is not held.

We could refetch the number of items on the page for every loop iteration, but
that'd probably not be free. OTOH, it's probably nothing compared to the cost
of relocking the page...

We still hold a pin, though, which I think means very little can
change. More items can be added to the page, so we might want to
refresh the number of items on the page at least when we think we're
done, but I believe that any sort of more invasive page rearrangement
would be precluded by the pin.

I kind of wonder if it would be good to make a change along the lines
of v4-0001 even if this patch set doesn't move forward overall, or
will need a lot of slot-ification to do so. It seems weird to me that
we're OK with calling out to arbitrary code with a buffer lock held,
and even weirder that whether or not we do that depends on whether
SO_ALLOW_PAGEMODE was set. I don't think a difference of this kind
between pagemode behavior and non-pagemode behavior would survive
review if someone proposed it today; the fact that it works the way it
does is probably an artifact of this mechanism having been added
twenty years ago when the project was in a very different place.

--
Robert Haas
EDB: http://www.enterprisedb.com

#11Amit Langote
amitlangote09@gmail.com
In reply to: Robert Haas (#10)
Re: Qual push down to table AM

On Thu, Dec 11, 2025 at 12:41 AM Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Dec 9, 2025 at 6:08 PM Andres Freund <andres@anarazel.de> wrote:

On 2025-12-09 16:40:17 -0500, Robert Haas wrote:

On Fri, Aug 29, 2025 at 4:38 AM Julien Tachoires <julien@tachoires.me> wrote:
Potentially, there could be a performance problem

I think the big performance hazard with this is repeated deforming. The
scankey infrastructure deforms attributes one-by-one *and* it does not
"persist" the work of deforming for later accesses. So if you e.g. have
something like

SELECT sum(col_29) FROM tbl WHERE col_30 = common_value;
or
SELECT * FROM tbl WHERE col_30 = common_value;

we'll now deform col_30 in isolation for the ScanKey evaluation and then we'll
deform columns 1-29 in the slot (because we always deform all the leading
columns), during projection.

Hmm, this is a good point, and I agree that it's a huge challenge for
this patch set. Repeated tuple deforming is REALLY expensive, which is
why we've spent so much energy trying to use slots in as many
places as possible. I find it easy to believe that HeapKeyTest's loop
over heap_getattr() is going to be prohibitively painful and that this
code will need to somehow also be slot-ified for this to be a viable
project.

I don't really see this being viable without first tackling two nontrivial
projects:

1) Make slot deforming for expressions & projections selective, i.e. don't
deform all the leading columns, but only ones that will eventually be
needed
2) Perform ScanKey evaluation in slot form, to be able to cache the deforming
and to make deforming of multiple columns sufficiently efficient.

IOW, I agree that we probably need to do #2. I am not entirely sure
about #1.

I'm also curious to understand why Andres sees #1 as a prerequisite
for qual pushdown.

I'm a little afraid that trying to skip over columns without
deforming them will add a bunch of code complexity that doesn't really
pay off.

I think it might be worthwhile. I have a PoC [1] I worked on (at
Andres's suggestion) that showed ~2x improvement on simple aggregation
queries over wide tables (all pages buffered) by tracking the minimum
needed attribute and using cached offsets stored in TupleDesc to skip
fixed-not-null prefixes. I'm thinking of reviving it with proper
tracking of which attributes are needed and deformed (bitmapset or
flag array in TupleTableSlot).
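
To illustrate the kind of shortcut those cached offsets enable (a sketch with
an invented function name, not the PoC itself, which works at the
slot-deforming level rather than per attribute):

#include "postgres.h"

#include "access/htup_details.h"
#include "access/tupmacs.h"

/*
 * Sketch only: when the tuple has no NULLs and every attribute in front of
 * attnum is fixed width, attcacheoff (once primed by earlier deforming) lets
 * us jump straight to the wanted column instead of walking from attribute 1.
 */
static Datum
cached_offset_fetch_sketch(HeapTuple tuple, TupleDesc tupdesc,
						   int attnum, bool *isnull)
{
	Form_pg_attribute att = TupleDescAttr(tupdesc, attnum - 1);

	if (!HeapTupleHasNulls(tuple) && att->attcacheoff >= 0)
	{
		char	   *tp = (char *) tuple->t_data + tuple->t_data->t_hoff;

		*isnull = false;
		return fetchatt(att, tp + att->attcacheoff);
	}

	/* Otherwise fall back to the usual attribute-by-attribute walk. */
	return heap_getattr(tuple, attnum, tupdesc, isnull);
}

fastgetattr() already takes this shortcut for single-attribute access; the
PoC applies the same idea to slot deforming so that unneeded leading columns
are skipped there as well.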

So, somewhat to my surprise, I think that v4-0001 might be basically
fine. I wonder if anyone else sees a problem that I'm missing?

I doubt this would be safe as-is: ISTM that if you release the page lock
between tuples, things like the number of items on the page can change. But we
store stuff like that in registers / on the stack, which could change while
the lock is not held.

We could refetch the number of items on the page for every loop iteration, but
that'd probably not be free. OTOH, it's probably nothing compared to the cost
of relocking the page...

We still hold a pin, though, which I think means very little can
change. More items can be added to the page, so we might want to
refresh the number of items on the page at least when we think we're
done, but I believe that any sort of more invasive page rearrangement
would be precluded by the pin.

I kind of wonder if it would be good to make a change along the lines
of v4-0001 even if this patch set doesn't move forward overall, or
will need a lot of slot-ification to do so. It seems weird to me that
we're OK with calling out to arbitrary code with a buffer lock held,
and even weirder that whether or not we do that depends on whether
SO_ALLOW_PAGEMODE was set. I don't think a difference of this kind
between pagemode behavior and non-pagemode behavior would survive
review if someone proposed it today; the fact that it works the way it
does is probably an artifact of this mechanism having been added
twenty years ago when the project was in a very different place.

One maybe crazy thought: what about only enabling qual pushdown when
pagemode is used, since it already processes all tuples on a page in
one locked phase? That raises the question of whether there's a class
of quals simple enough (built-in ops?) that evaluating them alongside
visibility checking would be acceptable, with lock held that is -- but
it would avoid the lock churn and racy loop termination issues with
v4-0001.

--
Thanks, Amit Langote

[1]: /messages/by-id/CA+HiwqHXDY6TxegR2Cr_4sRa_LY1QJnoL8XRmOqdfrx21pZ6cw@mail.gmail.com

#12Andres Freund
andres@anarazel.de
In reply to: Amit Langote (#11)
Re: Qual push down to table AM

Hi,

On 2025-12-15 21:56:12 +0900, Amit Langote wrote:

On Thu, Dec 11, 2025 at 12:41 AM Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Dec 9, 2025 at 6:08 PM Andres Freund <andres@anarazel.de> wrote:

On 2025-12-09 16:40:17 -0500, Robert Haas wrote:

On Fri, Aug 29, 2025 at 4:38 AM Julien Tachoires <julien@tachoires.me> wrote:
Potentially, there could be a performance problem

I think the big performance hazard with this is repeated deforming. The
scankey infrastructure deforms attributes one-by-one *and* it does not
"persist" the work of deforming for later accesses. So if you e.g. have
something like

SELECT sum(col_29) FROM tbl WHERE col_30 = common_value;
or
SELECT * FROM tbl WHERE col_30 = common_value;

we'll now deform col_30 in isolation for the ScanKey evaluation and then we'll
deform columns 1-29 in the slot (because we always deform all the leading
columns), during projection.

Hmm, this is a good point, and I agree that it's a huge challenge for
this patch set. Repeated tuple deforming is REALLY expensive, which is
why we've spent so much energy trying to use slots in as many
places as possible. I find it easy to believe that HeapKeyTest's loop
over heap_getattr() is going to be prohibitively painful and that this
code will need to somehow also be slot-ified for this to be a viable
project.

I don't really see this being viable without first tackling two nontrivial
projects:

1) Make slot deforming for expressions & projections selective, i.e. don't
deform all the leading columns, but only ones that will eventually be
needed
2) Perform ScanKey evaluation in slot form, to be able to cache the deforming
and to make deforming of multiple columns sufficiently efficient.

IOW, I agree that we probably need to do #2. I am not entirely sure
about #1.

I'm also curious to understand why Andres sees #1 as a prerequisite
for qual pushdown.

I suspect you'll not see a whole lot of gain without it. When I experimented
with it, a good portion (but not all!) of the gain seemed to be from just
deforming the immediately required columns - but also a lot of the visible
regressions were from that.

We still hold a pin, though, which I think means very little can
change. More items can be added to the page, so we might want to
refresh the number of items on the page at least when we think we're
done, but I believe that any sort of more invasive page rearrangement
would be precluded by the pin.

I kind of wonder if it would be good to make a change along the lines
of v4-0001 even if this patch set doesn't move forward overall, or
will need a lot of slot-ification to do so. It seems weird to me that
we're OK with calling out to arbitrary code with a buffer lock held,
and even weirder that whether or not we do that depends on whether
SO_ALLOW_PAGEMODE was set. I don't think a difference of this kind
between pagemode behavior and non-pagemode behavior would survive
review if someone proposed it today; the fact that it works the way it
does is probably an artifact of this mechanism having been added
twenty years ago when the project was in a very different place.

One maybe crazy thought: what about only enabling qual pushdown when
pagemode is used, since it already processes all tuples on a page in
one locked phase? That raises the question of whether there's a class
of quals simple enough (built-in ops?) that evaluating them alongside
visibility checking would be acceptable, with lock held that is -- but
it would avoid the lock churn and racy loop termination issues with
v4-0001.

I think that's the wrong direction to go. We shouldn't do more under the lock,
we should do less. You certainly couldn't just use builtin ops, as some of
them *do* other catalog lookups, which would lead to deadlock potential.

I think it's also just unnecessary to try to do anything under a lock here:
it's not hard to first do all the visibility checks while locked and then,
after unlocking, filter the tuples based on quals.
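
To make that concrete, a rough sketch of such a two-phase loop (names ending
in _sketch are invented; the real heapgettup() also tracks scan direction and
the position to resume from, which is omitted here):

#include "postgres.h"

#include "access/heapam.h"
#include "access/htup_details.h"
#include "access/valid.h"
#include "storage/bufmgr.h"
#include "storage/bufpage.h"
#include "utils/rel.h"

static void
heapgettup_two_phase_sketch(HeapScanDesc scan, int nkeys, ScanKey key)
{
	Page		page;
	OffsetNumber maxoff;
	OffsetNumber visible[MaxHeapTuplesPerPage];
	int			nvisible = 0;

	/* Phase 1: visibility checks only, while the buffer lock is held. */
	LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
	page = BufferGetPage(scan->rs_cbuf);
	maxoff = PageGetMaxOffsetNumber(page);

	for (OffsetNumber lineoff = FirstOffsetNumber; lineoff <= maxoff; lineoff++)
	{
		ItemId		lpp = PageGetItemId(page, lineoff);
		HeapTupleData tup;

		if (!ItemIdIsNormal(lpp))
			continue;

		tup.t_data = (HeapTupleHeader) PageGetItem(page, lpp);
		tup.t_len = ItemIdGetLength(lpp);
		tup.t_tableOid = RelationGetRelid(scan->rs_base.rs_rd);
		ItemPointerSet(&tup.t_self, scan->rs_cblock, lineoff);

		if (HeapTupleSatisfiesVisibility(&tup, scan->rs_base.rs_snapshot,
										 scan->rs_cbuf))
			visible[nvisible++] = lineoff;
	}
	LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);

	/* Phase 2: key evaluation with only the pin held, no buffer lock. */
	for (int i = 0; i < nvisible; i++)
	{
		ItemId		lpp = PageGetItemId(page, visible[i]);
		HeapTupleData tup;

		tup.t_data = (HeapTupleHeader) PageGetItem(page, lpp);
		tup.t_len = ItemIdGetLength(lpp);

		if (key == NULL ||
			HeapKeyTest(&tup, RelationGetDescr(scan->rs_base.rs_rd),
						nkeys, key))
		{
			/* ... hand this tuple back to the caller ... */
		}
	}
}

Which is more or less the shape the pagemode path already has: the visibility
checks are batched under the lock into rs_vistuples, and the key test runs
afterwards without it.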

Greetings,

Andres Freund

#13Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#10)
Re: Qual push down to table AM

Hi,

On 2025-12-10 10:41:19 -0500, Robert Haas wrote:

On Tue, Dec 9, 2025 at 6:08 PM Andres Freund <andres@anarazel.de> wrote:

I don't really see this being viable without first tackling two nontrivial
projects:

1) Make slot deforming for expressions & projections selective, i.e. don't
deform all the leading columns, but only ones that will eventually be
needed
2) Perform ScanKey evaluation in slot form, to be able to cache the deforming
and to make deforming of multiple columns sufficiently efficient.

IOW, I agree that we probably need to do #2. I am not entirely sure
about #1. I'm a little afraid that trying to skip over columns without
deforming them will add a bunch of code complexity that doesn't really
pay off. You have to do the bookkeeping to know what to skip, and then
how much are you really gaining by skipping it? If you can skip over a
bunch of fixed-width columns, that's cool, but it's probably fairly
normal to have lots of varlena columns, and then I don't really see
that you're gaining much here. You still have to iterate through the
tuple, and not storing the pointer to the start of each column as you
find it doesn't seem like it will save much.

FWIW, in experiments I observed that not storing columns that are never
used saves a pretty decent amount of cycles, just due to not having to store
the never-accessed datums in the slot (and, in the case of non-varlena columns,
not having to fetch the relevant data). It's probably true that the gain for
varlenas is smaller, due to the cost of determining the length, but the
difference is that if the cost of storing the columns is relevant, fields
after a NULL or a varlena can still benefit from the optimization.

What's your reasoning behind thinking that #1 will be necessary?

Tried to answer that downthread, in response to Amit.

So, somewhat to my surprise, I think that v4-0001 might be basically
fine. I wonder if anyone else sees a problem that I'm missing?

I doubt this would be safe as-is: ISTM that if you release the page lock
between tuples, things like the number of items on the page can change. But we
store stuff like that in registers / on the stack, which could change while
the lock is not held.

We could refetch the number of items on the page for every loop iteration, but
that'd probably not be free. OTOH, it's probably nothing compared to the cost
of relocking the page...

We still hold a pin, though, which I think means very little can
change. More items can be added to the page, so we might want to
refresh the number of items on the page at least when we think we're
done, but I believe that any sort of more invasive page rearrangement
would be precluded by the pin.

I kind of wonder if it would be good to make a change along the lines
of v4-0001 even if this patch set doesn't move forward overall, or
will need a lot of slot-ification to do so. It seems weird to me that
we're OK with calling out to arbitrary code with a buffer lock held,
and even weirder that whether or not we do that depends on whether
SO_ALLOW_PAGEMODE was set.

I don't think it's stated clearly anywhere - the only reason this is remotely
within a stone's throw of ok is that the only code using ScanKeys for table
scans is catcache, which in turn means that the set of effectively allowed
operators is tiny (oideq, int2eq, int4eq, int8eq, nameeq, chareq, booleq and
perhaps 2-3 more).

And for those we only support ScanKey mode because it's required as a fallback
for index based catcache searches.

I'm not against fixing qual-eval-under-buffer-lock; it shouldn't ever be used
in particularly performance-sensitive cases.

Greetings,

Andres Freund

#14Maxime Schoemans
maxime.schoemans@enterprisedb.com
In reply to: Andres Freund (#9)
1 attachment(s)
Re: Qual push down to table AM

Hi,

On 10 Dec 2025, at 00:08, Andres Freund <andres@anarazel.de> wrote:
I don't really see this being viable without first tackling two nontrivial
projects:

2) Perform ScanKey evaluation in slot form, to be able to cache the deforming
and to make deforming of multiple columns sufficiently efficient.

Am I right in understanding that you think that the repeated calls to
heap_getattr in HeapKeyTest are not ideal if we have NULL or varlena
columns? I have written a small patch (see attached) that stores the heap
tuple in a TupleTableSlot first and then calls slot_getattr instead, which
should benefit from caching. Is that the type of solution you were thinking of?

It is definitely not a complete patch (needs comments and a description),
and it is not merged into the patch set of Julien yet, but I just wanted to
check that this was what you were proposing and that I was not
misunderstanding something.

1) Make slot deforming for expressions & projections selective, i.e. don't
deform all the leading columns, but only ones that will eventually be
needed

Concerning 1), I’m also not certain I understand why this is a prerequisite for
the pushdown work. It could certainly be beneficial, but it seems to be
complementary. In any case, I’d be interested to look at your POC patch
on the subject, Amit.

Best,

Maxime Schoemans

Attachments:

Perform-ScanKey-evaluation-in-slot-form.patch (application/octet-stream)
From 9035f48eb107e454012e37fa62d67cee6d1d5f8f Mon Sep 17 00:00:00 2001
From: Maxime Schoemans <maxime.schoemans@enterprisedb.com>
Date: Thu, 18 Dec 2025 15:34:26 +0100
Subject: [PATCH v1] Perform ScanKey evaluation in slot form

---
 src/backend/access/heap/heapam.c | 33 +++++++++++++++++++-------------
 src/include/access/valid.h       |  9 +++++++--
 2 files changed, 27 insertions(+), 15 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 6daf4a87dec..97fc583d8c4 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -31,6 +31,7 @@
  */
 #include "postgres.h"
 
+#include "access/genam.h"
 #include "access/heapam.h"
 #include "access/heaptoast.h"
 #include "access/hio.h"
@@ -43,6 +44,7 @@
 #include "catalog/pg_database.h"
 #include "catalog/pg_database_d.h"
 #include "commands/vacuum.h"
+#include "executor/tuptable.h"
 #include "pgstat.h"
 #include "port/pg_bitutils.h"
 #include "storage/lmgr.h"
@@ -911,7 +913,8 @@ static void
 heapgettup(HeapScanDesc scan,
 		   ScanDirection dir,
 		   int nkeys,
-		   ScanKey key)
+		   ScanKey key,
+		   TupleTableSlot *slot)
 {
 	HeapTuple	tuple = &(scan->rs_ctup);
 	Page		page;
@@ -975,10 +978,13 @@ continue_page:
 			if (!visible)
 				continue;
 
+			if (slot)
+				ExecStoreBufferHeapTuple(&scan->rs_ctup, slot, scan->rs_cbuf);
+
 			/* skip any tuples that don't match the scan key */
 			if (key != NULL &&
 				!HeapKeyTest(tuple, RelationGetDescr(scan->rs_base.rs_rd),
-							 nkeys, key))
+							 nkeys, key, slot))
 				continue;
 
 			LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);
@@ -1021,7 +1027,8 @@ static void
 heapgettup_pagemode(HeapScanDesc scan,
 					ScanDirection dir,
 					int nkeys,
-					ScanKey key)
+					ScanKey key,
+					TupleTableSlot *slot)
 {
 	HeapTuple	tuple = &(scan->rs_ctup);
 	Page		page;
@@ -1083,10 +1090,13 @@ continue_page:
 			tuple->t_len = ItemIdGetLength(lpp);
 			ItemPointerSetOffsetNumber(&tuple->t_self, lineoff);
 
+			if (slot)
+				ExecStoreBufferHeapTuple(&scan->rs_ctup, slot, scan->rs_cbuf);
+
 			/* skip any tuples that don't match the scan key */
 			if (key != NULL &&
 				!HeapKeyTest(tuple, RelationGetDescr(scan->rs_base.rs_rd),
-							 nkeys, key))
+							 nkeys, key, slot))
 				continue;
 
 			scan->rs_cindex = lineindex;
@@ -1388,10 +1398,10 @@ heap_getnext(TableScanDesc sscan, ScanDirection direction)
 
 	if (scan->rs_base.rs_flags & SO_ALLOW_PAGEMODE)
 		heapgettup_pagemode(scan, direction,
-							scan->rs_base.rs_nkeys, scan->rs_base.rs_key);
+							scan->rs_base.rs_nkeys, scan->rs_base.rs_key, NULL);
 	else
 		heapgettup(scan, direction,
-				   scan->rs_base.rs_nkeys, scan->rs_base.rs_key);
+				   scan->rs_base.rs_nkeys, scan->rs_base.rs_key, NULL);
 
 	if (scan->rs_ctup.t_data == NULL)
 		return NULL;
@@ -1414,9 +1424,9 @@ heap_getnextslot(TableScanDesc sscan, ScanDirection direction, TupleTableSlot *s
 	/* Note: no locking manipulations needed */
 
 	if (sscan->rs_flags & SO_ALLOW_PAGEMODE)
-		heapgettup_pagemode(scan, direction, sscan->rs_nkeys, sscan->rs_key);
+		heapgettup_pagemode(scan, direction, sscan->rs_nkeys, sscan->rs_key, slot);
 	else
-		heapgettup(scan, direction, sscan->rs_nkeys, sscan->rs_key);
+		heapgettup(scan, direction, sscan->rs_nkeys, sscan->rs_key, slot);
 
 	if (scan->rs_ctup.t_data == NULL)
 	{
@@ -1431,8 +1441,6 @@ heap_getnextslot(TableScanDesc sscan, ScanDirection direction, TupleTableSlot *s
 
 	pgstat_count_heap_getnext(scan->rs_base.rs_rd);
 
-	ExecStoreBufferHeapTuple(&scan->rs_ctup, slot,
-							 scan->rs_cbuf);
 	return true;
 }
 
@@ -1521,9 +1529,9 @@ heap_getnextslot_tidrange(TableScanDesc sscan, ScanDirection direction,
 	for (;;)
 	{
 		if (sscan->rs_flags & SO_ALLOW_PAGEMODE)
-			heapgettup_pagemode(scan, direction, sscan->rs_nkeys, sscan->rs_key);
+			heapgettup_pagemode(scan, direction, sscan->rs_nkeys, sscan->rs_key, slot);
 		else
-			heapgettup(scan, direction, sscan->rs_nkeys, sscan->rs_key);
+			heapgettup(scan, direction, sscan->rs_nkeys, sscan->rs_key, slot);
 
 		if (scan->rs_ctup.t_data == NULL)
 		{
@@ -1579,7 +1587,6 @@ heap_getnextslot_tidrange(TableScanDesc sscan, ScanDirection direction,
 	 */
 	pgstat_count_heap_getnext(scan->rs_base.rs_rd);
 
-	ExecStoreBufferHeapTuple(&scan->rs_ctup, slot, scan->rs_cbuf);
 	return true;
 }
 
diff --git a/src/include/access/valid.h b/src/include/access/valid.h
index 8b33089dac4..2f272294784 100644
--- a/src/include/access/valid.h
+++ b/src/include/access/valid.h
@@ -18,6 +18,7 @@
 #include "access/htup_details.h"
 #include "access/skey.h"
 #include "access/tupdesc.h"
+#include "executor/tuptable.h"
 
 /*
  *		HeapKeyTest
@@ -25,7 +26,8 @@
  *		Test a heap tuple to see if it satisfies a scan key.
  */
 static inline bool
-HeapKeyTest(HeapTuple tuple, TupleDesc tupdesc, int nkeys, ScanKey keys)
+HeapKeyTest(HeapTuple tuple, TupleDesc tupdesc, int nkeys, ScanKey keys,
+	TupleTableSlot *slot)
 {
 	int			cur_nkeys = nkeys;
 	ScanKey		cur_key = keys;
@@ -39,7 +41,10 @@ HeapKeyTest(HeapTuple tuple, TupleDesc tupdesc, int nkeys, ScanKey keys)
 		if (cur_key->sk_flags & SK_ISNULL)
 			return false;
 
-		atp = heap_getattr(tuple, cur_key->sk_attno, tupdesc, &isnull);
+		if (slot)
+			atp = slot_getattr(slot, cur_key->sk_attno, &isnull);
+		else
+			atp = heap_getattr(tuple, cur_key->sk_attno, tupdesc, &isnull);
 
 		if (isnull)
 			return false;
-- 
2.50.1 (Apple Git-155)

#15Andres Freund
andres@anarazel.de
In reply to: Maxime Schoemans (#14)
Re: Qual push down to table AM

Hi,

On 2025-12-18 20:40:31 +0100, Maxime Schoemans wrote:

On 10 Dec 2025, at 00:08, Andres Freund <andres@anarazel.de> wrote:
I don't really see this being viable without first tackling two nontrivial
projects:

2) Perform ScanKey evaluation in slot form, to be able to cache the deforming
and to make deforming of multiple columns sufficiently efficient.

Am I right in understanding that you think that the repeated calls to
heap_getattr in HeapKeyTest is not ideal if we have NULL or varlena
columns?

That's part of it, but not all of it: the other aspect is that if you do a
bunch of heap_getattr()s inside HeapKeyTest() and then project those same
columns in nodeSeqscan.c, you'll do the deforming work twice.

1) Make slot deforming for expressions & projections selective, i.e. don't
deform all the leading columns, but only ones that will eventually be
needed

Concerning 1), I’m also not certain I understand why this is a prerequisite
for the pushdown work. It could certainly be beneficial, but it seems to be
complementary.

As hinted at in [1], I suspect that you're just not going to see big enough
wins without the above optimization. A decent portion of the win from using
HeapKeyTest comes from deforming only selectively.

Greetings,

Andres Freund

[1]: /messages/by-id/CAAh00EQUwG5khqJO7nSV0nsqsG1OP=kA6ACfxV3rnNSVd4b6TQ@mail.gmail.com