Some ExecSeqScan optimizations

Started by Amit Langote about 1 year ago, 27 messages

amitlangote09@gmail.com

about 1 year ago

1 attachment(s)

Hi,

I’ve been looking into possible optimizations for ExecSeqScan() and
chatting with colleagues about cases where it shows up prominently in
analytics-style query plans.

For example, in queries like SELECT agg(somecol) FROM big_table WHERE
<condition>, ExecScan() often dominates the profile. Digging into it,
I found two potential sources of overhead:

1. Run-time NULL checks of PlanState.qual and PlanState.ps_ProjInfo:
these checks are repeated on every call, which seems unnecessary given
that the values are already known at plan init time.

2. Overhead from ExecScanFetch() when EvalPlanQual() isn’t relevant:
Andres pointed out that ExecScanFetch() introduces unnecessary
overhead even in the common case where EvalPlanQual() isn’t
applicable.

To address (1), I tried assigning specialized functions to
PlanState.ExecProcNode in ExecInitSeqScan() based on whether qual or
projInfo are NULL. Inspired by David Rowley’s suggestion to look at
ExecHashJoinImpl(), I wrote variants like ExecSeqScanNoQual() (for
qual == NULL) and ExecSeqScanNoProj() (for projInfo == NULL). These
call a local version of ExecScan() that lives in nodeSeqscan.c, marked
always-inline. This local copy takes qual and projInfo as arguments,
letting compilers inline and optimize unnecessary branches away.
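A minimal, self-contained sketch of the specialization pattern described above, written against toy types rather than the real executor. Everything here (ToyScan, ToyQual, ToyProj and the toy_* functions) is hypothetical and only mirrors the shape of ExecSeqScanNoEPQImpl() and its wrappers: one always-inline generic loop takes qual and projection as arguments, thin wrappers pass constant NULLs where they apply, and an init function picks the wrapper once, the way ExecInitSeqScan() sets ExecProcNode.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy stand-ins; far simpler than PlanState/ExprState/ProjectionInfo. */
typedef struct ToyQual { int min_value; } ToyQual;	/* keep rows >= min_value */
typedef struct ToyProj { int multiplier; } ToyProj;	/* output row * multiplier */

typedef struct ToyScan ToyScan;
typedef int (*ToyProcNode) (ToyScan *scan);	/* returns -1 at end of scan */

struct ToyScan
{
	const int  *rows;			/* rows are assumed non-negative */
	size_t		nrows;
	size_t		pos;
	ToyQual    *qual;
	ToyProj    *proj;
	ToyProcNode exec_proc_node;	/* chosen once, at "init" time */
};

#if defined(__GNUC__) || defined(__clang__)
#define toy_always_inline inline __attribute__((always_inline))
#else
#define toy_always_inline inline
#endif

/* Generic loop: with a constant NULL qual/proj, inlining lets the compiler drop that branch. */
static toy_always_inline int
toy_scan_impl(ToyScan *scan, const ToyQual *qual, const ToyProj *proj)
{
	while (scan->pos < scan->nrows)
	{
		int			row = scan->rows[scan->pos++];

		if (qual != NULL && row < qual->min_value)
			continue;			/* fails qual, fetch the next row */
		return proj != NULL ? row * proj->multiplier : row;
	}
	return -1;					/* end of scan */
}

/* One thin wrapper per combination, like ExecSeqScanNoQual() and friends. */
static int
toy_scan_plain(ToyScan *scan)
{
	assert(scan->qual == NULL && scan->proj == NULL);
	return toy_scan_impl(scan, NULL, NULL);
}

static int
toy_scan_qual(ToyScan *scan)
{
	assert(scan->qual != NULL && scan->proj == NULL);
	return toy_scan_impl(scan, scan->qual, NULL);
}

static int
toy_scan_proj(ToyScan *scan)
{
	assert(scan->qual == NULL && scan->proj != NULL);
	return toy_scan_impl(scan, NULL, scan->proj);
}

static int
toy_scan_qual_proj(ToyScan *scan)
{
	assert(scan->qual != NULL && scan->proj != NULL);
	return toy_scan_impl(scan, scan->qual, scan->proj);
}

/* Init-time dispatch, analogous to choosing ExecProcNode in ExecInitSeqScan(). */
static void
toy_scan_init(ToyScan *scan, const int *rows, size_t nrows,
			  ToyQual *qual, ToyProj *proj)
{
	scan->rows = rows;
	scan->nrows = nrows;
	scan->pos = 0;
	scan->qual = qual;
	scan->proj = proj;
	if (qual == NULL)
		scan->exec_proc_node = (proj == NULL) ? toy_scan_plain : toy_scan_proj;
	else
		scan->exec_proc_node = (proj == NULL) ? toy_scan_qual : toy_scan_qual_proj;
}
```

Whether the compiler actually folds the NULL tests away depends on the compiler and optimization level; inspecting the generated assembly is the way to confirm it for a given build.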

For (2), the local ExecScan() copy avoids the generic ExecScanFetch()
logic, simplifying things further when EvalPlanQual() doesn’t apply.
That has the additional benefit of allowing SeqNext() to be called
directly instead of via an indirect function pointer. This reduces the
overhead of indirect calls and enables better compiler optimizations
like inlining.
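To illustrate the indirect-call point, here is a hedged toy comparison (ToyCursor, toy_next and the toy_sum_* functions are hypothetical). The generic version receives its access method as a runtime function pointer, as ExecScan()/ExecScanFetch() receive SeqNext(); the specialized version names the access method directly, as the patch's local ExecScan() copy does, so the compiler is free to inline it into the loop.

```c
#include <stddef.h>

typedef struct ToyCursor
{
	const int  *rows;			/* rows are assumed non-negative */
	size_t		nrows;
	size_t		pos;
} ToyCursor;

/* Hypothetical access method: returns the next row, or -1 at end of scan. */
typedef int (*ToyAccessMtd) (ToyCursor *cur);

static int
toy_next(ToyCursor *cur)
{
	return cur->pos < cur->nrows ? cur->rows[cur->pos++] : -1;
}

/*
 * Generic shape: 'access' is only known at run time, so each row costs an
 * indirect call unless the compiler can prove which function it points to.
 */
static int
toy_sum_generic(ToyCursor *cur, ToyAccessMtd access)
{
	int			sum = 0;
	int			row;

	while ((row = access(cur)) >= 0)
		sum += row;
	return sum;
}

/* Specialized shape: toy_next() is called directly and can be inlined. */
static int
toy_sum_direct(ToyCursor *cur)
{
	int			sum = 0;
	int			row;

	while ((row = toy_next(cur)) >= 0)
		sum += row;
	return sum;
}
```

Both functions compute the same result; the difference is only in what the compiler can see. If the generic function is itself inlined into a caller that passes a constant function pointer, compilers can often resolve the call as well.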

Junwang Zhao helped create a benchmark to test the patch; the results
are in the spreadsheet at [1]. They show that the patch generally
reduces the latency of queries of the shape
`SELECT agg(somecol or *) FROM big_table WHERE <condition>`, with
improvements of up to 5% in some cases.

Would love to hear thoughts.

--
Thanks, Amit Langote

[1]: https://docs.google.com/spreadsheets/d/1AsJOUgIfSsYIJUJwbXk4aO9FVOFOrBCvrfmdQYkHIw4/edit?usp=sharing

Attachments:

v1-0001-Introduce-optimized-ExecSeqScan-variants-for-tail.patch (application/octet-stream)

From e5d89d41f7ffcf88ca5542fda3fcee92c2bb8b63 Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Fri, 20 Dec 2024 20:33:02 +0900
Subject: [PATCH v1] Introduce optimized ExecSeqScan variants for tailored
 execution

This commit introduces optimized execution variants for ExecSeqScan,
tailored to specific combinations of conditions like the presence of
EvalPlanQual(), qualifiers (qual), and projections (ps_ProjInfo).
Instead of relying on a single generic execution function (ExecScan()),
these variants remove unnecessary runtime checks by specializing
execution for the specific needs of the plan.

To enable inlining, this commit creates a copy of ExecScan() that is
local to nodeSeqScan.c. This change also allows SeqNext() to be
called directly, avoiding the function pointer mechanism required by
ExecScan()'s generic interface. This reduces the overhead of indirect
function calls and enables better compiler optimizations.

Benchmarks performed with Junwang Zhao's help show that this patch
improves the latency of sequential scans by up to 5% in some cases,
particularly for analytical-style queries like
SELECT COUNT(*) FROM large_table WHERE <condition>

Reviewed-by: Junwang Zhao
Tested-by: Junwang Zhao
---
 src/backend/executor/nodeSeqscan.c | 167 ++++++++++++++++++++++++++++-
 1 file changed, 166 insertions(+), 1 deletion(-)

diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 7cb12a11c2..1bc0ac249f 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -31,9 +31,17 @@
 #include "access/tableam.h"
 #include "executor/executor.h"
 #include "executor/nodeSeqscan.h"
+#include "miscadmin.h"
 #include "utils/rel.h"
 
 static TupleTableSlot *SeqNext(SeqScanState *node);
+static TupleTableSlot *ExecSeqScanNoQualNoProj(PlanState *pstate);
+static TupleTableSlot *ExecSeqScanNoQual(PlanState *pstate);
+static TupleTableSlot *ExecSeqScanNoProj(PlanState *pstate);
+static TupleTableSlot *ExecSeqScanNoEPQ(PlanState *pstate);
+static pg_attribute_always_inline TupleTableSlot *ExecSeqScanNoEPQImpl(PlanState *pstate,
+																	   ExprState *qual,
+																	   ProjectionInfo *projInfo);
 
 /* ----------------------------------------------------------------
  *						Scan Support
@@ -114,6 +122,147 @@ ExecSeqScan(PlanState *pstate)
 					(ExecScanRecheckMtd) SeqRecheck);
 }
 
+/*
+ * ExecSeqScanNoEPQImpl
+ *		A specialized version of ExecScan() for use in plans where
+ *		ExecInitSeqScan() determines that the node cannot be invoked by
+ *		EvalPlanQual().
+ *
+ * This function is marked with the always-inline attribute to allow compilers
+ * to create specialized versions for different combinations of qual (NULL or
+ * non-NULL) and projInfo (NULL or non-NULL). By inlining, unnecessary branches
+ * can be eliminated in each of the ExecSeqScan*() functions below.
+ *
+ * ExecInitSeqScan() assigns the appropriate variant to pstate->ExecProcNode of
+ * the SeqScan node, minimizing execution-time overhead for checking qual and
+ * projInfo.
+ */
+static pg_attribute_always_inline TupleTableSlot *
+ExecSeqScanNoEPQImpl(PlanState *pstate, ExprState *qual,
+					 ProjectionInfo *projInfo)
+{
+	SeqScanState *node = castNode(SeqScanState, pstate);
+	ExprContext *econtext;
+
+	econtext = node->ss.ps.ps_ExprContext;
+
+	/*
+	 * If we have neither a qual to check nor a projection to do, just skip
+	 * all the overhead and return the raw scan tuple.
+	 */
+	if (!qual && !projInfo)
+	{
+		ResetExprContext(econtext);
+		return SeqNext(node);
+	}
+
+	/*
+	 * Reset per-tuple memory context to free any expression evaluation
+	 * storage allocated in the previous tuple cycle.
+	 */
+	ResetExprContext(econtext);
+
+	/*
+	 * get a tuple from the access method.  Loop until we obtain a tuple that
+	 * passes the qualification.
+	 */
+	for (;;)
+	{
+		TupleTableSlot *slot;
+
+		CHECK_FOR_INTERRUPTS();
+
+		slot = SeqNext(node);
+
+		/*
+		 * if the slot returned by the accessMtd contains NULL, then it means
+		 * there is nothing more to scan so we just return an empty slot,
+		 * being careful to use the projection result slot so it has correct
+		 * tupleDesc.
+		 */
+		if (TupIsNull(slot))
+		{
+			if (projInfo)
+				return ExecClearTuple(projInfo->pi_state.resultslot);
+			else
+				return slot;
+		}
+
+		/*
+		 * place the current tuple into the expr context
+		 */
+		econtext->ecxt_scantuple = slot;
+
+		/*
+		 * check that the current tuple satisfies the qual-clause
+		 *
+		 * check for non-null qual here to avoid a function call to ExecQual()
+		 * when the qual is null ... saves only a few cycles, but they add up
+		 * ...
+		 */
+		if (qual == NULL || ExecQual(qual, econtext))
+		{
+			/*
+			 * Found a satisfactory scan tuple.
+			 */
+			if (projInfo)
+			{
+				/*
+				 * Form a projection tuple, store it in the result tuple slot
+				 * and return it.
+				 */
+				return ExecProject(projInfo);
+			}
+			else
+			{
+				/*
+				 * Here, we aren't projecting, so just return scan tuple.
+				 */
+				return slot;
+			}
+		}
+		else
+			InstrCountFiltered1(node, 1);
+
+		/*
+		 * Tuple fails qual, so free per-tuple memory and try again.
+		 */
+		ResetExprContext(econtext);
+	}
+}
+
+/*
+ * Variants of ExecSeqScan() used when no EvalPlanQual() is necessary,
+ * specialized based on the presence of qual and projection as described in
+ * the comment above ExecSeqScanNoEPQImpl().
+ */
+static TupleTableSlot *
+ExecSeqScanNoQualNoProj(PlanState *pstate)
+{
+	Assert(pstate->qual == NULL && pstate->ps_ProjInfo == NULL);
+	return ExecSeqScanNoEPQImpl(pstate, NULL, NULL);
+}
+
+static TupleTableSlot *
+ExecSeqScanNoQual(PlanState *pstate)
+{
+	Assert(pstate->qual == NULL);
+	return ExecSeqScanNoEPQImpl(pstate, NULL, pstate->ps_ProjInfo);
+}
+
+static TupleTableSlot *
+ExecSeqScanNoProj(PlanState *pstate)
+{
+	Assert(pstate->ps_ProjInfo == NULL);
+	return ExecSeqScanNoEPQImpl(pstate, pstate->qual, NULL);
+}
+
+static TupleTableSlot *
+ExecSeqScanNoEPQ(PlanState *pstate)
+{
+	Assert(pstate->qual != NULL && pstate->ps_ProjInfo != NULL);
+	return ExecSeqScanNoEPQImpl(pstate, pstate->qual, pstate->ps_ProjInfo);
+}
 
 /* ----------------------------------------------------------------
  *		ExecInitSeqScan
@@ -137,7 +286,6 @@ ExecInitSeqScan(SeqScan *node, EState *estate, int eflags)
 	scanstate = makeNode(SeqScanState);
 	scanstate->ss.ps.plan = (Plan *) node;
 	scanstate->ss.ps.state = estate;
-	scanstate->ss.ps.ExecProcNode = ExecSeqScan;
 
 	/*
 	 * Miscellaneous initialization
@@ -171,6 +319,23 @@ ExecInitSeqScan(SeqScan *node, EState *estate, int eflags)
 	scanstate->ss.ps.qual =
 		ExecInitQual(node->scan.plan.qual, (PlanState *) scanstate);
 
+	/*
+	 * When EvalPlanQual() is not in use, assign ExecProcNode for this node
+	 * based on the presence of qual and projection. Each ExecSeqScan*()
+	 * variant is optimized for the specific combination of these conditions.
+	 */
+	if (estate->es_epq_active != NULL)
+		scanstate->ss.ps.ExecProcNode = ExecSeqScan;
+	else if (scanstate->ss.ps.qual == NULL &&
+			 scanstate->ss.ps.ps_ProjInfo == NULL)
+		scanstate->ss.ps.ExecProcNode = ExecSeqScanNoQualNoProj;
+	else if (scanstate->ss.ps.qual == NULL)
+		scanstate->ss.ps.ExecProcNode = ExecSeqScanNoQual;
+	else if (scanstate->ss.ps.ps_ProjInfo == NULL)
+		scanstate->ss.ps.ExecProcNode = ExecSeqScanNoProj;
+	else
+		scanstate->ss.ps.ExecProcNode = ExecSeqScanNoEPQ;
+
 	return scanstate;
 }
 
-- 
2.43.0

David Rowley

dgrowleyml@gmail.com

about 1 year ago

In reply to: Amit Langote (#1)

1 attachment(s)

Re: Some ExecSeqScan optimizations

On Sat, 21 Dec 2024 at 00:41, Amit Langote <amitlangote09@gmail.com> wrote:

To address (1), I tried assigning specialized functions to
PlanState.ExecProcNode in ExecInitSeqScan() based on whether qual or
projInfo are NULL. Inspired by David Rowley’s suggestion to look at
ExecHashJoinImpl(), I wrote variants like ExecSeqScanNoQual() (for
qual == NULL) and ExecSeqScanNoProj() (for projInfo == NULL). These
call a local version of ExecScan() that lives in nodeSeqscan.c, marked
always-inline. This local copy takes qual and projInfo as arguments,
letting compilers inline and optimize unnecessary branches away.

I tested the performance of this and I do see close to a 5%
performance increase in TPC-H Q1. Nice.

I'm a little concerned with the method the patch takes where it copies
most of ExecScan and includes it in nodeSeqscan.c. If there are any
future changes to ExecScan, someone might forget to propagate those
changes into nodeSeqscan.c's version. What if instead you moved
ExecScan() into a header file and made it static inline? That way
callers would get their own inlined copy with the callback functions
inlined too, which for nodeSeqscan is good, since the recheck callback
does nothing.

As an additional reason why I think this might be a better idea: the
patch doesn't quite keep things equivalent. By having
ExecSeqScanNoEPQImpl() call SeqNext() directly rather than going
through ExecScanFetch(), you've lost a call to CHECK_FOR_INTERRUPTS().
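In the v1 patch, the loop in ExecSeqScanNoEPQImpl() does call CHECK_FOR_INTERRUPTS(), but the `!qual && !projInfo` fast path returns SeqNext(node) without one. A hedged toy sketch of why keeping the check inside the fetch helper (as ExecScanFetch() does) avoids this class of omission: every path that fetches a row goes through the helper. TOY_CHECK_FOR_INTERRUPTS and all toy_* names are hypothetical; the macro merely counts calls so the behaviour can be observed.

```c
#include <stddef.h>

/* Hypothetical stand-in for CHECK_FOR_INTERRUPTS(): counts how often it runs. */
static int	toy_interrupt_checks = 0;
#define TOY_CHECK_FOR_INTERRUPTS() do { toy_interrupt_checks++; } while (0)

typedef struct ToyCursor
{
	const int  *rows;			/* rows are assumed non-negative */
	size_t		nrows;
	size_t		pos;
} ToyCursor;

static int
toy_next(ToyCursor *cur)
{
	return cur->pos < cur->nrows ? cur->rows[cur->pos++] : -1;
}

/* The interrupt check lives in the fetch helper, so no caller can skip it. */
static inline int
toy_fetch(ToyCursor *cur)
{
	TOY_CHECK_FOR_INTERRUPTS();
	return toy_next(cur);
}

/* min_value < 0 means "no qual" in this toy; returns -1 at end of scan. */
static int
toy_scan(ToyCursor *cur, int min_value)
{
	if (min_value < 0)
		return toy_fetch(cur);	/* fast path still checks interrupts */

	for (;;)
	{
		int			row = toy_fetch(cur);

		if (row < 0 || row >= min_value)
			return row;
	}
}
```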

On the other hand, one possible drawback from making ExecScan a static
inline is that any non-core code that uses ExecScan won't get any bug
fixes if we were to fix some bug in ExecScan in a minor release unless
the extension is compiled again. That could be fixed by keeping
ExecScan as an extern function and maybe just having ExecScanExtended
as the static inline version.
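A toy sketch of that split, under the assumption that the inline part would live in a header and the wrapper would stay out of line in the backend. toy_scan_extended() plays the role of ExecScanExtended(), toy_scan() the role of an extern ExecScan() that reads the node's fields at run time, and toy_scan_no_qual() a core caller that passes a constant NULL; all names are hypothetical.

```c
#include <stddef.h>

typedef struct ToyScan
{
	const int  *rows;			/* rows are assumed non-negative */
	size_t		nrows;
	size_t		pos;
	const int  *qual_min;		/* NULL means "no qual" */
} ToyScan;

/* Would be the header's declaration of the out-of-line entry point. */
int			toy_scan(ToyScan *scan);

/* Would live in a header as static inline and be specialized per call site. */
static inline int
toy_scan_extended(ToyScan *scan, const int *qual_min)
{
	while (scan->pos < scan->nrows)
	{
		int			row = scan->rows[scan->pos++];

		if (qual_min == NULL || row >= *qual_min)
			return row;
	}
	return -1;					/* end of scan */
}

/*
 * Stays an ordinary extern function, so extensions calling it pick up a
 * minor-release bug fix in the inline body without being recompiled.
 */
int
toy_scan(ToyScan *scan)
{
	return toy_scan_extended(scan, scan->qual_min);
}

/* Core caller that knows at init time that there is no qual. */
static int
toy_scan_no_qual(ToyScan *scan)
{
	return toy_scan_extended(scan, NULL);
}
```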

Another thing I wondered about is the naming convention you're using
for these ExecSeqScan variant functions.

+ExecSeqScanNoQualNoProj(PlanState *pstate)
+ExecSeqScanNoQual(PlanState *pstate)
+ExecSeqScanNoProj(PlanState *pstate)
+ExecSeqScanNoEPQ(PlanState *pstate)

I think it's better to have a naming convention that aims to convey
what the function does do rather than what it does not do.

I've attached my workings of what I was messing around with. It seems
to perform about the same as your version. I think we'd probably need
some sort of execScan.h rather than the place I've stuffed the
functions into.

It would also be good if there was some way to give guarantees to the
compiler that a given pointer isn't NULL. For example in:

    return ExecScanExtended(&node->ss,
                            (ExecScanAccessMtd) SeqNext,
                            (ExecScanRecheckMtd) SeqRecheck,
                            NULL,
                            pstate->qual,
                            NULL);

It would be good if when ExecScanExtended is inlined the compiler
wouldn't emit code for the "if (qual == NULL)" ... part. I don't know
if there's any way to do that. I thought I'd mention it in case
someone can think of a way... I guess you could add another parameter
that gets passed as a const and have the "if" test look at that
instead; that's a bit ugly, though.
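A hedged sketch of the "extra const parameter" idea, on toy types (ToyQual and the toy_* functions are hypothetical). Because `has_qual` is always a compile-time constant at each call site, the test inside the inlined loop is on that constant rather than on the pointer, so no NULL check of `qual` survives; the with-qual wrapper carries the caller's promise that `qual` is non-NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct ToyQual { int min_value; } ToyQual;	/* keep rows >= min_value */

#if defined(__GNUC__) || defined(__clang__)
#define toy_always_inline inline __attribute__((always_inline))
#else
#define toy_always_inline inline
#endif

/*
 * 'has_qual' is always passed as a literal. When it is false the qual test
 * short-circuits and 'qual' is never dereferenced; when it is true the
 * caller guarantees qual != NULL. Returns -1 at end of scan.
 */
static toy_always_inline int
toy_scan_impl(const int *rows, size_t nrows, size_t *pos,
			  const ToyQual *qual, const bool has_qual)
{
	while (*pos < nrows)
	{
		int			row = rows[(*pos)++];

		if (!has_qual || row >= qual->min_value)
			return row;
	}
	return -1;
}

static int
toy_scan_with_qual(const int *rows, size_t nrows, size_t *pos,
				   const ToyQual *qual)
{
	assert(qual != NULL);		/* the promise behind has_qual = true */
	return toy_scan_impl(rows, nrows, pos, qual, true);
}

static int
toy_scan_no_qual(const int *rows, size_t nrows, size_t *pos)
{
	return toy_scan_impl(rows, nrows, pos, NULL, false);
}
```

An alternative worth testing would be PostgreSQL's pg_unreachable() from c.h (which, in non-assert builds on GCC and Clang, expands to __builtin_unreachable()) placed behind `if (qual == NULL)` in a specialized wrapper; whether the optimizer then drops the pointer test is compiler-dependent.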

David

Attachments:

inline_ExecScan.patch.txt (text/plain; charset=US-ASCII)

diff --git a/src/backend/executor/execScan.c b/src/backend/executor/execScan.c
index 556a5d98e7..31e028dc84 100644
--- a/src/backend/executor/execScan.c
+++ b/src/backend/executor/execScan.c
@@ -21,238 +21,6 @@
 #include "executor/executor.h"
 #include "miscadmin.h"
 
-
-
-/*
- * ExecScanFetch -- check interrupts & fetch next potential tuple
- *
- * This routine is concerned with substituting a test tuple if we are
- * inside an EvalPlanQual recheck.  If we aren't, just execute
- * the access method's next-tuple routine.
- */
-static inline TupleTableSlot *
-ExecScanFetch(ScanState *node,
-			  ExecScanAccessMtd accessMtd,
-			  ExecScanRecheckMtd recheckMtd)
-{
-	EState	   *estate = node->ps.state;
-
-	CHECK_FOR_INTERRUPTS();
-
-	if (estate->es_epq_active != NULL)
-	{
-		EPQState   *epqstate = estate->es_epq_active;
-
-		/*
-		 * We are inside an EvalPlanQual recheck.  Return the test tuple if
-		 * one is available, after rechecking any access-method-specific
-		 * conditions.
-		 */
-		Index		scanrelid = ((Scan *) node->ps.plan)->scanrelid;
-
-		if (scanrelid == 0)
-		{
-			/*
-			 * This is a ForeignScan or CustomScan which has pushed down a
-			 * join to the remote side.  The recheck method is responsible not
-			 * only for rechecking the scan/join quals but also for storing
-			 * the correct tuple in the slot.
-			 */
-
-			TupleTableSlot *slot = node->ss_ScanTupleSlot;
-
-			if (!(*recheckMtd) (node, slot))
-				ExecClearTuple(slot);	/* would not be returned by scan */
-			return slot;
-		}
-		else if (epqstate->relsubs_done[scanrelid - 1])
-		{
-			/*
-			 * Return empty slot, as either there is no EPQ tuple for this rel
-			 * or we already returned it.
-			 */
-
-			TupleTableSlot *slot = node->ss_ScanTupleSlot;
-
-			return ExecClearTuple(slot);
-		}
-		else if (epqstate->relsubs_slot[scanrelid - 1] != NULL)
-		{
-			/*
-			 * Return replacement tuple provided by the EPQ caller.
-			 */
-
-			TupleTableSlot *slot = epqstate->relsubs_slot[scanrelid - 1];
-
-			Assert(epqstate->relsubs_rowmark[scanrelid - 1] == NULL);
-
-			/* Mark to remember that we shouldn't return it again */
-			epqstate->relsubs_done[scanrelid - 1] = true;
-
-			/* Return empty slot if we haven't got a test tuple */
-			if (TupIsNull(slot))
-				return NULL;
-
-			/* Check if it meets the access-method conditions */
-			if (!(*recheckMtd) (node, slot))
-				return ExecClearTuple(slot);	/* would not be returned by
-												 * scan */
-			return slot;
-		}
-		else if (epqstate->relsubs_rowmark[scanrelid - 1] != NULL)
-		{
-			/*
-			 * Fetch and return replacement tuple using a non-locking rowmark.
-			 */
-
-			TupleTableSlot *slot = node->ss_ScanTupleSlot;
-
-			/* Mark to remember that we shouldn't return more */
-			epqstate->relsubs_done[scanrelid - 1] = true;
-
-			if (!EvalPlanQualFetchRowMark(epqstate, scanrelid, slot))
-				return NULL;
-
-			/* Return empty slot if we haven't got a test tuple */
-			if (TupIsNull(slot))
-				return NULL;
-
-			/* Check if it meets the access-method conditions */
-			if (!(*recheckMtd) (node, slot))
-				return ExecClearTuple(slot);	/* would not be returned by
-												 * scan */
-			return slot;
-		}
-	}
-
-	/*
-	 * Run the node-type-specific access method function to get the next tuple
-	 */
-	return (*accessMtd) (node);
-}
-
-/* ----------------------------------------------------------------
- *		ExecScan
- *
- *		Scans the relation using the 'access method' indicated and
- *		returns the next qualifying tuple.
- *		The access method returns the next tuple and ExecScan() is
- *		responsible for checking the tuple returned against the qual-clause.
- *
- *		A 'recheck method' must also be provided that can check an
- *		arbitrary tuple of the relation against any qual conditions
- *		that are implemented internal to the access method.
- *
- *		Conditions:
- *		  -- the "cursor" maintained by the AMI is positioned at the tuple
- *			 returned previously.
- *
- *		Initial States:
- *		  -- the relation indicated is opened for scanning so that the
- *			 "cursor" is positioned before the first qualifying tuple.
- * ----------------------------------------------------------------
- */
-TupleTableSlot *
-ExecScan(ScanState *node,
-		 ExecScanAccessMtd accessMtd,	/* function returning a tuple */
-		 ExecScanRecheckMtd recheckMtd)
-{
-	ExprContext *econtext;
-	ExprState  *qual;
-	ProjectionInfo *projInfo;
-
-	/*
-	 * Fetch data from node
-	 */
-	qual = node->ps.qual;
-	projInfo = node->ps.ps_ProjInfo;
-	econtext = node->ps.ps_ExprContext;
-
-	/* interrupt checks are in ExecScanFetch */
-
-	/*
-	 * If we have neither a qual to check nor a projection to do, just skip
-	 * all the overhead and return the raw scan tuple.
-	 */
-	if (!qual && !projInfo)
-	{
-		ResetExprContext(econtext);
-		return ExecScanFetch(node, accessMtd, recheckMtd);
-	}
-
-	/*
-	 * Reset per-tuple memory context to free any expression evaluation
-	 * storage allocated in the previous tuple cycle.
-	 */
-	ResetExprContext(econtext);
-
-	/*
-	 * get a tuple from the access method.  Loop until we obtain a tuple that
-	 * passes the qualification.
-	 */
-	for (;;)
-	{
-		TupleTableSlot *slot;
-
-		slot = ExecScanFetch(node, accessMtd, recheckMtd);
-
-		/*
-		 * if the slot returned by the accessMtd contains NULL, then it means
-		 * there is nothing more to scan so we just return an empty slot,
-		 * being careful to use the projection result slot so it has correct
-		 * tupleDesc.
-		 */
-		if (TupIsNull(slot))
-		{
-			if (projInfo)
-				return ExecClearTuple(projInfo->pi_state.resultslot);
-			else
-				return slot;
-		}
-
-		/*
-		 * place the current tuple into the expr context
-		 */
-		econtext->ecxt_scantuple = slot;
-
-		/*
-		 * check that the current tuple satisfies the qual-clause
-		 *
-		 * check for non-null qual here to avoid a function call to ExecQual()
-		 * when the qual is null ... saves only a few cycles, but they add up
-		 * ...
-		 */
-		if (qual == NULL || ExecQual(qual, econtext))
-		{
-			/*
-			 * Found a satisfactory scan tuple.
-			 */
-			if (projInfo)
-			{
-				/*
-				 * Form a projection tuple, store it in the result tuple slot
-				 * and return it.
-				 */
-				return ExecProject(projInfo);
-			}
-			else
-			{
-				/*
-				 * Here, we aren't projecting, so just return scan tuple.
-				 */
-				return slot;
-			}
-		}
-		else
-			InstrCountFiltered1(node, 1);
-
-		/*
-		 * Tuple fails qual, so free per-tuple memory and try again.
-		 */
-		ResetExprContext(econtext);
-	}
-}
-
 /*
  * ExecAssignScanProjectionInfo
  *		Set up projection info for a scan node, if necessary.
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index fa2d522b25..c150224f2e 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -99,9 +99,10 @@ SeqRecheck(SeqScanState *node, TupleTableSlot *slot)
  *		ExecSeqScan(node)
  *
  *		Scans the relation sequentially and returns the next qualifying
- *		tuple.
- *		We call the ExecScan() routine and pass it the appropriate
- *		access method functions.
+ *		tuple. This variant is used when there is no es_epq_active, no qual
+ *		and no projection.  Passing const-NULLs for these to ExecScanExtended
+ *		allows the compiler to eliminate the additional code that would
+ *		ordinarily be required for evaluation of these.
  * ----------------------------------------------------------------
  */
 static TupleTableSlot *
@@ -109,12 +110,94 @@ ExecSeqScan(PlanState *pstate)
 {
 	SeqScanState *node = castNode(SeqScanState, pstate);
 
+	Assert(pstate->state->es_epq_active == NULL);
+	Assert(pstate->qual == NULL);
+	Assert(pstate->ps_ProjInfo == NULL);
+
+	return ExecScanExtended(&node->ss,
+							(ExecScanAccessMtd) SeqNext,
+							(ExecScanRecheckMtd) SeqRecheck,
+							NULL,
+							NULL,
+							NULL);
+}
+
+/*
+ * Variant of ExecSeqScan() for when qual evaluation is required.
+ */
+static TupleTableSlot *
+ExecSeqScanWithQual(PlanState *pstate)
+{
+	SeqScanState *node = castNode(SeqScanState, pstate);
+
+	Assert(pstate->state->es_epq_active == NULL);
+	Assert(pstate->qual != NULL);
+	Assert(pstate->ps_ProjInfo == NULL);
+
+	return ExecScanExtended(&node->ss,
+							(ExecScanAccessMtd) SeqNext,
+							(ExecScanRecheckMtd) SeqRecheck,
+							NULL,
+							pstate->qual,
+							NULL);
+}
+
+/*
+ * Variant of ExecSeqScan() for when projection is required.
+ */
+static TupleTableSlot *
+ExecSeqScanProject(PlanState *pstate)
+{
+	SeqScanState *node = castNode(SeqScanState, pstate);
+
+	Assert(pstate->state->es_epq_active == NULL);
+	Assert(pstate->qual == NULL);
+	Assert(pstate->ps_ProjInfo != NULL);
+
+	return ExecScanExtended(&node->ss,
+							(ExecScanAccessMtd) SeqNext,
+							(ExecScanRecheckMtd) SeqRecheck,
+							NULL,
+							NULL,
+							pstate->ps_ProjInfo);
+}
+
+/*
+ * Variant of ExecSeqScan() for when both qual evaluation and projection
+ * are required.
+ */
+static TupleTableSlot *
+ExecSeqScanWithQualProject(PlanState *pstate)
+{
+	SeqScanState *node = castNode(SeqScanState, pstate);
+
+	Assert(pstate->state->es_epq_active == NULL);
+	Assert(pstate->qual != NULL);
+	Assert(pstate->ps_ProjInfo != NULL);
+
+	return ExecScanExtended(&node->ss,
+							(ExecScanAccessMtd) SeqNext,
+							(ExecScanRecheckMtd) SeqRecheck,
+							NULL,
+							pstate->qual,
+							pstate->ps_ProjInfo);
+}
+
+/*
+ * Variant of ExecSeqScan for when EPQ evaluation is required.  We don't
+ * bother adding variants of this for with/without qual and projection as
+ * EPQ doesn't seem as exciting a case to optimize for.
+ */
+static TupleTableSlot *
+ExecSeqScanEPQ(PlanState *pstate)
+{
+	SeqScanState *node = castNode(SeqScanState, pstate);
+
 	return ExecScan(&node->ss,
 					(ExecScanAccessMtd) SeqNext,
 					(ExecScanRecheckMtd) SeqRecheck);
 }
 
-
 /* ----------------------------------------------------------------
  *		ExecInitSeqScan
  * ----------------------------------------------------------------
@@ -137,7 +220,6 @@ ExecInitSeqScan(SeqScan *node, EState *estate, int eflags)
 	scanstate = makeNode(SeqScanState);
 	scanstate->ss.ps.plan = (Plan *) node;
 	scanstate->ss.ps.state = estate;
-	scanstate->ss.ps.ExecProcNode = ExecSeqScan;
 
 	/*
 	 * Miscellaneous initialization
@@ -171,6 +253,28 @@ ExecInitSeqScan(SeqScan *node, EState *estate, int eflags)
 	scanstate->ss.ps.qual =
 		ExecInitQual(node->scan.plan.qual, (PlanState *) scanstate);
 
+	/*
+	 * When EvalPlanQual() is not in use, assign ExecProcNode for this node
+	 * based on the presence of qual and projection. Each ExecSeqScan*()
+	 * variant is optimized for the specific combination of these conditions.
+	 */
+	if (scanstate->ss.ps.state->es_epq_active != NULL)
+		scanstate->ss.ps.ExecProcNode = ExecSeqScanEPQ;
+	else if (scanstate->ss.ps.qual == NULL)
+	{
+		if (scanstate->ss.ps.ps_ProjInfo == NULL)
+			scanstate->ss.ps.ExecProcNode = ExecSeqScan;
+		else
+			scanstate->ss.ps.ExecProcNode = ExecSeqScanProject;
+	}
+	else
+	{
+		if (scanstate->ss.ps.ps_ProjInfo == NULL)
+			scanstate->ss.ps.ExecProcNode = ExecSeqScanWithQual;
+		else
+			scanstate->ss.ps.ExecProcNode = ExecSeqScanWithQualProject;
+	}
+
 	return scanstate;
 }
 
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index f8a8d03e53..940fcb7789 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -18,6 +18,7 @@
 #include "fmgr.h"
 #include "nodes/lockoptions.h"
 #include "nodes/parsenodes.h"
+#include "miscadmin.h"
 #include "utils/memutils.h"
 
 
@@ -486,8 +487,6 @@ extern Datum ExecMakeFunctionResultSet(SetExprState *fcache,
 typedef TupleTableSlot *(*ExecScanAccessMtd) (ScanState *node);
 typedef bool (*ExecScanRecheckMtd) (ScanState *node, TupleTableSlot *slot);
 
-extern TupleTableSlot *ExecScan(ScanState *node, ExecScanAccessMtd accessMtd,
-								ExecScanRecheckMtd recheckMtd);
 extern void ExecAssignScanProjectionInfo(ScanState *node);
 extern void ExecAssignScanProjectionInfoWithVarno(ScanState *node, int varno);
 extern void ExecScanReScan(ScanState *node);
@@ -695,4 +694,280 @@ extern ResultRelInfo *ExecLookupResultRelByOid(ModifyTableState *node,
 											   bool missing_ok,
 											   bool update_cache);
 
+
+/*
+ * inline functions for execScan.c
+ */
+/*
+ * ExecScanFetch -- check interrupts & fetch next potential tuple
+ *
+ * This routine is concerned with substituting a test tuple if we are
+ * inside an EvalPlanQual recheck.  If we aren't, just execute
+ * the access method's next-tuple routine.
+ */
+static pg_attribute_always_inline TupleTableSlot *
+ExecScanFetch(ScanState *node,
+			  EPQState *epqstate,
+			  ExecScanAccessMtd accessMtd,
+			  ExecScanRecheckMtd recheckMtd)
+{
+	CHECK_FOR_INTERRUPTS();
+
+	if (epqstate != NULL)
+	{
+		/*
+		 * We are inside an EvalPlanQual recheck.  Return the test tuple if
+		 * one is available, after rechecking any access-method-specific
+		 * conditions.
+		 */
+		Index		scanrelid = ((Scan *) node->ps.plan)->scanrelid;
+
+		if (scanrelid == 0)
+		{
+			/*
+			 * This is a ForeignScan or CustomScan which has pushed down a
+			 * join to the remote side.  The recheck method is responsible not
+			 * only for rechecking the scan/join quals but also for storing
+			 * the correct tuple in the slot.
+			 */
+
+			TupleTableSlot *slot = node->ss_ScanTupleSlot;
+
+			if (!(*recheckMtd) (node, slot))
+				ExecClearTuple(slot);	/* would not be returned by scan */
+			return slot;
+		}
+		else if (epqstate->relsubs_done[scanrelid - 1])
+		{
+			/*
+			 * Return empty slot, as either there is no EPQ tuple for this rel
+			 * or we already returned it.
+			 */
+
+			TupleTableSlot *slot = node->ss_ScanTupleSlot;
+
+			return ExecClearTuple(slot);
+		}
+		else if (epqstate->relsubs_slot[scanrelid - 1] != NULL)
+		{
+			/*
+			 * Return replacement tuple provided by the EPQ caller.
+			 */
+
+			TupleTableSlot *slot = epqstate->relsubs_slot[scanrelid - 1];
+
+			Assert(epqstate->relsubs_rowmark[scanrelid - 1] == NULL);
+
+			/* Mark to remember that we shouldn't return it again */
+			epqstate->relsubs_done[scanrelid - 1] = true;
+
+			/* Return empty slot if we haven't got a test tuple */
+			if (TupIsNull(slot))
+				return NULL;
+
+			/* Check if it meets the access-method conditions */
+			if (!(*recheckMtd) (node, slot))
+				return ExecClearTuple(slot);	/* would not be returned by
+												 * scan */
+			return slot;
+		}
+		else if (epqstate->relsubs_rowmark[scanrelid - 1] != NULL)
+		{
+			/*
+			 * Fetch and return replacement tuple using a non-locking rowmark.
+			 */
+
+			TupleTableSlot *slot = node->ss_ScanTupleSlot;
+
+			/* Mark to remember that we shouldn't return more */
+			epqstate->relsubs_done[scanrelid - 1] = true;
+
+			if (!EvalPlanQualFetchRowMark(epqstate, scanrelid, slot))
+				return NULL;
+
+			/* Return empty slot if we haven't got a test tuple */
+			if (TupIsNull(slot))
+				return NULL;
+
+			/* Check if it meets the access-method conditions */
+			if (!(*recheckMtd) (node, slot))
+				return ExecClearTuple(slot);	/* would not be returned by
+												 * scan */
+			return slot;
+		}
+	}
+
+	/*
+	 * Run the node-type-specific access method function to get the next tuple
+	 */
+	return (*accessMtd) (node);
+}
+
+/* ----------------------------------------------------------------
+ *		ExecScanExtended
+ *
+ *		Scans the relation using the 'access method' indicated and
+ *		returns the next qualifying tuple.
+ *		The access method returns the next tuple and the tuple is checked
+ *		against the optional 'qual'.
+ *
+ *		A 'recheck method' must also be provided that can check an
+ *		arbitrary tuple of the relation against any qual conditions
+ *		that are implemented internal to the access method.
+ *
+ *		When a non-NULL 'projInfo' is given, qualifying tuples are projected
+ *		using this.
+ *
+ *		This function may be used as an alternative to ExecScan when
+ *		callers don't have a 'qual' or don't have a 'projInfo'.  The inlining
+ *		allows the compiler to eliminate the non-relevant branches, which
+ *		can save having to do run-time checks on every tuple.
+ *
+ *		Conditions:
+ *		  -- the "cursor" maintained by the AMI is positioned at the tuple
+ *			 returned previously.
+ *
+ *		Initial States:
+ *		  -- the relation indicated is opened for scanning so that the
+ *			 "cursor" is positioned before the first qualifying tuple.
+ * ----------------------------------------------------------------
+ */
+static pg_attribute_always_inline TupleTableSlot *
+ExecScanExtended(ScanState *node,
+				 ExecScanAccessMtd accessMtd,	/* function returning a tuple */
+				 ExecScanRecheckMtd recheckMtd,
+				 EPQState *epqstate,
+				 ExprState *qual,
+				 ProjectionInfo *projInfo)
+{
+	ExprContext *econtext = node->ps.ps_ExprContext;
+
+	/* interrupt checks are in ExecScanFetch */
+
+	/*
+	 * If we have neither a qual to check nor a projection to do, just skip
+	 * all the overhead and return the raw scan tuple.
+	 */
+	if (!qual && !projInfo)
+	{
+		ResetExprContext(econtext);
+		return ExecScanFetch(node, epqstate, accessMtd, recheckMtd);
+	}
+
+	/*
+	 * Reset per-tuple memory context to free any expression evaluation
+	 * storage allocated in the previous tuple cycle.
+	 */
+	ResetExprContext(econtext);
+
+	/*
+	 * get a tuple from the access method.  Loop until we obtain a tuple that
+	 * passes the qualification.
+	 */
+	for (;;)
+	{
+		TupleTableSlot *slot;
+
+		slot = ExecScanFetch(node, epqstate, accessMtd, recheckMtd);
+
+		/*
+		 * if the slot returned by the accessMtd contains NULL, then it means
+		 * there is nothing more to scan so we just return an empty slot,
+		 * being careful to use the projection result slot so it has correct
+		 * tupleDesc.
+		 */
+		if (TupIsNull(slot))
+		{
+			if (projInfo)
+				return ExecClearTuple(projInfo->pi_state.resultslot);
+			else
+				return slot;
+		}
+
+		/*
+		 * place the current tuple into the expr context
+		 */
+		econtext->ecxt_scantuple = slot;
+
+		/*
+		 * check that the current tuple satisfies the qual-clause
+		 *
+		 * check for non-null qual here to avoid a function call to ExecQual()
+		 * when the qual is null ... saves only a few cycles, but they add up
+		 * ...
+		 */
+		if (qual == NULL || ExecQual(qual, econtext))
+		{
+			/*
+			 * Found a satisfactory scan tuple.
+			 */
+			if (projInfo)
+			{
+				/*
+				 * Form a projection tuple, store it in the result tuple slot
+				 * and return it.
+				 */
+				return ExecProject(projInfo);
+			}
+			else
+			{
+				/*
+				 * Here, we aren't projecting, so just return scan tuple.
+				 */
+				return slot;
+			}
+		}
+		else
+			InstrCountFiltered1(node, 1);
+
+		/*
+		 * Tuple fails qual, so free per-tuple memory and try again.
+		 */
+		ResetExprContext(econtext);
+	}
+}
+
+/* ----------------------------------------------------------------
+ *		ExecScan
+ *
+ *		Scans the relation using the 'access method' indicated and
+ *		returns the next qualifying tuple.
+ *		The access method returns the next tuple and ExecScan() is
+ *		responsible for checking the tuple returned against the qual-clause.
+ *
+ *		A 'recheck method' must also be provided that can check an
+ *		arbitrary tuple of the relation against any qual conditions
+ *		that are implemented internal to the access method.
+ *
+ *		Conditions:
+ *		  -- the "cursor" maintained by the AMI is positioned at the tuple
+ *			 returned previously.
+ *
+ *		Initial States:
+ *		  -- the relation indicated is opened for scanning so that the
+ *			 "cursor" is positioned before the first qualifying tuple.
+ * ----------------------------------------------------------------
+ */
+static inline TupleTableSlot *
+ExecScan(ScanState *node,
+		 ExecScanAccessMtd accessMtd,	/* function returning a tuple */
+		 ExecScanRecheckMtd recheckMtd)
+
+{
+	EPQState *epqstate;
+	ExprState  *qual;
+	ProjectionInfo *projInfo;
+
+	epqstate = node->ps.state->es_epq_active;
+	qual = node->ps.qual;
+	projInfo = node->ps.ps_ProjInfo;
+
+	return ExecScanExtended(node,
+							accessMtd,
+							recheckMtd,
+							epqstate,
+							qual,
+							projInfo);
+}
+
 #endif							/* EXECUTOR_H  */

Amit Langote

amitlangote09@gmail.com

about 1 year ago

In reply to: David Rowley (#2)

2 attachment(s)

Re: Some ExecSeqScan optimizations

On Mon, Jan 6, 2025 at 10:18 PM David Rowley <dgrowleyml@gmail.com> wrote:

On Sat, 21 Dec 2024 at 00:41, Amit Langote <amitlangote09@gmail.com> wrote:

To address (1), I tried assigning specialized functions to
PlanState.ExecProcNode in ExecInitSeqScan() based on whether qual or
projInfo are NULL. Inspired by David Rowley’s suggestion to look at
ExecHashJoinImpl(), I wrote variants like ExecSeqScanNoQual() (for
qual == NULL) and ExecSeqScanNoProj() (for projInfo == NULL). These
call a local version of ExecScan() that lives in nodeSeqScan.c, marked
always-inline. This local copy takes qual and projInfo as arguments,
letting compilers inline and optimize unnecessary branches away.

I tested the performance of this and I do see close to a 5%
performance increase in TPC-H Q1. Nice.

Thanks David for looking at this.

I'm a little concerned with the method the patch takes where it copies
most of ExecScan and includes it in nodeSeqscan.c. If there are any
future changes to ExecScan, someone might forget to propagate those
changes into nodeSeqscan.c's version. What if instead you moved
ExecScan() into a header file and made it static inline? That way
callers would get their own inlined copy with the callback functions
inlined too, which for nodeSeqscan is good, since the recheck callback
does nothing.

Yeah, having an inline-able version of ExecScan() in a separate header
sounds better than what I proposed.
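The sketch below is a minimal, standalone illustration of the technique. It is not the patch itself. The names `ToyScan`, `toy_scan_extended()` and the `toy_scan_*()` variants are hypothetical stand-ins for `ScanState`, `ExecScanExtended()` and the `ExecSeqScan*()` functions. `__attribute__((always_inline))` assumes GCC or Clang, and is roughly what `pg_attribute_always_inline` expands to on those compilers. Each wrapper passes a constant `NULL` for the parts it does not need, so once the generic loop is forced inline the compiler can drop the corresponding `NULL` tests. The variant is chosen once, at init time, in the same way `ExecInitSeqScan()` sets `ExecProcNode`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef bool (*ToyQual) (int value);
typedef int (*ToyProj) (int value);

/* Hypothetical stand-in for ScanState: a cursor over an int array. */
typedef struct ToyScan
{
	const int  *rows;
	size_t		nrows;
	size_t		pos;
	ToyQual		qual;			/* NULL when there is no filter */
	ToyProj		proj;			/* NULL when there is no projection */
	long		nfiltered;		/* rows rejected by qual */
} ToyScan;

typedef bool (*ToyProcNode) (ToyScan *s, int *out);

/* "Access method": fetch the next row; returns false at end of scan. */
static bool
toy_next(ToyScan *s, int *out)
{
	if (s->pos >= s->nrows)
		return false;
	*out = s->rows[s->pos++];
	return true;
}

/*
 * Generic loop, forced inline into every caller.  A caller that passes a
 * constant NULL for qual or proj lets the compiler delete that branch.
 */
static __attribute__((always_inline)) inline bool
toy_scan_extended(ToyScan *s, ToyQual qual, ToyProj proj, int *out)
{
	int			value;

	while (toy_next(s, &value))
	{
		if (qual == NULL || qual(value))
		{
			*out = (proj != NULL) ? proj(value) : value;
			return true;
		}
		s->nfiltered++;
	}
	return false;
}

/* One specialized variant per combination, as in the nodeSeqscan.c patch. */
static bool
toy_scan_plain(ToyScan *s, int *out)
{
	return toy_scan_extended(s, NULL, NULL, out);
}

static bool
toy_scan_with_qual(ToyScan *s, int *out)
{
	return toy_scan_extended(s, s->qual, NULL, out);
}

static bool
toy_scan_with_proj(ToyScan *s, int *out)
{
	return toy_scan_extended(s, NULL, s->proj, out);
}

static bool
toy_scan_with_qual_proj(ToyScan *s, int *out)
{
	return toy_scan_extended(s, s->qual, s->proj, out);
}

/* Chosen once at init time, like ExecInitSeqScan() setting ExecProcNode. */
static ToyProcNode
toy_choose_variant(const ToyScan *s)
{
	if (s->qual == NULL)
		return (s->proj == NULL) ? toy_scan_plain : toy_scan_with_proj;
	return (s->proj == NULL) ? toy_scan_with_qual : toy_scan_with_qual_proj;
}

/* Example qual and projection for trying the variants out. */
static bool toy_is_even(int v) { return v % 2 == 0; }
static int	toy_times_ten(int v) { return v * 10; }
```

In `toy_scan_with_qual()` the `s->qual` pointer is still only known at run time. That is the residual `qual == NULL` test that remains even after inlining.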

Just as an additional reason why I think this might be a better idea:
the patch doesn't quite keep things equivalent. In having
ExecSeqScanNoEPQImpl() call SeqNext() directly, without going through
ExecScanFetch(), you've lost a call to CHECK_FOR_INTERRUPTS().

Yeah, that was clearly a bug in my patch.

On the other hand, one possible drawback from making ExecScan a static
inline is that any non-core code that uses ExecScan won't get any bug
fixes if we were to fix some bug in ExecScan in a minor release unless
the extension is compiled again. That could be fixed by keeping
ExecScan as an extern function and maybe just having ExecScanExtended
as the static inline version.

Yes, keeping ExecScan()'s interface unchanged seems better for the
considerations you mention.
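Here is a rough sketch of that split, again using hypothetical names. The `static inline` body would sit in a header, and each caller would get its own copy. An ordinary external function with the old signature keeps a single out-of-line copy in the server binary. An extension that calls the extern function picks up a minor-release fix without being recompiled. An extension that calls the inline version directly does not. Both pieces are placed in one file here so that the sketch compiles on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef bool (*ToyQual) (int value);

/* Hypothetical stand-in for ScanState. */
typedef struct ToyScan
{
	const int  *rows;
	size_t		nrows;
	size_t		pos;
	ToyQual		qual;			/* NULL when there is no filter */
} ToyScan;

/*
 * Would live in a header (a hypothetical toyscan.h): every includer
 * compiles its own copy, so a fix here needs callers to be recompiled.
 */
static inline bool
toy_scan_extended(ToyScan *s, ToyQual qual, int *out)
{
	while (s->pos < s->nrows)
	{
		int			value = s->rows[s->pos++];

		if (qual == NULL || qual(value))
		{
			*out = value;
			return true;
		}
	}
	return false;
}

/*
 * Would live in a .c file: the stable, exported entry point with the old
 * signature.  Callers reach the fixed loop once the binary defining this
 * function is rebuilt, without being recompiled themselves.
 */
extern bool toy_scan(ToyScan *s, int *out);

bool
toy_scan(ToyScan *s, int *out)
{
	return toy_scan_extended(s, s->qual, out);
}

/* Example qual for trying it out. */
static bool toy_is_odd(int v) { return v % 2 != 0; }
```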

Another thing I wondered about is the naming convention you're using
for these ExecSeqScan variant functions.
+ExecSeqScanNoQualNoProj(PlanState *pstate)
+ExecSeqScanNoQual(PlanState *pstate)
+ExecSeqScanNoProj(PlanState *pstate)
+ExecSeqScanNoEPQ(PlanState *pstate)
I think it's better to have a naming convention that aims to convey
what the function does do rather than what it does not do.

Agreed.

I've attached my workings of what I was messing around with. It seems
to perform about the same as your version. I think maybe we'd need
some sort of execScan.h instead of where I've stuffed the functions
in.

I've done that in the attached v2.

It would also be good if there was some way to give guarantees to the
compiler that a given pointer isn't NULL. For example in:

	return ExecScanExtended(&node->ss,
							(ExecScanAccessMtd) SeqNext,
							(ExecScanRecheckMtd) SeqRecheck,
							NULL,
							pstate->qual,
							NULL);

It would be good if when ExecScanExtended is inlined the compiler
wouldn't emit code for the "if (qual == NULL)" ... part. I don't know
if there's any way to do that. I thought I'd mention it in case
someone can think of a way... I guess you could add another parameter
that gets passed as a const and have the "if" test look at that
instead, that's a bit ugly though.
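Below is a hypothetical sketch of the const-parameter idea. The names are invented, and GCC/Clang `always_inline` is assumed. The branch tests a `bool` that every caller passes as a literal. After forced inlining, the `has_qual` test can fold away in both variants, so the compiler need not emit a `NULL` check on the qual pointer. The cost is an extra parameter that must stay consistent with the pointer; the `assert()` guards that consistency in assert-enabled builds. On GCC and Clang, another way to state the same promise is `if (qual == NULL) __builtin_unreachable();` at the top of the inlined function. Whether the optimizer then actually drops the check is up to the compiler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef bool (*ToyQual) (int value);

/* Hypothetical stand-in for ScanState. */
typedef struct ToyScan
{
	const int  *rows;
	size_t		nrows;
	size_t		pos;
	ToyQual		qual;
} ToyScan;

/*
 * has_qual is a compile-time constant at each call site below, so after
 * inlining "!has_qual || ..." can fold away, and the compiler need not
 * test the qual pointer for NULL at all.
 */
static __attribute__((always_inline)) inline bool
toy_scan_flagged(ToyScan *s, const bool has_qual, ToyQual qual, int *out)
{
	while (s->pos < s->nrows)
	{
		int			value = s->rows[s->pos++];

		if (!has_qual || qual(value))
		{
			*out = value;
			return true;
		}
	}
	return false;
}

static bool
toy_scan_no_qual(ToyScan *s, int *out)
{
	return toy_scan_flagged(s, false, NULL, out);
}

static bool
toy_scan_with_qual(ToyScan *s, int *out)
{
	/* The flag is only safe if the pointer really is set. */
	assert(s->qual != NULL);
	return toy_scan_flagged(s, true, s->qual, out);
}

/* Example qual for trying it out. */
static bool toy_is_even(int v) { return v % 2 == 0; }
```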

I too am not sure of a way, short of breaking ExecScanExtended() down
into individual functions, one for each of the following cases:

1. qual != NULL && projInfo != NULL
2. qual != NULL (&& projInfo == NULL)
3. projInfo != NULL (&& qual == NULL)

So basically, mirroring in execScan.h the variants we now have in
nodeSeqscan.c. To avoid inlining the EPQ code when epqstate == NULL,
I renamed ExecScanFetch() to ExecScanGetEPQTuple() and moved the
(*accessMtd) call into the callers for the epqstate == NULL case.
CHECK_FOR_INTERRUPTS() is now repeated at every place that needs it.

Attached 0002 shows a PoC of that.
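The shape of 0002 can be sketched outside PostgreSQL as follows. Everything in the sketch is hypothetical. `TOY_CHECK_FOR_INTERRUPTS()` only counts calls in place of `CHECK_FOR_INTERRUPTS()`, and a single stored test row stands in for the EPQ machinery. The variant does its own interrupt check on every loop iteration. Calling the access method directly, rather than through a shared fetch helper, therefore does not lose the check that David noticed was missing from the first version. The EPQ helper is reached only when the caller passes a non-NULL EPQ argument.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Counts calls; a stand-in for CHECK_FOR_INTERRUPTS(). */
static int	toy_interrupt_checks = 0;
#define TOY_CHECK_FOR_INTERRUPTS() ((void) toy_interrupt_checks++)

typedef bool (*ToyQual) (int value);

typedef struct ToyScan
{
	const int  *rows;
	size_t		nrows;
	size_t		pos;
	const int  *epq_row;		/* non-NULL: "EPQ" test row, returned once */
	bool		epq_done;
	ToyQual		qual;
} ToyScan;

/* Like ExecScanGetEPQTuple(): must only be called under "EPQ". */
static bool
toy_get_epq_row(ToyScan *s, const int *epq_row, int *out)
{
	assert(epq_row != NULL);
	if (s->epq_done)
		return false;
	s->epq_done = true;
	*out = *epq_row;
	return true;
}

static bool
toy_next(ToyScan *s, int *out)
{
	if (s->pos >= s->nrows)
		return false;
	*out = s->rows[s->pos++];
	return true;
}

/*
 * Qual-without-projection variant.  The fetch decision is made here, and
 * the interrupt check is repeated on every iteration.
 */
static __attribute__((always_inline)) inline bool
toy_scan_with_qual_no_proj(ToyScan *s, const int *epq_row, int *out)
{
	assert(s->qual != NULL);
	for (;;)
	{
		int			value = 0;
		bool		found;

		TOY_CHECK_FOR_INTERRUPTS();

		if (epq_row == NULL)
			found = toy_next(s, &value);
		else
			found = toy_get_epq_row(s, epq_row, &value);

		if (!found)
			return false;
		if (s->qual(value))
		{
			*out = value;
			return true;
		}
	}
}

/* Normal case: a constant NULL lets the EPQ branch be compiled out. */
static bool
toy_scan_qual(ToyScan *s, int *out)
{
	return toy_scan_with_qual_no_proj(s, NULL, out);
}

static bool
toy_scan_qual_epq(ToyScan *s, int *out)
{
	return toy_scan_with_qual_no_proj(s, s->epq_row, out);
}

/* Example qual for trying it out. */
static bool toy_is_even(int v) { return v % 2 == 0; }
```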

--
Thanks, Amit Langote

Attachments:

v2-0001-Refactor-ExecScan-to-inline-scan-filtering-and-pr.patch

From 16d4aed07dd7456aa8523830d5a8b0b4559ef1dd Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Thu, 9 Jan 2025 16:34:55 +0900
Subject: [PATCH v2 1/2] Refactor ExecScan() to inline scan, filtering, and
 projection logic

This commit refactors ExecScan() by moving its tuple-fetching,
filtering, and projection logic into an inline-able function,
ExecScanExtended(), defined in src/include/executor/execScan.h.
ExecScanExtended() accepts parameters for EvalPlanQual state,
qualifiers (ExprState), and projection (ProjectionInfo).

Specialized variants of the execution function of a given Scan node
can then pass const-NULL for unused parameters.  This allows the
compiler to inline the logic and eliminate unnecessary branches or
checks. Each variant function thus contains only the necessary code,
optimizing execution for sequential scans where these features are
not needed.

Currently, only ExecSeqScan() is modified to take advantage of this
inline-ability.  Other Scan nodes might benefit from such specialized
variant functions but that is left as future work.

Benchmarks performed by Junwang Zhao and David Rowley show up to a 5%
reduction in execution time for queries that rely heavily on Seq
Scans. The most significant improvements were observed in scenarios
where EvalPlanQual, qualifiers, and projection were not required, but
other cases also benefit from reduced runtime overhead due to the
inlining and removal of unnecessary code paths.

The refactoring approach implemented here is based on a proposal by
David Rowley, significantly improving upon an earlier idea I (amitlan)
suggested.

Author: Amit Langote
Co-authored-by: David Rowley
Reviewed-by: Junwang Zhao
Reviewed-by: David Rowley
Tested-by: Junwang Zhao
Tested-by: David Rowley
Discussion: https://postgr.es/m/CA+HiwqGaH-otvqW_ce-paL=96JvU4j+Xbuk+14esJNDwefdkOg@mail.gmail.com
---
 src/backend/executor/execScan.c    | 207 ++----------------------
 src/backend/executor/nodeSeqscan.c | 115 +++++++++++++-
 src/include/executor/execScan.h    | 247 +++++++++++++++++++++++++++++
 3 files changed, 366 insertions(+), 203 deletions(-)
 create mode 100644 src/include/executor/execScan.h

diff --git a/src/backend/executor/execScan.c b/src/backend/executor/execScan.c
index 556a5d98e78..25a776a6a19 100644
--- a/src/backend/executor/execScan.c
+++ b/src/backend/executor/execScan.c
@@ -19,118 +19,9 @@
 #include "postgres.h"
 
 #include "executor/executor.h"
+#include "executor/execScan.h"
 #include "miscadmin.h"
 
-
-
-/*
- * ExecScanFetch -- check interrupts & fetch next potential tuple
- *
- * This routine is concerned with substituting a test tuple if we are
- * inside an EvalPlanQual recheck.  If we aren't, just execute
- * the access method's next-tuple routine.
- */
-static inline TupleTableSlot *
-ExecScanFetch(ScanState *node,
-			  ExecScanAccessMtd accessMtd,
-			  ExecScanRecheckMtd recheckMtd)
-{
-	EState	   *estate = node->ps.state;
-
-	CHECK_FOR_INTERRUPTS();
-
-	if (estate->es_epq_active != NULL)
-	{
-		EPQState   *epqstate = estate->es_epq_active;
-
-		/*
-		 * We are inside an EvalPlanQual recheck.  Return the test tuple if
-		 * one is available, after rechecking any access-method-specific
-		 * conditions.
-		 */
-		Index		scanrelid = ((Scan *) node->ps.plan)->scanrelid;
-
-		if (scanrelid == 0)
-		{
-			/*
-			 * This is a ForeignScan or CustomScan which has pushed down a
-			 * join to the remote side.  The recheck method is responsible not
-			 * only for rechecking the scan/join quals but also for storing
-			 * the correct tuple in the slot.
-			 */
-
-			TupleTableSlot *slot = node->ss_ScanTupleSlot;
-
-			if (!(*recheckMtd) (node, slot))
-				ExecClearTuple(slot);	/* would not be returned by scan */
-			return slot;
-		}
-		else if (epqstate->relsubs_done[scanrelid - 1])
-		{
-			/*
-			 * Return empty slot, as either there is no EPQ tuple for this rel
-			 * or we already returned it.
-			 */
-
-			TupleTableSlot *slot = node->ss_ScanTupleSlot;
-
-			return ExecClearTuple(slot);
-		}
-		else if (epqstate->relsubs_slot[scanrelid - 1] != NULL)
-		{
-			/*
-			 * Return replacement tuple provided by the EPQ caller.
-			 */
-
-			TupleTableSlot *slot = epqstate->relsubs_slot[scanrelid - 1];
-
-			Assert(epqstate->relsubs_rowmark[scanrelid - 1] == NULL);
-
-			/* Mark to remember that we shouldn't return it again */
-			epqstate->relsubs_done[scanrelid - 1] = true;
-
-			/* Return empty slot if we haven't got a test tuple */
-			if (TupIsNull(slot))
-				return NULL;
-
-			/* Check if it meets the access-method conditions */
-			if (!(*recheckMtd) (node, slot))
-				return ExecClearTuple(slot);	/* would not be returned by
-												 * scan */
-			return slot;
-		}
-		else if (epqstate->relsubs_rowmark[scanrelid - 1] != NULL)
-		{
-			/*
-			 * Fetch and return replacement tuple using a non-locking rowmark.
-			 */
-
-			TupleTableSlot *slot = node->ss_ScanTupleSlot;
-
-			/* Mark to remember that we shouldn't return more */
-			epqstate->relsubs_done[scanrelid - 1] = true;
-
-			if (!EvalPlanQualFetchRowMark(epqstate, scanrelid, slot))
-				return NULL;
-
-			/* Return empty slot if we haven't got a test tuple */
-			if (TupIsNull(slot))
-				return NULL;
-
-			/* Check if it meets the access-method conditions */
-			if (!(*recheckMtd) (node, slot))
-				return ExecClearTuple(slot);	/* would not be returned by
-												 * scan */
-			return slot;
-		}
-	}
-
-	/*
-	 * Run the node-type-specific access method function to get the next tuple
-	 */
-	return (*accessMtd) (node);
-}
-
 /* ----------------------------------------------------------------
  *		ExecScan
  *
@@ -157,100 +48,20 @@ ExecScan(ScanState *node,
 		 ExecScanAccessMtd accessMtd,	/* function returning a tuple */
 		 ExecScanRecheckMtd recheckMtd)
 {
-	ExprContext *econtext;
+	EPQState *epqstate;
 	ExprState  *qual;
 	ProjectionInfo *projInfo;
 
-	/*
-	 * Fetch data from node
-	 */
+	epqstate = node->ps.state->es_epq_active;
 	qual = node->ps.qual;
 	projInfo = node->ps.ps_ProjInfo;
-	econtext = node->ps.ps_ExprContext;
-
-	/* interrupt checks are in ExecScanFetch */
-
-	/*
-	 * If we have neither a qual to check nor a projection to do, just skip
-	 * all the overhead and return the raw scan tuple.
-	 */
-	if (!qual && !projInfo)
-	{
-		ResetExprContext(econtext);
-		return ExecScanFetch(node, accessMtd, recheckMtd);
-	}
-
-	/*
-	 * Reset per-tuple memory context to free any expression evaluation
-	 * storage allocated in the previous tuple cycle.
-	 */
-	ResetExprContext(econtext);
-
-	/*
-	 * get a tuple from the access method.  Loop until we obtain a tuple that
-	 * passes the qualification.
-	 */
-	for (;;)
-	{
-		TupleTableSlot *slot;
 
-		slot = ExecScanFetch(node, accessMtd, recheckMtd);
-
-		/*
-		 * if the slot returned by the accessMtd contains NULL, then it means
-		 * there is nothing more to scan so we just return an empty slot,
-		 * being careful to use the projection result slot so it has correct
-		 * tupleDesc.
-		 */
-		if (TupIsNull(slot))
-		{
-			if (projInfo)
-				return ExecClearTuple(projInfo->pi_state.resultslot);
-			else
-				return slot;
-		}
-
-		/*
-		 * place the current tuple into the expr context
-		 */
-		econtext->ecxt_scantuple = slot;
-
-		/*
-		 * check that the current tuple satisfies the qual-clause
-		 *
-		 * check for non-null qual here to avoid a function call to ExecQual()
-		 * when the qual is null ... saves only a few cycles, but they add up
-		 * ...
-		 */
-		if (qual == NULL || ExecQual(qual, econtext))
-		{
-			/*
-			 * Found a satisfactory scan tuple.
-			 */
-			if (projInfo)
-			{
-				/*
-				 * Form a projection tuple, store it in the result tuple slot
-				 * and return it.
-				 */
-				return ExecProject(projInfo);
-			}
-			else
-			{
-				/*
-				 * Here, we aren't projecting, so just return scan tuple.
-				 */
-				return slot;
-			}
-		}
-		else
-			InstrCountFiltered1(node, 1);
-
-		/*
-		 * Tuple fails qual, so free per-tuple memory and try again.
-		 */
-		ResetExprContext(econtext);
-	}
+	return ExecScanExtended(node,
+							accessMtd,
+							recheckMtd,
+							epqstate,
+							qual,
+							projInfo);
 }
 
 /*
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index fa2d522b25f..f93ccc761eb 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -29,6 +29,7 @@
 
 #include "access/relscan.h"
 #include "access/tableam.h"
+#include "executor/execScan.h"
 #include "executor/executor.h"
 #include "executor/nodeSeqscan.h"
 #include "utils/rel.h"
@@ -99,9 +100,10 @@ SeqRecheck(SeqScanState *node, TupleTableSlot *slot)
  *		ExecSeqScan(node)
  *
  *		Scans the relation sequentially and returns the next qualifying
- *		tuple.
- *		We call the ExecScan() routine and pass it the appropriate
- *		access method functions.
+ *		tuple. This variant is used when there is no es_epq_active, no qual
+ *		and no projection.  Passing const-NULLs for these to ExecScanExtended
+ *		allows the compiler to eliminate the additional code that would
+ *		ordinarily be required for the evaluation of these.
  * ----------------------------------------------------------------
  */
 static TupleTableSlot *
@@ -109,12 +111,94 @@ ExecSeqScan(PlanState *pstate)
 {
 	SeqScanState *node = castNode(SeqScanState, pstate);
 
+	Assert(pstate->state->es_epq_active == NULL);
+	Assert(pstate->qual == NULL);
+	Assert(pstate->ps_ProjInfo == NULL);
+
+	return ExecScanExtended(&node->ss,
+							(ExecScanAccessMtd) SeqNext,
+							(ExecScanRecheckMtd) SeqRecheck,
+							NULL,
+							NULL,
+							NULL);
+}
+
+/*
+ * Variant of ExecSeqScan() for when qual evaluation is required.
+ */
+static TupleTableSlot *
+ExecSeqScanWithQual(PlanState *pstate)
+{
+	SeqScanState *node = castNode(SeqScanState, pstate);
+
+	Assert(pstate->state->es_epq_active == NULL);
+	Assert(pstate->qual != NULL);
+	Assert(pstate->ps_ProjInfo == NULL);
+
+	return ExecScanExtended(&node->ss,
+							(ExecScanAccessMtd) SeqNext,
+							(ExecScanRecheckMtd) SeqRecheck,
+							NULL,
+							pstate->qual,
+							NULL);
+}
+
+/*
+ * Variant of ExecSeqScan() for when projection is required.
+ */
+static TupleTableSlot *
+ExecSeqScanWithProject(PlanState *pstate)
+{
+	SeqScanState *node = castNode(SeqScanState, pstate);
+
+	Assert(pstate->state->es_epq_active == NULL);
+	Assert(pstate->qual == NULL);
+	Assert(pstate->ps_ProjInfo != NULL);
+
+	return ExecScanExtended(&node->ss,
+							(ExecScanAccessMtd) SeqNext,
+							(ExecScanRecheckMtd) SeqRecheck,
+							NULL,
+							NULL,
+							pstate->ps_ProjInfo);
+}
+
+/*
+ * Variant of ExecSeqScan() for when qual evaluation and projection are
+ * required.
+ */
+static TupleTableSlot *
+ExecSeqScanWithQualProject(PlanState *pstate)
+{
+	SeqScanState *node = castNode(SeqScanState, pstate);
+
+	Assert(pstate->state->es_epq_active == NULL);
+	Assert(pstate->qual != NULL);
+	Assert(pstate->ps_ProjInfo != NULL);
+
+	return ExecScanExtended(&node->ss,
+							(ExecScanAccessMtd) SeqNext,
+							(ExecScanRecheckMtd) SeqRecheck,
+							NULL,
+							pstate->qual,
+							pstate->ps_ProjInfo);
+}
+
+/*
+ * Variant of ExecSeqScan for when EPQ evaluation is required.  We don't
+ * bother adding variants of this for with/without qual and projection as
+ * EPQ doesn't seem as exciting a case to optimize for.
+ */
+static TupleTableSlot *
+ExecSeqScanEPQ(PlanState *pstate)
+{
+	SeqScanState *node = castNode(SeqScanState, pstate);
+
 	return ExecScan(&node->ss,
 					(ExecScanAccessMtd) SeqNext,
 					(ExecScanRecheckMtd) SeqRecheck);
 }
 
-
 /* ----------------------------------------------------------------
  *		ExecInitSeqScan
  * ----------------------------------------------------------------
@@ -137,7 +221,6 @@ ExecInitSeqScan(SeqScan *node, EState *estate, int eflags)
 	scanstate = makeNode(SeqScanState);
 	scanstate->ss.ps.plan = (Plan *) node;
 	scanstate->ss.ps.state = estate;
-	scanstate->ss.ps.ExecProcNode = ExecSeqScan;
 
 	/*
 	 * Miscellaneous initialization
@@ -171,6 +254,28 @@ ExecInitSeqScan(SeqScan *node, EState *estate, int eflags)
 	scanstate->ss.ps.qual =
 		ExecInitQual(node->scan.plan.qual, (PlanState *) scanstate);
 
+	/*
+	 * When EvalPlanQual() is not in use, assign ExecProcNode for this node
+	 * based on the presence of qual and projection. Each ExecSeqScan*()
+	 * variant is optimized for the specific combination of these conditions.
+	 */
+	if (scanstate->ss.ps.state->es_epq_active != NULL)
+		scanstate->ss.ps.ExecProcNode = ExecSeqScanEPQ;
+	else if (scanstate->ss.ps.qual == NULL)
+	{
+		if (scanstate->ss.ps.ps_ProjInfo == NULL)
+			scanstate->ss.ps.ExecProcNode = ExecSeqScan;
+		else
+			scanstate->ss.ps.ExecProcNode = ExecSeqScanWithProject;
+	}
+	else
+	{
+		if (scanstate->ss.ps.ps_ProjInfo == NULL)
+			scanstate->ss.ps.ExecProcNode = ExecSeqScanWithQual;
+		else
+			scanstate->ss.ps.ExecProcNode = ExecSeqScanWithQualProject;
+	}
+
 	return scanstate;
 }
 
diff --git a/src/include/executor/execScan.h b/src/include/executor/execScan.h
new file mode 100644
index 00000000000..194be0ea1c0
--- /dev/null
+++ b/src/include/executor/execScan.h
@@ -0,0 +1,247 @@
+/*-------------------------------------------------------------------------
+ * execScan.h
+ *		Inline-able support functions for Scan nodes
+ *
+ * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ *		src/include/executor/execScan.h
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef EXECSCAN_H
+#define EXECSCAN_H
+
+#include "miscadmin.h"
+#include "executor/executor.h"
+#include "nodes/execnodes.h"
+
+/*
+ * ExecScanFetch -- check interrupts & fetch next potential tuple
+ *
+ * This routine substitutes a test tuple if inside an EvalPlanQual recheck.
+ * Otherwise, it simply executes the access method's next-tuple routine.
+ *
+ * The pg_attribute_always_inline attribute allows the compiler to inline
+ * this function into its caller. When EPQState is NULL, the EvalPlanQual
+ * logic is completely eliminated at compile time, avoiding unnecessary
+ * run-time checks and code for cases where EPQ is not required.
+ */
+static pg_attribute_always_inline TupleTableSlot *
+ExecScanFetch(ScanState *node,
+			  EPQState *epqstate,
+			  ExecScanAccessMtd accessMtd,
+			  ExecScanRecheckMtd recheckMtd)
+{
+	CHECK_FOR_INTERRUPTS();
+
+	if (epqstate != NULL)
+	{
+		/*
+		 * We are inside an EvalPlanQual recheck.  Return the test tuple if
+		 * one is available, after rechecking any access-method-specific
+		 * conditions.
+		 */
+		Index		scanrelid = ((Scan *) node->ps.plan)->scanrelid;
+
+		if (scanrelid == 0)
+		{
+			/*
+			 * This is a ForeignScan or CustomScan which has pushed down a
+			 * join to the remote side.  The recheck method is responsible not
+			 * only for rechecking the scan/join quals but also for storing
+			 * the correct tuple in the slot.
+			 */
+
+			TupleTableSlot *slot = node->ss_ScanTupleSlot;
+
+			if (!(*recheckMtd) (node, slot))
+				ExecClearTuple(slot);	/* would not be returned by scan */
+			return slot;
+		}
+		else if (epqstate->relsubs_done[scanrelid - 1])
+		{
+			/*
+			 * Return empty slot, as either there is no EPQ tuple for this rel
+			 * or we already returned it.
+			 */
+
+			TupleTableSlot *slot = node->ss_ScanTupleSlot;
+
+			return ExecClearTuple(slot);
+		}
+		else if (epqstate->relsubs_slot[scanrelid - 1] != NULL)
+		{
+			/*
+			 * Return replacement tuple provided by the EPQ caller.
+			 */
+
+			TupleTableSlot *slot = epqstate->relsubs_slot[scanrelid - 1];
+
+			Assert(epqstate->relsubs_rowmark[scanrelid - 1] == NULL);
+
+			/* Mark to remember that we shouldn't return it again */
+			epqstate->relsubs_done[scanrelid - 1] = true;
+
+			/* Return empty slot if we haven't got a test tuple */
+			if (TupIsNull(slot))
+				return NULL;
+
+			/* Check if it meets the access-method conditions */
+			if (!(*recheckMtd) (node, slot))
+				return ExecClearTuple(slot);	/* would not be returned by
+												 * scan */
+			return slot;
+		}
+		else if (epqstate->relsubs_rowmark[scanrelid - 1] != NULL)
+		{
+			/*
+			 * Fetch and return replacement tuple using a non-locking rowmark.
+			 */
+
+			TupleTableSlot *slot = node->ss_ScanTupleSlot;
+
+			/* Mark to remember that we shouldn't return more */
+			epqstate->relsubs_done[scanrelid - 1] = true;
+
+			if (!EvalPlanQualFetchRowMark(epqstate, scanrelid, slot))
+				return NULL;
+
+			/* Return empty slot if we haven't got a test tuple */
+			if (TupIsNull(slot))
+				return NULL;
+
+			/* Check if it meets the access-method conditions */
+			if (!(*recheckMtd) (node, slot))
+				return ExecClearTuple(slot);	/* would not be returned by
+												 * scan */
+			return slot;
+		}
+	}
+
+	/*
+	 * Run the node-type-specific access method function to get the next tuple
+	 */
+	return (*accessMtd) (node);
+}
+
+/* ----------------------------------------------------------------
+ * ExecScanExtended
+ *		Scans the relation using the given 'access method' and returns
+ *		the next qualifying tuple. The tuple is optionally checked
+ *		against 'qual' and, if provided, projected using 'projInfo'.
+ *
+ * The 'recheck method' validates an arbitrary tuple of the relation
+ * against conditions enforced by the access method.
+ *
+ * This function is an alternative to ExecScan, used when callers
+ * may omit 'qual' or 'projInfo'. The pg_attribute_always_inline
+ * attribute allows the compiler to eliminate non-relevant branches
+ * at compile time, avoiding run-time checks in those cases.
+ *
+ * Conditions:
+ *	-- The AMI "cursor" is positioned at the previously returned tuple.
+ *
+ * Initial States:
+ *	-- The relation is opened for scanning, with the "cursor"
+ *	positioned before the first qualifying tuple.
+ * ----------------------------------------------------------------
+ */
+
+static pg_attribute_always_inline TupleTableSlot *
+ExecScanExtended(ScanState *node,
+				 ExecScanAccessMtd accessMtd,	/* function returning a tuple */
+				 ExecScanRecheckMtd recheckMtd,
+				 EPQState *epqstate,
+				 ExprState *qual,
+				 ProjectionInfo *projInfo)
+{
+	ExprContext *econtext = node->ps.ps_ExprContext;
+
+	/* interrupt checks are in ExecScanFetch */
+
+	/*
+	 * If we have neither a qual to check nor a projection to do, just skip
+	 * all the overhead and return the raw scan tuple.
+	 */
+	if (!qual && !projInfo)
+	{
+		ResetExprContext(econtext);
+		return ExecScanFetch(node, epqstate, accessMtd, recheckMtd);
+	}
+
+	/*
+	 * Reset per-tuple memory context to free any expression evaluation
+	 * storage allocated in the previous tuple cycle.
+	 */
+	ResetExprContext(econtext);
+
+	/*
+	 * get a tuple from the access method.  Loop until we obtain a tuple that
+	 * passes the qualification.
+	 */
+	for (;;)
+	{
+		TupleTableSlot *slot;
+
+		slot = ExecScanFetch(node, epqstate, accessMtd, recheckMtd);
+
+		/*
+		 * if the slot returned by the accessMtd contains NULL, then it means
+		 * there is nothing more to scan so we just return an empty slot,
+		 * being careful to use the projection result slot so it has correct
+		 * tupleDesc.
+		 */
+		if (TupIsNull(slot))
+		{
+			if (projInfo)
+				return ExecClearTuple(projInfo->pi_state.resultslot);
+			else
+				return slot;
+		}
+
+		/*
+		 * place the current tuple into the expr context
+		 */
+		econtext->ecxt_scantuple = slot;
+
+		/*
+		 * check that the current tuple satisfies the qual-clause
+		 *
+		 * check for non-null qual here to avoid a function call to ExecQual()
+		 * when the qual is null ... saves only a few cycles, but they add up
+		 * ...
+		 */
+		if (qual == NULL || ExecQual(qual, econtext))
+		{
+			/*
+			 * Found a satisfactory scan tuple.
+			 */
+			if (projInfo)
+			{
+				/*
+				 * Form a projection tuple, store it in the result tuple slot
+				 * and return it.
+				 */
+				return ExecProject(projInfo);
+			}
+			else
+			{
+				/*
+				 * Here, we aren't projecting, so just return scan tuple.
+				 */
+				return slot;
+			}
+		}
+		else
+			InstrCountFiltered1(node, 1);
+
+		/*
+		 * Tuple fails qual, so free per-tuple memory and try again.
+		 */
+		ResetExprContext(econtext);
+	}
+}
+
+#endif							/* EXECSCAN_H */
-- 
2.43.0

v2-0002-Break-ExecScanExtended-into-variants-based-on-qua.patch

From f09505e3e2d109afc4b0771ace0c23294ce637b6 Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Thu, 9 Jan 2025 22:45:44 +0900
Subject: [PATCH v2 2/2] Break ExecScanExtended() into variants based on qual
 and proj nullness

---
 src/include/executor/execScan.h | 289 ++++++++++++++++++++++++--------
 1 file changed, 216 insertions(+), 73 deletions(-)

diff --git a/src/include/executor/execScan.h b/src/include/executor/execScan.h
index 194be0ea1c0..e067e5349a8 100644
--- a/src/include/executor/execScan.h
+++ b/src/include/executor/execScan.h
@@ -18,25 +18,17 @@
 #include "nodes/execnodes.h"
 
 /*
- * ExecScanFetch -- check interrupts & fetch next potential tuple
+ * ExecScanGetEPQTuple -- substitutes a test tuple for EvalPlanQual recheck.
  *
- * This routine substitutes a test tuple if inside an EvalPlanQual recheck.
- * Otherwise, it simply executes the access method's next-tuple routine.
- *
- * The pg_attribute_always_inline attribute allows the compiler to inline
- * this function into its caller. When EPQState is NULL, the EvalPlanQual
- * logic is completely eliminated at compile time, avoiding unnecessary
- * run-time checks and code for cases where EPQ is not required.
+ * Must only be called if the Scan is running under EvalPlanQual().
  */
 static pg_attribute_always_inline TupleTableSlot *
-ExecScanFetch(ScanState *node,
-			  EPQState *epqstate,
-			  ExecScanAccessMtd accessMtd,
-			  ExecScanRecheckMtd recheckMtd)
+ExecScanGetEPQTuple(ScanState *node,
+					EPQState *epqstate,
+					ExecScanRecheckMtd recheckMtd)
 {
-	CHECK_FOR_INTERRUPTS();
+	Assert(epqstate != NULL);
 
-	if (epqstate != NULL)
 	{
 		/*
 		 * We are inside an EvalPlanQual recheck.  Return the test tuple if
@@ -120,56 +112,160 @@ ExecScanFetch(ScanState *node,
 		}
 	}
 
+	Assert(false);
+	return NULL;
+}
+
+/*
+ * Fetches tuples using the access method callback until one is found that
+ * satisfies the 'qual'.
+ */
+static pg_attribute_always_inline TupleTableSlot *
+ExecScanWithQualNoProj(ScanState *node,
+					   ExecScanAccessMtd accessMtd,	/* function returning a tuple */
+					   ExecScanRecheckMtd recheckMtd,
+					   EPQState *epqstate,
+					   ExprState *qual)
+{
+	ExprContext *econtext = node->ps.ps_ExprContext;
+
+	Assert(qual != NULL);
+
 	/*
-	 * Run the node-type-specific access method function to get the next tuple
+	 * Reset per-tuple memory context to free any expression evaluation
+	 * storage allocated in the previous tuple cycle.
+	 */
+	ResetExprContext(econtext);
+
+	/*
+	 * get a tuple from the access method.  Loop until we obtain a tuple that
+	 * passes the qualification.
 	 */
-	return (*accessMtd) (node);
+	for (;;)
+	{
+		TupleTableSlot *slot;
+
+		CHECK_FOR_INTERRUPTS();
+
+		/* fetch from the access method, or the EPQ test tuple under EPQ */
+		if (epqstate == NULL)
+		{
+			slot = (*accessMtd) (node);
+		}
+		else
+			slot = ExecScanGetEPQTuple(node, epqstate, recheckMtd);
+
+		/*
+		 * if the slot returned by the accessMtd contains NULL, then it means
+		 * there is nothing more to scan so we just return the empty slot;
+		 * no projection is done here, so the scan slot's tupleDesc is
+		 * correct.
+		 */
+		if (TupIsNull(slot))
+			return slot;
+
+		/*
+		 * place the current tuple into the expr context
+		 */
+		econtext->ecxt_scantuple = slot;
+
+		/*
+		 * check that the current tuple satisfies the qual-clause
+		 *
+		 * qual is known to be non-NULL here (see the Assert above), so unlike
+		 * ExecScanExtended() there is no need to test for a NULL qual before
+		 * calling ExecQual().
+		 */
+		if (ExecQual(qual, econtext))
+		{
+			/*
+			 * Found a satisfactory scan tuple.
+			 *
+			 * Here, we aren't projecting, so just return scan tuple.
+			 */
+			return slot;
+		}
+		else
+			InstrCountFiltered1(node, 1);
+
+		/*
+		 * Tuple fails qual, so free per-tuple memory and try again.
+		 */
+		ResetExprContext(econtext);
+	}
 }
 
-/* ----------------------------------------------------------------
- * ExecScanExtended
- *		Scans the relation using the given 'access method' and returns
- *		the next qualifying tuple. The tuple is optionally checked
- *		against 'qual' and, if provided, projected using 'projInfo'.
- *
- * The 'recheck method' validates an arbitrary tuple of the relation
- * against conditions enforced by the access method.
- *
- * This function is an alternative to ExecScan, used when callers
- * may omit 'qual' or 'projInfo'. The pg_attribute_always_inline
- * attribute allows the compiler to eliminate non-relevant branches
- * at compile time, avoiding run-time checks in those cases.
- *
- * Conditions:
- *	-- The AMI "cursor" is positioned at the previously returned tuple.
- *
- * Initial States:
- *	-- The relation is opened for scanning, with the "cursor"
- *	positioned before the first qualifying tuple.
- * ----------------------------------------------------------------
+/*
+ * Fetches the next tuple using the access method callback and returns the
+ * tuple obtained by projecting using the 'projInfo'.
  */
-
 static pg_attribute_always_inline TupleTableSlot *
-ExecScanExtended(ScanState *node,
-				 ExecScanAccessMtd accessMtd,	/* function returning a tuple */
-				 ExecScanRecheckMtd recheckMtd,
-				 EPQState *epqstate,
-				 ExprState *qual,
-				 ProjectionInfo *projInfo)
+ExecScanWithProjNoQual(ScanState *node,
+					   ExecScanAccessMtd accessMtd,	/* function returning a tuple */
+					   ExecScanRecheckMtd recheckMtd,
+					   EPQState *epqstate,
+					   ProjectionInfo *projInfo)
 {
 	ExprContext *econtext = node->ps.ps_ExprContext;
+	TupleTableSlot *slot;
 
-	/* interrupt checks are in ExecScanFetch */
+	Assert(projInfo != NULL);
+
+	CHECK_FOR_INTERRUPTS();
 
 	/*
-	 * If we have neither a qual to check nor a projection to do, just skip
-	 * all the overhead and return the raw scan tuple.
+	 * Reset per-tuple memory context to free any expression evaluation
+	 * storage allocated in the previous tuple cycle.
 	 */
-	if (!qual && !projInfo)
+	ResetExprContext(econtext);
+
+
+	/* interrupt checks are in ExecScanFetch() when it's used */
+	if (epqstate == NULL)
 	{
-		ResetExprContext(econtext);
-		return ExecScanFetch(node, epqstate, accessMtd, recheckMtd);
+		slot = (*accessMtd) (node);
 	}
+	else
+		slot = ExecScanGetEPQTuple(node, epqstate, recheckMtd);
+
+	/*
+	 * if the slot returned by the accessMtd contains NULL, then it means
+	 * there is nothing more to scan so we just return an empty slot,
+	 * being careful to use the projection result slot so it has correct
+	 * tupleDesc.
+	 */
+	if (TupIsNull(slot))
+		return ExecClearTuple(projInfo->pi_state.resultslot);
+
+	/*
+	 * place the current tuple into the expr context
+	 */
+	econtext->ecxt_scantuple = slot;
+
+	/*
+	 * Form a projection tuple, store it in the result tuple slot
+	 * and return it.
+	 */
+	return ExecProject(projInfo);
+}
+
+/*
+ * Fetches tuples using the access method callback until one is found that
+ * satisfies the 'qual' and returns the tuple obtained by projecting using the
+ * 'projInfo'.
+ */
+static pg_attribute_always_inline TupleTableSlot *
+ExecScanWithQualAndProj(ScanState *node,
+						 ExecScanAccessMtd accessMtd,	/* function returning a tuple */
+						 ExecScanRecheckMtd recheckMtd,
+						 EPQState *epqstate,
+						 ExprState *qual,
+						 ProjectionInfo *projInfo)
+{
+	ExprContext *econtext = node->ps.ps_ExprContext;
+
+	Assert(qual != NULL);
+	Assert(projInfo != NULL);
 
 	/*
 	 * Reset per-tuple memory context to free any expression evaluation
@@ -185,7 +281,15 @@ ExecScanExtended(ScanState *node,
 	{
 		TupleTableSlot *slot;
 
-		slot = ExecScanFetch(node, epqstate, accessMtd, recheckMtd);
+		CHECK_FOR_INTERRUPTS();
+
+		/* interrupt checks are in ExecScanFetch() when it's used */
+		if (epqstate == NULL)
+		{
+			slot = (*accessMtd) (node);
+		}
+		else
+			slot = ExecScanGetEPQTuple(node, epqstate, recheckMtd);
 
 		/*
 		 * if the slot returned by the accessMtd contains NULL, then it means
@@ -194,12 +298,7 @@ ExecScanExtended(ScanState *node,
 		 * tupleDesc.
 		 */
 		if (TupIsNull(slot))
-		{
-			if (projInfo)
-				return ExecClearTuple(projInfo->pi_state.resultslot);
-			else
-				return slot;
-		}
+			return ExecClearTuple(projInfo->pi_state.resultslot);
 
 		/*
 		 * place the current tuple into the expr context
@@ -213,26 +312,15 @@ ExecScanExtended(ScanState *node,
 		 * when the qual is null ... saves only a few cycles, but they add up
 		 * ...
 		 */
-		if (qual == NULL || ExecQual(qual, econtext))
+		if (ExecQual(qual, econtext))
 		{
 			/*
 			 * Found a satisfactory scan tuple.
+			 *
+			 * Form a projection tuple, store it in the result tuple slot
+			 * and return it.
 			 */
-			if (projInfo)
-			{
-				/*
-				 * Form a projection tuple, store it in the result tuple slot
-				 * and return it.
-				 */
-				return ExecProject(projInfo);
-			}
-			else
-			{
-				/*
-				 * Here, we aren't projecting, so just return scan tuple.
-				 */
-				return slot;
-			}
+			return ExecProject(projInfo);
 		}
 		else
 			InstrCountFiltered1(node, 1);
@@ -244,4 +332,59 @@ ExecScanExtended(ScanState *node,
 	}
 }
 
+/* ----------------------------------------------------------------
+ * ExecScanExtended
+ *		Scans the relation using the given 'access method' and returns
+ *		the next qualifying tuple. The tuple is optionally checked
+ *		against 'qual' and, if provided, projected using 'projInfo'.
+ *
+ * The 'recheck method' validates an arbitrary tuple of the relation
+ * against conditions enforced by the access method.
+ *
+ * This function is an alternative to ExecScan, used when callers
+ * may omit 'qual' or 'projInfo'. The pg_attribute_always_inline
+ * attribute allows the compiler to eliminate non-relevant branches
+ * at compile time, avoiding run-time checks in those cases.
+ *
+ * Conditions:
+ *	-- The AMI "cursor" is positioned at the previously returned tuple.
+ *
+ * Initial States:
+ *	-- The relation is opened for scanning, with the "cursor"
+ *	positioned before the first qualifying tuple.
+ * ----------------------------------------------------------------
+ */
+
+static pg_attribute_always_inline TupleTableSlot *
+ExecScanExtended(ScanState *node,
+				 ExecScanAccessMtd accessMtd,	/* function returning a tuple */
+				 ExecScanRecheckMtd recheckMtd,
+				 EPQState *epqstate,
+				 ExprState *qual,
+				 ProjectionInfo *projInfo)
+{
+	if (qual != NULL && projInfo != NULL)
+		return ExecScanWithQualAndProj(node, accessMtd, recheckMtd, epqstate, qual, projInfo);
+	else if (qual != NULL)
+		return ExecScanWithQualNoProj(node, accessMtd, recheckMtd, epqstate, qual);
+	else if (projInfo != NULL)
+		return ExecScanWithProjNoQual(node, accessMtd, recheckMtd, epqstate, projInfo);
+	/*
+	 * If we have neither a qual to check nor a projection to do, just skip
+	 * all the overhead and return the raw scan tuple.
+	 */
+	else
+	{
+		CHECK_FOR_INTERRUPTS();
+		ResetExprContext(node->ps.ps_ExprContext);
+		if (epqstate == NULL)
+			return (*accessMtd) (node);
+		else
+			return ExecScanGetEPQTuple(node, epqstate, recheckMtd);
+	}
+
+	Assert(false);
+	return NULL;
+}
+
 #endif							/* EXECASYNC_H */
-- 
2.43.0

David Rowley

dgrowleyml@gmail.com

about 1 year ago

In reply to: Amit Langote (#3)

1 attachment(s)

Re: Some ExecSeqScan optimizations

On Fri, 10 Jan 2025 at 02:46, Amit Langote <amitlangote09@gmail.com> wrote:
> On Mon, Jan 6, 2025 at 10:18 PM David Rowley <dgrowleyml@gmail.com> wrote:
> > I've attached my workings of what I was messing around with. It seems
> > to perform about the same as your version. I think maybe we'd need
> > some sort of execScan.h instead of where I've stuffed the functions
> > in.
>
> I've done that in the attached v2.

I think 0001 looks ok, aside from what the attached fixes. (at least
one is my mistake)

Did you test the performance of 0002? I didn't look at it hard enough
to understand what you've done. I can look if performance tests show
that it might be worthwhile considering.

David

Attachments:

minor_fixes.txt (text/plain; charset=US-ASCII)

diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index f93ccc761e..6f9e991eea 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -103,7 +103,7 @@ SeqRecheck(SeqScanState *node, TupleTableSlot *slot)
  *		tuple. This variant is used when there is no es_eqp_active, no qual
  *		and no projection.  Passing const-NULLs for these to ExecScanExtended
  *		allows the compiler to eliminate the additional code that would
- *		ordinarily be required for evalualtion of these.
+ *		ordinarily be required for the evaluation of these.
  * ----------------------------------------------------------------
  */
 static TupleTableSlot *
diff --git a/src/include/executor/execScan.h b/src/include/executor/execScan.h
index 194be0ea1c..da8e5ab8a7 100644
--- a/src/include/executor/execScan.h
+++ b/src/include/executor/execScan.h
@@ -10,8 +10,8 @@
  *-------------------------------------------------------------------------
  */
 
-#ifndef EXECASYNC_H
-#define EXECASYNC_H
+#ifndef EXECSCAN_H
+#define EXECSCAN_H
 
 #include "miscadmin.h"
 #include "executor/executor.h"
@@ -148,7 +148,6 @@ ExecScanFetch(ScanState *node,
  *	positioned before the first qualifying tuple.
  * ----------------------------------------------------------------
  */
-
 static pg_attribute_always_inline TupleTableSlot *
 ExecScanExtended(ScanState *node,
 				 ExecScanAccessMtd accessMtd,	/* function returning a tuple */
@@ -244,4 +243,4 @@ ExecScanExtended(ScanState *node,
 	}
 }
 
-#endif							/* EXECASYNC_H */
+#endif							/* EXECSCAN_H */

Amit Langote

amitlangote09@gmail.com

about 1 year ago

In reply to: David Rowley (#4)

2 attachment(s)

Re: Some ExecSeqScan optimizations

On Fri, Jan 10, 2025 at 1:06 PM David Rowley <dgrowleyml@gmail.com> wrote:
> On Fri, 10 Jan 2025 at 02:46, Amit Langote <amitlangote09@gmail.com> wrote:
> > On Mon, Jan 6, 2025 at 10:18 PM David Rowley <dgrowleyml@gmail.com> wrote:
> > > I've attached my workings of what I was messing around with. It seems
> > > to perform about the same as your version. I think maybe we'd need
> > > some sort of execScan.h instead of where I've stuffed the functions
> > > in.
> >
> > I've done that in the attached v2.
>
> I think 0001 looks ok, aside from what the attached fixes. (at least
> one is my mistake)

Oops, thanks for the fixes. Attaching an updated version.

> Did you test the performance of 0002? I didn't look at it hard enough
> to understand what you've done.

I reran the test suite I used before and don't see a consistent
improvement from 0002; if anything, there may be a slight degradation.
I've saved the results in the sheet named 2025-01-10 in the
spreadsheet at [1].

Comparing the latency of the query `select count(*) from test_table
where <first_column> = <nonexistent_value>` (where test_table has 30
integer columns and 1 million rows) across v17, master, and the
patched build (0001 or 0001+0002) shows the patch reducing latency by
roughly 8-9% (about 9% against v17 and 8% against master in the runs
below).

-- v17
select count(*) from test_table where a_1 = 1000000;
 count
-------
     0
(1 row)
Time: 286.618 ms

-- master
select count(*) from test_table where a_1 = 1000000;
 count
-------
     0
(1 row)
Time: 283.564 ms

-- patched (0001+0002)
select count(*) from test_table where a_1 = 1000000;
 count
-------
     0
(1 row)
Time: 260.547 ms

Note that I turned off Gather for these tests, because that makes the
improvements to ExecScan() easier to measure.

> I can look if performance tests show
> that it might be worthwhile considering.

Sure, that would be great.

--
Thanks, Amit Langote

Attachments:

v3-0002-Break-ExecScanExtended-into-variants-based-on-qua.patch (application/octet-stream)

From 9fef3794c49c36204a4e03b4389f583a5d3013f8 Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Thu, 9 Jan 2025 22:45:44 +0900
Subject: [PATCH v3 2/2] Break ExecScanExtended() into variants based on qual
 and proj nullness

---
 src/include/executor/execScan.h | 287 ++++++++++++++++++++++++--------
 1 file changed, 215 insertions(+), 72 deletions(-)

diff --git a/src/include/executor/execScan.h b/src/include/executor/execScan.h
index da8e5ab8a76..82bbd24ca0a 100644
--- a/src/include/executor/execScan.h
+++ b/src/include/executor/execScan.h
@@ -18,25 +18,17 @@
 #include "nodes/execnodes.h"
 
 /*
- * ExecScanFetch -- check interrupts & fetch next potential tuple
+ * ExecScanGetEPQTuple -- substitutes a test tuple for EvalPlanQual recheck.
  *
- * This routine substitutes a test tuple if inside an EvalPlanQual recheck.
- * Otherwise, it simply executes the access method's next-tuple routine.
- *
- * The pg_attribute_always_inline attribute allows the compiler to inline
- * this function into its caller. When EPQState is NULL, the EvalPlanQual
- * logic is completely eliminated at compile time, avoiding unnecessary
- * run-time checks and code for cases where EPQ is not required.
+ * Must only be called if the Scan is running under EvalPlanQual().
  */
 static pg_attribute_always_inline TupleTableSlot *
-ExecScanFetch(ScanState *node,
-			  EPQState *epqstate,
-			  ExecScanAccessMtd accessMtd,
-			  ExecScanRecheckMtd recheckMtd)
+ExecScanGetEPQTuple(ScanState *node,
+					EPQState *epqstate,
+					ExecScanRecheckMtd recheckMtd)
 {
-	CHECK_FOR_INTERRUPTS();
+	Assert(epqstate != NULL);
 
-	if (epqstate != NULL)
 	{
 		/*
 		 * We are inside an EvalPlanQual recheck.  Return the test tuple if
@@ -120,55 +112,160 @@ ExecScanFetch(ScanState *node,
 		}
 	}
 
+	Assert(false);
+	return NULL;
+}
+
+/*
+ * Fetches tuples using the access method callback until one is found that
+ * satisfies the 'qual'.
+ */
+static pg_attribute_always_inline TupleTableSlot *
+ExecScanWithQualNoProj(ScanState *node,
+					   ExecScanAccessMtd accessMtd,	/* function returning a tuple */
+					   ExecScanRecheckMtd recheckMtd,
+					   EPQState *epqstate,
+					   ExprState *qual)
+{
+	ExprContext *econtext = node->ps.ps_ExprContext;
+
+	Assert(qual != NULL);
+
 	/*
-	 * Run the node-type-specific access method function to get the next tuple
+	 * Reset per-tuple memory context to free any expression evaluation
+	 * storage allocated in the previous tuple cycle.
+	 */
+	ResetExprContext(econtext);
+
+	/*
+	 * get a tuple from the access method.  Loop until we obtain a tuple that
+	 * passes the qualification.
 	 */
-	return (*accessMtd) (node);
+	for (;;)
+	{
+		TupleTableSlot *slot;
+
+		CHECK_FOR_INTERRUPTS();
+
+		/* interrupt checks are in ExecScanFetch() when it's used */
+		if (epqstate == NULL)
+		{
+			slot = (*accessMtd) (node);
+		}
+		else
+			slot = ExecScanGetEPQTuple(node, epqstate, recheckMtd);
+
+		/*
+		 * if the slot returned by the accessMtd contains NULL, then it means
+		 * there is nothing more to scan so we just return an empty slot,
+		 * being careful to use the projection result slot so it has correct
+		 * tupleDesc.
+		 */
+		if (TupIsNull(slot))
+			return slot;
+
+		/*
+		 * place the current tuple into the expr context
+		 */
+		econtext->ecxt_scantuple = slot;
+
+		/*
+		 * check that the current tuple satisfies the qual-clause
+		 *
+		 * check for non-null qual here to avoid a function call to ExecQual()
+		 * when the qual is null ... saves only a few cycles, but they add up
+		 * ...
+		 */
+		if (ExecQual(qual, econtext))
+		{
+			/*
+			 * Found a satisfactory scan tuple.
+			 *
+			 * Here, we aren't projecting, so just return scan tuple.
+			 */
+			return slot;
+		}
+		else
+			InstrCountFiltered1(node, 1);
+
+		/*
+		 * Tuple fails qual, so free per-tuple memory and try again.
+		 */
+		ResetExprContext(econtext);
+	}
 }
 
-/* ----------------------------------------------------------------
- * ExecScanExtended
- *		Scans the relation using the given 'access method' and returns
- *		the next qualifying tuple. The tuple is optionally checked
- *		against 'qual' and, if provided, projected using 'projInfo'.
- *
- * The 'recheck method' validates an arbitrary tuple of the relation
- * against conditions enforced by the access method.
- *
- * This function is an alternative to ExecScan, used when callers
- * may omit 'qual' or 'projInfo'. The pg_attribute_always_inline
- * attribute allows the compiler to eliminate non-relevant branches
- * at compile time, avoiding run-time checks in those cases.
- *
- * Conditions:
- *	-- The AMI "cursor" is positioned at the previously returned tuple.
- *
- * Initial States:
- *	-- The relation is opened for scanning, with the "cursor"
- *	positioned before the first qualifying tuple.
- * ----------------------------------------------------------------
+/*
+ * Fetches the next tuple using the access method callback and returns the
+ * tuple obtained by projecting using the 'projInfo'.
  */
 static pg_attribute_always_inline TupleTableSlot *
-ExecScanExtended(ScanState *node,
-				 ExecScanAccessMtd accessMtd,	/* function returning a tuple */
-				 ExecScanRecheckMtd recheckMtd,
-				 EPQState *epqstate,
-				 ExprState *qual,
-				 ProjectionInfo *projInfo)
+ExecScanWithProjNoQual(ScanState *node,
+					   ExecScanAccessMtd accessMtd,	/* function returning a tuple */
+					   ExecScanRecheckMtd recheckMtd,
+					   EPQState *epqstate,
+					   ProjectionInfo *projInfo)
 {
 	ExprContext *econtext = node->ps.ps_ExprContext;
+	TupleTableSlot *slot;
 
-	/* interrupt checks are in ExecScanFetch */
+	Assert(projInfo != NULL);
+
+	CHECK_FOR_INTERRUPTS();
 
 	/*
-	 * If we have neither a qual to check nor a projection to do, just skip
-	 * all the overhead and return the raw scan tuple.
+	 * Reset per-tuple memory context to free any expression evaluation
+	 * storage allocated in the previous tuple cycle.
 	 */
-	if (!qual && !projInfo)
+	ResetExprContext(econtext);
+
+
+	/* interrupt checks are in ExecScanFetch() when it's used */
+	if (epqstate == NULL)
 	{
-		ResetExprContext(econtext);
-		return ExecScanFetch(node, epqstate, accessMtd, recheckMtd);
+		slot = (*accessMtd) (node);
 	}
+	else
+		slot = ExecScanGetEPQTuple(node, epqstate, recheckMtd);
+
+	/*
+	 * if the slot returned by the accessMtd contains NULL, then it means
+	 * there is nothing more to scan so we just return an empty slot,
+	 * being careful to use the projection result slot so it has correct
+	 * tupleDesc.
+	 */
+	if (TupIsNull(slot))
+		return ExecClearTuple(projInfo->pi_state.resultslot);
+
+	/*
+	 * place the current tuple into the expr context
+	 */
+	econtext->ecxt_scantuple = slot;
+
+	/*
+	 * Form a projection tuple, store it in the result tuple slot
+	 * and return it.
+	 */
+	return ExecProject(projInfo);
+}
+
+/*
+ * Fetches tuples using the access method callback until one is found that
+ * satisfies the 'qual' and returns the tuple obtained by projecting using the
+ * 'projInfo'.
+ */
+static pg_attribute_always_inline TupleTableSlot *
+ExecScanWithQualAndProj(ScanState *node,
+						 ExecScanAccessMtd accessMtd,	/* function returning a tuple */
+						 ExecScanRecheckMtd recheckMtd,
+						 EPQState *epqstate,
+						 ExprState *qual,
+						 ProjectionInfo *projInfo)
+{
+	ExprContext *econtext = node->ps.ps_ExprContext;
+
+	Assert(qual != NULL);
+	Assert(projInfo != NULL);
 
 	/*
 	 * Reset per-tuple memory context to free any expression evaluation
@@ -184,7 +281,15 @@ ExecScanExtended(ScanState *node,
 	{
 		TupleTableSlot *slot;
 
-		slot = ExecScanFetch(node, epqstate, accessMtd, recheckMtd);
+		CHECK_FOR_INTERRUPTS();
+
+		/* interrupt checks are in ExecScanFetch() when it's used */
+		if (epqstate == NULL)
+		{
+			slot = (*accessMtd) (node);
+		}
+		else
+			slot = ExecScanGetEPQTuple(node, epqstate, recheckMtd);
 
 		/*
 		 * if the slot returned by the accessMtd contains NULL, then it means
@@ -193,12 +298,7 @@ ExecScanExtended(ScanState *node,
 		 * tupleDesc.
 		 */
 		if (TupIsNull(slot))
-		{
-			if (projInfo)
-				return ExecClearTuple(projInfo->pi_state.resultslot);
-			else
-				return slot;
-		}
+			return ExecClearTuple(projInfo->pi_state.resultslot);
 
 		/*
 		 * place the current tuple into the expr context
@@ -212,26 +312,15 @@ ExecScanExtended(ScanState *node,
 		 * when the qual is null ... saves only a few cycles, but they add up
 		 * ...
 		 */
-		if (qual == NULL || ExecQual(qual, econtext))
+		if (ExecQual(qual, econtext))
 		{
 			/*
 			 * Found a satisfactory scan tuple.
+			 *
+			 * Form a projection tuple, store it in the result tuple slot
+			 * and return it.
 			 */
-			if (projInfo)
-			{
-				/*
-				 * Form a projection tuple, store it in the result tuple slot
-				 * and return it.
-				 */
-				return ExecProject(projInfo);
-			}
-			else
-			{
-				/*
-				 * Here, we aren't projecting, so just return scan tuple.
-				 */
-				return slot;
-			}
+			return ExecProject(projInfo);
 		}
 		else
 			InstrCountFiltered1(node, 1);
@@ -243,4 +332,58 @@ ExecScanExtended(ScanState *node,
 	}
 }
 
+/* ----------------------------------------------------------------
+ * ExecScanExtended
+ *		Scans the relation using the given 'access method' and returns
+ *		the next qualifying tuple. The tuple is optionally checked
+ *		against 'qual' and, if provided, projected using 'projInfo'.
+ *
+ * The 'recheck method' validates an arbitrary tuple of the relation
+ * against conditions enforced by the access method.
+ *
+ * This function is an alternative to ExecScan, used when callers
+ * may omit 'qual' or 'projInfo'. The pg_attribute_always_inline
+ * attribute allows the compiler to eliminate non-relevant branches
+ * at compile time, avoiding run-time checks in those cases.
+ *
+ * Conditions:
+ *	-- The AMI "cursor" is positioned at the previously returned tuple.
+ *
+ * Initial States:
+ *	-- The relation is opened for scanning, with the "cursor"
+ *	positioned before the first qualifying tuple.
+ * ----------------------------------------------------------------
+ */
+static pg_attribute_always_inline TupleTableSlot *
+ExecScanExtended(ScanState *node,
+				 ExecScanAccessMtd accessMtd,	/* function returning a tuple */
+				 ExecScanRecheckMtd recheckMtd,
+				 EPQState *epqstate,
+				 ExprState *qual,
+				 ProjectionInfo *projInfo)
+{
+	if (qual != NULL && projInfo != NULL)
+		return ExecScanWithQualAndProj(node, accessMtd, recheckMtd, epqstate, qual, projInfo);
+	else if (qual != NULL)
+		return ExecScanWithQualNoProj(node, accessMtd, recheckMtd, epqstate, qual);
+	else if (projInfo != NULL)
+		return ExecScanWithProjNoQual(node, accessMtd, recheckMtd, epqstate, projInfo);
+	/*
+	 * If we have neither a qual to check nor a projection to do, just skip
+	 * all the overhead and return the raw scan tuple.
+	 */
+	else
+	{
+		CHECK_FOR_INTERRUPTS();
+		ResetExprContext(node->ps.ps_ExprContext);
+		if (epqstate == NULL)
+			return (*accessMtd) (node);
+		else
+			return ExecScanGetEPQTuple(node, epqstate, recheckMtd);
+	}
+
+	Assert(false);
+	return NULL;
+}
+
 #endif							/* EXECSCAN_H */
-- 
2.43.0

v3-0001-Refactor-ExecScan-to-inline-scan-filtering-and-pr.patch (application/octet-stream)

From 538aae977fd53660aff6165675de43914fd62fbb Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Thu, 9 Jan 2025 16:34:55 +0900
Subject: [PATCH v3 1/2] Refactor ExecScan() to inline scan, filtering, and
 projection logic

This commit refactors ExecScan() by moving its tuple-fetching,
filtering, and projection logic into an inline-able function,
ExecScanExtended(), defined in src/include/executor/execScan.h.
ExecScanExtended() accepts parameters for EvalPlanQual state,
qualifiers (ExprState), and projection (ProjectionInfo).

Specialized variants of the execution function of a given Scan node
can then pass const-NULL for unused parameters.  This allows the
compiler to inline the logic and eliminate unnecessary branches or
checks. Each variant function thus contains only the necessary code,
optimizing execution for sequential scans where these features are
not needed.

Currently, only ExecSeqScan() is modified to take advantage of this
inline-ability.  Other Scan nodes might benefit from such specialized
variant functions but that is left as future work.

Benchmarks performed by Junwang Zhao and David Rowley show up to a 5%
reduction in execution time for queries that rely heavily on Seq
Scans. The most significant improvements were observed in scenarios
where EvalPlanQual, qualifiers, and projection were not required, but
other cases also benefit from reduced runtime overhead due to the
inlining and removal of unnecessary code paths.

The refactoring approach implemented here is based on a proposal by
David Rowley, significantly improving upon an earlier idea I (amitlan)
suggested.

Author: Amit Langote
Co-authored-by: David Rowley
Reviewed-by: Junwang Zhao
Reviewed-by: David Rowley
Tested-by: Junwang Zhao
Tested-by: David Rowley
Discussion: https://postgr.es/m/CA+HiwqGaH-otvqW_ce-paL=96JvU4j+Xbuk+14esJNDwefdkOg@mail.gmail.com
---
 src/backend/executor/execScan.c    | 207 ++----------------------
 src/backend/executor/nodeSeqscan.c | 115 +++++++++++++-
 src/include/executor/execScan.h    | 246 +++++++++++++++++++++++++++++
 3 files changed, 365 insertions(+), 203 deletions(-)
 create mode 100644 src/include/executor/execScan.h

diff --git a/src/backend/executor/execScan.c b/src/backend/executor/execScan.c
index 556a5d98e78..25a776a6a19 100644
--- a/src/backend/executor/execScan.c
+++ b/src/backend/executor/execScan.c
@@ -19,118 +19,9 @@
 #include "postgres.h"
 
 #include "executor/executor.h"
+#include "executor/execScan.h"
 #include "miscadmin.h"
 
-
-
-/*
- * ExecScanFetch -- check interrupts & fetch next potential tuple
- *
- * This routine is concerned with substituting a test tuple if we are
- * inside an EvalPlanQual recheck.  If we aren't, just execute
- * the access method's next-tuple routine.
- */
-static inline TupleTableSlot *
-ExecScanFetch(ScanState *node,
-			  ExecScanAccessMtd accessMtd,
-			  ExecScanRecheckMtd recheckMtd)
-{
-	EState	   *estate = node->ps.state;
-
-	CHECK_FOR_INTERRUPTS();
-
-	if (estate->es_epq_active != NULL)
-	{
-		EPQState   *epqstate = estate->es_epq_active;
-
-		/*
-		 * We are inside an EvalPlanQual recheck.  Return the test tuple if
-		 * one is available, after rechecking any access-method-specific
-		 * conditions.
-		 */
-		Index		scanrelid = ((Scan *) node->ps.plan)->scanrelid;
-
-		if (scanrelid == 0)
-		{
-			/*
-			 * This is a ForeignScan or CustomScan which has pushed down a
-			 * join to the remote side.  The recheck method is responsible not
-			 * only for rechecking the scan/join quals but also for storing
-			 * the correct tuple in the slot.
-			 */
-
-			TupleTableSlot *slot = node->ss_ScanTupleSlot;
-
-			if (!(*recheckMtd) (node, slot))
-				ExecClearTuple(slot);	/* would not be returned by scan */
-			return slot;
-		}
-		else if (epqstate->relsubs_done[scanrelid - 1])
-		{
-			/*
-			 * Return empty slot, as either there is no EPQ tuple for this rel
-			 * or we already returned it.
-			 */
-
-			TupleTableSlot *slot = node->ss_ScanTupleSlot;
-
-			return ExecClearTuple(slot);
-		}
-		else if (epqstate->relsubs_slot[scanrelid - 1] != NULL)
-		{
-			/*
-			 * Return replacement tuple provided by the EPQ caller.
-			 */
-
-			TupleTableSlot *slot = epqstate->relsubs_slot[scanrelid - 1];
-
-			Assert(epqstate->relsubs_rowmark[scanrelid - 1] == NULL);
-
-			/* Mark to remember that we shouldn't return it again */
-			epqstate->relsubs_done[scanrelid - 1] = true;
-
-			/* Return empty slot if we haven't got a test tuple */
-			if (TupIsNull(slot))
-				return NULL;
-
-			/* Check if it meets the access-method conditions */
-			if (!(*recheckMtd) (node, slot))
-				return ExecClearTuple(slot);	/* would not be returned by
-												 * scan */
-			return slot;
-		}
-		else if (epqstate->relsubs_rowmark[scanrelid - 1] != NULL)
-		{
-			/*
-			 * Fetch and return replacement tuple using a non-locking rowmark.
-			 */
-
-			TupleTableSlot *slot = node->ss_ScanTupleSlot;
-
-			/* Mark to remember that we shouldn't return more */
-			epqstate->relsubs_done[scanrelid - 1] = true;
-
-			if (!EvalPlanQualFetchRowMark(epqstate, scanrelid, slot))
-				return NULL;
-
-			/* Return empty slot if we haven't got a test tuple */
-			if (TupIsNull(slot))
-				return NULL;
-
-			/* Check if it meets the access-method conditions */
-			if (!(*recheckMtd) (node, slot))
-				return ExecClearTuple(slot);	/* would not be returned by
-												 * scan */
-			return slot;
-		}
-	}
-
-	/*
-	 * Run the node-type-specific access method function to get the next tuple
-	 */
-	return (*accessMtd) (node);
-}
-
 /* ----------------------------------------------------------------
  *		ExecScan
  *
@@ -157,100 +48,20 @@ ExecScan(ScanState *node,
 		 ExecScanAccessMtd accessMtd,	/* function returning a tuple */
 		 ExecScanRecheckMtd recheckMtd)
 {
-	ExprContext *econtext;
+	EPQState *epqstate;
 	ExprState  *qual;
 	ProjectionInfo *projInfo;
 
-	/*
-	 * Fetch data from node
-	 */
+	epqstate = node->ps.state->es_epq_active;
 	qual = node->ps.qual;
 	projInfo = node->ps.ps_ProjInfo;
-	econtext = node->ps.ps_ExprContext;
-
-	/* interrupt checks are in ExecScanFetch */
-
-	/*
-	 * If we have neither a qual to check nor a projection to do, just skip
-	 * all the overhead and return the raw scan tuple.
-	 */
-	if (!qual && !projInfo)
-	{
-		ResetExprContext(econtext);
-		return ExecScanFetch(node, accessMtd, recheckMtd);
-	}
-
-	/*
-	 * Reset per-tuple memory context to free any expression evaluation
-	 * storage allocated in the previous tuple cycle.
-	 */
-	ResetExprContext(econtext);
-
-	/*
-	 * get a tuple from the access method.  Loop until we obtain a tuple that
-	 * passes the qualification.
-	 */
-	for (;;)
-	{
-		TupleTableSlot *slot;
 
-		slot = ExecScanFetch(node, accessMtd, recheckMtd);
-
-		/*
-		 * if the slot returned by the accessMtd contains NULL, then it means
-		 * there is nothing more to scan so we just return an empty slot,
-		 * being careful to use the projection result slot so it has correct
-		 * tupleDesc.
-		 */
-		if (TupIsNull(slot))
-		{
-			if (projInfo)
-				return ExecClearTuple(projInfo->pi_state.resultslot);
-			else
-				return slot;
-		}
-
-		/*
-		 * place the current tuple into the expr context
-		 */
-		econtext->ecxt_scantuple = slot;
-
-		/*
-		 * check that the current tuple satisfies the qual-clause
-		 *
-		 * check for non-null qual here to avoid a function call to ExecQual()
-		 * when the qual is null ... saves only a few cycles, but they add up
-		 * ...
-		 */
-		if (qual == NULL || ExecQual(qual, econtext))
-		{
-			/*
-			 * Found a satisfactory scan tuple.
-			 */
-			if (projInfo)
-			{
-				/*
-				 * Form a projection tuple, store it in the result tuple slot
-				 * and return it.
-				 */
-				return ExecProject(projInfo);
-			}
-			else
-			{
-				/*
-				 * Here, we aren't projecting, so just return scan tuple.
-				 */
-				return slot;
-			}
-		}
-		else
-			InstrCountFiltered1(node, 1);
-
-		/*
-		 * Tuple fails qual, so free per-tuple memory and try again.
-		 */
-		ResetExprContext(econtext);
-	}
+	return ExecScanExtended(node,
+							accessMtd,
+							recheckMtd,
+							epqstate,
+							qual,
+							projInfo);
 }
 
 /*
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index fa2d522b25f..6f9e991eeae 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -29,6 +29,7 @@
 
 #include "access/relscan.h"
 #include "access/tableam.h"
+#include "executor/execScan.h"
 #include "executor/executor.h"
 #include "executor/nodeSeqscan.h"
 #include "utils/rel.h"
@@ -99,9 +100,10 @@ SeqRecheck(SeqScanState *node, TupleTableSlot *slot)
  *		ExecSeqScan(node)
  *
  *		Scans the relation sequentially and returns the next qualifying
- *		tuple.
- *		We call the ExecScan() routine and pass it the appropriate
- *		access method functions.
+ *		tuple. This variant is used when there is no es_epq_active, no qual
+ *		and no projection.  Passing const-NULLs for these to ExecScanExtended
+ *		allows the compiler to eliminate the additional code that would
+ *		ordinarily be required for the evaluation of these.
  * ----------------------------------------------------------------
  */
 static TupleTableSlot *
@@ -109,12 +111,94 @@ ExecSeqScan(PlanState *pstate)
 {
 	SeqScanState *node = castNode(SeqScanState, pstate);
 
+	Assert(pstate->state->es_epq_active == NULL);
+	Assert(pstate->qual == NULL);
+	Assert(pstate->ps_ProjInfo == NULL);
+
+	return ExecScanExtended(&node->ss,
+							(ExecScanAccessMtd) SeqNext,
+							(ExecScanRecheckMtd) SeqRecheck,
+							NULL,
+							NULL,
+							NULL);
+}
+
+/*
+ * Variant of ExecSeqScan() but when qual evaluation is required.
+ */
+static TupleTableSlot *
+ExecSeqScanWithQual(PlanState *pstate)
+{
+	SeqScanState *node = castNode(SeqScanState, pstate);
+
+	Assert(pstate->state->es_epq_active == NULL);
+	Assert(pstate->qual != NULL);
+	Assert(pstate->ps_ProjInfo == NULL);
+
+	return ExecScanExtended(&node->ss,
+							(ExecScanAccessMtd) SeqNext,
+							(ExecScanRecheckMtd) SeqRecheck,
+							NULL,
+							pstate->qual,
+							NULL);
+}
+
+/*
+ * Variant of ExecSeqScan() but when projection is required.
+ */
+static TupleTableSlot *
+ExecSeqScanWithProject(PlanState *pstate)
+{
+	SeqScanState *node = castNode(SeqScanState, pstate);
+
+	Assert(pstate->state->es_epq_active == NULL);
+	Assert(pstate->qual == NULL);
+	Assert(pstate->ps_ProjInfo != NULL);
+
+	return ExecScanExtended(&node->ss,
+							(ExecScanAccessMtd) SeqNext,
+							(ExecScanRecheckMtd) SeqRecheck,
+							NULL,
+							NULL,
+							pstate->ps_ProjInfo);
+}
+
+/*
+ * Variant of ExecSeqScan() but when qual evaluation and projection are
+ * required.
+ */
+static TupleTableSlot *
+ExecSeqScanWithQualProject(PlanState *pstate)
+{
+	SeqScanState *node = castNode(SeqScanState, pstate);
+
+	Assert(pstate->state->es_epq_active == NULL);
+	Assert(pstate->qual != NULL);
+	Assert(pstate->ps_ProjInfo != NULL);
+
+	return ExecScanExtended(&node->ss,
+							(ExecScanAccessMtd) SeqNext,
+							(ExecScanRecheckMtd) SeqRecheck,
+							NULL,
+							pstate->qual,
+							pstate->ps_ProjInfo);
+}
+
+/*
+ * Variant of ExecSeqScan for when EPQ evaluation is required.  We don't
+ * bother adding variants of this for with/without qual and projection as
+ * EPQ doesn't seem as exciting a case to optimize for.
+ */
+static TupleTableSlot *
+ExecSeqScanEPQ(PlanState *pstate)
+{
+	SeqScanState *node = castNode(SeqScanState, pstate);
+
 	return ExecScan(&node->ss,
 					(ExecScanAccessMtd) SeqNext,
 					(ExecScanRecheckMtd) SeqRecheck);
 }
 
-
 /* ----------------------------------------------------------------
  *		ExecInitSeqScan
  * ----------------------------------------------------------------
@@ -137,7 +221,6 @@ ExecInitSeqScan(SeqScan *node, EState *estate, int eflags)
 	scanstate = makeNode(SeqScanState);
 	scanstate->ss.ps.plan = (Plan *) node;
 	scanstate->ss.ps.state = estate;
-	scanstate->ss.ps.ExecProcNode = ExecSeqScan;
 
 	/*
 	 * Miscellaneous initialization
@@ -171,6 +254,28 @@ ExecInitSeqScan(SeqScan *node, EState *estate, int eflags)
 	scanstate->ss.ps.qual =
 		ExecInitQual(node->scan.plan.qual, (PlanState *) scanstate);
 
+	/*
+	 * When EvalPlanQual() is not in use, assign ExecProcNode for this node
+	 * based on the presence of qual and projection. Each ExecSeqScan*()
+	 * variant is optimized for the specific combination of these conditions.
+	 */
+	if (scanstate->ss.ps.state->es_epq_active != NULL)
+		scanstate->ss.ps.ExecProcNode = ExecSeqScanEPQ;
+	else if (scanstate->ss.ps.qual == NULL)
+	{
+		if (scanstate->ss.ps.ps_ProjInfo == NULL)
+			scanstate->ss.ps.ExecProcNode = ExecSeqScan;
+		else
+			scanstate->ss.ps.ExecProcNode = ExecSeqScanWithProject;
+	}
+	else
+	{
+		if (scanstate->ss.ps.ps_ProjInfo == NULL)
+			scanstate->ss.ps.ExecProcNode = ExecSeqScanWithQual;
+		else
+			scanstate->ss.ps.ExecProcNode = ExecSeqScanWithQualProject;
+	}
+
 	return scanstate;
 }
 
diff --git a/src/include/executor/execScan.h b/src/include/executor/execScan.h
new file mode 100644
index 00000000000..da8e5ab8a76
--- /dev/null
+++ b/src/include/executor/execScan.h
@@ -0,0 +1,246 @@
+/*-------------------------------------------------------------------------
+ * execScan.h
+ *		Inline-able support functions for Scan nodes
+ *
+ * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ *		src/include/executor/execScan.h
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef EXECSCAN_H
+#define EXECSCAN_H
+
+#include "miscadmin.h"
+#include "executor/executor.h"
+#include "nodes/execnodes.h"
+
+/*
+ * ExecScanFetch -- check interrupts & fetch next potential tuple
+ *
+ * This routine substitutes a test tuple if inside an EvalPlanQual recheck.
+ * Otherwise, it simply executes the access method's next-tuple routine.
+ *
+ * The pg_attribute_always_inline attribute allows the compiler to inline
+ * this function into its caller. When EPQState is NULL, the EvalPlanQual
+ * logic is completely eliminated at compile time, avoiding unnecessary
+ * run-time checks and code for cases where EPQ is not required.
+ */
+static pg_attribute_always_inline TupleTableSlot *
+ExecScanFetch(ScanState *node,
+			  EPQState *epqstate,
+			  ExecScanAccessMtd accessMtd,
+			  ExecScanRecheckMtd recheckMtd)
+{
+	CHECK_FOR_INTERRUPTS();
+
+	if (epqstate != NULL)
+	{
+		/*
+		 * We are inside an EvalPlanQual recheck.  Return the test tuple if
+		 * one is available, after rechecking any access-method-specific
+		 * conditions.
+		 */
+		Index		scanrelid = ((Scan *) node->ps.plan)->scanrelid;
+
+		if (scanrelid == 0)
+		{
+			/*
+			 * This is a ForeignScan or CustomScan which has pushed down a
+			 * join to the remote side.  The recheck method is responsible not
+			 * only for rechecking the scan/join quals but also for storing
+			 * the correct tuple in the slot.
+			 */
+
+			TupleTableSlot *slot = node->ss_ScanTupleSlot;
+
+			if (!(*recheckMtd) (node, slot))
+				ExecClearTuple(slot);	/* would not be returned by scan */
+			return slot;
+		}
+		else if (epqstate->relsubs_done[scanrelid - 1])
+		{
+			/*
+			 * Return empty slot, as either there is no EPQ tuple for this rel
+			 * or we already returned it.
+			 */
+
+			TupleTableSlot *slot = node->ss_ScanTupleSlot;
+
+			return ExecClearTuple(slot);
+		}
+		else if (epqstate->relsubs_slot[scanrelid - 1] != NULL)
+		{
+			/*
+			 * Return replacement tuple provided by the EPQ caller.
+			 */
+
+			TupleTableSlot *slot = epqstate->relsubs_slot[scanrelid - 1];
+
+			Assert(epqstate->relsubs_rowmark[scanrelid - 1] == NULL);
+
+			/* Mark to remember that we shouldn't return it again */
+			epqstate->relsubs_done[scanrelid - 1] = true;
+
+			/* Return empty slot if we haven't got a test tuple */
+			if (TupIsNull(slot))
+				return NULL;
+
+			/* Check if it meets the access-method conditions */
+			if (!(*recheckMtd) (node, slot))
+				return ExecClearTuple(slot);	/* would not be returned by
+												 * scan */
+			return slot;
+		}
+		else if (epqstate->relsubs_rowmark[scanrelid - 1] != NULL)
+		{
+			/*
+			 * Fetch and return replacement tuple using a non-locking rowmark.
+			 */
+
+			TupleTableSlot *slot = node->ss_ScanTupleSlot;
+
+			/* Mark to remember that we shouldn't return more */
+			epqstate->relsubs_done[scanrelid - 1] = true;
+
+			if (!EvalPlanQualFetchRowMark(epqstate, scanrelid, slot))
+				return NULL;
+
+			/* Return empty slot if we haven't got a test tuple */
+			if (TupIsNull(slot))
+				return NULL;
+
+			/* Check if it meets the access-method conditions */
+			if (!(*recheckMtd) (node, slot))
+				return ExecClearTuple(slot);	/* would not be returned by
+												 * scan */
+			return slot;
+		}
+	}
+
+	/*
+	 * Run the node-type-specific access method function to get the next tuple
+	 */
+	return (*accessMtd) (node);
+}
+
+/* ----------------------------------------------------------------
+ * ExecScanExtended
+ *		Scans the relation using the given 'access method' and returns
+ *		the next qualifying tuple. The tuple is optionally checked
+ *		against 'qual' and, if provided, projected using 'projInfo'.
+ *
+ * The 'recheck method' validates an arbitrary tuple of the relation
+ * against conditions enforced by the access method.
+ *
+ * This function is an alternative to ExecScan, used when callers
+ * may omit 'qual' or 'projInfo'. The pg_attribute_always_inline
+ * attribute allows the compiler to eliminate non-relevant branches
+ * at compile time, avoiding run-time checks in those cases.
+ *
+ * Conditions:
+ *	-- The AMI "cursor" is positioned at the previously returned tuple.
+ *
+ * Initial States:
+ *	-- The relation is opened for scanning, with the "cursor"
+ *	positioned before the first qualifying tuple.
+ * ----------------------------------------------------------------
+ */
+static pg_attribute_always_inline TupleTableSlot *
+ExecScanExtended(ScanState *node,
+				 ExecScanAccessMtd accessMtd,	/* function returning a tuple */
+				 ExecScanRecheckMtd recheckMtd,
+				 EPQState *epqstate,
+				 ExprState *qual,
+				 ProjectionInfo *projInfo)
+{
+	ExprContext *econtext = node->ps.ps_ExprContext;
+
+	/* interrupt checks are in ExecScanFetch */
+
+	/*
+	 * If we have neither a qual to check nor a projection to do, just skip
+	 * all the overhead and return the raw scan tuple.
+	 */
+	if (!qual && !projInfo)
+	{
+		ResetExprContext(econtext);
+		return ExecScanFetch(node, epqstate, accessMtd, recheckMtd);
+	}
+
+	/*
+	 * Reset per-tuple memory context to free any expression evaluation
+	 * storage allocated in the previous tuple cycle.
+	 */
+	ResetExprContext(econtext);
+
+	/*
+	 * get a tuple from the access method.  Loop until we obtain a tuple that
+	 * passes the qualification.
+	 */
+	for (;;)
+	{
+		TupleTableSlot *slot;
+
+		slot = ExecScanFetch(node, epqstate, accessMtd, recheckMtd);
+
+		/*
+		 * if the slot returned by the accessMtd contains NULL, then it means
+		 * there is nothing more to scan so we just return an empty slot,
+		 * being careful to use the projection result slot so it has correct
+		 * tupleDesc.
+		 */
+		if (TupIsNull(slot))
+		{
+			if (projInfo)
+				return ExecClearTuple(projInfo->pi_state.resultslot);
+			else
+				return slot;
+		}
+
+		/*
+		 * place the current tuple into the expr context
+		 */
+		econtext->ecxt_scantuple = slot;
+
+		/*
+		 * check that the current tuple satisfies the qual-clause
+		 *
+		 * check for non-null qual here to avoid a function call to ExecQual()
+		 * when the qual is null ... saves only a few cycles, but they add up
+		 * ...
+		 */
+		if (qual == NULL || ExecQual(qual, econtext))
+		{
+			/*
+			 * Found a satisfactory scan tuple.
+			 */
+			if (projInfo)
+			{
+				/*
+				 * Form a projection tuple, store it in the result tuple slot
+				 * and return it.
+				 */
+				return ExecProject(projInfo);
+			}
+			else
+			{
+				/*
+				 * Here, we aren't projecting, so just return scan tuple.
+				 */
+				return slot;
+			}
+		}
+		else
+			InstrCountFiltered1(node, 1);
+
+		/*
+		 * Tuple fails qual, so free per-tuple memory and try again.
+		 */
+		ResetExprContext(econtext);
+	}
+}
+
+#endif							/* EXECSCAN_H */
-- 
2.43.0
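Below is a minimal sketch of the same pattern outside PostgreSQL, assuming GCC or Clang. Every name in it is hypothetical: ToyScan, toy_scan_extended(), the toy_scan*() variants, is_even() and times_ten(). The macro toy_always_inline loosely mirrors how c.h defines pg_attribute_always_inline for GCC-compatible compilers. The generic loop plays the role of ExecScanExtended(), and each variant plays the role of one ExecSeqScan*() function, passing a literal NULL for the qual or projection it does not need.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-ins for the executor's qual and projection: plain
 * function pointers over int "tuples".  None of these names exist in
 * PostgreSQL; they only mirror the shape of ExecScanExtended().
 */
typedef int (*QualFn) (int value);
typedef int (*ProjFn) (int value);

typedef struct ToyScan
{
	const int  *values;			/* the "relation" */
	size_t		nvalues;
	size_t		pos;			/* the scan "cursor" */
	QualFn		qual;			/* NULL if there is no qual */
	ProjFn		proj;			/* NULL if there is no projection */
} ToyScan;

/* Loosely follows c.h's GCC spelling; plain inline elsewhere. */
#if defined(__GNUC__)
#define toy_always_inline __attribute__((always_inline)) inline
#else
#define toy_always_inline inline
#endif

/*
 * Generic scan loop.  Returns 1 and stores the (possibly projected) value in
 * *out when a qualifying value is found, or 0 at end of scan.  Because it is
 * force-inlined, a caller passing a literal NULL for qual or proj gets a copy
 * in which those tests are constant and can be folded away.
 */
static toy_always_inline int
toy_scan_extended(ToyScan *node, QualFn qual, ProjFn proj, int *out)
{
	while (node->pos < node->nvalues)
	{
		int			v = node->values[node->pos++];

		if (qual == NULL || qual(v))
		{
			*out = (proj != NULL) ? proj(v) : v;
			return 1;
		}
	}
	return 0;
}

/* Variants: each passes const-NULL for whatever it does not need. */
static int
toy_scan(ToyScan *node, int *out)
{
	assert(node->qual == NULL && node->proj == NULL);
	return toy_scan_extended(node, NULL, NULL, out);
}

static int
toy_scan_with_qual(ToyScan *node, int *out)
{
	assert(node->qual != NULL && node->proj == NULL);
	return toy_scan_extended(node, node->qual, NULL, out);
}

static int
toy_scan_with_project(ToyScan *node, int *out)
{
	assert(node->qual == NULL && node->proj != NULL);
	return toy_scan_extended(node, NULL, node->proj, out);
}

static int
toy_scan_with_qual_project(ToyScan *node, int *out)
{
	assert(node->qual != NULL && node->proj != NULL);
	return toy_scan_extended(node, node->qual, node->proj, out);
}

/* Example callbacks. */
static int	is_even(int v) { return v % 2 == 0; }
static int	times_ten(int v) { return v * 10; }
```

Scanning {1, 2, 3, 4, 5} with toy_scan_with_qual_project() and these callbacks yields 20, then 40, then end of scan. With optimization enabled, the code generated for toy_scan() should contain no test of qual or proj, because both are literal NULLs in that inlined copy of the loop. That is the effect the patch relies on, though the exact code emitted depends on the compiler and flags.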

Vladlen Popolitov

v.popolitov@postgrespro.ru

about 1 year ago

In reply to: Amit Langote (#5)

Re: Some ExecSeqScan optimizations

Amit Langote wrote on 2025-01-10 16:22:

On Fri, Jan 10, 2025 at 1:06 PM David Rowley <dgrowleyml@gmail.com> wrote:

On Fri, 10 Jan 2025 at 02:46, Amit Langote <amitlangote09@gmail.com> wrote:

On Mon, Jan 6, 2025 at 10:18 PM David Rowley <dgrowleyml@gmail.com> wrote:

I've attached my workings of what I was messing around with. It seems
to perform about the same as your version. I think maybe we'd need
some sort of execScan.h instead of where I've stuffed the functions
in.

I've done that in the attached v2.

I think 0001 looks ok, aside from what the attached fixes. (at least
one is my mistake)

Oops, thanks for the fixes. Attaching an updated version.

Did you test the performance of 0002? I didn't look at it hard enough
to understand what you've done.

I reran the test suite I used before and I don't see a consistent
improvement from 0002; if anything, perhaps a degradation. I've saved
the results in the sheet named 2025-01-10 in the spreadsheet at [1].

Comparing the latency of the query `select count(*) from test_table
where <first_column> = <nonexistent_value>` (where test_table has 30
integer columns and 1 million rows) across v17, master, and the
patched build (0001 or 0001+0002) shows an improvement of close to 10%
with the patch.

-- v17
select count(*) from test_table where a_1 = 1000000;
 count
-------
     0
(1 row)

Time: 286.618 ms

-- master
select count(*) from test_table where a_1 = 1000000;
 count
-------
     0
(1 row)

Time: 283.564 ms

-- patched (0001+0002)
select count(*) from test_table where a_1 = 1000000;
 count
-------
     0
(1 row)

Time: 260.547 ms

Note that I turned off Gather for these tests, because the improvements
to ExecScan() are easier to measure without it.
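The "close to 10%" figure can be checked against the three timings above with a back-of-the-envelope calculation. pct_reduction() and print_reductions() are hypothetical helpers, and single runs like these carry noise, so the percentages are rough.

```c
#include <assert.h>
#include <stdio.h>

/* Percentage reduction in latency going from before_ms to after_ms. */
static double
pct_reduction(double before_ms, double after_ms)
{
	assert(before_ms > 0.0);
	return (before_ms - after_ms) / before_ms * 100.0;
}

/* Timings quoted above for: select count(*) ... where a_1 = 1000000; */
static const double v17_ms = 286.618;
static const double master_ms = 283.564;
static const double patched_ms = 260.547;

static void
print_reductions(void)
{
	printf("v17 -> patched:    %.1f%%\n", pct_reduction(v17_ms, patched_ms));
	printf("master -> patched: %.1f%%\n", pct_reduction(master_ms, patched_ms));
}
```

This prints 9.1% for v17 and 8.1% for master. The "close to 10%" therefore matches the comparison with v17, and the gain over master on this query is a little over 8%.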

I can look if performance tests show
that it might be worthwhile considering.

Sure, that would be great.

Could you clarify how you get these improvements (283 ms to 260 ms)
with this patch?
I see additional code (if ... else if ... else if ...) and the same
function declared as inline, but it is called through a pointer as
before, so it does not matter that it is declared as inline.

In the case of the query
select count(*) from test_table where a_1 = 1000000;
I would expect an increase in query time due to the additional if...else.
It is not clear what code was eliminated to decrease query time.

--
Best regards,

Vladlen Popolitov.

David Rowley

dgrowleyml@gmail.com

about 1 year ago

In reply to: Vladlen Popolitov (#6)

Re: Some ExecSeqScan optimizations

On Fri, 10 Jan 2025 at 22:53, Vladlen Popolitov <v.popolitov@postgrespro.ru> wrote:

In the case of the query
select count(*) from test_table where a_1 = 1000000;
I would expect an increase in query time due to the additional if...else.
It is not clear what code was eliminated to decrease query time.

Are you talking about the code added to ExecInitSeqScan() to determine
which node function to call? If so, that's only called during executor
startup. The idea here is to reduce the branching during execution by
calling one of those special functions which has a more specialised
version of the ExecScan code for the particular purpose it's going to
be used for.
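Here is a toy model of David's point, as a sketch with hypothetical names only. The if/else chain runs once when the node is initialized. After that, every call goes through the stored function pointer to a variant that has no such branches. ToyNode, toy_init() and toy_exec() stand in for the SeqScanState, ExecInitSeqScan() and ExecProcNode machinery. The counters exist only to show how often each part runs.

```c
#include <assert.h>
#include <stddef.h>

typedef struct ToyNode ToyNode;
typedef int (*ToyProcNode) (ToyNode *node);

struct ToyNode
{
	int			has_qual;		/* stands in for ps.qual != NULL */
	int			has_proj;		/* stands in for ps.ps_ProjInfo != NULL */
	ToyProcNode proc;			/* stands in for ps.ExecProcNode */
};

static int	init_decisions = 0; /* times the if/else chain ran */
static int	proc_calls = 0;		/* times any variant was called */

/*
 * In the real patch each variant holds a specialised scan loop; here they
 * just count calls and report which variant was picked.
 */
static int	toy_plain(ToyNode *node) { (void) node; proc_calls++; return 0; }
static int	toy_qual(ToyNode *node) { (void) node; proc_calls++; return 1; }
static int	toy_proj(ToyNode *node) { (void) node; proc_calls++; return 2; }
static int	toy_qual_proj(ToyNode *node) { (void) node; proc_calls++; return 3; }

/* Modelled on the selection logic the patch adds to ExecInitSeqScan(). */
static void
toy_init(ToyNode *node, int has_qual, int has_proj)
{
	node->has_qual = has_qual;
	node->has_proj = has_proj;
	init_decisions++;

	if (!has_qual)
		node->proc = has_proj ? toy_proj : toy_plain;
	else
		node->proc = has_proj ? toy_qual_proj : toy_qual;
}

/* Per-tuple entry point: one indirect call, no qual/proj tests here. */
static int
toy_exec(ToyNode *node)
{
	assert(node->proc != NULL);
	return node->proc(node);
}
```

Calling toy_exec() a thousand times after a single toy_init() leaves init_decisions at 1 and proc_calls at 1000. The extra branching is paid once at executor startup. The indirect call itself remains, as Vladlen notes. What goes away is the per-tuple testing of qual and projection inside the scan loop.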

David

David Rowley

dgrowleyml@gmail.com

about 1 year ago

In reply to: Amit Langote (#5)

Re: Some ExecSeqScan optimizations

On Fri, 10 Jan 2025 at 22:22, Amit Langote <amitlangote09@gmail.com> wrote:

On Fri, Jan 10, 2025 at 1:06 PM David Rowley <dgrowleyml@gmail.com> wrote:

I can look if performance tests show
that it might be worthwhile considering.

Sure, that would be great.

What I wanted to know was whether 0002 shows any additional gains over
just 0001. If there aren't any, I didn't see the point in looking at it.

David

Amit Langote

amitlangote09@gmail.com

about 1 year ago

In reply to: David Rowley (#7)

1 attachment(s)

Re: Some ExecSeqScan optimizations

On Fri, Jan 10, 2025 at 7:36 PM David Rowley <dgrowleyml@gmail.com> wrote:

On Fri, 10 Jan 2025 at 22:53, Vladlen Popolitov <v.popolitov@postgrespro.ru> wrote:

In the case of the query
select count(*) from test_table where a_1 = 1000000;
I would expect an increase in query time due to the additional if...else.
It is not clear what code was eliminated to decrease query time.

Are you talking about the code added to ExecInitSeqScan() to determine
which node function to call? If so, that's only called during executor
startup. The idea here is to reduce the branching during execution by
calling one of those special functions which has a more specialised
version of the ExecScan code for the particular purpose it's going to
be used for.

Looks like I hadn't mentioned this key aspect of the patch in the
commit message, so I've added it in the attached version.

Vladlen, does what David wrote and the new commit message answer your
question(s)?

--
Thanks, Amit Langote

Attachments:

v4-0001-Refactor-ExecScan-to-inline-scan-filtering-and-pr.patch (application/octet-stream)

From 2297c8b6a4699cea245671938dd37737ef8a70db Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Fri, 10 Jan 2025 20:19:51 +0900
Subject: [PATCH v4] Refactor ExecScan() to inline scan, filtering, and
 projection logic

This commit refactors ExecScan() by moving its tuple-fetching,
filtering, and projection logic into an inline-able function,
ExecScanExtended(), defined in src/include/executor/execScan.h.
ExecScanExtended() accepts parameters for EvalPlanQual state,
qualifiers (ExprState), and projection (ProjectionInfo).

Specialized variants of the execution function of a given Scan node
can then pass const-NULL for unused parameters.  This allows the
compiler to inline the logic and eliminate unnecessary branches or
checks. Each variant function thus contains only the necessary code,
optimizing execution for sequential scans where these features are
not needed.

The variant function to be used is determined during ExecInit*() of
the node and assigned to the ExecProcNode function pointer in the
node's PlanState, effectively turning runtime checks and conditional
branches on the NULLness of epqstate, qual, and projInfo into static
ones.

Currently, only ExecSeqScan() is modified to take advantage of this
inline-ability.  Other Scan nodes might benefit from such specialized
variant functions but that is left as future work.

Benchmarks performed by Junwang Zhao and David Rowley show up to a 5%
reduction in execution time for queries that rely heavily on Seq
Scans. The most significant improvements were observed in scenarios
where EvalPlanQual, qualifiers, and projection were not required, but
other cases also benefit from reduced runtime overhead due to the
inlining and removal of unnecessary code paths.

The refactoring approach implemented here is based on a proposal by
David Rowley, significantly improving upon an earlier idea I (amitlan)
suggested.

Author: Amit Langote
Co-authored-by: David Rowley
Reviewed-by: Junwang Zhao
Reviewed-by: David Rowley
Tested-by: Junwang Zhao
Tested-by: David Rowley
Discussion: https://postgr.es/m/CA+HiwqGaH-otvqW_ce-paL=96JvU4j+Xbuk+14esJNDwefdkOg@mail.gmail.com
---
 src/backend/executor/execScan.c    | 207 ++----------------------
 src/backend/executor/nodeSeqscan.c | 115 +++++++++++++-
 src/include/executor/execScan.h    | 246 +++++++++++++++++++++++++++++
 3 files changed, 365 insertions(+), 203 deletions(-)
 create mode 100644 src/include/executor/execScan.h

diff --git a/src/backend/executor/execScan.c b/src/backend/executor/execScan.c
index 556a5d98e78..25a776a6a19 100644
--- a/src/backend/executor/execScan.c
+++ b/src/backend/executor/execScan.c
@@ -19,118 +19,9 @@
 #include "postgres.h"
 
 #include "executor/executor.h"
+#include "executor/execScan.h"
 #include "miscadmin.h"
 
-
-
-/*
- * ExecScanFetch -- check interrupts & fetch next potential tuple
- *
- * This routine is concerned with substituting a test tuple if we are
- * inside an EvalPlanQual recheck.  If we aren't, just execute
- * the access method's next-tuple routine.
- */
-static inline TupleTableSlot *
-ExecScanFetch(ScanState *node,
-			  ExecScanAccessMtd accessMtd,
-			  ExecScanRecheckMtd recheckMtd)
-{
-	EState	   *estate = node->ps.state;
-
-	CHECK_FOR_INTERRUPTS();
-
-	if (estate->es_epq_active != NULL)
-	{
-		EPQState   *epqstate = estate->es_epq_active;
-
-		/*
-		 * We are inside an EvalPlanQual recheck.  Return the test tuple if
-		 * one is available, after rechecking any access-method-specific
-		 * conditions.
-		 */
-		Index		scanrelid = ((Scan *) node->ps.plan)->scanrelid;
-
-		if (scanrelid == 0)
-		{
-			/*
-			 * This is a ForeignScan or CustomScan which has pushed down a
-			 * join to the remote side.  The recheck method is responsible not
-			 * only for rechecking the scan/join quals but also for storing
-			 * the correct tuple in the slot.
-			 */
-
-			TupleTableSlot *slot = node->ss_ScanTupleSlot;
-
-			if (!(*recheckMtd) (node, slot))
-				ExecClearTuple(slot);	/* would not be returned by scan */
-			return slot;
-		}
-		else if (epqstate->relsubs_done[scanrelid - 1])
-		{
-			/*
-			 * Return empty slot, as either there is no EPQ tuple for this rel
-			 * or we already returned it.
-			 */
-
-			TupleTableSlot *slot = node->ss_ScanTupleSlot;
-
-			return ExecClearTuple(slot);
-		}
-		else if (epqstate->relsubs_slot[scanrelid - 1] != NULL)
-		{
-			/*
-			 * Return replacement tuple provided by the EPQ caller.
-			 */
-
-			TupleTableSlot *slot = epqstate->relsubs_slot[scanrelid - 1];
-
-			Assert(epqstate->relsubs_rowmark[scanrelid - 1] == NULL);
-
-			/* Mark to remember that we shouldn't return it again */
-			epqstate->relsubs_done[scanrelid - 1] = true;
-
-			/* Return empty slot if we haven't got a test tuple */
-			if (TupIsNull(slot))
-				return NULL;
-
-			/* Check if it meets the access-method conditions */
-			if (!(*recheckMtd) (node, slot))
-				return ExecClearTuple(slot);	/* would not be returned by
-												 * scan */
-			return slot;
-		}
-		else if (epqstate->relsubs_rowmark[scanrelid - 1] != NULL)
-		{
-			/*
-			 * Fetch and return replacement tuple using a non-locking rowmark.
-			 */
-
-			TupleTableSlot *slot = node->ss_ScanTupleSlot;
-
-			/* Mark to remember that we shouldn't return more */
-			epqstate->relsubs_done[scanrelid - 1] = true;
-
-			if (!EvalPlanQualFetchRowMark(epqstate, scanrelid, slot))
-				return NULL;
-
-			/* Return empty slot if we haven't got a test tuple */
-			if (TupIsNull(slot))
-				return NULL;
-
-			/* Check if it meets the access-method conditions */
-			if (!(*recheckMtd) (node, slot))
-				return ExecClearTuple(slot);	/* would not be returned by
-												 * scan */
-			return slot;
-		}
-	}
-
-	/*
-	 * Run the node-type-specific access method function to get the next tuple
-	 */
-	return (*accessMtd) (node);
-}
-
 /* ----------------------------------------------------------------
  *		ExecScan
  *
@@ -157,100 +48,20 @@ ExecScan(ScanState *node,
 		 ExecScanAccessMtd accessMtd,	/* function returning a tuple */
 		 ExecScanRecheckMtd recheckMtd)
 {
-	ExprContext *econtext;
+	EPQState   *epqstate;
 	ExprState  *qual;
 	ProjectionInfo *projInfo;
 
-	/*
-	 * Fetch data from node
-	 */
+	epqstate = node->ps.state->es_epq_active;
 	qual = node->ps.qual;
 	projInfo = node->ps.ps_ProjInfo;
-	econtext = node->ps.ps_ExprContext;
-
-	/* interrupt checks are in ExecScanFetch */
-
-	/*
-	 * If we have neither a qual to check nor a projection to do, just skip
-	 * all the overhead and return the raw scan tuple.
-	 */
-	if (!qual && !projInfo)
-	{
-		ResetExprContext(econtext);
-		return ExecScanFetch(node, accessMtd, recheckMtd);
-	}
-
-	/*
-	 * Reset per-tuple memory context to free any expression evaluation
-	 * storage allocated in the previous tuple cycle.
-	 */
-	ResetExprContext(econtext);
-
-	/*
-	 * get a tuple from the access method.  Loop until we obtain a tuple that
-	 * passes the qualification.
-	 */
-	for (;;)
-	{
-		TupleTableSlot *slot;
 
-		slot = ExecScanFetch(node, accessMtd, recheckMtd);
-
-		/*
-		 * if the slot returned by the accessMtd contains NULL, then it means
-		 * there is nothing more to scan so we just return an empty slot,
-		 * being careful to use the projection result slot so it has correct
-		 * tupleDesc.
-		 */
-		if (TupIsNull(slot))
-		{
-			if (projInfo)
-				return ExecClearTuple(projInfo->pi_state.resultslot);
-			else
-				return slot;
-		}
-
-		/*
-		 * place the current tuple into the expr context
-		 */
-		econtext->ecxt_scantuple = slot;
-
-		/*
-		 * check that the current tuple satisfies the qual-clause
-		 *
-		 * check for non-null qual here to avoid a function call to ExecQual()
-		 * when the qual is null ... saves only a few cycles, but they add up
-		 * ...
-		 */
-		if (qual == NULL || ExecQual(qual, econtext))
-		{
-			/*
-			 * Found a satisfactory scan tuple.
-			 */
-			if (projInfo)
-			{
-				/*
-				 * Form a projection tuple, store it in the result tuple slot
-				 * and return it.
-				 */
-				return ExecProject(projInfo);
-			}
-			else
-			{
-				/*
-				 * Here, we aren't projecting, so just return scan tuple.
-				 */
-				return slot;
-			}
-		}
-		else
-			InstrCountFiltered1(node, 1);
-
-		/*
-		 * Tuple fails qual, so free per-tuple memory and try again.
-		 */
-		ResetExprContext(econtext);
-	}
+	return ExecScanExtended(node,
+							accessMtd,
+							recheckMtd,
+							epqstate,
+							qual,
+							projInfo);
 }
 
 /*
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index fa2d522b25f..6f9e991eeae 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -29,6 +29,7 @@
 
 #include "access/relscan.h"
 #include "access/tableam.h"
+#include "executor/execScan.h"
 #include "executor/executor.h"
 #include "executor/nodeSeqscan.h"
 #include "utils/rel.h"
@@ -99,9 +100,10 @@ SeqRecheck(SeqScanState *node, TupleTableSlot *slot)
  *		ExecSeqScan(node)
  *
  *		Scans the relation sequentially and returns the next qualifying
- *		tuple.
- *		We call the ExecScan() routine and pass it the appropriate
- *		access method functions.
+ *		tuple. This variant is used when there is no es_epq_active, no qual
+ *		and no projection.  Passing const-NULLs for these to ExecScanExtended
+ *		allows the compiler to eliminate the additional code that would
+ *		ordinarily be required for the evaluation of these.
  * ----------------------------------------------------------------
  */
 static TupleTableSlot *
@@ -109,12 +111,94 @@ ExecSeqScan(PlanState *pstate)
 {
 	SeqScanState *node = castNode(SeqScanState, pstate);
 
+	Assert(pstate->state->es_epq_active == NULL);
+	Assert(pstate->qual == NULL);
+	Assert(pstate->ps_ProjInfo == NULL);
+
+	return ExecScanExtended(&node->ss,
+							(ExecScanAccessMtd) SeqNext,
+							(ExecScanRecheckMtd) SeqRecheck,
+							NULL,
+							NULL,
+							NULL);
+}
+
+/*
+ * Variant of ExecSeqScan() but when qual evaluation is required.
+ */
+static TupleTableSlot *
+ExecSeqScanWithQual(PlanState *pstate)
+{
+	SeqScanState *node = castNode(SeqScanState, pstate);
+
+	Assert(pstate->state->es_epq_active == NULL);
+	Assert(pstate->qual != NULL);
+	Assert(pstate->ps_ProjInfo == NULL);
+
+	return ExecScanExtended(&node->ss,
+							(ExecScanAccessMtd) SeqNext,
+							(ExecScanRecheckMtd) SeqRecheck,
+							NULL,
+							pstate->qual,
+							NULL);
+}
+
+/*
+ * Variant of ExecSeqScan() but when projection is required.
+ */
+static TupleTableSlot *
+ExecSeqScanWithProject(PlanState *pstate)
+{
+	SeqScanState *node = castNode(SeqScanState, pstate);
+
+	Assert(pstate->state->es_epq_active == NULL);
+	Assert(pstate->qual == NULL);
+	Assert(pstate->ps_ProjInfo != NULL);
+
+	return ExecScanExtended(&node->ss,
+							(ExecScanAccessMtd) SeqNext,
+							(ExecScanRecheckMtd) SeqRecheck,
+							NULL,
+							NULL,
+							pstate->ps_ProjInfo);
+}
+
+/*
+ * Variant of ExecSeqScan() but when qual evaluation and projection are
+ * required.
+ */
+static TupleTableSlot *
+ExecSeqScanWithQualProject(PlanState *pstate)
+{
+	SeqScanState *node = castNode(SeqScanState, pstate);
+
+	Assert(pstate->state->es_epq_active == NULL);
+	Assert(pstate->qual != NULL);
+	Assert(pstate->ps_ProjInfo != NULL);
+
+	return ExecScanExtended(&node->ss,
+							(ExecScanAccessMtd) SeqNext,
+							(ExecScanRecheckMtd) SeqRecheck,
+							NULL,
+							pstate->qual,
+							pstate->ps_ProjInfo);
+}
+
+/*
+ * Variant of ExecSeqScan for when EPQ evaluation is required.  We don't
+ * bother adding variants of this for with/without qual and projection as
+ * EPQ doesn't seem as exciting a case to optimize for.
+ */
+static TupleTableSlot *
+ExecSeqScanEPQ(PlanState *pstate)
+{
+	SeqScanState *node = castNode(SeqScanState, pstate);
+
 	return ExecScan(&node->ss,
 					(ExecScanAccessMtd) SeqNext,
 					(ExecScanRecheckMtd) SeqRecheck);
 }
 
-
 /* ----------------------------------------------------------------
  *		ExecInitSeqScan
  * ----------------------------------------------------------------
@@ -137,7 +221,6 @@ ExecInitSeqScan(SeqScan *node, EState *estate, int eflags)
 	scanstate = makeNode(SeqScanState);
 	scanstate->ss.ps.plan = (Plan *) node;
 	scanstate->ss.ps.state = estate;
-	scanstate->ss.ps.ExecProcNode = ExecSeqScan;
 
 	/*
 	 * Miscellaneous initialization
@@ -171,6 +254,28 @@ ExecInitSeqScan(SeqScan *node, EState *estate, int eflags)
 	scanstate->ss.ps.qual =
 		ExecInitQual(node->scan.plan.qual, (PlanState *) scanstate);
 
+	/*
+	 * When EvalPlanQual() is not in use, assign ExecProcNode for this node
+	 * based on the presence of qual and projection. Each ExecSeqScan*()
+	 * variant is optimized for the specific combination of these conditions.
+	 */
+	if (scanstate->ss.ps.state->es_epq_active != NULL)
+		scanstate->ss.ps.ExecProcNode = ExecSeqScanEPQ;
+	else if (scanstate->ss.ps.qual == NULL)
+	{
+		if (scanstate->ss.ps.ps_ProjInfo == NULL)
+			scanstate->ss.ps.ExecProcNode = ExecSeqScan;
+		else
+			scanstate->ss.ps.ExecProcNode = ExecSeqScanWithProject;
+	}
+	else
+	{
+		if (scanstate->ss.ps.ps_ProjInfo == NULL)
+			scanstate->ss.ps.ExecProcNode = ExecSeqScanWithQual;
+		else
+			scanstate->ss.ps.ExecProcNode = ExecSeqScanWithQualProject;
+	}
+
 	return scanstate;
 }
 
diff --git a/src/include/executor/execScan.h b/src/include/executor/execScan.h
new file mode 100644
index 00000000000..da8e5ab8a76
--- /dev/null
+++ b/src/include/executor/execScan.h
@@ -0,0 +1,246 @@
+/*-------------------------------------------------------------------------
+ * execScan.h
+ *		Inline-able support functions for Scan nodes
+ *
+ * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ *		src/include/executor/execScan.h
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef EXECSCAN_H
+#define EXECSCAN_H
+
+#include "miscadmin.h"
+#include "executor/executor.h"
+#include "nodes/execnodes.h"
+
+/*
+ * ExecScanFetch -- check interrupts & fetch next potential tuple
+ *
+ * This routine substitutes a test tuple if inside an EvalPlanQual recheck.
+ * Otherwise, it simply executes the access method's next-tuple routine.
+ *
+ * The pg_attribute_always_inline attribute allows the compiler to inline
+ * this function into its caller. When EPQState is NULL, the EvalPlanQual
+ * logic is completely eliminated at compile time, avoiding unnecessary
+ * run-time checks and code for cases where EPQ is not required.
+ */
+static pg_attribute_always_inline TupleTableSlot *
+ExecScanFetch(ScanState *node,
+			  EPQState *epqstate,
+			  ExecScanAccessMtd accessMtd,
+			  ExecScanRecheckMtd recheckMtd)
+{
+	CHECK_FOR_INTERRUPTS();
+
+	if (epqstate != NULL)
+	{
+		/*
+		 * We are inside an EvalPlanQual recheck.  Return the test tuple if
+		 * one is available, after rechecking any access-method-specific
+		 * conditions.
+		 */
+		Index		scanrelid = ((Scan *) node->ps.plan)->scanrelid;
+
+		if (scanrelid == 0)
+		{
+			/*
+			 * This is a ForeignScan or CustomScan which has pushed down a
+			 * join to the remote side.  The recheck method is responsible not
+			 * only for rechecking the scan/join quals but also for storing
+			 * the correct tuple in the slot.
+			 */
+
+			TupleTableSlot *slot = node->ss_ScanTupleSlot;
+
+			if (!(*recheckMtd) (node, slot))
+				ExecClearTuple(slot);	/* would not be returned by scan */
+			return slot;
+		}
+		else if (epqstate->relsubs_done[scanrelid - 1])
+		{
+			/*
+			 * Return empty slot, as either there is no EPQ tuple for this rel
+			 * or we already returned it.
+			 */
+
+			TupleTableSlot *slot = node->ss_ScanTupleSlot;
+
+			return ExecClearTuple(slot);
+		}
+		else if (epqstate->relsubs_slot[scanrelid - 1] != NULL)
+		{
+			/*
+			 * Return replacement tuple provided by the EPQ caller.
+			 */
+
+			TupleTableSlot *slot = epqstate->relsubs_slot[scanrelid - 1];
+
+			Assert(epqstate->relsubs_rowmark[scanrelid - 1] == NULL);
+
+			/* Mark to remember that we shouldn't return it again */
+			epqstate->relsubs_done[scanrelid - 1] = true;
+
+			/* Return empty slot if we haven't got a test tuple */
+			if (TupIsNull(slot))
+				return NULL;
+
+			/* Check if it meets the access-method conditions */
+			if (!(*recheckMtd) (node, slot))
+				return ExecClearTuple(slot);	/* would not be returned by
+												 * scan */
+			return slot;
+		}
+		else if (epqstate->relsubs_rowmark[scanrelid - 1] != NULL)
+		{
+			/*
+			 * Fetch and return replacement tuple using a non-locking rowmark.
+			 */
+
+			TupleTableSlot *slot = node->ss_ScanTupleSlot;
+
+			/* Mark to remember that we shouldn't return more */
+			epqstate->relsubs_done[scanrelid - 1] = true;
+
+			if (!EvalPlanQualFetchRowMark(epqstate, scanrelid, slot))
+				return NULL;
+
+			/* Return empty slot if we haven't got a test tuple */
+			if (TupIsNull(slot))
+				return NULL;
+
+			/* Check if it meets the access-method conditions */
+			if (!(*recheckMtd) (node, slot))
+				return ExecClearTuple(slot);	/* would not be returned by
+												 * scan */
+			return slot;
+		}
+	}
+
+	/*
+	 * Run the node-type-specific access method function to get the next tuple
+	 */
+	return (*accessMtd) (node);
+}
+
+/* ----------------------------------------------------------------
+ * ExecScanExtended
+ *		Scans the relation using the given 'access method' and returns
+ *		the next qualifying tuple. The tuple is optionally checked
+ *		against 'qual' and, if provided, projected using 'projInfo'.
+ *
+ * The 'recheck method' validates an arbitrary tuple of the relation
+ * against conditions enforced by the access method.
+ *
+ * This function is an alternative to ExecScan, used when callers
+ * may omit 'qual' or 'projInfo'. The pg_attribute_always_inline
+ * attribute allows the compiler to eliminate non-relevant branches
+ * at compile time, avoiding run-time checks in those cases.
+ *
+ * Conditions:
+ *	-- The AMI "cursor" is positioned at the previously returned tuple.
+ *
+ * Initial States:
+ *	-- The relation is opened for scanning, with the "cursor"
+ *	positioned before the first qualifying tuple.
+ * ----------------------------------------------------------------
+ */
+static pg_attribute_always_inline TupleTableSlot *
+ExecScanExtended(ScanState *node,
+				 ExecScanAccessMtd accessMtd,	/* function returning a tuple */
+				 ExecScanRecheckMtd recheckMtd,
+				 EPQState *epqstate,
+				 ExprState *qual,
+				 ProjectionInfo *projInfo)
+{
+	ExprContext *econtext = node->ps.ps_ExprContext;
+
+	/* interrupt checks are in ExecScanFetch */
+
+	/*
+	 * If we have neither a qual to check nor a projection to do, just skip
+	 * all the overhead and return the raw scan tuple.
+	 */
+	if (!qual && !projInfo)
+	{
+		ResetExprContext(econtext);
+		return ExecScanFetch(node, epqstate, accessMtd, recheckMtd);
+	}
+
+	/*
+	 * Reset per-tuple memory context to free any expression evaluation
+	 * storage allocated in the previous tuple cycle.
+	 */
+	ResetExprContext(econtext);
+
+	/*
+	 * get a tuple from the access method.  Loop until we obtain a tuple that
+	 * passes the qualification.
+	 */
+	for (;;)
+	{
+		TupleTableSlot *slot;
+
+		slot = ExecScanFetch(node, epqstate, accessMtd, recheckMtd);
+
+		/*
+		 * if the slot returned by the accessMtd contains NULL, then it means
+		 * there is nothing more to scan so we just return an empty slot,
+		 * being careful to use the projection result slot so it has correct
+		 * tupleDesc.
+		 */
+		if (TupIsNull(slot))
+		{
+			if (projInfo)
+				return ExecClearTuple(projInfo->pi_state.resultslot);
+			else
+				return slot;
+		}
+
+		/*
+		 * place the current tuple into the expr context
+		 */
+		econtext->ecxt_scantuple = slot;
+
+		/*
+		 * check that the current tuple satisfies the qual-clause
+		 *
+		 * check for non-null qual here to avoid a function call to ExecQual()
+		 * when the qual is null ... saves only a few cycles, but they add up
+		 * ...
+		 */
+		if (qual == NULL || ExecQual(qual, econtext))
+		{
+			/*
+			 * Found a satisfactory scan tuple.
+			 */
+			if (projInfo)
+			{
+				/*
+				 * Form a projection tuple, store it in the result tuple slot
+				 * and return it.
+				 */
+				return ExecProject(projInfo);
+			}
+			else
+			{
+				/*
+				 * Here, we aren't projecting, so just return scan tuple.
+				 */
+				return slot;
+			}
+		}
+		else
+			InstrCountFiltered1(node, 1);
+
+		/*
+		 * Tuple fails qual, so free per-tuple memory and try again.
+		 */
+		ResetExprContext(econtext);
+	}
+}
+
+#endif							/* EXECSCAN_H */
-- 
2.43.0

#10

Vladlen Popolitov

v.popolitov@postgrespro.ru

about 1 year ago

In reply to: Amit Langote (#9)

Re: Some ExecSeqScan optimizations

Amit Langote wrote on 2025-01-10 18:22:

> On Fri, Jan 10, 2025 at 7:36 PM David Rowley <dgrowleyml@gmail.com>
> wrote:
>
>> On Fri, 10 Jan 2025 at 22:53, Vladlen Popolitov
>> <v.popolitov@postgrespro.ru> wrote:
>>
>>> In case of query
>>> select count(*) from test_table where a_1 = 1000000;
>>> I would expect increase of query time due to additional if...else . It
>>> is not clear
>>> what code was eliminated to decrease query time.
>>
>> Are you talking about the code added to ExecInitSeqScan() to determine
>> which node function to call? If so, that's only called during executor
>> startup. The idea here is to reduce the branching during execution by
>> calling one of those special functions which has a more specialised
>> version of the ExecScan code for the particular purpose it's going to
>> be used for.
>
> Looks like I hadn't mentioned this key aspect of the patch in the
> commit message, so did that in the attached.
>
> Vladlen, does what David wrote and the new commit message answer your
> question(s)?

Hi Amit,

Yes, David clarified the idea, but it is still hard to believe in a 5%
improvement. The query
select count(*) from test_table where a_1 = 1000000;
has both a qual and a projection, so the code generated for
ExecScanExtended() will be similar to ExecScan() (the same non-NULL
values are checked in the if() conditions).
Do you have some scripts to reproduce your benchmark?
--
Best regards,

Vladlen Popolitov.
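David's explanation above is the crux: the variant is chosen once, in ExecInitSeqScan(), so the per-tuple path carries no extra if...else. As an illustration only, here is a minimal standalone C sketch of that startup-time selection, modeled on the patch's ExecInitSeqScan() hunk. ToyScanState, toy_init_scan() and the toy_scan_* functions are hypothetical stand-ins, not PostgreSQL APIs, and the variants are stubs that only check their preconditions and report which one was chosen.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Identifies which variant ran; the comments name the patch's functions. */
typedef enum ToyVariant
{
	TOY_SEQSCAN,					/* ExecSeqScan */
	TOY_SEQSCAN_WITH_QUAL,			/* ExecSeqScanWithQual */
	TOY_SEQSCAN_WITH_PROJECT,		/* ExecSeqScanWithProject */
	TOY_SEQSCAN_WITH_QUAL_PROJECT,	/* ExecSeqScanWithQualProject */
	TOY_SEQSCAN_EPQ					/* ExecSeqScanEPQ */
} ToyVariant;

typedef struct ToyScanState ToyScanState;
typedef ToyVariant (*ToyExecProcNode) (ToyScanState *node);

/* Hypothetical stand-in for the relevant parts of SeqScanState/PlanState. */
struct ToyScanState
{
	bool		epq_active;		/* models estate->es_epq_active != NULL */
	const void *qual;			/* models ps.qual */
	const void *proj_info;		/* models ps.ps_ProjInfo */
	ToyExecProcNode exec_proc_node;	/* models ps.ExecProcNode */
};

/* Stub variants: each asserts its preconditions, like the patch's Asserts. */
static ToyVariant
toy_scan(ToyScanState *node)
{
	assert(!node->epq_active && node->qual == NULL && node->proj_info == NULL);
	return TOY_SEQSCAN;
}

static ToyVariant
toy_scan_with_qual(ToyScanState *node)
{
	assert(!node->epq_active && node->qual != NULL && node->proj_info == NULL);
	return TOY_SEQSCAN_WITH_QUAL;
}

static ToyVariant
toy_scan_with_project(ToyScanState *node)
{
	assert(!node->epq_active && node->qual == NULL && node->proj_info != NULL);
	return TOY_SEQSCAN_WITH_PROJECT;
}

static ToyVariant
toy_scan_with_qual_project(ToyScanState *node)
{
	assert(!node->epq_active && node->qual != NULL && node->proj_info != NULL);
	return TOY_SEQSCAN_WITH_QUAL_PROJECT;
}

static ToyVariant
toy_scan_epq(ToyScanState *node)
{
	assert(node->epq_active);
	return TOY_SEQSCAN_EPQ;
}

/* Runs once at "executor startup"; mirrors the nested if in the patch. */
static void
toy_init_scan(ToyScanState *node)
{
	if (node->epq_active)
		node->exec_proc_node = toy_scan_epq;
	else if (node->qual == NULL)
		node->exec_proc_node = (node->proj_info == NULL) ?
			toy_scan : toy_scan_with_project;
	else
		node->exec_proc_node = (node->proj_info == NULL) ?
			toy_scan_with_qual : toy_scan_with_qual_project;
}
```

After toy_init_scan(), callers invoke node->exec_proc_node(node) per tuple without re-testing the qual, projection or EPQ conditions, which is the kind of per-tuple branching the patch aims to remove.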

#11

Junwang Zhao

zhjwpku@gmail.com

about 1 year ago

In reply to: Vladlen Popolitov (#10)

Re: Some ExecSeqScan optimizations

On Fri, Jan 10, 2025 at 10:49 PM Vladlen Popolitov
<v.popolitov@postgrespro.ru> wrote:

> Hi Amit,
>
> Yes, David clarified the idea, but it is still hard to believe in a 5%
> improvement. The query
> select count(*) from test_table where a_1 = 1000000;
> has both a qual and a projection, so the code generated for
> ExecScanExtended() will be similar to ExecScan() (the same non-NULL
> values are checked in the if() conditions).
> Do you have some scripts to reproduce your benchmark?

The benchmark is provided in [0].

Here is a rough comparison of compiled variants' assembly code.

<ExecSeqScan>: start 2b8590, end 2b868c => 252
<ExecSeqScanWithProject>: start 2b8034, end 2b8140 => 268
<ExecSeqScanWithQual>: start 2b8144, end 2b831c => 472
<ExecSeqScanWithQualProject>: start 2b8320, end 2b858c => 620

Before Amit's optimization, every SeqScan effectively ran what is now
ExecSeqScanWithQualProject; as you can see, the other 3 variants all
have a smaller function size.
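As a quick check of the arithmetic above, each size is simply the end address minus the start address, in bytes. A small C sketch with the addresses copied from the list (the struct and helper names are hypothetical). Judging from the addresses, "end" appears to be the address of each function's last instruction (for example, ExecSeqScanWithProject ends at 2b8140 and ExecSeqScanWithQual starts at 2b8144), so each true size is likely a few bytes larger, which is why these are rough numbers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Start/end addresses as reported above from `objdump -D postgres`. */
typedef struct FuncRange
{
	const char *name;
	uint64_t	start;
	uint64_t	end;
} FuncRange;

static const FuncRange seqscan_variants[] = {
	{"ExecSeqScan", 0x2b8590, 0x2b868c},
	{"ExecSeqScanWithProject", 0x2b8034, 0x2b8140},
	{"ExecSeqScanWithQual", 0x2b8144, 0x2b831c},
	{"ExecSeqScanWithQualProject", 0x2b8320, 0x2b858c},
};

#define N_VARIANTS (sizeof(seqscan_variants) / sizeof(seqscan_variants[0]))

/* Rough size in bytes: end address minus start address. */
static uint64_t
func_range_size(const FuncRange *r)
{
	assert(r->end >= r->start);
	return r->end - r->start;
}
```

Looping over seqscan_variants and calling func_range_size() on each entry reproduces the 252, 268, 472 and 620 figures listed above.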

[0]: https://docs.google.com/spreadsheets/d/1AsJOUgIfSsYIJUJwbXk4aO9FVOFOrBCvrfmdQYkHIw4/edit?usp=sharing

--
Regards
Junwang Zhao

#12

Junwang Zhao

zhjwpku@gmail.com

about 1 year ago

In reply to: Junwang Zhao (#11)

Re: Some ExecSeqScan optimizations

On Sat, Jan 11, 2025 at 4:57 PM Junwang Zhao <zhjwpku@gmail.com> wrote:

> Here is a rough comparison of compiled variants' assembly code.
>
> <ExecSeqScan>: start 2b8590, end 2b868c => 252
> <ExecSeqScanWithProject>: start 2b8034, end 2b8140 => 268
> <ExecSeqScanWithQual>: start 2b8144, end 2b831c => 472
> <ExecSeqScanWithQualProject>: start 2b8320, end 2b858c => 620

Here are my compile options:

meson setup $HOME/build --prefix=$HOME/pgsql --buildtype=release

and I use `objdump -D postgres | less` to see the assembly code.

--
Regards
Junwang Zhao

#13

Junwang Zhao

zhjwpku@gmail.com

about 1 year ago

In reply to: Amit Langote (#9)

Re: Some ExecSeqScan optimizations

Hi Amit,

On Fri, Jan 10, 2025 at 7:22 PM Amit Langote <amitlangote09@gmail.com> wrote:

> Looks like I hadn't mentioned this key aspect of the patch in the
> commit message, so did that in the attached.

Thanks for updating the patch. While reading it, I was a little confused
by es_epq_active, mostly because of its name: a field name ending with
"active" typically suggests a boolean value, but this one is not. Should we
rename it to something like es_epqstate? This is not related to this patch,
though; I can start a new thread if you think this is worth a patch.

There is one tiny indentation issue (my IDE does this automatically), which
I guess you will fix before committing:

-       EPQState *epqstate;
+       EPQState   *epqstate;

--
Regards
Junwang Zhao

#14

Amit Langote

amitlangote09@gmail.com

12 months ago

In reply to: Vladlen Popolitov (#10)

Re: Some ExecSeqScan optimizations

Hi Vladlen,

On Fri, Jan 10, 2025 at 11:49 PM Vladlen Popolitov
<v.popolitov@postgrespro.ru> wrote:

> Hi Amit,
>
> Yes, David clarified the idea, but it is still hard to believe in a 5%
> improvement. The query
> select count(*) from test_table where a_1 = 1000000;
> has both a qual and a projection, so the code generated for
> ExecScanExtended() will be similar to ExecScan() (the same non-NULL
> values are checked in the if() conditions).

Yes, I've noticed that if the plan for the above query contains a
projection, like when it contains a Gather node, the inlined version
of ExecScanExtended() will look more or less the same as the full
ExecScan(). There won't be a noticeable speedup with the patch in that
case.

However, I ran the benchmark tests with Gather disabled, so that the
plan has no projection and uses an inlined variant without the
projection-related branches. I illustrate this with an example
below.
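To make "an inlined version that doesn't have branches related to projection" concrete, here is a standalone C sketch of the general technique: a generic always-inline loop that takes nullable qual/projection arguments, plus thin wrappers that pass a literal NULL so the compiler can fold the matching branch away in each inlined copy. Everything here is hypothetical and much simpler than ExecScanExtended(): the "relation" is an int array, the qual and projection are plain function pointers, and TOY_ALWAYS_INLINE uses the GCC/Clang attribute directly (PostgreSQL wraps the same idea as pg_attribute_always_inline).

```c
#include <assert.h>
#include <stddef.h>

#if defined(__GNUC__) || defined(__clang__)
#define TOY_ALWAYS_INLINE inline __attribute__((always_inline))
#else
#define TOY_ALWAYS_INLINE inline
#endif

/* Hypothetical toy "relation": an array of ints scanned in order. */
typedef struct ToyScan
{
	const int  *rows;
	size_t		nrows;
	size_t		pos;
} ToyScan;

typedef int (*ToyQual) (int row);	/* nonzero if the row passes */
typedef int (*ToyProj) (int row);	/* returns the projected value */

/*
 * Generic scan loop.  Returns 1 and stores the (possibly projected) row in
 * *out, or 0 at end of scan.  A caller passing a literal NULL for qual or
 * proj lets the compiler drop that "if" from its inlined copy.
 */
static TOY_ALWAYS_INLINE int
toy_scan_extended(ToyScan *scan, ToyQual qual, ToyProj proj, int *out)
{
	assert(scan != NULL && out != NULL);
	while (scan->pos < scan->nrows)
	{
		int			row = scan->rows[scan->pos++];

		if (qual != NULL && !qual(row))
			continue;			/* fails the qual: fetch the next row */
		*out = (proj != NULL) ? proj(row) : row;
		return 1;
	}
	return 0;					/* end of scan */
}

/* No qual, no projection: both NULL checks are constant-folded. */
static int
toy_scan_plain(ToyScan *scan, int *out)
{
	return toy_scan_extended(scan, NULL, NULL, out);
}

/* Qual only: the projection check is constant-folded; the qual check stays. */
static int
toy_scan_with_qual(ToyScan *scan, ToyQual qual, int *out)
{
	return toy_scan_extended(scan, qual, NULL, out);
}

/* Example qual for the usage below. */
static int
toy_is_even(int row)
{
	return row % 2 == 0;
}
```

Compiling with optimization and comparing the generated assembly of toy_scan_plain() and toy_scan_with_qual() is one way to observe the difference; the exact output depends on the compiler and flags.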

> Do you have some scripts to reproduce your benchmark?

Use these steps: set max_parallel_workers_per_gather to 0 and
shared_buffers to 512MB, and compile the patch using --buildtype=release.

create table foo (a int, b int, c int, d int, e int);
insert into foo select i, i, i, i, i from generate_series(1, 10000000) i;

-- pg_prewarm: load the table into shared buffers so that no I/O adds noise
create extension if not exists pg_prewarm;
select pg_size_pretty(pg_prewarm('foo'));

select count(*) from foo where a = 10000000;

Times I get on v17, master, and with the patch for the above query are
as follows:

v17: 173, 173, 174 ms

master: 173, 175, 169 ms

Patched: 160, 161, 158 ms
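For a rough sense of scale, here is a small C sketch that averages the three runs per build and computes the relative reduction. The numbers are copied from the list above, the helper names are hypothetical, and three runs are indicative rather than statistically rigorous.

```c
#include <assert.h>
#include <stddef.h>

/* Run times in ms, copied from the list above (three runs per build). */
static const double v17_ms[] = {173, 173, 174};
static const double master_ms[] = {173, 175, 169};
static const double patched_ms[] = {160, 161, 158};

#define N_RUNS (sizeof(master_ms) / sizeof(master_ms[0]))

/* Arithmetic mean of n run times. */
static double
mean_ms(const double *runs, size_t n)
{
	double		sum = 0.0;

	assert(n > 0);
	for (size_t i = 0; i < n; i++)
		sum += runs[i];
	return sum / (double) n;
}

/* Percentage by which 'after' is lower than 'before'. */
static double
pct_reduction(double before, double after)
{
	assert(before > 0.0);
	return 100.0 * (before - after) / before;
}
```

With these inputs, the patched build's mean (about 159.7 ms) is roughly 7.35% below master's (about 172.3 ms) and roughly 7.9% below v17's (about 173.3 ms).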

Please let me know if you're still unable to reproduce such numbers
with the steps I described.

--
Thanks, Amit Langote

#15

Amit Langote

amitlangote09@gmail.com

12 months ago

In reply to: Junwang Zhao (#13)

Re: Some ExecSeqScan optimizations

Hi Junwang,

On Sat, Jan 11, 2025 at 7:39 PM Junwang Zhao <zhjwpku@gmail.com> wrote:

> Hi Amit,

> Thanks for updating the patch. While reading it, I was a little confused
> by es_epq_active, mostly because of its name: a field name ending with
> "active" typically suggests a boolean value, but this one is not. Should we
> rename it to something like es_epqstate? This is not related to this patch,
> though; I can start a new thread if you think this is worth a patch.

Yeah, the name has confused me as well from time to time.

That said, it might be a good idea to dig up the thread that led to the
introduction of this field to find out if the naming has some logic
we're missing.

You may start a new thread to get the attention of other folks who
might have some clue.

> There is one tiny indentation issue (my IDE does this automatically), which
> I guess you will fix before committing:
> -       EPQState *epqstate;
> +       EPQState   *epqstate;

Thanks for the heads up.

--
Thanks, Amit Langote

#16

Amit Langote

amitlangote09@gmail.com

12 months ago

In reply to: Amit Langote (#9)

1 attachment(s)

Re: Some ExecSeqScan optimizations

Here's v5 with a few commit message tweaks.

Barring objections, I would like to push this early next week.

--
Thanks, Amit Langote

Attachments:

v5-0001-Refactor-ExecScan-to-allow-inlining-of-its-core-l.patch (application/octet-stream)

From 5d91d50c9d269254f2321958a39de444cb7c4362 Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Fri, 17 Jan 2025 13:46:18 +0900
Subject: [PATCH v5] Refactor ExecScan() to allow inlining of its core logic

This commit refactors ExecScan() by moving its tuple-fetching,
filtering, and projection logic into an inline-able function,
ExecScanExtended(), defined in src/include/executor/execScan.h.
ExecScanExtended() accepts parameters for EvalPlanQual state,
qualifiers (ExprState), and projection (ProjectionInfo).

Specialized variants of the execution function of a given Scan node
(for example, ExecSeqScan() for SeqScan) can then pass const-NULL for
unused parameters.  This allows the compiler to inline the logic and
eliminate unnecessary branches or checks.  Each variant function thus
contains only the necessary code, optimizing execution for scans
where these features are not needed.

The variant function to be used is determined in the ExecInit*()
function of the node and assigned to the ExecProcNode function pointer
in the node's PlanState, effectively turning runtime checks and
conditional branches on the NULLness of epqstate, qual, and projInfo
into static ones, provided the compiler successfully eliminates
unnecessary checks from the inlined code of ExecScanExtended().

Currently, only ExecSeqScan() is modified to take advantage of this
inline-ability.  Other Scan nodes might benefit from such specialized
variant functions but that is left as future work.

Benchmarks performed by Junwang Zhao, David Rowley and myself show up
to a 5% reduction in execution time for queries that rely heavily on
Seq Scans. The most significant improvements were observed in
scenarios where EvalPlanQual, qualifiers, and projection were not
required, but other cases also benefit from reduced runtime overhead
due to the inlining and removal of unnecessary code paths.

The idea for this patch first came from Andres Freund in an off-list
discussion. The refactoring approach implemented here is based on a
proposal by David Rowley, significantly improving upon the patch I
(amitlan) initially proposed.

Suggested-by: Andres Freund <andres@anarazel.de>
Co-authored-by: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Tested-by: Junwang Zhao <zhjwpku@gmail.com>
Tested-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CA+HiwqGaH-otvqW_ce-paL=96JvU4j+Xbuk+14esJNDwefdkOg@mail.gmail.com
---
 src/backend/executor/execScan.c    | 207 ++----------------------
 src/backend/executor/nodeSeqscan.c | 115 +++++++++++++-
 src/include/executor/execScan.h    | 246 +++++++++++++++++++++++++++++
 3 files changed, 365 insertions(+), 203 deletions(-)
 create mode 100644 src/include/executor/execScan.h

diff --git a/src/backend/executor/execScan.c b/src/backend/executor/execScan.c
index 556a5d98e78..25a776a6a19 100644
--- a/src/backend/executor/execScan.c
+++ b/src/backend/executor/execScan.c
@@ -19,118 +19,9 @@
 #include "postgres.h"
 
 #include "executor/executor.h"
+#include "executor/execScan.h"
 #include "miscadmin.h"
 
-
-
-/*
- * ExecScanFetch -- check interrupts & fetch next potential tuple
- *
- * This routine is concerned with substituting a test tuple if we are
- * inside an EvalPlanQual recheck.  If we aren't, just execute
- * the access method's next-tuple routine.
- */
-static inline TupleTableSlot *
-ExecScanFetch(ScanState *node,
-			  ExecScanAccessMtd accessMtd,
-			  ExecScanRecheckMtd recheckMtd)
-{
-	EState	   *estate = node->ps.state;
-
-	CHECK_FOR_INTERRUPTS();
-
-	if (estate->es_epq_active != NULL)
-	{
-		EPQState   *epqstate = estate->es_epq_active;
-
-		/*
-		 * We are inside an EvalPlanQual recheck.  Return the test tuple if
-		 * one is available, after rechecking any access-method-specific
-		 * conditions.
-		 */
-		Index		scanrelid = ((Scan *) node->ps.plan)->scanrelid;
-
-		if (scanrelid == 0)
-		{
-			/*
-			 * This is a ForeignScan or CustomScan which has pushed down a
-			 * join to the remote side.  The recheck method is responsible not
-			 * only for rechecking the scan/join quals but also for storing
-			 * the correct tuple in the slot.
-			 */
-
-			TupleTableSlot *slot = node->ss_ScanTupleSlot;
-
-			if (!(*recheckMtd) (node, slot))
-				ExecClearTuple(slot);	/* would not be returned by scan */
-			return slot;
-		}
-		else if (epqstate->relsubs_done[scanrelid - 1])
-		{
-			/*
-			 * Return empty slot, as either there is no EPQ tuple for this rel
-			 * or we already returned it.
-			 */
-
-			TupleTableSlot *slot = node->ss_ScanTupleSlot;
-
-			return ExecClearTuple(slot);
-		}
-		else if (epqstate->relsubs_slot[scanrelid - 1] != NULL)
-		{
-			/*
-			 * Return replacement tuple provided by the EPQ caller.
-			 */
-
-			TupleTableSlot *slot = epqstate->relsubs_slot[scanrelid - 1];
-
-			Assert(epqstate->relsubs_rowmark[scanrelid - 1] == NULL);
-
-			/* Mark to remember that we shouldn't return it again */
-			epqstate->relsubs_done[scanrelid - 1] = true;
-
-			/* Return empty slot if we haven't got a test tuple */
-			if (TupIsNull(slot))
-				return NULL;
-
-			/* Check if it meets the access-method conditions */
-			if (!(*recheckMtd) (node, slot))
-				return ExecClearTuple(slot);	/* would not be returned by
-												 * scan */
-			return slot;
-		}
-		else if (epqstate->relsubs_rowmark[scanrelid - 1] != NULL)
-		{
-			/*
-			 * Fetch and return replacement tuple using a non-locking rowmark.
-			 */
-
-			TupleTableSlot *slot = node->ss_ScanTupleSlot;
-
-			/* Mark to remember that we shouldn't return more */
-			epqstate->relsubs_done[scanrelid - 1] = true;
-
-			if (!EvalPlanQualFetchRowMark(epqstate, scanrelid, slot))
-				return NULL;
-
-			/* Return empty slot if we haven't got a test tuple */
-			if (TupIsNull(slot))
-				return NULL;
-
-			/* Check if it meets the access-method conditions */
-			if (!(*recheckMtd) (node, slot))
-				return ExecClearTuple(slot);	/* would not be returned by
-												 * scan */
-			return slot;
-		}
-	}
-
-	/*
-	 * Run the node-type-specific access method function to get the next tuple
-	 */
-	return (*accessMtd) (node);
-}
-
 /* ----------------------------------------------------------------
  *		ExecScan
  *
@@ -157,100 +48,20 @@ ExecScan(ScanState *node,
 		 ExecScanAccessMtd accessMtd,	/* function returning a tuple */
 		 ExecScanRecheckMtd recheckMtd)
 {
-	ExprContext *econtext;
+	EPQState *epqstate;
 	ExprState  *qual;
 	ProjectionInfo *projInfo;
 
-	/*
-	 * Fetch data from node
-	 */
+	epqstate = node->ps.state->es_epq_active;
 	qual = node->ps.qual;
 	projInfo = node->ps.ps_ProjInfo;
-	econtext = node->ps.ps_ExprContext;
-
-	/* interrupt checks are in ExecScanFetch */
-
-	/*
-	 * If we have neither a qual to check nor a projection to do, just skip
-	 * all the overhead and return the raw scan tuple.
-	 */
-	if (!qual && !projInfo)
-	{
-		ResetExprContext(econtext);
-		return ExecScanFetch(node, accessMtd, recheckMtd);
-	}
-
-	/*
-	 * Reset per-tuple memory context to free any expression evaluation
-	 * storage allocated in the previous tuple cycle.
-	 */
-	ResetExprContext(econtext);
-
-	/*
-	 * get a tuple from the access method.  Loop until we obtain a tuple that
-	 * passes the qualification.
-	 */
-	for (;;)
-	{
-		TupleTableSlot *slot;
 
-		slot = ExecScanFetch(node, accessMtd, recheckMtd);
-
-		/*
-		 * if the slot returned by the accessMtd contains NULL, then it means
-		 * there is nothing more to scan so we just return an empty slot,
-		 * being careful to use the projection result slot so it has correct
-		 * tupleDesc.
-		 */
-		if (TupIsNull(slot))
-		{
-			if (projInfo)
-				return ExecClearTuple(projInfo->pi_state.resultslot);
-			else
-				return slot;
-		}
-
-		/*
-		 * place the current tuple into the expr context
-		 */
-		econtext->ecxt_scantuple = slot;
-
-		/*
-		 * check that the current tuple satisfies the qual-clause
-		 *
-		 * check for non-null qual here to avoid a function call to ExecQual()
-		 * when the qual is null ... saves only a few cycles, but they add up
-		 * ...
-		 */
-		if (qual == NULL || ExecQual(qual, econtext))
-		{
-			/*
-			 * Found a satisfactory scan tuple.
-			 */
-			if (projInfo)
-			{
-				/*
-				 * Form a projection tuple, store it in the result tuple slot
-				 * and return it.
-				 */
-				return ExecProject(projInfo);
-			}
-			else
-			{
-				/*
-				 * Here, we aren't projecting, so just return scan tuple.
-				 */
-				return slot;
-			}
-		}
-		else
-			InstrCountFiltered1(node, 1);
-
-		/*
-		 * Tuple fails qual, so free per-tuple memory and try again.
-		 */
-		ResetExprContext(econtext);
-	}
+	return ExecScanExtended(node,
+							accessMtd,
+							recheckMtd,
+							epqstate,
+							qual,
+							projInfo);
 }
 
 /*
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index fa2d522b25f..6f9e991eeae 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -29,6 +29,7 @@
 
 #include "access/relscan.h"
 #include "access/tableam.h"
+#include "executor/execScan.h"
 #include "executor/executor.h"
 #include "executor/nodeSeqscan.h"
 #include "utils/rel.h"
@@ -99,9 +100,10 @@ SeqRecheck(SeqScanState *node, TupleTableSlot *slot)
  *		ExecSeqScan(node)
  *
  *		Scans the relation sequentially and returns the next qualifying
- *		tuple.
- *		We call the ExecScan() routine and pass it the appropriate
- *		access method functions.
+ *		tuple. This variant is used when there is no es_epq_active, no qual
+ *		and no projection.  Passing const-NULLs for these to ExecScanExtended
+ *		allows the compiler to eliminate the additional code that would
+ *		ordinarily be required for the evaluation of these.
  * ----------------------------------------------------------------
  */
 static TupleTableSlot *
@@ -109,12 +111,94 @@ ExecSeqScan(PlanState *pstate)
 {
 	SeqScanState *node = castNode(SeqScanState, pstate);
 
+	Assert(pstate->state->es_epq_active == NULL);
+	Assert(pstate->qual == NULL);
+	Assert(pstate->ps_ProjInfo == NULL);
+
+	return ExecScanExtended(&node->ss,
+							(ExecScanAccessMtd) SeqNext,
+							(ExecScanRecheckMtd) SeqRecheck,
+							NULL,
+							NULL,
+							NULL);
+}
+
+/*
+ * Variant of ExecSeqScan() but when qual evaluation is required.
+ */
+static TupleTableSlot *
+ExecSeqScanWithQual(PlanState *pstate)
+{
+	SeqScanState *node = castNode(SeqScanState, pstate);
+
+	Assert(pstate->state->es_epq_active == NULL);
+	Assert(pstate->qual != NULL);
+	Assert(pstate->ps_ProjInfo == NULL);
+
+	return ExecScanExtended(&node->ss,
+							(ExecScanAccessMtd) SeqNext,
+							(ExecScanRecheckMtd) SeqRecheck,
+							NULL,
+							pstate->qual,
+							NULL);
+}
+
+/*
+ * Variant of ExecSeqScan() but when projection is required.
+ */
+static TupleTableSlot *
+ExecSeqScanWithProject(PlanState *pstate)
+{
+	SeqScanState *node = castNode(SeqScanState, pstate);
+
+	Assert(pstate->state->es_epq_active == NULL);
+	Assert(pstate->qual == NULL);
+	Assert(pstate->ps_ProjInfo != NULL);
+
+	return ExecScanExtended(&node->ss,
+							(ExecScanAccessMtd) SeqNext,
+							(ExecScanRecheckMtd) SeqRecheck,
+							NULL,
+							NULL,
+							pstate->ps_ProjInfo);
+}
+
+/*
+ * Variant of ExecSeqScan() but when qual evaluation and projection are
+ * required.
+ */
+static TupleTableSlot *
+ExecSeqScanWithQualProject(PlanState *pstate)
+{
+	SeqScanState *node = castNode(SeqScanState, pstate);
+
+	Assert(pstate->state->es_epq_active == NULL);
+	Assert(pstate->qual != NULL);
+	Assert(pstate->ps_ProjInfo != NULL);
+
+	return ExecScanExtended(&node->ss,
+							(ExecScanAccessMtd) SeqNext,
+							(ExecScanRecheckMtd) SeqRecheck,
+							NULL,
+							pstate->qual,
+							pstate->ps_ProjInfo);
+}
+
+/*
+ * Variant of ExecSeqScan for when EPQ evaluation is required.  We don't
+ * bother adding variants of this for with/without qual and projection as
+ * EPQ doesn't seem as exciting a case to optimize for.
+ */
+static TupleTableSlot *
+ExecSeqScanEPQ(PlanState *pstate)
+{
+	SeqScanState *node = castNode(SeqScanState, pstate);
+
 	return ExecScan(&node->ss,
 					(ExecScanAccessMtd) SeqNext,
 					(ExecScanRecheckMtd) SeqRecheck);
 }
 
-
 /* ----------------------------------------------------------------
  *		ExecInitSeqScan
  * ----------------------------------------------------------------
@@ -137,7 +221,6 @@ ExecInitSeqScan(SeqScan *node, EState *estate, int eflags)
 	scanstate = makeNode(SeqScanState);
 	scanstate->ss.ps.plan = (Plan *) node;
 	scanstate->ss.ps.state = estate;
-	scanstate->ss.ps.ExecProcNode = ExecSeqScan;
 
 	/*
 	 * Miscellaneous initialization
@@ -171,6 +254,28 @@ ExecInitSeqScan(SeqScan *node, EState *estate, int eflags)
 	scanstate->ss.ps.qual =
 		ExecInitQual(node->scan.plan.qual, (PlanState *) scanstate);
 
+	/*
+	 * When EvalPlanQual() is not in use, assign ExecProcNode for this node
+	 * based on the presence of qual and projection. Each ExecSeqScan*()
+	 * variant is optimized for the specific combination of these conditions.
+	 */
+	if (scanstate->ss.ps.state->es_epq_active != NULL)
+		scanstate->ss.ps.ExecProcNode = ExecSeqScanEPQ;
+	else if (scanstate->ss.ps.qual == NULL)
+	{
+		if (scanstate->ss.ps.ps_ProjInfo == NULL)
+			scanstate->ss.ps.ExecProcNode = ExecSeqScan;
+		else
+			scanstate->ss.ps.ExecProcNode = ExecSeqScanWithProject;
+	}
+	else
+	{
+		if (scanstate->ss.ps.ps_ProjInfo == NULL)
+			scanstate->ss.ps.ExecProcNode = ExecSeqScanWithQual;
+		else
+			scanstate->ss.ps.ExecProcNode = ExecSeqScanWithQualProject;
+	}
+
 	return scanstate;
 }
 
diff --git a/src/include/executor/execScan.h b/src/include/executor/execScan.h
new file mode 100644
index 00000000000..da8e5ab8a76
--- /dev/null
+++ b/src/include/executor/execScan.h
@@ -0,0 +1,246 @@
+/*-------------------------------------------------------------------------
+ * execScan.h
+ *		Inline-able support functions for Scan nodes
+ *
+ * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ *		src/include/executor/execScan.h
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef EXECSCAN_H
+#define EXECSCAN_H
+
+#include "miscadmin.h"
+#include "executor/executor.h"
+#include "nodes/execnodes.h"
+
+/*
+ * ExecScanFetch -- check interrupts & fetch next potential tuple
+ *
+ * This routine substitutes a test tuple if inside an EvalPlanQual recheck.
+ * Otherwise, it simply executes the access method's next-tuple routine.
+ *
+ * The pg_attribute_always_inline attribute allows the compiler to inline
+ * this function into its caller. When EPQState is NULL, the EvalPlanQual
+ * logic is completely eliminated at compile time, avoiding unnecessary
+ * run-time checks and code for cases where EPQ is not required.
+ */
+static pg_attribute_always_inline TupleTableSlot *
+ExecScanFetch(ScanState *node,
+			  EPQState *epqstate,
+			  ExecScanAccessMtd accessMtd,
+			  ExecScanRecheckMtd recheckMtd)
+{
+	CHECK_FOR_INTERRUPTS();
+
+	if (epqstate != NULL)
+	{
+		/*
+		 * We are inside an EvalPlanQual recheck.  Return the test tuple if
+		 * one is available, after rechecking any access-method-specific
+		 * conditions.
+		 */
+		Index		scanrelid = ((Scan *) node->ps.plan)->scanrelid;
+
+		if (scanrelid == 0)
+		{
+			/*
+			 * This is a ForeignScan or CustomScan which has pushed down a
+			 * join to the remote side.  The recheck method is responsible not
+			 * only for rechecking the scan/join quals but also for storing
+			 * the correct tuple in the slot.
+			 */
+
+			TupleTableSlot *slot = node->ss_ScanTupleSlot;
+
+			if (!(*recheckMtd) (node, slot))
+				ExecClearTuple(slot);	/* would not be returned by scan */
+			return slot;
+		}
+		else if (epqstate->relsubs_done[scanrelid - 1])
+		{
+			/*
+			 * Return empty slot, as either there is no EPQ tuple for this rel
+			 * or we already returned it.
+			 */
+
+			TupleTableSlot *slot = node->ss_ScanTupleSlot;
+
+			return ExecClearTuple(slot);
+		}
+		else if (epqstate->relsubs_slot[scanrelid - 1] != NULL)
+		{
+			/*
+			 * Return replacement tuple provided by the EPQ caller.
+			 */
+
+			TupleTableSlot *slot = epqstate->relsubs_slot[scanrelid - 1];
+
+			Assert(epqstate->relsubs_rowmark[scanrelid - 1] == NULL);
+
+			/* Mark to remember that we shouldn't return it again */
+			epqstate->relsubs_done[scanrelid - 1] = true;
+
+			/* Return empty slot if we haven't got a test tuple */
+			if (TupIsNull(slot))
+				return NULL;
+
+			/* Check if it meets the access-method conditions */
+			if (!(*recheckMtd) (node, slot))
+				return ExecClearTuple(slot);	/* would not be returned by
+												 * scan */
+			return slot;
+		}
+		else if (epqstate->relsubs_rowmark[scanrelid - 1] != NULL)
+		{
+			/*
+			 * Fetch and return replacement tuple using a non-locking rowmark.
+			 */
+
+			TupleTableSlot *slot = node->ss_ScanTupleSlot;
+
+			/* Mark to remember that we shouldn't return more */
+			epqstate->relsubs_done[scanrelid - 1] = true;
+
+			if (!EvalPlanQualFetchRowMark(epqstate, scanrelid, slot))
+				return NULL;
+
+			/* Return empty slot if we haven't got a test tuple */
+			if (TupIsNull(slot))
+				return NULL;
+
+			/* Check if it meets the access-method conditions */
+			if (!(*recheckMtd) (node, slot))
+				return ExecClearTuple(slot);	/* would not be returned by
+												 * scan */
+			return slot;
+		}
+	}
+
+	/*
+	 * Run the node-type-specific access method function to get the next tuple
+	 */
+	return (*accessMtd) (node);
+}
+
+/* ----------------------------------------------------------------
+ * ExecScanExtended
+ *		Scans the relation using the given 'access method' and returns
+ *		the next qualifying tuple. The tuple is optionally checked
+ *		against 'qual' and, if provided, projected using 'projInfo'.
+ *
+ * The 'recheck method' validates an arbitrary tuple of the relation
+ * against conditions enforced by the access method.
+ *
+ * This function is an alternative to ExecScan, used when callers
+ * may omit 'qual' or 'projInfo'. The pg_attribute_always_inline
+ * attribute allows the compiler to eliminate non-relevant branches
+ * at compile time, avoiding run-time checks in those cases.
+ *
+ * Conditions:
+ *	-- The AMI "cursor" is positioned at the previously returned tuple.
+ *
+ * Initial States:
+ *	-- The relation is opened for scanning, with the "cursor"
+ *	positioned before the first qualifying tuple.
+ * ----------------------------------------------------------------
+ */
+static pg_attribute_always_inline TupleTableSlot *
+ExecScanExtended(ScanState *node,
+				 ExecScanAccessMtd accessMtd,	/* function returning a tuple */
+				 ExecScanRecheckMtd recheckMtd,
+				 EPQState *epqstate,
+				 ExprState *qual,
+				 ProjectionInfo *projInfo)
+{
+	ExprContext *econtext = node->ps.ps_ExprContext;
+
+	/* interrupt checks are in ExecScanFetch */
+
+	/*
+	 * If we have neither a qual to check nor a projection to do, just skip
+	 * all the overhead and return the raw scan tuple.
+	 */
+	if (!qual && !projInfo)
+	{
+		ResetExprContext(econtext);
+		return ExecScanFetch(node, epqstate, accessMtd, recheckMtd);
+	}
+
+	/*
+	 * Reset per-tuple memory context to free any expression evaluation
+	 * storage allocated in the previous tuple cycle.
+	 */
+	ResetExprContext(econtext);
+
+	/*
+	 * get a tuple from the access method.  Loop until we obtain a tuple that
+	 * passes the qualification.
+	 */
+	for (;;)
+	{
+		TupleTableSlot *slot;
+
+		slot = ExecScanFetch(node, epqstate, accessMtd, recheckMtd);
+
+		/*
+		 * if the slot returned by the accessMtd contains NULL, then it means
+		 * there is nothing more to scan so we just return an empty slot,
+		 * being careful to use the projection result slot so it has correct
+		 * tupleDesc.
+		 */
+		if (TupIsNull(slot))
+		{
+			if (projInfo)
+				return ExecClearTuple(projInfo->pi_state.resultslot);
+			else
+				return slot;
+		}
+
+		/*
+		 * place the current tuple into the expr context
+		 */
+		econtext->ecxt_scantuple = slot;
+
+		/*
+		 * check that the current tuple satisfies the qual-clause
+		 *
+		 * check for non-null qual here to avoid a function call to ExecQual()
+		 * when the qual is null ... saves only a few cycles, but they add up
+		 * ...
+		 */
+		if (qual == NULL || ExecQual(qual, econtext))
+		{
+			/*
+			 * Found a satisfactory scan tuple.
+			 */
+			if (projInfo)
+			{
+				/*
+				 * Form a projection tuple, store it in the result tuple slot
+				 * and return it.
+				 */
+				return ExecProject(projInfo);
+			}
+			else
+			{
+				/*
+				 * Here, we aren't projecting, so just return scan tuple.
+				 */
+				return slot;
+			}
+		}
+		else
+			InstrCountFiltered1(node, 1);
+
+		/*
+		 * Tuple fails qual, so free per-tuple memory and try again.
+		 */
+		ResetExprContext(econtext);
+	}
+}
+
+#endif							/* EXECSCAN_H */
-- 
2.43.0
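
Here is the specialization technique from the patch above in isolation, as a minimal self-contained sketch. It has one generic always-inline scan loop and thin wrappers that pass either a constant NULL or a known field. The variant is chosen once, at init time. All names (`Scan`, `scan_extended`, `scan_init`, the `scan_*` variants, `is_even`, `times_ten`) are hypothetical stand-ins, not PostgreSQL APIs. `__attribute__((always_inline))` assumes GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for qual (ExprState) and projection evaluation. */
typedef int (*QualFn) (int value);	/* nonzero if the row passes */
typedef int (*ProjFn) (int value);	/* computes the output value */

typedef struct Scan
{
	const int  *rows;			/* the "relation" being scanned */
	size_t		nrows;
	size_t		pos;			/* next row to examine */
	QualFn		qual;			/* NULL when there is no qual */
	ProjFn		proj;			/* NULL when there is no projection */
	int			(*exec) (struct Scan *scan, int *out);	/* picked at init */
} Scan;

/*
 * Generic loop, always inlined into each wrapper.  Returns 1 and sets *out
 * for the next qualifying row, or 0 at end of scan.
 */
static inline __attribute__((always_inline)) int
scan_extended(Scan *scan, QualFn qual, ProjFn proj, int *out)
{
	while (scan->pos < scan->nrows)
	{
		int			v = scan->rows[scan->pos++];

		if (qual == NULL || qual(v))
		{
			*out = (proj != NULL) ? proj(v) : v;
			return 1;
		}
	}
	return 0;
}

static int
scan_plain(Scan *scan, int *out)
{
	assert(scan->qual == NULL && scan->proj == NULL);
	return scan_extended(scan, NULL, NULL, out);	/* both branches constant */
}

static int
scan_qual(Scan *scan, int *out)
{
	assert(scan->qual != NULL && scan->proj == NULL);
	return scan_extended(scan, scan->qual, NULL, out);
}

static int
scan_proj(Scan *scan, int *out)
{
	assert(scan->qual == NULL && scan->proj != NULL);
	return scan_extended(scan, NULL, scan->proj, out);
}

static int
scan_qual_proj(Scan *scan, int *out)
{
	assert(scan->qual != NULL && scan->proj != NULL);
	return scan_extended(scan, scan->qual, scan->proj, out);
}

static void
scan_init(Scan *scan, const int *rows, size_t nrows, QualFn qual, ProjFn proj)
{
	scan->rows = rows;
	scan->nrows = nrows;
	scan->pos = 0;
	scan->qual = qual;
	scan->proj = proj;
	/* Choose the variant once, in the spirit of ExecInitSeqScan(). */
	if (qual == NULL)
		scan->exec = (proj == NULL) ? scan_plain : scan_proj;
	else
		scan->exec = (proj == NULL) ? scan_qual : scan_qual_proj;
}

/* Example callbacks for the usage below. */
static int is_even(int v) { return v % 2 == 0; }
static int times_ten(int v) { return v * 10; }
```

In `scan_plain()`, both callbacks are literal NULLs, so the loop can compile down to a plain copy with no callback checks. In the other wrappers, the compiler only sees a field load, and `assert()` vanishes under `NDEBUG`. The `qual == NULL` test may therefore survive there. That is the gap the later `pg_assume()` change in this thread addresses.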

#17

Amit Langote

amitlangote09@gmail.com

12 months ago

In reply to: Amit Langote (#16)

Re: Some ExecSeqScan optimizations

On Fri, Jan 17, 2025 at 2:05 PM Amit Langote <amitlangote09@gmail.com> wrote:

Here's v5 with a few commit message tweaks.

Barring objections, I would like to push this early next week.

Pushed yesterday. Thank you all.

--
Thanks, Amit Langote

#18

Andres Freund

andres@anarazel.de

6 months ago

In reply to: Amit Langote (#17)

1 attachment(s)

Re: Some ExecSeqScan optimizations

Hi,

On 2025-01-22 10:07:51 +0900, Amit Langote wrote:

On Fri, Jan 17, 2025 at 2:05 PM Amit Langote <amitlangote09@gmail.com> wrote:

Here's v5 with a few commit message tweaks.

Barring objections, I would like to push this early next week.

Pushed yesterday. Thank you all.

While looking at a profile I recently noticed that ExecSeqScanWithQual() had a
runtime branch to test whether qual is NULL, which seemed a bit silly. I think
we should use pg_assume(), which I just added to avoid a compiler warning, to
improve the code generation here.
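
This illustrates the kind of hint being proposed. An "assume" macro tells the compiler that a condition always holds, so it can delete code that could only run if the condition were false. The sketch below is a hypothetical `my_assume()` for GCC/Clang built on `__builtin_unreachable()`. It is not the actual definition of `pg_assume()`, which may differ in detail. If the assumed condition is ever false at run time, behavior is undefined. Such a hint is therefore only appropriate for invariants that really are guaranteed, such as the ones `ExecInitSeqScan()` establishes when it picks a variant.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical assume macro.  With assertions enabled it checks the
 * condition; otherwise, on GCC/Clang, it marks the false path unreachable
 * so the optimizer may treat the condition as known to be true.
 */
#if !defined(NDEBUG)
#define my_assume(expr) assert(expr)
#elif defined(__GNUC__)
#define my_assume(expr) \
	do { \
		if (!(expr)) \
			__builtin_unreachable(); \
	} while (0)
#else
#define my_assume(expr) ((void) 0)
#endif

typedef int (*QualFn) (int value);

/* Generic helper that tolerates a NULL qual, like ExecScanExtended(). */
static inline int
count_passing(const int *rows, size_t n, QualFn qual)
{
	int			count = 0;

	for (size_t i = 0; i < n; i++)
		if (qual == NULL || qual(rows[i]))
			count++;
	return count;
}

/*
 * Variant that is only ever called with a non-NULL qual.  After my_assume(),
 * an optimizing compiler that inlines count_passing() may drop the
 * "qual == NULL" test from the loop.
 */
static int
count_passing_with_qual(const int *rows, size_t n, QualFn qual)
{
	my_assume(qual != NULL);
	return count_passing(rows, n, qual);
}

static int
is_positive(int v)
{
	return v > 0;
}
```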

The performance gain unsurprisingly isn't significant (but seems repeatably
measurable), but it does cut out a fair bit of unnecessary code.

andres@awork3:/srv/dev/build/postgres/m-dev-optimize$ size executor_nodeSeqscan.c.*o
   text    data     bss     dec     hex filename
   3330       0       0    3330     d02 executor_nodeSeqscan.c.assume.o
   3834       0       0    3834     efa executor_nodeSeqscan.c.o

A 13% reduction in actual code size isn't bad for such a small change, imo.

I have a separate question as well - do we need to call ResetExprContext() if
we have neither a qual, a projection, nor EPQ? I see a small gain by avoiding that.

Greetings,

Andres Freund

Attachments:

v1-0001-Optimize-seqscan-code-generation-using-pg_assume.patch (text/x-diff; charset=us-ascii)

From a443d7dc6419a5648b10bbd900acf2fc745255b4 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Wed, 9 Jul 2025 19:27:19 -0400
Subject: [PATCH v1] Optimize seqscan code generation using pg_assume()

Discussion: https://postgr.es/m/CA+HiwqFk-MbwhfX_kucxzL8zLmjEt9MMcHi2YF=DyhPrSjsBEA@mail.gmail.com
---
 src/backend/executor/nodeSeqscan.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index ed35c58c2c3..94047d29430 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -131,8 +131,12 @@ ExecSeqScanWithQual(PlanState *pstate)
 {
 	SeqScanState *node = castNode(SeqScanState, pstate);
 
+	/*
+	 * Use pg_assume() for != NULL tests to make the compiler realize no
+	 * runtime check for the field is needed in ExecScanExtended().
+	 */
 	Assert(pstate->state->es_epq_active == NULL);
-	Assert(pstate->qual != NULL);
+	pg_assume(pstate->qual != NULL);
 	Assert(pstate->ps_ProjInfo == NULL);
 
 	return ExecScanExtended(&node->ss,
@@ -153,7 +157,7 @@ ExecSeqScanWithProject(PlanState *pstate)
 
 	Assert(pstate->state->es_epq_active == NULL);
 	Assert(pstate->qual == NULL);
-	Assert(pstate->ps_ProjInfo != NULL);
+	pg_assume(pstate->ps_ProjInfo != NULL);
 
 	return ExecScanExtended(&node->ss,
 							(ExecScanAccessMtd) SeqNext,
@@ -173,8 +177,8 @@ ExecSeqScanWithQualProject(PlanState *pstate)
 	SeqScanState *node = castNode(SeqScanState, pstate);
 
 	Assert(pstate->state->es_epq_active == NULL);
-	Assert(pstate->qual != NULL);
-	Assert(pstate->ps_ProjInfo != NULL);
+	pg_assume(pstate->qual != NULL);
+	pg_assume(pstate->ps_ProjInfo != NULL);
 
 	return ExecScanExtended(&node->ss,
 							(ExecScanAccessMtd) SeqNext,
-- 
2.48.1.76.g4e746b1a31.dirty

#19

Amit Langote

amitlangote09@gmail.com

6 months ago

In reply to: Andres Freund (#18)

Re: Some ExecSeqScan optimizations

Hi Andres,

On Thu, Jul 10, 2025 at 8:34 AM Andres Freund <andres@anarazel.de> wrote:

On 2025-01-22 10:07:51 +0900, Amit Langote wrote:

On Fri, Jan 17, 2025 at 2:05 PM Amit Langote <amitlangote09@gmail.com> wrote:

Here's v5 with a few commit message tweaks.

Barring objections, I would like to push this early next week.

Pushed yesterday. Thank you all.

While looking at a profile I recently noticed that ExecSeqScanWithQual() had a
runtime branch to test whether qual is NULL, which seemed a bit silly. I think
we should use pg_assume(), which I just added to avoid a compiler warning, to
improve the code generation here.

+1. I think this might be what David was getting at in his first
message in this thread.

The performance gain unsurprisingly isn't significant (but seems repeatably
measurable), but it does cut out a fair bit of unnecessary code.

andres@awork3:/srv/dev/build/postgres/m-dev-optimize$ size executor_nodeSeqscan.c.*o
   text    data     bss     dec     hex filename
   3330       0       0    3330     d02 executor_nodeSeqscan.c.assume.o
   3834       0       0    3834     efa executor_nodeSeqscan.c.o

A 13% reduction in actual code size isn't bad for such a small change, imo.

Yeah, that seems worthwhile. I had been a bit concerned about code
size growth from having four variant functions with at least some
duplication, so this is a nice offset.

Thanks for the patch.

+    /*
+     * Use pg_assume() for != NULL tests to make the compiler realize no
+     * runtime check for the field is needed in ExecScanExtended().
+     */

I propose changing "to make the compiler realize no runtime check" to
"so the compiler can optimize away the runtime check", assuming that
is what it means.

Also, I assume you intentionally avoided repeating the comment in all
the variant functions.

I have a separate question as well - do we need to call ResetExprContext() if
we have neither a qual, a projection, nor EPQ? I see a small gain by avoiding that.

You're referring to this block, I assume:

	/*
	 * If we have neither a qual to check nor a projection to do, just skip
	 * all the overhead and return the raw scan tuple.
	 */
	if (!qual && !projInfo)
	{
		ResetExprContext(econtext);
		return ExecScanFetch(node, epqstate, accessMtd, recheckMtd);
	}

Yeah, I think it's fine to remove ResetExprContext() here. When I
looked at it before, I left it in because I was unsure whether
accessMtd() might leak memory into the per-tuple context. But on
second thought, that seems unlikely? Would you like me to do it or do
you have a patch in your tree already?

--
Thanks, Amit Langote

#20

Andres Freund

andres@anarazel.de

6 months ago

In reply to: Amit Langote (#19)

Re: Some ExecSeqScan optimizations

Hi,

On 2025-07-10 17:28:50 +0900, Amit Langote wrote:

On Thu, Jul 10, 2025 at 8:34 AM Andres Freund <andres@anarazel.de> wrote:

On 2025-01-22 10:07:51 +0900, Amit Langote wrote:

On Fri, Jan 17, 2025 at 2:05 PM Amit Langote <amitlangote09@gmail.com> wrote:

Here's v5 with a few commit message tweaks.

Barring objections, I would like to push this early next week.

Pushed yesterday. Thank you all.

While looking at a profile I recently noticed that ExecSeqScanWithQual() had a
runtime branch to test whether qual is NULL, which seemed a bit silly. I think
we should use pg_assume(), which I just added to avoid a compiler warning, to
improve the code generation here.

+1. I think this might be what David was getting at in his first
message in this thread.

Indeed.

The performance gain unsurprisingly isn't significant (but seems repeatably
measurable), but it does cut out a fair bit of unnecessary code.

andres@awork3:/srv/dev/build/postgres/m-dev-optimize$ size executor_nodeSeqscan.c.*o
   text    data     bss     dec     hex filename
   3330       0       0    3330     d02 executor_nodeSeqscan.c.assume.o
   3834       0       0    3834     efa executor_nodeSeqscan.c.o

A 13% reduction in actual code size isn't bad for such a small change, imo.

Yeah, that seems worthwhile. I had been a bit concerned about code
size growth from having four variant functions with at least some
duplication, so this is a nice offset.

I'm rather surprised by just how much the size reduces...

I built nodeSeqscan.c with -ffunction-sections and looked at the size with
size --format=sysv:

Before:
.text.SeqRecheck 6 0
.rodata.str1.8 135 0
.text.unlikely.SeqNext 53 0
.text.SeqNext 178 0
.text.ExecSeqScanEPQ 20 0
.text.ExecSeqScanWithProject 289 0
.text.unlikely.ExecSeqScanWithQual 53 0
.text.ExecSeqScanWithQual 441 0
.text.unlikely.ExecSeqScanWithQualProject 53 0
.text.ExecSeqScanWithQualProject 811 0
.text.unlikely.ExecSeqScan 53 0
.text.ExecSeqScan 245 0
.text.ExecInitSeqScan 287 0
.text.ExecEndSeqScan 33 0
.text.ExecReScanSeqScan 63 0
.text.ExecSeqScanEstimate 88 0
.text.ExecSeqScanInitializeDSM 114 0
.text.ExecSeqScanReInitializeDSM 34 0
.text.ExecSeqScanInitializeWorker 64 0

After:
.text.SeqRecheck 6 0
.rodata.str1.8 135 0
.text.unlikely.SeqNext 53 0
.text.SeqNext 178 0
.text.ExecSeqScanEPQ 20 0
.text.ExecSeqScanWithProject 209 0
.text.unlikely.ExecSeqScanWithQual 53 0
.text.ExecSeqScanWithQual 373 0
.text.unlikely.ExecSeqScanWithQualProject 53 0
.text.ExecSeqScanWithQualProject 474 0
.text.unlikely.ExecSeqScan 53 0
.text.ExecSeqScan 245 0
.text.ExecInitSeqScan 287 0
.text.ExecEndSeqScan 33 0
.text.ExecReScanSeqScan 63 0
.text.ExecSeqScanEstimate 88 0
.text.ExecSeqScanInitializeDSM 114 0
.text.ExecSeqScanReInitializeDSM 34 0
.text.ExecSeqScanInitializeWorker 64 0

I'm rather baffled that the size of ExecSeqScanWithQualProject goes from 811
to 474, just due to those null checks being removed... But I'll take it.
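
For reference, here are the relative reductions implied by the numbers above. The first line uses the whole-object `text` size from the `size` output. The other lines use the per-function sizes from the `-ffunction-sections` build:

```latex
\[
\begin{aligned}
\text{object text:}\quad & \frac{3834 - 3330}{3834} = \frac{504}{3834} \approx 13.1\% \\
\texttt{ExecSeqScanWithProject}:\quad & \frac{289 - 209}{289} = \frac{80}{289} \approx 27.7\% \\
\texttt{ExecSeqScanWithQual}:\quad & \frac{441 - 373}{441} = \frac{68}{441} \approx 15.4\% \\
\texttt{ExecSeqScanWithQualProject}:\quad & \frac{811 - 474}{811} = \frac{337}{811} \approx 41.6\%
\end{aligned}
\]
```

ExecSeqScan and ExecSeqScanEPQ are the same size in both builds. That is consistent with the patch touching only the three variants that take a non-NULL qual or projection.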

Thanks for the patch.
+    /*
+     * Use pg_assume() for != NULL tests to make the compiler realize no
+     * runtime check for the field is needed in ExecScanExtended().
+     */
I propose changing "to make the compiler realize no runtime check" to
"so the compiler can optimize away the runtime check", assuming that
is what it means.

It does. I don't really see a meaningful difference between the comments?

Also, I assume you intentionally avoided repeating the comment in all
the variant functions.

Yea, it didn't seem helpful to do so.

I have a separate question as well - do we need to call ResetExprContext() if
we have neither a qual, a projection, nor EPQ? I see a small gain by avoiding that.

You're referring to this block, I assume:

	/*
	 * If we have neither a qual to check nor a projection to do, just skip
	 * all the overhead and return the raw scan tuple.
	 */
	if (!qual && !projInfo)
	{
		ResetExprContext(econtext);
		return ExecScanFetch(node, epqstate, accessMtd, recheckMtd);
	}

Yep.

Yeah, I think it's fine to remove ResetExprContext() here. When I
looked at it before, I left it in because I was unsure whether
accessMtd() might leak memory into the per-tuple context.

It's a good question. I think I unfortunately found a problematic case,
ForeignNext().
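
The scan loop resets the per-tuple context before fetching each tuple. Anything an access method allocates there is therefore reclaimed on the next call. ForeignNext() appears to run the FDW's iterate callback with the per-tuple context as the current memory context. If so, it depends on that reset. The toy below models this with a bump arena. `Arena`, `arena_init()`, `arena_alloc()`, `arena_reset()`, `fetch_row()`, and `scan_rows()` are hypothetical and only loosely mimic a memory context. If the per-call reset is skipped, usage grows with the number of tuples fetched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical bump arena standing in for a per-tuple memory context. */
typedef struct Arena
{
	char	   *base;
	size_t		cap;
	size_t		used;
} Arena;

static int
arena_init(Arena *a, size_t cap)
{
	a->base = malloc(cap);
	a->cap = cap;
	a->used = 0;
	return a->base != NULL;
}

/* Returns NULL when the arena is exhausted (a real context would grow). */
static void *
arena_alloc(Arena *a, size_t n)
{
	void	   *p;

	if (a->cap - a->used < n)
		return NULL;
	p = a->base + a->used;
	a->used += n;
	return p;
}

/* Releases everything allocated since the previous reset. */
static void
arena_reset(Arena *a)
{
	a->used = 0;
}

/* Access method that uses 64 bytes of per-tuple scratch space per row. */
static int
fetch_row(Arena *per_tuple, int row)
{
	int		   *scratch = arena_alloc(per_tuple, 64);

	if (scratch == NULL)
		return -1;
	*scratch = row * 2;
	return *scratch;
}

/* Fetch n rows, optionally resetting the arena before each fetch. */
static size_t
scan_rows(Arena *per_tuple, int n, int reset_each_time)
{
	for (int i = 0; i < n; i++)
	{
		if (reset_each_time)
			arena_reset(per_tuple);
		(void) fetch_row(per_tuple, i);
	}
	return per_tuple->used;		/* bytes still held after the scan */
}
```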

I wonder if we can instead make MemoryContextReset() cheaper, by avoiding a function
call for the common case that no reset is needed. Right now we can't just
check ->isReset in an inline function, because we also delete children. I
wonder if we could define isReset so that creating a child context unsets
isReset?

Greetings,

Andres Freund

#21

Amit Langote

amitlangote09@gmail.com

6 months ago

In reply to: Andres Freund (#20)

Re: Some ExecSeqScan optimizations

On Fri, Jul 11, 2025 at 5:55 AM Andres Freund <andres@anarazel.de> wrote:

On 2025-07-10 17:28:50 +0900, Amit Langote wrote:

On Thu, Jul 10, 2025 at 8:34 AM Andres Freund <andres@anarazel.de> wrote:

The performance gain unsurprisingly isn't significant (but seems repeatably
measurable), but it does cut out a fair bit of unnecessary code.

andres@awork3:/srv/dev/build/postgres/m-dev-optimize$ size executor_nodeSeqscan.c.*o
   text    data     bss     dec     hex filename
   3330       0       0    3330     d02 executor_nodeSeqscan.c.assume.o
   3834       0       0    3834     efa executor_nodeSeqscan.c.o

A 13% reduction in actual code size isn't bad for such a small change, imo.

Yeah, that seems worthwhile. I had been a bit concerned about code
size growth from having four variant functions with at least some
duplication, so this is a nice offset.

I'm rather surprised by just how much the size reduces...

I'm rather baffled that the size of ExecSeqScanWithQualProject goes from 811
to 474, just due to those null checks being removed... But I'll take it.

Wow, indeed.

Thanks for the patch.
+    /*
+     * Use pg_assume() for != NULL tests to make the compiler realize no
+     * runtime check for the field is needed in ExecScanExtended().
+     */
I propose changing "to make the compiler realize no runtime check" to
"so the compiler can optimize away the runtime check", assuming that
is what it means.
It does. I don't really see a meaningful difference between the comments?

Maybe not. I just had to pause for a moment to be sure that was what
it actually meant when I first read it. I'm fine leaving it as is if
you prefer.

I have a separate question as well - do we need to call ResetExprContext() if
we have neither a qual, a projection, nor EPQ? I see a small gain by avoiding that.

You're referring to this block, I assume:

	/*
	 * If we have neither a qual to check nor a projection to do, just skip
	 * all the overhead and return the raw scan tuple.
	 */
	if (!qual && !projInfo)
	{
		ResetExprContext(econtext);
		return ExecScanFetch(node, epqstate, accessMtd, recheckMtd);
	}

Yep.

Yeah, I think it's fine to remove ResetExprContext() here. When I
looked at it before, I left it in because I was unsure whether
accessMtd() might leak memory into the per-tuple context.

It's a good question. I think I unfortunately found a problematic case,
ForeignNext().

Ah, so we do have a culprit in the tree.

I wonder if we can instead make MemoryContextReset() cheaper, by avoiding a function
call for the common case that no reset is needed. Right now we can't just
check ->isReset in an inline function, because we also delete children. I
wonder if we could define isReset so that creating a child context unsets
isReset?

Were you thinking ResetExprContext() could become something like:

#define ResetExprContext(econtext) \
	do { \
		if (!((econtext)->ecxt_per_tuple_memory)->isReset) \
			MemoryContextReset((econtext)->ecxt_per_tuple_memory); \
	} while (0)

that is, once isReset also accounts for whether any child context exists?

--
Thanks, Amit Langote

#22

Nikita Malakhov

hukutoc@gmail.com

6 months ago

In reply to: Amit Langote (#21)

Re: Some ExecSeqScan optimizations

Hi Amit!

It's a pity I missed this thread when you developed the patch.
I've developed a feature recently and discovered that SeqScan
does not make use of scan keys, and there is a comment by Tom Lane
regarding this:
* Note that unlike IndexScan, SeqScan never use keys in heap_beginscan
* (and this is very bad) - so, here we do not check are keys ok or not.

Have you considered passing scan keys like it is done in IndexScan?

Thanks!

--
Regards,
Nikita Malakhov
Postgres Professional
The Russian Postgres Company
https://postgrespro.ru/

#23

Andres Freund

andres@anarazel.de

6 months ago

In reply to: Amit Langote (#21)

Re: Some ExecSeqScan optimizations

Hi,

On 2025-07-11 11:22:36 +0900, Amit Langote wrote:

On Fri, Jul 11, 2025 at 5:55 AM Andres Freund <andres@anarazel.de> wrote:
On 2025-07-10 17:28:50 +0900, Amit Langote wrote:
Thanks for the patch.
+    /*
+     * Use pg_assume() for != NULL tests to make the compiler realize no
+     * runtime check for the field is needed in ExecScanExtended().
+     */
I propose changing "to make the compiler realize no runtime check" to
"so the compiler can optimize away the runtime check", assuming that
is what it means.
It does. I don't really see a meaningful difference between the comments?
Maybe not. I just had to pause for a moment to be sure that was what
it actually meant when I first read it. I'm fine leaving it as is if
you prefer.

To me my version makes a bit more sense, by explaining that we tell the
compiler information that it otherwise doesn't have, which results in the
optimization...

I have a separate question as well - do we need to call ResetExprContext() if
we have neither a qual, a projection, nor EPQ? I see a small gain by avoiding that.

You're referring to this block, I assume:

	/*
	 * If we have neither a qual to check nor a projection to do, just skip
	 * all the overhead and return the raw scan tuple.
	 */
	if (!qual && !projInfo)
	{
		ResetExprContext(econtext);
		return ExecScanFetch(node, epqstate, accessMtd, recheckMtd);
	}

Yep.

Yeah, I think it's fine to remove ResetExprContext() here. When I
looked at it before, I left it in because I was unsure whether
accessMtd() might leak memory into the per-tuple context.

It's a good question. I think I unfortunately found a problematic case,
ForeignNext().

Ah, so we do have a culprit in the tree.

I wonder if we can instead make MemoryContextReset() cheaper, by avoiding a function
call for the common case that no reset is needed. Right now we can't just
check ->isReset in an inline function, because we also delete children. I
wonder if we could define isReset so that creating a child context unsets
isReset?

Were you thinking ResetExprContext() could become something like:

#define ResetExprContext(econtext) \
	do { \
		if (!((econtext)->ecxt_per_tuple_memory)->isReset) \
			MemoryContextReset((econtext)->ecxt_per_tuple_memory); \
	} while (0)

that is, once isReset also accounts for whether any child context exists?

Nearly - I was thinking we'd do that in MemoryContextReset(), rather than
ResetExprContext().

Greetings,

Andres Freund

#24

Andres Freund

andres@anarazel.de

6 months ago

In reply to: Nikita Malakhov (#22)

Re: Some ExecSeqScan optimizations

Hi,

On 2025-07-11 14:03:42 +0300, Nikita Malakhov wrote:

It's a pity I missed this thread when you developed the patch.
I've developed a feature recently and discovered that SeqScan
does not make use of scan keys, and there is a comment by Tom Lane
regarding this:
* Note that unlike IndexScan, SeqScan never use keys in heap_beginscan
* (and this is very bad) - so, here we do not check are keys ok or not.

Have you considered passing scan keys like it is done in IndexScan?

You can't easily do that without causing issues:

1) ScanKeys are evaluated while holding a buffer lock; we shouldn't do that
with arbitrary functions (since they could recurse and acquire other locks
in an incorrect order).

2) ScanKeys are rather restrictive in what they can express, but not
restrictive enough to make 1) not a problem. That means that you can't just
evaluate the whole predicate using ScanKeys.

3) ScanKey evaluation is actually sometimes *more* expensive than expression
evaluation, because the columns are deformed one-by-one.
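
Point 3) can be illustrated with a deliberately simplified cost model. Treat a tuple as a sequence of variable-width columns, where locating column k requires walking past columns 0..k-1. Testing each key by locating its column independently repeats that walk once per key. Deforming once, up to the highest referenced column, walks it only once. The functions below are hypothetical and only count column "steps". They are not PostgreSQL code. Real heap access can also use cached attribute offsets when the preceding columns are fixed-width and non-null.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Steps to locate column 'attno' (0-based) from scratch when every earlier
 * column is variable-width: all preceding columns must be walked over.
 */
static size_t
locate_steps(size_t attno)
{
	return attno + 1;
}

/* ScanKey-style: each key locates its own column independently. */
static size_t
per_key_steps(const size_t *attnos, size_t nkeys)
{
	size_t		steps = 0;

	for (size_t i = 0; i < nkeys; i++)
		steps += locate_steps(attnos[i]);
	return steps;
}

/* Expression-style: deform once, up to the highest referenced column. */
static size_t
deform_once_steps(const size_t *attnos, size_t nkeys)
{
	size_t		maxatt = 0;

	if (nkeys == 0)
		return 0;
	for (size_t i = 0; i < nkeys; i++)
		if (attnos[i] > maxatt)
			maxatt = attnos[i];
	return locate_steps(maxatt);
}
```

With a single key, the two approaches cost the same. The gap grows with the number of keys that reference later columns.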

Greetings,

Andres Freund

#25

Nikita Malakhov

hukutoc@gmail.com

6 months ago

In reply to: Andres Freund (#24)

Re: Some ExecSeqScan optimizations

Hi!

Andres, thank you for the explanation about the locks.
I've already tried to pass scan keys and saw that it is
quite expensive.

--
Regards,
Nikita Malakhov
Postgres Professional
The Russian Postgres Company
https://postgrespro.ru/

#26

Amit Langote

amitlangote09@gmail.com

6 months ago

In reply to: Andres Freund (#23)

Re: Some ExecSeqScan optimizations

Hi,

On Fri, Jul 11, 2025 at 11:34 PM Andres Freund <andres@anarazel.de> wrote:

On 2025-07-11 11:22:36 +0900, Amit Langote wrote:
On Fri, Jul 11, 2025 at 5:55 AM Andres Freund <andres@anarazel.de> wrote:
On 2025-07-10 17:28:50 +0900, Amit Langote wrote:
Thanks for the patch.
+    /*
+     * Use pg_assume() for != NULL tests to make the compiler realize no
+     * runtime check for the field is needed in ExecScanExtended().
+     */
I propose changing "to make the compiler realize no runtime check" to
"so the compiler can optimize away the runtime check", assuming that
is what it means.
It does. I don't really see a meaningful difference between the comments?
Maybe not. I just had to pause for a moment to be sure that was what
it actually meant when I first read it. I'm fine leaving it as is if
you prefer.
To me my version makes a bit more sense, by explaining that we tell the
compiler information that it otherwise doesn't have, which results in the
optimization...

Ok, that does make sense.

I have a separate question as well - do we need to call ResetExprContext() if
we have neither a qual, a projection, nor EPQ? I see a small gain by avoiding that.

You're referring to this block, I assume:

	/*
	 * If we have neither a qual to check nor a projection to do, just skip
	 * all the overhead and return the raw scan tuple.
	 */
	if (!qual && !projInfo)
	{
		ResetExprContext(econtext);
		return ExecScanFetch(node, epqstate, accessMtd, recheckMtd);
	}

Yep.

Yeah, I think it's fine to remove ResetExprContext() here. When I
looked at it before, I left it in because I was unsure whether
accessMtd() might leak memory into the per-tuple context.

I wonder if we can instead make MemoryContextReset() cheaper, by avoiding a function
call for the common case that no reset is needed. Right now we can't just
check ->isReset in an inline function, because we also delete children. I
wonder if we could define isReset so that creating a child context unsets
isReset?

Were you thinking ResetExprContext() could become something like:

#define ResetExprContext(econtext) \
	do { \
		if (!((econtext)->ecxt_per_tuple_memory)->isReset) \
			MemoryContextReset((econtext)->ecxt_per_tuple_memory); \
	} while (0)

that is, once isReset also accounts for whether any child context exists?

Nearly - I was thinking we'd do that in MemoryContextReset(), rather than
ResetExprContext().

Ah, ok -- I was confused about which function you meant ("can't just
check ->isReset in an inline function" should have been a clue). I
thought you were referring to avoiding the call to
MemoryContextReset() itself from ExecScanExtended() by checking
isReset.

But it sounds like you meant optimizing within MemoryContextReset() --
specifically, skipping MemoryContextDeleteChildren() when isReset is
already true, so it becomes:

	if (context->isReset)
		return;
	MemoryContextDeleteChildren(context);
	MemoryContextResetOnly(context);

Just out of curiosity, I tried making that change locally, and meson
test (check-world) passed. I assume that's just because nothing
notices leaked child contexts -- there's no mechanism asserting that
everything under a context gets reset if we skip
MemoryContextDeleteChildren().

That’s not to say we don't need MemoryContextCreate() to clear isReset
in the parent when adding a child. :-)
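
The idea under discussion can be modeled with a toy context tree. The reset returns early when the context is already reset, and creating a child clears the parent's isReset flag so the shortcut stays safe. `Ctx`, `ctx_create()`, `ctx_alloc()`, `ctx_delete_children()`, and `ctx_reset()` below are hypothetical and greatly simplified. They do not reflect PostgreSQL's actual MemoryContext implementation, only the invariant being debated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct Ctx
{
	struct Ctx *parent;
	struct Ctx *firstchild;
	struct Ctx *nextsibling;
	size_t		nallocs;		/* stand-in for memory held */
	bool		isReset;		/* true: no allocations and no children */
	int			resets_done;	/* counts non-trivial resets, for the demo */
} Ctx;

static Ctx *
ctx_create(Ctx *parent)
{
	Ctx		   *c = calloc(1, sizeof(Ctx));

	if (c == NULL)
		return NULL;
	c->isReset = true;
	if (parent != NULL)
	{
		c->parent = parent;
		c->nextsibling = parent->firstchild;
		parent->firstchild = c;
		parent->isReset = false;	/* key invariant: a child dirties the parent */
	}
	return c;
}

static void
ctx_alloc(Ctx *c)
{
	c->nallocs++;
	c->isReset = false;
}

static void
ctx_delete_children(Ctx *c)
{
	Ctx		   *child = c->firstchild;

	while (child != NULL)
	{
		Ctx		   *next = child->nextsibling;

		ctx_delete_children(child);
		free(child);
		child = next;
	}
	c->firstchild = NULL;
}

static void
ctx_reset(Ctx *c)
{
	/* Fast path: nothing allocated and no children to delete. */
	if (c->isReset)
		return;
	ctx_delete_children(c);
	c->nallocs = 0;
	c->isReset = true;
	c->resets_done++;
}
```

This puts the check inside `ctx_reset()`, as Andres suggests for `MemoryContextReset()`, so every caller benefits. A check in a `ResetExprContext()`-style macro would only help that one call site.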

--
Thanks, Amit Langote

#27

Andres Freund

andres@anarazel.de

5 months ago

In reply to: Amit Langote (#21)

Re: Some ExecSeqScan optimizations

Hi,

On 2025-07-11 11:22:36 +0900, Amit Langote wrote:

On Fri, Jul 11, 2025 at 5:55 AM Andres Freund <andres@anarazel.de> wrote:

On 2025-07-10 17:28:50 +0900, Amit Langote wrote:

On Thu, Jul 10, 2025 at 8:34 AM Andres Freund <andres@anarazel.de> wrote:

The performance gain unsurprisingly isn't significant (but seems repeatably
measurable), but it does cut out a fair bit of unnecessary code.

andres@awork3:/srv/dev/build/postgres/m-dev-optimize$ size executor_nodeSeqscan.c.*o
   text    data     bss     dec     hex filename
   3330       0       0    3330     d02 executor_nodeSeqscan.c.assume.o
   3834       0       0    3834     efa executor_nodeSeqscan.c.o

A 13% reduction in actual code size isn't bad for such a small change, imo.

Yeah, that seems worthwhile. I had been a bit concerned about code
size growth from having four variant functions with at least some
duplication, so this is a nice offset.

I'm rather surprised by just how much the size reduces...

I'm rather baffled that the size of ExecSeqScanWithQualProject goes from 811
to 474, just due to those null checks being removed... But I'll take it.

Wow, indeed.

Thanks for reviewing. Pushed!

Greetings,

Andres Freund