Don't codegen deform code for virtual tuples in expr eval for scan fetch

Started by Soumyadeep Chakrabortyover 6 years ago12 messages
#1Soumyadeep Chakraborty
soumyadeep2007@gmail.com
2 attachment(s)

Hello Hackers,

This is to address a TODO I found in the JIT expression evaluation
code (opcode =
EEOP_INNER_FETCHSOME/EEOP_OUTER_FETCHSOME/EEOP_SCAN_FETCHSOME):

* TODO: skip nvalid check if slot is fixed and known to
* be a virtual slot.

Not only should we skip the nvalid check if the tuple is virtual, the
whole basic block should be a no-op. There is no deforming to be done,
JIT (slot_compile_deform) or otherwise (slot_getsomeattrs) for virtual
tuples.

Attached is a patch 0001 that achieves exactly that by:

1. Performing the check at the very beginning of the case.
2. Emitting an unconditional branch to the next opblock if we have a
virtual tuple. We end up with a block with just the single
unconditional branch instruction in this case. This block is optimized
away when we run llvm_optimize_module().

Also enclosed is another patch 0002 that adds on some minor
refactoring of conditionals around whether to JIT deform or not.

---
Soumyadeep Chakraborty

Attachments:

0001-Don-t-codegen-if-tuple-is-virtual-in-expr-eval-of-sc.patchapplication/octet-stream; name=0001-Don-t-codegen-if-tuple-is-virtual-in-expr-eval-of-sc.patchDownload
From 6756569021bd5b76f69a3d8d08ea60cc1e4d0ba5 Mon Sep 17 00:00:00 2001
From: soumyadeep2007 <sochakraborty@pivotal.io>
Date: Tue, 17 Sep 2019 09:21:48 -0700
Subject: [PATCH 1/2] Don't codegen if tuple is virtual in expr eval of scan
 fetch

---
 src/backend/jit/llvm/llvmjit_expr.c | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/src/backend/jit/llvm/llvmjit_expr.c b/src/backend/jit/llvm/llvmjit_expr.c
index 30133634c7..e159255627 100644
--- a/src/backend/jit/llvm/llvmjit_expr.c
+++ b/src/backend/jit/llvm/llvmjit_expr.c
@@ -278,14 +278,21 @@ llvm_compile_expr(ExprState *state)
 					LLVMValueRef l_jit_deform = NULL;
 					const TupleTableSlotOps *tts_ops = NULL;
 
-					b_fetch = l_bb_before_v(opblocks[i + 1],
-											"op.%d.fetch", i);
-
 					if (op->d.fetch.known_desc)
 						desc = op->d.fetch.known_desc;
 
 					if (op->d.fetch.fixed)
+					{
 						tts_ops = op->d.fetch.kind;
+						/*
+						 * Deforming is not required if the tuple is virtual.
+						 */
+						if (tts_ops && tts_ops == &TTSOpsVirtual)
+						{
+							LLVMBuildBr(b, opblocks[i + 1]);
+							break;
+						}
+					}
 
 					if (opcode == EEOP_INNER_FETCHSOME)
 						v_slot = v_innerslot;
@@ -294,13 +301,8 @@ llvm_compile_expr(ExprState *state)
 					else
 						v_slot = v_scanslot;
 
-					/*
-					 * Check if all required attributes are available, or
-					 * whether deforming is required.
-					 *
-					 * TODO: skip nvalid check if slot is fixed and known to
-					 * be a virtual slot.
-					 */
+					b_fetch = l_bb_before_v(opblocks[i + 1],
+											"op.%d.fetch", i);
 					v_nvalid =
 						l_load_struct_gep(b, v_slot,
 										  FIELDNO_TUPLETABLESLOT_NVALID,
-- 
2.23.0

0002-Minor-refactor-JIT-deform-or-not.patchapplication/octet-stream; name=0002-Minor-refactor-JIT-deform-or-not.patchDownload
From 5df004bb985b9c58187bc7656b5f571402d6b622 Mon Sep 17 00:00:00 2001
From: soumyadeep2007 <sochakraborty@pivotal.io>
Date: Tue, 17 Sep 2019 21:44:26 -0700
Subject: [PATCH 2/2] Minor refactor JIT deform or not

---
 src/backend/jit/llvm/llvmjit_expr.c | 15 +++++----------
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/src/backend/jit/llvm/llvmjit_expr.c b/src/backend/jit/llvm/llvmjit_expr.c
index e159255627..468bf2f7e6 100644
--- a/src/backend/jit/llvm/llvmjit_expr.c
+++ b/src/backend/jit/llvm/llvmjit_expr.c
@@ -275,7 +275,7 @@ llvm_compile_expr(ExprState *state)
 					LLVMValueRef v_slot;
 					LLVMBasicBlockRef b_fetch;
 					LLVMValueRef v_nvalid;
-					LLVMValueRef l_jit_deform = NULL;
+					LLVMValueRef deform_function = NULL;
 					const TupleTableSlotOps *tts_ops = NULL;
 
 					if (op->d.fetch.known_desc)
@@ -323,30 +323,25 @@ llvm_compile_expr(ExprState *state)
 					 */
 					if (tts_ops && desc && (context->base.flags & PGJIT_DEFORM))
 					{
-						l_jit_deform =
+						deform_function =
 							slot_compile_deform(context, desc,
 												tts_ops,
 												op->d.fetch.last_var);
-					}
-
-					if (l_jit_deform)
-					{
 						LLVMValueRef params[1];
-
 						params[0] = v_slot;
-
-						LLVMBuildCall(b, l_jit_deform,
+						LLVMBuildCall(b, deform_function,
 									  params, lengthof(params), "");
 					}
 					else
 					{
+						deform_function = llvm_get_decl(mod, FuncSlotGetsomeattrsInt);
 						LLVMValueRef params[2];
 
 						params[0] = v_slot;
 						params[1] = l_int32_const(op->d.fetch.last_var);
 
 						LLVMBuildCall(b,
-									  llvm_get_decl(mod, FuncSlotGetsomeattrsInt),
+									  deform_function,
 									  params, lengthof(params), "");
 					}
 
-- 
2.23.0

#2Soumyadeep Chakraborty
soumyadeep2007@gmail.com
In reply to: Soumyadeep Chakraborty (#1)
2 attachment(s)
Re: Don't codegen deform code for virtual tuples in expr eval for scan fetch

Hello,

In my previous patch 0001, the resulting opblock consisted of a single
br instruction to it's successor opblock. Such a block represents
unnecessary overhead. Even though such a block would be optimized
away, what if optimization is not performed (perhaps due to
jit_optimize_above_cost)? Perhaps we could be more aggressive. We
could maybe remove the opblock altogether. However, such a solution
is not without complexity.

Such unlinking is non-trivial, and is one of the things the
simplify-cfg pass takes care of. One solution could be to run this
pass ad-hoc for the evalexpr function. Or we could perform the
unlinking during codegen. Such a solution is presented below.

To unlink the current opblock from
the cfg we have to replace all of its uses. From the current state of
the code, these uses are either:
1. Terminator instruction of opblocks[i - 1]
2. Terminator instruction of some sub-op-block of opblocks[i - 1]. By
sub-op-block, I mean some block that is directly linked from opblock[i
- 1] but not opblocks[i].
3. Terminator instruction of the entry block.

We should replace all of these uses with opblocks[i + 1] and then and
only then can we delete opblocks[i]. My latest patch v2-0001 in my latest
patch set, achieves this.

I guard LLVMReplaceAllUsesWith() with Assert()s to ensure that we
don't accidentally replace non-terminator uses of opblocks[i], should
they be introduced in the future. If these asserts fail in the future,
replacing these non-terminator instructions with undefs constitutes a
commonly adopted solution. Otherwise, we can always fall back to making
opblocks[i] empty with just the unconditional br, as in my previous
patch 0001.

--
Soumyadeep

On Tue, Sep 17, 2019 at 11:54 PM Soumyadeep Chakraborty <
soumyadeep2007@gmail.com> wrote:

Show quoted text

Hello Hackers,

This is to address a TODO I found in the JIT expression evaluation
code (opcode =
EEOP_INNER_FETCHSOME/EEOP_OUTER_FETCHSOME/EEOP_SCAN_FETCHSOME):

* TODO: skip nvalid check if slot is fixed and known to
* be a virtual slot.

Not only should we skip the nvalid check if the tuple is virtual, the
whole basic block should be a no-op. There is no deforming to be done,
JIT (slot_compile_deform) or otherwise (slot_getsomeattrs) for virtual
tuples.

Attached is a patch 0001 that achieves exactly that by:

1. Performing the check at the very beginning of the case.
2. Emitting an unconditional branch to the next opblock if we have a
virtual tuple. We end up with a block with just the single
unconditional branch instruction in this case. This block is optimized
away when we run llvm_optimize_module().

Also enclosed is another patch 0002 that adds on some minor
refactoring of conditionals around whether to JIT deform or not.

---
Soumyadeep Chakraborty

Attachments:

v2-0001-Don-t-codegen-if-tuple-is-virtual-in-expr-eval-of.patchapplication/octet-stream; name=v2-0001-Don-t-codegen-if-tuple-is-virtual-in-expr-eval-of.patchDownload
From d4d14a092bf15f7b389a5803d745b085f122e5c4 Mon Sep 17 00:00:00 2001
From: soumyadeep2007 <sochakraborty@pivotal.io>
Date: Tue, 17 Sep 2019 09:21:48 -0700
Subject: [PATCH v2 1/2] Don't codegen if tuple is virtual in expr eval of scan
 fetch

Delete the resulting empty opblock and replace all of its uses with that
of its successor opblock.
---
 src/backend/jit/llvm/llvmjit_expr.c | 53 +++++++++++++++++++++++------
 1 file changed, 43 insertions(+), 10 deletions(-)

diff --git a/src/backend/jit/llvm/llvmjit_expr.c b/src/backend/jit/llvm/llvmjit_expr.c
index 30133634c7..52f20fe787 100644
--- a/src/backend/jit/llvm/llvmjit_expr.c
+++ b/src/backend/jit/llvm/llvmjit_expr.c
@@ -65,6 +65,8 @@ static void build_EvalXFunc(LLVMBuilderRef b, LLVMModuleRef mod,
 							ExprEvalStep *op);
 static LLVMValueRef create_LifetimeEnd(LLVMModuleRef mod);
 
+static void eliminate_empty_opblock(LLVMBasicBlockRef opblock,
+									LLVMBasicBlockRef successor);
 
 /*
  * JIT compile expression.
@@ -278,14 +280,23 @@ llvm_compile_expr(ExprState *state)
 					LLVMValueRef l_jit_deform = NULL;
 					const TupleTableSlotOps *tts_ops = NULL;
 
-					b_fetch = l_bb_before_v(opblocks[i + 1],
-											"op.%d.fetch", i);
-
 					if (op->d.fetch.known_desc)
 						desc = op->d.fetch.known_desc;
 
 					if (op->d.fetch.fixed)
+					{
 						tts_ops = op->d.fetch.kind;
+						/*
+						 * Deforming is not required if the tuple is virtual.
+						 * In such a case, the current opblock is redundant and
+						 * should be unlinked from it's predecessors and removed.
+						 */
+						if (tts_ops && tts_ops == &TTSOpsVirtual)
+						{
+							eliminate_empty_opblock(opblocks[i], opblocks[i + 1]);
+							break;
+						}
+					}
 
 					if (opcode == EEOP_INNER_FETCHSOME)
 						v_slot = v_innerslot;
@@ -294,13 +305,8 @@ llvm_compile_expr(ExprState *state)
 					else
 						v_slot = v_scanslot;
 
-					/*
-					 * Check if all required attributes are available, or
-					 * whether deforming is required.
-					 *
-					 * TODO: skip nvalid check if slot is fixed and known to
-					 * be a virtual slot.
-					 */
+					b_fetch = l_bb_before_v(opblocks[i + 1],
+											"op.%d.fetch", i);
 					v_nvalid =
 						l_load_struct_gep(b, v_slot,
 										  FIELDNO_TUPLETABLESLOT_NVALID,
@@ -2603,3 +2609,30 @@ create_LifetimeEnd(LLVMModuleRef mod)
 
 	return fn;
 }
+
+/*
+ * Unlink given opblock from CFG and remove it.
+ */
+static void
+eliminate_empty_opblock(LLVMBasicBlockRef opblock,
+						LLVMBasicBlockRef successor)
+{
+	LLVMUseRef   useRef = NULL;
+	LLVMValueRef instr   = NULL;
+	/*
+	 * Assert that all uses are terminator instructions of the opblock's
+	 * predecessors. We want to rewire all such references of the opblock to
+	 * point to its successor.
+	 */
+	useRef = LLVMGetFirstUse((LLVMValueRef) opblock);
+	for (; useRef != NULL; useRef = LLVMGetNextUse(useRef))
+	{
+		instr = LLVMGetUser(useRef);
+		Assert (LLVMIsATerminatorInst(instr));
+	}
+	/*
+	 * Perform the rewiring and delete the opblock.
+	 */
+	LLVMReplaceAllUsesWith((LLVMValueRef) opblock, (LLVMValueRef) successor);
+	LLVMDeleteBasicBlock(opblock);
+}
-- 
2.23.0

v2-0002-Minor-refactor-JIT-deform-or-not.patchapplication/octet-stream; name=v2-0002-Minor-refactor-JIT-deform-or-not.patchDownload
From c1171743e2600ce38c1531e0ce3c0429d2f59b71 Mon Sep 17 00:00:00 2001
From: soumyadeep2007 <sochakraborty@pivotal.io>
Date: Tue, 17 Sep 2019 21:44:26 -0700
Subject: [PATCH v2 2/2] Minor refactor JIT deform or not

---
 src/backend/jit/llvm/llvmjit_expr.c | 15 +++++----------
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/src/backend/jit/llvm/llvmjit_expr.c b/src/backend/jit/llvm/llvmjit_expr.c
index 52f20fe787..6910460822 100644
--- a/src/backend/jit/llvm/llvmjit_expr.c
+++ b/src/backend/jit/llvm/llvmjit_expr.c
@@ -277,7 +277,7 @@ llvm_compile_expr(ExprState *state)
 					LLVMValueRef v_slot;
 					LLVMBasicBlockRef b_fetch;
 					LLVMValueRef v_nvalid;
-					LLVMValueRef l_jit_deform = NULL;
+					LLVMValueRef deform_function = NULL;
 					const TupleTableSlotOps *tts_ops = NULL;
 
 					if (op->d.fetch.known_desc)
@@ -327,30 +327,25 @@ llvm_compile_expr(ExprState *state)
 					 */
 					if (tts_ops && desc && (context->base.flags & PGJIT_DEFORM))
 					{
-						l_jit_deform =
+						deform_function =
 							slot_compile_deform(context, desc,
 												tts_ops,
 												op->d.fetch.last_var);
-					}
-
-					if (l_jit_deform)
-					{
 						LLVMValueRef params[1];
-
 						params[0] = v_slot;
-
-						LLVMBuildCall(b, l_jit_deform,
+						LLVMBuildCall(b, deform_function,
 									  params, lengthof(params), "");
 					}
 					else
 					{
+						deform_function = llvm_get_decl(mod, FuncSlotGetsomeattrsInt);
 						LLVMValueRef params[2];
 
 						params[0] = v_slot;
 						params[1] = l_int32_const(op->d.fetch.last_var);
 
 						LLVMBuildCall(b,
-									  llvm_get_decl(mod, FuncSlotGetsomeattrsInt),
+									  deform_function,
 									  params, lengthof(params), "");
 					}
 
-- 
2.23.0

#3Andres Freund
andres@anarazel.de
In reply to: Soumyadeep Chakraborty (#1)
Re: Don't codegen deform code for virtual tuples in expr eval for scan fetch

Hi,

Thanks for working on this!

On 2019-09-17 23:54:51 -0700, Soumyadeep Chakraborty wrote:

This is to address a TODO I found in the JIT expression evaluation
code (opcode =
EEOP_INNER_FETCHSOME/EEOP_OUTER_FETCHSOME/EEOP_SCAN_FETCHSOME):

* TODO: skip nvalid check if slot is fixed and known to
* be a virtual slot.

I now think this isn't actually the right approach. Instead of doing
this optimization just for JIT compilation, we should do it while
generating the ExprState itself. That way we don't just accelerate JITed
programs, but also normal interpreted execution.

IOW, wherever ExecComputeSlotInfo() is called, we should only actually
push the expression step, when ExecComputeSlotInfo does not determine
that a) the slot is virtual, b) and fixed (i.e. guaranteed to always be
the same type of slot).

Does that make sense?

Greetings,

Andres Freund

#4Andres Freund
andres@anarazel.de
In reply to: Soumyadeep Chakraborty (#2)
Re: Don't codegen deform code for virtual tuples in expr eval for scan fetch

Hi,

On 2019-09-20 22:19:46 -0700, Soumyadeep Chakraborty wrote:

In my previous patch 0001, the resulting opblock consisted of a single
br instruction to it's successor opblock. Such a block represents
unnecessary overhead. Even though such a block would be optimized
away, what if optimization is not performed (perhaps due to
jit_optimize_above_cost)? Perhaps we could be more aggressive. We
could maybe remove the opblock altogether. However, such a solution
is not without complexity.

I'm doubtful this is worth the complexity - and not that we already have
plenty other places with zero length blocks.

WRT jit_optimize_above_cost not triggering, I think we need to replace
the "basic, unoptimized" codegen path with one that does cheap
optimizations only. The reason we don't do the full optimizations all
the time is that they're expensive, but there's enough optimizations
that are pretty cheap. At some point we'll probably need our own
optimization pipeline, but I don't want to maintain that right now
(i.e. if some other people want to help with this aspect, cool)...

See also: /messages/by-id/20190904152438.pv4vdk3ctuvvnxh3@alap3.anarazel.de

Greetings,

Andres Freund

#5Soumyadeep Chakraborty
soumyadeep2007@gmail.com
In reply to: Andres Freund (#3)
1 attachment(s)
Re: Don't codegen deform code for virtual tuples in expr eval for scan fetch

Hello Andres,

Thank you very much for reviewing my patch!

On Wed, Sep 25, 2019 at 1:02 PM Andres Freund <andres@anarazel.de> wrote:

IOW, wherever ExecComputeSlotInfo() is called, we should only actually
push the expression step, when ExecComputeSlotInfo does not determine
that a) the slot is virtual, b) and fixed (i.e. guaranteed to always be
the same type of slot).

Does that make sense?

That is a great suggestion and I totally agree. I have attached a patch
that achieves this.

--
Soumyadeep

Attachments:

v1-0001-Skip-deform-expr-step-emission-for-virtual-tuples.patchapplication/octet-stream; name=v1-0001-Skip-deform-expr-step-emission-for-virtual-tuples.patchDownload
From bf26010357d1dd2a797b672d1de231efca08f571 Mon Sep 17 00:00:00 2001
From: soumyadeep2007 <sochakraborty@pivotal.io>
Date: Wed, 25 Sep 2019 20:47:55 -0700
Subject: [PATCH v1] Skip deform expr step emission for virtual tuples

Virtual tuples need not be deformed. Thus, skip emitting any deform step
if the slot type is fixed and virtual.

Doing the optimization at this stage ensures that we get the benefit in
both JITed and non-JITed code.

Discussion: https://www.postgresql.org/message-id/20190925200210.aled756suzad674w%40alap3.anarazel.de
---
 src/backend/executor/execExpr.c     | 25 ++++++++++++++++++-------
 src/backend/jit/llvm/llvmjit_expr.c |  3 ---
 2 files changed, 18 insertions(+), 10 deletions(-)

diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c
index 39442f8866..92524f2391 100644
--- a/src/backend/executor/execExpr.c
+++ b/src/backend/executor/execExpr.c
@@ -2277,7 +2277,10 @@ ExecPushExprSlots(ExprState *state, LastAttnumInfo *info)
 	scratch.resvalue = NULL;
 	scratch.resnull = NULL;
 
-	/* Emit steps as needed */
+	/*
+	 * Push deform steps. We can skip pushing steps if the slot is virtual and
+	 * the slot type is fixed as virtual tuples don't need to be deformed.
+	 */
 	if (info->last_inner > 0)
 	{
 		scratch.opcode = EEOP_INNER_FETCHSOME;
@@ -2286,7 +2289,8 @@ ExecPushExprSlots(ExprState *state, LastAttnumInfo *info)
 		scratch.d.fetch.kind = NULL;
 		scratch.d.fetch.known_desc = NULL;
 		ExecComputeSlotInfo(state, &scratch);
-		ExprEvalPushStep(state, &scratch);
+		if (!(scratch.d.fetch.kind == &TTSOpsVirtual && scratch.d.fetch.fixed))
+			ExprEvalPushStep(state, &scratch);
 	}
 	if (info->last_outer > 0)
 	{
@@ -2296,7 +2300,8 @@ ExecPushExprSlots(ExprState *state, LastAttnumInfo *info)
 		scratch.d.fetch.kind = NULL;
 		scratch.d.fetch.known_desc = NULL;
 		ExecComputeSlotInfo(state, &scratch);
-		ExprEvalPushStep(state, &scratch);
+		if (!(scratch.d.fetch.kind == &TTSOpsVirtual && scratch.d.fetch.fixed))
+			ExprEvalPushStep(state, &scratch);
 	}
 	if (info->last_scan > 0)
 	{
@@ -2306,7 +2311,8 @@ ExecPushExprSlots(ExprState *state, LastAttnumInfo *info)
 		scratch.d.fetch.kind = NULL;
 		scratch.d.fetch.known_desc = NULL;
 		ExecComputeSlotInfo(state, &scratch);
-		ExprEvalPushStep(state, &scratch);
+		if (!(scratch.d.fetch.kind == &TTSOpsVirtual && scratch.d.fetch.fixed))
+			ExprEvalPushStep(state, &scratch);
 	}
 }
 
@@ -3352,14 +3358,18 @@ ExecBuildGroupingEqual(TupleDesc ldesc, TupleDesc rdesc,
 	}
 	Assert(maxatt >= 0);
 
-	/* push deform steps */
+	/*
+	 * Push deform steps. We can skip pushing steps if the slot is virtual and
+	 * the slot type is fixed as virtual tuples don't need to be deformed.
+	 */
 	scratch.opcode = EEOP_INNER_FETCHSOME;
 	scratch.d.fetch.last_var = maxatt;
 	scratch.d.fetch.fixed = false;
 	scratch.d.fetch.known_desc = ldesc;
 	scratch.d.fetch.kind = lops;
 	ExecComputeSlotInfo(state, &scratch);
-	ExprEvalPushStep(state, &scratch);
+	if (!(scratch.d.fetch.kind == &TTSOpsVirtual && scratch.d.fetch.fixed))
+		ExprEvalPushStep(state, &scratch);
 
 	scratch.opcode = EEOP_OUTER_FETCHSOME;
 	scratch.d.fetch.last_var = maxatt;
@@ -3367,7 +3377,8 @@ ExecBuildGroupingEqual(TupleDesc ldesc, TupleDesc rdesc,
 	scratch.d.fetch.known_desc = rdesc;
 	scratch.d.fetch.kind = rops;
 	ExecComputeSlotInfo(state, &scratch);
-	ExprEvalPushStep(state, &scratch);
+	if (!(scratch.d.fetch.kind == &TTSOpsVirtual && scratch.d.fetch.fixed))
+		ExprEvalPushStep(state, &scratch);
 
 	/*
 	 * Start comparing at the last field (least significant sort key). That's
diff --git a/src/backend/jit/llvm/llvmjit_expr.c b/src/backend/jit/llvm/llvmjit_expr.c
index 30133634c7..10a55ce2b1 100644
--- a/src/backend/jit/llvm/llvmjit_expr.c
+++ b/src/backend/jit/llvm/llvmjit_expr.c
@@ -297,9 +297,6 @@ llvm_compile_expr(ExprState *state)
 					/*
 					 * Check if all required attributes are available, or
 					 * whether deforming is required.
-					 *
-					 * TODO: skip nvalid check if slot is fixed and known to
-					 * be a virtual slot.
 					 */
 					v_nvalid =
 						l_load_struct_gep(b, v_slot,
-- 
2.23.0

#6Soumyadeep Chakraborty
soumyadeep2007@gmail.com
In reply to: Andres Freund (#4)
Re: Don't codegen deform code for virtual tuples in expr eval for scan fetch

Hi Andres,

Thank you for your insight and the link offered just the context I needed!

On Wed, Sep 25, 2019 at 1:06 PM Andres Freund <andres@anarazel.de> wrote:

I'm doubtful this is worth the complexity - and not that we already have

plenty other places with zero length blocks.

Agreed.

On Wed, Sep 25, 2019 at 1:06 PM Andres Freund <andres@anarazel.de> wrote:

WRT jit_optimize_above_cost not triggering, I think we need to replace

the "basic, unoptimized" codegen path with one that does cheap
optimizations only. The reason we don't do the full optimizations all
the time is that they're expensive, but there's enough optimizations
that are pretty cheap. At some point we'll probably need our own
optimization pipeline, but I don't want to maintain that right now
(i.e. if some other people want to help with this aspect, cool)...

I would absolutely love to work on this!

I was thinking the same. Perhaps not only replace the default on the
compile path, but also introduce additional levels. These levels could then
be configured at a granularity lower than the -O0, -O1, .. In other words,
perhaps we could have levels with ad-hoc pass combinations. We could then
maybe have costs associated with each level. I think such a framework
would be ideal.

To summarize this into TODOs:
1. Tune default compilation path optimizations.
2. New costs per optimization level.
3. Capacity to define "levels" with ad-hoc pass combinations that are
costing
sensitive.

Is this what you meant by "optimization pipeline"?

--
Soumyadeep

#7Andres Freund
andres@anarazel.de
In reply to: Soumyadeep Chakraborty (#5)
2 attachment(s)
Re: Don't codegen deform code for virtual tuples in expr eval for scan fetch

Hi,

On 2019-09-25 22:11:51 -0700, Soumyadeep Chakraborty wrote:

Thank you very much for reviewing my patch!

On Wed, Sep 25, 2019 at 1:02 PM Andres Freund <andres@anarazel.de> wrote:

IOW, wherever ExecComputeSlotInfo() is called, we should only actually
push the expression step, when ExecComputeSlotInfo does not determine
that a) the slot is virtual, b) and fixed (i.e. guaranteed to always be
the same type of slot).

Does that make sense?

That is a great suggestion and I totally agree. I have attached a patch
that achieves this.

I think as done, this has the slight disadvantage of removing the
fast-path for small interpreted expressions, because that explicitly
matches for seeing the EEOP_*_FETCHSOME ops. Look at execExprInterp.c,
around:
/*
* Select fast-path evalfuncs for very simple expressions. "Starting up"
* the full interpreter is a measurable overhead for these, and these
* patterns occur often enough to be worth optimizing.
*/
if (state->steps_len == 3)
{

So I think we'd have to add a separate fastpath for virtual slots for
that.

What do you think about the attached?

Greetings,

Andres Freund

Attachments:

v1-0001-Reduce-code-duplication-for-ExecJust-Var-operatio.patchtext/x-diff; charset=us-asciiDownload
From 26b963f2989cdeec963f30c5d9b6c0e9b942741f Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Thu, 26 Sep 2019 11:14:46 -0700
Subject: [PATCH v1 1/2] Reduce code duplication for ExecJust*Var operations.

This is mainly in preparation for introducing a few additional
fastpath implementations.

Also reorder ExecJust*Var functions to be consistent with the order in
which they're used.

Author: Andres Freund
Discussion: https://postgr.es/m/CAE-ML+9OKSN71+mHtfMD-L24oDp8dGTfaVjDU6U+j+FNAW5kRQ@mail.gmail.com
---
 src/backend/executor/execExprInterp.c | 94 ++++++++++-----------------
 1 file changed, 35 insertions(+), 59 deletions(-)

diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
index 293bfb61c36..e876160a0e7 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -155,11 +155,11 @@ static void ExecEvalRowNullInt(ExprState *state, ExprEvalStep *op,
 static Datum ExecJustInnerVar(ExprState *state, ExprContext *econtext, bool *isnull);
 static Datum ExecJustOuterVar(ExprState *state, ExprContext *econtext, bool *isnull);
 static Datum ExecJustScanVar(ExprState *state, ExprContext *econtext, bool *isnull);
-static Datum ExecJustConst(ExprState *state, ExprContext *econtext, bool *isnull);
 static Datum ExecJustAssignInnerVar(ExprState *state, ExprContext *econtext, bool *isnull);
 static Datum ExecJustAssignOuterVar(ExprState *state, ExprContext *econtext, bool *isnull);
 static Datum ExecJustAssignScanVar(ExprState *state, ExprContext *econtext, bool *isnull);
 static Datum ExecJustApplyFuncToCase(ExprState *state, ExprContext *econtext, bool *isnull);
+static Datum ExecJustConst(ExprState *state, ExprContext *econtext, bool *isnull);
 
 
 /*
@@ -1966,13 +1966,12 @@ ShutdownTupleDescRef(Datum arg)
  * Fast-path functions, for very simple expressions
  */
 
-/* Simple reference to inner Var */
-static Datum
-ExecJustInnerVar(ExprState *state, ExprContext *econtext, bool *isnull)
+/* implementation of ExecJust(Inner|Outer|Scan)Var */
+static pg_attribute_always_inline Datum
+ExecJustVarImpl(ExprState *state, TupleTableSlot *slot, bool *isnull)
 {
 	ExprEvalStep *op = &state->steps[1];
 	int			attnum = op->d.var.attnum + 1;
-	TupleTableSlot *slot = econtext->ecxt_innertuple;
 
 	CheckOpSlotCompatibility(&state->steps[0], slot);
 
@@ -1984,52 +1983,34 @@ ExecJustInnerVar(ExprState *state, ExprContext *econtext, bool *isnull)
 	return slot_getattr(slot, attnum, isnull);
 }
 
-/* Simple reference to outer Var */
+/* Simple reference to inner Var */
 static Datum
-ExecJustOuterVar(ExprState *state, ExprContext *econtext, bool *isnull)
+ExecJustInnerVar(ExprState *state, ExprContext *econtext, bool *isnull)
 {
-	ExprEvalStep *op = &state->steps[1];
-	int			attnum = op->d.var.attnum + 1;
-	TupleTableSlot *slot = econtext->ecxt_outertuple;
-
-	CheckOpSlotCompatibility(&state->steps[0], slot);
-
-	/* See comments in ExecJustInnerVar */
-	return slot_getattr(slot, attnum, isnull);
+	return ExecJustVarImpl(state, econtext->ecxt_innertuple, isnull);
 }
 
-/* Simple reference to scan Var */
+/* Simple reference to outer Var */
 static Datum
-ExecJustScanVar(ExprState *state, ExprContext *econtext, bool *isnull)
+ExecJustOuterVar(ExprState *state, ExprContext *econtext, bool *isnull)
 {
-	ExprEvalStep *op = &state->steps[1];
-	int			attnum = op->d.var.attnum + 1;
-	TupleTableSlot *slot = econtext->ecxt_scantuple;
-
-	CheckOpSlotCompatibility(&state->steps[0], slot);
-
-	/* See comments in ExecJustInnerVar */
-	return slot_getattr(slot, attnum, isnull);
+	return ExecJustVarImpl(state, econtext->ecxt_outertuple, isnull);
 }
 
-/* Simple Const expression */
+/* Simple reference to scan Var */
 static Datum
-ExecJustConst(ExprState *state, ExprContext *econtext, bool *isnull)
+ExecJustScanVar(ExprState *state, ExprContext *econtext, bool *isnull)
 {
-	ExprEvalStep *op = &state->steps[0];
-
-	*isnull = op->d.constval.isnull;
-	return op->d.constval.value;
+	return ExecJustVarImpl(state, econtext->ecxt_scantuple, isnull);
 }
 
-/* Evaluate inner Var and assign to appropriate column of result tuple */
-static Datum
-ExecJustAssignInnerVar(ExprState *state, ExprContext *econtext, bool *isnull)
+/* implementation of ExecJustAssign(Inner|Outer|Scan)Var */
+static pg_attribute_always_inline Datum
+ExecJustAssignVarImpl(ExprState *state, TupleTableSlot *inslot, bool *isnull)
 {
 	ExprEvalStep *op = &state->steps[1];
 	int			attnum = op->d.assign_var.attnum + 1;
 	int			resultnum = op->d.assign_var.resultnum;
-	TupleTableSlot *inslot = econtext->ecxt_innertuple;
 	TupleTableSlot *outslot = state->resultslot;
 
 	CheckOpSlotCompatibility(&state->steps[0], inslot);
@@ -2047,40 +2028,25 @@ ExecJustAssignInnerVar(ExprState *state, ExprContext *econtext, bool *isnull)
 	return 0;
 }
 
+/* Evaluate inner Var and assign to appropriate column of result tuple */
+static Datum
+ExecJustAssignInnerVar(ExprState *state, ExprContext *econtext, bool *isnull)
+{
+	return ExecJustAssignVarImpl(state, econtext->ecxt_innertuple, isnull);
+}
+
 /* Evaluate outer Var and assign to appropriate column of result tuple */
 static Datum
 ExecJustAssignOuterVar(ExprState *state, ExprContext *econtext, bool *isnull)
 {
-	ExprEvalStep *op = &state->steps[1];
-	int			attnum = op->d.assign_var.attnum + 1;
-	int			resultnum = op->d.assign_var.resultnum;
-	TupleTableSlot *inslot = econtext->ecxt_outertuple;
-	TupleTableSlot *outslot = state->resultslot;
-
-	CheckOpSlotCompatibility(&state->steps[0], inslot);
-
-	/* See comments in ExecJustAssignInnerVar */
-	outslot->tts_values[resultnum] =
-		slot_getattr(inslot, attnum, &outslot->tts_isnull[resultnum]);
-	return 0;
+	return ExecJustAssignVarImpl(state, econtext->ecxt_outertuple, isnull);
 }
 
 /* Evaluate scan Var and assign to appropriate column of result tuple */
 static Datum
 ExecJustAssignScanVar(ExprState *state, ExprContext *econtext, bool *isnull)
 {
-	ExprEvalStep *op = &state->steps[1];
-	int			attnum = op->d.assign_var.attnum + 1;
-	int			resultnum = op->d.assign_var.resultnum;
-	TupleTableSlot *inslot = econtext->ecxt_scantuple;
-	TupleTableSlot *outslot = state->resultslot;
-
-	CheckOpSlotCompatibility(&state->steps[0], inslot);
-
-	/* See comments in ExecJustAssignInnerVar */
-	outslot->tts_values[resultnum] =
-		slot_getattr(inslot, attnum, &outslot->tts_isnull[resultnum]);
-	return 0;
+	return ExecJustAssignVarImpl(state, econtext->ecxt_scantuple, isnull);
 }
 
 /* Evaluate CASE_TESTVAL and apply a strict function to it */
@@ -2120,6 +2086,16 @@ ExecJustApplyFuncToCase(ExprState *state, ExprContext *econtext, bool *isnull)
 	return d;
 }
 
+/* Simple Const expression */
+static Datum
+ExecJustConst(ExprState *state, ExprContext *econtext, bool *isnull)
+{
+	ExprEvalStep *op = &state->steps[0];
+
+	*isnull = op->d.constval.isnull;
+	return op->d.constval.value;
+}
+
 #if defined(EEO_USE_COMPUTED_GOTO)
 /*
  * Comparator used when building address->opcode lookup table for
-- 
2.23.0.162.gf1d4a28250

v1-0002-Don-t-generate-EEOP_-_FETCHSOME-operations-for-sl.patchtext/x-diff; charset=us-asciiDownload
From 09b8c12569671525de5741e77710a905b1bbd1a0 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Thu, 26 Sep 2019 11:23:43 -0700
Subject: [PATCH v1 2/2] Don't generate EEOP_*_FETCHSOME operations for slots
 know to be virtual.

That avoids unnecessary work during both interpreted and JIT compiled
expression evaluation.

Author: Soumyadeep Chakraborty, Andres Freund
Discussion: https://postgr.es/m/CAE-ML+9OKSN71+mHtfMD-L24oDp8dGTfaVjDU6U+j+FNAW5kRQ@mail.gmail.com
---
 src/backend/executor/execExpr.c       |  43 ++++++---
 src/backend/executor/execExprInterp.c | 133 +++++++++++++++++++++++++-
 src/backend/jit/llvm/llvmjit_expr.c   |   6 +-
 3 files changed, 160 insertions(+), 22 deletions(-)

diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c
index 39442f8866f..1ff13d1079a 100644
--- a/src/backend/executor/execExpr.c
+++ b/src/backend/executor/execExpr.c
@@ -65,7 +65,7 @@ static void ExecInitFunc(ExprEvalStep *scratch, Expr *node, List *args,
 static void ExecInitExprSlots(ExprState *state, Node *node);
 static void ExecPushExprSlots(ExprState *state, LastAttnumInfo *info);
 static bool get_last_attnums_walker(Node *node, LastAttnumInfo *info);
-static void ExecComputeSlotInfo(ExprState *state, ExprEvalStep *op);
+static bool ExecComputeSlotInfo(ExprState *state, ExprEvalStep *op);
 static void ExecInitWholeRowVar(ExprEvalStep *scratch, Var *variable,
 								ExprState *state);
 static void ExecInitSubscriptingRef(ExprEvalStep *scratch,
@@ -2285,8 +2285,8 @@ ExecPushExprSlots(ExprState *state, LastAttnumInfo *info)
 		scratch.d.fetch.fixed = false;
 		scratch.d.fetch.kind = NULL;
 		scratch.d.fetch.known_desc = NULL;
-		ExecComputeSlotInfo(state, &scratch);
-		ExprEvalPushStep(state, &scratch);
+		if (ExecComputeSlotInfo(state, &scratch))
+			ExprEvalPushStep(state, &scratch);
 	}
 	if (info->last_outer > 0)
 	{
@@ -2295,8 +2295,8 @@ ExecPushExprSlots(ExprState *state, LastAttnumInfo *info)
 		scratch.d.fetch.fixed = false;
 		scratch.d.fetch.kind = NULL;
 		scratch.d.fetch.known_desc = NULL;
-		ExecComputeSlotInfo(state, &scratch);
-		ExprEvalPushStep(state, &scratch);
+		if (ExecComputeSlotInfo(state, &scratch))
+			ExprEvalPushStep(state, &scratch);
 	}
 	if (info->last_scan > 0)
 	{
@@ -2305,8 +2305,8 @@ ExecPushExprSlots(ExprState *state, LastAttnumInfo *info)
 		scratch.d.fetch.fixed = false;
 		scratch.d.fetch.kind = NULL;
 		scratch.d.fetch.known_desc = NULL;
-		ExecComputeSlotInfo(state, &scratch);
-		ExprEvalPushStep(state, &scratch);
+		if (ExecComputeSlotInfo(state, &scratch))
+			ExprEvalPushStep(state, &scratch);
 	}
 }
 
@@ -2364,14 +2364,21 @@ get_last_attnums_walker(Node *node, LastAttnumInfo *info)
  * The goal is to determine whether a slot is 'fixed', that is, every
  * evaluation of the expression will have the same type of slot, with an
  * equivalent descriptor.
+ *
+ * Returns true if the the deforming step is required, false otherwise.
  */
-static void
+static bool
 ExecComputeSlotInfo(ExprState *state, ExprEvalStep *op)
 {
 	PlanState  *parent = state->parent;
 	TupleDesc	desc = NULL;
 	const TupleTableSlotOps *tts_ops = NULL;
 	bool		isfixed = false;
+	ExprEvalOp	opcode = op->opcode;
+
+	Assert(opcode == EEOP_INNER_FETCHSOME ||
+		   opcode == EEOP_OUTER_FETCHSOME ||
+		   opcode == EEOP_SCAN_FETCHSOME);
 
 	if (op->d.fetch.known_desc != NULL)
 	{
@@ -2383,7 +2390,7 @@ ExecComputeSlotInfo(ExprState *state, ExprEvalStep *op)
 	{
 		isfixed = false;
 	}
-	else if (op->opcode == EEOP_INNER_FETCHSOME)
+	else if (opcode == EEOP_INNER_FETCHSOME)
 	{
 		PlanState  *is = innerPlanState(parent);
 
@@ -2402,7 +2409,7 @@ ExecComputeSlotInfo(ExprState *state, ExprEvalStep *op)
 			desc = ExecGetResultType(is);
 		}
 	}
-	else if (op->opcode == EEOP_OUTER_FETCHSOME)
+	else if (opcode == EEOP_OUTER_FETCHSOME)
 	{
 		PlanState  *os = outerPlanState(parent);
 
@@ -2421,7 +2428,7 @@ ExecComputeSlotInfo(ExprState *state, ExprEvalStep *op)
 			desc = ExecGetResultType(os);
 		}
 	}
-	else if (op->opcode == EEOP_SCAN_FETCHSOME)
+	else if (opcode == EEOP_SCAN_FETCHSOME)
 	{
 		desc = parent->scandesc;
 
@@ -2444,6 +2451,12 @@ ExecComputeSlotInfo(ExprState *state, ExprEvalStep *op)
 		op->d.fetch.kind = NULL;
 		op->d.fetch.known_desc = NULL;
 	}
+
+	/* if the slot is known to always virtual we never need to deform */
+	if (op->d.fetch.fixed && op->d.fetch.kind == &TTSOpsVirtual)
+		return false;
+
+	return true;
 }
 
 /*
@@ -3358,16 +3371,16 @@ ExecBuildGroupingEqual(TupleDesc ldesc, TupleDesc rdesc,
 	scratch.d.fetch.fixed = false;
 	scratch.d.fetch.known_desc = ldesc;
 	scratch.d.fetch.kind = lops;
-	ExecComputeSlotInfo(state, &scratch);
-	ExprEvalPushStep(state, &scratch);
+	if (ExecComputeSlotInfo(state, &scratch))
+		ExprEvalPushStep(state, &scratch);
 
 	scratch.opcode = EEOP_OUTER_FETCHSOME;
 	scratch.d.fetch.last_var = maxatt;
 	scratch.d.fetch.fixed = false;
 	scratch.d.fetch.known_desc = rdesc;
 	scratch.d.fetch.kind = rops;
-	ExecComputeSlotInfo(state, &scratch);
-	ExprEvalPushStep(state, &scratch);
+	if (ExecComputeSlotInfo(state, &scratch))
+		ExprEvalPushStep(state, &scratch);
 
 	/*
 	 * Start comparing at the last field (least significant sort key). That's
diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
index e876160a0e7..ccea030cd70 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -160,6 +160,12 @@ static Datum ExecJustAssignOuterVar(ExprState *state, ExprContext *econtext, boo
 static Datum ExecJustAssignScanVar(ExprState *state, ExprContext *econtext, bool *isnull);
 static Datum ExecJustApplyFuncToCase(ExprState *state, ExprContext *econtext, bool *isnull);
 static Datum ExecJustConst(ExprState *state, ExprContext *econtext, bool *isnull);
+static Datum ExecJustInnerVarVirt(ExprState *state, ExprContext *econtext, bool *isnull);
+static Datum ExecJustOuterVarVirt(ExprState *state, ExprContext *econtext, bool *isnull);
+static Datum ExecJustScanVarVirt(ExprState *state, ExprContext *econtext, bool *isnull);
+static Datum ExecJustAssignInnerVarVirt(ExprState *state, ExprContext *econtext, bool *isnull);
+static Datum ExecJustAssignOuterVarVirt(ExprState *state, ExprContext *econtext, bool *isnull);
+static Datum ExecJustAssignScanVarVirt(ExprState *state, ExprContext *econtext, bool *isnull);
 
 
 /*
@@ -255,11 +261,45 @@ ExecReadyInterpretedExpr(ExprState *state)
 			return;
 		}
 	}
-	else if (state->steps_len == 2 &&
-			 state->steps[0].opcode == EEOP_CONST)
+	else if (state->steps_len == 2)
 	{
-		state->evalfunc_private = (void *) ExecJustConst;
-		return;
+		ExprEvalOp	step0 = state->steps[0].opcode;
+
+		if (step0 == EEOP_CONST)
+		{
+			state->evalfunc_private = (void *) ExecJustConst;
+			return;
+		}
+		else if (step0 == EEOP_INNER_VAR)
+		{
+			state->evalfunc_private = (void *) ExecJustInnerVarVirt;
+			return;
+		}
+		else if (step0 == EEOP_OUTER_VAR)
+		{
+			state->evalfunc_private = (void *) ExecJustOuterVarVirt;
+			return;
+		}
+		else if (step0 == EEOP_SCAN_VAR)
+		{
+			state->evalfunc_private = (void *) ExecJustScanVarVirt;
+			return;
+		}
+		else if (step0 == EEOP_ASSIGN_INNER_VAR)
+		{
+			state->evalfunc_private = (void *) ExecJustAssignInnerVarVirt;
+			return;
+		}
+		else if (step0 == EEOP_ASSIGN_OUTER_VAR)
+		{
+			state->evalfunc_private = (void *) ExecJustAssignOuterVarVirt;
+			return;
+		}
+		else if (step0 == EEOP_ASSIGN_SCAN_VAR)
+		{
+			state->evalfunc_private = (void *) ExecJustAssignScanVarVirt;
+			return;
+		}
 	}
 
 #if defined(EEO_USE_COMPUTED_GOTO)
@@ -2096,6 +2136,91 @@ ExecJustConst(ExprState *state, ExprContext *econtext, bool *isnull)
 	return op->d.constval.value;
 }
 
+/* implementation of ExecJust(Inner|Outer|Scan)VarVirt */
+static pg_attribute_always_inline Datum
+ExecJustVarVirtImpl(ExprState *state, TupleTableSlot *slot, bool *isnull)
+{
+	ExprEvalStep *op = &state->steps[0];
+	int			attnum = op->d.var.attnum;
+
+	/*
+	 * As it is guaranteed that a virtual slot is used, there never is a need
+	 * to perform tuple deforming (nor would it be possible). Therefore
+	 * execExpr.c has not emitted a EEOP_*_FETCHSOME step. Verify, as much as
+	 * possible, that that determination was accurate.
+	 */
+	Assert(slot->tts_ops == &TTSOpsVirtual);
+	Assert(TTS_FIXED(slot));
+	Assert(attnum >= 0 && attnum < slot->tts_nvalid);
+
+	*isnull = slot->tts_isnull[attnum];
+
+	return slot->tts_values[attnum];
+}
+
+/* Like ExecJustInnerVar, optimized for virtual slots */
+static Datum
+ExecJustInnerVarVirt(ExprState *state, ExprContext *econtext, bool *isnull)
+{
+	return ExecJustVarVirtImpl(state, econtext->ecxt_innertuple, isnull);
+}
+
+/* Like ExecJustOuterVar, optimized for virtual slots */
+static Datum
+ExecJustOuterVarVirt(ExprState *state, ExprContext *econtext, bool *isnull)
+{
+	return ExecJustVarVirtImpl(state, econtext->ecxt_outertuple, isnull);
+}
+
+/* Like ExecJustScanVar, optimized for virtual slots */
+static Datum
+ExecJustScanVarVirt(ExprState *state, ExprContext *econtext, bool *isnull)
+{
+	return ExecJustVarVirtImpl(state, econtext->ecxt_scantuple, isnull);
+}
+
+/* implementation of ExecJustAssign(Inner|Outer|Scan)VarVirt */
+static pg_attribute_always_inline Datum
+ExecJustAssignVarVirtImpl(ExprState *state, TupleTableSlot *inslot, bool *isnull)
+{
+	ExprEvalStep *op = &state->steps[0];
+	int			attnum = op->d.assign_var.attnum;
+	int			resultnum = op->d.assign_var.resultnum;
+	TupleTableSlot *outslot = state->resultslot;
+
+	/* see ExecJustVarVirtImpl for comments */
+
+	Assert(inslot->tts_ops == &TTSOpsVirtual);
+	Assert(TTS_FIXED(inslot));
+	Assert(attnum >= 0 && attnum < inslot->tts_nvalid);
+
+	outslot->tts_values[resultnum] = inslot->tts_values[attnum];
+	outslot->tts_isnull[resultnum] = inslot->tts_isnull[attnum];
+
+	return 0;
+}
+
+/* Like ExecJustAssignInnerVar, optimized for virtual slots */
+static Datum
+ExecJustAssignInnerVarVirt(ExprState *state, ExprContext *econtext, bool *isnull)
+{
+	return ExecJustAssignVarVirtImpl(state, econtext->ecxt_innertuple, isnull);
+}
+
+/* Like ExecJustAssignOuterVar, optimized for virtual slots */
+static Datum
+ExecJustAssignOuterVarVirt(ExprState *state, ExprContext *econtext, bool *isnull)
+{
+	return ExecJustAssignVarVirtImpl(state, econtext->ecxt_outertuple, isnull);
+}
+
+/* Like ExecJustAssignScanVar, optimized for virtual slots */
+static Datum
+ExecJustAssignScanVarVirt(ExprState *state, ExprContext *econtext, bool *isnull)
+{
+	return ExecJustAssignVarVirtImpl(state, econtext->ecxt_scantuple, isnull);
+}
+
 #if defined(EEO_USE_COMPUTED_GOTO)
 /*
  * Comparator used when building address->opcode lookup table for
diff --git a/src/backend/jit/llvm/llvmjit_expr.c b/src/backend/jit/llvm/llvmjit_expr.c
index 30133634c70..d09324637b9 100644
--- a/src/backend/jit/llvm/llvmjit_expr.c
+++ b/src/backend/jit/llvm/llvmjit_expr.c
@@ -287,6 +287,9 @@ llvm_compile_expr(ExprState *state)
 					if (op->d.fetch.fixed)
 						tts_ops = op->d.fetch.kind;
 
+					/* step should not have been generated */
+					Assert(tts_ops != &TTSOpsVirtual);
+
 					if (opcode == EEOP_INNER_FETCHSOME)
 						v_slot = v_innerslot;
 					else if (opcode == EEOP_OUTER_FETCHSOME)
@@ -297,9 +300,6 @@ llvm_compile_expr(ExprState *state)
 					/*
 					 * Check if all required attributes are available, or
 					 * whether deforming is required.
-					 *
-					 * TODO: skip nvalid check if slot is fixed and known to
-					 * be a virtual slot.
 					 */
 					v_nvalid =
 						l_load_struct_gep(b, v_slot,
-- 
2.23.0.162.gf1d4a28250

#8Soumyadeep Chakraborty
soumyadeep2007@gmail.com
In reply to: Andres Freund (#7)
3 attachment(s)
Re: Don't codegen deform code for virtual tuples in expr eval for scan fetch

Hey,

I completely agree, that was an important consideration.

I had some purely cosmetic suggestions:
1. Rename ExecComputeSlotInfo to eliminate the need for the asserts.
2. Extract return value to a bool variable for slightly better
readability.
3. Taking the opportunity to use TTS_IS_VIRTUAL.

v2 of patch set attached. The first two patches are unchanged, the cosmetic
changes are part of v2-0003-Some-minor-cosmetic-changes.patch.

--
Soumyadeep

Attachments:

v2-0001-Reduce-code-duplication-for-ExecJust-Var-operatio.patchapplication/octet-stream; name=v2-0001-Reduce-code-duplication-for-ExecJust-Var-operatio.patchDownload
From 20d2b28ba45d13c78d7f48f5d5939fde406e213b Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Thu, 26 Sep 2019 11:14:46 -0700
Subject: [PATCH v2 1/3] Reduce code duplication for ExecJust*Var operations.

This is mainly in preparation for introducing a few additional
fastpath implementations.

Also reorder ExecJust*Var functions to be consistent with the order in
which they're used.

Author: Andres Freund
Discussion: https://postgr.es/m/CAE-ML+9OKSN71+mHtfMD-L24oDp8dGTfaVjDU6U+j+FNAW5kRQ@mail.gmail.com
---
 src/backend/executor/execExprInterp.c | 94 ++++++++++-----------------
 1 file changed, 35 insertions(+), 59 deletions(-)

diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
index 293bfb61c3..e876160a0e 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -155,11 +155,11 @@ static void ExecEvalRowNullInt(ExprState *state, ExprEvalStep *op,
 static Datum ExecJustInnerVar(ExprState *state, ExprContext *econtext, bool *isnull);
 static Datum ExecJustOuterVar(ExprState *state, ExprContext *econtext, bool *isnull);
 static Datum ExecJustScanVar(ExprState *state, ExprContext *econtext, bool *isnull);
-static Datum ExecJustConst(ExprState *state, ExprContext *econtext, bool *isnull);
 static Datum ExecJustAssignInnerVar(ExprState *state, ExprContext *econtext, bool *isnull);
 static Datum ExecJustAssignOuterVar(ExprState *state, ExprContext *econtext, bool *isnull);
 static Datum ExecJustAssignScanVar(ExprState *state, ExprContext *econtext, bool *isnull);
 static Datum ExecJustApplyFuncToCase(ExprState *state, ExprContext *econtext, bool *isnull);
+static Datum ExecJustConst(ExprState *state, ExprContext *econtext, bool *isnull);
 
 
 /*
@@ -1966,13 +1966,12 @@ ShutdownTupleDescRef(Datum arg)
  * Fast-path functions, for very simple expressions
  */
 
-/* Simple reference to inner Var */
-static Datum
-ExecJustInnerVar(ExprState *state, ExprContext *econtext, bool *isnull)
+/* implementation of ExecJust(Inner|Outer|Scan)Var */
+static pg_attribute_always_inline Datum
+ExecJustVarImpl(ExprState *state, TupleTableSlot *slot, bool *isnull)
 {
 	ExprEvalStep *op = &state->steps[1];
 	int			attnum = op->d.var.attnum + 1;
-	TupleTableSlot *slot = econtext->ecxt_innertuple;
 
 	CheckOpSlotCompatibility(&state->steps[0], slot);
 
@@ -1984,52 +1983,34 @@ ExecJustInnerVar(ExprState *state, ExprContext *econtext, bool *isnull)
 	return slot_getattr(slot, attnum, isnull);
 }
 
-/* Simple reference to outer Var */
+/* Simple reference to inner Var */
 static Datum
-ExecJustOuterVar(ExprState *state, ExprContext *econtext, bool *isnull)
+ExecJustInnerVar(ExprState *state, ExprContext *econtext, bool *isnull)
 {
-	ExprEvalStep *op = &state->steps[1];
-	int			attnum = op->d.var.attnum + 1;
-	TupleTableSlot *slot = econtext->ecxt_outertuple;
-
-	CheckOpSlotCompatibility(&state->steps[0], slot);
-
-	/* See comments in ExecJustInnerVar */
-	return slot_getattr(slot, attnum, isnull);
+	return ExecJustVarImpl(state, econtext->ecxt_innertuple, isnull);
 }
 
-/* Simple reference to scan Var */
+/* Simple reference to outer Var */
 static Datum
-ExecJustScanVar(ExprState *state, ExprContext *econtext, bool *isnull)
+ExecJustOuterVar(ExprState *state, ExprContext *econtext, bool *isnull)
 {
-	ExprEvalStep *op = &state->steps[1];
-	int			attnum = op->d.var.attnum + 1;
-	TupleTableSlot *slot = econtext->ecxt_scantuple;
-
-	CheckOpSlotCompatibility(&state->steps[0], slot);
-
-	/* See comments in ExecJustInnerVar */
-	return slot_getattr(slot, attnum, isnull);
+	return ExecJustVarImpl(state, econtext->ecxt_outertuple, isnull);
 }
 
-/* Simple Const expression */
+/* Simple reference to scan Var */
 static Datum
-ExecJustConst(ExprState *state, ExprContext *econtext, bool *isnull)
+ExecJustScanVar(ExprState *state, ExprContext *econtext, bool *isnull)
 {
-	ExprEvalStep *op = &state->steps[0];
-
-	*isnull = op->d.constval.isnull;
-	return op->d.constval.value;
+	return ExecJustVarImpl(state, econtext->ecxt_scantuple, isnull);
 }
 
-/* Evaluate inner Var and assign to appropriate column of result tuple */
-static Datum
-ExecJustAssignInnerVar(ExprState *state, ExprContext *econtext, bool *isnull)
+/* implementation of ExecJustAssign(Inner|Outer|Scan)Var */
+static pg_attribute_always_inline Datum
+ExecJustAssignVarImpl(ExprState *state, TupleTableSlot *inslot, bool *isnull)
 {
 	ExprEvalStep *op = &state->steps[1];
 	int			attnum = op->d.assign_var.attnum + 1;
 	int			resultnum = op->d.assign_var.resultnum;
-	TupleTableSlot *inslot = econtext->ecxt_innertuple;
 	TupleTableSlot *outslot = state->resultslot;
 
 	CheckOpSlotCompatibility(&state->steps[0], inslot);
@@ -2047,40 +2028,25 @@ ExecJustAssignInnerVar(ExprState *state, ExprContext *econtext, bool *isnull)
 	return 0;
 }
 
+/* Evaluate inner Var and assign to appropriate column of result tuple */
+static Datum
+ExecJustAssignInnerVar(ExprState *state, ExprContext *econtext, bool *isnull)
+{
+	return ExecJustAssignVarImpl(state, econtext->ecxt_innertuple, isnull);
+}
+
 /* Evaluate outer Var and assign to appropriate column of result tuple */
 static Datum
 ExecJustAssignOuterVar(ExprState *state, ExprContext *econtext, bool *isnull)
 {
-	ExprEvalStep *op = &state->steps[1];
-	int			attnum = op->d.assign_var.attnum + 1;
-	int			resultnum = op->d.assign_var.resultnum;
-	TupleTableSlot *inslot = econtext->ecxt_outertuple;
-	TupleTableSlot *outslot = state->resultslot;
-
-	CheckOpSlotCompatibility(&state->steps[0], inslot);
-
-	/* See comments in ExecJustAssignInnerVar */
-	outslot->tts_values[resultnum] =
-		slot_getattr(inslot, attnum, &outslot->tts_isnull[resultnum]);
-	return 0;
+	return ExecJustAssignVarImpl(state, econtext->ecxt_outertuple, isnull);
 }
 
 /* Evaluate scan Var and assign to appropriate column of result tuple */
 static Datum
 ExecJustAssignScanVar(ExprState *state, ExprContext *econtext, bool *isnull)
 {
-	ExprEvalStep *op = &state->steps[1];
-	int			attnum = op->d.assign_var.attnum + 1;
-	int			resultnum = op->d.assign_var.resultnum;
-	TupleTableSlot *inslot = econtext->ecxt_scantuple;
-	TupleTableSlot *outslot = state->resultslot;
-
-	CheckOpSlotCompatibility(&state->steps[0], inslot);
-
-	/* See comments in ExecJustAssignInnerVar */
-	outslot->tts_values[resultnum] =
-		slot_getattr(inslot, attnum, &outslot->tts_isnull[resultnum]);
-	return 0;
+	return ExecJustAssignVarImpl(state, econtext->ecxt_scantuple, isnull);
 }
 
 /* Evaluate CASE_TESTVAL and apply a strict function to it */
@@ -2120,6 +2086,16 @@ ExecJustApplyFuncToCase(ExprState *state, ExprContext *econtext, bool *isnull)
 	return d;
 }
 
+/* Simple Const expression */
+static Datum
+ExecJustConst(ExprState *state, ExprContext *econtext, bool *isnull)
+{
+	ExprEvalStep *op = &state->steps[0];
+
+	*isnull = op->d.constval.isnull;
+	return op->d.constval.value;
+}
+
 #if defined(EEO_USE_COMPUTED_GOTO)
 /*
  * Comparator used when building address->opcode lookup table for
-- 
2.23.0

v2-0002-Don-t-generate-EEOP_-_FETCHSOME-operations-for-sl.patchapplication/octet-stream; name=v2-0002-Don-t-generate-EEOP_-_FETCHSOME-operations-for-sl.patchDownload
From 326589c370fcf98f306084f2ead510d1083d967d Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Thu, 26 Sep 2019 11:23:43 -0700
Subject: [PATCH v2 2/3] Don't generate EEOP_*_FETCHSOME operations for slots
 know to be virtual.

That avoids unnecessary work during both interpreted and JIT compiled
expression evaluation.

Author: Soumyadeep Chakraborty, Andres Freund
Discussion: https://postgr.es/m/CAE-ML+9OKSN71+mHtfMD-L24oDp8dGTfaVjDU6U+j+FNAW5kRQ@mail.gmail.com
---
 src/backend/executor/execExpr.c       |  43 ++++++---
 src/backend/executor/execExprInterp.c | 133 +++++++++++++++++++++++++-
 src/backend/jit/llvm/llvmjit_expr.c   |   6 +-
 3 files changed, 160 insertions(+), 22 deletions(-)

diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c
index 39442f8866..1ff13d1079 100644
--- a/src/backend/executor/execExpr.c
+++ b/src/backend/executor/execExpr.c
@@ -65,7 +65,7 @@ static void ExecInitFunc(ExprEvalStep *scratch, Expr *node, List *args,
 static void ExecInitExprSlots(ExprState *state, Node *node);
 static void ExecPushExprSlots(ExprState *state, LastAttnumInfo *info);
 static bool get_last_attnums_walker(Node *node, LastAttnumInfo *info);
-static void ExecComputeSlotInfo(ExprState *state, ExprEvalStep *op);
+static bool ExecComputeSlotInfo(ExprState *state, ExprEvalStep *op);
 static void ExecInitWholeRowVar(ExprEvalStep *scratch, Var *variable,
 								ExprState *state);
 static void ExecInitSubscriptingRef(ExprEvalStep *scratch,
@@ -2285,8 +2285,8 @@ ExecPushExprSlots(ExprState *state, LastAttnumInfo *info)
 		scratch.d.fetch.fixed = false;
 		scratch.d.fetch.kind = NULL;
 		scratch.d.fetch.known_desc = NULL;
-		ExecComputeSlotInfo(state, &scratch);
-		ExprEvalPushStep(state, &scratch);
+		if (ExecComputeSlotInfo(state, &scratch))
+			ExprEvalPushStep(state, &scratch);
 	}
 	if (info->last_outer > 0)
 	{
@@ -2295,8 +2295,8 @@ ExecPushExprSlots(ExprState *state, LastAttnumInfo *info)
 		scratch.d.fetch.fixed = false;
 		scratch.d.fetch.kind = NULL;
 		scratch.d.fetch.known_desc = NULL;
-		ExecComputeSlotInfo(state, &scratch);
-		ExprEvalPushStep(state, &scratch);
+		if (ExecComputeSlotInfo(state, &scratch))
+			ExprEvalPushStep(state, &scratch);
 	}
 	if (info->last_scan > 0)
 	{
@@ -2305,8 +2305,8 @@ ExecPushExprSlots(ExprState *state, LastAttnumInfo *info)
 		scratch.d.fetch.fixed = false;
 		scratch.d.fetch.kind = NULL;
 		scratch.d.fetch.known_desc = NULL;
-		ExecComputeSlotInfo(state, &scratch);
-		ExprEvalPushStep(state, &scratch);
+		if (ExecComputeSlotInfo(state, &scratch))
+			ExprEvalPushStep(state, &scratch);
 	}
 }
 
@@ -2364,14 +2364,21 @@ get_last_attnums_walker(Node *node, LastAttnumInfo *info)
  * The goal is to determine whether a slot is 'fixed', that is, every
  * evaluation of the expression will have the same type of slot, with an
  * equivalent descriptor.
+ *
+ * Returns true if the the deforming step is required, false otherwise.
  */
-static void
+static bool
 ExecComputeSlotInfo(ExprState *state, ExprEvalStep *op)
 {
 	PlanState  *parent = state->parent;
 	TupleDesc	desc = NULL;
 	const TupleTableSlotOps *tts_ops = NULL;
 	bool		isfixed = false;
+	ExprEvalOp	opcode = op->opcode;
+
+	Assert(opcode == EEOP_INNER_FETCHSOME ||
+		   opcode == EEOP_OUTER_FETCHSOME ||
+		   opcode == EEOP_SCAN_FETCHSOME);
 
 	if (op->d.fetch.known_desc != NULL)
 	{
@@ -2383,7 +2390,7 @@ ExecComputeSlotInfo(ExprState *state, ExprEvalStep *op)
 	{
 		isfixed = false;
 	}
-	else if (op->opcode == EEOP_INNER_FETCHSOME)
+	else if (opcode == EEOP_INNER_FETCHSOME)
 	{
 		PlanState  *is = innerPlanState(parent);
 
@@ -2402,7 +2409,7 @@ ExecComputeSlotInfo(ExprState *state, ExprEvalStep *op)
 			desc = ExecGetResultType(is);
 		}
 	}
-	else if (op->opcode == EEOP_OUTER_FETCHSOME)
+	else if (opcode == EEOP_OUTER_FETCHSOME)
 	{
 		PlanState  *os = outerPlanState(parent);
 
@@ -2421,7 +2428,7 @@ ExecComputeSlotInfo(ExprState *state, ExprEvalStep *op)
 			desc = ExecGetResultType(os);
 		}
 	}
-	else if (op->opcode == EEOP_SCAN_FETCHSOME)
+	else if (opcode == EEOP_SCAN_FETCHSOME)
 	{
 		desc = parent->scandesc;
 
@@ -2444,6 +2451,12 @@ ExecComputeSlotInfo(ExprState *state, ExprEvalStep *op)
 		op->d.fetch.kind = NULL;
 		op->d.fetch.known_desc = NULL;
 	}
+
+	/* if the slot is known to always virtual we never need to deform */
+	if (op->d.fetch.fixed && op->d.fetch.kind == &TTSOpsVirtual)
+		return false;
+
+	return true;
 }
 
 /*
@@ -3358,16 +3371,16 @@ ExecBuildGroupingEqual(TupleDesc ldesc, TupleDesc rdesc,
 	scratch.d.fetch.fixed = false;
 	scratch.d.fetch.known_desc = ldesc;
 	scratch.d.fetch.kind = lops;
-	ExecComputeSlotInfo(state, &scratch);
-	ExprEvalPushStep(state, &scratch);
+	if (ExecComputeSlotInfo(state, &scratch))
+		ExprEvalPushStep(state, &scratch);
 
 	scratch.opcode = EEOP_OUTER_FETCHSOME;
 	scratch.d.fetch.last_var = maxatt;
 	scratch.d.fetch.fixed = false;
 	scratch.d.fetch.known_desc = rdesc;
 	scratch.d.fetch.kind = rops;
-	ExecComputeSlotInfo(state, &scratch);
-	ExprEvalPushStep(state, &scratch);
+	if (ExecComputeSlotInfo(state, &scratch))
+		ExprEvalPushStep(state, &scratch);
 
 	/*
 	 * Start comparing at the last field (least significant sort key). That's
diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
index e876160a0e..ccea030cd7 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -160,6 +160,12 @@ static Datum ExecJustAssignOuterVar(ExprState *state, ExprContext *econtext, boo
 static Datum ExecJustAssignScanVar(ExprState *state, ExprContext *econtext, bool *isnull);
 static Datum ExecJustApplyFuncToCase(ExprState *state, ExprContext *econtext, bool *isnull);
 static Datum ExecJustConst(ExprState *state, ExprContext *econtext, bool *isnull);
+static Datum ExecJustInnerVarVirt(ExprState *state, ExprContext *econtext, bool *isnull);
+static Datum ExecJustOuterVarVirt(ExprState *state, ExprContext *econtext, bool *isnull);
+static Datum ExecJustScanVarVirt(ExprState *state, ExprContext *econtext, bool *isnull);
+static Datum ExecJustAssignInnerVarVirt(ExprState *state, ExprContext *econtext, bool *isnull);
+static Datum ExecJustAssignOuterVarVirt(ExprState *state, ExprContext *econtext, bool *isnull);
+static Datum ExecJustAssignScanVarVirt(ExprState *state, ExprContext *econtext, bool *isnull);
 
 
 /*
@@ -255,11 +261,45 @@ ExecReadyInterpretedExpr(ExprState *state)
 			return;
 		}
 	}
-	else if (state->steps_len == 2 &&
-			 state->steps[0].opcode == EEOP_CONST)
+	else if (state->steps_len == 2)
 	{
-		state->evalfunc_private = (void *) ExecJustConst;
-		return;
+		ExprEvalOp	step0 = state->steps[0].opcode;
+
+		if (step0 == EEOP_CONST)
+		{
+			state->evalfunc_private = (void *) ExecJustConst;
+			return;
+		}
+		else if (step0 == EEOP_INNER_VAR)
+		{
+			state->evalfunc_private = (void *) ExecJustInnerVarVirt;
+			return;
+		}
+		else if (step0 == EEOP_OUTER_VAR)
+		{
+			state->evalfunc_private = (void *) ExecJustOuterVarVirt;
+			return;
+		}
+		else if (step0 == EEOP_SCAN_VAR)
+		{
+			state->evalfunc_private = (void *) ExecJustScanVarVirt;
+			return;
+		}
+		else if (step0 == EEOP_ASSIGN_INNER_VAR)
+		{
+			state->evalfunc_private = (void *) ExecJustAssignInnerVarVirt;
+			return;
+		}
+		else if (step0 == EEOP_ASSIGN_OUTER_VAR)
+		{
+			state->evalfunc_private = (void *) ExecJustAssignOuterVarVirt;
+			return;
+		}
+		else if (step0 == EEOP_ASSIGN_SCAN_VAR)
+		{
+			state->evalfunc_private = (void *) ExecJustAssignScanVarVirt;
+			return;
+		}
 	}
 
 #if defined(EEO_USE_COMPUTED_GOTO)
@@ -2096,6 +2136,91 @@ ExecJustConst(ExprState *state, ExprContext *econtext, bool *isnull)
 	return op->d.constval.value;
 }
 
+/* implementation of ExecJust(Inner|Outer|Scan)VarVirt */
+static pg_attribute_always_inline Datum
+ExecJustVarVirtImpl(ExprState *state, TupleTableSlot *slot, bool *isnull)
+{
+	ExprEvalStep *op = &state->steps[0];
+	int			attnum = op->d.var.attnum;
+
+	/*
+	 * As it is guaranteed that a virtual slot is used, there never is a need
+	 * to perform tuple deforming (nor would it be possible). Therefore
+	 * execExpr.c has not emitted a EEOP_*_FETCHSOME step. Verify, as much as
+	 * possible, that that determination was accurate.
+	 */
+	Assert(slot->tts_ops == &TTSOpsVirtual);
+	Assert(TTS_FIXED(slot));
+	Assert(attnum >= 0 && attnum < slot->tts_nvalid);
+
+	*isnull = slot->tts_isnull[attnum];
+
+	return slot->tts_values[attnum];
+}
+
+/* Like ExecJustInnerVar, optimized for virtual slots */
+static Datum
+ExecJustInnerVarVirt(ExprState *state, ExprContext *econtext, bool *isnull)
+{
+	return ExecJustVarVirtImpl(state, econtext->ecxt_innertuple, isnull);
+}
+
+/* Like ExecJustOuterVar, optimized for virtual slots */
+static Datum
+ExecJustOuterVarVirt(ExprState *state, ExprContext *econtext, bool *isnull)
+{
+	return ExecJustVarVirtImpl(state, econtext->ecxt_outertuple, isnull);
+}
+
+/* Like ExecJustScanVar, optimized for virtual slots */
+static Datum
+ExecJustScanVarVirt(ExprState *state, ExprContext *econtext, bool *isnull)
+{
+	return ExecJustVarVirtImpl(state, econtext->ecxt_scantuple, isnull);
+}
+
+/* implementation of ExecJustAssign(Inner|Outer|Scan)VarVirt */
+static pg_attribute_always_inline Datum
+ExecJustAssignVarVirtImpl(ExprState *state, TupleTableSlot *inslot, bool *isnull)
+{
+	ExprEvalStep *op = &state->steps[0];
+	int			attnum = op->d.assign_var.attnum;
+	int			resultnum = op->d.assign_var.resultnum;
+	TupleTableSlot *outslot = state->resultslot;
+
+	/* see ExecJustVarVirtImpl for comments */
+
+	Assert(inslot->tts_ops == &TTSOpsVirtual);
+	Assert(TTS_FIXED(inslot));
+	Assert(attnum >= 0 && attnum < inslot->tts_nvalid);
+
+	outslot->tts_values[resultnum] = inslot->tts_values[attnum];
+	outslot->tts_isnull[resultnum] = inslot->tts_isnull[attnum];
+
+	return 0;
+}
+
+/* Like ExecJustAssignInnerVar, optimized for virtual slots */
+static Datum
+ExecJustAssignInnerVarVirt(ExprState *state, ExprContext *econtext, bool *isnull)
+{
+	return ExecJustAssignVarVirtImpl(state, econtext->ecxt_innertuple, isnull);
+}
+
+/* Like ExecJustAssignOuterVar, optimized for virtual slots */
+static Datum
+ExecJustAssignOuterVarVirt(ExprState *state, ExprContext *econtext, bool *isnull)
+{
+	return ExecJustAssignVarVirtImpl(state, econtext->ecxt_outertuple, isnull);
+}
+
+/* Like ExecJustAssignScanVar, optimized for virtual slots */
+static Datum
+ExecJustAssignScanVarVirt(ExprState *state, ExprContext *econtext, bool *isnull)
+{
+	return ExecJustAssignVarVirtImpl(state, econtext->ecxt_scantuple, isnull);
+}
+
 #if defined(EEO_USE_COMPUTED_GOTO)
 /*
  * Comparator used when building address->opcode lookup table for
diff --git a/src/backend/jit/llvm/llvmjit_expr.c b/src/backend/jit/llvm/llvmjit_expr.c
index 30133634c7..d09324637b 100644
--- a/src/backend/jit/llvm/llvmjit_expr.c
+++ b/src/backend/jit/llvm/llvmjit_expr.c
@@ -287,6 +287,9 @@ llvm_compile_expr(ExprState *state)
 					if (op->d.fetch.fixed)
 						tts_ops = op->d.fetch.kind;
 
+					/* step should not have been generated */
+					Assert(tts_ops != &TTSOpsVirtual);
+
 					if (opcode == EEOP_INNER_FETCHSOME)
 						v_slot = v_innerslot;
 					else if (opcode == EEOP_OUTER_FETCHSOME)
@@ -297,9 +300,6 @@ llvm_compile_expr(ExprState *state)
 					/*
 					 * Check if all required attributes are available, or
 					 * whether deforming is required.
-					 *
-					 * TODO: skip nvalid check if slot is fixed and known to
-					 * be a virtual slot.
 					 */
 					v_nvalid =
 						l_load_struct_gep(b, v_slot,
-- 
2.23.0

v2-0003-Some-minor-cosmetic-changes.patchapplication/octet-stream; name=v2-0003-Some-minor-cosmetic-changes.patchDownload
From cd20b02064e6e41c8bc24043c41dbb6efa57dbf2 Mon Sep 17 00:00:00 2001
From: soumyadeep2007 <sochakraborty@pivotal.io>
Date: Fri, 27 Sep 2019 22:46:14 -0700
Subject: [PATCH v2 3/3] Some minor cosmetic changes

1. Renamed ExecComputeSlotInfo to eliminate the need for the assert.
2. Extracted return value to a bool variable for slightly better
readability.
3. Taking the opportunity to use TTS_IS_VIRTUAL.
---
 src/backend/executor/execExpr.c       | 25 ++++++++++++++-----------
 src/backend/executor/execExprInterp.c |  4 ++--
 2 files changed, 16 insertions(+), 13 deletions(-)

diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c
index 1ff13d1079..aef4ac657e 100644
--- a/src/backend/executor/execExpr.c
+++ b/src/backend/executor/execExpr.c
@@ -65,7 +65,7 @@ static void ExecInitFunc(ExprEvalStep *scratch, Expr *node, List *args,
 static void ExecInitExprSlots(ExprState *state, Node *node);
 static void ExecPushExprSlots(ExprState *state, LastAttnumInfo *info);
 static bool get_last_attnums_walker(Node *node, LastAttnumInfo *info);
-static bool ExecComputeSlotInfo(ExprState *state, ExprEvalStep *op);
+static bool ExecComputeFetchOpInfo(ExprState *state, ExprEvalStep *op);
 static void ExecInitWholeRowVar(ExprEvalStep *scratch, Var *variable,
 								ExprState *state);
 static void ExecInitSubscriptingRef(ExprEvalStep *scratch,
@@ -2273,6 +2273,7 @@ static void
 ExecPushExprSlots(ExprState *state, LastAttnumInfo *info)
 {
 	ExprEvalStep scratch = {0};
+	bool		 deform_step_required;
 
 	scratch.resvalue = NULL;
 	scratch.resnull = NULL;
@@ -2285,7 +2286,8 @@ ExecPushExprSlots(ExprState *state, LastAttnumInfo *info)
 		scratch.d.fetch.fixed = false;
 		scratch.d.fetch.kind = NULL;
 		scratch.d.fetch.known_desc = NULL;
-		if (ExecComputeSlotInfo(state, &scratch))
+		deform_step_required = ExecComputeFetchOpInfo(state, &scratch);
+		if (deform_step_required)
 			ExprEvalPushStep(state, &scratch);
 	}
 	if (info->last_outer > 0)
@@ -2295,7 +2297,8 @@ ExecPushExprSlots(ExprState *state, LastAttnumInfo *info)
 		scratch.d.fetch.fixed = false;
 		scratch.d.fetch.kind = NULL;
 		scratch.d.fetch.known_desc = NULL;
-		if (ExecComputeSlotInfo(state, &scratch))
+		deform_step_required = ExecComputeFetchOpInfo(state, &scratch);
+		if (deform_step_required)
 			ExprEvalPushStep(state, &scratch);
 	}
 	if (info->last_scan > 0)
@@ -2305,7 +2308,8 @@ ExecPushExprSlots(ExprState *state, LastAttnumInfo *info)
 		scratch.d.fetch.fixed = false;
 		scratch.d.fetch.kind = NULL;
 		scratch.d.fetch.known_desc = NULL;
-		if (ExecComputeSlotInfo(state, &scratch))
+		deform_step_required = ExecComputeFetchOpInfo(state, &scratch);
+		if (deform_step_required)
 			ExprEvalPushStep(state, &scratch);
 	}
 }
@@ -2368,7 +2372,7 @@ get_last_attnums_walker(Node *node, LastAttnumInfo *info)
  * Returns true if the the deforming step is required, false otherwise.
  */
 static bool
-ExecComputeSlotInfo(ExprState *state, ExprEvalStep *op)
+ExecComputeFetchOpInfo(ExprState *state, ExprEvalStep *op)
 {
 	PlanState  *parent = state->parent;
 	TupleDesc	desc = NULL;
@@ -2376,10 +2380,6 @@ ExecComputeSlotInfo(ExprState *state, ExprEvalStep *op)
 	bool		isfixed = false;
 	ExprEvalOp	opcode = op->opcode;
 
-	Assert(opcode == EEOP_INNER_FETCHSOME ||
-		   opcode == EEOP_OUTER_FETCHSOME ||
-		   opcode == EEOP_SCAN_FETCHSOME);
-
 	if (op->d.fetch.known_desc != NULL)
 	{
 		desc = op->d.fetch.known_desc;
@@ -3340,6 +3340,7 @@ ExecBuildGroupingEqual(TupleDesc ldesc, TupleDesc rdesc,
 	int			maxatt = -1;
 	List	   *adjust_jumps = NIL;
 	ListCell   *lc;
+	bool		deform_step_required;
 
 	/*
 	 * When no columns are actually compared, the result's always true. See
@@ -3371,7 +3372,8 @@ ExecBuildGroupingEqual(TupleDesc ldesc, TupleDesc rdesc,
 	scratch.d.fetch.fixed = false;
 	scratch.d.fetch.known_desc = ldesc;
 	scratch.d.fetch.kind = lops;
-	if (ExecComputeSlotInfo(state, &scratch))
+	deform_step_required = ExecComputeFetchOpInfo(state, &scratch);
+	if (deform_step_required)
 		ExprEvalPushStep(state, &scratch);
 
 	scratch.opcode = EEOP_OUTER_FETCHSOME;
@@ -3379,7 +3381,8 @@ ExecBuildGroupingEqual(TupleDesc ldesc, TupleDesc rdesc,
 	scratch.d.fetch.fixed = false;
 	scratch.d.fetch.known_desc = rdesc;
 	scratch.d.fetch.kind = rops;
-	if (ExecComputeSlotInfo(state, &scratch))
+	deform_step_required = ExecComputeFetchOpInfo(state, &scratch);
+	if (deform_step_required)
 		ExprEvalPushStep(state, &scratch);
 
 	/*
diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
index ccea030cd7..3c1288c5b9 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -2149,7 +2149,7 @@ ExecJustVarVirtImpl(ExprState *state, TupleTableSlot *slot, bool *isnull)
 	 * execExpr.c has not emitted a EEOP_*_FETCHSOME step. Verify, as much as
 	 * possible, that that determination was accurate.
 	 */
-	Assert(slot->tts_ops == &TTSOpsVirtual);
+	Assert(TTS_IS_VIRTUAL(slot));
 	Assert(TTS_FIXED(slot));
 	Assert(attnum >= 0 && attnum < slot->tts_nvalid);
 
@@ -2190,7 +2190,7 @@ ExecJustAssignVarVirtImpl(ExprState *state, TupleTableSlot *inslot, bool *isnull
 
 	/* see ExecJustVarVirtImpl for comments */
 
-	Assert(inslot->tts_ops == &TTSOpsVirtual);
+	Assert(TTS_IS_VIRTUAL(inslot));
 	Assert(TTS_FIXED(inslot));
 	Assert(attnum >= 0 && attnum < inslot->tts_nvalid);
 
-- 
2.23.0

#9Andres Freund
andres@anarazel.de
In reply to: Soumyadeep Chakraborty (#8)
Re: Don't codegen deform code for virtual tuples in expr eval for scan fetch

Hi,

On 2019-09-27 23:01:05 -0700, Soumyadeep Chakraborty wrote:

I completely agree, that was an important consideration.

I had some purely cosmetic suggestions:
1. Rename ExecComputeSlotInfo to eliminate the need for the asserts.

How does renaming it do so? I feel like the asserts are a good idea
independent of anything else?

2. Extract return value to a bool variable for slightly better
readability.

To me that seems clearly worse. The variable doesn't add anything, but
needing to track more state.

3. Taking the opportunity to use TTS_IS_VIRTUAL.

Good point.

- Andres

#10Soumyadeep Chakraborty
soumyadeep2007@gmail.com
In reply to: Andres Freund (#9)
Re: Don't codegen deform code for virtual tuples in expr eval for scan fetch

Hi Andres,

I don't feel very strongly about the changes I proposed.

I completely agree, that was an important consideration.

I had some purely cosmetic suggestions:
1. Rename ExecComputeSlotInfo to eliminate the need for the asserts.

How does renaming it do so? I feel like the asserts are a good idea
independent of anything else?

I felt that encoding the restriction that the function is meant to be called
only in the context of fetch operations in the function name itself
ensured that we don't call it from a non-fetch operation - something the
asserts within ExecComputeSlotInfo() are guarding against.

2. Extract return value to a bool variable for slightly better
readability.

To me that seems clearly worse. The variable doesn't add anything, but
needing to track more state.>

Agreed.

--
Soumyadeep

#11Andres Freund
andres@anarazel.de
In reply to: Soumyadeep Chakraborty (#10)
Re: Don't codegen deform code for virtual tuples in expr eval for scan fetch

Hi,

On 2019-09-30 09:14:45 -0700, Soumyadeep Chakraborty wrote:

I don't feel very strongly about the changes I proposed.

I completely agree, that was an important consideration.

I had some purely cosmetic suggestions:
1. Rename ExecComputeSlotInfo to eliminate the need for the asserts.

How does renaming it do so? I feel like the asserts are a good idea
independent of anything else?

I felt that encoding the restriction that the function is meant to be called
only in the context of fetch operations in the function name itself
ensured that we don't call it from a non-fetch operation - something the
asserts within ExecComputeSlotInfo() are guarding against.

2. Extract return value to a bool variable for slightly better
readability.

To me that seems clearly worse. The variable doesn't add anything, but
needing to track more state.>

Agreed.

I pushed this to master. Thanks for your contribution!

Regards,

Andres

#12Soumyadeep Chakraborty
soumyadeep2007@gmail.com
In reply to: Andres Freund (#11)
Re: Don't codegen deform code for virtual tuples in expr eval for scan fetch

Awesome! Thanks so much for all the review! :)

--
Soumyadeep