Don't codegen deform code for virtual tuples in expr eval for scan fetch
Hello Hackers,
This is to address a TODO I found in the JIT expression evaluation
code (opcode =
EEOP_INNER_FETCHSOME/EEOP_OUTER_FETCHSOME/EEOP_SCAN_FETCHSOME):
* TODO: skip nvalid check if slot is fixed and known to
* be a virtual slot.
Not only should we skip the nvalid check if the tuple is virtual, the
whole basic block should be a no-op. There is no deforming to be done,
JIT (slot_compile_deform) or otherwise (slot_getsomeattrs) for virtual
tuples.
Attached is a patch 0001 that achieves exactly that by:
1. Performing the check at the very beginning of the case.
2. Emitting an unconditional branch to the next opblock if we have a
virtual tuple. We end up with a block with just the single
unconditional branch instruction in this case. This block is optimized
away when we run llvm_optimize_module().
Also enclosed is another patch 0002 that adds on some minor
refactoring of conditionals around whether to JIT deform or not.
---
Soumyadeep Chakraborty
Attachments:
0001-Don-t-codegen-if-tuple-is-virtual-in-expr-eval-of-sc.patchapplication/octet-stream; name=0001-Don-t-codegen-if-tuple-is-virtual-in-expr-eval-of-sc.patchDownload+12-11
0002-Minor-refactor-JIT-deform-or-not.patchapplication/octet-stream; name=0002-Minor-refactor-JIT-deform-or-not.patchDownload+5-11
Hello,
In my previous patch 0001, the resulting opblock consisted of a single
br instruction to it's successor opblock. Such a block represents
unnecessary overhead. Even though such a block would be optimized
away, what if optimization is not performed (perhaps due to
jit_optimize_above_cost)? Perhaps we could be more aggressive. We
could maybe remove the opblock altogether. However, such a solution
is not without complexity.
Such unlinking is non-trivial, and is one of the things the
simplify-cfg pass takes care of. One solution could be to run this
pass ad-hoc for the evalexpr function. Or we could perform the
unlinking during codegen. Such a solution is presented below.
To unlink the current opblock from
the cfg we have to replace all of its uses. From the current state of
the code, these uses are either:
1. Terminator instruction of opblocks[i - 1]
2. Terminator instruction of some sub-op-block of opblocks[i - 1]. By
sub-op-block, I mean some block that is directly linked from opblock[i
- 1] but not opblocks[i].
3. Terminator instruction of the entry block.
We should replace all of these uses with opblocks[i + 1] and then and
only then can we delete opblocks[i]. My latest patch v2-0001 in my latest
patch set, achieves this.
I guard LLVMReplaceAllUsesWith() with Assert()s to ensure that we
don't accidentally replace non-terminator uses of opblocks[i], should
they be introduced in the future. If these asserts fail in the future,
replacing these non-terminator instructions with undefs constitutes a
commonly adopted solution. Otherwise, we can always fall back to making
opblocks[i] empty with just the unconditional br, as in my previous
patch 0001.
--
Soumyadeep
On Tue, Sep 17, 2019 at 11:54 PM Soumyadeep Chakraborty <
soumyadeep2007@gmail.com> wrote:
Show quoted text
Hello Hackers,
This is to address a TODO I found in the JIT expression evaluation
code (opcode =
EEOP_INNER_FETCHSOME/EEOP_OUTER_FETCHSOME/EEOP_SCAN_FETCHSOME):* TODO: skip nvalid check if slot is fixed and known to
* be a virtual slot.Not only should we skip the nvalid check if the tuple is virtual, the
whole basic block should be a no-op. There is no deforming to be done,
JIT (slot_compile_deform) or otherwise (slot_getsomeattrs) for virtual
tuples.Attached is a patch 0001 that achieves exactly that by:
1. Performing the check at the very beginning of the case.
2. Emitting an unconditional branch to the next opblock if we have a
virtual tuple. We end up with a block with just the single
unconditional branch instruction in this case. This block is optimized
away when we run llvm_optimize_module().Also enclosed is another patch 0002 that adds on some minor
refactoring of conditionals around whether to JIT deform or not.---
Soumyadeep Chakraborty
Attachments:
v2-0001-Don-t-codegen-if-tuple-is-virtual-in-expr-eval-of.patchapplication/octet-stream; name=v2-0001-Don-t-codegen-if-tuple-is-virtual-in-expr-eval-of.patchDownload+43-11
v2-0002-Minor-refactor-JIT-deform-or-not.patchapplication/octet-stream; name=v2-0002-Minor-refactor-JIT-deform-or-not.patchDownload+5-11
Hi,
Thanks for working on this!
On 2019-09-17 23:54:51 -0700, Soumyadeep Chakraborty wrote:
This is to address a TODO I found in the JIT expression evaluation
code (opcode =
EEOP_INNER_FETCHSOME/EEOP_OUTER_FETCHSOME/EEOP_SCAN_FETCHSOME):* TODO: skip nvalid check if slot is fixed and known to
* be a virtual slot.
I now think this isn't actually the right approach. Instead of doing
this optimization just for JIT compilation, we should do it while
generating the ExprState itself. That way we don't just accelerate JITed
programs, but also normal interpreted execution.
IOW, wherever ExecComputeSlotInfo() is called, we should only actually
push the expression step, when ExecComputeSlotInfo does not determine
that a) the slot is virtual, b) and fixed (i.e. guaranteed to always be
the same type of slot).
Does that make sense?
Greetings,
Andres Freund
Hi,
On 2019-09-20 22:19:46 -0700, Soumyadeep Chakraborty wrote:
In my previous patch 0001, the resulting opblock consisted of a single
br instruction to it's successor opblock. Such a block represents
unnecessary overhead. Even though such a block would be optimized
away, what if optimization is not performed (perhaps due to
jit_optimize_above_cost)? Perhaps we could be more aggressive. We
could maybe remove the opblock altogether. However, such a solution
is not without complexity.
I'm doubtful this is worth the complexity - and not that we already have
plenty other places with zero length blocks.
WRT jit_optimize_above_cost not triggering, I think we need to replace
the "basic, unoptimized" codegen path with one that does cheap
optimizations only. The reason we don't do the full optimizations all
the time is that they're expensive, but there's enough optimizations
that are pretty cheap. At some point we'll probably need our own
optimization pipeline, but I don't want to maintain that right now
(i.e. if some other people want to help with this aspect, cool)...
See also: /messages/by-id/20190904152438.pv4vdk3ctuvvnxh3@alap3.anarazel.de
Greetings,
Andres Freund
Hello Andres,
Thank you very much for reviewing my patch!
On Wed, Sep 25, 2019 at 1:02 PM Andres Freund <andres@anarazel.de> wrote:
IOW, wherever ExecComputeSlotInfo() is called, we should only actually
push the expression step, when ExecComputeSlotInfo does not determine
that a) the slot is virtual, b) and fixed (i.e. guaranteed to always be
the same type of slot).Does that make sense?
That is a great suggestion and I totally agree. I have attached a patch
that achieves this.
--
Soumyadeep
Attachments:
v1-0001-Skip-deform-expr-step-emission-for-virtual-tuples.patchapplication/octet-stream; name=v1-0001-Skip-deform-expr-step-emission-for-virtual-tuples.patchDownload+18-11
Hi Andres,
Thank you for your insight and the link offered just the context I needed!
On Wed, Sep 25, 2019 at 1:06 PM Andres Freund <andres@anarazel.de> wrote:
I'm doubtful this is worth the complexity - and not that we already have
plenty other places with zero length blocks.
Agreed.
On Wed, Sep 25, 2019 at 1:06 PM Andres Freund <andres@anarazel.de> wrote:
WRT jit_optimize_above_cost not triggering, I think we need to replace
the "basic, unoptimized" codegen path with one that does cheap
optimizations only. The reason we don't do the full optimizations all
the time is that they're expensive, but there's enough optimizations
that are pretty cheap. At some point we'll probably need our own
optimization pipeline, but I don't want to maintain that right now
(i.e. if some other people want to help with this aspect, cool)...
I would absolutely love to work on this!
I was thinking the same. Perhaps not only replace the default on the
compile path, but also introduce additional levels. These levels could then
be configured at a granularity lower than the -O0, -O1, .. In other words,
perhaps we could have levels with ad-hoc pass combinations. We could then
maybe have costs associated with each level. I think such a framework
would be ideal.
To summarize this into TODOs:
1. Tune default compilation path optimizations.
2. New costs per optimization level.
3. Capacity to define "levels" with ad-hoc pass combinations that are
costing
sensitive.
Is this what you meant by "optimization pipeline"?
--
Soumyadeep
Hi,
On 2019-09-25 22:11:51 -0700, Soumyadeep Chakraborty wrote:
Thank you very much for reviewing my patch!
On Wed, Sep 25, 2019 at 1:02 PM Andres Freund <andres@anarazel.de> wrote:
IOW, wherever ExecComputeSlotInfo() is called, we should only actually
push the expression step, when ExecComputeSlotInfo does not determine
that a) the slot is virtual, b) and fixed (i.e. guaranteed to always be
the same type of slot).Does that make sense?
That is a great suggestion and I totally agree. I have attached a patch
that achieves this.
I think as done, this has the slight disadvantage of removing the
fast-path for small interpreted expressions, because that explicitly
matches for seeing the EEOP_*_FETCHSOME ops. Look at execExprInterp.c,
around:
/*
* Select fast-path evalfuncs for very simple expressions. "Starting up"
* the full interpreter is a measurable overhead for these, and these
* patterns occur often enough to be worth optimizing.
*/
if (state->steps_len == 3)
{
So I think we'd have to add a separate fastpath for virtual slots for
that.
What do you think about the attached?
Greetings,
Andres Freund
Hey,
I completely agree, that was an important consideration.
I had some purely cosmetic suggestions:
1. Rename ExecComputeSlotInfo to eliminate the need for the asserts.
2. Extract return value to a bool variable for slightly better
readability.
3. Taking the opportunity to use TTS_IS_VIRTUAL.
v2 of patch set attached. The first two patches are unchanged, the cosmetic
changes are part of v2-0003-Some-minor-cosmetic-changes.patch.
--
Soumyadeep
Attachments:
v2-0001-Reduce-code-duplication-for-ExecJust-Var-operatio.patchapplication/octet-stream; name=v2-0001-Reduce-code-duplication-for-ExecJust-Var-operatio.patchDownload+35-60
v2-0002-Don-t-generate-EEOP_-_FETCHSOME-operations-for-sl.patchapplication/octet-stream; name=v2-0002-Don-t-generate-EEOP_-_FETCHSOME-operations-for-sl.patchDownload+160-23
v2-0003-Some-minor-cosmetic-changes.patchapplication/octet-stream; name=v2-0003-Some-minor-cosmetic-changes.patchDownload+16-14
Hi,
On 2019-09-27 23:01:05 -0700, Soumyadeep Chakraborty wrote:
I completely agree, that was an important consideration.
I had some purely cosmetic suggestions:
1. Rename ExecComputeSlotInfo to eliminate the need for the asserts.
How does renaming it do so? I feel like the asserts are a good idea
independent of anything else?
2. Extract return value to a bool variable for slightly better
readability.
To me that seems clearly worse. The variable doesn't add anything, but
needing to track more state.
3. Taking the opportunity to use TTS_IS_VIRTUAL.
Good point.
- Andres
Hi Andres,
I don't feel very strongly about the changes I proposed.
I completely agree, that was an important consideration.
I had some purely cosmetic suggestions:
1. Rename ExecComputeSlotInfo to eliminate the need for the asserts.How does renaming it do so? I feel like the asserts are a good idea
independent of anything else?
I felt that encoding the restriction that the function is meant to be called
only in the context of fetch operations in the function name itself
ensured that we don't call it from a non-fetch operation - something the
asserts within ExecComputeSlotInfo() are guarding against.
2. Extract return value to a bool variable for slightly better
readability.To me that seems clearly worse. The variable doesn't add anything, but
needing to track more state.>
Agreed.
--
Soumyadeep
Hi,
On 2019-09-30 09:14:45 -0700, Soumyadeep Chakraborty wrote:
I don't feel very strongly about the changes I proposed.
I completely agree, that was an important consideration.
I had some purely cosmetic suggestions:
1. Rename ExecComputeSlotInfo to eliminate the need for the asserts.How does renaming it do so? I feel like the asserts are a good idea
independent of anything else?I felt that encoding the restriction that the function is meant to be called
only in the context of fetch operations in the function name itself
ensured that we don't call it from a non-fetch operation - something the
asserts within ExecComputeSlotInfo() are guarding against.2. Extract return value to a bool variable for slightly better
readability.To me that seems clearly worse. The variable doesn't add anything, but
needing to track more state.>Agreed.
I pushed this to master. Thanks for your contribution!
Regards,
Andres