WIP: Faster Expression Processing and Tuple Deforming (including JIT)
Hi Everyone,
TL;DR: Making things faster. Architectural evaluation.
As some of you might be aware I've been working on making execution of
larger queries in postgres faster. While working on "batched execution"
I came to the conclusion that, while necessary, it isn't currently showing
a large benefit because expression evaluation and tuple deforming are
massive bottlenecks.
I'm posting a quite massive series of WIP patches here, to get some
feedback.
Tuple deforming is slow for two reasons:
1) It's the first thing that accesses tuples, i.e. it'll often incur
cache misses. That's partially fundamental, but can also partially be
addressed, e.g. through changing the access order in heap as in [1].
2) Tuple deforming has a lot of unpredictable branches, because it has
to cope with various types of fields. We e.g. perform alignment in a
lot of unneeded cases, do null checks for NOT NULL columns et al.
I tried to address 2) by changing the C implementation. That brings some
measurable speedups, but it's not huge. A bigger speedup is making
slot_getattr, slot_getsomeattrs, slot_getallattrs very trivial wrappers;
but it's still not huge. Finally I turned to just-in-time (JIT)
compiling the code for tuple deforming. That doesn't save the cost of
1), but it gets rid of most of 2) (from ~15% to ~3% in TPCH-Q01). The
first part is done in 0007, the JITing in 0012.
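To make point 2) concrete, here is a minimal standalone sketch. It is not code from the patches; `ColDesc`, `deform_generic` and `deform_specialized` are made-up names and the tuple layout is heavily simplified. It contrasts a generic per-column loop, which pays for null and alignment checks on every column, with the straight-line code that becomes possible once the column layout is known ahead of time, which is roughly what specialized or JITed deforming aims for.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified column descriptor; not PostgreSQL's real one. */
typedef struct ColDesc
{
    size_t len;      /* fixed width in bytes: 4 or 8 */
    size_t align;    /* required alignment, a power of two */
    bool   notnull;  /* column declared NOT NULL */
} ColDesc;

#define ALIGN_UP(off, a) (((off) + ((a) - 1)) & ~((size_t) (a) - 1))

/* Generic loop: every column pays for the null and alignment branches. */
void
deform_generic(const ColDesc *cols, int ncols, const uint8_t *nullbits,
               const uint8_t *data, int64_t *values, bool *isnull)
{
    size_t off = 0;

    for (int i = 0; i < ncols; i++)
    {
        /* simplification for this sketch: a set bit marks a NULL value */
        if (!cols[i].notnull && nullbits &&
            (nullbits[i / 8] & (1 << (i % 8))))
        {
            values[i] = 0;
            isnull[i] = true;
            continue;
        }
        isnull[i] = false;
        off = ALIGN_UP(off, cols[i].align);
        if (cols[i].len == 4)
        {
            int32_t v;

            memcpy(&v, data + off, sizeof(v));
            values[i] = v;
        }
        else
        {
            int64_t v;

            memcpy(&v, data + off, sizeof(v));
            values[i] = v;
        }
        off += cols[i].len;
    }
}

/*
 * Specialized for "(4 byte NOT NULL, 8 byte NOT NULL)": offsets are
 * constants and there are no null or alignment branches left.
 */
void
deform_specialized(const uint8_t *data, int64_t *values, bool *isnull)
{
    int32_t a;
    int64_t b;

    memcpy(&a, data + 0, sizeof(a));
    memcpy(&b, data + 8, sizeof(b));
    values[0] = a;
    values[1] = b;
    isnull[0] = false;
    isnull[1] = false;
}

/* Builds one row and checks that both variants agree. */
bool
deform_demo(void)
{
    const ColDesc cols[2] = {{4, 4, true}, {8, 8, true}};
    uint8_t data[16] = {0};
    int32_t a = 42;
    int64_t b = -7;
    int64_t v1[2], v2[2];
    bool    n1[2], n2[2];

    memcpy(data + 0, &a, sizeof(a));
    memcpy(data + 8, &b, sizeof(b));
    deform_generic(cols, 2, NULL, data, v1, n1);
    deform_specialized(data, v2, n2);
    return v1[0] == 42 && v1[1] == -7 && v2[0] == v1[0] && v2[1] == v1[1] &&
           !n1[0] && !n1[1] && !n2[0] && !n2[1];
}
```

The specialized variant is only valid for that one layout, which is why it has to be generated per table (or per tuple descriptor) rather than written once by hand.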
Expression evaluation and projection is another major bottleneck.
1) Our recursive expression evaluation puts a *lot* of pressure on the
stack.
2) There's a lot of indirect function calls when recursing to other
expression nodes. These are hard to predict, because the same node
type (say ExecEvalAnd()) is used in different parts of an expression
tree, and invokes different sub-nodes.
3) The function calls to operators and other functions are hard to
predict, leading to a significant number of pipeline stalls.
4) There's a fair amount of pg_list.h list style iteration going on,
those are cache and pipeline inefficient.
After some experimenting I came to the conclusion that the recursive
processing is a fundamental impediment to making this faster. I've
converted (0008) expression processing and projection into an opcode
dispatch based interpreter. That yields, especially for complex
expressions and larger projections a significant speedup in itself. But
similarly to the deforming, expression evaluation remains a bottleneck
after that, primarily because there are still a lot of unpredictable jumps
and calls, and because loads/stores have to be complex
(e.g. ExprContext->ecxt_innertuple->tts_values[i]/tts_isnull[i] for a
single scalar var evaluation). Using the opcode based representation
of expression evaluation (as it's nearly linear, and has done a lot of
the lookups ahead of time), it's actually quite easy to JIT compile it.
*After JITing expression evaluation itself is more than ten times faster
than before*.
But unfortunately that doesn't mean that queries are ten times faster -
usually we'll hit bottlenecks elsewhere relatively soon. WRT
expression evaluation, the biggest cost afterwards is the relatively
high overhead of V1 function calls - register based parameter passing is
a lot faster.
After experimenting a bit with doing JITing manually (a lot of
eye-stabbing kind of fun), I chose to use LLVM.
An overview of the patch-queue so far:
0001 Make get_last_attnums more generic.
Boring prerequisite.
0002 More efficient AggState->pertrans iteration.
Relatively boring minor optimization, but it turns out to be an easily
hit bottleneck. Will commit independently.
0003 Avoid materializing SRFs in the FROM list.
0004 Allow ROWS FROM to return functions as single record column.
0005 Basic implementation of targetlist SRFs via ROWS FROM.
0006 Remove unused code related to targetlist SRFs.
These are basically just prerequisites for the faster expression
evaluation, and discussed elsewhere [2]. This implementation is *NOT*
going to survive, because we ended up coming to the conclusion that using a
separate executor node to expand SRFs is a better plan. But the new
expression evaluation code won't be able to handle SRFs...
0007 WIP: Optimize slot_deform_tuple() significantly.
This a) turns tuple deforming into an opcode based dispatch loop (using
computed goto on gcc/clang). b) moves a lot of the logic from
slot_deform_tuple() callsites into itself - that turns out to be more
efficient. I'm not entirely sure it's worth doing the opcode based
dispatch part, if we're going to also do the JIT bit - it's a fair
amount of code, and the speed difference only matters on large amounts
of rows.
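For readers unfamiliar with computed goto, a minimal sketch of the dispatch technique follows. It is not the code from 0007; the opcodes and `run_ops` are invented for illustration. It uses the GNU C "labels as values" extension available in gcc and clang, with a plain switch as fallback elsewhere.

```c
#include <assert.h>
#include <stddef.h>

/* Made-up opcodes, only to show the dispatch mechanics. */
typedef enum { OP_INC, OP_DOUBLE, OP_END } OpCode;

long
run_ops(const OpCode *ops, long acc)
{
#if defined(__GNUC__)
    /* one jump target per opcode, indexed by the opcode value */
    static const void *const targets[] = { &&do_inc, &&do_double, &&do_end };

    /* each opcode ends with its own indirect jump to the next one */
#define DISPATCH() goto *targets[*ops++]

    DISPATCH();
do_inc:
    acc += 1;
    DISPATCH();
do_double:
    acc *= 2;
    DISPATCH();
do_end:
    return acc;
#undef DISPATCH
#else
    /* portable fallback: a single shared switch */
    for (;;)
    {
        switch (*ops++)
        {
            case OP_INC:
                acc += 1;
                break;
            case OP_DOUBLE:
                acc *= 2;
                break;
            case OP_END:
                return acc;
        }
    }
#endif
}

/* ((1 + 1) + 1) * 2 */
long
run_ops_demo(void)
{
    static const OpCode prog[] = { OP_INC, OP_INC, OP_DOUBLE, OP_END };

    return run_ops(prog, 1);
}
```

The usual argument for this form is that every opcode gets its own indirect jump, which the branch predictor can track separately, instead of all opcodes sharing the single jump of a switch.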
0008 WIP: Faster expression processing and targetlist projection.
This, functionally nearly complete, patch turns expression evaluation
(and tuple deforming as a special case of that) into a "mini language"
which is interpreted using either a while(true) switch(opcode) or
computed goto to jump from opcode to opcode. It does so by moving a lot
more of the code for expression evaluation to initialization time and
building a linear series of steps to evaluate expressions, thereby
removing all recursion from expression processing.
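A minimal sketch of the "linear series of steps" idea, assuming a tiny made-up step format (`Step`, `eval_steps` and the opcodes are illustrative, not the patch's data structures): the expression `col0 + col1 < 10` is flattened once into an array of steps, and evaluation is a non-recursive loop over that array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum StepOp { STEP_VAR, STEP_CONST, STEP_ADD, STEP_LT, STEP_DONE } StepOp;

/* Hypothetical step layout; results live in numbered scratch slots. */
typedef struct Step
{
    StepOp  op;
    int     res;    /* slot receiving the result */
    int     arg0;   /* left input slot, or column number for STEP_VAR */
    int     arg1;   /* right input slot */
    int64_t value;  /* constant for STEP_CONST */
} Step;

/* Evaluates a step array against one row; no recursion involved. */
int64_t
eval_steps(const Step *steps, const int64_t *row)
{
    int64_t     slot[8] = {0};
    const Step *op = steps;

    while (true)
    {
        switch (op->op)
        {
            case STEP_VAR:
                slot[op->res] = row[op->arg0];
                break;
            case STEP_CONST:
                slot[op->res] = op->value;
                break;
            case STEP_ADD:
                slot[op->res] = slot[op->arg0] + slot[op->arg1];
                break;
            case STEP_LT:
                slot[op->res] = slot[op->arg0] < slot[op->arg1];
                break;
            case STEP_DONE:
                return slot[op->res];
        }
        op++;
    }
}

/* "col0 + col1 < 10", built once at initialization time. */
static const Step lt10_steps[] = {
    {STEP_VAR,   0, 0, 0, 0},
    {STEP_VAR,   1, 1, 0, 0},
    {STEP_ADD,   2, 0, 1, 0},
    {STEP_CONST, 3, 0, 0, 10},
    {STEP_LT,    4, 2, 3, 0},
    {STEP_DONE,  4, 0, 0, 0},
};

bool
eval_lt10(int64_t a, int64_t b)
{
    int64_t row[2] = {a, b};

    return eval_steps(lt10_steps, row) != 0;
}
```

All decisions about which slot feeds which operator are made when the array is built, so the per-row work is only the loop itself.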
This nearly entirely gets rid of the stack usage cost of expression
evaluation (we pretty much never recurse except for subplans). Being
able to remove the now redundant calls to check_stack_depth() is a
noticeable benefit: it turns out that that check has a noticeable
performance impact (as it apparently forces actual use of the stack,
instead of just renumbering registers inside the CPU).
The new representation and evaluation is functionally nearly complete
(there's a single regression test failure, and I know why that is), but
the code needs a fair amount of polishing.
I do absolutely think that the fundamentals of this are the right way to
go, and I'm going to work hard on polishing the patch up. But this
isn't something that we can easily do in parts, and it's a huge ass
patch. So I'd like to have at least some more buyin before wasting even
more time on this.
0009 WIP: Add minimal keytest implementation.
More or less experimental patch that tries to turn simple expressions
of the form OpExpr(ScalarVar, Const) into a single expression
evaluation step. The benefits probably aren't big enough if we do end
up doing JITing of expressions.
0010 WIP: Add configure infrastructure to enable LLVM.
0011 WIP: Beginning of a LLVM JIT infrastructure.
Very boring preliminary patches to add --with-llvm and some minimal
infrastructure to handle LLVM. If we go this way, JITed stuff needs to
be tied to resource owners, and we need some other centralized
infrastructure.
0012 Heavily-WIP: JITing of tuple deforming.
This, in a not-yet-that-nice manner, implements a JITed version of the
per-column stuff that slot_deform_tuple() does. It currently always
deforms all columns, which obviously would have to change. There are also
considerable additional performance improvements possible.
With this patch the per-column overhead (minus bitmap handling, which
0007 moved into a separate loop), drops from 10%+ into low single digits
for a number of queries. Afterwards the biggest cost is VARSIZE_ANY()
for varlena columns (which atm isn't inlined). That is, besides the
initial cache-miss when accessing tuple->t_hoff, which JITing can do
nothing about :(
This can be enabled/disabled using the new jit_tuple_deforming GUC. To
make this production ready in some form, we'd have to come up with a way
to determine when it's worth doing JITing. The easiest way would be to
do so after N slot_deform_tuple() calls or such, another way would be to
do it based on cost estimates.
0013 WIP: ExprEval: Make threaded dispatch use a separate field.
Boring preliminary patch. Increases memory usage a bit, needs to be
thought through more.
0014 Heavily-WIP: JITed expression evaluation.
This is the most-interesting bit performance wise. A few common types of
expressions are JITed. Scalar value accesses, function calls, boolean
expressions, aggregate references.
This can be enabled using the new jit_expressions GUC.
Even for the supported expression types I've taken some shortcuts
(e.g. strict functions aren't actually strict).
The performance benefits are quite noticeable. For TPCH ExecEvalExpr()
(which is where 0008 moved all of expression evaluation/projection) goes
from being the top profile entry, to barely noticeable, with the JITed
function usually not showing up in the top five entries anymore.
After the patch it becomes very clear that our function call
infrastructure is a serious bottleneck. Passing all the arguments via
memory, and, even worse, forcing isnull/values to be on separate
cachelines, has significant performance implications. It also becomes
quite noticeable that nodeAgg's transition function invocation doesn't
go through ExecEvalExpr() but does that itself - which leads to constant
mispredictions if several transition values exist.
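A rough sketch of the calling-convention point, under stated assumptions: `CallInfo` below is a simplified stand-in for a call-info struct with separate value and null arrays, and `SKETCH_MAX_ARGS` is an assumed array size chosen for illustration. None of the names are taken from the patches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_ARGS 100     /* assumption: arrays sized for the maximum */

/* Simplified stand-in: values and null flags live in separate arrays. */
typedef struct CallInfo
{
    int64_t arg[SKETCH_MAX_ARGS];
    bool    argnull[SKETCH_MAX_ARGS];
    bool    isnull;
} CallInfo;

/* Memory-based style: arguments are loaded back out of the struct. */
int64_t
add_via_callinfo(CallInfo *fcinfo)
{
    if (fcinfo->argnull[0] || fcinfo->argnull[1])
    {
        fcinfo->isnull = true;
        return 0;
    }
    fcinfo->isnull = false;
    return fcinfo->arg[0] + fcinfo->arg[1];
}

/* Register-based style: arguments arrive in registers on common ABIs. */
int64_t
add_direct(int64_t a, int64_t b)
{
    return a + b;
}

/* The caller has to store everything to memory before the call. */
int64_t
call_via_callinfo(int64_t a, int64_t b)
{
    CallInfo fcinfo;

    fcinfo.arg[0] = a;
    fcinfo.arg[1] = b;
    fcinfo.argnull[0] = false;
    fcinfo.argnull[1] = false;
    return add_via_callinfo(&fcinfo);
}

/* Byte distance between the first value and the first null flag. */
size_t
values_to_nulls_distance(void)
{
    return offsetof(CallInfo, argnull) - offsetof(CallInfo, arg);
}
```

With arrays of this size the first value and its null flag are hundreds of bytes apart, so they cannot share a cache line; that is the kind of layout cost the paragraph above refers to.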
While the JIT code is relatively verbose, it turns out to not actually
be that hard to write after some startup pains. All the JITing of
expressions that exists so far was basically written in ~10 hours.
This also needs some heuristics about when JITing is
appropriate. Compiling an expression that's only executed once is never
going to be faster than doing the interpretation (it at least needs a
writable allocation for the code, and then a remap to make that code
read-only and executable). A trace based approach (everything executed
at least a thousand times) or cost based (all queries costing more than
100000 should be JITed) could make sense.
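The two heuristics could be sketched as follows. This is only an illustration using the example thresholds from the text; `ExprStats`, `jit_by_cost` and `jit_by_trace` are hypothetical names, not something the patches contain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define JIT_MIN_EXECUTIONS 1000      /* trace-based example threshold */
#define JIT_MIN_QUERY_COST 100000.0  /* cost-based example threshold */

/* Hypothetical per-expression bookkeeping. */
typedef struct ExprStats
{
    uint64_t executions;    /* number of interpreted evaluations so far */
    bool     jitted;        /* already compiled? */
} ExprStats;

/* Cost based: decided once, before execution starts. */
bool
jit_by_cost(double total_cost)
{
    return total_cost > JIT_MIN_QUERY_COST;
}

/*
 * Trace based: called before each evaluation. Returns true exactly once,
 * on the evaluation that reaches the threshold.
 */
bool
jit_by_trace(ExprStats *stats)
{
    if (stats->jitted)
        return false;
    if (++stats->executions < JIT_MIN_EXECUTIONS)
        return false;
    stats->jitted = true;
    return true;
}

/* Returns the number of the call on which compilation was requested. */
int
jit_trace_demo(void)
{
    ExprStats stats = {0, false};
    int       calls = 0;

    for (;;)
    {
        calls++;
        if (jit_by_trace(&stats))
            return calls;
    }
}
```

The trace-based variant costs a counter increment per evaluation but never compiles code for expressions that run only a few times; the cost-based variant has no per-row overhead but depends on the quality of the estimates.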
It's worthwhile to note that at the moment this is a per-query-execution
JIT, not something that can trivially be cached for prepared
statements. That'll need further infrastructure.
0015 Super-Heavily-WIP: LLVM perf integration.
This very very very preliminary patch (including some copy-pasted GPL
code!) creates /tmp/perf-<pid>.map files, which allows perf to show
useful symbols for profile hits to JIT expressions. I plan to push this
towards LLVM, so this isn't something PG will have to do, but it's
helpful for evaluation.
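For reference, a small sketch of what writing such a map file involves. perf looks for /tmp/perf-<pid>.map and expects one line per symbol in the form "START SIZE symbolname", with START and SIZE as hex numbers without a 0x prefix. The function names below are made up, and this is not the code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Formats the per-process map file name perf looks for. */
int
perf_map_path(char *buf, size_t buflen, pid_t pid)
{
    return snprintf(buf, buflen, "/tmp/perf-%ld.map", (long) pid);
}

/* Formats one map line: hex start, hex size, symbol name. */
int
perf_map_entry(char *buf, size_t buflen, uintptr_t start, size_t size,
               const char *symbol)
{
    return snprintf(buf, buflen, "%lx %lx %s\n",
                    (unsigned long) start, (unsigned long) size, symbol);
}

/* Appends an entry for a freshly emitted function; 0 on success. */
int
perf_map_register(uintptr_t start, size_t size, const char *symbol)
{
    char  path[64];
    char  line[256];
    FILE *f;

    perf_map_path(path, sizeof(path), getpid());
    perf_map_entry(line, sizeof(line), start, size, symbol);
    f = fopen(path, "a");
    if (f == NULL)
        return -1;
    fputs(line, f);
    return fclose(f);
}

/* Checks the line format without touching the filesystem. */
bool
perf_map_entry_demo(void)
{
    char line[64];

    perf_map_entry(line, sizeof(line), 0x7f0010, 0x40, "evalexpr_0");
    return strcmp(line, "7f0010 40 evalexpr_0\n") == 0;
}
```

A JIT would call something like `perf_map_register()` once per emitted function, passing the address and length of the generated machine code.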
I eventually plan to start separate threads about some of the parts in
here, but I think the overall picture needs some discussion first.
Q: Why LLVM and not a hand-rolled JIT?
A: Because hand-rolling a JIT is probably hard to scale to multiple
maintainers, and multiple platforms. I started down the path of doing
a hand-rolled x86 JIT, and that'd also be doable (faster compilation,
slower execution basically); but I doubt we'd end up having that on
different architectures and platforms. Not to speak of things like
proper debugger and profiler integration. I'm not entirely convinced
that that's the right path. It might also be a transitional step,
towards doing our completely own JIT. But I think it's a sensible
step.
Q: Why LLVM and not $jit-toolkit?
A: Because all the other JIT stuff I looked at was either really
unportable (mostly x86 linux only), inconveniently licensed (like
e.g. gcc's jit library) or nearly unmaintained (luajit's stuff for
example). I might have missed something, but ISTM that atm the
choice is between hand-rolling and using LLVM.
Q: Does this actually inline functions from the backend?
A: No. That probably is something desirable in the future, but to me
that seems like it should be a separate step. The current one's big
enough. It also further increases compilation times, so quite
possibly we only want to do so based on another set of heuristics.
Q: ?
Comments? Questions?
Regards,
Andres
[1]: https://archives.postgresql.org/message-id/20161030073655.rfa6nvbyk4w2kkpk%40alap3.anarazel.de
[2]: /messages/by-id/20160523005327.v2tr7obytitxcnna@alap3.anarazel.de
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2016-12-05 19:49:55 -0800, Andres Freund wrote:
I'm posting a quite massive series of WIP patches here, to get some
feedback.
And here's the patches themselves - let's hope they're not too big
(after gzip'ing that is).
Andres
Andres Freund <andres@anarazel.de> writes:
I'm posting a quite massive series of WIP patches here, to get some
feedback.
I guess the $64 question that has to be addressed here is whether we're
prepared to accept LLVM as a run-time dependency. There are some reasons
why we might not be:
* The sheer mass of the dependency. What's the installed footprint of
LLVM, versus a Postgres server? How hard is it to install from source?
* How will we answer people who say they can't accept having a compiler
installed on their production boxes for security reasons?
* Are there any currently-interesting platforms that LLVM doesn't work
for? (I'm worried about RISC-V as much as legacy systems.)
I concur with your feeling that hand-rolled JIT is right out. But
I'm not sure that whatever performance gain we might get in this
direction is worth the costs.
regards, tom lane
On Tue, Dec 6, 2016 at 1:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Andres Freund <andres@anarazel.de> writes:
I'm posting a quite massive series of WIP patches here, to get some
feedback.
I guess the $64 question that has to be addressed here is whether we're
prepared to accept LLVM as a run-time dependency. There are some reasons
why we might not be:
* The sheer mass of the dependency. What's the installed footprint of
LLVM, versus a Postgres server? How hard is it to install from source?
* How will we answer people who say they can't accept having a compiler
installed on their production boxes for security reasons?
* Are there any currently-interesting platforms that LLVM doesn't work
for? (I'm worried about RISC-V as much as legacy systems.)
I think anything that requires LLVM -- or, for that matter, anything
that does JIT by any means -- has got to be optional. But I don't
think --with-llvm as a compile option is inherently problematic.
Also, I think this is probably a direction we need to go. I've heard
at least one and maybe several PGCon presentations about people JITing
tuple deformation and getting big speedups, and I'd like to finally
hear one from somebody who intends to integrate that into PostgreSQL.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,
On 2016-12-06 13:56:28 -0500, Tom Lane wrote:
Andres Freund <andres@anarazel.de> writes:
I'm posting a quite massive series of WIP patches here, to get some
feedback.
I guess the $64 question that has to be addressed here is whether we're
prepared to accept LLVM as a run-time dependency. There are some reasons
why we might not be:
Indeed. It'd only be a soft dependency obviously.
* The sheer mass of the dependency. What's the installed footprint of
LLVM, versus a Postgres server? How hard is it to install from source?
Worked for me first try, but I'm perhaps not the best person to judge.
It does take a while to compile though (~20min on my laptop).
* How will we answer people who say they can't accept having a compiler
installed on their production boxes for security reasons?
I think they'll just not enable --with-llvm in that case, and get
inferior performance. Note that installing llvm does not imply
installing a full blown C compiler (although I think that's largely
moot, you can get pretty much the same things done with just compiling
LLVM IR).
* Are there any currently-interesting platforms that LLVM doesn't work
for? (I'm worried about RISC-V as much as legacy systems.)
LLVM itself I don't think is a problem, it seems to target a wide range
of platforms. The platforms that don't support JIT compiling might be a
bit larger, since that involves more than just generating code.
I concur with your feeling that hand-rolled JIT is right out. But
I'm not sure that whatever performance gain we might get in this
direction is worth the costs.
Well, I'm not impartial, but I don't think we do our users a service by
leaving significant speedups untackled, and after spending a *LOT* of
time on this, I don't see much other choice than JITing. Note that
nearly everything performance sensitive is moving towards doing JITing
in some form or another.
Greetings,
Andres Freund
On 2016-12-06 14:04:09 -0500, Robert Haas wrote:
I've heard at least one and maybe several PGCon presentations about
people JITing tuple deformation and getting big speedups, and I'd like
to finally hear one from somebody who intends to integrate that into
PostgreSQL.
I certainly want to.
On 2016-12-06 11:10:59 -0800, Andres Freund wrote:
* Are there any currently-interesting platforms that LLVM doesn't work
for? (I'm worried about RISC-V as much as legacy systems.)
LLVM itself I don't think is a problem, it seems to target a wide range
of platforms. The platforms that don't support JIT compiling might be a
bit larger, since that involves more than just generating code.
The OS-specific part is handling the executable format. The JIT we'd be
using (MCJIT) has support for ELF, MachO, and COFF. The architecture
specific bits seem to be there for x86, arm (little endian, be), aarch64
(arm 64 bits be/le again), mips, ppc64.
Somebody is working on RISC-V support for llvm (i.e. it appears to be
working, but is not merged) - but given it's not integrated into gcc
either, I'm not seeing that being an argument.
Greetings,
Andres Freund
On Tue, Dec 6, 2016 at 2:10 PM, Andres Freund <andres@anarazel.de> wrote:
* The sheer mass of the dependency. What's the installed footprint of
LLVM, versus a Postgres server? How hard is it to install from source?
Worked for me first try, but I'm perhaps not the best person to judge.
It does take a while to compile though (~20min on my laptop).
Presumably this is going to need to be something that a user can get
via yum install <blah> or apt-get install <blah> on common systems.
I wonder how feasible it would be to make this a run-time dependency
rather than a compile option. That's probably overcomplicating
things, but...
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2016-12-06 15:13:21 -0500, Robert Haas wrote:
Presumably this is going to need to be something that a user can get
via yum install <blah> or apt-get install <blah> on common systems.
Right. apt-get install llvm-dev (or llvm-3.9-dev or such if you want to
install a specific version), does the trick here.
It's a bit easier to develop with a hand compiled version, because then
LLVM adds a boatload of asserts to its IR builder, which catches a fair
amount of mistakes. Nothing you'd run in production though (just like
you don't use a cassert build...).
I wonder how feasible it would be to make this a run-time dependency
rather than a compile option. That's probably overcomplicating
things, but...
I don't think that's feasible at all unfortunately - the compiler IR
(which then is JITed by LLVM) is generated via another C API. We could
rebuild that one, but that'd be a lot of work.
Andres
On Tue, Dec 06, 2016 at 01:56:28PM -0500, Tom Lane wrote:
Andres Freund <andres@anarazel.de> writes:
I'm posting a quite massive series of WIP patches here, to get some
feedback.
I guess the $64 question that has to be addressed here is whether we're
prepared to accept LLVM as a run-time dependency. There are some reasons
why we might not be:
* The sheer mass of the dependency. What's the installed footprint of
LLVM, versus a Postgres server? How hard is it to install from source?
As long as it's optional, does this matter?
A bigger concern might be interface stability. IIRC the LLVM C/C++
interfaces are not very stable, but bitcode is.
* How will we answer people who say they can't accept having a compiler
installed on their production boxes for security reasons?
You don't need the front-ends (e.g., clang) installed in order to JIT.
* Are there any currently-interesting platforms that LLVM doesn't work
for? (I'm worried about RISC-V as much as legacy systems.)
The *BSDs support more platforms than LLVM does, that's for sure.
(NetBSD supports four more, IIRC, including ia64.) But the patches make
LLVM optional anyways, so this should be a non-issue.
I concur with your feeling that hand-rolled JIT is right out. But
Yeah, that way lies maintenance madness.
I'm not sure that whatever performance gain we might get in this
direction is worth the costs.
Byte-/bit-coding query plans then JITting them is very likely to improve
performance significantly. Whether you want the maintenance overhead is
another story.
Sometimes byte-coding + interpretation yields a significant improvement
by reducing cache pressure on the icache and the size of the program to
be interpreted. Having the option to JIT or not JIT might be useful.
Nico
Andres Freund <andres@anarazel.de> writes:
On 2016-12-06 13:56:28 -0500, Tom Lane wrote:
I guess the $64 question that has to be addressed here is whether we're
prepared to accept LLVM as a run-time dependency. There are some reasons
why we might not be:
Indeed. It'd only be a soft dependency obviously.
Oh, so we'd need to maintain both the LLVM and the traditional expression
execution code? That seems like a bit of a pain, but maybe we can live
with it.
* How will we answer people who say they can't accept having a compiler
installed on their production boxes for security reasons?
I think they'll just not enable --with-llvm in that case, and get
inferior performance. Note that installing llvm does not imply
installing a full blown C compiler (although I think that's largely
moot, you can get pretty much the same things done with just compiling
LLVM IR).
I'm not entirely thrilled with the idea of this being a configure-time
decision, because that forces packagers to decide for their entire
audience whether it's okay to depend on LLVM. That would be an untenable
position to put e.g. Red Hat's packagers in: either they screw the people
who want performance or they screw the people who want security.
I think it'd be all right if we can build this so that the direct
dependency on LLVM is confined to a separately-packageable extension.
That way, a packager can produce a core postgresql-server package
that does not require LLVM, plus a postgresql-llvm package that does,
and the "no compiler please" crowd simply doesn't install the latter
package.
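One way to picture such a split, purely as an illustrative sketch: the core server only knows about a hook, and a separately packaged module registers itself through it. All names here (`register_jit_provider`, `prepare_expr`, the fake provider) are hypothetical, and the real dependency on LLVM would live only in the module.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef long (*ExprFn) (long arg);
typedef ExprFn (*JitCompileHook) (int expr_id);

/* Core side: stays NULL unless an optional module registers itself. */
static JitCompileHook jit_compile_hook = NULL;

void
register_jit_provider(JitCompileHook hook)
{
    jit_compile_hook = hook;
}

/* Core side: the interpreter that is always available. */
static long
interpret_expr(long arg)
{
    return arg + 1;
}

/* Core side: use the provider if present, else fall back. */
ExprFn
prepare_expr(int expr_id)
{
    if (jit_compile_hook != NULL)
    {
        ExprFn fn = jit_compile_hook(expr_id);

        if (fn != NULL)
            return fn;
    }
    return interpret_expr;
}

/* Stand-ins for what the separately packaged module would supply. */
static long
fake_jitted_expr(long arg)
{
    return arg + 1;
}

static ExprFn
fake_provider(int expr_id)
{
    (void) expr_id;
    return fake_jitted_expr;
}

/* Without a provider the interpreter is used; with one, its code is. */
bool
provider_demo(void)
{
    ExprFn before = prepare_expr(0);
    ExprFn after;

    register_jit_provider(fake_provider);
    after = prepare_expr(0);
    return before == interpret_expr && after == fake_jitted_expr &&
           before(41) == 42 && after(41) == 42;
}
```

In a real server the module would be a shared library loaded on demand, so a server package built this way has no link-time dependency on LLVM at all.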
The alternative would be to produce two independent builds of the
server, which I suppose might be acceptable but it sure seems like
a kluge, or at least something that simply wouldn't get done by
most vendors.
regards, tom lane
Hi,
On 2016-12-06 14:19:21 -0600, Nico Williams wrote:
A bigger concern might be interface stability. IIRC the LLVM C/C++
interfaces are not very stable, but bitcode is.
The C API is a lot more stable than the C++ bit, that's the primary
reason I ended up using it, despite the C++ docs being better.
I concur with your feeling that hand-rolled JIT is right out. But
Yeah, that way lies maintenance madness.
I'm not quite that sure about that. I had a lot of fun doing some
hand-rolled x86 JITing. Not that that is a ward against me being mad. But
more seriously: Manually doing a JIT gives you a lot faster compilation
times, which makes JIT applicable in a lot more situations.
I'm not sure that whatever performance gain we might get in this
direction is worth the costs.
Byte-/bit-coding query plans then JITting them is very likely to improve
performance significantly.
Note that what I'm proposing is a far cry away from that - this converts
two (performance wise two, size wise one) significant subsystems, but far
from all of the executor to be JIT able. I think there's some more low
hanging fruits (particularly aggregate transition functions), but
converting everything seems to hit the wrong spot in the
benefit/effort/maintainability triangle.
- Andres
On Tue, Dec 06, 2016 at 12:27:51PM -0800, Andres Freund wrote:
On 2016-12-06 14:19:21 -0600, Nico Williams wrote:
A bigger concern might be interface stability. IIRC the LLVM C/C++
interfaces are not very stable, but bitcode is.
The C API is a lot more stable than the C++ bit, that's the primary
reason I ended up using it, despite the C++ docs being better.
Ah.
I concur with your feeling that hand-rolled JIT is right out. But
Yeah, that way lies maintenance madness.
I'm not quite that sure about that. I had a lot of fun doing some
hand-rolled x86 JITing. Not that that is a ward against me being mad. But
more seriously: Manually doing a JIT gives you a lot faster compilation
times, which makes JIT applicable in a lot more situations.
What I meant is that each time there are new ISA extensions, or
differences in how relevant/significant different implementations of the
same ISA implement certain instructions, and/or every time you want to
add a new architecture... someone has to do a lot of very low-level
work.
I'm not sure that whatever performance gain we might get in this
direction is worth the costs.
Byte-/bit-coding query plans then JITting them is very likely to improve
performance significantly.
Note that what I'm proposing is a far cry away from that - this converts
two (performance wise two, size wise one) significant subsystems, but far
from all of the executor to be JIT able. I think there's some more low
Yes, I know.
hanging fruits (particularly aggregate transition functions), but
converting everything seems to hit the wrong spot in the
benefit/effort/maintainability triangle.
Maybe? At least with the infrastructure in place for it someone might
try it and see.
Nico
On 2016-12-06 15:25:44 -0500, Tom Lane wrote:
Andres Freund <andres@anarazel.de> writes:
On 2016-12-06 13:56:28 -0500, Tom Lane wrote:
I guess the $64 question that has to be addressed here is whether we're
prepared to accept LLVM as a run-time dependency. There are some reasons
why we might not be:
Indeed. It'd only be a soft dependency obviously.
Oh, so we'd need to maintain both the LLVM and the traditional expression
execution code? That seems like a bit of a pain, but maybe we can live
with it.
Yea, that's why I converted the "traditional" expression evaluation into
a different format first - that way the duplication is a lot
lower. E.g. scalar var eval looks like:
    EEO_CASE(EEO_INNER_VAR):
        {
            int attnum = op->d.var.attnum;

            Assert(op->d.var.attnum >= 0);
            *op->resnull = innerslot->tts_isnull[attnum];
            *op->resvalue = innerslot->tts_values[attnum];
            EEO_DISPATCH(op);
        }
in normal evaluation and like
    case EEO_INNER_VAR:
        {
            LLVMValueRef value, isnull;
            LLVMValueRef v_attnum;

            v_attnum = LLVMConstInt(LLVMInt32Type(), op->d.var.attnum, false);
            value = LLVMBuildLoad(builder, LLVMBuildGEP(builder, v_innervalues, &v_attnum, 1, ""), "");
            isnull = LLVMBuildLoad(builder, LLVMBuildGEP(builder, v_innernulls, &v_attnum, 1, ""), "");
            LLVMBuildStore(builder, value, v_resvaluep);
            LLVMBuildStore(builder, isnull, v_resnullp);
            LLVMBuildBr(builder, opblocks[i + 1]);
            break;
        }
for JITed evaluation.
I'm not entirely thrilled with the idea of this being a configure-time
decision, because that forces packagers to decide for their entire
audience whether it's okay to depend on LLVM. That would be an untenable
position to put e.g. Red Hat's packagers in: either they screw the people
who want performance or they screw the people who want security.
Hm. I've a bit of a hard time buying the security argument here. Having
LLVM (not clang!) installed doesn't really change the picture that
much. In either case you can install binaries, and you're very likely
already using some program that does JIT internally. And postgres itself
gives you plenty of ways to execute arbitrary code as superuser.
The argument for not installing a C compiler seems to be that it makes it
less convenient to build an executable. I doubt that having a C(++)
library for code generation is convenient enough to change the picture
there.
I think it'd be all right if we can build this so that the direct
dependency on LLVM is confined to a separately-packageable extension.
That way, a packager can produce a core postgresql-server package
that does not require LLVM, plus a postgresql-llvm package that does,
and the "no compiler please" crowd simply doesn't install the latter
package.
That should be possible, but I'm not sure it's worth the effort. The JIT
infrastructure will need resowner integration and such. We can obviously
split things so that part is independent of LLVM, but I'm unconvinced
that the benefit is large enough.
The alternative would be to produce two independent builds of the
server, which I suppose might be acceptable but it sure seems like
a kluge, or at least something that simply wouldn't get done by
most vendors.
Hm. We could make that a make target ourselves ;)
Regards,
Andres
On 2016-12-06 14:35:43 -0600, Nico Williams wrote:
On Tue, Dec 06, 2016 at 12:27:51PM -0800, Andres Freund wrote:
On 2016-12-06 14:19:21 -0600, Nico Williams wrote:
I concur with your feeling that hand-rolled JIT is right out. But
Yeah, that way lies maintenance madness.
I'm not quite that sure about that. I had a lot of fun doing some
hand-rolled x86 JITing. Not that that is a ward against me being mad. But
more seriously: Manually doing a JIT gives you a lot faster compilation
times, which makes JIT applicable in a lot more situations.
What I meant is that each time there are new ISA extensions, or
differences in how relevant/significant different implementations of the
same ISA implement certain instructions, and/or every time you want to
add a new architecture... someone has to do a lot of very low-level
work.
Yea, that's why I didn't pursue this path further. I *personally* think
it'd be perfectly fine to only support JITing on linux x86_64 and
aarch64 for now. And those I'd be willing to work on. But since I know
that's not project policy...
- Andres
On Tue, Dec 06, 2016 at 12:36:41PM -0800, Andres Freund wrote:
On 2016-12-06 15:25:44 -0500, Tom Lane wrote:
I'm not entirely thrilled with the idea of this being a configure-time
decision, because that forces packagers to decide for their entire
audience whether it's okay to depend on LLVM. That would be an untenable
position to put e.g. Red Hat's packagers in: either they screw the people
who want performance or they screw the people who want security.
There's no security issue. The dependency is on LLVM libraries, not
LLVM front-ends (e.g., clang(1)).
I don't think there's a real issue as to distros/packagers/OS vendors.
They already have to package LLVM, and they already package LLVM
libraries separately from LLVM front-ends.
The argument for not installing a C compiler seems to be that it makes it
less convenient to build an executable. I doubt that having a C(++)
library for code generation is convenient enough to change the picture
there.
The security argument goes back to the days of the Morris worm, which
depended on having developer tools (specifically in that case, ld(1),
the link-editor). But JIT via LLVM won't give hackers a way to generate
or link arbitrary object code.
Nico
On Mon, Dec 5, 2016 at 7:49 PM, Andres Freund <andres@anarazel.de> wrote:
I tried to address 2) by changing the C implementation. That brings some
measurable speedups, but it's not huge. A bigger speedup is making
slot_getattr, slot_getsomeattrs, slot_getallattrs very trivial wrappers;
but it's still not huge. Finally I turned to just-in-time (JIT)
compiling the code for tuple deforming. That doesn't save the cost of
1), but it gets rid of most of 2) (from ~15% to ~3% in TPCH-Q01). The
first part is done in 0008, the JITing in 0012.
A more complete motivating example would be nice. For example, it
would be nice to see the overall speedup for some particular TPC-H
query.
--
Peter Geoghegan
On 2016-12-06 13:27:14 -0800, Peter Geoghegan wrote:
On Mon, Dec 5, 2016 at 7:49 PM, Andres Freund <andres@anarazel.de> wrote:
I tried to address 2) by changing the C implementation. That brings some
measurable speedups, but it's not huge. A bigger speedup is making
slot_getattr, slot_getsomeattrs, slot_getallattrs very trivial wrappers;
but it's still not huge. Finally I turned to just-in-time (JIT)
compiling the code for tuple deforming. That doesn't save the cost of
1), but it gets rid of most of 2) (from ~15% to ~3% in TPCH-Q01). The
first part is done in 0008, the JITing in 0012.
A more complete motivating example would be nice. For example, it
would be nice to see the overall speedup for some particular TPC-H
query.
Well, it's a bit WIP-y for that - not all TPCH queries run JITed yet, as
I've not done that for enough expression types... And you run quickly
into other bottlenecks.
But here we go for TPCH (scale 10) Q01:
master:
Time: 33885.381 ms
profile:
16.29% postgres postgres [.] slot_getattr
12.85% postgres postgres [.] ExecMakeFunctionResultNoSets
10.85% postgres postgres [.] advance_aggregates
6.91% postgres postgres [.] slot_deform_tuple
6.70% postgres postgres [.] advance_transition_function
4.59% postgres postgres [.] ExecProject
4.25% postgres postgres [.] float8_accum
3.69% postgres postgres [.] tuplehash_insert
2.39% postgres postgres [.] float8pl
2.20% postgres postgres [.] bpchareq
2.03% postgres postgres [.] check_stack_depth
(note that all expression evaluated things are distributed among many
functions)
dev (no jiting):
Time: 30343.532 ms
profile:
16.57% postgres postgres [.] slot_deform_tuple
13.39% postgres postgres [.] ExecEvalExpr
8.64% postgres postgres [.] advance_aggregates
8.58% postgres postgres [.] advance_transition_function
5.83% postgres postgres [.] float8_accum
5.14% postgres postgres [.] tuplehash_insert
3.89% postgres postgres [.] float8pl
3.60% postgres postgres [.] slot_getattr
2.66% postgres postgres [.] bpchareq
2.56% postgres postgres [.] heap_getnext
dev (jiting):
SET jit_tuple_deforming = on;
SET jit_expressions = true;
Time: 24439.803 ms
profile:
11.11% postgres postgres [.] slot_deform_tuple
10.87% postgres postgres [.] advance_aggregates
9.74% postgres postgres [.] advance_transition_function
6.53% postgres postgres [.] float8_accum
5.25% postgres postgres [.] tuplehash_insert
4.31% postgres perf-10698.map [.] deform0
3.68% postgres perf-10698.map [.] evalexpr6
3.53% postgres postgres [.] slot_getattr
3.41% postgres postgres [.] float8pl
2.84% postgres postgres [.] bpchareq
(note how expression eval went from 13.39% to roughly 4%)
The slot_deform_tuple cost here is primarily cache misses. If you do the
"memory order" iteration, it drops significantly.
The JIT generated code still leaves a lot on the table, i.e. this is
definitely not the best we can do. We also deform half the tuple twice,
because I've not yet added support for starting to deform in the middle
of a tuple.
Independent of new expression evaluation and/or JITing, if you make
advance_aggregates and advance_transition_function inline functions (or
you do profiling accounting for children), you'll notice that ExecAgg()
+ advance_aggregates + advance_transition_function themselves take up
about 20% cpu-time. That's *not* including the hashtable management,
the actual transition functions, and such themselves.
If you have queries where tuple deforming is a bigger proportion of the
load, or where expression evaluation (including projection) is a larger
part (any NULLs e.g.) you can get a lot bigger wins, even without
actually optimizing the generated code (which I've not yet done).
Just btw: float8_accum really should use an internal aggregation type
instead of using a postgres array...
Andres
On 7 December 2016 at 04:13, Robert Haas <robertmhaas@gmail.com> wrote:
I wonder how feasible it would be to make this a run-time dependency
rather than a compile option.
Or something that's compiled with the server, but produces a separate
.so that's the only thing that links to LLVM. So packagers can avoid a
dependency on LLVM for postgres.
I suspect it wouldn't be worth the complexity, the added indirection
necessary, etc. If you're using packages then pulling in LLVM isn't a
big deal. If you're not, then don't use --with-llvm .
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 7 December 2016 at 14:39, Craig Ringer <craig@2ndquadrant.com> wrote:
On 7 December 2016 at 04:13, Robert Haas <robertmhaas@gmail.com> wrote:
I wonder how feasible it would be to make this a run-time dependency
rather than a compile option.
Or something that's compiled with the server, but produces a separate
.so that's the only thing that links to LLVM. So packagers can avoid a
dependency on LLVM for postgres.
Ahem, next time I'll finish the thread first. Never mind.
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services