Query is over 2x slower with jit=on
Hi all.
I don't know whether this is expected or not but I'm experiencing over 2x
slowdown on a large query in PG-11 with JIT=on.
(query is a prepared statement executed with "explain analyze execute
myprepared(arg1, arg2, ..., argn)")
After 10 executions these are the results (the first 5 executed in > 16s, then
the plan changed)
With jit=on:
https://explain.depesz.com/s/vYB
Planning Time: 0.336 ms
JIT:
Functions: 716
Generation Time: 78.404 ms
Inlining: false
Inlining Time: 0.000 ms
Optimization: false
Optimization Time: 43.916 ms
Emission Time: 600.031 ms
Execution Time: 2035.150 ms
(385 rows)
With jit=off:
https://explain.depesz.com/s/X6mA
Planning Time: 0.371 ms
Execution Time: 833.941 ms
(377 rows)
Both are master as of 55d26ff638f063fbccf57843f2c27f9795895a5c
The query largely consists of CTEs with aggregation which are FULL OUTER
JOIN'ed.
On v10 the query executes in:
Execution time: 1159.628 ms
So v11 (with jit=off) is about 25% faster (due to parallel hash-join I think),
which is nice!
What's the deal with jit making it slower?
--
Andreas Joseph Krogh
On 2018-04-18 17:35:31 +0200, Andreas Joseph Krogh wrote:
With jit=on:
https://explain.depesz.com/s/vYB
Planning Time: 0.336 ms
JIT:
Functions: 716
Generation Time: 78.404 ms
Inlining: false
Inlining Time: 0.000 ms
Optimization: false
Optimization Time: 43.916 ms
Emission Time: 600.031 ms
Any chance this is a debug LLVM build?
What's the deal with jit making it slower?
JIT has cost, and sometimes it's not beneficial. Here our heuristics
when to JIT appear to be a bit off. In the parallel world it's worse
because the JITing is duplicated for parallel workers atm.
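The heuristics referred to here are the JIT cost thresholds. As a sketch (values shown are the PostgreSQL 11 defaults), they can be raised per session when they misfire on a plan like this:

```sql
-- JIT fires when the plan's total cost exceeds jit_above_cost;
-- inlining and the LLVM optimizer kick in at the higher thresholds.
SET jit_above_cost = 100000;
SET jit_inline_above_cost = 500000;
SET jit_optimize_above_cost = 500000;
-- Or simply turn JIT off for a session where it is not beneficial:
SET jit = off;
```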
Greetings,
Andres Freund
On Wednesday 18 April 2018 at 17:50:55, Andres Freund <andres@anarazel.de> wrote:
On 2018-04-18 17:35:31 +0200, Andreas Joseph Krogh wrote:
With jit=on:
https://explain.depesz.com/s/vYB
Planning Time: 0.336 ms
JIT:
Functions: 716
Generation Time: 78.404 ms
Inlining: false
Inlining Time: 0.000 ms
Optimization: false
Optimization Time: 43.916 ms
Emission Time: 600.031 ms
Any chance this is a debug LLVM build?
What's the deal with jit making it slower?
JIT has cost, and sometimes it's not beneficial. Here our heuristics
when to JIT appear to be a bit off. In the parallel world it's worse
because the JITing is duplicated for parallel workers atm.
PostgreSQL is built with "--enable-debug --with-llvm". LLVM is the one that
comes with Ubuntu 17.10.
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andreas@visena.com
www.visena.com
On 18 April 2018 at 16:50, Andres Freund <andres@anarazel.de> wrote:
On 2018-04-18 17:35:31 +0200, Andreas Joseph Krogh wrote:
With jit=on:
https://explain.depesz.com/s/vYB
Planning Time: 0.336 ms
JIT:
Functions: 716
Generation Time: 78.404 ms
Inlining: false
Inlining Time: 0.000 ms
Optimization: false
Optimization Time: 43.916 ms
Emission Time: 600.031 ms
Any chance this is a debug LLVM build?
What's the deal with jit making it slower?
JIT has cost, and sometimes it's not beneficial. Here our heuristics
when to JIT appear to be a bit off. In the parallel world it's worse
because the JITing is duplicated for parallel workers atm.
Please change the name of the "JIT" parameter to something meaningful
to humans before this gets too far into the wild.
SSL is somewhat understandable because it's not a Postgres-private term.
geqo is regrettable and we really don't want any more too
short/abbreviated parameter names.
Think of our EOU if every GUC was a TLA.
Thanks
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 04/18/2018 12:27 PM, Simon Riggs wrote:
Please change the name of the "JIT" parameter to something meaningful
to humans before this gets too far into the wild.
SSL is somewhat understandable because it's not a Postgres-private term.
JIT is hardly a Postgres-private term. It's a familiar term in a
widespread community, though I might cede the point that the
community in question is more compiler wonks than DBAs.
-Chap
On April 18, 2018 9:50:48 AM PDT, Chapman Flack <chap@anastigmatix.net> wrote:
On 04/18/2018 12:27 PM, Simon Riggs wrote:
Please change the name of the "JIT" parameter to something meaningful
to humans before this gets too far into the wild.
SSL is somewhat understandable because it's not a Postgres-private term.
JIT is hardly a Postgres-private term. It's a familiar term in a
widespread community, though I might cede the point that the
community in question is more compiler wonks than DBAs.
You're right. There's another thread where we discussed this. Also due to concern of Simon's. Can we not do so separately here?
Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
On Wed, Apr 18, 2018 at 11:50 AM, Andres Freund <andres@anarazel.de> wrote:
JIT has cost, and sometimes it's not beneficial. Here our heuristics
when to JIT appear to be a bit off. In the parallel world it's worse
because the JITing is duplicated for parallel workers atm.
It seems like you're describing it as if the JIT just didn't produce
gains sufficient to make up for the cost of doing it, but that's not
really the issue here AFAICS. Here the JIT actually made code that
ran slower than the un-JIT-ted code. That seems like a different sort
of problem.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On April 18, 2018 12:16:35 PM PDT, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Apr 18, 2018 at 11:50 AM, Andres Freund <andres@anarazel.de>
wrote:
JIT has cost, and sometimes it's not beneficial. Here our heuristics
when to JIT appear to be a bit off. In the parallel world it's worse
because the JITing is duplicated for parallel workers atm.
It seems like you're describing it as if the JIT just didn't produce
gains sufficient to make up for the cost of doing it, but that's not
really the issue here AFAICS. Here the JIT actually made code that
ran slower than the un-JIT-ted code. That seems like a different sort
of problem.
Not convinced that that is true - the issue is more likely that JIT work in workers is counted as execute time... Gotta add that somehow, not sure what the best way would be.
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
On Wed, Apr 18, 2018 at 3:29 PM, Andres Freund <andres@anarazel.de> wrote:
Not convinced that that is true - the issue is more likely that JIT work in workers is counted as execute time... Gotta add that somehow, not sure what the best way would be.
Oh, that does seem like something that should be fixed. If that's
what is happening here, it's bound to confuse a lot of people.
Probably you need to add some code to
ExecParallelRetrieveInstrumentation.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wednesday 18 April 2018 at 18:26:03, Andreas Joseph Krogh <andreas@visena.com> wrote:
On Wednesday 18 April 2018 at 17:50:55, Andres Freund <andres@anarazel.de> wrote:
On 2018-04-18 17:35:31 +0200, Andreas Joseph Krogh wrote:
With jit=on:
https://explain.depesz.com/s/vYB
Planning Time: 0.336 ms
JIT:
Functions: 716
Generation Time: 78.404 ms
Inlining: false
Inlining Time: 0.000 ms
Optimization: false
Optimization Time: 43.916 ms
Emission Time: 600.031 ms
Any chance this is a debug LLVM build?
What's the deal with jit making it slower?
JIT has cost, and sometimes it's not beneficial. Here our heuristics
when to JIT appear to be a bit off. In the parallel world it's worse
because the JITing is duplicated for parallel workers atm.
PostgreSQL is built with "--enable-debug --with-llvm". LLVM is the one that
comes with Ubuntu 17.10.
Some more info;
Without --enable-debug (seems this didn't impact performance much):
First 5 executions (prepared-statement issued from psql)
Planning Time: 47.634 ms
JIT:
Functions: 725
Generation Time: 74.748 ms
Inlining: true
Inlining Time: 90.763 ms
Optimization: true
Optimization Time: 5822.516 ms
Emission Time: 3089.127 ms
Execution Time: 16375.996 ms
After 5 executions (prepared-statement issued from psql)
Planning Time: 0.385 ms
JIT:
Functions: 716
Generation Time: 76.382 ms
Inlining: false
Inlining Time: 0.000 ms
Optimization: false
Optimization Time: 41.709 ms
Emission Time: 613.074 ms
Execution Time: 2031.830 ms
jit=off:
Planning Time: 0.171 ms
Execution Time: 832.489 ms
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andreas@visena.com
www.visena.com
Hi,
On 2018-04-18 18:37:30 -0400, Robert Haas wrote:
On Wed, Apr 18, 2018 at 3:29 PM, Andres Freund <andres@anarazel.de> wrote:
Not convinced that that is true - the issue is more likely that JIT work in workers is counted as execute time... Gotta add that somehow, not sure what the best way would be.
Oh, that does seem like something that should be fixed. If that's
what is happening here, it's bound to confuse a lot of people.
Probably you need to add some code to
ExecParallelRetrieveInstrumentation.
I had lost track of this, and we unfortunately hadn't added an open item
back then. I think we should add it now?
RMT (with me recused), do you think we should accept the new code fixing
this would entail? And thus that this should be an open item? It's
arguably a new feature, although I don't find that a terribly convincing
position.
Greetings,
Andres Freund
On Aug 22, 2018, at 12:12 PM, Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2018-04-18 18:37:30 -0400, Robert Haas wrote:
On Wed, Apr 18, 2018 at 3:29 PM, Andres Freund <andres@anarazel.de> wrote:
Not convinced that that is true - the issue is more likely that JIT work in workers is counted as execute time... Gotta add that somehow, not sure what the best way would be.
Oh, that does seem like something that should be fixed. If that's
what is happening here, it's bound to confuse a lot of people.
Probably you need to add some code to
ExecParallelRetrieveInstrumentation.
I had lost track of this, and we unfortunately hadn't added an open item
back then. I think we should add it now?
RMT (with me recused), do you think we should accept the new code fixing
this would entail? And thus that this should be an open item? It's
arguably a new feature, although I don't find that a terribly convincing
position.
Reviewed the earlier discussion. I can argue this both ways.
As it stands right now it seems like we will be defaulting to “jit = off”, which
means there will only be so many people who will be using jit in PG11,
and as such we could provide the patch for 12.
However, for the people who do enable JIT in PG11, this is a scenario
that would cause the query to perform in an unexpected way that is no
fault of their own, which one would argue is a bug. I have not tried to
reproduce it myself, but from looking at the explain plans from Andreas,
I’m confident I could craft a moderately complex query that could
demonstrate the performance degradation with jit = on.
While adding the code may constitute a new feature, it is a feature
that would prevent user frustration on something that we are highlighting as a
“major feature” of PG11, even if it’s not enabled by default.
What I would ask Andres is how complex the patch will be and how much time
it will take. As such I would +1 it for open items right now, unless it seems like
this could take a significant amount of time to come up with a proper patch, in which
case I would reconsider; but at least with it on open items we would be able
to review it continuously.
Thanks,
Jonathan
On Wednesday 22 August 2018 at 18:12:41, Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2018-04-18 18:37:30 -0400, Robert Haas wrote:
On Wed, Apr 18, 2018 at 3:29 PM, Andres Freund <andres@anarazel.de> wrote:
Not convinced that that is true - the issue is more likely that JIT work
in workers is counted as execute time... Gotta add that somehow, not sure what
the best way would be.
Oh, that does seem like something that should be fixed. If that's
what is happening here, it's bound to confuse a lot of people.
Probably you need to add some code to
ExecParallelRetrieveInstrumentation.
I had lost track of this, and we unfortunately hadn't added an open item
back then. I think we should add it now?
RMT (with me recused), do you think we should accept the new code fixing
this would entail? And thus that this should be an open item? It's
arguably a new feature, although I don't find that a terribly convincing
position.
Greetings,
Andres Freund
Just to be clear; The query really runs slower (wall-clock time), it's not
just the timing.
--
Andreas Joseph Krogh
On 2018-08-22 18:39:18 +0200, Andreas Joseph Krogh wrote:
Just to be clear; The query really runs slower (wall-clock time), it's not
just the timing.
I bet it's not actually running slower, it "just" takes longer to start
up due to the JITing in each worker. I suspect what we should do is to
multiply the cost limits by the number of workers, to model that. But
without the fixed instrumentation that's harder to see...
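One way to check whether per-worker JIT startup is what is being measured here would be to take parallelism out of the picture (a diagnostic sketch; myprepared is the placeholder name from upthread):

```sql
-- With no parallel workers, JIT setup happens exactly once in the leader,
-- so the gap between jit=on and jit=off isolates the JIT cost itself.
SET max_parallel_workers_per_gather = 0;
SET jit = on;
EXPLAIN ANALYZE EXECUTE myprepared(arg1, arg2);
```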
Greetings,
Andres Freund
On Wednesday 22 August 2018 at 18:51:55, Andres Freund <andres@anarazel.de> wrote:
On 2018-08-22 18:39:18 +0200, Andreas Joseph Krogh wrote:
Just to be clear; The query really runs slower (wall-clock time), it's not
just the timing.
I bet it's not actually running slower, it "just" takes longer to start
up due to the JITing in each worker. I suspect what we should do is to
multiply the cost limits by the number of workers, to model that. But
without the fixed instrumentation that's harder to see...
Well, yes, that might be. By "runs" I meant from me hitting ENTER in psql to
the time the query finishes...
I thought JITing of prepared queries happened once (in "prepare") so it
didn't have to do the JITing every time the query is executed. Isn't
the previously generated bytecode usable for subsequent queries?
--
Andreas Joseph Krogh
On Wednesday, August 22, 2018 6:51:55 PM CEST Andres Freund wrote:
On 2018-08-22 18:39:18 +0200, Andreas Joseph Krogh wrote:
Just to be clear; The query really runs slower (wall-clock time), it's not
just the timing.
I bet it's not actually running slower, it "just" takes longer to start
up due to the JITing in each worker. I suspect what we should do is to
multiply the cost limits by the number of workers, to model that. But
without the fixed instrumentation that's harder to see...
It depends on the query. It has been shown in other threads that a query can
indeed take longer to run because of JITing: if the cost is too low to fire the
LLVM optimizer, the generated code can be so bad that it will be slower than the
non-JIT executor.
Cf. for instance a previous discussion here: http://www.postgresql-archive.org/PATCH-LLVM-tuple-deforming-improvements-td6029385.html
I think it would be interesting to try the query from this thread with a patch
forcing the LLVM codegen to O1 (I found no PassManager there to play with; it
seems to be an off/on/extreme switch; see patch 0001-LLVM-Use-the-O1-CodeGen-level.patch
from the thread mentioned above).
On 2018-08-22 19:51:12 +0200, Andreas Joseph Krogh wrote:
I thought JITing of prepared queries happened once (in "prepare")
No, it happens when the first JITed function is executed.
so it didn't have to do the JITing every time the query is
executed. Isn't the previously generated bytecode usable for
subsequent queries?
No, not currently. There are some reasons preventing that (primarily that
we currently rely on addresses of certain things not to change during
execution). There's ongoing work to change that, but that's certainly
not going to be ready for v11.
Greetings,
Andres Freund
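The point above can be seen directly in the EXPLAIN output: the "JIT:" block, with fresh generation and emission times, appears on every execution of the same prepared statement, not just the first (a sketch; myprepared is the placeholder name from upthread):

```sql
SET jit = on;
-- Each EXECUTE shows its own "JIT:" section with non-zero
-- Generation/Emission times, confirming the code is rebuilt per execution.
EXPLAIN ANALYZE EXECUTE myprepared(arg1, arg2);
EXPLAIN ANALYZE EXECUTE myprepared(arg1, arg2);
```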
On Wednesday 22 August 2018 at 20:52:05, Andres Freund <andres@anarazel.de> wrote:
On 2018-08-22 19:51:12 +0200, Andreas Joseph Krogh wrote:
I thought JITing of prepared queries happened once (in "prepare")
No, it happens when the first JITed function is executed.
so it didn't have to do the JITing every time the query is
executed. Isn't the previously generated bytecode usable for
subsequent queries?
No, not currently. There are some reasons preventing that (primarily that
we currently rely on addresses of certain things not to change during
execution). There's ongoing work to change that, but that's certainly
not going to be ready for v11.
Greetings,
Andres Freund
Ok, thanks for clarifying.
--
Andreas Joseph Krogh
On 2018-08-22 20:48:48 +0200, Pierre Ducroquet wrote:
On Wednesday, August 22, 2018 6:51:55 PM CEST Andres Freund wrote:
On 2018-08-22 18:39:18 +0200, Andreas Joseph Krogh wrote:
Just to be clear; The query really runs slower (wall-clock time), it's not
just the timing.
I bet it's not actually running slower, it "just" takes longer to start
up due to the JITing in each worker. I suspect what we should do is to
multiply the cost limits by the number of workers, to model that. But
without the fixed instrumentation that's harder to see...
It depends on the query. It has been shown in other threads that a query can
indeed take longer to run because of JITing: if the cost is too low to fire the
LLVM optimizer, the generated code can be so bad that it will be slower than the
non-JIT executor.
This largely seems to be orthogonal to what I'm talking about.
Cf for instance a previous discussion here : http://www.postgresql-archive.org/PATCH-LLVM-tuple-deforming-improvements-td6029385.html
I wish people stopped using www.postgresql-archive.org. It's *NOT*
maintained by postgresql.org; in fact I do not know who does maintain it. It does show
ads when downloading links, which I'm personally not ok with.
Greetings,
Andres Freund
On Aug 22, 2018, at 2:58 PM, Andreas Joseph Krogh <andreas@visena.com> wrote:
On Wednesday 22 August 2018 at 20:52:05, Andres Freund <andres@anarazel.de> wrote:
On 2018-08-22 19:51:12 +0200, Andreas Joseph Krogh wrote:
I thought JITing of prepared queries happened once (in "prepare")
No, it happens when the first JITed function is executed.
so it didn't have to do the JITing every time the query is
executed. Isn't the previously generated bytecode usable for
subsequent queries?
No, not currently. There are some reasons preventing that (primarily that
we currently rely on addresses of certain things not to change during
execution). There's ongoing work to change that, but that's certainly
not going to be ready for v11.
Greetings,
Andres Freund
Ok, thanks for clarifying.
Per an earlier note[1] I was able to reproduce this issue with similar results to
Andreas while running 11 Beta 3.
jit = on
https://explain.depesz.com/s/vgzD
Planning Time: 0.921 ms
JIT:
Functions: 193
Generation Time: 121.595 ms
Inlining: false
Inlining Time: 0.000 ms
Optimization: false
Optimization Time: 58.045 ms
Emission Time: 1201.100 ms
Execution Time: 1628.017 ms
jit = off
https://explain.depesz.com/s/AvZM
Planning Time: 1.398 ms
Execution Time: 256.473 ms
I increased the search range I used in the query by 3x, and got these numbers:
jit=on
Planning Time: 0.931 ms
JIT:
Functions: 184
Generation Time: 126.587 ms
Inlining: true
Inlining Time: 98.865 ms
Optimization: true
Optimization Time: 20518.982 ms
Emission Time: 7270.963 ms
Execution Time: 28772.576 ms
jit=off
Planning Time: 1.527 ms
Execution Time: 959.160 ms
So, I would +1 this for open items.
Jonathan
[1]: /messages/by-id/7F838324-064B-4A24-952C-2800CFBD39D6@postgresql.org