bad JIT decision

Started by Scott Ribe · over 5 years ago · 26 messages · general
#1Scott Ribe
scott_ribe@elevated-dev.com

I have come across a case where PG 12 with default JIT settings makes a dramatically bad decision. PG 11 without JIT executes the query in <1ms; PG 12 with JIT takes 7s, and EXPLAIN ANALYZE attributes all of that time to JIT. (The plan is the same on both 11 and 12; the difference is just the JIT.)
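A quick way to confirm that JIT, and not a plan change, is responsible is to toggle it per session and re-run (the query itself is whatever is under test; `jit` is an ordinary GUC):

```sql
-- If the query drops back to <1ms with jit = off, the extra time is JIT
-- overhead rather than a different plan.
SET jit = off;
EXPLAIN (ANALYZE) SELECT ...;  -- the query under test

SET jit = on;
EXPLAIN (ANALYZE) SELECT ...;  -- output now ends with a "JIT:" summary block
```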

It is a complex query, with joins to subqueries etc.; there is a decent amount of data (~50M rows) and around 80 partitions (by date) on the main table. The particular query that I'm testing is intended as a sort of base case, in that it queries on a small set (4) of unique ids which will not match any rows, so the complex bits never get executed. This is reflected in the plan, where the innermost section is:

-> Index Scan using equities_rds_id on equities e0 (cost=0.42..33.74 rows=1 width=37) (actual time=6751.892..6751.892 rows=0 loops=1)
Index Cond: (rds_id = ANY ('{..., ..., ..., ...}'::uuid[]))
Filter: (security_type = 'ETP'::text)
Rows Removed by Filter: 4

And that is ultimately followed by a couple of sets of 80-ish scans of partitions, which show as (never executed), pretty much as expected since there are no rows left to check. The final bit is:

JIT:
Functions: 683
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 86.439 ms, Inlining 21.994 ms, Optimization 3900.318 ms, Emission 2561.409 ms, Total 6570.161 ms

Now I think the query is not so complex that there could possibly be 683 distinct functions. I think this count must be the result of a smaller number of functions created per partition. I can understand how that would make sense, and some testing in which I added conditions that would restrict the matches to a single partition seems to bear it out (JIT reports 79 functions in that case).

Given the magnitude of the miss in using JIT here, I am wondering: is it possible that the planner does not properly take into account the cost of JIT'ing a function for multiple partitions? Or is it that the planner doesn't have enough info about the restrictiveness of conditions, and is therefore anticipating running the functions against a great many rows?

--
Scott Ribe
scott_ribe@elevated-dev.com
https://www.linkedin.com/in/scottribe/

#2David Rowley
dgrowleyml@gmail.com
In reply to: Scott Ribe (#1)
Re: bad JIT decision

On Sat, 25 Jul 2020 at 08:46, Scott Ribe <scott_ribe@elevated-dev.com> wrote:

Given the magnitude of the miss in using JIT here, I am wondering: is it possible that the planner does not properly take into account the cost of JIT'ing a function for multiple partitions? Or is it that the planner doesn't have enough info about the restrictiveness of conditions, and is therefore anticipating running the functions against a great many rows?

It does not really take into account the cost of jitting. If the total
plan cost is above the jit threshold then jit is enabled. If not, then
it's disabled.

There are various levels of jit and various thresholds that can be tweaked, see:

select name,setting from pg_settings where name like '%jit%';

But as far as each threshold goes, you either reach it or you don't.
Maybe that can be made better by considering jit in a more cost-based
way rather than by threshold, that way it might be possible to
consider jit per plan node rather than on the query as a whole. E.g.,
if you have 1000 partitions and 999 of them have 1 row and the final
one has 1 billion rows, then it's likely a waste of time to jit
expressions for the 999 partitions.

However, for now, you might just want to try raising various jit
thresholds so that it only is enabled for more expensive plans.
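Concretely (the GUC names are from the PG 12 docs; the raised values below are illustrative, not recommendations):

```sql
-- See all JIT-related settings and their current values:
SELECT name, setting FROM pg_settings WHERE name LIKE '%jit%';

-- Raise the thresholds so JIT only engages for genuinely expensive plans.
-- PG 12 defaults: jit_above_cost = 100000, jit_inline_above_cost = 500000,
-- jit_optimize_above_cost = 500000.
SET jit_above_cost = 1000000;
SET jit_inline_above_cost = 5000000;
SET jit_optimize_above_cost = 5000000;
```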

David

#3Scott Ribe
scott_ribe@elevated-dev.com
In reply to: David Rowley (#2)
Re: bad JIT decision

On Jul 24, 2020, at 4:26 PM, David Rowley <dgrowleyml@gmail.com> wrote:

It does not really take into account the cost of jitting.

That is what I was missing.

I read about JIT when 12 was pre-release; re-reading after my post, I see that it does not attempt to estimate JIT cost. And in thinking about it, I realize it would be next to impossible to anticipate how expensive LLVM optimization is going to be.

In the case where a set of functions is replicated across partitions, it would be possible to do them once, then project the cost of the copies. Perhaps for PG 14, as better support for the combination of JIT optimization and highly-partitioned data ;-)

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#2)
Re: bad JIT decision

David Rowley <dgrowleyml@gmail.com> writes:

However, for now, you might just want to try raising various jit
thresholds so that it only is enabled for more expensive plans.

Yeah. I'm fairly convinced that the v12 defaults are far too low,
because we are constantly seeing complaints of this sort.

regards, tom lane

#5David Rowley
dgrowleyml@gmail.com
In reply to: Tom Lane (#4)
Re: bad JIT decision

On Sat, 25 Jul 2020 at 10:37, Tom Lane <tgl@sss.pgh.pa.us> wrote:

David Rowley <dgrowleyml@gmail.com> writes:

However, for now, you might just want to try raising various jit
thresholds so that it only is enabled for more expensive plans.

Yeah. I'm fairly convinced that the v12 defaults are far too low,
because we are constantly seeing complaints of this sort.

I think plan cost overestimation is a common cause of unwanted jit too.

It would be good to see the EXPLAIN ANALYZE so we knew if that was the
case here.

David

#6Scott Ribe
scott_ribe@elevated-dev.com
In reply to: Tom Lane (#4)
Re: bad JIT decision

On Jul 24, 2020, at 4:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Yeah. I'm fairly convinced that the v12 defaults are far too low,
because we are constantly seeing complaints of this sort.

They are certainly too low for our case; I'm not sure whether they're also way too low for folks who are not partitioning.

The passive-aggressive approach would really not be good general advice for you, but I'm actually glad that in our case they were low enough to get our attention early ;-)

I think I will disable optimization, because with our partitioning scheme we will commonly see blow ups of optimization time like this one.

The inlining time in this case is still much more than the query itself, but it is low enough not to be noticed by users, and I think that with different variations of the parameters coming into the query, inlining will help the slower versions (the ones where more partitions require actual scans). Slowing down the fastest while speeding up the slower is a trade-off we can take.
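One way to do that, assuming the standard GUCs: setting jit_optimize_above_cost to -1 disables the expensive optimization pass while leaving JIT itself and inlining available.

```sql
-- Keep JIT compilation and inlining, but never run the costly LLVM
-- optimization pass (-1 disables that tier entirely):
ALTER SYSTEM SET jit_optimize_above_cost = -1;
SELECT pg_reload_conf();

-- Or per session, for testing:
SET jit_optimize_above_cost = -1;
```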

#7Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#4)
Re: bad JIT decision

Hi,

On 2020-07-24 18:37:02 -0400, Tom Lane wrote:

David Rowley <dgrowleyml@gmail.com> writes:

However, for now, you might just want to try raising various jit
thresholds so that it only is enabled for more expensive plans.

Yeah. I'm fairly convinced that the v12 defaults are far too low,
because we are constantly seeing complaints of this sort.

I think the issue is more that we need to take into account that the
overhead of JITing scales ~linearly with the number of JITed
expressions. And that's not done right now. I've had a patch somewhere
that had a prototype implementation of changing the costing to be
#expressions * some_cost, and I think that's a lot more accurate.

Greetings,

Andres Freund

#8Andres Freund
andres@anarazel.de
In reply to: Scott Ribe (#3)
Re: bad JIT decision

Hi,

On Fri, Jul 24, 2020, at 15:32, Scott Ribe wrote:

On Jul 24, 2020, at 4:26 PM, David Rowley <dgrowleyml@gmail.com> wrote:

It does not really take into account the cost of jitting.

That is what I was missing.

I read about JIT when 12 was pre-release; in re-reading after my post I
see that it does not attempt to estimate JIT cost. And in thinking
about it, I realize that would be next to impossible to anticipate how
expensive LLVM optimization was going to be.

We certainly can do better than now.

In the case where a set of functions is replicated across partitions,
it would be possible to do them once, then project the cost of the
copies.

Probably not - JITing functions separately is more expensive than doing them once... The bigger benefit there is to avoid optimizing functions that are likely to be the same.

Perhaps for PG 14 as better support for the combination of JIT
optimization and highly-partitioned data ;-)

If I posted a few patches to test / address some of these issues, could you test them with your schema & queries?

Regards,

Andres

#9David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#5)
Re: bad JIT decision

On Sat, 25 Jul 2020 at 10:42, David Rowley <dgrowleyml@gmail.com> wrote:

On Sat, 25 Jul 2020 at 10:37, Tom Lane <tgl@sss.pgh.pa.us> wrote:

David Rowley <dgrowleyml@gmail.com> writes:

However, for now, you might just want to try raising various jit
thresholds so that it only is enabled for more expensive plans.

Yeah. I'm fairly convinced that the v12 defaults are far too low,
because we are constantly seeing complaints of this sort.

I think plan cost overestimation is a common cause of unwanted jit too.

It would be good to see the EXPLAIN ANALYZE so we knew if that was the
case here.

So Scott did send me the full EXPLAIN ANALYZE for this privately. He
wishes to keep the full output private.

After looking at it, it seems the portion that he pasted above, aka:

-> Index Scan using equities_rds_id on equities e0 (cost=0.42..33.74
rows=1 width=37) (actual time=6751.892..6751.892 rows=0 loops=1)
Index Cond: (rds_id = ANY ('{..., ..., ..., ...}'::uuid[]))
Filter: (security_type = 'ETP'::text)
Rows Removed by Filter: 4

Is nested at the bottom level join, about 6 joins deep. The lack of
any row being found results in upper level joins not having to do
anything, and the majority of the plan is (never executed).

David

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#9)
Re: bad JIT decision

David Rowley <dgrowleyml@gmail.com> writes:

On Sat, 25 Jul 2020 at 10:42, David Rowley <dgrowleyml@gmail.com> wrote:

I think plan cost overestimation is a common cause of unwanted jit too.
It would be good to see the EXPLAIN ANALYZE so we knew if that was the
case here.

So Scott did send me the full EXPLAIN ANALYZE for this privately. He
wishes to keep the full output private.

So ... what was the *top* line, ie total cost estimate?

regards, tom lane

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#7)
Re: bad JIT decision

Andres Freund <andres@anarazel.de> writes:

On 2020-07-24 18:37:02 -0400, Tom Lane wrote:

Yeah. I'm fairly convinced that the v12 defaults are far too low,
because we are constantly seeing complaints of this sort.

I think the issue is more that we need to take into account that the
overhead of JITing scales ~linearly with the number of JITed
expressions. And that's not done right now. I've had a patch somewhere
that had a prototype implementation of changing the costing to be
#expressions * some_cost, and I think that's a lot more accurate.

Another thing we could try with much less effort is scaling it by the
number of relations in the query. There's already some code in the
plancache that tries to estimate planning effort that way, IIRC.
Such a scaling would be very legitimate for the cost of compiling
tuple-deconstruction code, and for other expressions it'd kind of
amount to an assumption that the expressions-per-table ratio is
roughly constant. If you don't like that, maybe some simple
nonlinear growth rule would work.

regards, tom lane

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#9)
Re: bad JIT decision

David Rowley <dgrowleyml@gmail.com> writes:

... nested at the bottom level join, about 6 joins deep. The lack of
any row being found results in upper level joins not having to do
anything, and the majority of the plan is (never executed).

On re-reading this, that last point struck me forcibly. If most of
the plan never gets executed, could we avoid compiling it? That is,
maybe JIT isn't JIT enough, and we should make compilation happen
at first use of an expression not during executor startup.

regards, tom lane

#13David Rowley
dgrowleyml@gmail.com
In reply to: Tom Lane (#10)
Re: bad JIT decision

On Sun, 26 Jul 2020 at 02:17, Tom Lane <tgl@sss.pgh.pa.us> wrote:

David Rowley <dgrowleyml@gmail.com> writes:

On Sat, 25 Jul 2020 at 10:42, David Rowley <dgrowleyml@gmail.com> wrote:

I think plan cost overestimation is a common cause of unwanted jit too.
It would be good to see the EXPLAIN ANALYZE so we knew if that was the
case here.

So Scott did send me the full EXPLAIN ANALYZE for this privately. He
wishes to keep the full output private.

So ... what was the *top* line, ie total cost estimate?

Hash Right Join (cost=1200566.17..1461446.31 rows=1651 width=141)
(actual time=5881.944..5881.944 rows=0 loops=1)

So it's well above the standard jit inline and optimize costs.

David

#14David Rowley
dgrowleyml@gmail.com
In reply to: Tom Lane (#11)
Re: bad JIT decision

On Sun, 26 Jul 2020 at 02:23, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andres Freund <andres@anarazel.de> writes:

On 2020-07-24 18:37:02 -0400, Tom Lane wrote:

Yeah. I'm fairly convinced that the v12 defaults are far too low,
because we are constantly seeing complaints of this sort.

I think the issue is more that we need to take into account that the
overhead of JITing scales ~linearly with the number of JITed
expressions. And that's not done right now. I've had a patch somewhere
that had a prototype implementation of changing the costing to be
#expressions * some_cost, and I think that's a lot more accurate.

Another thing we could try with much less effort is scaling it by the
number of relations in the query. There's already some code in the
plancache that tries to estimate planning effort that way, IIRC.
Such a scaling would be very legitimate for the cost of compiling
tuple-deconstruction code, and for other expressions it'd kind of
amount to an assumption that the expressions-per-table ratio is
roughly constant. If you don't like that, maybe some simple
nonlinear growth rule would work.

I had imagined something a bit less all or nothing. I had thought
that the planner could pretty cheaply choose if jit should occur or
not on a per-Expr level. For WHERE clause items we know "norm_selec"
and we know what baserestrictinfos come before this RestrictInfo, so
we could estimate the number of executions per item in the WHERE
clause. For Exprs in the targetlist we have the estimated rows from
the RelOptInfo. HAVING clause Exprs will be evaluated a similar number
of times. The planner could do something along the lines of
assuming, say, 1000 * cpu_operator_cost to compile an Expr, then assume
that a compiled Expr will be some percentage faster than an evaluated
one and only jit when the Expr is likely to be evaluated enough times
for it to be an overall win. Optimize and inline would just have
higher thresholds.
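In other words, a sketch of the break-even condition David describes, with $n$ the estimated number of evaluations of the Expr, $c$ its per-evaluation cost, and $f$ the (assumed) fraction of that cost a compiled Expr saves:

```latex
n \cdot f \cdot c \;>\; 1000 \cdot \mathrm{cpu\_operator\_cost}
\quad\Longrightarrow\quad \text{jit this Expr}
```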

David

#15David Rowley
dgrowleyml@gmail.com
In reply to: Tom Lane (#12)
Re: bad JIT decision

On Sun, 26 Jul 2020 at 02:54, Tom Lane <tgl@sss.pgh.pa.us> wrote:

David Rowley <dgrowleyml@gmail.com> writes:

... nested at the bottom level join, about 6 joins deep. The lack of
any row being found results in upper level joins not having to do
anything, and the majority of the plan is (never executed).

On re-reading this, that last point struck me forcibly. If most of
the plan never gets executed, could we avoid compiling it? That is,
maybe JIT isn't JIT enough, and we should make compilation happen
at first use of an expression not during executor startup.

That's interesting. But it would introduce an additional per-evaluation
cost of checking whether we're doing the first execution.

David

#16Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Andres Freund (#7)
Re: bad JIT decision

On 2020-Jul-24, Andres Freund wrote:

I think the issue is more that we need to take into accoutn that the
overhead of JITing scales ~linearly with the number of JITed
expressions. And that's not done right now. I've had a patch somewhere
that had a prototype implementation of changing the costing to be
#expressions * some_cost, and I think that's a lot more accurate.

I don't quite understand why it is that a table with 1000 partitions
means that JIT compiles the thing 1000 times. Sure, it is possible that
some partitions have a different column layout, but it seems an easy bet
that most cases are going to have identical column layout, and so tuple
deforming can be shared. (I'm less sure about sharing a compile of an
expression, since the varno would vary. But presumably there's a way to
take the varno as an input value for the compiled expr too?) Now I
don't actually know how this works so please correct if I misunderstand
it.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#17Scott Ribe
scott_ribe@elevated-dev.com
In reply to: Alvaro Herrera (#16)
Re: bad JIT decision

On Jul 27, 2020, at 4:00 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

I don't quite understand why it is that a table with 1000 partitions
means that JIT compiles the thing 1000 times. Sure, it is possible that
some partitions have a different column layout, but it seems an easy bet
that most cases are going to have identical column layout, and so tuple
deforming can be shared. (I'm less sure about sharing a compile of an
expression, since the varno would vary. But presumably there's a way to
take the varno as an input value for the compiled expr too?) Now I
don't actually know how this works so please correct if I misunderstand
it.

I'm guessing it's because of inlining. You could optimize a function that takes parameters, no problem. But what's happening is inlining, with parameters, then optimizing.

#18Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#12)
Re: bad JIT decision

Hi,

On 2020-07-25 10:54:18 -0400, Tom Lane wrote:

David Rowley <dgrowleyml@gmail.com> writes:

... nested at the bottom level join, about 6 joins deep. The lack of
any row being found results in upper level joins not having to do
anything, and the majority of the plan is (never executed).

On re-reading this, that last point struck me forcibly. If most of
the plan never gets executed, could we avoid compiling it? That is,
maybe JIT isn't JIT enough, and we should make compilation happen
at first use of an expression not during executor startup.

That unfortunately has its own downsides, in that there's significant
overhead of emitting code multiple times. I suspect that taking the
cost of all the JIT emissions together into account is the more
promising approach.

Greetings,

Andres Freund

#19Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Scott Ribe (#17)
Re: bad JIT decision

On 2020-Jul-27, Scott Ribe wrote:

On Jul 27, 2020, at 4:00 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

I don't quite understand why it is that a table with 1000 partitions
means that JIT compiles the thing 1000 times. Sure, it is possible that
some partitions have a different column layout, but it seems an easy bet
that most cases are going to have identical column layout, and so tuple
deforming can be shared. (I'm less sure about sharing a compile of an
expression, since the varno would vary. But presumably there's a way to
take the varno as an input value for the compiled expr too?) Now I
don't actually know how this works so please correct if I misunderstand
it.

I'm guessing it's because of inlining. You could optimize a function
that takes parameters, no problem. But what's happening is inlining,
with parameters, then optimizing.

Are you saying that if you crank jit_inline_above_cost beyond this
query's total cost, the problem goes away?

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#20David Rowley
dgrowleyml@gmail.com
In reply to: Andres Freund (#18)
Re: bad JIT decision

On Tue, 28 Jul 2020 at 11:00, Andres Freund <andres@anarazel.de> wrote:

On 2020-07-25 10:54:18 -0400, Tom Lane wrote:

David Rowley <dgrowleyml@gmail.com> writes:

... nested at the bottom level join, about 6 joins deep. The lack of
any row being found results in upper level joins not having to do
anything, and the majority of the plan is (never executed).

On re-reading this, that last point struck me forcibly. If most of
the plan never gets executed, could we avoid compiling it? That is,
maybe JIT isn't JIT enough, and we should make compilation happen
at first use of an expression not during executor startup.

That unfortunately has its own downsides, in that there's significant
overhead of emitting code multiple times. I suspect that taking the
cost of all the JIT emissions together into account is the more
promising approach.

Is there some reason that we can't consider jitting on a more granular
basis? To me, it seems wrong to have a jit cost per expression and
demand that the plan cost > #nexprs * jit_expr_cost before we do jit
on anything. It'll make it pretty hard to predict when jit will occur
and doing things like adding new partitions could suddenly cause jit
to not enable for some query any more.

ISTM a more granular approach would be better. For example, for the
expression we expect to evaluate once, there's likely little point in
jitting it, but for the one on some other relation that has more rows,
where we expect to evaluate it 1 billion times, there's likely good
reason to jit that. Wouldn't it be better to consider it at the
RangeTblEntry level?

David

#21Andres Freund
andres@anarazel.de
In reply to: Alvaro Herrera (#19)
#22Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#21)
#23Andres Freund
andres@anarazel.de
In reply to: David Rowley (#20)
#24Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#23)
#25David Rowley
dgrowleyml@gmail.com
In reply to: Andres Freund (#23)
#26David Rowley
dgrowleyml@gmail.com
In reply to: Andres Freund (#24)