Add parameter jit_warn_above_fraction

Started by Magnus Haganderabout 4 years ago38 messageshackers
Jump to latest
#1Magnus Hagander
magnus@hagander.net

This patch adds a configuration parameter jit_warn_above_fraction that
will cause a warning to be logged if the fraction of time spent on
doing JIT is bigger than the specified one. For example, this can be
used to track down those cases where JIT ends up taking 90% of the
query runtime because of bad estimates...

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

Attachments:

jit_warn_above_fraction.patchtext/x-patch; charset=US-ASCII; name=jit_warn_above_fraction.patchDownload+94-13
#2Julien Rouhaud
rjuju123@gmail.com
In reply to: Magnus Hagander (#1)
Re: Add parameter jit_warn_above_fraction

Hi,

On Fri, Feb 25, 2022 at 04:16:01PM +0100, Magnus Hagander wrote:

This patch adds a configuration parameter jit_warn_above_fraction that
will cause a warning to be logged if the fraction of time spent on
doing JIT is bigger than the specified one. For example, this can be
used to track down those cases where JIT ends up taking 90% of the
query runtime because of bad estimates...

I think that's tremendously useful, huge +1.

Just a few minor nit:

+ A value of 0 (the default)disables the warning.

missing space

+			ereport(WARNING,
+					(errmsg("JIT time was %ld ms of %d ms",
+							jit_time, msecs)));

"JIT time" may a bit obscure for users, how about "JIT total processing time"?"

+			gettext_noop("Sets the fraction of query time spent on JIT before writing"
+						 "a warning to the log."),
+			gettext_noop("Write a message tot he server log if more than this"
+						 "fraction of the query runtime is spent on JIT."
+						 "Zero turns off the warning.")

missing spaces in the concatenated strings.

The rest of the patch looks good to me.

#3Andres Freund
andres@anarazel.de
In reply to: Magnus Hagander (#1)
Re: Add parameter jit_warn_above_fraction

Hi,

On 2022-02-25 16:16:01 +0100, Magnus Hagander wrote:

This patch adds a configuration parameter jit_warn_above_fraction that
will cause a warning to be logged if the fraction of time spent on
doing JIT is bigger than the specified one. For example, this can be
used to track down those cases where JIT ends up taking 90% of the
query runtime because of bad estimates...

Hm. Could it make sense to do this as a auto_explain feature?

Greetings,

Andres Freund

#4Justin Pryzby
pryzby@telsasoft.com
In reply to: Magnus Hagander (#1)
Re: Add parameter jit_warn_above_fraction

On Fri, Feb 25, 2022 at 04:16:01PM +0100, Magnus Hagander wrote:

+	{
+		{"jit_warn_above_fraction", PGC_SUSET, LOGGING_WHEN,
+			gettext_noop("Sets the fraction of query time spent on JIT before writing"
+						 "a warning to the log."),
+			gettext_noop("Write a message tot he server log if more than this"
+						 "fraction of the query runtime is spent on JIT."
+						 "Zero turns off the warning.")
+		},
+		&jit_warn_above_fraction,
+		0.0, 0.0, 1.0,
+		NULL, NULL, NULL
+	},

Should be PGC_USERSET ?

+ gettext_noop("Write a message tot he server log if more than this"

to the

+       if (jit_enabled && jit_warn_above_fraction > 0)                                                                                                                                                                            
+       {                                                                                                                                                                                                                          
+               int64 jit_time =                                                                                                                                                                                                   
+                       INSTR_TIME_GET_MILLISEC(portal->queryDesc->estate->es_jit->instr.generation_counter) +                                                                                                                     
+                       INSTR_TIME_GET_MILLISEC(portal->queryDesc->estate->es_jit->instr.inlining_counter) +                                                                                                                       
+                       INSTR_TIME_GET_MILLISEC(portal->queryDesc->estate->es_jit->instr.optimization_counter) +                                                                                                                   
+                       INSTR_TIME_GET_MILLISEC(portal->queryDesc->estate->es_jit->instr.emission_counter);                                                                                                                        
+                                                                                                                                                                                                                                  
+               if (jit_time > msecs * jit_warn_above_fraction)                                                                                                                                                                    
+               {                                                                                                                                                                                                                  
+                       ereport(WARNING,                                                                                                                                                                                           
+                                       (errmsg("JIT time was %ld ms of %d ms",                                                                                                                                                    
+                                                       jit_time, msecs)));                                                                                                                                                        
+               }                                                                                                                                                                                                                  
+       }                                                                                                                                                                                                                          

I think it should be a NOTICE (or less?)

Is it more useful if this is applied combined with log_min_duration_statement ?

It's easy to imagine a query for which the planner computes a high cost, but
actually runs quickly. You might get a bunch of WARNINGs that the query took
10 MS and JIT was 75% of that, even if you don't care about queries that take
less than 10 SEC.

I should say that this is already available by processing the output of
autoexplain.

--
Justin

#5Magnus Hagander
magnus@hagander.net
In reply to: Andres Freund (#3)
Re: Add parameter jit_warn_above_fraction

On Fri, Feb 25, 2022 at 5:20 PM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2022-02-25 16:16:01 +0100, Magnus Hagander wrote:

This patch adds a configuration parameter jit_warn_above_fraction that
will cause a warning to be logged if the fraction of time spent on
doing JIT is bigger than the specified one. For example, this can be
used to track down those cases where JIT ends up taking 90% of the
query runtime because of bad estimates...

Hm. Could it make sense to do this as a auto_explain feature?

It could be. But I was looking for something a lot more "light weight"
than having to install an extension. But yes, if we wanted to, we
could certainly change jit_warn_above_fraction to be
auto_explain.log_min_jit_fraction or something like that, and do
basically the same thing. But then, we could also have
log_min_duration_statement be part of auto_explain instead, so it's
all about where to draw the line :)

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#6Andres Freund
andres@anarazel.de
In reply to: Magnus Hagander (#5)
Re: Add parameter jit_warn_above_fraction

Hi,

On 2022-02-25 17:28:41 +0100, Magnus Hagander wrote:

On Fri, Feb 25, 2022 at 5:20 PM Andres Freund <andres@anarazel.de> wrote:

On 2022-02-25 16:16:01 +0100, Magnus Hagander wrote:

This patch adds a configuration parameter jit_warn_above_fraction that
will cause a warning to be logged if the fraction of time spent on
doing JIT is bigger than the specified one. For example, this can be
used to track down those cases where JIT ends up taking 90% of the
query runtime because of bad estimates...

Hm. Could it make sense to do this as a auto_explain feature?

It could be. But I was looking for something a lot more "light weight"
than having to install an extension. But yes, if we wanted to, we
could certainly change jit_warn_above_fraction to be
auto_explain.log_min_jit_fraction or something like that, and do
basically the same thing. But then, we could also have
log_min_duration_statement be part of auto_explain instead, so it's
all about where to draw the line :)

I guess it feels a tad on the "too narrow/specific" side of things for the
general code. We don't have log_min_duration_{parsing,planning,execution}
either. But I also get it. So I just wanted to raise it ;)

Greetings,

Andres Freund

#7Magnus Hagander
magnus@hagander.net
In reply to: Andres Freund (#6)
Re: Add parameter jit_warn_above_fraction

On Fri, Feb 25, 2022 at 5:47 PM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2022-02-25 17:28:41 +0100, Magnus Hagander wrote:

On Fri, Feb 25, 2022 at 5:20 PM Andres Freund <andres@anarazel.de> wrote:

On 2022-02-25 16:16:01 +0100, Magnus Hagander wrote:

This patch adds a configuration parameter jit_warn_above_fraction that
will cause a warning to be logged if the fraction of time spent on
doing JIT is bigger than the specified one. For example, this can be
used to track down those cases where JIT ends up taking 90% of the
query runtime because of bad estimates...

Hm. Could it make sense to do this as a auto_explain feature?

It could be. But I was looking for something a lot more "light weight"
than having to install an extension. But yes, if we wanted to, we
could certainly change jit_warn_above_fraction to be
auto_explain.log_min_jit_fraction or something like that, and do
basically the same thing. But then, we could also have
log_min_duration_statement be part of auto_explain instead, so it's
all about where to draw the line :)

I guess it feels a tad on the "too narrow/specific" side of things for the
general code. We don't have log_min_duration_{parsing,planning,execution}
either. But I also get it. So I just wanted to raise it ;)

Oh it's absolutely a valid issue to raise :) In fact, one could
definitely argue that we should have a parameter for auto_explain *as
well*.

It's true we don't have those -- but it's also in my experience a lot
more common to be an issue with JIT. And unfortunately the current
"solution" people tend to apply is to globally disable JIT, and
there's no similar "globally disable parsing or "globally disable
planning" pattern, for obvious reasons.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#8Magnus Hagander
magnus@hagander.net
In reply to: Justin Pryzby (#4)
Re: Add parameter jit_warn_above_fraction

On Fri, Feb 25, 2022 at 5:23 PM Justin Pryzby <pryzby@telsasoft.com> wrote:

On Fri, Feb 25, 2022 at 04:16:01PM +0100, Magnus Hagander wrote:

+     {
+             {"jit_warn_above_fraction", PGC_SUSET, LOGGING_WHEN,
+                     gettext_noop("Sets the fraction of query time spent on JIT before writing"
+                                              "a warning to the log."),
+                     gettext_noop("Write a message tot he server log if more than this"
+                                              "fraction of the query runtime is spent on JIT."
+                                              "Zero turns off the warning.")
+             },
+             &jit_warn_above_fraction,
+             0.0, 0.0, 1.0,
+             NULL, NULL, NULL
+     },

Should be PGC_USERSET ?

Yes. Definitely. Copy/paste thinko.

+ gettext_noop("Write a message tot he server log if more than this"

to the

Yup.

+       if (jit_enabled && jit_warn_above_fraction > 0)
+       {
+               int64 jit_time =
+                       INSTR_TIME_GET_MILLISEC(portal->queryDesc->estate->es_jit->instr.generation_counter) +
+                       INSTR_TIME_GET_MILLISEC(portal->queryDesc->estate->es_jit->instr.inlining_counter) +
+                       INSTR_TIME_GET_MILLISEC(portal->queryDesc->estate->es_jit->instr.optimization_counter) +
+                       INSTR_TIME_GET_MILLISEC(portal->queryDesc->estate->es_jit->instr.emission_counter);
+
+               if (jit_time > msecs * jit_warn_above_fraction)
+               {
+                       ereport(WARNING,
+                                       (errmsg("JIT time was %ld ms of %d ms",
+                                                       jit_time, msecs)));
+               }
+       }

I think it should be a NOTICE (or less?)

Hmm. I'm not so sure.

Other similar parameters often use LOG, but the downside of that is
that it won't be sent to the client.

The problem with using NOTICE is that it won't go to the log by
default. It needs to be at least warning to do that.

Is it more useful if this is applied combined with log_min_duration_statement ?

It's easy to imagine a query for which the planner computes a high cost, but
actually runs quickly. You might get a bunch of WARNINGs that the query took
10 MS and JIT was 75% of that, even if you don't care about queries that take
less than 10 SEC.

Yeah, that's definitely a problem. But which way would you want to tie
it to log_min_duration_statement? That a statement would both have to
take longer than log_min_duration_statement *and* have JIT above a
certain percentage? In my experience that is instead likely to miss
most of the interesting times. Maybe it would need a separate guc for
the timing, but I'd like to avoid that, I'm not sure it's a function
worth *that* much...

I should say that this is already available by processing the output of
autoexplain.

True. You just can't trigger it based on it. (and it can be somewhat
of a PITA to parse things out of auto_explain on busy systems, but it
does give a lot of very useful details)

Meanwhile here is an updated based on your other comments above, as
well as those from Julien.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

Attachments:

jit_warn_above_fraction_v2.patchtext/x-patch; charset=US-ASCII; name=jit_warn_above_fraction_v2.patchDownload+94-13
#9Justin Pryzby
pryzby@telsasoft.com
In reply to: Magnus Hagander (#8)
Re: Add parameter jit_warn_above_fraction

On Mon, Mar 07, 2022 at 01:10:32PM +0100, Magnus Hagander wrote:

On Fri, Feb 25, 2022 at 5:23 PM Justin Pryzby <pryzby@telsasoft.com> wrote:

I think it should be a NOTICE (or less?)

Hmm. I'm not so sure.

Other similar parameters often use LOG, but the downside of that is
that it won't be sent to the client.

The problem with using NOTICE is that it won't go to the log by
default. It needs to be at least warning to do that.

Anyone who uses this parameter is aleady going to be changing GUCs, so it
doesn't need to work by default. The people who are likely to enable this
already monitor their logs and have probably customized their logging
configuration.

Is it more useful if this is applied combined with log_min_duration_statement ?

It's easy to imagine a query for which the planner computes a high cost, but
actually runs quickly. You might get a bunch of WARNINGs that the query took
10 MS and JIT was 75% of that, even if you don't care about queries that take
less than 10 SEC.

Yeah, that's definitely a problem. But which way would you want to tie
it to log_min_duration_statement? That a statement would both have to
take longer than log_min_duration_statement *and* have JIT above a
certain percentage? In my experience that is instead likely to miss
most of the interesting times.

I don't understand - why doesn't it combine trivially with
log_min_duration_statement? Are you saying that the default / pre-existing min
duration may not log all of the intended queries ? I think that just means the
configuration needs to be changed. The GUC needs to allow/help finding these
JIT issues, but going to require an admin's interaction in any case. Avoiding
false positives may be important for it to be useful at all.

Hmm .. should it also combine with min_sample_rate ? Maybe that addresses your
concern.

--
Justin

#10Magnus Hagander
magnus@hagander.net
In reply to: Justin Pryzby (#9)
Re: Add parameter jit_warn_above_fraction

On Mon, Mar 7, 2022 at 2:09 PM Justin Pryzby <pryzby@telsasoft.com> wrote:

On Mon, Mar 07, 2022 at 01:10:32PM +0100, Magnus Hagander wrote:

On Fri, Feb 25, 2022 at 5:23 PM Justin Pryzby <pryzby@telsasoft.com> wrote:

I think it should be a NOTICE (or less?)

Hmm. I'm not so sure.

Other similar parameters often use LOG, but the downside of that is
that it won't be sent to the client.

The problem with using NOTICE is that it won't go to the log by
default. It needs to be at least warning to do that.

Anyone who uses this parameter is aleady going to be changing GUCs, so it
doesn't need to work by default. The people who are likely to enable this
already monitor their logs and have probably customized their logging
configuration.

Sure, but if you set log_min_messagse to NOTICE you are likely going
to flood your logs beyond usefulness. And the whole idea of this
parameter is you can keep it on during "normal times" to catch
outliers.

Is it more useful if this is applied combined with log_min_duration_statement ?

It's easy to imagine a query for which the planner computes a high cost, but
actually runs quickly. You might get a bunch of WARNINGs that the query took
10 MS and JIT was 75% of that, even if you don't care about queries that take
less than 10 SEC.

Yeah, that's definitely a problem. But which way would you want to tie
it to log_min_duration_statement? That a statement would both have to
take longer than log_min_duration_statement *and* have JIT above a
certain percentage? In my experience that is instead likely to miss
most of the interesting times.

I don't understand - why doesn't it combine trivially with
log_min_duration_statement? Are you saying that the default / pre-existing min
duration may not log all of the intended queries ? I think that just means the
configuration needs to be changed. The GUC needs to allow/help finding these
JIT issues, but going to require an admin's interaction in any case. Avoiding
false positives may be important for it to be useful at all.

Hmm .. should it also combine with min_sample_rate ? Maybe that addresses your
concern.

For one, what if I don't want to log any queries unless this is the
case? I leave log_min_duration_statement at -1...
Or what if I want to log say only queries taking more than 60 seconds.
Now I will miss all queries taking 30 seconds where 99.9% of the time
is spent on JIT which I definitely *would* want.

If all I wanted was to get the JIT information for all queries over a
certain amount of time, then as someone else mentioned as well, I can
just use auto_explain. The point here is I want to trigger the logging
for queries where the runtime is *shorter* than what I have in
log_min_duration_statement, but a large portion of the runtime is
spent on JIT.

I maen. If I have a bunch of queries that take 10ms and 75% is spent
in JIT (per your example above), that is actually a problem and I'd
like to know about it and fix it (by tuning the jit triggers either
globally or locally). Yes, at 10ms I *may* ignore it, and in general
I'd use pg_stat_statements for that. But at 100ms with 75% JIT it
starts becoming exactly the situation that I intended this patch to
help with.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#11Justin Pryzby
pryzby@telsasoft.com
In reply to: Magnus Hagander (#10)
Re: Add parameter jit_warn_above_fraction

On Mon, Mar 07, 2022 at 04:02:16PM +0100, Magnus Hagander wrote:

On Mon, Mar 7, 2022 at 2:09 PM Justin Pryzby <pryzby@telsasoft.com> wrote:

On Mon, Mar 07, 2022 at 01:10:32PM +0100, Magnus Hagander wrote:

On Fri, Feb 25, 2022 at 5:23 PM Justin Pryzby <pryzby@telsasoft.com> wrote:

I think it should be a NOTICE (or less?)

Hmm. I'm not so sure.

Other similar parameters often use LOG, but the downside of that is
that it won't be sent to the client.

The problem with using NOTICE is that it won't go to the log by
default. It needs to be at least warning to do that.

Anyone who uses this parameter is aleady going to be changing GUCs, so it
doesn't need to work by default. The people who are likely to enable this
already monitor their logs and have probably customized their logging
configuration.

Sure, but if you set log_min_messagse to NOTICE you are likely going
to flood your logs beyond usefulness. And the whole idea of this
parameter is you can keep it on during "normal times" to catch
outliers.

I do set log_min_messages - not to NOTICE, but to INFO ;)

Would NOTICEs really flood most people's logs ? From what ? They're aren't
that many, and most seem to be for DDL and utility commands. We do, uh, more
of that than most people would say is reasonable.

I see a couple hundred of these per day:
| PostGIS: Unable to compute statistics for...
And during deployment, a few hundred more for IF NOT EXIST commands.
That's it.

I still think this GUC should not be WARNING. If it's a warning, then *this*
will cause previously-quiet logs to be flooded - the opposite problem. Which
is not desirable for a new guc.

https://www.postgresql.org/docs/current/runtime-config-logging.html

INFO Provides information implicitly requested by the user, e.g., output from VACUUM VERBOSE. INFO INFORMATION
NOTICE Provides information that might be helpful to users, e.g., notice of truncation of long identifiers. NOTICE INFORMATION
WARNING Provides warnings of likely problems, e.g., COMMIT outside a transaction block. NOTICE WARNING

I don't understand - why doesn't it combine trivially with
log_min_duration_statement? Are you saying that the default / pre-existing min
duration may not log all of the intended queries ? I think that just means the
configuration needs to be changed. The GUC needs to allow/help finding these
JIT issues, but going to require an admin's interaction in any case. Avoiding
false positives may be important for it to be useful at all.

Hmm .. should it also combine with min_sample_rate ? Maybe that addresses your
concern.

For one, what if I don't want to log any queries unless this is the
case? I leave log_min_duration_statement at -1...

Then you will conclude that our logging is inadequate for your case.
You need to filter the lines which don't interest you.

Or what if I want to log say only queries taking more than 60 seconds.
Now I will miss all queries taking 30 seconds where 99.9% of the time
is spent on JIT which I definitely *would* want.

If you're unwilling to deal with log entries which are uninteresting, then
clearly you'll need a separate GUC just for log_min_duration_jit_above_fraction
(and maybe another one for jit_above_duration_log_level). Or implement generic
log filtering based on file/severity/message.

As I see it: most people won't set this GUC at all. Those who do might enable
it with a high threshold value of log_min_duration_statement (probably a
pre-existing, configured value) to see what there is to see - the worst issues.

A small fraction of people will enable it with a low threshold value for
log_min_duration_statement - maybe only temporarily or per-session. They'll be
willing to sift through the logs to look for interesting things, like queries
that take 10ms, 75% of which is in JIT, but run hundreds of times per second.

This feature intends to make it easier to identify queries affected by this
problem, but it doesn't need to be possible to do that without logging anything
else.

--
Justin

#12Andres Freund
andres@anarazel.de
In reply to: Magnus Hagander (#8)
Re: Add parameter jit_warn_above_fraction

Hi,

On 2022-03-07 13:10:32 +0100, Magnus Hagander wrote:

Meanwhile here is an updated based on your other comments above, as
well as those from Julien.

This fails on cfbot, due to compiler warnings:
https://cirrus-ci.com/task/5127667648299008?logs=mingw_cross_warning#L390

Greetings,

Andres Freund

#13Magnus Hagander
magnus@hagander.net
In reply to: Andres Freund (#12)
Re: Add parameter jit_warn_above_fraction

On Tue, Mar 22, 2022 at 12:50 AM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2022-03-07 13:10:32 +0100, Magnus Hagander wrote:

Meanwhile here is an updated based on your other comments above, as
well as those from Julien.

This fails on cfbot, due to compiler warnings:
https://cirrus-ci.com/task/5127667648299008?logs=mingw_cross_warning#L390

Huh. That's annoying. I forgot int64 is %d on linux and %lld on Windows :/

PFA a fix for that.

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/&gt;
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/&gt;

Attachments:

jit_warn_above_fraction_v3.patchtext/x-patch; charset=US-ASCII; name=jit_warn_above_fraction_v3.patchDownload+94-13
#14Julien Rouhaud
rjuju123@gmail.com
In reply to: Magnus Hagander (#13)
Re: Add parameter jit_warn_above_fraction

Hi,

On Mon, Mar 28, 2022 at 10:11:16PM +0200, Magnus Hagander wrote:

On Tue, Mar 22, 2022 at 12:50 AM Andres Freund <andres@anarazel.de> wrote:

This fails on cfbot, due to compiler warnings:
https://cirrus-ci.com/task/5127667648299008?logs=mingw_cross_warning#L390

Huh. That's annoying. I forgot int64 is %d on linux and %lld on Windows :/

PFA a fix for that.

The v3 is now stable on all platforms according to the cfbot, and apart from
the log level the patch looks good to me.

The last remaining thing is whether logging at WARNING level is the correct
choice. I'm personally fine with it, because the people who are going to use
it will probably use the same approach as for log_min_duration_statements:
enable it first with a high value, and check if that just lead to a massive log
spam. If not, see if there's anything that needs attention and fix it,
otherwise keep lowering it and keep going.

Also, jitting isn't particularly fast in general, so it's probably somewhat
unlikely to have a massive amount of queries that both have a cost higher than
the jit threshold cost and still run fast enough to spam the log without jit
slowing everything down.

I'm not sure what status the patch should be. I think it's the committer's
decision to chose which log level to use, and that won't affect the patch
enough to justify another review, so I'm switching the patch to ready for
committer. This doesn't prevent any further discussions for the log level.

#15Robert Haas
robertmhaas@gmail.com
In reply to: Julien Rouhaud (#14)
Re: Add parameter jit_warn_above_fraction

On Tue, Mar 29, 2022 at 6:09 AM Julien Rouhaud <rjuju123@gmail.com> wrote:

The last remaining thing is whether logging at WARNING level is the correct
choice. I'm personally fine with it, because the people who are going to use
it will probably use the same approach as for log_min_duration_statements:
enable it first with a high value, and check if that just lead to a massive log
spam. If not, see if there's anything that needs attention and fix it,
otherwise keep lowering it and keep going.

I think WARNING is fine. After all, the parameter is called
"jit_warn_above_fraction". Yeah, we could also rename the parameter,
but I think "jit_notice_above_fraction" would be harder to understand.
It feels very intuitive to just say "warn me if X thing happens" and I
don't see a reason why we shouldn't just do that.

--
Robert Haas
EDB: http://www.enterprisedb.com

#16David Rowley
dgrowleyml@gmail.com
In reply to: Robert Haas (#15)
Re: Add parameter jit_warn_above_fraction

On Wed, 30 Mar 2022 at 02:38, Robert Haas <robertmhaas@gmail.com> wrote:

I think WARNING is fine. After all, the parameter is called
"jit_warn_above_fraction".

I had a think about this patch. I guess it's a little similar to
checkpoint_warning. The good thing about the checkpoint_warning is
that in the LOG message we give instructions about how the DBA can fix
the issue, i.e increase max_wal_size.

With the proposed patch I see there is no hint about what might be
done to remove/reduce the warnings. I imagine that's because it's not
all that clear which GUC should be changed. In my view, likely
jit_above_cost is the most relevant but there is also
jit_inline_above_cost, jit_optimize_above_cost, jit_tuple_deforming
and jit_expressions which are relevant too.

If we go with this patch, the problem I see here is that the amount
of work the JIT compiler must do for a given query depends mostly on
the number of expressions that must be compiled in the query (also to
a lesser extent jit_inline_above_cost, jit_optimize_above_cost,
jit_tuple_deforming and jit_expressions). The DBA does not really have
much control over the number of expressions in the query. All he or
she can do to get rid of the warning is something like increase
jit_above_cost. After a few iterations of that, the end result is
that jit_above_cost is now high enough that JIT no longer triggers
for, say, that query to that table with 1000 partitions where no
plan-time pruning takes place. Is that really a good thing? It likely
means that we just rarely JIT anything at all!

I really believe that the main problem here is that JIT only enables
when the *total* plan cost reaches a certain threshold. The number of
expressions to be compiled is not a factor in the decision at all.
That means that even if the total execution time of a plan was a true
reflection of the total estimated plan cost, then the fraction of time
spent (as is measured by jit_warn_above_fraction) doing JIT would
entirely depend on the number of expressions to compile. Of course,
the planner's not that good, but does that not indicate that the JIT
costing should really account for the number of expressions and not
just the total plan cost?

Anyway, what I'm trying to indicate here is that JIT is pretty much
impossible to tune properly and I don't really see why adding a
warning about it not being tuned correctly would help anyone. I think
it would be better to focus on making improvements to how the JIT
costing works.

I did propose a patch to address this in [1]/messages/by-id/CAApHDvpQJqLrNOSi8P1JLM8YE2C+ksKFpSdZg=q6sTbtQ-v=aw@mail.gmail.com. It does need more work
and I do plan to come back to it for v16.

I'd much rather see us address the costing problem before adding some
warning, especially a warning where it's not clear how to make go
away.

David

[1]: /messages/by-id/CAApHDvpQJqLrNOSi8P1JLM8YE2C+ksKFpSdZg=q6sTbtQ-v=aw@mail.gmail.com

In reply to: David Rowley (#16)
Re: Add parameter jit_warn_above_fraction

On Tue, Mar 29, 2022 at 1:06 PM David Rowley <dgrowleyml@gmail.com> wrote:

That means that even if the total execution time of a plan was a true
reflection of the total estimated plan cost, then the fraction of time
spent (as is measured by jit_warn_above_fraction) doing JIT would
entirely depend on the number of expressions to compile. Of course,
the planner's not that good, but does that not indicate that the JIT
costing should really account for the number of expressions and not
just the total plan cost?

That's a good point. The difference between the actual cost of
executing the query with and without JIT'ing is what we care about,
for the most part. Maybe we could do a lot better just by inventing a
fairly crude model that captures the benefits of JIT'ing -- that's
what "counting the number of expressions" sounds like to me. This
model could probably assume that JIT'ing itself was free -- maybe
something this simple would work well.

The planner has traditionally used the cost units to determine the
cheapest plan; it compared total plan cost for plans that were taken
from the universe of possible plans for *one specific query*. That's
completely different to any model that expects plan costs to be
meaningful in an absolute sense. I'm not completely sure how much that
difference matters, but I suspect that the answer is: "it depends, but
often it matters a great deal".

--
Peter Geoghegan

#18Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Geoghegan (#17)
Re: Add parameter jit_warn_above_fraction

Peter Geoghegan <pg@bowt.ie> writes:

On Tue, Mar 29, 2022 at 1:06 PM David Rowley <dgrowleyml@gmail.com> wrote:

That means that even if the total execution time of a plan was a true
reflection of the total estimated plan cost, then the fraction of time
spent (as is measured by jit_warn_above_fraction) doing JIT would
entirely depend on the number of expressions to compile. Of course,
the planner's not that good, but does that not indicate that the JIT
costing should really account for the number of expressions and not
just the total plan cost?

That's a good point.

I think David's questions are sufficiently cogent and difficult
that we should not add jit_warn_above_fraction at this time.
Maybe we'll eventually decide it's the best we can do; but I do
not think we have a problem there that's so pressing that we need
to rush out a partially-baked bandaid. Especially not at the
tail end of the development cycle, with not much time to gain
experience with it before we ship.

regards, tom lane

In reply to: Tom Lane (#18)
Re: Add parameter jit_warn_above_fraction

On Tue, Mar 29, 2022 at 3:04 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I think David's questions are sufficiently cogent and difficult
that we should not add jit_warn_above_fraction at this time.

+1

--
Peter Geoghegan

#20Andres Freund
andres@anarazel.de
In reply to: David Rowley (#16)
Re: Add parameter jit_warn_above_fraction

Hi,

On 2022-03-30 09:05:44 +1300, David Rowley wrote:

I really believe that the main problem here is that JIT only enables
when the *total* plan cost reaches a certain threshold.

Yes, that is/was a clear design mistake. It wasn't quite as bad back when it
was written - partitioning blows the problem up by an order of magnitude or
three. . The costing really needs to at least multiply the number of
to-be-compiled expressions with some cost to decide whether the cost of JITing
is worth it, rather than making "flat" decision.

I did propose a patch to address this in [1]. It does need more work
and I do plan to come back to it for v16.

FWIW, that doesn't seem quite right - won't it stop JITing e.g. on the inner
side of a nested loop, just because it's cheap, even though that's where the
bulk of the benefits comes from?

Greetings,

Andres Freund

#21David Rowley
dgrowleyml@gmail.com
In reply to: Andres Freund (#20)
#22Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#21)
#23Andres Freund
andres@anarazel.de
In reply to: David Rowley (#21)
#24David Rowley
dgrowleyml@gmail.com
In reply to: Andres Freund (#23)
#25Andres Freund
andres@anarazel.de
In reply to: David Rowley (#24)
#26David Rowley
dgrowleyml@gmail.com
In reply to: Andres Freund (#25)
#27Magnus Hagander
magnus@hagander.net
In reply to: David Rowley (#16)
#28Magnus Hagander
magnus@hagander.net
In reply to: Magnus Hagander (#27)
#29Stephen Frost
sfrost@snowman.net
In reply to: Magnus Hagander (#28)
#30David Rowley
dgrowleyml@gmail.com
In reply to: Magnus Hagander (#28)
#31Magnus Hagander
magnus@hagander.net
In reply to: David Rowley (#30)
#32Stephen Frost
sfrost@snowman.net
In reply to: Magnus Hagander (#31)
#33Julien Rouhaud
rjuju123@gmail.com
In reply to: Stephen Frost (#32)
#34Tom Lane
tgl@sss.pgh.pa.us
In reply to: Julien Rouhaud (#33)
#35Julien Rouhaud
rjuju123@gmail.com
In reply to: Tom Lane (#34)
#36Tom Lane
tgl@sss.pgh.pa.us
In reply to: Julien Rouhaud (#35)
#37Julien Rouhaud
rjuju123@gmail.com
In reply to: Tom Lane (#36)
#38Ibrar Ahmed
ibrar.ahmad@gmail.com
In reply to: Julien Rouhaud (#37)