Caching query plan costs

Started by Bruce Momjian, over 7 years ago, 10 messages
#1 Bruce Momjian
bruce@momjian.us

This email thread from last year:

/messages/by-id/CAMjNa7cC4X9YR-vAJS-jSYCajhRDvJQnN7m2sLH1wLh-_Z2bsw@mail.gmail.com

got stuck on when to check for certain optimizations in the planner. It
was discussed that knowing a plan was going to be expensive _before_
planning could help determine when to try certain expensive
optimizations.

What if we globally or locally cache the _cost_ of plans, so we can
consult that cache before planning and enable certain optimizations? It
also might be useful to cache some of the unexpected things found during
execution, e.g. inaccurate cost estimates.
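
To make the proposal concrete, here is a minimal toy sketch of such a cache as standalone C, not PostgreSQL internals; the structure, the function names, and the idea of storing executor feedback alongside the planner estimate are illustrative assumptions only:

/*
 * Toy illustration (not PostgreSQL code): a cache mapping a query
 * fingerprint to the total cost seen the last time that query shape
 * was planned, plus optional executor feedback.  Before planning, the
 * planner could look the fingerprint up and, if the remembered cost is
 * high, enable expensive optimizations it would normally skip.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define COST_CACHE_SIZE 1024

typedef struct CostCacheEntry
{
    uint64_t fingerprint;   /* normalized-query hash, 0 = empty slot */
    double   last_cost;     /* total plan cost observed last time */
    double   actual_ms;     /* executor feedback (actual runtime) */
} CostCacheEntry;

static CostCacheEntry cost_cache[COST_CACHE_SIZE];

/* Remember the planned cost and actual runtime of a finished query. */
static void
cost_cache_store(uint64_t fingerprint, double cost, double actual_ms)
{
    CostCacheEntry *e = &cost_cache[fingerprint % COST_CACHE_SIZE];

    e->fingerprint = fingerprint;
    e->last_cost = cost;
    e->actual_ms = actual_ms;
}

/* Before planning: was this query shape expensive last time around? */
static bool
cost_cache_expected_expensive(uint64_t fingerprint, double threshold)
{
    CostCacheEntry *e = &cost_cache[fingerprint % COST_CACHE_SIZE];

    return e->fingerprint == fingerprint && e->last_cost >= threshold;
}

int
main(void)
{
    uint64_t fp = 0xdeadbeefULL;        /* stand-in for a query fingerprint */

    cost_cache_store(fp, 250000.0, 1800.0);
    printf("try expensive optimizations: %s\n",
           cost_cache_expected_expensive(fp, 100000.0) ? "yes" : "no");
    return 0;
}

A session-local array like this is the simplest form; a global version would need shared memory and locking, much like pg_stat_statements.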

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +
#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#1)
Re: Caching query plan costs

Bruce Momjian <bruce@momjian.us> writes:

What if we globally or locally cache the _cost_ of plans, so we can
consult that cache before planning and enable certain optimizations?

But what would you use as cache key? And how's this help if we haven't
seen a similar query before in the session?

regards, tom lane

#3 Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#2)
Re: Caching query plan costs

On Mon, Sep 3, 2018 at 01:30:33PM -0400, Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

What if we globally or locally cache the _cost_ of plans, so we can
consult that cache before planning and enable certain optimizations?

But what would you use as cache key? And how's this help if we haven't

Uh, I assume we would do what pg_stat_statements does and remove the
constants and hash that.

seen a similar query before in the session?

Well, if it was global we could use output from another session.

I guess my point is that this would only be used to turn on micro-optimizations
and maybe parallelism and JIT, so it doesn't have to be 100% accurate.
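
As a rough sketch of "remove the constants and hash that": pg_stat_statements actually fingerprints the post-parse-analysis query tree (the query jumble) rather than the SQL text, so the text-level toy below, with its made-up fingerprint_query() helper, is only a stand-in for the concept:

/*
 * Toy text-level fingerprint: mask string and numeric literals, then
 * hash what remains with FNV-1a, so queries differing only in their
 * constants collapse to the same key.
 */
#include <ctype.h>
#include <stdint.h>
#include <stdio.h>

static uint64_t
fingerprint_query(const char *sql)
{
    uint64_t h = 14695981039346656037ULL;    /* FNV-1a offset basis */
    const char *p = sql;

    while (*p)
    {
        char c = *p;

        if (c == '\'')                       /* collapse string literal */
        {
            p++;
            while (*p && *p != '\'')
                p++;
            if (*p)
                p++;
            c = '?';
        }
        else if (isdigit((unsigned char) c)) /* collapse numeric literal */
        {
            while (isdigit((unsigned char) *p) || *p == '.')
                p++;
            c = '?';
        }
        else
            p++;

        h ^= (uint64_t) (unsigned char) c;
        h *= 1099511628211ULL;               /* FNV-1a prime */
    }
    return h;
}

int
main(void)
{
    /* Same fingerprint despite different constants. */
    printf("%llx\n%llx\n",
           (unsigned long long) fingerprint_query("SELECT * FROM t WHERE a = 1"),
           (unsigned long long) fingerprint_query("SELECT * FROM t WHERE a = 42"));
    return 0;
}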

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +
#4 Andres Freund
andres@anarazel.de
In reply to: Bruce Momjian (#3)
Re: Caching query plan costs

On September 3, 2018 11:33:35 AM PDT, Bruce Momjian <bruce@momjian.us> wrote:

On Mon, Sep 3, 2018 at 01:30:33PM -0400, Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

What if we globally or locally cache the _cost_ of plans, so we can
consult that cache before planning and enable certain optimizations?

But what would you use as cache key? And how's this help if we haven't

Uh, I assume we would do what pg_stat_statements does and remove the
constants and hash that.

That's not particularly cheap... Constants heavily influence planning choices, so I don't think they actually could be removed.

seen a similar query before in the session?

Well, if it was global we could use output from another session.

I guess my point is that this would only be used to turn on micro-optimizations
and maybe parallelism

What kind of micro opts are you thinking of? The cases I remember are more in the vein of doing additional complex optimizations (join removal, transforming ORs into UNION, more complex analysis of predicates...).

Parallelism would definitely benefit from earlier knowledge, although I suspect some base rel analysis might be more realistic, because it's far from guaranteed that queries are ever repeated in a similar enough manner.
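
A toy version of that kind of base-rel analysis, where the per-query check is just the summed size of the base relations against a cutoff; the threshold and the summing rule are invented here and are not how the planner's actual parallel-query heuristics work:

/*
 * Toy pre-planning heuristic: look only at the estimated sizes of the
 * base relations in the query and decide whether parallel paths are
 * worth considering at all.  The threshold and the "sum of rel sizes"
 * rule are invented for illustration.
 */
#include <stdbool.h>
#include <stdio.h>

typedef struct BaseRel
{
    const char *relname;
    double      pages;      /* estimated size in 8kB pages */
} BaseRel;

static bool
worth_considering_parallelism(const BaseRel *rels, int nrels, double min_pages)
{
    double total = 0.0;

    for (int i = 0; i < nrels; i++)
        total += rels[i].pages;

    return total >= min_pages;
}

int
main(void)
{
    BaseRel rels[] = {{"orders", 120000.0}, {"customers", 800.0}};

    printf("consider parallel paths: %s\n",
           worth_considering_parallelism(rels, 2, 1024.0) ? "yes" : "no");
    return 0;
}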

and JIT, so it doesn't have to be 100% accurate.

JIT decision is done after main planning, so we know the cost.
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

#5 Bruce Momjian
bruce@momjian.us
In reply to: Andres Freund (#4)
Re: Caching query plan costs

On Mon, Sep 3, 2018 at 11:42:31AM -0700, Andres Freund wrote:

On September 3, 2018 11:33:35 AM PDT, Bruce Momjian <bruce@momjian.us>
wrote:

On Mon, Sep 3, 2018 at 01:30:33PM -0400, Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

What if we globally or locally cache the _cost_ of plans, so we can
consult that cache before planning and enable certain optimizations?

But what would you use as cache key? And how's this help if we haven't

Uh, I assume we would do what pg_stat_statements does and remove the
constants and hash that.

That's not particularly cheap... Constants heavily influence planning
choices, so I don't think they actually could be removed.

Oh.

seen a similar query before in the session?

Well, if it was global we could use output from another session.

I guess my point is that this would only be used to turn on
micro-optimizations and maybe parallelism

What kind of micro opts are you thinking of? The cases I remember
are more in the vein of doing additional complex optimizations (join
removal, transforming ORs into UNION, more complex analysis of
predicates...).

Parallelism would definitely benefit from earlier knowledge, although
I suspect some base rel analysis might be more realistic, because it's
far from guaranteed that queries are ever repeated in a similar enough
manner.

Yes, no question that we would need something that could detect a
sufficient percentage of previous queries.

and JIT, so it doesn't have to be 100% accurate.

JIT decision is done after main planning, so we know the cost.

Well, as I remember, we are considering disabling JIT in PG 11 because
of the use of fixed costs to trigger it. Could executor information
help decide to use JIT?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +
#6 Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Bruce Momjian (#5)
Re: Caching query plan costs

On 09/03/2018 08:56 PM, Bruce Momjian wrote:

On Mon, Sep 3, 2018 at 11:42:31AM -0700, Andres Freund wrote:

On September 3, 2018 11:33:35 AM PDT, Bruce Momjian <bruce@momjian.us>
wrote:

On Mon, Sep 3, 2018 at 01:30:33PM -0400, Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

What if we globally or locally cache the _cost_ of plans, so we can
consult that cache before planning and enable certain optimizations?

But what would you use as cache key? And how's this help if we haven't

Uh, I assume we would do what pg_stat_statements does and remove the
constants and hash that.

That's not particularly cheap... Constants heavily influence planning
choices, so I don't think they actually could be removed.

Oh.

Yeah, it doesn't really tell you the cost for the plan, because a single
query string may use vastly different plans for different constants.
Which pretty much is why we have pg_stat_plans.

Imagine a query that matches 99% of the table for one value and 1% for
another one. That's going to produce rather different plans for each
(say, seqscan vs. index scan), with very different costs.
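
A toy cost comparison makes the gap concrete; the formulas below are loosely modeled on the default seq_page_cost/random_page_cost/cpu_tuple_cost settings and are much cruder than the real cost model:

/*
 * Toy illustration of why one normalized query string maps to very
 * different plans and costs depending on the constant's selectivity.
 */
#include <stdio.h>

int
main(void)
{
    const double pages = 100000.0;       /* heap pages in the table */
    const double rows = 1000000.0;       /* rows in the table */
    const double seq_page_cost = 1.0;
    const double random_page_cost = 4.0;
    const double cpu_tuple_cost = 0.01;

    /* Selectivities for two different constants of the same query. */
    const double sels[] = {0.99, 0.01};

    for (int i = 0; i < 2; i++)
    {
        double sel = sels[i];
        double matched = rows * sel;

        /* Seq scan reads every page regardless of selectivity. */
        double seqscan = pages * seq_page_cost + rows * cpu_tuple_cost;

        /* Index scan: roughly one random heap fetch per matched row. */
        double idxscan = matched * random_page_cost + matched * cpu_tuple_cost;

        printf("selectivity %.2f: seqscan %.0f, indexscan %.0f -> %s\n",
               sel, seqscan, idxscan,
               idxscan < seqscan ? "index scan" : "seq scan");
    }
    return 0;
}

Either plan can be cheapest depending on the constant, so a cost cached under one constant says little about the other.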

seen a similar query before in the session?

Well, if it was global we could use output from another session.

I guess my point is that this would only be used to turn on
micro-optimizations and maybe parallelism

What kind of micro opts are you thinking of? The cases I remember
are more in the vein of doing additional complex optimizations (join
removal, transforming ORs into UNION, more complex analysis of
predicates...).

Parallelism would definitely benefit from earlier knowledge, although
I suspect some base rel analysis might be more realistic, because it's
far from guaranteed that queries are ever repeated in a similar enough
manner.

Yes, no question that we would need something that could detect a
sufficient percentage of previous queries.

and JIT, so it doesn't have to be 100% accurate.

JIT decision is done after main planning, so we know the cost.

Well, as I remember, we are considering disabling JIT in PG 11 because
of the use of fixed costs to trigger it. Could executor information
help decide to use JIT?

Isn't that issue more about what is the right default threshold, rather
than using fixed costs in principle?

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#7 Andres Freund
andres@anarazel.de
In reply to: Bruce Momjian (#5)
Re: Caching query plan costs

On 2018-09-03 14:56:28 -0400, Bruce Momjian wrote:

On Mon, Sep 3, 2018 at 11:42:31AM -0700, Andres Freund wrote:

and JIT, so it doesn't have to be 100% accurate.

JIT decision is done after main planning, so we know the cost.

Well, as I remember, we are considering disabling JIT in PG 11 because
of the use of fixed costs to trigger it. Could executor information
help decide to use JIT?

I don't think so. The issues with JIT planning are more that its
costing is simplistic (for good-ish reason, to avoid increasing the
number of plans), and that there's no caching (lots of infrastructure
work needed).

Greetings,

Andres Freund

#8 Bruce Momjian
bruce@momjian.us
In reply to: Andres Freund (#7)
Re: Caching query plan costs

On Mon, Sep 3, 2018 at 02:53:59PM -0700, Andres Freund wrote:

On 2018-09-03 14:56:28 -0400, Bruce Momjian wrote:

On Mon, Sep 3, 2018 at 11:42:31AM -0700, Andres Freund wrote:

and JIT, so it doesn't have to be 100% accurate.

JIT decision is done after main planning, so we know the cost.

Well, as I remember, we are considering disabling JIT in PG 11 because
of the use of fixed costs to trigger it. Could executor information
help decide to use JIT?

I don't think so. The issues with JIT planning are more that its
costing is simplistic (for good-ish reason, to avoid increasing the
number of plans), and that there's no caching (lots of infrastructure
work needed).

Uh, yeah, that was my question. If we knew the cost was high before we
plan, could we realistically increase the number of plans to avoid the
cost-trigger issue?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +
#9 Andres Freund
andres@anarazel.de
In reply to: Bruce Momjian (#8)
Re: Caching query plan costs

On September 3, 2018 3:01:29 PM PDT, Bruce Momjian <bruce@momjian.us> wrote:

On Mon, Sep 3, 2018 at 02:53:59PM -0700, Andres Freund wrote:

On 2018-09-03 14:56:28 -0400, Bruce Momjian wrote:

On Mon, Sep 3, 2018 at 11:42:31AM -0700, Andres Freund wrote:

and JIT, so it doesn't have to be 100% accurate.

JIT decision is done after main planning, so we know the cost.

Well, as I remember, we are considering disabling JIT in PG 11 because
of the use of fixed costs to trigger it. Could executor information
help decide to use JIT?

I don't think so. The issues with JIT planning are more that its
costing is simplistic (for good-ish reason, to avoid increasing the
number of plans), and that there's no caching (lots of infrastructure
work needed).

Uh, yeah, that was my question. If we knew the cost was high before we
plan, could we realistically increase the number of plans to avoid the
cost-trigger issue?

I think there are much more pressing / more general things to do. Caching of JITed "hunks" and scaling the cost with the number of JITed functions rather than one global cost. Having to run queries multiple times for good plans just isn't that interesting IMO. Especially for analytics queries, where JIT is interesting.
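
As a toy contrast between one global cost bar and scaling by the number of functions that would be emitted; the decision rule, the per-function number, and the function names are hypothetical, not the actual jit_above_cost machinery:

/*
 * Toy illustration of the difference between a single global JIT cost
 * threshold and a threshold scaled by how much code would be emitted.
 */
#include <stdbool.h>
#include <stdio.h>

/* Current-style rule: JIT whenever total plan cost clears one bar. */
static bool
jit_global(double total_cost, double jit_above_cost)
{
    return total_cost >= jit_above_cost;
}

/* Scaled rule: weigh the cost against the number of functions to emit,
 * so a cheap query with many expressions doesn't pay a large one-off
 * compilation bill. */
static bool
jit_scaled(double total_cost, int n_jit_functions, double per_function_cost)
{
    return total_cost >= n_jit_functions * per_function_cost;
}

int
main(void)
{
    double total_cost = 150000.0;   /* plan cost above a 100000 bar */
    double jit_above_cost = 100000.0;
    double per_function_cost = 25000.0;
    int    n_jit_functions = 40;    /* wide target list, many expressions */

    printf("global rule: %s\n",
           jit_global(total_cost, jit_above_cost) ? "JIT" : "no JIT");
    printf("scaled rule: %s\n",
           jit_scaled(total_cost, n_jit_functions, per_function_cost) ? "JIT" : "no JIT");
    return 0;
}

With the global rule the cheap-but-wide query pays the full compilation bill; the scaled rule only compiles when the plan is expensive relative to how much code would be emitted.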

Andres

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

#10 Bruce Momjian
bruce@momjian.us
In reply to: Andres Freund (#9)
Re: Caching query plan costs

On Mon, Sep 3, 2018 at 04:13:40PM -0700, Andres Freund wrote:

On September 3, 2018 3:01:29 PM PDT, Bruce Momjian <bruce@momjian.us>
wrote:

On Mon, Sep 3, 2018 at 02:53:59PM -0700, Andres Freund wrote:

On 2018-09-03 14:56:28 -0400, Bruce Momjian wrote:

On Mon, Sep 3, 2018 at 11:42:31AM -0700, Andres Freund wrote:

and JIT, so it doesn't have to be 100% accurate.

JIT decision is done after main planning, so we know the cost.

I don't think so. The issues with JIT planning are more that its
costing is simplistic (for good-ish reason, to avoid increasing
the number of plans), and that there's no caching (lots of
infrastructure work needed).

Uh, yeah, that was my question. If we knew the cost was high before
we plan, could we realistically increase the number of plans to avoid
the cost-trigger issue?

I think there are much more pressing / more general things to
do. Caching of JITed "hunks" and scaling the cost with the number
of JITed functions rather than one global cost. Having to run
queries multiple times for good plans just isn't that interesting
IMO. Especially for analytics queries, where JIT is interesting.

I agree it isn't useful for JIT alone but if it can be used for multiple
purposes, it might be worth it.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +