More thorough planning for OLAP queries (was: [PATCH] Equivalence Class Filters)

Started by David Rowley, 9 messages
#1 David Rowley
david.rowley@2ndquadrant.com

Hi,

On [1] I suggested an idea to make improvements to the planner around the
Equivalence Class code. Later in [2] Tom raised concerns that this adds too
many planning cycles for a perhaps-not-common-enough situation. I don't want
to discuss that particular patch here; I want to discuss, more generally, the
dilemma of adding more smarts to the planner so that it can generate a more
optimal plan in order to save on execution time.

In the case of the Equivalence Class Filters code, I quoted an example
where pushing these filters down into the joined relation caused a
significant performance improvement to a query. Now, I understand Tom's
concerns about slowing down the planner: in cases where the query is
short-running, or the optimisations don't apply, we could cause the query to
perform worse overall (including planning time). Nobody wants that, but on
the other hand, if spending 5-10 extra microseconds during planning shaves 6
hours off execution time, then nobody would begrudge those extra 5-10
microseconds of planning.

What I'd like to discuss here is something touched on in that other thread:
ways to get around this problem.

A number of ideas were suggested on the other thread about how we might go
about solving this problem. In [3] Simon talked about perhaps enabling extra
optimisations when the planner sees that the plan will cost more than some
given threshold. That's perhaps an option, but it may not work well for
optimisations which must take place very early in planning, for example [4].
Another idea came from Evgeniy [5], which was more of a request not to do it
this way, but nevertheless the idea was basically to add lots of GUCs to
enable/disable each extra planner feature.

Another option which I've thought about previously is a planner_strength
GUC, where various additional optimisations are enabled at predefined
strength levels, so that databases which tend to spend a great deal more
time executing than planning can turn this up to see if it helps change
that ratio. This idea is far from perfect though, as who's to say that
planner feature X should "kick in" before planner feature Y? I've also
often thought it might be nice for the planner not to modify the Parse
object, so that it has the option to throw away what it's done so far and
start planning all over again with the "planner_strength" knob turned up
to the maximum, if the cost indicates that the query is going to take a
long time to execute.

In reality we already have some planner features which are possible
candidates for non-essential optimisations. For example, join removals
likely don't apply in all that many cases, but when they do, this feature
is a great win. So by having some sort of ability to enable/disable planner
features we also stand to actually speed the planner up for fast simple
queries.

I do strongly believe that we need to come up with something to solve this
problem. I already summarised my thoughts on the other thread.

I wrote:

I believe that with parallel query on the horizon for 9.6 we're now aiming
to support bigger OLAP-type databases than ever before. So if we ignore
patches like this one then it appears that we have some conflicting goals
in the community, as it seems that we're willing to add the brawn, but
we're not willing to add the brain. If this is the case then it's a shame,
as I think we can have both. So I very much agree that we must find a way
to maintain support and high performance for small OLTP databases too.

So here I'd very much like to kick off discussion on an acceptable,
realistic way to solve this problem that we're all happy with.

Comments are of course welcome.

[1]: /messages/by-id/CAKJS1f9FK_X_5HKcPcSeimy16Owe3EmPmmGsGWLcKkj_rW9s6A@mail.gmail.com
[2]: /messages/by-id/30810.1449335261@sss.pgh.pa.us
[3]: /messages/by-id/CANP8+jLRpRN4ynMsRkOqhYi-Dw5JrODMOt05qejhrAyrsExVmg@mail.gmail.com
[4]: /messages/by-id/CAKJS1f_UZ_MXtpot6EPXsgHSujoUCrKuXYHLH06h072rDXsCzw@mail.gmail.com
[5]: /messages/by-id/2F30BA8B-DAB9-4907-9E4E-102D242566E3@gmail.com

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#2 Benedikt Grundmann
bgrundmann@janestreet.com
In reply to: David Rowley (#1)
Re: More thorough planning for OLAP queries (was: [PATCH] Equivalence Class Filters)

On Wed, Dec 30, 2015 at 7:16 AM, David Rowley <david.rowley@2ndquadrant.com>
wrote:

A number of ideas were suggested on the other thread about how we might go
about solving this problem. In [3] Simon talked about perhaps enabling
extra optimisations when the planner sees that the plan will cost more than
some given threshold. That's perhaps an option, but may not work well for
optimisations which must take place very early in planning, for example [4].

A small tweak on [3] to deal with [4]: if the returned plan cost is quite high
(say you estimate minutes+) you could just restart planning from scratch
with all costly planning enabled, because even in the worst case (that is
the additional options don't find a better plan), the total planning cost
won't matter much in the grand scheme of things.

#3 Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: David Rowley (#1)
Re: More thorough planning for OLAP queries (was: [PATCH] Equivalence Class Filters)

On 12/30/2015 08:16 AM, David Rowley wrote:

I do strongly believe that we need to come up with something to
solve this problem. I already summarised my thoughts on the other
thread.

One approach that I don't see mentioned on any of the threads is plan
caching, which allows reusing the plan across many query executions,
hopefully amortizing the planning costs.

I'm sure implementing such caching is non-trivial and there are cases
where it may not help, but perhaps it's not entirely futile (AFAIK it's
used by some databases exactly to address the higher planning costs).

I imagine a single GUC enabling or disabling this (possibly not just
globally but per session, user or database).

We already have some form of plan caching, although only for prepared
statements within a single session - maybe that could be a good starting
point? For example what if we only enabled those "expensive"
optimizations for prepared statements, which are assumed to be executed
multiple times? Of course, this may not be entirely true (say, PL/pgSQL
uses prepared statements all the time).

Of course, the annoying consequence of this would be that the planning
may get somewhat unpredictable - the plan will depend on whether the
query was planned directly or as a prepared statement, or whether plan
caching is enabled. However, the same mostly applies to solutions
proposed in the other threads so far.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


#4 Evgeniy Shishkin
itparanoia@gmail.com
In reply to: David Rowley (#1)
Re: More thorough planning for OLAP queries (was: [PATCH] Equivalence Class Filters)

On 30 Dec 2015, at 10:16, David Rowley <david.rowley@2ndquadrant.com> wrote:

A number of ideas were suggested on the other thread about how we might go about solving this problem. In [3] Simon talked about perhaps enabling extra optimisations when the planner sees that the plan will cost more than some given threshold. That's perhaps an option, but may not work well for optimisations which must take place very early in planning, for example [4].
Another idea which came up was from Evgeniy [5], which was more of a request not to do it this way, but never-the-less, the idea was basically to add lots of GUCs to enable/disable each extra planner feature.

Well, my idea was to track planning/execution cost in something like pg_stat_statements.
That way we can track actual time, not estimated cost like Simon proposed.

This table can be combined with Tomas's proposal of plan caching.



#5 David Rowley
david.rowley@2ndquadrant.com
In reply to: Benedikt Grundmann (#2)
Re: More thorough planning for OLAP queries (was: [PATCH] Equivalence Class Filters)

On 30 December 2015 at 21:12, Benedikt Grundmann <bgrundmann@janestreet.com>
wrote:

On Wed, Dec 30, 2015 at 7:16 AM, David Rowley <
david.rowley@2ndquadrant.com> wrote:

A number of ideas were suggested on the other thread about how we might
go about solving this problem. In [3] Simon talked about perhaps enabling
extra optimisations when the planner sees that the plan will cost more than
some given threshold. That's perhaps an option, but may not work well for
optimisations which must take place very early in planning, for example [4].

A small tweak on 3 to deal with 4. If the returned plan cost is quite
high (say you estimate minutes+) you could just restart planning from
scratch with all costly planning enabled, because even in the worst case
(that is the additional options don't find a better plan), the total
planning cost won't matter much in the grand scheme of things.

I do personally quite like this idea. Quite likely the extra logic could be
added to the planner() function so that it calls standard_planner() again in
the event that the cost exceeds some specified threshold. I think the
planner might need a little work before replanning on the same parse is OK,
as there are places where the planner makes changes to this object which
cause things not to work well during the replan. So if we went down this
route, the first steps should be to find alternative ways to do things so
that the parse is never edited, and to set a new standard that the parse
cannot be changed within the planner anymore.
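
To sketch that control flow, here's a standalone toy model (the names,
threshold, and pretend costs are all made up; real code would of course deal
in Query trees and standard_planner(), which is exactly why the parse must
stay untouched between the two passes):

#include <stdio.h>

typedef struct Plan
{
    double  total_cost;
} Plan;

/* Pretend planner: "effort" 0 is the cheap pass, 1 enables everything. */
static Plan
plan_query(const char *query, int effort)
{
    Plan    p;

    (void) query;
    p.total_cost = (effort > 0) ? 1000.0 : 2000000.0;   /* made-up costs */
    return p;
}

#define REPLAN_COST_THRESHOLD  1000000.0    /* made-up threshold */

static Plan
planner(const char *query)
{
    Plan    cheap = plan_query(query, 0);   /* fast pass, fewer optimisations */

    if (cheap.total_cost > REPLAN_COST_THRESHOLD)
        return plan_query(query, 1);        /* looks expensive; replan with everything on */

    return cheap;
}

int
main(void)
{
    printf("chosen cost: %.0f\n", planner("SELECT ...").total_cost);
    return 0;
}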

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#6 David Rowley
david.rowley@2ndquadrant.com
In reply to: Tomas Vondra (#3)
Re: More thorough planning for OLAP queries (was: [PATCH] Equivalence Class Filters)

On 31 December 2015 at 01:24, Tomas Vondra <tomas.vondra@2ndquadrant.com>
wrote:

On 12/30/2015 08:16 AM, David Rowley wrote:

I do strongly believe that we need to come up with something to
solve this problem. I already summarised my thoughts on the other
thread.

One approach that I don't see mentioned on any of the threads is plan
caching, which allows reusing the plan across many query executions,
hopefully amortizing the planning costs.

I'm sure implementing such caching is non-trivial and there are cases
where it may not help, but perhaps it's not entirely futile (AFAIK it's
used by some databases exactly to address the higher planning costs).

I imagine a single GUC enabling or disabling this (possibly not just
globally but per session, user or database).

We already have some form of plan caching, although only for prepared
statements within a single session - maybe that could be a good starting
point? For example what if we only enabled those "expensive" optimizations
for prepared statements, which are assumed to be executed multiple times?
Of course, this may not be entirely true (say, PL/pgSQL uses prepared
statements all the time).

Of course, the annoying consequence of this would be that the planning may
get somewhat unpredictable - the plan will depend on whether the query was
planned directly or as a prepared statement, or whether plan caching is
enabled. However, the same mostly applies to solutions proposed in the
other threads so far.

Personally I'd like to see automatic plan caching occur in PostgreSQL. There
is a bit of a problem with it with regard to a query such as: select * from
t where mySkewedColumn = 1; where the value 1 appears 99% of the time.
Initially we may plan to seqscan, whereas with other values we'd likely
prefer an index scan. I imagine my unique joins patch could be expanded to
test baserestrictinfos to see if they contain quals on a unique index. This
knowledge could later permit plan caching on queries which are safe from
having any skew in the results. It might sound rather restrictive, but
likely this would cover 99% of UPDATE and DELETE operations in an OLTP
workload, and likely a good deal of the SELECTs too. The safety of caching
could be analyzed during planning, and a flag could be set somewhere,
perhaps in PlannedStmt, to mark whether the plan is safe to cache. The
planner() function could initially hash the query string and check whether
any cached plan exists with that hash; if not, plan the statement, then
check whether the "safeToCache" flag is set and, if so, stick that plan
into a hash table. Plans with no baserestrictinfos could be "safeToCache"
too.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#7 David Rowley
david.rowley@2ndquadrant.com
In reply to: Evgeniy Shishkin (#4)
Re: More thorough planning for OLAP queries (was: [PATCH] Equivalence Class Filters)

On 31 December 2015 at 04:39, Evgeniy Shishkin <itparanoia@gmail.com> wrote:

On 30 Dec 2015, at 10:16, David Rowley <david.rowley@2ndquadrant.com>

wrote:

A number of ideas were suggested on the other thread about how we might

go about solving this problem. In [3] Simon talked about perhaps enabling
extra optimisations when the planner sees that the plan will cost more than
some given threshold. That's perhaps an option, but may not work well for
optimisations which must take place very early in planning, for example [4].

Another idea which came up was from Evgeniy [5], which was more of a

request not to do it this way, but never-the-less, the idea was basically
to add lots of GUCs to enable/disable each extra planner feature.

Well, my idea was to track planning/execution cost in something like
pg_stat_statements.
That way we can track actual time, not estimated cost like Simon proposed.

I like this idea also, but I see its use as more of a guide which the DBA
could consult to tweak some sort of planner_effort GUC. I don't think the
data from pg_stat_statements could be used by the planner itself, as it is
an extension rather than core code.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#8 Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: David Rowley (#6)
Re: More thorough planning for OLAP queries (was: [PATCH] Equivalence Class Filters)

On 12/30/2015 10:30 PM, David Rowley wrote:

On 31 December 2015 at 01:24, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

Personally I'd like to see automatic plan caching occur in
PostgreSQL. There is a bit of a problem with it in regards to a query
such as: select * from t where mySkewedColumn = 1; where the value 1
appears 99% of the time. Initially we may plan to seqscan, where with
other values we'd likely prefer to index scan. I imagine with my
unique joins patch, that it could be expanded to test
baserestrictinfos to see if they contain quals with a unique index.
This knowledge could later permit plan caching to occur on queries
which are safe from having any skew in the results. It might sound
rather restrictive, but likely this would cover 99% of UPDATE and
DELETE operations in an OLTP workload, and likely a good deal of the
SELECTs too. The safety of caching could be analyzed during planning,
and a flag could be set somewhere, perhaps in PlannedStmt to mark if
the plan is safe to cache. The planner() function could initially
hash the query string and check if any cached plan exists with that
hash, if not, plan the statement, and then check if the "safeToCache"
flag is set, and if so, stick that plan into a hash table. Also plans
with no baserestrictinfos could be "safeToCache" too.

Yeah, that's what I meant by "non-trivial". I don't know a good approach
to this problem, or if such a "good" approach even exists, but I'd say
being able to decide whether to rebuild a plan in such cases is a
"must-have" feature. Otherwise we could easily loose any gains from the
more thorough optimization because of poor plans.

In other words, we'd have to come up with a way to decide whether to use
the same plan as before, or try building another plan (for the same
query with different parameter values). I can think of two approaches:

(1) Try to measure how "different" are the parameter values used in the
new query, compared to the existing plan(s). This probably means
difference in terms of probability / frequencies etc.

(2) Compute the cost of the existing plan for the new parameters. I.e.
don't perform the whole optimization, just the costing for the
single plan. If the costs are "close" then use the existing plan.

Of course, none of this is trivial and still may fail for some cases.
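
As a toy illustration of (2), the reuse decision might boil down to
something like this (the function and the 50% slack are arbitrary
placeholders, not a proposal for the actual rule):

#include <stdbool.h>
#include <stdio.h>

/*
 * Cost the already-cached plan for the new parameter values only (no full
 * optimization pass), and reuse it while it still looks competitive with
 * the cost it was originally chosen at.
 */
static bool
reuse_cached_plan(double cost_for_new_params, double original_cost)
{
    return cost_for_new_params <= original_cost * 1.5;     /* arbitrary slack */
}

int
main(void)
{
    printf("%d\n", reuse_cached_plan(120.0, 100.0));    /* close: reuse */
    printf("%d\n", reuse_cached_plan(9000.0, 100.0));   /* far off: replan */
    return 0;
}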

I wonder what the other databases do, or if there are papers about this
topic (I'd guess there are, but I haven't looked too much).

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


#9 Noah Misch
noah@leadboat.com
In reply to: David Rowley (#1)
Re: More thorough planning for OLAP queries (was: [PATCH] Equivalence Class Filters)

On Wed, Dec 30, 2015 at 08:16:28PM +1300, David Rowley wrote:

On [1] I suggested an idea to make improvements to the planner around the
Equivalence Class code. Later in [2] Tom raised concerns with this adding
too many planning cycles for a perhaps not common enough situation. I
don't want to discuss that particular patch here, I want to discuss more
generally about the dilemma about adding more smarts to the planner to
allow it to generate a more optimal plan in order to save on execution time.

It's an important topic.

A number of ideas were suggested on the other thread about how we might go
about solving this problem. In [3] Simon talked about perhaps enabling
extra optimisations when the planner sees that the plan will cost more than
some given threshold. That's perhaps an option, but may not work well for
optimisations which must take place very early in planning, for example [4].

Yeah. A slew of steps precede us assembling a notion of total cost. I bet
most controversial proposed steps will happen in that early period. We'd need
a rough cost earlier than we get it today, and I don't know what achieving
that would look like. Simon, did you have specifics in view?

Another idea which came up was from Evgeniy [5], which was more of a
request not to do it this way, but never-the-less, the idea was basically
to add lots of GUCs to enable/disable each extra planner feature.

This and subsequent ideas each envision some kind of optimization step
blacklist. Suppose it's a bitmap with one bit per optional optimizer step.
Each idea chooses blacklist membership differently, but the planner acts on
the blacklist about the same way. I paraphrase the ideas in those terms
below, and I offer a couple more. For this idea, the blacklist is simple:

1. User defines the blacklist fully.

It's essentially handing the hard part back to the user. While I sympathize
with allergic reactions to this, I like it far better than rejecting
optimizations based on thinking the average case wants them disabled.
work_mem likewise isn't the ideal knob for any user, but it has a simple
implementation and beats "1MB per hash table is okay for everyone."
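
For concreteness, the bitmap could look like this standalone toy (the step
names are invented for illustration, not an enumeration of actual planner
steps):

#include <stdio.h>

/* one bit per optional optimizer step */
typedef enum OptStep
{
    OPT_JOIN_REMOVAL = 1 << 0,
    OPT_EC_FILTERS   = 1 << 1,
    OPT_UNIQUE_JOINS = 1 << 2
} OptStep;

static unsigned int optimizer_blacklist = 0;    /* empty: all steps enabled */

static int
step_enabled(OptStep step)
{
    return (optimizer_blacklist & step) == 0;
}

int
main(void)
{
    optimizer_blacklist = OPT_EC_FILTERS;       /* user disables one step */

    if (step_enabled(OPT_JOIN_REMOVAL))
        printf("would attempt join removal\n");
    if (!step_enabled(OPT_EC_FILTERS))
        printf("skipping equivalence-class filters\n");
    return 0;
}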

Another option which I've thought about previously was a planner_strength
GUC, at which various additional optimisations are enabled at various
predefined strength levels, so that databases which tend to spend a great
deal more execution time compared to planning time can turn this up a bit
to see if that helps change that ratio a bit. This idea is far from

2. User selects from predefined blacklists like "maximum", "default", etc.

perfect though, as who's to say that planner feature X should "kick in"
before planner feature Y? I've also often thought that it might be nice to

Yep. It will be essentially impossible to build an argument to move a planner
feature from one strength to another. If the feature's committer has
personally experienced the problem in the field, the optimization will end up
active at default planner strength.

have it so the planner does not modify the Parse object, so that the
planner has the option to throw away what it's done so far and start
planning all over again with the "planner_strength" knob turned up to the
maximum, if the cost happened to indicate that the query was going to take
a long time to execute.

3. System first plans with a specific predefined blacklist that omits
speculative (low probability, high payoff) steps. If that yields a
high-cost plan, it repeats planning with an empty blacklist.

I agree that the planner's modification of the Query tree is a substantial
roadblock. The design needs to work for one-shot plans, and we're unlikely to
recoup the cost of copying every one-shot Query tree.

So here I'd very much like to kick off discussion on an acceptable way to
solve this problem, in a realistic way which we're all happy with.

It's bad to add 10us per plan times 3000/s/client, but the same waste once per
minute per client is no cause for concern. I advise accepting speculative
planner optimizations, enabled by default when similar in cost to comparable
existing steps. At the same time, I encourage developing automation to cap
the waste when a client elicits millions of planner runs that don't benefit
from a certain optimization. Specific lines of inquiry:

4. Derive a conservative blacklist for each rewritten Query when first
planning it, and use that blacklist for subsequent plans.

Some prepared statements use a custom plan every time. (Constraint exclusion
is one driver of such cases.) Many facets of a Query's planning problem don't
depend on parameter values, so a particular optimization will apply to all the
custom plans or to none of them. Let the first plan build a blacklist of
provably-irrelevant optimizations, which the plan cache stores and furnishes
to later runs. The first plan after an invalidation recomputes the blacklist.

5. Add a facility that profiles a workload to generate blacklist data.

Think along the lines of gcc -fprofile-generate/-fprofile-use. Start a
profile; run your application load test; the server collects data sufficient
to match each query with its optimal blacklist. Issue "ALTER USER appuser SET
optimizer_profile = 'profname'" to adopt profile-driven blacklisting. This is
the automated equivalent of (1).

I recommend starting with (1), full user control over the blacklist via GUC.
Specifically, I recommend one GUC expressing the list rather than one Boolean
GUC per optional step. The core work of enumerating optional steps and making
the planner obey a blacklist applies toward all five ideas. A GUC to set that
blacklist is cheap, and it would be worth having as an escape hatch even if we
add sophistication later.
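
Folding a single list-valued GUC into the bitmap above could be as simple as
this toy parser (the GUC value and step names are invented; real code would
hook into the GUC check/assign machinery rather than parse ad hoc):

#include <stdio.h>
#include <string.h>

static unsigned int
parse_blacklist(const char *value)
{
    unsigned int    mask = 0;
    char            buf[256];
    char           *tok;

    strncpy(buf, value, sizeof(buf) - 1);
    buf[sizeof(buf) - 1] = '\0';

    for (tok = strtok(buf, ", "); tok != NULL; tok = strtok(NULL, ", "))
    {
        if (strcmp(tok, "join_removal") == 0)
            mask |= 1 << 0;
        else if (strcmp(tok, "ec_filters") == 0)
            mask |= 1 << 1;
        else if (strcmp(tok, "unique_joins") == 0)
            mask |= 1 << 2;
        else
            fprintf(stderr, "unrecognized optimizer step: %s\n", tok);
    }
    return mask;
}

int
main(void)
{
    /* e.g. SET optimizer_blacklist = 'ec_filters, join_removal' */
    printf("mask = %#x\n", parse_blacklist("ec_filters, join_removal"));
    return 0;
}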

(2) is a dead end for the reason you gave. (3) has potential, but each of
(3), (4) and (5) is a big project with great uncertainty. They shouldn't
block introducing optimizer steps.

Thanks,
nm
