Automatic function replanning

Started by Joachim Wieland over 20 years ago · 35 messages · pgsql-hackers
#1 Joachim Wieland
joe@mcknight.de

Hi,

there's a topic that comes up from time to time on the lists, the problem
that pgsql functions get planned only once and thereafter the same query
plan is used until server shutdown or explicit recreation of the function.

I'd like to implement a way of doing automatic function replanning. I can
think of a few possible approaches.

1. in a more general way: extend pg_proc by an interval column "ttl" or
"replanAfter" and add a function declaration attribute to the parser
"... STRICT STABLE REPLAN AFTER '3 days'::interval"

+ general approach, every language that can pre-compute plans can use this
feature, the check can be done in one place for all languages

- in fact only plpgsql can do that at the moment (right?) and there is no
other candidate for something similar at the moment

- catalog change that also requires interval to be specially treated while
bootstrapping

- the catalog would grow; every function would carry the attribute though it
is only applicable to a very small number of functions, let alone the
number of functions that would actually use it in a typical installation

2. use the #option feature of plpgsql. Add the possibility to specify
#option ttl '3 days'
or
#option replan-after '1 day 2 hours'

+ Minor changes, changes only local to plpgsql

- plpgsql specific solution

- is #option documented at all? Should it stay "unofficial"? If so, why?

3. (not automatic) add a statement that makes pgsql forget the plan and
compute a new one when the function gets called again.
"The user should rather use cron for doing maintenance tasks."

What do you think? Any other ideas?

Joachim

#2 Neil Conway
neilc@samurai.com
In reply to: Joachim Wieland (#1)
Re: Automatic function replanning

On Tue, 2005-12-13 at 22:32 +0100, Joachim Wieland wrote:

there's a topic that comes up from time to time on the lists, the problem
that pgsql functions get planned only once and thereafter the same query
plan is used until server shutdown or explicit recreation of the function.

The problem really has nothing to do with functions, per se: whenever a
plan is created and then stored for future use, the assumptions made by
that plan may be invalidated by the time the plan is executed. This
applies to PREPARE, pl/pgsql functions, perhaps the plan caching done by
the RI triggers, and so forth.

I also think that invalidating cached plans on a periodic basis is the
wrong approach -- we can use sinval to invalidate plans as soon as a
dependent database object changes and not before. This thread contains
some ideas on how to do this:

http://archives.postgresql.org/pgsql-hackers/2005-03/msg00426.php

I got somewhat sidetracked by the complexities of the "central plan
caching module" that Tom would like to see, but I'm still hoping to take
a look at this for 8.2.
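The sinval-based approach Neil describes amounts to recording, for each cached plan, the database objects it depends on, and dropping exactly the affected plans when an invalidation message arrives. A minimal sketch (all class and method names here are invented for illustration; the real mechanism would hook into PostgreSQL's shared cache invalidation, not a Python dict):

```python
# Sketch: a plan cache with per-object dependency tracking, so a DDL
# change invalidates only the plans that referenced the changed object.

class PlanCache:
    def __init__(self):
        self.plans = {}          # query text -> cached plan
        self.dependents = {}     # object name -> set of query texts

    def store(self, query, plan, depends_on):
        self.plans[query] = plan
        for obj in depends_on:
            self.dependents.setdefault(obj, set()).add(query)

    def get(self, query):
        return self.plans.get(query)

    def object_changed(self, obj):
        # Called when an invalidation message arrives for `obj`:
        # drop every plan that referenced it, and nothing else.
        for query in self.dependents.pop(obj, set()):
            self.plans.pop(query, None)

cache = PlanCache()
cache.store("SELECT * FROM t1", "seqscan(t1)", ["t1"])
cache.store("SELECT * FROM t2", "seqscan(t2)", ["t2"])
cache.object_changed("t1")   # e.g. an index was added to t1
```

The point of the approach is visible in the last line: the plan on t1 is gone the moment t1 changes, while the plan on t2 survives untouched; no periodic expiry is needed.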

-Neil

#3 Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Neil Conway (#2)
Re: Automatic function replanning

On Tue, Dec 13, 2005 at 04:49:10PM -0500, Neil Conway wrote:

On Tue, 2005-12-13 at 22:32 +0100, Joachim Wieland wrote:

there's a topic that comes up from time to time on the lists, the problem
that pgsql functions get planned only once and thereafter the same query
plan is used until server shutdown or explicit recreation of the function.

The problem really has nothing to do with functions, per se: whenever a
plan is created and then stored for future use, the assumptions made by
that plan may be invalidated by the time the plan is executed. This
applies to PREPARE, pl/pgsql functions, perhaps the plan caching done by
the RI triggers, and so forth.

I also think that invalidating cached plans on a periodic basis is the
wrong approach -- we can use sinval to invalidate plans as soon as a
dependent database object changes and not before. This thread contains
some ideas on how to do this:

http://archives.postgresql.org/pgsql-hackers/2005-03/msg00426.php

I got somewhat sidetracked by the complexities of the "central plan
caching module" that Tom would like to see, but I'm still hoping to take
a look at this for 8.2.

As for predicate-driven plan changes (ie: query is planned the first
time with a predicate that has high cardinality, but there are also low
cardinality values that will be queried on), it would make more sense to
track the amount of work (probably tuples fetched) normally required to
execute a prepared statement. Any time that prepared statement is
executed with a set of predicates that substantially changes the amount
of work required it should be remembered and considered for re-planning
the next time the query is executed with those predicates.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

#4 Bruce Momjian
bruce@momjian.us
In reply to: Jim Nasby (#3)
Re: Automatic function replanning

Good idea, TODO updated:

* Flush cached query plans when the dependent objects change or
when the cardinality of parameters changes dramatically

---------------------------------------------------------------------------

Jim C. Nasby wrote:

On Tue, Dec 13, 2005 at 04:49:10PM -0500, Neil Conway wrote:

On Tue, 2005-12-13 at 22:32 +0100, Joachim Wieland wrote:

there's a topic that comes up from time to time on the lists, the problem
that pgsql functions get planned only once and thereafter the same query
plan is used until server shutdown or explicit recreation of the function.

The problem really has nothing to do with functions, per se: whenever a
plan is created and then stored for future use, the assumptions made by
that plan may be invalidated by the time the plan is executed. This
applies to PREPARE, pl/pgsql functions, perhaps the plan caching done by
the RI triggers, and so forth.

I also think that invalidating cached plans on a periodic basis is the
wrong approach -- we can use sinval to invalidate plans as soon as a
dependent database object changes and not before. This thread contains
some ideas on how to do this:

http://archives.postgresql.org/pgsql-hackers/2005-03/msg00426.php

I got somewhat sidetracked by the complexities of the "central plan
caching module" that Tom would like to see, but I'm still hoping to take
a look at this for 8.2.

As for predicate-driven plan changes (ie: query is planned the first
time with a predicate that has high cardinality, but there are also low
cardinality values that will be queried on), it would make more sense to
track the amount of work (probably tuples fetched) normally required to
execute a prepared statement. Any time that prepared statement is
executed with a set of predicates that substantially changes the amount
of work required it should be remembered and considered for re-planning
the next time the query is executed with those predicates.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#5 Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Bruce Momjian (#4)
Re: Automatic function replanning

Is cardinality the only thing we'd need to worry about? My idea was
actually to track the amount of work normally required by a stored query
plan, and if a query uses that plan but requires a very different amount
of work it's a good indication that we either need to replan or store
multiple plans for that query. Though if we're certain that cardinality
is the only thing that could make a cached plan go bad it would
certainly simplify things greatly.

On Fri, Dec 16, 2005 at 11:10:43PM -0500, Bruce Momjian wrote:

Good idea, TODO updated:

* Flush cached query plans when the dependent objects change or
when the cardinality of parameters changes dramatically

---------------------------------------------------------------------------

Jim C. Nasby wrote:

On Tue, Dec 13, 2005 at 04:49:10PM -0500, Neil Conway wrote:

On Tue, 2005-12-13 at 22:32 +0100, Joachim Wieland wrote:

there's a topic that comes up from time to time on the lists, the problem
that pgsql functions get planned only once and thereafter the same query
plan is used until server shutdown or explicit recreation of the function.

The problem really has nothing to do with functions, per se: whenever a
plan is created and then stored for future use, the assumptions made by
that plan may be invalidated by the time the plan is executed. This
applies to PREPARE, pl/pgsql functions, perhaps the plan caching done by
the RI triggers, and so forth.

I also think that invalidating cached plans on a periodic basis is the
wrong approach -- we can use sinval to invalidate plans as soon as a
dependent database object changes and not before. This thread contains
some ideas on how to do this:

http://archives.postgresql.org/pgsql-hackers/2005-03/msg00426.php

I got somewhat sidetracked by the complexities of the "central plan
caching module" that Tom would like to see, but I'm still hoping to take
a look at this for 8.2.

As for predicate-driven plan changes (ie: query is planned the first
time with a predicate that has high cardinality, but there are also low
cardinality values that will be queried on), it would make more sense to
track the amount of work (probably tuples fetched) normally required to
execute a prepared statement. Any time that prepared statement is
executed with a set of predicates that substantially changes the amount
of work required it should be remembered and considered for re-planning
the next time the query is executed with those predicates.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461


-- 
Bruce Momjian                        |  http://candle.pha.pa.us
pgman@candle.pha.pa.us               |  (610) 359-1001
+  If your life is a hard drive,     |  13 Roberts Road
+  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

#6 Bruce Momjian
bruce@momjian.us
In reply to: Jim Nasby (#5)
Re: Automatic function replanning

Jim C. Nasby wrote:

Is cardinality the only thing we'd need to worry about? My idea was
actually to track the amount of work normally required by a stored query
plan, and if a query uses that plan but requires a very different amount
of work it's a good indication that we either need to replan or store
multiple plans for that query. Though if we're certain that cardinality
is the only thing that could make a cached plan go bad it would
certainly simplify things greatly.

This gets into another area of re-optimizing when the executor finds
that the actual tables don't match the optimizer estimates. I think we
decided that was too hard/risky, but invalidating the plan might help,
though I am thinking re-planning might just generate the same plan as
before. I think something would need to have happened since the last
planning, like ANALYZE or something.

Updated TODO:

* Flush cached query plans when the dependent objects change,
when the cardinality of parameters changes dramatically, or
when new ANALYZE statistics are available

---------------------------------------------------------------------------

On Fri, Dec 16, 2005 at 11:10:43PM -0500, Bruce Momjian wrote:

Good idea, TODO updated:

* Flush cached query plans when the dependent objects change or
when the cardinality of parameters changes dramatically

---------------------------------------------------------------------------

Jim C. Nasby wrote:

On Tue, Dec 13, 2005 at 04:49:10PM -0500, Neil Conway wrote:

On Tue, 2005-12-13 at 22:32 +0100, Joachim Wieland wrote:

there's a topic that comes up from time to time on the lists, the problem
that pgsql functions get planned only once and thereafter the same query
plan is used until server shutdown or explicit recreation of the function.

The problem really has nothing to do with functions, per se: whenever a
plan is created and then stored for future use, the assumptions made by
that plan may be invalidated by the time the plan is executed. This
applies to PREPARE, pl/pgsql functions, perhaps the plan caching done by
the RI triggers, and so forth.

I also think that invalidating cached plans on a periodic basis is the
wrong approach -- we can use sinval to invalidate plans as soon as a
dependent database object changes and not before. This thread contains
some ideas on how to do this:

http://archives.postgresql.org/pgsql-hackers/2005-03/msg00426.php

I got somewhat sidetracked by the complexities of the "central plan
caching module" that Tom would like to see, but I'm still hoping to take
a look at this for 8.2.

As for predicate-driven plan changes (ie: query is planned the first
time with a predicate that has high cardinality, but there are also low
cardinality values that will be queried on), it would make more sense to
track the amount of work (probably tuples fetched) normally required to
execute a prepared statement. Any time that prepared statement is
executed with a set of predicates that substantially changes the amount
of work required it should be remembered and considered for re-planning
the next time the query is executed with those predicates.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461


-- 
Bruce Momjian                        |  http://candle.pha.pa.us
pgman@candle.pha.pa.us               |  (610) 359-1001
+  If your life is a hard drive,     |  13 Roberts Road
+  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073


--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#7 Lukas Smith
mls@pooteeweet.org
In reply to: Bruce Momjian (#6)
Re: Automatic function replanning

Bruce Momjian wrote:

* Flush cached query plans when the dependent objects change,
when the cardinality of parameters changes dramatically, or
when new ANALYZE statistics are available

Wouldn't it also make sense to flush a cached query plan when, after
execution, it is determined that one or more assumptions the cached
query plan was based on were found to be off? For example, the query plan
was based on the assumption that a particular table would only return a
handful of rows, but in reality it returned a few thousand.

regards,
Lukas

#8 Chris Browne
cbbrowne@acm.org
In reply to: Jim Nasby (#5)
Re: Automatic function replanning

Lukas Smith <mls@pooteeweet.org> writes:

Bruce Momjian wrote:

* Flush cached query plans when the dependent objects change,
when the cardinality of parameters changes dramatically, or
when new ANALYZE statistics are available

Wouldn't it also make sense to flush a cached query plan when, after
execution, it is determined that one or more assumptions the
cached query plan was based on were found to be off? For example, the
query plan was based on the assumption that a particular table would
only return a handful of rows, but in reality it returned a few
thousand.

There is some merit to that.

I could also see it being sensible to flush a cached plan any time the
query took more than some [arbitrary/GUC-chosen] interval.

Supposing it took 20s to execute the query, it would surely seem
surprising for re-evaluating the plan to be expected to make up a
material proportion of the cost of the *next* invocation.

If we flush every plan that took >10s to evaluate, that offers the
possibility for it to be done better next time...
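The flush-on-slow-execution idea above is tiny in mechanism terms. A sketch, with a hypothetical GUC-like threshold (nothing here corresponds to a real PostgreSQL setting):

```python
# Sketch: after each execution, discard any cached plan whose runtime
# exceeded a configured interval, so it gets replanned on its next use.

plan_flush_threshold_secs = 10.0   # hypothetical GUC

plan_cache = {"q1": "plan-for-q1", "q2": "plan-for-q2"}

def after_execution(query, elapsed_secs):
    # A 20s query would hardly notice the cost of replanning next time,
    # so flushing such plans is cheap insurance against a stale plan.
    if elapsed_secs > plan_flush_threshold_secs and query in plan_cache:
        del plan_cache[query]

after_execution("q1", 20.0)   # slow: plan flushed
after_execution("q2", 0.05)   # fast: plan kept
```

As Bruce points out just below, flushing alone does not guarantee a better plan; it only creates the opportunity for one once something (e.g. fresh ANALYZE statistics) has changed.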
--
output = reverse("gro.mca" "@" "enworbbc")
http://www.ntlug.org/~cbbrowne/internet.html
Points are awarded for getting the last word in. Drawing the
conversation out so long that the original message disappears due to
being indented off the right hand edge of the screen is one way to do
this. Another is to imply that anyone replying further is a hopeless
cretin and is wasting everyone's valuable time.
-- from the Symbolics Guidelines for Sending Mail

#9 Bruce Momjian
bruce@momjian.us
In reply to: Chris Browne (#8)
Re: Automatic function replanning

Chris Browne wrote:

Lukas Smith <mls@pooteeweet.org> writes:

Bruce Momjian wrote:

* Flush cached query plans when the dependent objects change,
when the cardinality of parameters changes dramatically, or
when new ANALYZE statistics are available

Wouldn't it also make sense to flush a cached query plan when, after
execution, it is determined that one or more assumptions the
cached query plan was based on were found to be off? For example, the
query plan was based on the assumption that a particular table would
only return a handful of rows, but in reality it returned a few
thousand.

There is some merit to that.

I could also see it being sensible to flush a cached plan any time the
query took more than some [arbitrary/GUC-chosen] interval.

Supposing it took 20s to execute the query, it would surely seem
surprising for re-evaluating the plan to be expected to make up a
material proportion of the cost of the *next* invocation.

If we flush every plan that took >10s to evaluate, that offers the
possibility for it to be done better next time...

Ah, but how do you pass that information back to the optimizer so you
don't end up getting the same plan again?

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#10 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Lukas Smith (#7)
Re: Automatic function replanning

Lukas Smith <mls@pooteeweet.org> writes:

Bruce Momjian wrote:

* Flush cached query plans when the dependent objects change,
when the cardinality of parameters changes dramatically, or
when new ANALYZE statistics are available

Wouldn't it also make sense to flush a cached query plan when after
execution it is determined that one or more assumptions that the cached
query plan was based on was found to be off?

Not unless you do something that would cause the planner to make
different choices next time. (Such as changing the ANALYZE statistics,
perhaps.) The TODO item is OK as stated, it's just talking about
mechanism and not the things that might trigger the mechanism.

regards, tom lane

#11 Chris Browne
cbbrowne@acm.org
In reply to: Chris Browne (#8)
Re: Automatic function replanning

Chris Browne wrote:

Lukas Smith <mls@pooteeweet.org> writes:

Bruce Momjian wrote:

* Flush cached query plans when the dependent objects change,
when the cardinality of parameters changes dramatically, or
when new ANALYZE statistics are available

Wouldn't it also make sense to flush a cached query plan when, after
execution, it is determined that one or more assumptions the
cached query plan was based on were found to be off? For example, the
query plan was based on the assumption that a particular table would
only return a handful of rows, but in reality it returned a few
thousand.

There is some merit to that.

I could also see it being sensible to flush a cached plan any time the
query took more than some [arbitrary/GUC-chosen] interval.

Supposing it took 20s to execute the query, it would surely seem
surprising for re-evaluating the plan to be expected to make up a
material proportion of the cost of the *next* invocation.

If we flush every plan that took >10s to evaluate, that offers the
possibility for it to be done better next time...

Ah, but how do you pass that information back to the optimizer so you
don't end up getting the same plan again?

We can't, in any direct fashion, of course.

Even if flushing the plan doesn't lead to a better one now, it still
leaves you ready for when an ANALYZE comes along to change the stats
and possibly improve things.

One possibility to do something indirect would be for this "plan
invalidation" to also feed some stats to pg_autovacuum, such that
every "bad query plan" (as evaluated by the notion that the actual
number of tuples for a given table wildly varied from expectations)
would bump up the stats for the offending table.

That would give some feedback to encourage pg_autovacuum to analyze
trouble tables again sooner.
--
(reverse (concatenate 'string "moc.liamg" "@" "enworbbc"))
http://cbbrowne.com/info/slony.html
``Lisp has jokingly been called "the most intelligent way to misuse a
computer". I think that description is a great compliment because it
transmits the full flavor of liberation: it has assisted a number of our
most gifted fellow humans in thinking previously impossible thoughts.''
-- "The Humble Programmer", E. Dijkstra, CACM, vol. 15, n. 10, 1972

#12 Chris Browne
cbbrowne@acm.org
In reply to: Jim Nasby (#5)
Re: Automatic function replanning

Lukas Smith <mls@pooteeweet.org> writes:

Bruce Momjian wrote:

* Flush cached query plans when the dependent objects change,
when the cardinality of parameters changes dramatically, or
when new ANALYZE statistics are available

Wouldn't it also make sense to flush a cached query plan when after
execution it is determined that one or more assumptions that the cached
query plan was based on was found to be off?

Not unless you do something that would cause the planner to make
different choices next time. (Such as changing the ANALYZE statistics,
perhaps.) The TODO item is OK as stated, it's just talking about
mechanism and not the things that might trigger the mechanism.

A mechanism might be to bump up the stats stored for pg_autovacuum,
which would encourage a table to be re-ANALYZEd.

That may not immediately change ANALYZE statistics, but it's
something...

Even if there is NO feedback mechanism on statistics, if we know the
plan was pretty bad, that is surely at least *a* signal to invalidate
the plan.
--
(reverse (concatenate 'string "moc.liamg" "@" "enworbbc"))
http://cbbrowne.com/info/slony.html
``Lisp has jokingly been called "the most intelligent way to misuse a
computer". I think that description is a great compliment because it
transmits the full flavor of liberation: it has assisted a number of our
most gifted fellow humans in thinking previously impossible thoughts.''
-- "The Humble Programmer", E. Dijkstra, CACM, vol. 15, n. 10, 1972

#13 Bruce Momjian
bruce@momjian.us
In reply to: Chris Browne (#12)
Re: Automatic function replanning

Christopher Browne wrote:

Not unless you do something that would cause the planner to make
different choices next time. (Such as changing the ANALYZE statistics,
perhaps.) The TODO item is OK as stated, it's just talking about
mechanism and not the things that might trigger the mechanism.

A mechanism might be to bump up the stats stored for pg_autovacuum,
which would encourage a table to be re-ANALYZEd.

That may not immediately change ANALYZE statistics, but it's
something...

Even if there is NO feedback mechanism on statistics, if we know the
plan was pretty bad, that is surely at least *a* signal to invalidate
the plan.

Something has to cause the replan to be better, and that is one
possibility.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#14 Trent Shipley
tshipley@deru.com
In reply to: Lukas Smith (#7)
Re: Automatic function replanning

On Saturday 2005-12-17 16:28, Lukas Smith wrote:

Bruce Momjian wrote:

* Flush cached query plans when the dependent objects change,
when the cardinality of parameters changes dramatically, or
when new ANALYZE statistics are available

Wouldn't it also make sense to flush a cached query plan when, after
execution, it is determined that one or more assumptions the cached
query plan was based on were found to be off? For example, the query plan
was based on the assumption that a particular table would only return a
handful of rows, but in reality it returned a few thousand.

regards,
Lukas

Proposed rewrite

* Mark query plan for flush (opportunistic replan) when:
** dependent objects change,
** cardinality of parameters changes sufficiently (per planner parameter),
** new ANALYZE statistics are available and, per planner parameter, differ
sufficiently from prior statistics.

* Mark plan as "tried" when parameters of returned set are out of statistical
control; create alternate plan, hill-climbing to statistical control.
** Too many/too few rows relative to plan expectations
*** Auto-sample for better statistics?
** History of plan shows throughput time for result set varies excessively
(need more execution stability, possibly at expense of median optimality).


---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

#15 Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Bruce Momjian (#6)
Re: Automatic function replanning

On Sat, Dec 17, 2005 at 01:07:10PM -0500, Bruce Momjian wrote:

Jim C. Nasby wrote:

Is cardinality the only thing we'd need to worry about? My idea was
actually to track the amount of work normally required by a stored query
plan, and if a query uses that plan but requires a very different amount
of work it's a good indication that we either need to replan or store
multiple plans for that query. Though if we're certain that cardinality
is the only thing that could make a cached plan go bad it would
certainly simplify things greatly.

This gets into another area of re-optimizing when the executor finds
that the actual tables don't match the optimizer estimates. I think we
decided that was too hard/risky, but invalidating the plan might help,
though I am thinking re-planning might just generate the same plan as
before. I think something would need to have happened since the last
planning, like ANALYZE or something.

Well, in the stored plan case, presumably what's changed is one of the
bound parameters. And if we want to be sophisticated about it, we won't
just throw out the old plan; rather, we'll try to figure out which
parameter it is that wants a different plan.
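One way to read that is to keep a plan per coarse selectivity class of the bound parameter, rather than one plan per statement. A sketch under that reading (the class names, the 1% cutoff, and the plan labels are all invented for illustration):

```python
# Sketch: cache one plan per selectivity class of the bound parameter,
# so high- and low-cardinality values each get an appropriate plan
# instead of forcing a single compromise plan.

def selectivity_class(estimated_rows, table_rows):
    # Bucket the parameter by what fraction of the table it selects.
    frac = estimated_rows / table_rows
    if frac < 0.01:
        return "selective"    # index scan territory
    return "unselective"      # seqscan territory

class MultiPlanStatement:
    def __init__(self, table_rows):
        self.table_rows = table_rows
        self.plans = {}   # selectivity class -> cached plan

    def plan_for(self, estimated_rows):
        cls = selectivity_class(estimated_rows, self.table_rows)
        if cls not in self.plans:
            # Plan once per class rather than throwing the old plan away.
            self.plans[cls] = "indexscan" if cls == "selective" else "seqscan"
        return self.plans[cls]

stmt = MultiPlanStatement(table_rows=1_000_000)
plan_a = stmt.plan_for(estimated_rows=50)       # -> "indexscan"
plan_b = stmt.plan_for(estimated_rows=300_000)  # -> "seqscan"
```

Both plans stay cached side by side, so switching between selective and unselective parameter values costs nothing after the first planning of each class.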

Updated TODO:

* Flush cached query plans when the dependent objects change,
when the cardinality of parameters changes dramatically, or
when new ANALYZE statistics are available

Probably worth pointing to this thread in the TODO...
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

#16 Rick Gigger
rick@alpinenetworking.com
In reply to: Jim Nasby (#15)
Re: Automatic function replanning

It seems to me like there are two classes of problems here:

1) Simply invalidating plans made with out of date statistics.
2) Using run-time collected data to update the plan to something more
intelligent.

It also seems like #1 would be fairly straightforward and simple
whereas #2 would be much more complex. #1 would do me a world of
good and probably other people as well. Postgres's query planning
has always been fine for me, or at least I have always been able to
optimize my queries when I've got a representative data set to work
with. Query plan caching only gets me when the query plan is created
before the statistics are present to create a good plan.

Just one user's 2 cents.

- Rick Gigger

On Dec 19, 2005, at 12:00 PM, Jim C. Nasby wrote:


On Sat, Dec 17, 2005 at 01:07:10PM -0500, Bruce Momjian wrote:

Jim C. Nasby wrote:

Is cardinality the only thing we'd need to worry about? My idea was
actually to track the amount of work normally required by a
stored query
plan, and if a query uses that plan but requires a very different
amount
of work it's a good indication that we either need to replan or
store
multiple plans for that query. Though if we're certain that
cardinality
is the only thing that could make a cached plan go bad it would
certainly simplify things greatly.

This gets into another area of re-optimizing when the executor finds
that the actual tables don't match the optimizer estimates. I
think we
decided that was too hard/risky, but invalidating the plan might
help,
though I am thinking re-planning might just generate the same plan as
before. I think something would need to have happened since the last
planning, like ANALYZE or something.

Well, in the stored plan case, presumably what's changed is one of the
bound parameters. And if we want to be sophisticated about it, we
won't
just throw out the old plan; rather we'll try and figure out what
parameter it is that's wanting a different plan.

Updated TODO:

* Flush cached query plans when the dependent objects change,
when the cardinality of parameters changes dramatically, or
when new ANALYZE statistics are available

Probably worth pointing to this thread in the TODO...
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

---------------------------(end of
broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

#17 Bruce Momjian
bruce@momjian.us
In reply to: Trent Shipley (#14)
Re: Automatic function replanning

Trent Shipley wrote:

On Saturday 2005-12-17 16:28, Lukas Smith wrote:

Bruce Momjian wrote:

* Flush cached query plans when the dependent objects change,
when the cardinality of parameters changes dramatically, or
when new ANALYZE statistics are available

Wouldn't it also make sense to flush a cached query plan when, after
execution, it is determined that one or more assumptions the cached
query plan was based on were found to be off? For example, the query plan
was based on the assumption that a particular table would only return a
handful of rows, but in reality it returned a few thousand.

regards,
Lukas

Proposed rewrite

* Mark query plan for flush (opportunistic replan) when:
** dependent objects change,
** cardinality of parameters changes sufficiently (per planner parameter),
** new ANALYZE statistics are available and, per planner parameter, differ
sufficiently from prior statistics.

Frankly, I think any new ANALYZE statistics should just invalidate the
plan. I don't think it is worth trying to determine if they changed
sufficiently or not --- you might as well just replan.

* Mark plan as "tried" when parameters of the returned set are out of
statistical control; create an alternate plan, hill-climbing toward
statistical control.
** Too many/too few rows relative to plan expectations
*** Auto-sample for better statistics?
** History of plan shows throughput time for the result set varies excessively
(need more execution stability, possibly at the expense of median optimality).

This is a new idea, that you are remembering bad plans. I am unsure how
we would track that information. It gets into the area of having the
optimizer change its behavior based on previous runs, and I am not sure
we have ever agreed to get into that kind of behavior.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#18Bruce Momjian
bruce@momjian.us
In reply to: Rick Gigger (#16)
Re: Automatic function replanning

Rick Gigger wrote:

It seems to me like there are two classes of problems here:

1) Simply invalidating plans made with out of date statistics.
2) Using run-time collected data to update the plan to something more
intelligent.

It also seems like #1 would be fairly straightforward and simple
whereas #2 would be much more complex. #1 would do me a world of
good and probably other people as well. Postgres's query planning
has always been fine for me, or at least I have always been able to
optimize my queries when I've got a representative data set to work
with. Query plan caching only gets me when the query plan is created
before the statistics are present to create a good plan.

Just one user's 2 cents.

Agreed. I just can't add #2 unless we get more agreement from the
group, because it has been a disputed issue in the past.

---------------------------------------------------------------------------

- Rick Gigger

On Dec 19, 2005, at 12:00 PM, Jim C. Nasby wrote:

On Sat, Dec 17, 2005 at 01:07:10PM -0500, Bruce Momjian wrote:

Jim C. Nasby wrote:

Is cardinality the only thing we'd need to worry about? My idea was
actually to track the amount of work normally required by a stored query
plan, and if a query uses that plan but requires a very different amount
of work it's a good indication that we either need to replan or store
multiple plans for that query. Though if we're certain that cardinality
is the only thing that could make a cached plan go bad it would
certainly simplify things greatly.

This gets into another area of re-optimizing when the executor finds
that the actual tables don't match the optimizer estimates. I think we
decided that was too hard/risky, but invalidating the plan might help,
though I am thinking re-planning might just generate the same plan as
before. I think something would need to have happened since the last
planning, like ANALYZE or something.

Well, in the stored plan case, presumably what's changed is one of the
bound parameters. And if we want to be sophisticated about it, we won't
just throw out the old plan; rather we'll try and figure out what
parameter it is that's wanting a different plan.

Updated TODO:

* Flush cached query plans when the dependent objects change,
when the cardinality of parameters changes dramatically, or
when new ANALYZE statistics are available

Probably worth pointing to this thread in the TODO...
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461


-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#19Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Bruce Momjian (#18)
Re: Automatic function replanning

On Wed, Dec 21, 2005 at 05:43:38PM -0500, Bruce Momjian wrote:

Rick Gigger wrote:

It seems to me like there are two classes of problems here:

1) Simply invalidating plans made with out of date statistics.
2) Using run-time collected data to update the plan to something more
intelligent.

It also seems like #1 would be fairly straightforward and simple
whereas #2 would be much more complex. #1 would do me a world of
good and probably other people as well. Postgres's query planning
has always been fine for me, or at least I have always been able to
optimize my queries when I've got a representative data set to work
with. Query plan caching only gets me when the query plan is created
before the statistics are present to create a good plan.

Just one user's 2 cents.

Agreed. I just can't add #2 unless we get more agreement from the
group, because it has been a disputed issue in the past.

Well, how about this, since it's a prerequisite for #2 and would be
generally useful anyway:

Track normal resource consumption (i.e., tuples read) for planned queries
and record parameter values that result in drastically different
resource consumption.

This would at least make it easy for admins to identify prepared queries
that have a highly variable execution cost.
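A rough sketch of that tracking, assuming "resource consumption" is just
tuples read and "drastically different" means a 10x deviation from the
running mean (both are my assumptions, not anything implemented):

```python
class PlanCostTracker:
    OUTLIER_RATIO = 10.0  # assumed: "drastically different" = 10x the mean

    def __init__(self):
        self.mean = {}      # query -> running mean of tuples read
        self.count = {}     # query -> number of executions seen
        self.outliers = {}  # query -> [(params, tuples_read), ...]

    def record(self, query, params, tuples_read):
        n = self.count.get(query, 0)
        if n > 0:
            mean = self.mean[query]
            if tuples_read > self.OUTLIER_RATIO * mean or \
               tuples_read * self.OUTLIER_RATIO < mean:
                # Parameter values that made the cached plan behave very
                # differently from its history: log them for the admin.
                self.outliers.setdefault(query, []).append((params, tuples_read))
        # Fold this execution into the running mean.
        self.count[query] = n + 1
        self.mean[query] = (self.mean.get(query, 0.0) * n + tuples_read) / (n + 1)
```

An admin could then scan `outliers` for prepared queries whose cost is
highly parameter-dependent, which is exactly the set worth replanning.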
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

#20Bruce Momjian
bruce@momjian.us
In reply to: Jim Nasby (#19)
Re: Automatic function replanning

Jim C. Nasby wrote:

On Wed, Dec 21, 2005 at 05:43:38PM -0500, Bruce Momjian wrote:

Rick Gigger wrote:

It seems to me like there are two classes of problems here:

1) Simply invalidating plans made with out of date statistics.
2) Using run-time collected data to update the plan to something more
intelligent.

It also seems like #1 would be fairly straightforward and simple
whereas #2 would be much more complex. #1 would do me a world of
good and probably other people as well. Postgres's query planning
has always been fine for me, or at least I have always been able to
optimize my queries when I've got a representative data set to work
with. Query plan caching only gets me when the query plan is created
before the statistics are present to create a good plan.

Just one user's 2 cents.

Agreed. I just can't add #2 unless we get more agreement from the
group, because it has been a disputed issue in the past.

Well, how about this, since it's a prerequisite for #2 and would be
generally useful anyway:

Track normal resource consumption (i.e., tuples read) for planned queries
and record parameter values that result in drastically different
resource consumption.

This would at least make it easy for admins to identify prepared queries
that have a highly variable execution cost.

We have that TODO already:

* Log statements where the optimizer row estimates were dramatically
different from the number of rows actually found?

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#21Lukas Kahwe Smith
smith@pooteeweet.org
In reply to: Jim Nasby (#19)
#22Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Bruce Momjian (#20)
#23Bruce Momjian
bruce@momjian.us
In reply to: Jim Nasby (#22)
#24Lukas Kahwe Smith
smith@pooteeweet.org
In reply to: Jim Nasby (#22)
#25Bruce Momjian
bruce@momjian.us
In reply to: Lukas Kahwe Smith (#24)
#26Lukas Kahwe Smith
smith@pooteeweet.org
In reply to: Bruce Momjian (#25)
#27Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Lukas Kahwe Smith (#26)
#28Bruce Momjian
bruce@momjian.us
In reply to: Lukas Kahwe Smith (#26)
#29Lukas Kahwe Smith
smith@pooteeweet.org
In reply to: Bruce Momjian (#28)
#30Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Bruce Momjian (#23)
#31Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Lukas Kahwe Smith (#29)
#32Bruce Momjian
bruce@momjian.us
In reply to: Jim Nasby (#30)
#33Lukas Kahwe Smith
smith@pooteeweet.org
In reply to: Bruce Momjian (#32)
#34Trent Shipley
tshipley@deru.com
In reply to: Lukas Kahwe Smith (#33)
#35Bruce Momjian
bruce@momjian.us
In reply to: Trent Shipley (#34)