per table random-page-cost?

Started by marcin mank · over 16 years ago · 39 messages · pgsql-hackers
#1 marcin mank
marcin.mank@gmail.com

Currently random_page_cost is a GUC. I propose that this could be set per-table.

I think this is a good stand-in for the widely-wanted planner hints. This way
you can say "I do NOT want this table to be index-scanned, because I
know it is not cached" by setting its random_page_cost to a large
value (and obviously you can do the opposite: by setting the
random_page_cost to 1 you say "I don't care how you fetch the pages,
they are all in cache").

The value for the per-table setting could be inferred from the
pg_stat_*_tables / pg_statio_*_tables views. We could have a tool to
suggest appropriate values.

We could call it something like cached_percentage (and have the cost
of a random tuple fetch be inferred from the global random_page_cost,
seq_page_cost, and the per-table cached_percentage). Then we could set
the global random_page_cost to a sane value like 200. Now one can
wonder why the planner works while having such blatantly unrealistic
values for random_page_cost :)
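A worked example of the inference being proposed (cached_percentage is the hypothetical per-table setting, not an existing parameter; the numbers are illustrative): with a global random_page_cost of 200 and a table that is 98% cached, the blended cost lands near the "sane" values people configure today:

```sql
-- Hypothetical blend: effective cost of a random page fetch for a table
-- that is 98% cached, assuming seq_page_cost = 1 and random_page_cost = 200.
-- (cached_percentage is the proposed per-table setting, not a real GUC.)
SELECT 0.98 * 1.0 + (1 - 0.98) * 200.0 AS effective_random_page_cost;
-- 0.98 + 4.0 = 4.98, close to the long-standing default random_page_cost of 4
```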

What do you think?

Greetings
Marcin Mank

#2 Robert Haas
robertmhaas@gmail.com
In reply to: marcin mank (#1)
Re: per table random-page-cost?

On Mon, Oct 19, 2009 at 5:08 PM, marcin mank <marcin.mank@gmail.com> wrote:

Currently random_page_cost is a GUC. I propose that this could be set per-table.

I think this is a good stand-in for the widely-wanted planner hints. This way
you can say "I do NOT want this table to be index-scanned, because I
know it is not cached" by setting its random_page_cost to a large
value (and obviously you can do the opposite: by setting the
random_page_cost to 1 you say "I don't care how you fetch the pages,
they are all in cache").

The value for the per-table setting could be inferred from
pg_stat(io)?.*tables . We could have a tool to suggest appropriate
values.

We could call it something like cached_percentage (and have the cost
of a random tuple fetch be inferred from the global random_page_cost,
seq_page_cost, and the per-table cached_percentage). Then we could set
the global random_page_cost to a sane value like 200. Now one can
wonder why the planner works while having such blatantly unrealistic
values for random_page_cost :)

What do you think?

I've been thinking about this a bit, too. I've been wondering if it
might make sense to have a "random_page_cost" and "seq_page_cost"
setting for each TABLESPACE, to compensate for the fact that different
media might be faster or slower, and a percent-cached setting for each
table over top of that.
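A sketch of what the per-tablespace half of that might look like (hypothetical syntax at the time of this thread; PostgreSQL 9.0 later added per-tablespace seq_page_cost and random_page_cost in essentially this form — the tablespace names here are invented):

```sql
-- Tablespace on fast media (e.g. SSD): random I/O costs barely more
-- than sequential I/O, so index scans are penalized very little.
ALTER TABLESPACE fast_ssd SET (random_page_cost = 1.1, seq_page_cost = 1.0);

-- Tablespace on a slow archive array: penalize random I/O heavily so
-- the planner prefers sequential scans for relations stored there.
ALTER TABLESPACE slow_archive SET (random_page_cost = 10);
```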

...Robert

#3 Greg Stark
gsstark@mit.edu
In reply to: marcin mank (#1)
Re: per table random-page-cost?

On Mon, Oct 19, 2009 at 2:08 PM, marcin mank <marcin.mank@gmail.com> wrote:

Currently random_page_cost is a GUC. I propose that this could be set per-table.

Or per-tablespace.

Yes, I think there is a class of GUCs which describe the physical
attributes of the storage system and which should be settable per-table
or per-tablespace. random_page_cost, seq_page_cost, and
effective_io_concurrency come to mind.

While this isn't a simple flag to change, it does seem like a bit of a
SMOP (a simple matter of programming). The GUC infrastructure stores
these values in global variables which the planner and other systems
consult directly. They would instead have to be made storage parameters
which the planner and other systems check on the appropriate table,
defaulting to the global GUC if they're not set.

--
greg

#4 Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Robert Haas (#2)
Re: per table random-page-cost?

Robert Haas <robertmhaas@gmail.com> wrote:

I've been wondering if it might make sense to have a
"random_page_cost" and "seq_page_cost" setting for each TABLESPACE,
to compensate for the fact that different media might be faster or
slower, and a percent-cached setting for each table over top of
that.

[after recovering from the initial cringing reaction...]

How about calculating an effective percentage from other
information? effective_cache_size, along with relation and database
size, comes to mind. How about the particular index being considered
for the plan? Of course, you might have to be careful about working
in the TOAST table size for a particular query, based on the columns
retrieved.

I have no doubt that there would be some major performance regressions
in the first cut of anything like this, for at least *some* queries.
The toughest part of this might be to get adequate testing to tune it
for a wide enough variety of real-life situations.

-Kevin

#5 marcin mank
marcin.mank@gmail.com
In reply to: Robert Haas (#2)
Re: per table random-page-cost?

I've been thinking about this a bit, too.  I've been wondering if it
might make sense to have a "random_page_cost" and "seq_page_cost"
setting for each TABLESPACE, to compensate for the fact that different
media might be faster or slower, and a percent-cached setting for each
table over top of that.

I thought about making it per-table, but realistically I think most
people don't use tablespaces now. I would not want to be telling
people "to be able to hint the planner to (not) index-scan a table,
you must move it to a separate tablespace".

A global default, a per-tablespace default overriding it, and a
per-table value overriding them both seems like over-engineering to
me.

Greetings
Marcin

#6 marcin mank
marcin.mank@gmail.com
In reply to: marcin mank (#5)
Re: per table random-page-cost?

I thought about making it per-table***space***, but realistically I

#7 Tom Lane
tgl@sss.pgh.pa.us
In reply to: marcin mank (#5)
Re: per table random-page-cost?

marcin mank <marcin.mank@gmail.com> writes:

I thought about making it per-table, but realistically I think most
people don't use tablespaces now. I would not want to be telling
people "to be able to hint the planner to (not) index-scan a table,
you must move it to a separate tablespace".

Per-table is not physically sensible. Per-tablespace has some rationale
to it.

regards, tom lane

#8 Andrew Dunstan
andrew@dunslane.net
In reply to: marcin mank (#5)
Re: per table random-page-cost?

marcin mank wrote:

I've been thinking about this a bit, too. I've been wondering if it
might make sense to have a "random_page_cost" and "seq_page_cost"
setting for each TABLESPACE, to compensate for the fact that different
media might be faster or slower, and a percent-cached setting for each
table over top of that.

I thought about making it per-table, but realistically I think most
people don't use tablespaces now. I would not want to be telling
people "to be able to hint the planner to (not) index-scan a table,
you must move it to a separate tablespace".

This is just plain wrong, in my experience. *Every* large installation I
deal with uses tablespaces.

This proposal is just "hints by the back door", ISTM. As Tom says, there
is a justification for having it on tablespaces but not on individual
tables.

If you want to argue for full-blown planner hints, that's a whole other
story. Have you read the previous debates on the subject?

cheers

#9 marcin mank
marcin.mank@gmail.com
In reply to: Andrew Dunstan (#8)
Re: per table random-page-cost?

This proposal is just "hints by the back door", ISTM. As Tom says, there is
a justification for having it on tablespaces but not on individual tables.

If the parameter is defined as "the chance that a page is in cache"
there is very real physical meaning to it. And this is per-table, not
per-tablespace. A "users" table will likely be fetched from cache all
the time, while a "billing_records" table will be fetched mostly from
disk.

Greetings
Marcin

#10 Tom Lane
tgl@sss.pgh.pa.us
In reply to: marcin mank (#9)
Re: per table random-page-cost?

marcin mank <marcin.mank@gmail.com> writes:

This proposal is just "hints by the back door", ISTM. As Tom says, there is
a justification for having it on tablespaces but not on individual tables.

If the parameter is defined as "the chance that a page is in cache"
there is very real physical meaning to it.

We have no such parameter...

regards, tom lane

#11 Greg Stark
gsstark@mit.edu
In reply to: Tom Lane (#10)
Re: per table random-page-cost?

On Mon, Oct 19, 2009 at 4:21 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

marcin mank <marcin.mank@gmail.com> writes:

This proposal is just "hints by the back door", ISTM. As Tom says, there is
a justification for having it on tablespaces but not on individual tables.

If the parameter is defined as "the chance that a page is in cache"
there is very real physical meaning to it.

We have no such parameter...

And we want our parameters to be things the DBA has a chance of being
able to estimate. How would you come up with sensible figures for this
hypothetical parameter?

--
greg

#12 Greg Stark
gsstark@mit.edu
In reply to: Kevin Grittner (#4)
Re: per table random-page-cost?

On Mon, Oct 19, 2009 at 2:54 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:

How about calculating an effective percentage based on other
information.  effective_cache_size, along with relation and database
size, come to mind.

I think previous proposals for this have fallen down when you actually
try to work out a formula for this. The problem is that you could have
a table which is much smaller than effective_cache_size but is never
in cache due to it being one of many such tables.

I think it would still be good to have some naive kind of heuristic
here as long as it's fairly predictable for DBAs.

But the long-term strategy here I think is to actually have some way
to measure the real cache hit rate on a per-table basis. Whether it's
by timing i/o operations, programmatic access to dtrace, or some other
kind of os interface, if we could know the real cache hit rate it
would be very helpful.

Perhaps we could extrapolate from the shared buffer cache percentage.
If there's a moderately high percentage in shared buffers then it
seems like a reasonable supposition to assume the filesystem cache
would have a similar distribution.
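The shared-buffers side of that extrapolation is already measurable with the contrib module pg_buffercache. A query along these lines (a sketch, assuming a reasonably modern release with pg_relation_filenode() available) shows what fraction of each table currently sits in shared buffers:

```sql
-- Requires the contrib module pg_buffercache.
-- Fraction of each relation's pages currently resident in shared buffers;
-- per the suggestion above, one could extrapolate a similar distribution
-- for the filesystem cache.
SELECT c.relname,
       count(*) AS buffered_pages,
       round(100.0 * count(*) / c.relpages, 1) AS pct_in_shared_buffers
FROM pg_buffercache b
JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid)
WHERE b.reldatabase = (SELECT oid FROM pg_database
                       WHERE datname = current_database())
  AND c.relpages > 0
GROUP BY c.relname, c.relpages
ORDER BY pct_in_shared_buffers DESC;
```

Note that relpages is only as fresh as the last VACUUM or ANALYZE, so the percentage is an estimate.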

--
greg

#13 Robert Haas
robertmhaas@gmail.com
In reply to: Kevin Grittner (#4)
Re: per table random-page-cost?

On Mon, Oct 19, 2009 at 5:54 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:

Robert Haas <robertmhaas@gmail.com> wrote:

I've been wondering if it might make sense to have a
"random_page_cost" and "seq_page_cost" setting for each TABLESPACE,
to compensate for the fact that different media might be faster or
slower, and a percent-cached setting for each table over top of
that.

[after recovering from the initial cringing reaction...]

How about calculating an effective percentage based on other
information.  effective_cache_size, along with relation and database
size, come to mind.  How about the particular index being considered
for the plan?  Of course, you might have to be careful about working
in TOAST table size for a particular query, based on the columns
retrieved.

I think that a per-tablespace page cost should be set by the DBA, same
as we do with global page-costs now.

OTOH, I think that a per-relation percent-in-cache should be
automatically calculated by the database (somehow), and the DBA should
have an option to override it in case the database gets it wrong.
I gave a lightning talk on this topic at PGCon.

I have no doubt that there would be some major performance regressions
in the first cut of anything like this, for at least *some* queries.
The toughest part of this might be to get adequate testing to tune it
for a wide enough variety of real-life situations.

Agreed.

...Robert

#14 Jeff Davis
pgsql@j-davis.com
In reply to: Greg Stark (#12)
Re: per table random-page-cost?

On Mon, 2009-10-19 at 16:39 -0700, Greg Stark wrote:

But the long-term strategy here I think is to actually have some way
to measure the real cache hit rate on a per-table basis. Whether it's
by timing i/o operations, programmatic access to dtrace, or some other
kind of os interface, if we could know the real cache hit rate it
would be very helpful.

Maybe it would be simpler to just get the high-order bit: is this table
likely to be completely in cache (shared buffers or os buffer cache), or
not?

The lower cache hit ratios are uninteresting: the performance difference
between 1% and 50% is only a factor of two. And stable hit ratios that
are high but short of "almost 100%" seem unlikely: what kind of scenario
would involve a stable 90% cache hit ratio for a table?
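The arithmetic behind that: what costs you is physical reads per fetch, and that only changes dramatically near 100%:

```sql
-- Expected physical reads per 100 page fetches at various hit ratios.
-- Going from a 1% to a 50% hit ratio only halves the reads (99 -> 50);
-- going from 50% to 99% cuts them fifty-fold (50 -> 1). Hence only the
-- high-order bit ("effectively all cached or not") moves the needle.
SELECT hit_ratio,
       100 * (1 - hit_ratio) AS physical_reads_per_100_fetches
FROM (VALUES (0.01), (0.50), (0.90), (0.99)) AS t(hit_ratio);
```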

Regards,
Jeff Davis

#15 Jeff Janes
jeff.janes@gmail.com
In reply to: Greg Stark (#11)
Re: per table random-page-cost?

On Mon, Oct 19, 2009 at 4:29 PM, Greg Stark <gsstark@mit.edu> wrote:

On Mon, Oct 19, 2009 at 4:21 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

marcin mank <marcin.mank@gmail.com> writes:

This proposal is just "hints by the back door", ISTM. As Tom says, there is
a justification for having it on tablespaces but not on individual tables.

If the parameter is defined as "the chance that a page is in cache"
there is very real physical meaning to it.

We have no such parameter...

And we want our parameters to be things the DBA has a chance of being
able to estimate.

Do the current parameters meet that standard? When setting
seq_page_cost now, don't people have a lot of "Well, we're about this
likely to find it in the cache anyway" built into their settings?

Jeff

#16 Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Jeff Davis (#14)
Re: per table random-page-cost?

Jeff Davis <pgsql@j-davis.com> wrote:

what kind of scenario
would involve a stable 90% cache hit ratio for a table?

I'd bet accounts receivable applications often hit that.
(Most payments on recent billings; a sprinkling on older ones.)
I'm sure there are others.

-Kevin

#17 Jeff Davis
pgsql@j-davis.com
In reply to: Kevin Grittner (#16)
Re: per table random-page-cost?

On Mon, 2009-10-19 at 21:22 -0500, Kevin Grittner wrote:

I'd bet accounts receivable applications often hit that.
(Most payments on recent billings; a sprinkling on older ones.)
I'm sure there are others.

You worded the examples in terms of writes (I think), and we're talking
about read caching, so I still don't entirely understand.

Also, the example sounds like you'd like to optimize across queries.
There's no mechanism for the planner to remember some query executed a
while ago, and match it up to some new query that it's trying to plan.
Maybe there should be, but that's an entirely different feature.

I'm not clear on the scenario that we're trying to improve.

Regards,
Jeff Davis

#18 Greg Smith
gsmith@gregsmith.com
In reply to: Jeff Davis (#17)
Re: per table random-page-cost?

On Mon, 19 Oct 2009, Jeff Davis wrote:

On Mon, 2009-10-19 at 21:22 -0500, Kevin Grittner wrote:

I'd bet accounts receivable applications often hit that.
(Most payments on recent billings; a sprinkling on older ones.)
I'm sure there are others.

You worded the examples in terms of writes (I think), and we're talking
about read caching, so I still don't entirely understand.

No, that part was fair. The unfortunate reality of accounts receivable is
that reports listing the people who owe you money are run much more often
than payments are posted into the system.

Also, the example sounds like you'd like to optimize across queries.
There's no mechanism for the planner to remember some query executed a
while ago, and match it up to some new query that it's trying to plan.

Some of the use-cases here involve situations where you know most of a
relation is likely to be in cache just because there's not much going on
that might evict it. In any case, something that attempts to model some
average percentage you can expect a relation to be in cache is in effect
serving as a memory of past queries.

I'm not clear on the scenario that we're trying to improve.

Duh, that would be the situation where someone wants optimizer hints but
can't call them that because then the idea would be reflexively rejected!

Looks like I should dust off the much more complicated proposal for
tracking and using in-cache hit percentages I keep not having time to
finish writing up. Allowing a user-set value for that is a lot more
reasonable if the system computes a reasonable one itself under normal
circumstances. That's what I think people really want, even if it's not
what they're asking for.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

#19 marcin mank
marcin.mank@gmail.com
In reply to: Tom Lane (#10)
Re: per table random-page-cost?

On Tue, Oct 20, 2009 at 1:21 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

If the parameter is defined as "the chance that a page is in cache"

there is very real physical meaning to it.

We have no such parameter...

What a simple person like me would think would work is:

- Call the parameter "cached_probability".
- Invent a way to store it (I'd actually try to do it the exact same
way the recent "alter table set statistics distinct" patch does it).

a) less radical idea: replace all usage of random_page_cost with
seq_page_cost * cached_probability + random_page_cost *
(1-cached_probability)

b) more radical idea:
b1) invent a new GUC: cached_page_cost
b2) replace all usage of seq_page_cost with cached_page_cost *
cached_probability + seq_page_cost * (1-cached_probability)
b3) replace all usage of random_page_cost with cached_page_cost *
cached_probability + random_page_cost * (1-cached_probability)
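Plugging in illustrative numbers (cached_probability and cached_page_cost are the proposed, not existing, parameters; seq_page_cost = 1 and random_page_cost = 4 are the familiar defaults, and cached_page_cost = 0.01 is an arbitrary guess):

```sql
-- Illustrative only, for a table with cached_probability = 0.9:
SELECT
    1.00 * 0.9 + 4.0 * (1 - 0.9) AS option_a_random_cost,  -- 0.9 + 0.4 = 1.3
    0.01 * 0.9 + 1.0 * (1 - 0.9) AS option_b_seq_cost,     -- 0.009 + 0.1 = 0.109
    0.01 * 0.9 + 4.0 * (1 - 0.9) AS option_b_random_cost;  -- 0.009 + 0.4 = 0.409
```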

How would you come up with sensible figures for this hypothetical parameter?

select schemaname,relname,heap_blks_hit /
cast(heap_blks_read+heap_blks_hit+1 as float)
from pg_statio_all_tables

Would be a nice starting point.

Greetings
Marcin

#20 Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Greg Smith (#18)
Re: per table random-page-cost?

Greg Smith <gsmith@gregsmith.com> wrote:

The unfortunate reality of accounts receivable is that reports run
to list people who owe one money happen much more often than posting
payments into the system does.

How often do you have to print a list of past due accounts? I've
generally seen that done weekly or monthly, in the same places that
there are many people standing full time in payment windows just to
collect money from those lining up to pay. When they bring in a
document which identifies the receivable (like, in our case, a traffic
or parking ticket), there's no need to look at any older data in the
database.

Heck, even our case management applications likely follow the 90% to
95% cache hit pattern in counties which aren't fully cached, since
there's a lot more activity on court cases filed this year and last
year than on cases filed 30 years ago.

-Kevin

#21 Greg Smith
gsmith@gregsmith.com
In reply to: Kevin Grittner (#20)
#22 Cédric Villemain
cedric.villemain@dalibo.com
In reply to: Greg Smith (#18)
#23 Cédric Villemain
cedric.villemain@dalibo.com
In reply to: Robert Haas (#2)
#24 Cédric Villemain
cedric.villemain@dalibo.com
In reply to: Greg Stark (#3)
#25 Robert Haas
robertmhaas@gmail.com
In reply to: Cédric Villemain (#24)
#26 Bruce Momjian
bruce@momjian.us
In reply to: Cédric Villemain (#24)
#27 Josh Berkus
josh@agliodbs.com
In reply to: Bruce Momjian (#26)
#28 Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Bruce Momjian (#26)
#29 Robert Haas
robertmhaas@gmail.com
In reply to: Josh Berkus (#27)
#30 Bruce Momjian
bruce@momjian.us
In reply to: Kevin Grittner (#28)
#31 Joshua D. Drake
jd@commandprompt.com
In reply to: Robert Haas (#25)
#32 Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Bruce Momjian (#26)
#33 Cédric Villemain
cedric.villemain@dalibo.com
In reply to: Joshua D. Drake (#31)
#34 Robert Haas
robertmhaas@gmail.com
In reply to: Cédric Villemain (#33)
#35 Cédric Villemain
cedric.villemain@dalibo.com
In reply to: Robert Haas (#34)
#36 Josh Berkus
josh@agliodbs.com
In reply to: Cédric Villemain (#33)
#37 Stefan Moeding
pgsql@moeding.net
In reply to: Josh Berkus (#36)
#38 Cédric Villemain
cedric.villemain@dalibo.com
In reply to: Josh Berkus (#36)
#39 Bruce Momjian
bruce@momjian.us
In reply to: Robert Haas (#29)