per-tablespace random_page_cost/seq_page_cost

Started by Robert Haasover 16 years ago30 messageshackers
Jump to latest
#1Robert Haas
robertmhaas@gmail.com

On Mon, Oct 19, 2009 at 6:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Per-table is not physically sensible.  Per-tablespace has some rationale
to it.

I took a look at this and it seems fairly straightforward. It
basically requires (1) deciding where and how to store per-tablespace
defaults, (2) making those defaults conveniently available to
cost_seqscan(), cost_index(), cost_bitmap_heap_scan(), cost_tidscan(),
and genericcostestimate(), and (3) deciding on some syntax.

As to (1), my thought is to add two new float8 columns to
pg_tablespace. The naming is a little awkward, because
random_page_cost and seq_page_cost would not fit with our (rather odd)
convention for naming system catalog columns. I'm tempted to call
them spcrandompagecost and spcseqpagecost, but I wonder if anyone has
any strong preferences.

As to (2), it looks like we could pretty add reltablespace to
RangeTblEntry (for tables) and IndexOptInfo (for indices). This could
be populated in addRangeTblEntry()/addRangeTblEntryForRelation() for
tables, and in get_relation_info() for indices, essentially for free.
Then the above-mentioned functions that need to use the page costs
could just call a function with a name like
get_tablespace_page_costs() and pass the tablespace OID.

As things stand today, that function would need to scan
pg_tablespace_oid_index to find the correct heap tuple, because there
is no catcache for pg_tablespace entries. I'm not sure whether that's
something that would need to be changed.

As to (3), I was thinking about:

ALTER TABLESPACE name SET ( parameter = value [, ...] )

...where parameter is either seq_page_cost or random_page_cost. The
parentheses are for parity with ALTER TABLE, which employs them so as
to allow change storage parameters and making other table
modifications with a single command. I don't see any immediate need
to support that for ALTER TABLESPACE, which doesn't have many options
at present, but neither do I see any reason to pick a deliberately
incompatible syntax, in case someone wants to implement it in the
future.

Arguably, you would expect parameters set using this syntax to be
stored similar to reloptions - that is, as text[]. But as we're going
to need these values multiple times per table to plan any non-trivial
query, I don't want to inject unnecessary parsing overhead and code
complexity.

Thoughts? Comments? Reservations?

...Robert

#2Bruce Momjian
bruce@momjian.us
In reply to: Robert Haas (#1)
Re: per-tablespace random_page_cost/seq_page_cost

On Sun, Oct 25, 2009 at 9:05 AM, Robert Haas <robertmhaas@gmail.com> wrote:

Arguably, you would expect parameters set using this syntax to be
stored similar to reloptions - that is, as text[].  But as we're going
to need these values multiple times per table to plan any non-trivial
query, I don't want to inject unnecessary parsing overhead and code
complexity.

Two comments, perhaps complementary, though I'm not sure of either answer.

1 Would we rather the storage scheme allow for future GUCs to be
easily moved to per-tablespace as well without changing the catalog
schema for every option? (Someone might accuse me of trolling the
anti-EAV people here though...)

2 Would it make sense to slurp these options from the tablespace
options into the relcache when building the relcache entry for a
table? That would make the storage format in the tablespace options
much less relevant. It might even make the catcache less important
too.

--
greg

#3Josh Berkus
josh@agliodbs.com
In reply to: Robert Haas (#1)
Re: per-tablespace random_page_cost/seq_page_cost

Robert,

As to (1), my thought is to add two new float8 columns to
pg_tablespace. The naming is a little awkward, because
random_page_cost and seq_page_cost would not fit with our (rather odd)
convention for naming system catalog columns. I'm tempted to call
them spcrandompagecost and spcseqpagecost, but I wonder if anyone has
any strong preferences.

I'm thinking an array, in case we want to make other tablespace cost
parameters in the future.* Or, better, whatever structure we're
currently using for ROLEs.

(* for example, if someone does manage a filesystem with a separate
cache space per mount, then we'd want effective_cache_size to be
tablespace-based as well)

--Josh

#4Bruce Momjian
bruce@momjian.us
In reply to: Josh Berkus (#3)
Re: per-tablespace random_page_cost/seq_page_cost

On Mon, Oct 26, 2009 at 3:05 PM, Josh Berkus <josh@agliodbs.com> wrote:

I'm thinking an array, in case we want to make other tablespace cost
parameters in the future.*  Or, better, whatever structure we're
currently using for ROLEs.

(* for example, if someone does manage a filesystem with a separate
cache space per mount, then we'd want effective_cache_size to be
tablespace-based as well)

Still far from convinced on that one. But effective_io_concurrency
should be included even in the first pass.

--
greg

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#4)
Re: per-tablespace random_page_cost/seq_page_cost

Greg Stark <gsstark@mit.edu> writes:

Still far from convinced on that one. But effective_io_concurrency
should be included even in the first pass.

I think a design that is limited to a prespecified set of GUCs is
broken by definition. It'd be better to make it work like
ALTER DATABASE SET.

regards, tom lane

#6Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#5)
Re: per-tablespace random_page_cost/seq_page_cost

Hi,

On Tuesday 27 October 2009 00:42:39 Tom Lane wrote:

Greg Stark <gsstark@mit.edu> writes:

Still far from convinced on that one. But effective_io_concurrency
should be included even in the first pass.

I think a design that is limited to a prespecified set of GUCs is
broken by definition. It'd be better to make it work like
ALTER DATABASE SET.

How should that work if there are conflicting settings in two tablespaces when
tables from both are used?
Some settings make sense per tablespace, but I dont see a valid model to
accept e.g. 'standard_conforming_strings' set to 'off' in one and 'on' in the
other...

Andres

#7Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom Lane (#5)
Re: per-tablespace random_page_cost/seq_page_cost

Tom Lane escribi�:

Greg Stark <gsstark@mit.edu> writes:

Still far from convinced on that one. But effective_io_concurrency
should be included even in the first pass.

I think a design that is limited to a prespecified set of GUCs is
broken by definition. It'd be better to make it work like
ALTER DATABASE SET.

Well, not exactly like ALTER DATABASE SET because those are now stored
in pg_db_role_setting. But a new spcoptions column storing an array of
key/value pairs seems a reasonable way to do it.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#6)
Re: per-tablespace random_page_cost/seq_page_cost

Andres Freund <andres@anarazel.de> writes:

On Tuesday 27 October 2009 00:42:39 Tom Lane wrote:

I think a design that is limited to a prespecified set of GUCs is
broken by definition. It'd be better to make it work like
ALTER DATABASE SET.

How should that work if there are conflicting settings in two tablespaces when
tables from both are used?

Well, most of the settings that would be sensible for this are used in
cost estimates that are basically per-table or per-index, so I don't
think it's a huge problem in practice. But I should clarify my comment:
the set of GUCs used this way must not be wired into the catalog
structure. I think that the code will only pay attention to certain
GUCs that are valid in-context, but we shouldn't have to redesign the
catalog each time we add one.

regards, tom lane

In reply to: Alvaro Herrera (#7)
Re: per-tablespace random_page_cost/seq_page_cost

Alvaro Herrera escreveu:

Tom Lane escribi�:

Greg Stark <gsstark@mit.edu> writes:

Still far from convinced on that one. But effective_io_concurrency
should be included even in the first pass.

I think a design that is limited to a prespecified set of GUCs is
broken by definition. It'd be better to make it work like
ALTER DATABASE SET.

Well, not exactly like ALTER DATABASE SET because those are now stored
in pg_db_role_setting. But a new spcoptions column storing an array of
key/value pairs seems a reasonable way to do it.

+1. That's what I have in mind too.

--
Euler Taveira de Oliveira
http://www.timbira.com/

#10Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#8)
Re: per-tablespace random_page_cost/seq_page_cost

On Mon, Oct 26, 2009 at 11:07 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andres Freund <andres@anarazel.de> writes:

On Tuesday 27 October 2009 00:42:39 Tom Lane wrote:

I think a design that is limited to a prespecified set of GUCs is
broken by definition.  It'd be better to make it work like
ALTER DATABASE SET.

How should that work if there are conflicting settings in two tablespaces when
tables from both are used?

Well, most of the settings that would be sensible for this are used in
cost estimates that are basically per-table or per-index, so I don't
think it's a huge problem in practice.  But I should clarify my comment:
the set of GUCs used this way must not be wired into the catalog
structure.  I think that the code will only pay attention to certain
GUCs that are valid in-context, but we shouldn't have to redesign the
catalog each time we add one.

These don't exactly fit into the GUC framework because AIUI a GUC is a
global variable, and the function of the GUC machinery is simply to
make sure that the global variable in question is set to the right
value at the right time. These are really more like reloptions (that
may happen to have the same name as GUCs, I suppose) - always in
effect, but only for a particular object.

I confess that I'm a bit mystified about the design of the reloptions
stuff. It seems kind of odd to store structured data as text[]; it's
kind of the opposite of what I would normally recommend as good
database design.

...Robert

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#10)
Re: per-tablespace random_page_cost/seq_page_cost

Robert Haas <robertmhaas@gmail.com> writes:

I confess that I'm a bit mystified about the design of the reloptions
stuff. It seems kind of odd to store structured data as text[]; it's
kind of the opposite of what I would normally recommend as good
database design.

It's definitely a bit EAV-ish :-(. But I don't see any particularly
easy way to modify it to store bool/int/float parameters in their native
types; do you?

regards, tom lane

#12David Fetter
david@fetter.org
In reply to: Tom Lane (#11)
Re: per-tablespace random_page_cost/seq_page_cost

On Tue, Oct 27, 2009 at 09:13:29PM -0400, Tom Lane wrote:

Robert Haas <robertmhaas@gmail.com> writes:

I confess that I'm a bit mystified about the design of the
reloptions stuff. It seems kind of odd to store structured data as
text[]; it's kind of the opposite of what I would normally
recommend as good database design.

It's definitely a bit EAV-ish :-(. But I don't see any particularly
easy way to modify it to store bool/int/float parameters in their
native types; do you?

More columns, each of the correct type, with the table constraint that
at most one may be populated is how I handle stuff like that.

Cheers
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

#13Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: David Fetter (#12)
Re: per-tablespace random_page_cost/seq_page_cost

David Fetter escribi�:

On Tue, Oct 27, 2009 at 09:13:29PM -0400, Tom Lane wrote:

Robert Haas <robertmhaas@gmail.com> writes:

I confess that I'm a bit mystified about the design of the
reloptions stuff. It seems kind of odd to store structured data as
text[]; it's kind of the opposite of what I would normally
recommend as good database design.

It's definitely a bit EAV-ish :-(. But I don't see any particularly
easy way to modify it to store bool/int/float parameters in their
native types; do you?

More columns, each of the correct type, with the table constraint that
at most one may be populated is how I handle stuff like that.

So we would have "spcintoptions", "spcfloatoptions", and so on? (I
don't see the need for the table constraint in this case.)

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#14Robert Haas
robertmhaas@gmail.com
In reply to: David Fetter (#12)
Re: per-tablespace random_page_cost/seq_page_cost

On Tue, Oct 27, 2009 at 9:20 PM, David Fetter <david@fetter.org> wrote:

On Tue, Oct 27, 2009 at 09:13:29PM -0400, Tom Lane wrote:

Robert Haas <robertmhaas@gmail.com> writes:

I confess that I'm a bit mystified about the design of the
reloptions stuff. It seems kind of odd to store structured data as
text[]; it's kind of the opposite of what I would normally
recommend as good database design.

It's definitely a bit EAV-ish :-(.  But I don't see any particularly
easy way to modify it to store bool/int/float parameters in their
native types; do you?

More columns, each of the correct type, with the table constraint that
at most one may be populated is how I handle stuff like that.

I don't see why we'd need to constrain more than one from being
populated, but yeah, that's basically what I was thinking: one column
per parameter, of the appropriate type. That might not be such a good
model if the number of possible options was really large, but at this
point there's no reason to believe that will be the case for either
reloptions or the proposed spcoptions.

For things like autovacuum options, the actual representation probably
doesn't matter much because I'm guessing that the amount of work being
done by vacuum dwarfs the parsing cost, and it's a background task
anyway. But this seems like a less solid argument for things like
fillfactor and the proposed per-tablespace
seq_page_cost/random_page_cost, which will be accessed by many queries
and in the latter case often more than once. Ideally (or so it seems
to me) you'd like to fetch those things out of Form_pg_whatever rather
than parsing text strings to get at 'em.

...Robert

#15Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#14)
Re: per-tablespace random_page_cost/seq_page_cost

Robert Haas <robertmhaas@gmail.com> writes:

For things like autovacuum options, the actual representation probably
doesn't matter much because I'm guessing that the amount of work being
done by vacuum dwarfs the parsing cost, and it's a background task
anyway. But this seems like a less solid argument for things like
fillfactor and the proposed per-tablespace
seq_page_cost/random_page_cost, which will be accessed by many queries
and in the latter case often more than once. Ideally (or so it seems
to me) you'd like to fetch those things out of Form_pg_whatever rather
than parsing text strings to get at 'em.

I think efficiency arguments here are hogwash. In the first place,
we'd certainly cache the results someplace (relcache or something like
it) if retrieve performance seems to be a bottleneck at all. In the
second place, composite types are so hugely inefficient as to swamp any
gain you'd get from the columns being the right type once you got at
them. (atoi and friends are cheap by comparison.)

It's possible that changing this is worthwhile on logical cleanliness
grounds; but I think it will be a net loss in efficiency, and definitely
a net loss in terms of code complexity at the C level.

regards, tom lane

#16Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#15)
Re: per-tablespace random_page_cost/seq_page_cost

On Tue, Oct 27, 2009 at 10:18 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

For things like autovacuum options, the actual representation probably
doesn't matter much because I'm guessing that the amount of work being
done by vacuum dwarfs the parsing cost, and it's a background task
anyway.  But this seems like a less solid argument for things like
fillfactor and the proposed per-tablespace
seq_page_cost/random_page_cost, which will be accessed by many queries
and in the latter case often more than once.  Ideally (or so it seems
to me) you'd like to fetch those things out of Form_pg_whatever rather
than parsing text strings to get at 'em.

I think efficiency arguments here are hogwash.  In the first place,
we'd certainly cache the results someplace (relcache or something like
it) if retrieve performance seems to be a bottleneck at all.  In the
second place, composite types are so hugely inefficient as to swamp any
gain you'd get from the columns being the right type once you got at
them.  (atoi and friends are cheap by comparison.)

We must be talking about different things, because I can't believe
this is true of what I'm thinking about. I'm not suggesting having a
column called reloptions of composite type; I'm suggesting that an
option like fillfactor would have its very own table column, just like
relpages or relhasindex. Surely that's gotta be faster than text; it
overlays onto a C struct, which is about as fast as it gets.

I agree that caching mitigates many of the problems with this from an
efficiency standpoint, possibly to the point where it isn't worth
caring about. But it does seem grotty, and I feel like it has to cost
something: we read in an array of text[] and convert it to a C struct,
which is exactly the form it would already be in if we just made it
part of Form_pg_class in the first place. The only way I can think
that the current representation could be faster is that when there are
NO reloptions at all, the parsing step can be skipped, and yet overall
Form_pg_class is smaller than it would be otherwise, which is of some
miniscule benefit.

It's possible that changing this is worthwhile on logical cleanliness
grounds; but I think it will be a net loss in efficiency, and definitely
a net loss in terms of code complexity at the C level.

One of my concerns about the current representation is that it doesn't
seem easily generalizable to objects that are not in pg_class, such as
tablespaces. I fear that supporting spcoptions as text[] along the
lines of reloptions will require quite a bit of refactoring to avoid
code duplication, whereas adding a few new columns to pg_tablespace
and maybe making a syscache for it looks pretty simple. On the other
hand, that would leave us with completely different representations
for essentially the same sort of data, which isn't particularly
appealing either.

...Robert

#17Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#11)
Re: per-tablespace random_page_cost/seq_page_cost

On Tue, Oct 27, 2009 at 9:13 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

I confess that I'm a bit mystified about the design of the reloptions
stuff. It seems kind of odd to store structured data as text[]; it's
kind of the opposite of what I would normally recommend as good
database design.

It's definitely a bit EAV-ish :-(.  But I don't see any particularly
easy way to modify it to store bool/int/float parameters in their native
types; do you?

Looking at this a little more, it seems that part of the motivation
for the existing design is that user-created AMs might require
arbitrary options, which therefore can't be wired into the system
catalogs. There's no equivalent for tablespaces (we could add one
some day, I suppose), so there's less intrinsic reason to think we
have to do it that way.

It doesn't actually look like it would be too terrible to un-hard-wire
the on-disk representation. The existing transformRelOptions() and
untransformRelOptions() code could be made to handle whatever is
stored in reloptions[]; but we could arrange to remove (in the case of
transform) or insert (in the case untransform) the DefElems for any
options stored elsewhere before those functions are called. There is
an existing hack to do this for "oids" which could probably be cleaned
up and made part of some more general structure.

However... as you basically already said, it's not entirely clear that
this solves any real problem. The problem is not so much that the
design is too tightly coupled to a particular storage representation
as that it is too tightly coupled to pg_class - starting with the name
reloptions, which would be inapposite for options associated with a
tablespace, schema, etc. That hasn't stopped the foreign data wrapper
stuff from reaching in and unceremoniously borrowing a few functions
like transformRelOptions, whose comment is now mildly incorrect in
enumerating the ways the function is used.

I don't see anything in this code that is very rel-specific, so I
think it would be possible to implement spcoptions by just defining
RELOPT_KIND_TABLESPACE and ignoring the irony, but that has enough of
an unsavory feeling that I'm sure someone is going to complain about
it... I suppose we could go through and systematically rename all
instances of reloptions to ent(ity)options or storageoptions or
gen(eric)options or somesuch...

...Robert

#18Bruce Momjian
bruce@momjian.us
In reply to: Robert Haas (#17)
Re: per-tablespace random_page_cost/seq_page_cost

On Sat, Oct 31, 2009 at 6:04 PM, Robert Haas <robertmhaas@gmail.com> wrote:

Looking at this a little more, it seems that part of the motivation
for the existing design is that user-created AMs might require
arbitrary options, which therefore can't be wired into the system
catalogs.  There's no equivalent for tablespaces (we could add one
some day, I suppose), so there's less intrinsic reason to think we
have to do it that way.

Can't custom modules define arbitrary options which they declare can
be defined per tablespace?

We could have a column for all booleans, a column for all integers,
etc. but that's not really any more normalized than having a single
column for all the types with a rule for how to marshal each value
type.

--
greg

#19Robert Haas
robertmhaas@gmail.com
In reply to: Bruce Momjian (#18)
Re: per-tablespace random_page_cost/seq_page_cost

On Nov 1, 2009, at 7:43 AM, Greg Stark <gsstark@mit.edu> wrote:

On Sat, Oct 31, 2009 at 6:04 PM, Robert Haas <robertmhaas@gmail.com>
wrote:

Looking at this a little more, it seems that part of the motivation
for the existing design is that user-created AMs might require
arbitrary options, which therefore can't be wired into the system
catalogs. There's no equivalent for tablespaces (we could add one
some day, I suppose), so there's less intrinsic reason to think we
have to do it that way.

Can't custom modules define arbitrary options which they declare can
be defined per tablespace?

Yeah, probably we can support that for free, although I'm not sure
there is much demand for it.

We could have a column for all booleans, a column for all integers,
etc. but that's not really any more normalized than having a single
- how to marshal each value
type.

That has no advantages and several disadvantages AFAICS.

I don't want to get sidetracked here. The real issue is the one I
discussed in the portion of the email you didn't quote...

...Robert

#20Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Robert Haas (#17)
Re: per-tablespace random_page_cost/seq_page_cost

Robert Haas escribi�:

I don't see anything in this code that is very rel-specific, so I
think it would be possible to implement spcoptions by just defining
RELOPT_KIND_TABLESPACE and ignoring the irony, but that has enough of
an unsavory feeling that I'm sure someone is going to complain about
it... I suppose we could go through and systematically rename all
instances of reloptions to ent(ity)options or storageoptions or
gen(eric)options or somesuch...

Maybe I missed part of the discussion, but do these really need to be
handled like reloptions instead of like datoptions? Perhaps the
deciding factor is that we want to parse them once and store them in a
cache, so like reloptions; the others are used once per connection and
then thrown away.

If this is the case, then I think we could just decide that their name
is reloptions due to hysterical reasons and be done with it.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#21Dimitri Fontaine
dimitri@2ndQuadrant.fr
In reply to: Bruce Momjian (#18)
#22Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Dimitri Fontaine (#21)
#23Robert Haas
robertmhaas@gmail.com
In reply to: Dimitri Fontaine (#21)
#24Robert Haas
robertmhaas@gmail.com
In reply to: Alvaro Herrera (#20)
#25Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Robert Haas (#24)
#26Dimitri Fontaine
dimitri@2ndQuadrant.fr
In reply to: Robert Haas (#23)
#27Robert Haas
robertmhaas@gmail.com
In reply to: Alvaro Herrera (#25)
#28Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Robert Haas (#27)
#29Robert Haas
robertmhaas@gmail.com
In reply to: Dimitri Fontaine (#26)
#30Robert Haas
robertmhaas@gmail.com
In reply to: Alvaro Herrera (#28)