Partitioning such that key field of inherited tables no longer retains any selectivity

Started by Tim Kanealmost 12 years ago6 messagesgeneral
Jump to latest
#1Tim Kane
tim.kane@gmail.com

The subject line may not actually describe what I want to illustrate…

Basically, let’s say we have a nicely partitioned data-set. Performance is a
net win and I’m happy with it.
The partitioning scheme is equality based, rather than range based.

That is, each partition contains a subset of the data where partition_key =
{some_value}, and of course we let constraint exclusion enable the optimiser
to do its thing.

As such, all of the data contained in a given partition has the same value
for partition_key. That field, within the scope of its partition – isn’t
terribly useful anymore, and in my mind is wasting bytes – it’s only purpose
really is to allow the CHECK constraint to verify the data is what it should
be.

Wouldn’t it be nice if we could somehow create a child table where we could
define a const field value, that did not need to be stored on disk at the
tuple level?
This would allow the check constraint to supply the optimiser with the
information it needs, while removing the need to consume disk to record a
field whose value is always the same.

Extending this idea..
Postgresql could possibly look at any equality based check constraint for a
table and instead of storing each field value verbatim, we could implicitly
optimise away the need to write those field values to disk, on the
understanding that those values can never change (unless the constraint is
removed/altered).

I’m sure there are all kinds of worms in this canister, but I thought it
might be an interesting discussion.

Cheers,

Tim

#2David G. Johnston
david.g.johnston@gmail.com
In reply to: Tim Kane (#1)
Re: Partitioning such that key field of inherited tables no longer retains any selectivity

Tim Kane wrote

The subject line may not actually describe what I want to illustrate…

Basically, let’s say we have a nicely partitioned data-set. Performance is
a
net win and I’m happy with it.
The partitioning scheme is equality based, rather than range based.

That is, each partition contains a subset of the data where partition_key
=
{some_value}, and of course we let constraint exclusion enable the
optimiser
to do its thing.

As such, all of the data contained in a given partition has the same value
for partition_key. That field, within the scope of its partition – isn’t
terribly useful anymore, and in my mind is wasting bytes – it’s only
purpose
really is to allow the CHECK constraint to verify the data is what it
should
be.

Wouldn’t it be nice if we could somehow create a child table where we
could
define a const field value, that did not need to be stored on disk at the
tuple level?
This would allow the check constraint to supply the optimiser with the
information it needs, while removing the need to consume disk to record a
field whose value is always the same.

Extending this idea..
Postgresql could possibly look at any equality based check constraint for
a
table and instead of storing each field value verbatim, we could
implicitly
optimise away the need to write those field values to disk, on the
understanding that those values can never change (unless the constraint is
removed/altered).

I’m sure there are all kinds of worms in this canister, but I thought it
might be an interesting discussion.

Cheers,

Tim

Two approaches:
1. Standard virtual column name that, when used, gets rewritten into a
constant that is stored at the table level.
2. A way for a column's value to be defined as a function call.

Option 2 has the virtue of being more generally applicable but you'd need
some way to know that for any give table that a given function resolves to a
constant. Maybe have a magic function like partitonid(tabloid) that if used
in a query would be interpreted in this way. Combined with option 1 and the
stand column could be pre-defined in this way - if the partition constant
exists which is the main thing to avoid - increased checking/rewriting time
for non-partitioned tables.

David J.

--
View this message in context: http://postgresql.1045698.n5.nabble.com/Partitioning-such-that-key-field-of-inherited-tables-no-longer-retains-any-selectivity-tp5803549p5803561.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: David G. Johnston (#2)
Re: Re: Partitioning such that key field of inherited tables no longer retains any selectivity

David G Johnston <david.g.johnston@gmail.com> writes:

Two approaches:
1. Standard virtual column name that, when used, gets rewritten into a
constant that is stored at the table level.
2. A way for a column's value to be defined as a function call.

Recent versions of the SQL spec have a notion of "generated columns"
that I think subsumes both of these concepts. We had a draft patch
awhile back that attempted to implement that feature. It crashed
and burned for reasons I don't recall ... but certainly implementing
an already-standardized feature is more attractive than just inventing
behavior on our own.

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#4Tim Kane
tim.kane@gmail.com
In reply to: Tom Lane (#3)
Re: Re: Partitioning such that key field of inherited tables no longer retains any selectivity

From: Tom Lane <tgl@sss.pgh.pa.us>
David G Johnston <david.g.johnston@gmail.com> writes:

Two approaches:
1. Standard virtual column name that, when used, gets rewritten into a
constant that is stored at the table level.
2. A way for a column's value to be defined as a function call.

Recent versions of the SQL spec have a notion of "generated columns"
that I think subsumes both of these concepts. We had a draft patch
awhile back that attempted to implement that feature. It crashed
and burned for reasons I don't recall ... but certainly implementing
an already-standardized feature is more attractive than just inventing
behavior on our own.

That sounds interesting.
Is this what you are referring to? Actually, it looks like it would fit the
bill and then some.

—————————————————
4.14.8 Base columns and generated columns

A column of a base table is either a base column or a generated column. A
base column is one that is not a generated column. A generated column is one
whose values are determined by evaluation of a generation expression, a
<value expression> whose declared type is by implication that of the column.
A generation expression can reference base columns of the base table to
which it belongs but cannot otherwise access SQL- data. Thus, the value of
the field corresponding to a generated column in row R is determined by the
values of zero or more other fields of R.

A generated column GC depends on each column that is referenced by a <column
reference> in its generation expression, and each such referenced column is
a parametric column of GC.

—————————————————

#5David G. Johnston
david.g.johnston@gmail.com
In reply to: Tim Kane (#4)
Re: Partitioning such that key field of inherited tables no longer retains any selectivity

On Sun, May 11, 2014 at 2:08 PM, Tim Kane [via PostgreSQL] <
ml-node+s1045698n5803574h18@n5.nabble.com> wrote:

From: Tom Lane <[hidden email]<http://user/SendEmail.jtp?type=node&amp;node=5803574&amp;i=0&gt;

David G Johnston <[hidden email]<http://user/SendEmail.jtp?type=node&amp;node=5803574&amp;i=1&gt;&gt;
writes:

Two approaches:
1. Standard virtual column name that, when used, gets rewritten into a
constant that is stored at the table level.
2. A way for a column's value to be defined as a function call.

Recent versions of the SQL spec have a notion of "generated columns"
that I think subsumes both of these concepts. We had a draft patch
awhile back that attempted to implement that feature. It crashed
and burned for reasons I don't recall ... but certainly implementing
an already-standardized feature is more attractive than just inventing
behavior on our own.

That sounds interesting.
Is this what you are referring to? Actually, it looks like it would fit
the bill and then some.

—————————————————

4.14.8 Base columns and generated columns

A column of a base table is either a base column or a generated column. A
base column is one that is not a generated column. A generated column is
one whose values are determined by evaluation of a generation expression,
a <value expression> whose declared type is by implication that of the
column. A generation expression can reference base columns of the base
table to which it belongs but cannot otherwise access SQL- data. Thus, the
value of the field corresponding to a generated column in row R is
determined by the values of zero or more other fields of R.

A generated column GC depends on each column that is referenced by a
<column reference> in its generation expression, and each such referenced
column is a parametric column of GC.

—————————————————

​This is basically what I intended to describe in "option 2"...without the
benefit of ever having really read the SQL standard.

So the planner would have to know that, for a given table, the generation
expression results in a constant - would likely in fact have to be a
constant expression like, assuming a non-number value, ='column_value',
where the "=" sign indicates that this is a generation expression and not a
stored value (like default behaves currently).

Given that value of the partition column is constant, and each child table
has a name, is there some way, with the current implementation, to write a
query like:

SELECT *
FROM table_hierarchy
WHERE tableoid = ANY( get_tableoids_as_array('TABLE_NAME1'[,VARARGS]))

and have exclusion constraints work correctly?

Also, toward that end, it would seem that in this particular situation you
could accomplish much the same by using dynamic SQL; though I guess that
would depend on whether any given query needs to be able to return values
from more than one table.

My fluency with respect to inheritance is poor so please forgive if I'm out
in the rough on this one.

David J.

--
View this message in context: http://postgresql.1045698.n5.nabble.com/Partitioning-such-that-key-field-of-inherited-tables-no-longer-retains-any-selectivity-tp5803549p5803579.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

#6Rafal Pietrak
rafal@ztk-rp.eu
In reply to: David G. Johnston (#5)
Re: Re: Partitioning such that key field of inherited tables no longer retains any selectivity

W dniu 11.05.2014 22:01, David G Johnston pisze:

On Sun, May 11, 2014 at 2:08 PM, Tim Kane [via PostgreSQL] <[hidden
email] </user/SendEmail.jtp?type=node&node=5803579&i=0>>wrote:

[------------------]

​This is basically what I intended to describe in "option 2"...without
the benefit of ever having really read the SQL standard.

So the planner would have to know that, for a given table, the
generation expression results in a constant - would likely in fact
have to be a constant expression like, assuming a non-number value,
='column_value', where the "=" sign indicates that this is a
generation expression and not a stored value (like default behaves
currently).

wouldn't it be ways better, if the constraints for partitioning by
inharitance were set at the "master" table, instead of the way it's
currently done at the inharited tables (as exclusive CHECK-s there)?

I mean a constraint like a "function(table columns) reutrning table_name
or tablespace_name of the actual target table"?

<start preudocode>
create table master (a int, b int, c int);
create table table_a (inharits master);
create table table_b (inharits master);

create function(a,b) returns text as $$ if a > b then return "table_a"
else return "table_b"; end if; end $$
... or:
create function(a,b) returns tablespace as $$ if a > b then return
tablespace("table_a") else return tablespace("table_b"); end if; end $$

alter table master add constraint "partitioning" check/select/route
function(a,b);
<end pseudocode>

-R