A Guide to Constraint Exclusion (Partitioning)

Started by Simon Riggsover 20 years ago37 messageshackers
Jump to latest
#1Simon Riggs
simon@2ndQuadrant.com

A Guide to the Constraint Exclusion Feature
===========================================

Simon Riggs 2ndQuadrant simon@2ndquadrant.com

INTRODUCTION

Constraint Exclusion (CE) is an optimizer patch submitted for PostgreSQL
8.1. CE aims to greatly improve the performance for certain types of
common queries against large tables. CE extends the logic originally
developed for Partial Indexes to allow the optimizer to avoid scanning
particular tables for certain queries. CE will exclude a table from the
query plan when the query's WHERE clause is proven not to "overlap" with
any of the Constraints defined on those tables.

No new syntax is required to take advantage of this feature, which is
designed to work in conjunction with existing Inheritance features. Some
syntax *is* required to declare Constraints, which are already
ANSI/ISO SQL:2003 compliant.

In other RDBMS these features are often referred to as Partitioning. The
PostgreSQL CE feature is much more widely applicable than that name
might suggest and will be of benefit to many users, not just for
Business Intelligence applications.

The aim of this guide is to help prospective users understand how the
feature will work, to allow improvement during the 8.1 beta test cycle.

All feedback is welcome.

TABLE PARTITIONING OVERVIEW

Let's look at a practical example of how this works and the performance
benefits it provides:

In many BI workloads there are a small number of very large tables,
often referred to as Fact tables. In order to improve load and query
performance against those tables it is often useful to split these
larger tables into pieces, known as partitions.

PostgreSQL 8.0 could support pseudo-partitioning using the Inheritance
feature. Here's a BI Fact table for a Retail DW as an example of that:

CREATE TABLE Sales_DateItemOutlet
( DateKey Integer
, OutletKey Integer
, ItemKey Integer
, SoldQty Integer
);

This table could be accessed using a query like
SELECT sum(soldqty) FROM Sales_DateItemOutlet
WHERE DateKey between 20050101 and 20050101

This can then be split up so that we have a number of sub-tables:

CREATE TABLE Sales_Jan_DateItemOutlet
() INHERITS (Sales_DateItemOutlet);

CREATE TABLE Sales_Feb_DateItemOutlet
() INHERITS (Sales_DateItemOutlet);

CREATE TABLE Sales_Mar_DateItemOutlet
() INHERITS (Sales_DateItemOutlet);

When we load this we put January's data in the Jan table etc, though put
no rows at all in the "parent" table.

Now, if we run our example query again
SELECT sum(soldqty) FROM Sales_DateItemOutlet
WHERE DateKey between 20050101 and 20050101
we find that the query will
Scan all rows in Sales_DateItemOutlet (which is empty)
Scan all rows in Sales_Jan_DateItemOutlet
Scan all rows in Sales_Feb_DateItemOutlet
Scan all rows in Sales_Mar_DateItemOutlet
and return the correct answer. But we know that the query did not really
need to have scanned the Feb and Mar tables, since these do not contain
any data that would satisfy the query.

(The full EXPLAIN output is not shown above, for clarity only.)

The new CE functionality aims to improve the performance of such
queries. Ideally we would like the query to
Scan all rows in Sales_Jan_DateItemOutlet
and ignore the parent table, and all other child tables.

To do allow this, we must provide more declarative information to allow
the optimizer to understand as much as we do. To do this, we add
constraints on to each table, so that it is clear what rows they can
contain.

CREATE TABLE Sales_Jan_DateItemOutlet
( CHECK (DateKey BETWEEN 20050101 AND 20050131) )
INHERITS (Sales_DateItemOutlet);

CREATE TABLE Sales_Feb_DateItemOutlet
( CHECK (DateKey BETWEEN 20050201 AND 20050229) )
INHERITS (Sales_DateItemOutlet);

CREATE TABLE Sales_Mar_DateItemOutlet
( CHECK (DateKey BETWEEN 20050301 AND 20050331) )
INHERITS (Sales_DateItemOutlet);

Now, when we execute our test query, the optimizer can understand that
the Feb and Mar tables can *never* hold rows that would match our query.
As a result it is able to provably exclude them from the query plan.

Thus, with Constraint Exclusion enabled our test query
SELECT sum(soldqty) FROM Sales_DateItemOutlet
WHERE DateKey between 20050101 and 20050101
will perform the following scans
Scan all rows in Sales_DateItemOutlet (which is empty)
Scan all rows in Sales_Jan_DateItemOutlet

Running an EXPLAIN or EXPLAIN ANALYZE will allow you to see which tables
have been included in the query. There is no explicit message to say
that a table has been excluded.

We aren't able to exclude the parent table from the above query because
no Constraint was defined upon it. Since, in our example, the parent is
empty there will be little effect on the query performance. It would be
a mistake to attempt to get around this by placing a Constraint on the
parent, since that would then automatically be created on all child
tables also. So we can never exclude the parent without automatically
excluding *all* of the children also.

Overall then, if the partitions are all the same size, we will have
reduced execution time of the query to around one-third of its previous
elapsed time. If we had 10 or 100 child tables, certain queries could be
10 or 100 times faster than without the CE feature.

In summary, the CE feature will be a huge performance gain for
qualifying queries against large tables in PostgreSQL databases. Since
no new syntax is required, existing PostgreSQL databases designed to
take advantage of inheritance will automatically benefit from the
performance enhancements.

CONSTRAINT EXCLUSION

If you wish to enable this feature you must set the server parameter
enable_constraint_exclusion = true
This is a user-settable parameter, so could be enabled for specific
users or queries.

Constraint exclusion is not enabled by default. This is because
PostgreSQL does not yet have full plan invalidation when DDL changes are
made. If a change were made to any of the tables, the query would not
automatically re-optimize, so it would be possible to get an incorrect
answer returned from a CE plan. This is not a fault of CE, but simply a
side-effect of the current lack of full plan invalidation in PostgreSQL.

CE will work only when the query contains a direct and simple
restriction of the table, such as
Attribute > Constant

If Operators are available for that datatype, CE will use
= < <= > >= <>
These operators must be defined as IMMUTABLE.

CE will work for most simple Constraints. Constraints are already
limited to IMMUTABLE predicates, such as Attribute > Constant.

Constraints like
Attribute BETWEEN Const1 and Const2
can be used to produce ranges of values, and would be generally
described as RANGE PARTITIONING.

Constraints like
Attribute IN (Const1, Const2, Const3 ...)
can be used to produce lists of values, and would generally be described
as LIST PARTITIONING.

All of these Constraint types could be mixed, to allow Constraints like
Attribute1 IN (Const1, Const2)
AND Attribute2 >= Const3 AND Attribute2 < Const4

Thus, the CE feature allows a very flexible partitioning scheme to be
developed that could mix LIST, RANGE style partitioning clauses.

Currently, there is no restriction that all constraints *must* be
mutually exclusive, nor even that the constraints may be similar on each
table. This can be useful for some designs where the inheritance
hierarchy is not "disjoint", as UML would term this situation.

CE does not prevent direct access to one of the child tables in an
inheritance hierarchy. In this case, no exclusion test would be
performed. Exclusion tests are performed *only* when the parent table in
an inheritance hierarchy is accessed. Exclusion tests are performed even
if the inheritance hierarchy is many levels deep (e.g. parent-child-
grandchild). CE also supports multiple inheritance.

PostgreSQL makes no restriction upon what indices are defined on the
various child tables. CE is not dependent upon the existence or absence
of indices, only upon the Query's WHERE clause and the Constraints
defined upon the tables.

CE can be very useful for large historical databases. As an example,
PostgreSQL would allow a historical data table split into sections like
this:
- Last 3 months: one table per week, each table defined with 3 indices
- 3-12 months: one table per month, no indices defined
- 12-36 months: tablespaces defined on hierarchical storage (near-line),
with no indices, so de-archived when required for use.

The CE feature would allow the optimizer to avoid wasteful de-archiving
of the older data, with an appropriate database/query design.

The current patch also includes a suite of 700 tests that successfully
exclude child tables in all common query types (whilst returning the
correct answer!). These include examples of List, Range and mixed
partitioning scenarios.

CURRENT RESTRICTIONS

It is not yet possible to specify that Constraints on child tables will
be mutually exclusive of each other. Currently, it would be up to the
designer to ensure that, if desired.

It is not yet possible to specify that an inheritance parent has no
rows, and, if so, should always be excluded from the query.

If a parent table has a Constraint defined upon it, then this will be
automatically copied to all child tables. Currently, there is no way to
tell which Constraints have been inherited from the parent, so exclusion
tests will be re-executed against all child tables. This will cause
additional optimization time.

Currently, all child tables will be considered. It may be possible in
the future to pre-sort the list of child tables, so that optimization
time can be reduced for parent tables with large numbers of partitions.

Currently, there is no index on the pg_inherits system table. As a
result, parents with more than 1000 child tables are likely to
experience longer than desirable planning times for their queries.

CE checks will not currently recognise STABLE functions within a query.
So WHERE clauses such as
DateKey > CURRENT DATE
will not cause exclusion because CURRENT DATE is a STABLE function.

CE checks are not made when the parent table is involved in a join.

Other existing restrictions on Inherited tables continue to apply.

Further enhancements to the CE feature can be expected in the future.

COMPARISONS WITH OTHER RDBMS

In brief:

Teradata and Oracle already have table Partitioning. Oracle's
partitioning allows either LIST or RANGE partitioning. The new
PostgreSQL CE feature is considerably more flexible than Oracle's
declarative Partitioning syntax, and is achieved without adding
non-standard SQL extensions.

Both Sybase and MS SQLServer 2005 have new Partitioning features
released this year. The Microsoft feature does allow joins between
tables to exclude partitions. However, it also requires that all
partitions should have identical indexes, which is a major blocking
factor to the use of very large tables.

DB2 has partitioning using UNION ALL views, which includes somewhat
similar functionality to the PostgreSQL CE feature. DB2 uses the
PARTITION keyword to signify a different type of feature, so do not be
confused that they do have this feature with declarative syntax.

=================================================================

#2Richard Huxton
dev@archonet.com
In reply to: Simon Riggs (#1)
Re: A Guide to Constraint Exclusion (Partitioning)

Sounds very useful - even for my small systems. Does it/would it work on
an ordinary table (for those cases currently using UNION ALL)?

--
Richard Huxton
Archonet Ltd

#3Simon Riggs
simon@2ndQuadrant.com
In reply to: Richard Huxton (#2)
Re: A Guide to Constraint Exclusion (Partitioning)

On Wed, 2005-07-13 at 14:02 +0100, Richard Huxton wrote:

Sounds very useful - even for my small systems. Does it/would it work on
an ordinary table (for those cases currently using UNION ALL)?

I'm looking into that aspect right now.

I see no reason why it shouldn't work for UNION ALL views/queries in the
same manner as it does for inheritance.

Best Regards, Simon Riggs

#4Hannu Krosing
hannu@tm.ee
In reply to: Simon Riggs (#3)
Re: [Bizgres-general] Re: A Guide to Constraint

On K, 2005-07-13 at 16:10 +0100, Simon Riggs wrote:

On Wed, 2005-07-13 at 14:02 +0100, Richard Huxton wrote:

Sounds very useful - even for my small systems. Does it/would it work on
an ordinary table (for those cases currently using UNION ALL)?

I'm looking into that aspect right now.

I see no reason why it shouldn't work for UNION ALL views/queries in the
same manner as it does for inheritance.

Why does the CE speed depend on having index on pg_inherits ?

Can't you just check the constraints on individual tables ?

Or are the constraints not actually copied to child tables, and you must
look them up from parent tabless ?

--
Hannu Krosing <hannu@tm.ee>

#5Hannu Krosing
hannu@tm.ee
In reply to: Simon Riggs (#1)
Re: [Bizgres-general] A Guide to Constraint Exclusion

On K, 2005-07-13 at 11:53 +0100, Simon Riggs wrote:

We aren't able to exclude the parent table from the above query because
no Constraint was defined upon it. Since, in our example, the parent is
empty there will be little effect on the query performance. It would be
a mistake to attempt to get around this by placing a Constraint on the
parent, since that would then automatically be created on all child
tables also. So we can never exclude the parent without automatically
excluding *all* of the children also.

At least in 8.0 you can drop the inherited constraint from child table:

hannu=# create table empty_master_table (
hannu(#   id serial,
hannu(#   data text,
hannu(#   constraint table_must_be_empty check(false)
hannu(# );
NOTICE:  CREATE TABLE will create implicit sequence
"empty_master_table_id_seq" for "serial" column "empty_master_table.id"
CREATE TABLE
hannu=# insert into empty_master_table (data) values (NULL);
ERROR:  new row for relation "empty_master_table" violates check
constraint "tab le_must_be_empty"
hannu=# create table first_partition() inherits (empty_master_table);
CREATE TABLE
hannu=# \d first_partition
                            Table "public.first_partition"
 Column |  Type   |                             Modifiers
--------+---------
+--------------------------------------------------------------------
 id     | integer | not null default nextval
('public.empty_master_table_id_seq'::text)
 data   | text    |
Check constraints:
    "table_must_be_empty" CHECK (false)
Inherits: empty_master_table
hannu=# alter table first_partition drop constraint table_must_be_empty;
ALTER TABLE
hannu=# \d first_partition
                            Table "public.first_partition"
 Column |  Type   |                             Modifiers
--------+---------
+--------------------------------------------------------------------
 id     | integer | not null default nextval
('public.empty_master_table_id_seq'::text)
 data   | text    |
Inherits: empty_master_table
hannu=# \d empty_master_table
                           Table "public.empty_master_table"
 Column |  Type   |                             Modifiers
--------+---------
+--------------------------------------------------------------------
 id     | integer | not null default nextval
('public.empty_master_table_id_seq'::text)
 data   | text    |
Check constraints:
    "table_must_be_empty" CHECK (false)

hannu=# insert into first_partition(data) values ('first_partition');
INSERT 19501405 1
hannu=#

I imagine that this kind of thing does not work well with pg_dump, but
it is at least possible.

Currently, there is no restriction that all constraints *must* be
mutually exclusive, nor even that the constraints may be similar on each
table. This can be useful for some designs where the inheritance
hierarchy is not "disjoint", as UML would term this situation.

actually this is GOOD, as this way I can have a constraint on both
insert_timestamp and primary_key fields, which are mostly but not
absolutely in the same order. And also to add extra IN (X,Y,Z)
constraints for some other fields.

CE does not prevent direct access to one of the child tables in an
inheritance hierarchy. In this case, no exclusion test would be
performed. Exclusion tests are performed *only* when the parent table in
an inheritance hierarchy is accessed. Exclusion tests are performed even
if the inheritance hierarchy is many levels deep (e.g. parent-child-
grandchild). CE also supports multiple inheritance.

I'd like to see an option to ALWAYS do CE, inheritance or union (all) or
even simple queries.

CURRENT RESTRICTIONS

It is not yet possible to specify that Constraints on child tables will
be mutually exclusive of each other. Currently, it would be up to the
designer to ensure that, if desired.

It is not yet possible to specify that an inheritance parent has no
rows, and, if so, should always be excluded from the query.

I think that a simple "CHECK(false)" constraint should be enough for
this.

If a parent table has a Constraint defined upon it, then this will be
automatically copied to all child tables.

But they can be removed later if desired.

Currently, there is no way to
tell which Constraints have been inherited from the parent, so exclusion
tests will be re-executed against all child tables. This will cause
additional optimization time.

Have you done any performance testing, i.e. what is the actual impact of
CE on planning time ?

Currently, all child tables will be considered. It may be possible in
the future to pre-sort the list of child tables, so that optimization
time can be reduced for parent tables with large numbers of partitions.

Currently, there is no index on the pg_inherits system table. As a
result, parents with more than 1000 child tables are likely to
experience longer than desirable planning times for their queries.

Am I right that this is a general postgresql issue and has nothing to do
with CE ?

CE checks will not currently recognise STABLE functions within a query.
So WHERE clauses such as
DateKey > CURRENT DATE
will not cause exclusion because CURRENT DATE is a STABLE function.

CE checks are not made when the parent table is involved in a join.

Is this also the case where parent table is inside subquery and that
subquery is involved in a join?

Or do I have to make sure that it is not lifted out of that subquery
using something like pl/pgsql function ?

Other existing restrictions on Inherited tables continue to apply.

WHat happens for multiple inheritance ?

Is it detected and then also excluded from CE ?

Or is that just a "don't do it" item ?

Further enhancements to the CE feature can be expected in the future.

Great! :D

And a big Thank You!

--
Hannu Krosing <hannu@tm.ee>

#6Hannu Krosing
hannu@tm.ee
In reply to: Hannu Krosing (#5)
Re: [Bizgres-general] A Guide to Constraint Exclusion

On N, 2005-07-14 at 00:13 +0300, Hannu Krosing wrote:

On K, 2005-07-13 at 11:53 +0100, Simon Riggs wrote:

We aren't able to exclude the parent table from the above query because
no Constraint was defined upon it. Since, in our example, the parent is
empty there will be little effect on the query performance. It would be
a mistake to attempt to get around this by placing a Constraint on the
parent, since that would then automatically be created on all child
tables also. So we can never exclude the parent without automatically
excluding *all* of the children also.

hannu=# create table ttt(id serial, t text);
NOTICE: CREATE TABLE will create implicit sequence "ttt_id_seq" for
"serial" co lumn "ttt.id"
CREATE TABLE
hannu=# create table ttt2() inherits (ttt);
CREATE TABLE
hannu=# alter table ONLY ttt add constraint ccc check(false);
ALTER TABLE

this also works, but new inherited tables will still inherit the
constraint.

Perhaps we just need a way to store the "ONLY" status of the constraint,
and then not inherit these.

--
Hannu Krosing <hannu@skype.net>

#7Bruce Momjian
bruce@momjian.us
In reply to: Hannu Krosing (#5)
Re: [Bizgres-general] A Guide to Constraint Exclusion

Hannu Krosing <hannu@tm.ee> writes:

It is not yet possible to specify that an inheritance parent has no
rows, and, if so, should always be excluded from the query.

I think that a simple "CHECK(false)" constraint should be enough for
this.

huh, that's clever.

CE checks are not made when the parent table is involved in a join.

Is this also the case where parent table is inside subquery and that
subquery is involved in a join?

I assume this is a TODO and just not in your first batch of work? It seems
like a pretty important piece eventually. Are there any fundamental
difficulties with handling joins eventually?

--
greg

#8Simon Riggs
simon@2ndQuadrant.com
In reply to: Hannu Krosing (#5)
Re: [Bizgres-general] A Guide to Constraint Exclusion

On Thu, 2005-07-14 at 00:13 +0300, Hannu Krosing wrote:

On K, 2005-07-13 at 11:53 +0100, Simon Riggs wrote:

We aren't able to exclude the parent table from the above query because
no Constraint was defined upon it. Since, in our example, the parent is
empty there will be little effect on the query performance. It would be
a mistake to attempt to get around this by placing a Constraint on the
parent, since that would then automatically be created on all child
tables also. So we can never exclude the parent without automatically
excluding *all* of the children also.

At least in 8.0 you can drop the inherited constraint from child table:

Yes, you can. But in discussion on -hackers on 20 May there was
agreement (OK, me agreeing with Tom) that being able to drop inherited
constraints was a bug (or at least a deprecated feature...)

Currently, there is no restriction that all constraints *must* be
mutually exclusive, nor even that the constraints may be similar on each
table. This can be useful for some designs where the inheritance
hierarchy is not "disjoint", as UML would term this situation.

actually this is GOOD, as this way I can have a constraint on both
insert_timestamp and primary_key fields, which are mostly but not
absolutely in the same order. And also to add extra IN (X,Y,Z)
constraints for some other fields.

Yes, understood. My description was not very good. I did not mean that
constraints on any one table should be mutually exclusive. I meant that
a set of constraints across a set of child tables should be able to be
defined mutually exclusive (e.g. just as Oracle partitions are).

CE does not prevent direct access to one of the child tables in an
inheritance hierarchy. In this case, no exclusion test would be
performed. Exclusion tests are performed *only* when the parent table in
an inheritance hierarchy is accessed. Exclusion tests are performed even
if the inheritance hierarchy is many levels deep (e.g. parent-child-
grandchild). CE also supports multiple inheritance.

I'd like to see an option to ALWAYS do CE, inheritance or union (all) or
even simple queries.

I've argued against that, but I guess if there is enough opinion I can
be persuaded.

My argument is that most constraints look something like
CHECK ( salary > 0 and salary < 1000000)
If you ask a query like
select count(*) from emp where salary < -10000
then CE would work great. But you don't find many people asking such
nonsensical questions, or at least very often. Those queries do get
asked, but they seem either naive or "data profiling" type queries. So
my conclusion is that for most constraints and most queries, CE is
literally just a waste of planning time.

When a designer deliberately creates a scenario where constraints have
meaning, as in a mutually exclusive inheritance hierarchy or classic
partitioned table design, then you have a high chance of CE being
effective.

The indicator for the value of CE, IMHO, is the "and/or" aspect of
multiple related tables.

Thats my argument, but I'm willing to hear counter arguments. I just
wish to avoid overselling CE and slowing most people's queries down for
no good reason.

CURRENT RESTRICTIONS

It is not yet possible to specify that Constraints on child tables will
be mutually exclusive of each other. Currently, it would be up to the
designer to ensure that, if desired.

It is not yet possible to specify that an inheritance parent has no
rows, and, if so, should always be excluded from the query.

I think that a simple "CHECK(false)" constraint should be enough for
this.

That works, but as I said....

If a parent table has a Constraint defined upon it, then this will be
automatically copied to all child tables.

But they can be removed later if desired.

But should not be...

Currently, there is no way to
tell which Constraints have been inherited from the parent, so exclusion
tests will be re-executed against all child tables. This will cause
additional optimization time.

Have you done any performance testing, i.e. what is the actual impact of
CE on planning time ?

I think "bad" would be my assessment. But not "very bad". But this must
be offset by the extraordinary time saving in execution time.

Planning time depends upon the complexity of the query, the number and
complexity of the constraints and the number of child tables.

I think I would guess currently at child tables of around 1 GB, with
sizes as small as 100 MB being reasonable. Trying to keep table size
relatively evenly distributed would be useful also. With those
recommendations it will always be worth 0.1 sec investment in trying to
avoid 100+ sec sequential scan times. But again, you need to consider
your workload.

Currently, all child tables will be considered. It may be possible in
the future to pre-sort the list of child tables, so that optimization
time can be reduced for parent tables with large numbers of partitions.

Currently, there is no index on the pg_inherits system table. As a
result, parents with more than 1000 child tables are likely to
experience longer than desirable planning times for their queries.

Am I right that this is a general postgresql issue and has nothing to do
with CE ?

Yep.

CE checks will not currently recognise STABLE functions within a query.
So WHERE clauses such as
DateKey > CURRENT DATE
will not cause exclusion because CURRENT DATE is a STABLE function.

CE checks are not made when the parent table is involved in a join.

Is this also the case where parent table is inside subquery and that
subquery is involved in a join?

My comment was too terse. What I meant was that you can't do dynamic
exclusion based upon the results of a join. i.e. PPUC2

Or do I have to make sure that it is not lifted out of that subquery
using something like pl/pgsql function ?

Other existing restrictions on Inherited tables continue to apply.

WHat happens for multiple inheritance ?

CE works.

Is it detected and then also excluded from CE ?

No

Or is that just a "don't do it" item ?

Thats a religious debate.... I'll leave that one.

Best Regards, Simon Riggs

#9Simon Riggs
simon@2ndQuadrant.com
In reply to: Bruce Momjian (#7)
Re: [Bizgres-general] A Guide to Constraint Exclusion

On Thu, 2005-07-14 at 15:30 -0400, Greg Stark wrote:

Hannu Krosing <hannu@tm.ee> writes:

It is not yet possible to specify that an inheritance parent has no
rows, and, if so, should always be excluded from the query.

I think that a simple "CHECK(false)" constraint should be enough for
this.

huh, that's clever.

But not clever enough... see my reply to Hannu. Sorry.

CE checks are not made when the parent table is involved in a join.

Is this also the case where parent table is inside subquery and that
subquery is involved in a join?

I assume this is a TODO and just not in your first batch of work? It seems
like a pretty important piece eventually. Are there any fundamental
difficulties with handling joins eventually?

Its a reasonable size piece of work, and could not be reworked in time
for 8.1.

Some other mental leaps may also be required.

Best Regards, Simon Riggs

#10Luke Lonergan
llonergan@greenplum.com
In reply to: Simon Riggs (#8)
Re: [Bizgres-general] A Guide to Constraint Exclusion (

CE checks will not currently recognise STABLE functions within a query.
So WHERE clauses such as
DateKey > CURRENT DATE
will not cause exclusion because CURRENT DATE is a STABLE function.

CE checks are not made when the parent table is involved in a join.

Is this also the case where parent table is inside subquery and that
subquery is involved in a join?

My comment was too terse. What I meant was that you can't do dynamic
exclusion based upon the results of a join. i.e. PPUC2

Phew! Correlated subqueries won't CE then, but the more common complex
queries will. We'll test with some common ones soon.

- Luke

#11Hannu Krosing
hannu@tm.ee
In reply to: Simon Riggs (#8)
Re: [Bizgres-general] A Guide to Constraint Exclusion

On N, 2005-07-14 at 21:29 +0100, Simon Riggs wrote:

On Thu, 2005-07-14 at 00:13 +0300, Hannu Krosing wrote:

On K, 2005-07-13 at 11:53 +0100, Simon Riggs wrote:

We aren't able to exclude the parent table from the above query because
no Constraint was defined upon it. Since, in our example, the parent is
empty there will be little effect on the query performance. It would be
a mistake to attempt to get around this by placing a Constraint on the
parent, since that would then automatically be created on all child
tables also. So we can never exclude the parent without automatically
excluding *all* of the children also.

At least in 8.0 you can drop the inherited constraint from child table:

Yes, you can. But in discussion on -hackers on 20 May there was
agreement (OK, me agreeing with Tom) that being able to drop inherited
constraints was a bug (or at least a deprecated feature...)

Why is it a bug ?

Tom wrote on May 20:

I think a good argument can be made that the above behavior is a bug,
and that the ALTER command should have been rejected. We've gone to
great lengths to make sure you can't ALTER a child table to make it
incompatible with the parent in terms of the column names and types;
shouldn't this be true of check constraints as well?

There is a good fundamental reason why we dont let people drop columns
from children or to add them to parent ONLY: if we did not, then there
would be no way to query from the hierarchy.

I can't see any such reason for forbidding dropping constraints from
child tables or disallowing CREAETE CONSTRAINT ON parent ONLY , at least
not for CHECK constraints.

And even disallowing it can probably be circumvented by a clever
functional CHECK constraint, which checks for also the table it is
defined on.

OTOH, disallowing all this for child FOREIGN KEY's seems logical though.

OTOOH, to be symmetrical with previous, we should also have UNIQUE and
PK constraints that span all the inheritance hierarchy, but I don't want
to go there now :)

CE does not prevent direct access to one of the child tables in an
inheritance hierarchy. In this case, no exclusion test would be
performed. Exclusion tests are performed *only* when the parent table in
an inheritance hierarchy is accessed. Exclusion tests are performed even
if the inheritance hierarchy is many levels deep (e.g. parent-child-
grandchild). CE also supports multiple inheritance.

I'd like to see an option to ALWAYS do CE, inheritance or union (all) or
even simple queries.

I've argued against that, but I guess if there is enough opinion I can
be persuaded.

My argument is that most constraints look something like
CHECK ( salary > 0 and salary < 1000000)
If you ask a query like
select count(*) from emp where salary < -10000
then CE would work great. But you don't find many people asking such
nonsensical questions, or at least very often. Those queries do get
asked, but they seem either naive or "data profiling" type queries. So
my conclusion is that for most constraints and most queries, CE is
literally just a waste of planning time.

CE may have some use for automatically generated queries, but in general
I agree with you.

Btw, not just UNION ALL, but also simple UNION, INTERSECT and EXCEPT
could probably be taught to use CE at some stage.

Also CE could at some stage be used to drop the UNIQUEifying nodes from
UNION (without ALL), if it can prove that the UNION is already UNIQUE.

If a parent table has a Constraint defined upon it, then this will be
automatically copied to all child tables.

But they can be removed later if desired.

But should not be...

FOREIGN KEY constraints should not, but I think that simple CHECK's
could.

Have you done any performance testing, i.e. what is the actual impact of
CE on planning time ?

I think "bad" would be my assessment. But not "very bad". But this must
be offset by the extraordinary time saving in execution time.

Planning time depends upon the complexity of the query, the number and
complexity of the constraints and the number of child tables.

I think I would guess currently at child tables of around 1 GB, with
sizes as small as 100 MB being reasonable. Trying to keep table size
relatively evenly distributed would be useful also. With those
recommendations it will always be worth 0.1 sec investment in trying to
avoid 100+ sec sequential scan times. But again, you need to consider
your workload.

That's why I asked for GUC, not a default behaviour ;)

CE checks will not currently recognise STABLE functions within a query.
So WHERE clauses such as
DateKey > CURRENT DATE
will not cause exclusion because CURRENT DATE is a STABLE function.

CE checks are not made when the parent table is involved in a join.

Is this also the case where parent table is inside subquery and that
subquery is involved in a join?

My comment was too terse. What I meant was that you can't do dynamic
exclusion based upon the results of a join. i.e. PPUC2

but what about _static_ exlusion based on constraints ?

I mean if there is a left side table with say a single partition having
CHECK(id_order BETWEEN 1 AND 1000)
(either originally or left after eliminating other by other constraints)

and 3 right side partition with
CHECK(key_order BETWEEN 1 AND 1000)
CHECK(key_order BETWEEN 1001 AND 2000)
CHECK(key_order BETWEEN 2001 AND 3000)

then the 3rd one could be eliminated statically from a join on
id_order=key_order

...

WHat happens for multiple inheritance ?

CE works.

Is it detected and then also excluded from CE ?

No

Or is that just a "don't do it" item ?

Thats a religious debate.... I'll leave that one.

:)

Best Regards, Simon Riggs

And thanks for the good work so far!

--
Hannu Krosing <hannu@skype.net>

#12Simon Riggs
simon@2ndQuadrant.com
In reply to: Hannu Krosing (#11)
Re: [Bizgres-general] A Guide to Constraint Exclusion

On Fri, 2005-07-15 at 00:24 +0300, Hannu Krosing wrote:

On N, 2005-07-14 at 21:29 +0100, Simon Riggs wrote:

On Thu, 2005-07-14 at 00:13 +0300, Hannu Krosing wrote:

On K, 2005-07-13 at 11:53 +0100, Simon Riggs wrote:

We aren't able to exclude the parent table from the above query because
no Constraint was defined upon it. Since, in our example, the parent is
empty there will be little effect on the query performance. It would be
a mistake to attempt to get around this by placing a Constraint on the
parent, since that would then automatically be created on all child
tables also. So we can never exclude the parent without automatically
excluding *all* of the children also.

At least in 8.0 you can drop the inherited constraint from child table:

Yes, you can. But in discussion on -hackers on 20 May there was
agreement (OK, me agreeing with Tom) that being able to drop inherited
constraints was a bug (or at least a deprecated feature...)

Why is it a bug ?

Tom wrote on May 20:

I think a good argument can be made that the above behavior is a bug,
and that the ALTER command should have been rejected. We've gone to
great lengths to make sure you can't ALTER a child table to make it
incompatible with the parent in terms of the column names and types;
shouldn't this be true of check constraints as well?

There is a good fundamental reason why we dont let people drop columns
from children or to add them to parent ONLY: if we did not, then there
would be no way to query from the hierarchy.

I can't see any such reason for forbidding dropping constraints from
child tables or disallowing CREAETE CONSTRAINT ON parent ONLY , at least
not for CHECK constraints.

If we allow DROPing them, why allow them to be inherited in the first
place? One or the other, not both.

I do still agree with Tom on that, but as I said at the time, I don't
see it as a big enough problem to spend time removing that feature. But
I personally wouldn't grow to rely on its existence either.

There are other ways...

And even disallowing it can probably be circumvented by a clever
functional CHECK constraint, which checks for also the table it is
defined on.

OTOH, disallowing all this for child FOREIGN KEY's seems logical though.

OTOOH, to be symmetrical with previous, we should also have UNIQUE and
PK constraints that span all the inheritance hierarchy, but I don't want
to go there now :)

Well, I am working towards that.

Best Regards, Simon Riggs

#13Simon Riggs
simon@2ndQuadrant.com
In reply to: Hannu Krosing (#11)
Re: [Bizgres-general] A Guide to Constraint Exclusion

On Fri, 2005-07-15 at 00:24 +0300, Hannu Krosing wrote:

Btw, not just UNION ALL, but also simple UNION, INTERSECT and EXCEPT
could probably be taught to use CE at some stage.

It turns out that to solve this problem you very nearly have to solve
the "any table" problem. Thats an extra argument in favour of making
this work for any table.

That's why I asked for GUC, not a default behaviour ;)

but what about _static_ exlusion based on constraints ?

I mean if there is a left side table with say a single partition having
CHECK(id_order BETWEEN 1 AND 1000)
(either originally or left after eliminating other by other constraints)

and 3 right side partition with
CHECK(key_order BETWEEN 1 AND 1000)
CHECK(key_order BETWEEN 1001 AND 2000)
CHECK(key_order BETWEEN 2001 AND 3000)

then the 3rd one could be eliminated statically from a join on
id_order=key_order

Well, SQL allows you to express almost any query, but that doesn't mean
it is all 3 of: frequently occcuring, meaningful and interesting.

Have you ever seen such a construct?

I think we might be able to use equivalence to show that a restriction
on one table could be translated into a restriction on the Fact table.

e.g.

SELECT
FROM Fact, OtherTable
WHERE Fact.Key = OtherTable.Key
AND OtherTable.Key > 28000

But the harder and yet more common problem is where there is no direct
restriction on the equivalent join column. Thats the one I would
eventually seek to solve

e.g.

SELECT
FROM Fact, Dimension
WHERE Fact.Key = Dimension.Key
AND Dimension.DescriptiveField = 'Blah'

where there is a relationship between DescriptiveField and Key enshrined
within the Dimension table.

and thanks for the good work so far!

Thank you,

Best Regards, Simon Riggs

#14Luke Lonergan
llonergan@greenplum.com
In reply to: Simon Riggs (#9)
Re: [Bizgres-general] A Guide to Constraint Exclusion

I assume this is a TODO and just not in your first batch of work? It seems
like a pretty important piece eventually. Are there any fundamental
difficulties with handling joins eventually?

Its a reasonable size piece of work, and could not be reworked in time
for 8.1.

As you've said previously, when you refer to joins not benefiting from CE,
you mean data-driven joins, or joins whose predicates are dynamically
determined, right?

- Luke

#15Luke Lonergan
llonergan@greenplum.com
In reply to: Hannu Krosing (#11)
Re: [Bizgres-general] A Guide to Constraint Exclusion (

Hannu,

My comment was too terse. What I meant was that you can't do dynamic
exclusion based upon the results of a join. i.e. PPUC2

but what about _static_ exlusion based on constraints ?

I mean if there is a left side table with say a single partition having
CHECK(id_order BETWEEN 1 AND 1000)
(either originally or left after eliminating other by other constraints)

and 3 right side partition with
CHECK(key_order BETWEEN 1 AND 1000)
CHECK(key_order BETWEEN 1001 AND 2000)
CHECK(key_order BETWEEN 2001 AND 3000)

then the 3rd one could be eliminated statically from a join on
id_order=key_order

I would expect that the 2nd and 3rd partitions would be CE'ed if the basic
support is there. Does your current implementation include this capability
Simon? Or is it limited to use with constant predicates?

- Luke

#16Hannu Krosing
hannu@tm.ee
In reply to: Hannu Krosing (#11)
Re: [Bizgres-general] A Guide to Constraint Exclusion

On R, 2005-07-15 at 00:24 +0300, Hannu Krosing wrote:

There is a good fundamental reason why we dont let people drop columns
from children or to add them to parent ONLY: if we did not, then there
would be no way to query from the hierarchy.

I can't see any such reason for forbidding dropping constraints from
child tables or disallowing CREAETE CONSTRAINT ON parent ONLY , at least
not for CHECK constraints.

And even disallowing it can probably be circumvented by a clever
functional CHECK constraint, which checks for also the table it is
defined on.

OTOH, disallowing all this for child FOREIGN KEY's seems logical though.

Well, having thought more about it, I don't think that row-level
constraints (CHECK, FK and NULL) should be restricted at all by
inheritance hierarchy.

Table-level constraints (UNIQUE and PK) should, but they need to be
fixed first.

What should be done with CHECK, FK and NULL is making a way for marking
them as being defined on some table 'ONLY' if created as such. It should
also be marked with ONLY when any of the constraints inherited from this
one is dropped from child. Then all direct ancestor copies should get
'ONLY' status so thet they can be dumped properly.

Or perhaps just disallow dropping constraints from children, but still
allow creating constraints on ONLY parent table, which seems a cleaner
solution.

Why I also think that allowing this for FK's is good, is that it allows
partitioning tables on both sides of FK relation even without global
UNIQUE/PK constraints, by partitioning both of them on same or
compatible boundaries. By compatible I mean here that we could partition
the PK table on say PK values with a step of 1M but FK table on values
with step of 100K, so that they both will have approximately the same
number of rows per partition and there will be 10 tables with FK
constraints pointing to the same PK partition.

What comes to Toms's May 20 argument that people would be surprised when
they select form a table whith check(i>0) constraint and get out i<0
then I think this is a question of education.
If they don't know better, they will be equally surprised when changing
a row in parent table actually changes a row in child table, or when a
row inserted into child shows up in parent.

The ONLY status of constraints should be accounted for in pg_dump so it
will work right and also shown in psql's \d output so it confuses users
less :)

--
Hannu Krosing <hannu@tm.ee>

#17Luke Lonergan
llonergan@greenplum.com
In reply to: Simon Riggs (#13)
Re: [Bizgres-general] A Guide to Constraint Exclusion (

Simon,

SELECT
FROM Fact, Dimension
WHERE Fact.Key = Dimension.Key
AND Dimension.DescriptiveField = 'Blah'

So, what happens with this:

SELECT
FROM Fact, Dimension
WHERE Fact.Key = Dimension.Key
AND Fact.part = 100;

With Fact defined with 3 partitions:
CHECK(Part BETWEEN 1 AND 1000)
CHECK(Part BETWEEN 1001 AND 2000)
CHECK(Part BETWEEN 2001 AND 3000)

Will the 2nd and 3rd partitions be eliminated?

- Luke

#18Hannu Krosing
hannu@tm.ee
In reply to: Hannu Krosing (#11)
Re: [Bizgres-general] A Guide to Constraint Exclusion

On R, 2005-07-15 at 00:24 +0300, Hannu Krosing wrote:

but what about _static_ exlusion based on constraints ?

I mean if there is a left side table with say a single partition having
CHECK(id_order BETWEEN 1 AND 1000)
(either originally or left after eliminating other by other constraints)

and 3 right side partition with
CHECK(key_order BETWEEN 1 AND 1000)
CHECK(key_order BETWEEN 1001 AND 2000)
CHECK(key_order BETWEEN 2001 AND 3000)

then the 3rd one could be eliminated statically from a join on
id_order=key_order

the simplest form of this seems to be carrying checks to both ends of
joins before CE.

so for two partitioned tables "main" and "detail", and query

select *
from main m,
detail d
where m.id_main = d.key_main
and m.id_main in (1,7,42)

CE is done based on main.id_main in (1,7,42) and detail.key_mainin
(1,7,42)

Or perhaps this carrying over is already done automatically by postgres
planner before CE ?

--
Hannu Krosing <hannu@skype.net>

#19Simon Riggs
simon@2ndQuadrant.com
In reply to: Luke Lonergan (#17)
Re: [Bizgres-general] A Guide to Constraint Exclusion (

On Thu, 2005-07-14 at 15:16 -0700, Luke Lonergan wrote:

Simon,

SELECT
FROM Fact, Dimension
WHERE Fact.Key = Dimension.Key
AND Dimension.DescriptiveField = 'Blah'

So, what happens with this:

SELECT
FROM Fact, Dimension
WHERE Fact.Key = Dimension.Key
AND Fact.part = 100;

With Fact defined with 3 partitions:
CHECK(Part BETWEEN 1 AND 1000)
CHECK(Part BETWEEN 1001 AND 2000)
CHECK(Part BETWEEN 2001 AND 3000)

Will the 2nd and 3rd partitions be eliminated?

Yes.

That is because the clause "part = 100" is a direct restriction on the
partitioned table, using an immutable operator and a constant.

Best Regards, Simon Riggs

#20Simon Riggs
simon@2ndQuadrant.com
In reply to: Luke Lonergan (#15)
Re: [Bizgres-general] A Guide to Constraint Exclusion (

On Thu, 2005-07-14 at 15:07 -0700, Luke Lonergan wrote:

Hannu,

My comment was too terse. What I meant was that you can't do dynamic
exclusion based upon the results of a join. i.e. PPUC2

but what about _static_ exlusion based on constraints ?

I mean if there is a left side table with say a single partition having
CHECK(id_order BETWEEN 1 AND 1000)
(either originally or left after eliminating other by other constraints)

and 3 right side partition with
CHECK(key_order BETWEEN 1 AND 1000)
CHECK(key_order BETWEEN 1001 AND 2000)
CHECK(key_order BETWEEN 2001 AND 3000)

then the 3rd one could be eliminated statically from a join on
id_order=key_order

I would expect that the 2nd and 3rd partitions would be CE'ed if the basic
support is there. Does your current implementation include this capability
Simon?

No

Or is it limited to use with constant predicates?

Yes.

It's the first time I've thought to compare the constraint predicates on
joined tables based upon the join restriction. That's possible, but
would take some time to work out.

I've argued that such a construct is not common. I'm open to suggestions
about what *is* common...

Best Regards, Simon Riggs

#21Simon Riggs
simon@2ndQuadrant.com
In reply to: Hannu Krosing (#18)
#22Luke Lonergan
llonergan@greenplum.com
In reply to: Simon Riggs (#20)
#23Simon Riggs
simon@2ndQuadrant.com
In reply to: Hannu Krosing (#11)
#24Hannu Krosing
hannu@tm.ee
In reply to: Simon Riggs (#12)
#25Simon Riggs
simon@2ndQuadrant.com
In reply to: Hannu Krosing (#24)
#26Stephan Szabo
sszabo@megazone23.bigpanda.com
In reply to: Hannu Krosing (#16)
#27Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#1)
#28Luke Lonergan
llonergan@greenplum.com
In reply to: Tom Lane (#27)
#29Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#27)
#30Bruce Momjian
bruce@momjian.us
In reply to: Simon Riggs (#29)
#31Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#30)
#32Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#31)
#33Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#32)
#34Ron Mayer
rm_pg@cheapcomplexdevices.com
In reply to: Simon Riggs (#29)
#35Simon Riggs
simon@2ndQuadrant.com
In reply to: Ron Mayer (#34)
#36Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#33)
#37Simon Riggs
simon@2ndQuadrant.com
In reply to: Bruce Momjian (#36)