Declarative partitioning grammar
Hi all,
Many of you will have read the dynamic partitioning thread here:
http://archives.postgresql.org/pgsql-hackers/2008-01/msg00028.php
I've proposed an alternative approach, which we've called declarative
partitioning which is grammar based. This grammar was developed by Jeff
Cohen at Greenplum with some assistance from myself. It is to be
completely open source.
The grammar will need some refinement and we'd appreciate comments,
suggestions, etc. The grammar is designed to address the range of use
cases we have out there.
Basics
------
CREATE TABLE is modified to accept a PARTITION BY clause. This clause
contains one or more partition declarations. The syntax is as follows:
PARTITION BY {partition_type} (column_name[, column_name...])
[PARTITIONS number]
(
partition_declaration[, partition_declaration...]
)
The partition type can be one of HASH, RANGE or LIST. The column names
are the partitioning key. The PARTITIONS sub clause instructs the system
on the number of partitions to create if we're doing HASH or, in the
case of LIST or RANGE can act as a safe guard for users who want to
ensure that they do not generate more than a certain number of
partitions.
We have discussed adding the partition type REFERENCE which is akin to
the LIKE clause in CREATE TABLE. That is, it duplicates the partition
configuration of the specified table. Input would be appreciated.
Partition declarations
----------------------
Hash
----
... PARTITION BY HASH(order_date) PARTITIONS 5;
This will create 5 partitions on the column order_date. Inserts will be
distributed roughly evenly across the 5 partitions.
List
----
... PARTITION BY LIST (state)
(PARTITION q1_northwest VALUES ('OR', 'WA'),
PARTITION q1_southwest VALUES ('AZ', 'UT', 'NM'),
PARTITION q1_northeast VALUES ('NY', 'VM', 'NJ'),
PARTITION q1_southeast VALUES ('FL', 'GA'),
PARTITION q1_northcentral VALUES ('SD', 'WI'),
PARTITION q1_southcentral VALUES ('OK', 'TX'));
Here, we produce 6 different partitions. The first partition groups
states in the North West of the USA. We introduce here the named
partition concept for clarity.
Range
-----
Range has the most expressive grammar. I'll introduce it in steps:
... PARTITION BY RANGE (b)
(
PARTITION aa start (date '2007-01-01') end (date '2008-01-01'),
PARTITION bb start (date '2008-01-01') end (date '2009-01-01')
);
Here, we create 2 partitions: aa and bb. Partition aa has the range
2007-01-01 to 2008-01-01; partition bb has the range 2008-01-01 to
2009-01-01. Intervals always have this problem: are the bounds included
in the range? To deal with this we define: the start of the range is
included in the range. The ending bound is not. This can be modified
with the keywords INCLUSIVE and EXCLUSIVE, which modify this property on
a rule by rule basis.
It is common that these partitions follow a pattern, such as following
every week, month or year. So, we support the following specification:
... PARTITION BY RANGE(order_date)
(
START (date '2005-12-01') end (date '2007-12-01')
EVERY(interval '2 months')
);
If we like, we can mix the specification a little:
... PARTITION BY RANGE(order_date)
( PARTITION Q1_2005 end (date '2005-04-01'),
PARTITION Q2_2005 end (date '2005-07-01'),
PARTITION Q3_2005 end (date '2005-10-10'),
PARTITION Q4_2005 end (date '2006-01-01'),
START (date '2006-02-01') end (date '2008-04-01')
EVERY (interval '2 weeks')
);
an interesting result of the flexibility of the grammar we've come up
with is that you can do something like this:
... PARTITION BY RANGE(order_date)
( PARTITION minny end date '2004-12-01'),
end (date '2006-12-01'),
PARTITION maxny start (date '2006-12-01')
);
Here, when order_date is less than 2004-12-01, we put the data in minny,
when it is between 2004-12-01 and 2006-12-01 we put it in an unnamed
partition and after this we put it in maxny.
Tablespaces
-----------
We allow inline tablespace specification, such as:
... PARTITION BY RANGE(order_date)
(
PARTITION minny TABLESPACE compress,
start (date '2004-12-01') end (date '2006-12-01') TABLESPACE hot,
PARTITION maxny TABLESPACE compress
);
I've used the term compress here intentionally. A number of operating
systems now ship file systems which can compress partitions. Users with
issues with the amount of data they want to keep online can delay the
time until they need new storage to a future date by compressing less
regularly used data with this technique, for a performance cost. Data
being used heavily can live on an uncompressed file system, affected.
Multi-column support
--------------------
We can have multi-column partitions.
... PARTITION BY LIST (state, deptno)
(
VALUES ('OR', 1, 'WA', 1),
VALUES ('AZ', 1, 'UT', 1, 'NM', 1),
VALUES ('OR', 2, 'WA', 2),
VALUES ('AZ', 2, 'UT', 2, 'NM', 2),
PARTITION region_null VALUES (NULL, NULL),
PARTITION region_other
);
Looking at this syntax now, I think I prefer:
VALUES ('OR', 1),('WA', 1)
To specify keys for the same partition. Thoughts?
Composite partition support
---------------------------
Given that we're talking about systems with potentially very large
amounts of data, power users may want to combine range partitioning with
hash or list partitioning. For example, your analysis might always be on
a date range but also be broken down by sales office. So, this would
combine range and list partitioning (if the sales offices were known) or
hash partitioning (if they weren't known).
To do this, we introduce the SUBPARTITION clause:
... PARTITION BY RANGE(order_date) SUBPARTITION BY HASH (office_id)
SUBPARTITIONS 8
(
start (date '2005-12-01') end (date '2007-12-01')
every (interval '3 months'),
start (date '2007-12-01')
end (date '2008-12-01') every (interval '1 month')
);
The first partition specification covers 8 partitions, the second 12 for
20 partitions in total. Once we add the subpartitioning we have 160
partitions in total (20 * 8).
Subpartitioning by list can look like this (see templates below):
... PARTITION BY RANGE(order_date) SUBPARTITION BY LIST (customer_id)
(
partition minny (subpartition c1 values (1), subpartition
c2 values (2)),
start (date '2004-12-01') end (date '2006-12-01')
(subpartition c1 values (1), subpartition c2 values (2)),
partition maxy (values (1), values (2)
)
So, the list parameters of each sub partition look like arguments to the
primary partition. Again, see templates below if you think this looks
cumbersome.
We do not preclude subpartitions of subpartitions. So, the following is
valid:
... PARTITION BY HASH(b)
PARTITIONS 2
SUBPARTITION BY HASH(d)
SUBPARTITIONS 2,
SUBPARTITION BY HASH(e) SUBPARTITIONS 2,
SUBPARTITION BY HASH(f) SUBPARTITIONS 2,
SUBPARTITION BY HASH(g) SUBPARTITIONS 2,
SUBPARTITION BY HASH(h) SUBPARTITIONS 2;
Subpartition templates
----------------------
There are times we want subpartitions to be laid out in a specific way
for all partitions. To do this, we use templates:
... PARTITION BY RANGE (order_date)
SUBPARTITION BY LIST (state)
SUBPARTITION TEMPLATE
(
SUBPARTITION northwest VALUES ('OR', 'WA'),
SUBPARTITION southwest VALUES ('AZ', 'UT', 'NM'),
SUBPARTITION northeast VALUES ('NY', 'VM', 'NJ'),
SUBPARTITION southeast VALUES ('FL', 'GA'),
SUBPARTITION northcentral VALUES ('SD', 'WI'),
SUBPARTITION southcentral VALUES ('OK', 'TX')
)
(start (date '2001-01-01') end (date '2010-01-01')
every (interval '3 months')
)
For each of the 36 odd partitions we create here, each is subpartitioned
into geographical areas.
Data management with ALTER
--------------------------
These are all arguments to ALTER TABLE. All of these require validation
against the existing specification.
ADD
---
For range and list partitioning, it's important to be able to add
partitions for data not covered by the existing specification. So, we
propose:
... ADD PARTITION q1_2008 end (date '2008-04-01')
COALESCE (maybe)
----------------
For hash partitions, remove a partition from the number of hash
partitions and distribute its data to the remaining partitions.
... COALESCE PARTITION [name];
I'm not sure if this is really used but other systems we looked at have
it. Thoughts?
DROP
----
For list and range partitions, drop a specified partition from the set
of partitions.
... DROP PARTITION minny;
This drops a named partition. Often, it will be difficult for users to
know partition names, and they might be unnamed. So, we allow this
syntax:
... DROP PARTITION FOR(date '2007-01-01');
for range partitions; and:
... DROP PARTITION FOR(VALUES('CA'));
for list partitions.
We've also discussed something like:
... DROP PARTITION FOR(POSITION(1));
so that users can easily drop a specific partition in an array of range
partitions. It seems to me, though, that the use case is generally to
drop the oldest partition so perhaps we should have a more explicit
syntax. Thoughts?
EXCHANGE
--------
This sub-clause allows us to make a table a partition in a set of
partitions or take a partition out of a set but keep it as a table. IBM
uses ATTACH and DETACH, which is explicit but Oracle uses EXCHANGE. I'll
explain the latter:
... EXCHANGE <partition identifier> WITH TABLE <table name>
partition identifier is one of PARTITION <name> or PARTITION FOR(...).
The partition in the partition set 'becomes' the table <table name> and
vice-versa. Essentially, we'd swap the relfilenodes. This means that we
have to first ADD PARTITION then swap the table and the partition.
Thoughts?
MERGE
-----
You can merge and list partitions and any two range partitions:
... MERGE <partition id>, <partition id> [INTO PARTITION <partition name>]
For range partitions:
... MERGE PARTITION FOR(date '2006_01_01'), PARTITION FOR(date '2007-01-01');
For list partitions:
... MERGE PARTITION FOR(VALUES('CA', 'MA')
This begs the question of why we have COALESCE for hash partitioning. I
don't know, it just seems like the right thing since you can't merge two
hash partitions together (well, you shouldn't want to).
RENAME
------
Rename a partition. We can use partition name or FOR clause.
SPLIT
-----
Split is used to divide a partition in two. It is designed for list and
range partitioning but I guess we could/should support hash. I need to
think about that. For RANGE partitions:
... SPLIT <partition id> <AT-clause> [INTO (PARTITION <partition name1>,
PARTITION <partition name2>)];
AT clause specifies the point at which the partition is split in two:
... SPLIT PARTITION FOR(2000) AT 1000 INTO PARTITION (part1000,
part2000)
We might want ways to do this with unnamed partitions, it seems to me.
Thoughts?
For list:
... SPLIT PARTITION region_east AT( VALUES ('CT', 'MA', 'MD') )
INTO
(
PARTITION region_east_1,
PARTITION region_east_2
);
In this case, values from region_east specified in the AT() list are put in
region_east_1 and the rest are put in region_east_2.
I think a better way for supporting split with hash is via ADD. I'm sure
some people think that ugly so I'd like feedback.
TRUNCATE
--------
Truncate a specified partition:
... TRUNCATE PARTITION FOR ('2005-01-01')
We could specify a name too.
This will use truncate internally.
Gavin Sherry wrote:
CREATE TABLE is modified to accept a PARTITION BY clause. This clause
contains one or more partition declarations. The syntax is as follows:
PARTITION BY {partition_type} (column_name[, column_name...])
[PARTITIONS number]
(
partition_declaration[, partition_declaration...])
The partition type can be one of HASH, RANGE or LIST.
What would be the drawbacks of
CREATE TABLE tablename(...)
PARTITION BY function_taking_row_returning_partition_name
instead of the explicit types?
It seems that with my own function I could pretty easily emulate
the HASH,RANGE,or LIST types. It seems a function returning a
partition name would largely avoid the need for the sub-partition stuff
too -- at least for the cases when the only reason you wanted
sub-partitions was for composite partition support.
I'm not sure if a function would give more flexibility, but
it sure seems it'd be easier for me to remember than the various
PARTITION BY LIST (a) (
VALUES ('L') SUBPARTITION BY RANGE (b) (VALUES('x'),VALUES('y')),
VALUES ('M') SUBPARTITION BY RANGE (b) (VALUES('u'),VALUES('t')))
or whowever it'd look.
Pardon my ignorance as I've never actually used partitioning before but
plan to in the near future, but couldn't the grammar resemble a common
WHERE clause more closely?
... PARTITION BY RANGE(order_date)
(
START (date '2005-12-01') end (date '2007-12-01')
EVERY(interval '2 months')
);
PARTITION BY RANGE(order_date) ( WHERE order_date >= '2005-12-01' AND order_date < '2007-12-01' EVERY interval '2 months' )
OR
PARTITION BY RANGE(order_date) ( WHERE order_date BETWEEN '2005-12-01' AND '2007-12-01' )
Of course using '>','>=','<','<=' instead of start/end eliminates
any ambiguity along with the need for INCLUSIVE/EXCLUSIVE.
... PARTITION BY LIST (state, deptno)
(
VALUES ('OR', 1, 'WA', 1),
VALUES ('AZ', 1, 'UT', 1, 'NM', 1),
VALUES ('OR', 2, 'WA', 2),
VALUES ('AZ', 2, 'UT', 2, 'NM', 2),
PARTITION region_null VALUES (NULL, NULL),
PARTITION region_other
);
PARTITION BY LIST (state,deptno) (
PARTITION one WHERE state in ('OR', WA') AND deptno = 1
PARTITION two WHERE state in ('AZ', UT') AND deptno IN (1,2)
PARTITION region_null WHERE state is null OR deptno is NULL
PARTITION region_other
);
Do you even need to list the columns in the PARTITION BY part?
PARTITION BY LIST (
PARTITION one WHERE state in ('OR', WA') AND deptno = 1
PARTITION two WHERE state in ('AZ', UT') AND deptno IN (1,2)
PARTITION region_null WHERE state is null OR deptno is NULL
PARTITION region_other
);
Is there really a reason to not have a named partition as well? Sure it
saves a few keystrokes, but it makes trying to do anything with them at
a later date that much more difficult.
Your originally suggested grammar might be shorter to type, but using
WHERE clause syntax we are all familiar with seems a lot more intuitive
to me on the surface at least. Why not try to reuse grammar that already
exists as much as possible?
On Sat, 2008-01-12 at 00:19 +0100, Gavin Sherry wrote:
CREATE TABLE is modified to accept a PARTITION BY clause. This clause
contains one or more partition declarations. The syntax is as follows:PARTITION BY {partition_type} (column_name[, column_name...])
[PARTITIONS number]
(
partition_declaration[, partition_declaration...])
List
----... PARTITION BY LIST (state)
(PARTITION q1_northwest VALUES ('OR', 'WA'),
PARTITION q1_southwest VALUES ('AZ', 'UT', 'NM'),
PARTITION q1_northeast VALUES ('NY', 'VM', 'NJ'),
PARTITION q1_southeast VALUES ('FL', 'GA'),
PARTITION q1_northcentral VALUES ('SD', 'WI'),
PARTITION q1_southcentral VALUES ('OK', 'TX'));Here, we produce 6 different partitions. The first partition groups
states in the North West of the USA. We introduce here the named
partition concept for clarity.Range
-----Range has the most expressive grammar. I'll introduce it in steps:
... PARTITION BY RANGE (b)
(
PARTITION aa start (date '2007-01-01') end (date '2008-01-01'),
PARTITION bb start (date '2008-01-01') end (date '2009-01-01')
);Here, we create 2 partitions: aa and bb. Partition aa has the range
2007-01-01 to 2008-01-01; partition bb has the range 2008-01-01 to
2009-01-01. Intervals always have this problem: are the bounds included
in the range? To deal with this we define: the start of the range is
included in the range. The ending bound is not. This can be modified
with the keywords INCLUSIVE and EXCLUSIVE, which modify this property on
a rule by rule basis.It is common that these partitions follow a pattern, such as following
every week, month or year. So, we support the following specification:... PARTITION BY RANGE(order_date)
(
START (date '2005-12-01') end (date '2007-12-01')
EVERY(interval '2 months')
);If we like, we can mix the specification a little:
... PARTITION BY RANGE(order_date)
( PARTITION Q1_2005 end (date '2005-04-01'),
PARTITION Q2_2005 end (date '2005-07-01'),
PARTITION Q3_2005 end (date '2005-10-10'),
PARTITION Q4_2005 end (date '2006-01-01'),
START (date '2006-02-01') end (date '2008-04-01')
EVERY (interval '2 weeks')
);an interesting result of the flexibility of the grammar we've come up
with is that you can do something like this:... PARTITION BY RANGE(order_date)
( PARTITION minny end date '2004-12-01'),
end (date '2006-12-01'),
PARTITION maxny start (date '2006-12-01')
);Here, when order_date is less than 2004-12-01, we put the data in minny,
when it is between 2004-12-01 and 2006-12-01 we put it in an unnamed
partition and after this we put it in maxny.Tablespaces
-----------We allow inline tablespace specification, such as:
... PARTITION BY RANGE(order_date)
(
PARTITION minny TABLESPACE compress,
start (date '2004-12-01') end (date '2006-12-01') TABLESPACE hot,
PARTITION maxny TABLESPACE compress
);I've used the term compress here intentionally. A number of operating
systems now ship file systems which can compress partitions. Users with
issues with the amount of data they want to keep online can delay the
time until they need new storage to a future date by compressing less
regularly used data with this technique, for a performance cost. Data
being used heavily can live on an uncompressed file system, affected.Multi-column support
--------------------We can have multi-column partitions.
... PARTITION BY LIST (state, deptno)
(
VALUES ('OR', 1, 'WA', 1),
VALUES ('AZ', 1, 'UT', 1, 'NM', 1),
VALUES ('OR', 2, 'WA', 2),
VALUES ('AZ', 2, 'UT', 2, 'NM', 2),
PARTITION region_null VALUES (NULL, NULL),
PARTITION region_other
);Looking at this syntax now, I think I prefer:
VALUES ('OR', 1),('WA', 1)
To specify keys for the same partition. Thoughts?
Composite partition support
---------------------------Given that we're talking about systems with potentially very large
amounts of data, power users may want to combine range partitioning with
hash or list partitioning. For example, your analysis might always be on
a date range but also be broken down by sales office. So, this would
combine range and list partitioning (if the sales offices were known) or
hash partitioning (if they weren't known).To do this, we introduce the SUBPARTITION clause:
... PARTITION BY RANGE(order_date) SUBPARTITION BY HASH (office_id)
SUBPARTITIONS 8
(
start (date '2005-12-01') end (date '2007-12-01')
every (interval '3 months'),
start (date '2007-12-01')
end (date '2008-12-01') every (interval '1 month')
);The first partition specification covers 8 partitions, the second 12 for
20 partitions in total. Once we add the subpartitioning we have 160
partitions in total (20 * 8).Subpartitioning by list can look like this (see templates below):
... PARTITION BY RANGE(order_date) SUBPARTITION BY LIST (customer_id)
(
partition minny (subpartition c1 values (1), subpartition
c2 values (2)),
start (date '2004-12-01') end (date '2006-12-01')
(subpartition c1 values (1), subpartition c2 values (2)),
partition maxy (values (1), values (2)
)So, the list parameters of each sub partition look like arguments to the
primary partition. Again, see templates below if you think this looks
cumbersome.We do not preclude subpartitions of subpartitions. So, the following is
valid:... PARTITION BY HASH(b)
PARTITIONS 2
SUBPARTITION BY HASH(d)
SUBPARTITIONS 2,
SUBPARTITION BY HASH(e) SUBPARTITIONS 2,
SUBPARTITION BY HASH(f) SUBPARTITIONS 2,
SUBPARTITION BY HASH(g) SUBPARTITIONS 2,
SUBPARTITION BY HASH(h) SUBPARTITIONS 2;Subpartition templates
----------------------There are times we want subpartitions to be laid out in a specific way
for all partitions. To do this, we use templates:... PARTITION BY RANGE (order_date)
SUBPARTITION BY LIST (state)
SUBPARTITION TEMPLATE
(
SUBPARTITION northwest VALUES ('OR', 'WA'),
SUBPARTITION southwest VALUES ('AZ', 'UT', 'NM'),
SUBPARTITION northeast VALUES ('NY', 'VM', 'NJ'),
SUBPARTITION southeast VALUES ('FL', 'GA'),
SUBPARTITION northcentral VALUES ('SD', 'WI'),
SUBPARTITION southcentral VALUES ('OK', 'TX')
)
(start (date '2001-01-01') end (date '2010-01-01')
every (interval '3 months')
)For each of the 36 odd partitions we create here, each is subpartitioned
into geographical areas.Data management with ALTER
--------------------------These are all arguments to ALTER TABLE. All of these require validation
against the existing specification.ADD
---For range and list partitioning, it's important to be able to add
partitions for data not covered by the existing specification. So, we
propose:... ADD PARTITION q1_2008 end (date '2008-04-01')
COALESCE (maybe)
----------------For hash partitions, remove a partition from the number of hash
partitions and distribute its data to the remaining partitions.... COALESCE PARTITION [name];
I'm not sure if this is really used but other systems we looked at have
it. Thoughts?DROP
----For list and range partitions, drop a specified partition from the set
of partitions.... DROP PARTITION minny;
This drops a named partition. Often, it will be difficult for users to
know partition names, and they might be unnamed. So, we allow this
syntax:... DROP PARTITION FOR(date '2007-01-01');
for range partitions; and:
... DROP PARTITION FOR(VALUES('CA'));
for list partitions.
We've also discussed something like:
... DROP PARTITION FOR(POSITION(1));
so that users can easily drop a specific partition in an array of range
partitions. It seems to me, though, that the use case is generally to
drop the oldest partition so perhaps we should have a more explicit
syntax. Thoughts?EXCHANGE
--------This sub-clause allows us to make a table a partition in a set of
partitions or take a partition out of a set but keep it as a table. IBM
uses ATTACH and DETACH, which is explicit but Oracle uses EXCHANGE. I'll
explain the latter:... EXCHANGE <partition identifier> WITH TABLE <table name>
partition identifier is one of PARTITION <name> or PARTITION FOR(...).
The partition in the partition set 'becomes' the table <table name> and
vice-versa. Essentially, we'd swap the relfilenodes. This means that we
have to first ADD PARTITION then swap the table and the partition.
Thoughts?MERGE
-----You can merge and list partitions and any two range partitions:
... MERGE <partition id>, <partition id> [INTO PARTITION <partition name>]
For range partitions:
... MERGE PARTITION FOR(date '2006_01_01'), PARTITION FOR(date '2007-01-01');
For list partitions:
... MERGE PARTITION FOR(VALUES('CA', 'MA')
This begs the question of why we have COALESCE for hash partitioning. I
don't know, it just seems like the right thing since you can't merge two
hash partitions together (well, you shouldn't want to).RENAME
------Rename a partition. We can use partition name or FOR clause.
SPLIT
-----Split is used to divide a partition in two. It is designed for list and
range partitioning but I guess we could/should support hash. I need to
think about that. For RANGE partitions:... SPLIT <partition id> <AT-clause> [INTO (PARTITION <partition name1>,
PARTITION <partition name2>)];AT clause specifies the point at which the partition is split in two:
... SPLIT PARTITION FOR(2000) AT 1000 INTO PARTITION (part1000,
part2000)We might want ways to do this with unnamed partitions, it seems to me.
Thoughts?For list:
... SPLIT PARTITION region_east AT( VALUES ('CT', 'MA', 'MD') )
INTO
(
PARTITION region_east_1,
PARTITION region_east_2
);In this case, values from region_east specified in the AT() list are put in
region_east_1 and the rest are put in region_east_2.I think a better way for supporting split with hash is via ADD. I'm sure
some people think that ugly so I'd like feedback.TRUNCATE
--------Truncate a specified partition:
... TRUNCATE PARTITION FOR ('2005-01-01')
We could specify a name too.
This will use truncate internally.
---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend
--
Mike <ipso@snappymail.ca>
On Jan 11, 2008 3:42 PM, Ron Mayer <rm_pg@cheapcomplexdevices.com> wrote:
What would be the drawbacks of
CREATE TABLE tablename(...)
PARTITION BY function_taking_row_returning_partition_name
instead of the explicit types?
Would that still allow the optimizer to work as well as it could? It
seems that an arbitrary map like that can't be optimized very well as
it might be too general.
wt
On Jan 11, 2008, at 4:16 PM, Warren Turkal wrote:
On Jan 11, 2008 3:42 PM, Ron Mayer <rm_pg@cheapcomplexdevices.com>
wrote:What would be the drawbacks of
CREATE TABLE tablename(...)
PARTITION BY function_taking_row_returning_partition_name
instead of the explicit types?Would that still allow the optimizer to work as well as it could? It
seems that an arbitrary map like that can't be optimized very well as
it might be too general.
We did look at allowing general functions for partitioning and this
was one concern. The other is that we want to enforce that a row
only gets inserted into a single partition, so we wanted a
declarative syntax where it was relatively easy to check that range
and list specifications don't overlap.
kind regards,
Jeff Cohen
On Jan 11, 2008 6:19 PM, Gavin Sherry <swm@alcove.com.au> wrote:
Hi all,
Many of you will have read the dynamic partitioning thread here:
http://archives.postgresql.org/pgsql-hackers/2008-01/msg00028.php
I've proposed an alternative approach, which we've called declarative
partitioning which is grammar based. This grammar was developed by Jeff
Cohen at Greenplum with some assistance from myself. It is to be
completely open source.
I like the syntax, but what I really want to know is how well this is
going to work with the query planner. The current 8.x table
partitioning mechanism has a lot of issues...it causes more problems
than it solves unless you are willing to strictly control how you
query the tables...I usually end up rolling my own.
Are you confident that what you propose will integrate well with the
planner and produce (as much as possible) fully optimized queries?
merlin
On Fri, Jan 11, 2008 at 07:46:36PM -0500, Merlin Moncure wrote:
On Jan 11, 2008 6:19 PM, Gavin Sherry <swm@alcove.com.au> wrote:
Hi all,
Many of you will have read the dynamic partitioning thread here:
http://archives.postgresql.org/pgsql-hackers/2008-01/msg00028.php
I've proposed an alternative approach, which we've called declarative
partitioning which is grammar based. This grammar was developed by Jeff
Cohen at Greenplum with some assistance from myself. It is to be
completely open source.I like the syntax, but what I really want to know is how well this is
going to work with the query planner. The current 8.x table
partitioning mechanism has a lot of issues...it causes more problems
than it solves unless you are willing to strictly control how you
query the tables...I usually end up rolling my own.
The syntax is half the problem, performance is the other. I will bring
the performance issues up in another thread. Yes, we are confident that
we can address the performance issues that rule out the existing
partitioning for many applications. We need it for our own stuff! :P
Thanks,
Gavin
On Jan 11, 2008, at 4:03 PM, Mike wrote:
Pardon my ignorance as I've never actually used partitioning before
but
plan to in the near future, but couldn't the grammar resemble a common
WHERE clause more closely?
Hi Mike,
Thanks for your suggestions. The current syntax we chose is similar
to syntax used by IBM, Oracle, and mysql, so it is familiar to folks
who have used partitioning with other databases. A WHERE clause
would of course be understandable by everyone, but it makes error
checking more difficult, since we want to ensure that partition
specifications don't overlap. In order to make such error checking
feasible, we would have to restrict the set of predicates you can use
in the WHERE clause, so it wouldn't be completely general anyway.
Is there really a reason to not have a named partition as well?
Sure it
saves a few keystrokes, but it makes trying to do anything with
them at
a later date that much more difficult.
For the case of partition by HASH, you can just specify the number of
buckets, so it might not be meaningful to name the partitions. For
range partitions, many users perform "rolling upgrades" on a regular
basis, where they drop the oldest data and add a new partition with
the latest data, so they might just refer to partitions by
"position" (either an ordinal number or using a keyword like FIRST/
LAST).
kind regards,
Jeff
Hi,
I've proposed an alternative approach, which we've called declarative
partitioning which is grammar based. This grammar was developed by Jeff
Cohen at Greenplum with some assistance from myself. It is to be
completely open source.
<..>
FWIW, I had done some very initial work on declarative partitioning (no
where as exhaustive as this proposal) and submitted a wip patch here:
http://momjian.us/mhonarc/patches_hold/msg00006.html
Kindly take a look at the patch, to see if would be useful to you folks in
any way.
<..>
Range
-----
Range has the most expressive grammar. I'll introduce it in steps:
... PARTITION BY RANGE (b)
(
PARTITION aa start (date '2007-01-01') end (date '2008-01-01'),
PARTITION bb start (date '2008-01-01') end (date '2009-01-01')
);
It is common that these partitions follow a pattern, such as following
every week, month or year. So, we support the following specification:
... PARTITION BY RANGE(order_date)
(
START (date '2005-12-01') end (date '2007-12-01')
EVERY(interval '2 months')
);
<..>
It will be interesting to see how this start,end, interval usage accomodates
data types other than dates. I hope, this specification is not
influenced overlty just by dates-like partitions.
<..>
ADD
---
For range and list partitioning, it's important to be able to add
partitions for data not covered by the existing specification. So, we
propose:
... ADD PARTITION q1_2008 end (date '2008-04-01')
<..>
What about data that does not match any existing partition specification? It
might make sense to have a dummy partition which handles all these cases.
<..>
DROP
----
For list and range partitions, drop a specified partition from the set
of partitions.
... DROP PARTITION minny;
This drops a named partition. Often, it will be difficult for users to
know partition names, and they might be unnamed. So, we allow this
syntax:
... DROP PARTITION FOR(date '2007-01-01');
for range partitions; and:
... DROP PARTITION FOR(VALUES('CA'));
for list partitions.
We've also discussed something like:
... DROP PARTITION FOR(POSITION(1));
so that users can easily drop a specific partition in an array of range
partitions. It seems to me, though, that the use case is generally to
drop the oldest partition so perhaps we should have a more explicit
syntax. Thoughts?
<..>
Surely, the partitions will get (default, parent inferred) names when they
get created? Do we expect the users to remember FOR() specifications like
the ones mentioned above? It might make sense to have a "\d in psql" e.g to
present a parent with all its named partitions alongwith the partition
clauses to facilitate drop partition using partition names.
<..>
EXCHANGE
--------
This sub-clause allows us to make a table a partition in a set of
partitions or take a partition out of a set but keep it as a table. IBM
uses ATTACH and DETACH, which is explicit but Oracle uses EXCHANGE. I'll
explain the latter:
... EXCHANGE <partition identifier> WITH TABLE <table name>
partition identifier is one of PARTITION <name> or PARTITION FOR(...).
The partition in the partition set 'becomes' the table <table name> and
vice-versa. Essentially, we'd swap the relfilenodes. This means that we
have to first ADD PARTITION then swap the table and the partition.
Thoughts?
<..>
Surely this wont be instantaneous?
<..>
Regards,
Nikhils
--
EnterpriseDB http://www.enterprisedb.com
Hi,
We did look at allowing general functions for partitioning and this
was one concern. The other is that we want to enforce that a row
only gets inserted into a single partition, so we wanted a
declarative syntax where it was relatively easy to check that range
and list specifications don't overlap.
Detection of mutually exclusive ranges might not turn out to be so easy
afterall. I think there is some code in the constraint_exclusion area which
might help out in this.
Regards,
Nikhils
--
EnterpriseDB http://www.enterprisedb.com
Hi,
On Jan 12, 2008 6:29 AM, Gavin Sherry <swm@alcove.com.au> wrote:
The syntax is half the problem, performance is the other. I will bring
the performance issues up in another thread. Yes, we are confident that
we can address the performance issues that rule out the existing
partitioning for many applications. We need it for our own stuff! :P
Agreed, syntax is just the sugar.
Also other than performance, how are updates involving partition keys
causing the resultant tuple to end up in a new partition handled here?
Regards,
Nikhils
--
EnterpriseDB http://www.enterprisedb.com
Jeff Cohen wrote:
In order to make such error checking
feasible, we would have to restrict the set of predicates you can use
in the WHERE clause, so it wouldn't be completely general anyway.
Well, with an extensible system such as PostgreSQL you will need to have a
partitioning scheme that can deal with extensions. Perhaps people want to
partition by XML, GIS, text-search data, or whatever someone might come up
with in the future.
One possible way to achieve that might be to redefine your concepts of hash,
list, and range in terms of operator classes (or operator families or other
operator structures?). Those have well-defined properties as to how the
operators behave relative to each other, so checking the partition
definitions for mutual exclusivity and other properties would be possible.
--
Peter Eisentraut
http://developer.postgresql.org/~petere/
On Sat, 2008-01-12 at 01:59 +0100, Gavin Sherry wrote:
The syntax is half the problem, performance is the other.
The syntax looks great to me, but I think it is about 5% of the problem,
maybe less. I don't really have any questions about the syntax, but I
may have thoughts when the implementation details emerge.
I'm not sure you'll be able to use PARTITION BY since its part of the
SQL Standard for Windowed grouping, which we do hope to implement one
day. It will be confusing to have two completely separate meanings for
the one phrase in our grammar.
The burning questions from my perspective are:
What is a partition?
How will the syntax be implemented within the backend?
--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com
On Sat, Jan 12, 2008 at 05:47:30PM +0000, Simon Riggs wrote:
On Sat, 2008-01-12 at 01:59 +0100, Gavin Sherry wrote:
The syntax is half the problem, performance is the other.
The syntax looks great to me, but I think it is about 5% of the problem,
maybe less. I don't really have any questions about the syntax, but I
may have thoughts when the implementation details emerge.
Yes, that's for another thread. Since the discussion was abot using
grammar to control partitions I wanted to get some grammar out. More
details on other stuff soon.
I'm not sure you'll be able to use PARTITION BY since its part of the
SQL Standard for Windowed grouping, which we do hope to implement one
day. It will be confusing to have two completely separate meanings for
the one phrase in our grammar.
I think it's fine. It doesn't cause conflicts in the grammar (in fact,
the Greenplum grammar implements both meanings right now with no
confusion).
Thanks,
Gavin
Hi,
Jeff Cohen wrote:
We did look at allowing general functions for partitioning and this was
one concern. The other is that we want to enforce that a row only gets
inserted into a single partition, so we wanted a declarative syntax
where it was relatively easy to check that range and list specifications
don't overlap.
Why do you need to define a split point so ambiguously at all? Why not
just give the DBA exactly *one* place to define the split point?
I don't think the separation into list, hash and range partitioning is
adequate. What is the system supposed to do, if you try to insert a row
which doesn't fit any of the values in your list or doesn't fit any of
the ranges you defined?
I prefer a partitioning grammar which doesn't interfere with
constraints. We all know how to define constraints. Please don't
introduce a new, ambiguous way. A partitioning definition should be able
to tell the target partition for *every* row which satisfies the
constraints (the real ones, not ambiguous ones).
IMO, a single DDL command should only touch a single split point, i.e.
split a table into two partitions, move the split point or remove the
split point (joining the partitions again). Those are the only basic
commands you need to be able to handle partitioning.
Sorry, but for my taste, the proposed grammar is too long per command,
not flexible enough and instead ambiguous for split points as well as
for constraints. To me it looks like repeating the mistakes of others.
Regards
Markus
On Sat, Jan 12, 2008 at 04:01:19PM +0530, NikhilS wrote:
Hi,
We did look at allowing general functions for partitioning and this
was one concern. The other is that we want to enforce that a row
only gets inserted into a single partition, so we wanted a
declarative syntax where it was relatively easy to check that range
and list specifications don't overlap.Detection of mutually exclusive ranges might not turn out to be so easy
afterall. I think there is some code in the constraint_exclusion area which
might help out in this.
In some prototyping code it didn't seem too difficult but if we've made
a mistake we might have to look at the CE code.
Thanks,
Gavin
Ühel kenal päeval, E, 2008-01-14 kell 10:49, kirjutas Markus
Schiltknecht:
Hi,
Jeff Cohen wrote:
We did look at allowing general functions for partitioning and this was
one concern. The other is that we want to enforce that a row only gets
inserted into a single partition, so we wanted a declarative syntax
where it was relatively easy to check that range and list specifications
don't overlap.Why do you need to define a split point so ambiguously at all? Why not
just give the DBA exactly *one* place to define the split point?I don't think the separation into list, hash and range partitioning is
adequate. What is the system supposed to do, if you try to insert a row
which doesn't fit any of the values in your list or doesn't fit any of
the ranges you defined?
I guess it would go to some "default" partition ?
...
IMO, a single DDL command should only touch a single split point, i.e.
split a table into two partitions, move the split point or remove the
split point (joining the partitions again). Those are the only basic
commands you need to be able to handle partitioning.
sure, but this can become really tedious for 1024 partitions, not to
mention hard for optimiser.
Sorry, but for my taste, the proposed grammar is too long per command,
not flexible enough and instead ambiguous for split points as well as
for constraints. To me it looks like repeating the mistakes of others.
what mistakes ?
-----------------
Hannu
On Jan 14, 2008, at 1:49 AM, Markus Schiltknecht wrote:
I don't think the separation into list, hash and range partitioning
is adequate. What is the system supposed to do, if you try to
insert a row which doesn't fit any of the values in your list or
doesn't fit any of the ranges you defined?
Hi Markus,
If you don't define a "default" partition to handle outliers, the
insert should fail with an error.
I prefer a partitioning grammar which doesn't interfere with
constraints. We all know how to define constraints. Please don't
introduce a new, ambiguous way. A partitioning definition should be
able to tell the target partition for *every* row which satisfies
the constraints (the real ones, not ambiguous ones).IMO, a single DDL command should only touch a single split point,
i.e. split a table into two partitions, move the split point or
remove the split point (joining the partitions again). Those are
the only basic commands you need to be able to handle partitioning.
I can certainly appreciate the simplicity of this approach. It lets
us use a generic check constraint to perform partitioning, so it is
more general than partitioning using hash, list, and range. However,
it achieves this generality at the expense of usability for typical
customer cases. For example, let's look at the case of a table of 1
year of sales data, where we want to create 12 partitions -- one for
each month.
With the generic approach, you start with a single table, and start
by splitting it into two six-month partitions:
ALTER TABLE sales
SPLIT where sales_date > date '2007-06-01'
INTO
(
PARTITION first_half
PARTITION second_half
);
We could implement this approach using check constraints and table
inheritance: the partition second_half is a child table where
sales_date > date '2007-06-01', and the partition first_half has the
complementary constraint NOT(sales_date > date '2007-06-01').
Next, you split each partition:
ALTER TABLE sales
SPLIT PARTITION first_half where sales_date > date '2007-03-01'
INTO
(
PARTITION first_quarter
PARTITION second_quarter
);
So now the child table for first_half itself has two children. As
you continue this process you construct a binary tree of table
inheritance using 12 ALTER statements.
In the "long" grammar you can create and partition the table in one
statement:
CREATE TABLE sales
...
PARTITION BY sales_date
(
start (date '2007-01-01') end (date '2008-01-01')
every (interval '1 month')
);
Sorry, but for my taste, the proposed grammar is too long per
command, not flexible enough and instead ambiguous for split points
as well as for constraints. To me it looks like repeating the
mistakes of others.
Thanks for your feedback. Partitioning the table using series of
splits is a clever solution for situations where the partitioning
operation cannot be described using simple equality (like list,hash)
or ordered comparison (range). But for many common business cases,
the "long" grammar is easier to specify.
kind regards,
Jeff
On Jan 12, 2008, at 9:34 AM, Peter Eisentraut wrote:
Well, with an extensible system such as PostgreSQL you will need to
have a
partitioning scheme that can deal with extensions. Perhaps people
want to
partition by XML, GIS, text-search data, or whatever someone might
come up
with in the future.
Hi Peter,
In the proposed solution, hash and list partitions work for all types
that support an equality operator, and range partitions work for all
types that support fully-ordered comparison.
kind regards,
Jeff
Jeff Cohen <jcohen@greenplum.com> writes:
In the proposed solution, hash and list partitions work for all types
that support an equality operator, and range partitions work for all
types that support fully-ordered comparison.
Surely a hashing method would require a *hashable* equality operator,
ie a hash opclass; likewise range partitions would demand a matching
btree opclass. You could do list partitions with an equality operator
of either kind.
Essentially all of the system's current knowledge about the properties
of specific operators is encoded as operator classes for one of these
two built-in index types. If you want to make assumptions about the
behavior of an operator, it really needs to be founded on these types of
opclasses --- or else you're buying into inventing a comparable amount
of infrastructure for some other organizational concept.
I think Peter's point was that you might want to think about
generalizing your concepts so that other kinds of operator classes could
someday serve as the foundations for other kinds of partitioning rules.
regards, tom lane