Partitioning WIP patch (was: Partitioning: issues/ideas)

Started by Amit Langotealmost 11 years ago25 messages
#1Amit Langote
Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
1 attachment(s)

On 21-01-2015 PM 07:26, Amit Langote wrote:

Ok, I will limit myself to focusing on following things at the moment:

* Provide syntax in CREATE TABLE to declare partition key
* Provide syntax in CREATE TABLE to declare a table as partition of a
partitioned table and values it contains
* Arrange to have partition key and values stored in appropriate
catalogs (existing or new)
* Arrange to cache partitioning info of partitioned tables in relcache

Here is an experimental patch that attempts to implement this.

It implements the following syntax:

* Syntax for defining partition key:
CREATE TABLE table_name(columns)PARTITION BY {RANGE|LIST} ON (key_spec);

where key_spec consists of partition key column names and optional
operator class per column. Currently, there are restrictions on the
key_spec such as allowing only column names (not arbitrary expressions
of them), only one column for list strategy, etc.

* Syntax for declaring a table as partition of a partitioned table:
CREATE TABLE table_name PARTITION OF parent_name FOR VALUES values_clause;

where values_clause can be:

IN (list_of_values), or
BETWEEN (range_lower_bounds) AND (range_upper_bounds);

The semantics for a range is [range_lower_bounds,range_upper_bounds),
that is, lower inclusive, upper exclusive. (this might later change
subject to choice regarding preferred/desired syntax)

Additionally, a partition can itself be further partitioned (though I
have not worked on the implementation details of multilevel partitioning
yet):

CREATE TABLE table_name PARTITION OF parent_name PARTITION BY
{RANGE|LIST} ON(key_columns) FOR VALUES values_clause;

There are two system catalogs pg_partitioned_rel and pg_partition which
store partition key spec and partition value spec, respectively.
(remember to initdb if interested in trying)

Please note that the above syntax and/or catalog may not be very
appealing nor that they won't change/disappear. I am posting the patch
more for examining the approach of internal representation of the
metadata for partitioning and get some general comments.

The approach I have implemented in this patch consists of loading
the partition key info and a list of partitions into the relation
descriptor for a partitioned table. In case of range partitioning, this
list is sorted on the range max bound. To see if that works any good, I
hacked ExecInsert() to make it find a partition for a tuple by
adding a ExecFindPartition() just before heap_insert(). It accepts
resultRelInfo of the parent and a tuple slot. It binary-searches for and
returns the descriptor of the chosen partition which ExecInsert() then
uses to perform heap_insert() and inserting index tuples. If no
partition is found, parent relation itself is used. heap_insert() and
ExecInsertIndexTuples() are the only things for which partition relation
is used. All of this is just experimental and most probably wrong in
details; but is done just to see what kind of performance to expect from
the chosen internal representation. Another thing is the approach that
tuple-routing (& partition-pruning) is a function of partitioned
relation and the tuple (or restrict quals). It will be more significant
when we'll get to implementing a partition-pruning function.

See below an example to show that having an extra ExecFindPartition()
does not degrade the performance of inserting a tuple much:

-- a plain table
CREATE TABLE parent_monthly(year int, month int, day int);

-- a partitioned table
-- xxxxx: number of partitions
CREATE TABLE parent_monthly_xxxxx(LIKE parent_monthly) PARTITION BY
RANGE ON(year, month);

-- partitions
CREATE TABLE parent_monthly_xxxxx_201401 PARTITION OF
parent_monthly_00100_201401 FOR VALUES BETWEEN (2014, 1) AND (2014, 2);

CREATE TABLE parent_monthly_xxxxx_201402 PARTITION OF
parent_monthly_00100_201402 FOR VALUES BETWEEN (2014, 2) AND (2014, 3);

CREATE TABLE parent_monthly_xxxxx_201403 PARTITION OF
parent_monthly_00100_201403 FOR VALUES BETWEEN (2014, 3) AND (2014, 4);

<snip>

CREATE TABLE parent_monthly_xxxxx_yyyymm PARTITION OF
parent_monthly_00100_yyyymm FOR VALUES BETWEEN (yyyy, mm AND (yyyy, mm);

-- insert 1 tuple into the plain table
INSERT INTO parent_monthly VALUES (2013, 12, 01);
INSERT 0 1
Time: 3.303 ms

-- insert 1 tuple into the partitioned table
-- #part: number of partitions
-- case 1: find no valid partition
-- case 2: find a valid partition

#parts case 1 case 2
======== ======== ========
10 3.248 ms 3.509 ms
100 3.546 ms 3.269 ms
500 3.497 ms 3.048 ms
1000 3.364 ms 5.379 ms
10000 4.943 ms 5.076 ms

Thoughts?

Thanks,
Amit

Attachments:

20140224_partitions-tuple-routing-poc_v01.patchtext/x-diff; name=20140224_partitions-tuple-routing-poc_v01.patch
#2Amit Langote
Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Amit Langote (#1)
Re: Partitioning WIP patch (was: Partitioning: issues/ideas)

On 24-02-2015 PM 05:13, Amit Langote wrote:

-- partitions
CREATE TABLE parent_monthly_xxxxx_201401 PARTITION OF
parent_monthly_00100_201401 FOR VALUES BETWEEN (2014, 1) AND (2014, 2);

CREATE TABLE parent_monthly_xxxxx_201402 PARTITION OF
parent_monthly_00100_201402 FOR VALUES BETWEEN (2014, 2) AND (2014, 3);

CREATE TABLE parent_monthly_xxxxx_201403 PARTITION OF
parent_monthly_00100_201403 FOR VALUES BETWEEN (2014, 3) AND (2014, 4);

<snip>

CREATE TABLE parent_monthly_xxxxx_yyyymm PARTITION OF
parent_monthly_00100_yyyymm FOR VALUES BETWEEN (yyyy, mm AND (yyyy, mm);

Oops, hand-edited "PARTITION OF parent_monthly_" wrongly. Supposed to be -

CREATE TABLE parent_monthly_xxxxx_201401 PARTITION OF
parent_monthly_xxxxx FOR VALUES BETWEEN (2014, 1) AND (2014, 2);

and so on.

Thanks,
Amit

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Amit Langote
Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Amit Langote (#1)
Re: Partitioning WIP patch (was: Partitioning: issues/ideas)

On 24-02-2015 PM 05:13, Amit Langote wrote:

Additionally, a partition can itself be further partitioned (though I
have not worked on the implementation details of multilevel partitioning
yet):

CREATE TABLE table_name PARTITION OF parent_name PARTITION BY
{RANGE|LIST} ON(key_columns) FOR VALUES values_clause;

One more blunder, supposed to be:

CREATE TABLE table_name PARTITION OF parent_name FOR VALUES
values_clause PARTITION BY {RANGE|LIST} ON(key_columns);

Thanks,
Amit

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Corey Huinker
Corey Huinker
corey.huinker@gmail.com
In reply to: Amit Langote (#3)
Re: Partitioning WIP patch (was: Partitioning: issues/ideas)

I think it's confusing to use BETWEEN to mean [low,high) when it already
means [low,high] in WHERE clauses.

Why not leverage range notation instead?

CREATE TABLE parent_monthly_xxxxx_201401 PARTITION OF
parent_monthly_xxxxx FOR VALUES IN RANGE '[2014-04-01,2014-05-01)'

"IN RANGE" could easily be "WITHIN RANGE" or "WITHIN" or something else.

Clearly, this example above assumes that the partitioning is on a single
column.

For partitioning on a set of columns you're essentially creating a custom
composite type with major-minor collation, could that custom type be
created at table creation time? Could an existing composite type be
declared as the partition key?

CREATE TYPE year_month( year int, month int );

(CREATE OPERATOR... for < = > )

CREATE TABLE parent_monthly(year int, month int, day int) PARTITION BY
RANGE ON year_month(year, month);

On Tue, Feb 24, 2015 at 5:53 AM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp

Show quoted text

wrote:

On 24-02-2015 PM 05:13, Amit Langote wrote:

Additionally, a partition can itself be further partitioned (though I
have not worked on the implementation details of multilevel partitioning
yet):

CREATE TABLE table_name PARTITION OF parent_name PARTITION BY
{RANGE|LIST} ON(key_columns) FOR VALUES values_clause;

One more blunder, supposed to be:

CREATE TABLE table_name PARTITION OF parent_name FOR VALUES
values_clause PARTITION BY {RANGE|LIST} ON(key_columns);

Thanks,
Amit

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Amit Langote
Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Corey Huinker (#4)
Re: Partitioning WIP patch (was: Partitioning: issues/ideas)

On 25-02-2015 AM 01:13, Corey Huinker wrote:

I think it's confusing to use BETWEEN to mean [low,high) when it already
means [low,high] in WHERE clauses.

Yeah, I'm not really attached to that syntax.

Why not leverage range notation instead?

CREATE TABLE parent_monthly_xxxxx_201401 PARTITION OF
parent_monthly_xxxxx FOR VALUES IN RANGE '[2014-04-01,2014-05-01)'

"IN RANGE" could easily be "WITHIN RANGE" or "WITHIN" or something else.

Clearly, this example above assumes that the partitioning is on a single
column.

For partitioning on a set of columns you're essentially creating a custom
composite type with major-minor collation, could that custom type be
created at table creation time? Could an existing composite type be
declared as the partition key?

The answer to the latter is yes as long as there is an operator class
that supports a strategy compatible with the chosen partitioning
strategy. For example, record/composite type has built-in support for
various btree strategies via record_ops.

As for the former, I tend to think creating new user-space
types/operators transparently might not be a very good idea. One concern
would be pg_dump related. Though I also agree there is some duplication
of major-minor collation logic in case of multi-column keys.

CREATE TYPE year_month( year int, month int );

(CREATE OPERATOR... for < = > )

CREATE TABLE parent_monthly(year int, month int, day int) PARTITION BY
RANGE ON year_month(year, month);

Regards,
Amit

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Amit Langote
Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Amit Langote (#1)
2 attachment(s)
Re: Partitioning WIP patch (was: Partitioning: issues/ideas)

On 24-02-2015 PM 05:13, Amit Langote wrote:

On 21-01-2015 PM 07:26, Amit Langote wrote:

Ok, I will limit myself to focusing on following things at the moment:

* Provide syntax in CREATE TABLE to declare partition key
* Provide syntax in CREATE TABLE to declare a table as partition of a
partitioned table and values it contains
* Arrange to have partition key and values stored in appropriate
catalogs (existing or new)
* Arrange to cache partitioning info of partitioned tables in relcache

Here is an experimental patch that attempts to implement this.

I divided the patch into two for convenience:

1) 0001_partition_syntax_catalog - adds commands, catalog and
partitioned table relation descriptor related WIP code

2) 0002_tuple-routing-poc - an experimental patch to test how well
binary search approach works for tuple routing in ExecInsert().

Please take a look.

Thanks,
Amit

Attachments:

0001_partition_syntax_catalog_v001.patchtext/x-diff; name=0001_partition_syntax_catalog_v001.patch
0002_tuple-routing-poc_v001.patchtext/x-diff; name=0002_tuple-routing-poc_v001.patch
#7Josh Berkus
Josh Berkus
josh@agliodbs.com
In reply to: Amit Langote (#1)
Re: Partitioning WIP patch

On 02/24/2015 12:13 AM, Amit Langote wrote:

Here is an experimental patch that attempts to implement this.

This looks awesome. I would love to have it for 9.5, but I guess the
patch isn't nearly baked enough for that?

It implements the following syntax:

* Syntax for defining partition key:
CREATE TABLE table_name(columns)PARTITION BY {RANGE|LIST} ON (key_spec);

where key_spec consists of partition key column names and optional
operator class per column. Currently, there are restrictions on the
key_spec such as allowing only column names (not arbitrary expressions
of them), only one column for list strategy, etc.

What's the obstacle to supporting expressions and/or IMMUTABLE
functions? I think it's fine to add this feature without them
initially, I'm just asking about the roadmap for eventually supporting
expressions in the key spec.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Jim Nasby
Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Amit Langote (#1)
Re: Partitioning WIP patch (was: Partitioning: issues/ideas)

On 2/24/15 2:13 AM, Amit Langote wrote:

-- a plain table
CREATE TABLE parent_monthly(year int, month int, day int);

-- a partitioned table
-- xxxxx: number of partitions
CREATE TABLE parent_monthly_xxxxx(LIKE parent_monthly) PARTITION BY
RANGE ON(year, month);

To be clear, in this example parent_table_xxxxx is in no way related to
parent_monthly, just like a normal CREATE TABLE (LIKE table), right?

-- partitions
CREATE TABLE parent_monthly_xxxxx_201401 PARTITION OF
parent_monthly_00100_201401 FOR VALUES BETWEEN (2014, 1) AND (2014, 2);

And the partitions are still inheritance children?

Does ALTER TABLE parent_monthly_xxxxx_201401 ADD COLUMN foo still
operate the same as today? I'd like to see us continue to support that,
but perhaps it would be wise to not paint ourselves into that corner
just yet.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Amit Langote
Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Jim Nasby (#8)
Re: Partitioning WIP patch

On 26-02-2015 AM 09:28, Jim Nasby wrote:

On 2/24/15 2:13 AM, Amit Langote wrote:

-- a plain table
CREATE TABLE parent_monthly(year int, month int, day int);

-- a partitioned table
-- xxxxx: number of partitions
CREATE TABLE parent_monthly_xxxxx(LIKE parent_monthly) PARTITION BY
RANGE ON(year, month);

To be clear, in this example parent_table_xxxxx is in no way related to
parent_monthly, just like a normal CREATE TABLE (LIKE table), right?

Yes, there is no relation at all. I was maybe just trying to save few
keystrokes. Sorry, that may be confusing.

parent_monthly is just a regular table, part of the example.

-- partitions
CREATE TABLE parent_monthly_xxxxx_201401 PARTITION OF
parent_monthly_00100_201401 FOR VALUES BETWEEN (2014, 1) AND (2014, 2);

And the partitions are still inheritance children?

At this point, they *almost* are; more for the sake of Append. Though
sooner than later, we will have to invent a version of Append for
partitioned tables that does more than just append the outputs of
underlying plans. For example, it would use partitioninfo cached in
relation descriptor of the parent to drive partition-pruning for
starters. Pruning child relations individually by way of
constraint_exclusion doesn't scale as is well known.

To clarify things a bit more, transformCreateStmt() transforms PARTITION
OF into LIKE INCLUDING ALL, not INHERITS. And as mentioned, for Append
to work, I have made ATExecAddInherit() to do some of the things
ATExecAttachPartition() does. Again, that is a temporary arrangement.

Does ALTER TABLE parent_monthly_xxxxx_201401 ADD COLUMN foo still
operate the same as today? I'd like to see us continue to support that,
but perhaps it would be wise to not paint ourselves into that corner
just yet.

Nothing prevents that from working, at least at the moment.

CREATE TABLE parent_monthly_00012(LIKE parent_monthly) PARTITION BY
RANGE ON(year, month);

CREATE TABLE parent_monthly_00012_201401 PARTITION OF
parent_monthly_00012 FOR VALUES BETWEEN (2014, 1) AND (2014, 2);

<snip>

CREATE TABLE parent_monthly_00012_201412 PARTITION OF
parent_monthly_00012 FOR VALUES BETWEEN (2014, 12) AND (2015, 1);

# INSERT INTO parent_monthly_00012 VALUES (2014, 07, 01);
INSERT 0 1

# SELECT * FROM parent_monthly_00012;
year | month | day
------+-------+-----
2014 | 7 | 1
(1 row)

# INSERT INTO parent_monthly_00012 VALUES (2014, 08, 01);
INSERT 0 1

# ALTER TABLE parent_monthly_00012_201408 ADD COLUMN hour int;
ALTER TABLE

# INSERT INTO parent_monthly_00012_201408 VALUES (2014, 08, 01, 10);
INSERT 0 1

# SELECT * FROM parent_monthly_00012;
year | month | day
------+-------+-----
2014 | 7 | 1
2014 | 8 | 1
2014 | 8 | 1
(3 rows)

# SELECT * FROM parent_monthly_00012_201408;
year | month | day | hour
------+-------+-----+------
2014 | 8 | 1 |
2014 | 8 | 1 | 10
(2 rows)

Thanks,
Amit

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10Amit Langote
Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Amit Langote (#9)
Re: Partitioning WIP patch

On 26-02-2015 AM 10:24, Amit Langote wrote:

To clarify things a bit more, transformCreateStmt() transforms PARTITION
OF into LIKE INCLUDING ALL, not INHERITS. And as mentioned, for Append
to work, I have made ATExecAddInherit() to do some of the things
ATExecAttachPartition() does. Again, that is a temporary arrangement.

I misspoke. Should have said:

... for Append to work, I have made ATExecAttachPartition() to do some
of the things ATExecAddInherit() does. Again, that is a temporary
arrangement.

Thanks,
Amit

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11Jim Nasby
Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Amit Langote (#9)
Re: Partitioning WIP patch

On 2/25/15 7:24 PM, Amit Langote wrote:

Does ALTER TABLE parent_monthly_xxxxx_201401 ADD COLUMN foo still
operate the same as today? I'd like to see us continue to support that,
but perhaps it would be wise to not paint ourselves into that corner
just yet.

Nothing prevents that from working, at least at the moment.

Ok, but is that what we really want? If we release it that way we'll be
stuck with it forever.

I would certainly prefer that we support the capabilities of inheritance
along with partitioning (because in some cases you want both). But it's
going to limit what we can do internally.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12Amit Langote
Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Jim Nasby (#11)
Re: Partitioning WIP patch

On 26-02-2015 AM 10:31, Jim Nasby wrote:

On 2/25/15 7:24 PM, Amit Langote wrote:

Does ALTER TABLE parent_monthly_xxxxx_201401 ADD COLUMN foo still
operate the same as today? I'd like to see us continue to support that,
but perhaps it would be wise to not paint ourselves into that corner
just yet.

Nothing prevents that from working, at least at the moment.

Ok, but is that what we really want? If we release it that way we'll be
stuck with it forever.

AIUI, as far as we stay with a design where partitions (children) are
individually planned, that might be OK. But, I guess things will get
more complicated. I think the role of a parent in planning would remain
limited to drive partition-pruning. Am I missing something?

I would certainly prefer that we support the capabilities of inheritance
along with partitioning (because in some cases you want both). But it's
going to limit what we can do internally.

Just to clarify are you referring to inheritance relationship between a
partitioned table and partitions?

With explicit partitioning, shouldn't we go in direction where we remove
some restrictions imposed by inheritance (think multiple inheritance)? I
recall a link Alvaro had started the discussion with think link to a
Tom's remark about something very related:

/messages/by-id/1598.1399826841@sss.pgh.pa.us

Thanks,
Amit

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13Amit Langote
Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Josh Berkus (#7)
Re: Partitioning WIP patch

On 26-02-2015 AM 05:15, Josh Berkus wrote:

On 02/24/2015 12:13 AM, Amit Langote wrote:

Here is an experimental patch that attempts to implement this.

This looks awesome.

Thanks!

I would love to have it for 9.5, but I guess the
patch isn't nearly baked enough for that?

I'm not quite sure what would qualify as baked enough for 9.5 though we
can surely try to reach some consensus on various implementation aspects
and perhaps even get it ready in time for 9.5.

where key_spec consists of partition key column names and optional
operator class per column. Currently, there are restrictions on the
key_spec such as allowing only column names (not arbitrary expressions
of them), only one column for list strategy, etc.

What's the obstacle to supporting expressions and/or IMMUTABLE
functions? I think it's fine to add this feature without them
initially, I'm just asking about the roadmap for eventually supporting
expressions in the key spec.

Only one concern I can remember someone had raised is that having to
evaluate an expression for every row during bulk-inserts may end up
being pretty expensive. Though, we might have to live with that.

I think one idea is to learn from ability to use expressions in indexes.

Thanks,
Amit

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14Jim Nasby
Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Amit Langote (#12)
Re: Partitioning WIP patch

On 2/25/15 7:57 PM, Amit Langote wrote:

On 26-02-2015 AM 10:31, Jim Nasby wrote:

On 2/25/15 7:24 PM, Amit Langote wrote:

Does ALTER TABLE parent_monthly_xxxxx_201401 ADD COLUMN foo still
operate the same as today? I'd like to see us continue to support that,
but perhaps it would be wise to not paint ourselves into that corner
just yet.

Nothing prevents that from working, at least at the moment.

Ok, but is that what we really want? If we release it that way we'll be
stuck with it forever.

AIUI, as far as we stay with a design where partitions (children) are
individually planned, that might be OK. But, I guess things will get
more complicated. I think the role of a parent in planning would remain
limited to drive partition-pruning. Am I missing something?

Isn't the point of adding explicit partitioning to make it faster than
plain inheritance? Presumably as part of that we'll eventually want to
NOT plan children individually.

I would certainly prefer that we support the capabilities of inheritance
along with partitioning (because in some cases you want both). But it's
going to limit what we can do internally.

Just to clarify are you referring to inheritance relationship between a
partitioned table and partitions?

Yes. If it helps, the exact use-case I have in mind is using list-based
partitioning + additional columns in some/all children (different
between children). For example, if you need to track different types of
customer payment methods, you'd have a payment parent table, a list
partition for credit & debit cards, a different list partition for bank
accounts, etc.

The reason I'd like to do this with partitioning vs plain inheritance is
presumably as we build out partitioning we'll get very useful things
like the ability to have FKs to properly partitioned tables. Insert
tuple routing could also be useful.

With explicit partitioning, shouldn't we go in direction where we remove
some restrictions imposed by inheritance (think multiple inheritance)? I
recall a link Alvaro had started the discussion with think link to a
Tom's remark about something very related:

/messages/by-id/1598.1399826841@sss.pgh.pa.us

That post looks like Tom figured out a way to eliminate a problem that
hurts inheritance, so that's good.

My fear is that at some point we'll hit a problem with partitioning that
we can't solve in the inheritance model. If we allow inheritance
features into partitioning now we'll painted into a corner. If we
disallow those features now we can always re-enable them if we get to
the point where we're in the clear.

Does that make sense?
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15Amit Langote
Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Jim Nasby (#14)
Re: Partitioning WIP patch

On 26-02-2015 PM 05:20, Jim Nasby wrote:

On 2/25/15 7:57 PM, Amit Langote wrote:

AIUI, as far as we stay with a design where partitions (children) are
individually planned, that might be OK. But, I guess things will get
more complicated. I think the role of a parent in planning would remain
limited to drive partition-pruning. Am I missing something?

Isn't the point of adding explicit partitioning to make it faster than
plain inheritance? Presumably as part of that we'll eventually want to
NOT plan children individually.

Yes, we'd definitely want to get to a point where planning children
individually is not necessary. But I am afraid we will have to get there
a step at a time. IMHO, solving one problem of partition-pruning would
be a good start. And that will definitely be part of parent's planning
using partition bounds list (not pruning children one-by-one with
relation_excluded_by_constraints()).

I would certainly prefer that we support the capabilities of inheritance
along with partitioning (because in some cases you want both). But it's
going to limit what we can do internally.

Just to clarify are you referring to inheritance relationship between a
partitioned table and partitions?

Yes. If it helps, the exact use-case I have in mind is using list-based
partitioning + additional columns in some/all children (different
between children). For example, if you need to track different types of
customer payment methods, you'd have a payment parent table, a list
partition for credit & debit cards, a different list partition for bank
accounts, etc.

The reason I'd like to do this with partitioning vs plain inheritance is
presumably as we build out partitioning we'll get very useful things
like the ability to have FKs to properly partitioned tables. Insert
tuple routing could also be useful.

Unless I'm missing something again, isn't allowing partitions to have
heterogeneous rowtypes a problem in the long run? I'm afraid I'm
confused as to your stand regarding inheritance vs. new partitioning. To
be specific, children with heterogeneous schemas sounds much like what
inheritance would be good for as you say. But then isn't that why we
have to plan children individually which you said new partitioning
should get away from?

With explicit partitioning, shouldn't we go in direction where we remove
some restrictions imposed by inheritance (think multiple inheritance)? I
recall a link Alvaro had started the discussion with think link to a
Tom's remark about something very related:

/messages/by-id/1598.1399826841@sss.pgh.pa.us

That post looks like Tom figured out a way to eliminate a problem that
hurts inheritance, so that's good.

My fear is that at some point we'll hit a problem with partitioning that
we can't solve in the inheritance model. If we allow inheritance
features into partitioning now we'll painted into a corner. If we
disallow those features now we can always re-enable them if we get to
the point where we're in the clear.

Does that make sense?

Yes, it does. In fact, I do intend to keep them separate the first
attempt of which is to choose to NOT transform a PARTITION OF parent
clause into INHERITS parent. Any code that may look like it's trying to
do that is because the patch is not fully baked yet.

Regards,
Amit

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16Andres Freund
Andres Freund
andres@2ndquadrant.com
In reply to: Jim Nasby (#14)
Re: Partitioning WIP patch

On 2015-02-26 02:20:21 -0600, Jim Nasby wrote:

The reason I'd like to do this with partitioning vs plain inheritance is
presumably as we build out partitioning we'll get very useful things like
the ability to have FKs to properly partitioned tables. Insert tuple routing
could also be useful.

The problem there imo isn't so much inheritance, but lack of working
unique checks across partitions. That's something we can implement
independent of this, it's just not trivial.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#17Jim Nasby
Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Amit Langote (#15)
Re: Partitioning WIP patch

On 2/26/15 3:09 AM, Amit Langote wrote:

Yes. If it helps, the exact use-case I have in mind is using list-based

partitioning + additional columns in some/all children (different
between children). For example, if you need to track different types of
customer payment methods, you'd have a payment parent table, a list
partition for credit & debit cards, a different list partition for bank
accounts, etc.

The reason I'd like to do this with partitioning vs plain inheritance is
presumably as we build out partitioning we'll get very useful things
like the ability to have FKs to properly partitioned tables. Insert
tuple routing could also be useful.

Unless I'm missing something again, isn't allowing partitions to have
heterogeneous rowtypes a problem in the long run? I'm afraid I'm
confused as to your stand regarding inheritance vs. new partitioning. To
be specific, children with heterogeneous schemas sounds much like what
inheritance would be good for as you say. But then isn't that why we
have to plan children individually which you said new partitioning
should get away from?

Apologies if I haven't been clear enough. What I'd like to see is the
best of both worlds; fast partitioning when not using inheritance, and
perhaps somewhat slower when using inheritance, but still with the
features partitioning gives you.

But my bigger concern from a project standpoint is that we not put
ourselves in a position of supporting something that we really don't
want to support (a partitioning system that's got inheritance mixed in).
As much as I'd personally like to have both features together, I think
it would be bad for the community to go down that road without careful
thought.

With explicit partitioning, shouldn't we go in direction where we remove
some restrictions imposed by inheritance (think multiple inheritance)? I
recall a link Alvaro had started the discussion with think link to a
Tom's remark about something very related:

/messages/by-id/1598.1399826841@sss.pgh.pa.us

That post looks like Tom figured out a way to eliminate a problem that
hurts inheritance, so that's good.

My fear is that at some point we'll hit a problem with partitioning that
we can't solve in the inheritance model. If we allow inheritance
features into partitioning now we'll painted into a corner. If we
disallow those features now we can always re-enable them if we get to
the point where we're in the clear.

Does that make sense?

Yes, it does. In fact, I do intend to keep them separate the first
attempt of which is to choose to NOT transform a PARTITION OF parent
clause into INHERITS parent. Any code that may look like it's trying to
do that is because the patch is not fully baked yet.

Ok, good to know. That's why I was asking about ALTER TABLE ADD COLUMN
on a partition. If we release something without that being restricted
it'll probably cause trouble later on.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#18Jim Nasby
Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Andres Freund (#16)
Re: Partitioning WIP patch

On 2/26/15 3:22 AM, Andres Freund wrote:

On 2015-02-26 02:20:21 -0600, Jim Nasby wrote:

The reason I'd like to do this with partitioning vs plain inheritance is
presumably as we build out partitioning we'll get very useful things like
the ability to have FKs to properly partitioned tables. Insert tuple routing
could also be useful.

The problem there imo isn't so much inheritance, but lack of working
unique checks across partitions. That's something we can implement
independent of this, it's just not trivial.

There's been discussion of allowing for uniqueness when we can guarantee
no overlap between partitions, and the partition key is part of the
unique constraint. That's the particular use case I was thinking of.

I suspect there's other partitioning features that would be useful in a
generic inheritance setup as well; that's why I'd love to see both
features work together... but I fear there's enough work to get there
that it may not happen, and I don't want us to accidentally start mixing
the two and have users start relying on it.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#19Josh Berkus
Josh Berkus
josh@agliodbs.com
In reply to: Amit Langote (#1)
Re: Partitioning WIP patch

On 02/25/2015 07:15 PM, Amit Langote wrote:

On 26-02-2015 AM 05:15, Josh Berkus wrote:

On 02/24/2015 12:13 AM, Amit Langote wrote:

Here is an experimental patch that attempts to implement this.

This looks awesome.

Thanks!

I would love to have it for 9.5, but I guess the
patch isn't nearly baked enough for that?

I'm not quite sure what would qualify as baked enough for 9.5 though we
can surely try to reach some consensus on various implementation aspects
and perhaps even get it ready in time for 9.5.

Well, we don't have long at all to do that. I guess I'm asking what
kind of completeness of code we have; is this basically done pending API
changes and bugs, or are there major bits (like, say, pg_dump and
EXPLAIN support) which are completely unimplemented?

where key_spec consists of partition key column names and optional
operator class per column. Currently, there are restrictions on the
key_spec such as allowing only column names (not arbitrary expressions
of them), only one column for list strategy, etc.

What's the obstacle to supporting expressions and/or IMMUTABLE
functions? I think it's fine to add this feature without them
initially, I'm just asking about the roadmap for eventually supporting
expressions in the key spec.

Only one concern I can remember someone had raised is that having to
evaluate an expression for every row during bulk-inserts may end up
being pretty expensive. Though, we might have to live with that.

Well, it's not more expensive than having to materialize the value from
a trigger and store it on disk. The leading one here would be functions
over timestamp; for example, the data has a timestamptz, but you want to
partition by week.

I think one idea is to learn from ability to use expressions in indexes.

Sure. So a feature to implement for the 2nd release.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#20Andres Freund
Andres Freund
andres@2ndquadrant.com
In reply to: Amit Langote (#13)
Re: Partitioning WIP patch

On 2015-02-26 12:15:17 +0900, Amit Langote wrote:

On 26-02-2015 AM 05:15, Josh Berkus wrote:

On 02/24/2015 12:13 AM, Amit Langote wrote:

Here is an experimental patch that attempts to implement this.

I would love to have it for 9.5, but I guess the
patch isn't nearly baked enough for that?

I'm not quite sure what would qualify as baked enough for 9.5 though we
can surely try to reach some consensus on various implementation aspects
and perhaps even get it ready in time for 9.5.

I think it's absolutely unrealistic to get this into 9.5. There's barely
been any progress on the current (last!) commitfest - where on earth
should the energy come to make this patch ready? And why would that be
fair against all the others that have submitted in time?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#21Amit Langote
Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Andres Freund (#20)
Re: Partitioning WIP patch

On 27-02-2015 AM 03:24, Andres Freund wrote:

On 2015-02-26 12:15:17 +0900, Amit Langote wrote:

On 26-02-2015 AM 05:15, Josh Berkus wrote:

I would love to have it for 9.5, but I guess the
patch isn't nearly baked enough for that?

I'm not quite sure what would qualify as baked enough for 9.5 though we
can surely try to reach some consensus on various implementation aspects
and perhaps even get it ready in time for 9.5.

I think it's absolutely unrealistic to get this into 9.5. There's barely
been any progress on the current (last!) commitfest - where on earth
should the energy come to make this patch ready? And why would that be
fair against all the others that have submitted in time?

I realize and I apologize that it was irresponsible of me to have said
that; maybe got a bit too excited. I do not want to unduly draw people's
time on something that's not quite ready while there are other things
people have worked hard on to get in time. In all earnestness, I say we
spend time perfecting those things.

I'll add this into CF-JUN'15. I will keep posting updates meanwhile so
that when that commitfest finally starts, we will have something worth
considering.

Thanks,
Amit

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#22Amit Langote
Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Josh Berkus (#19)
Re: Partitioning WIP patch

On 27-02-2015 AM 03:18, Josh Berkus wrote:

On 02/25/2015 07:15 PM, Amit Langote wrote:

I'm not quite sure what would qualify as baked enough for 9.5 though we
can surely try to reach some consensus on various implementation aspects
and perhaps even get it ready in time for 9.5.

Well, we don't have long at all to do that. I guess I'm asking what
kind of completeness of code we have; is this basically done pending API
changes and bugs, or are there major bits (like, say, pg_dump and
EXPLAIN support) which are completely unimplemented?

I would say I am not entirely sure/satisfied about some decisions I have
made (or not) when writing even the basic patch. Yes,
pg_dump/EXPLAIN/psql, etc. are not touched. So, it seems it might not be
fair to claim it's actually something for 9.5. Let me just call it WIP
for a while while keep I working on it and receive feedback.

Only one concern I can remember someone had raised is that having to
evaluate an expression for every row during bulk-inserts may end up
being pretty expensive. Though, we might have to live with that.

Well, it's not more expensive than having to materialize the value from
a trigger and store it on disk. The leading one here would be functions
over timestamp; for example, the data has a timestamptz, but you want to
partition by week.

I think one idea is to learn from ability to use expressions in indexes.

Sure. So a feature to implement for the 2nd release.

Actually, I'm trying to add that and see how it works. I will post an
updated patch soon if it looks good enough.

Thanks,
Amit

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#23Michael Paquier
Michael Paquier
michael.paquier@gmail.com
In reply to: Andres Freund (#20)
Re: Partitioning WIP patch

On Fri, Feb 27, 2015 at 3:24 AM, Andres Freund <andres@2ndquadrant.com> wrote:

On 2015-02-26 12:15:17 +0900, Amit Langote wrote:

On 26-02-2015 AM 05:15, Josh Berkus wrote:

On 02/24/2015 12:13 AM, Amit Langote wrote:

Here is an experimental patch that attempts to implement this.

I would love to have it for 9.5, but I guess the
patch isn't nearly baked enough for that?

I'm not quite sure what would qualify as baked enough for 9.5 though we
can surely try to reach some consensus on various implementation aspects
and perhaps even get it ready in time for 9.5.

I think it's absolutely unrealistic to get this into 9.5. There's barely
been any progress on the current (last!) commitfest - where on earth
should the energy come to make this patch ready? And why would that be
fair against all the others that have submitted in time?

+1. There are many other patches pending the in CF app waiting for
feedback, while this one showed up after the last CF deadline for 9.5
and needs design and spec decisions that should not be taken lightly
at the end of a major release development cycle. Please let's not rush
into something we may regret.
-- 
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#24Amit Langote
Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Jim Nasby (#17)
Re: Partitioning WIP patch

On 27-02-2015 AM 03:01, Jim Nasby wrote:

On 2/26/15 3:09 AM, Amit Langote wrote:

Unless I'm missing something again, isn't allowing partitions to have
heterogeneous rowtypes a problem in the long run? I'm afraid I'm
confused as to your stand regarding inheritance vs. new partitioning. To
be specific, children with heterogeneous schemas sounds much like what
inheritance would be good for as you say. But then isn't that why we
have to plan children individually which you said new partitioning
should get away from?

Apologies if I haven't been clear enough. What I'd like to see is the
best of both worlds; fast partitioning when not using inheritance, and
perhaps somewhat slower when using inheritance, but still with the
features partitioning gives you.

I get the distinction, thanks.

Actually I wasn't quite thinking of altering the way any part of the
current partitioning based on inheritance works nor am I proposing to
get rid of it. It all stays as is. Not sure how we could say if it will
support features of the new partitioning before those features actually
begin to materialize.

But my bigger concern from a project standpoint is that we not put
ourselves in a position of supporting something that we really don't
want to support (a partitioning system that's got inheritance mixed in).
As much as I'd personally like to have both features together, I think
it would be bad for the community to go down that road without careful
thought.

Yes, it does. In fact, I do intend to keep them separate the first
attempt of which is to choose to NOT transform a PARTITION OF parent
clause into INHERITS parent. Any code that may look like it's trying to
do that is because the patch is not fully baked yet.

Ok, good to know. That's why I was asking about ALTER TABLE ADD COLUMN
on a partition. If we release something without that being restricted
it'll probably cause trouble later on.

Yes, I agree. More generally, I think the patch/approach is in need of a
clear separation of internal implementation concerns and user-facing
notions even at this point. This may be one of them. For example, with
the patch, a partition is defined as:

CREATE "TABLE" name PARTITION OF parent ...

Unless that turns into something like:

CREATE PARTITION name OF parent ...

we may not be able to put all the restrictions we'd want to put on a
partition for the sake of what would be partitioning internals.

Thanks,
Amit

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#25Bruce Momjian
Bruce Momjian
bruce@momjian.us
In reply to: Amit Langote (#21)
Re: Partitioning WIP patch

On Fri, Feb 27, 2015 at 09:09:35AM +0900, Amit Langote wrote:

On 27-02-2015 AM 03:24, Andres Freund wrote:

On 2015-02-26 12:15:17 +0900, Amit Langote wrote:

On 26-02-2015 AM 05:15, Josh Berkus wrote:

I would love to have it for 9.5, but I guess the
patch isn't nearly baked enough for that?

I'm not quite sure what would qualify as baked enough for 9.5 though we
can surely try to reach some consensus on various implementation aspects
and perhaps even get it ready in time for 9.5.

I think it's absolutely unrealistic to get this into 9.5. There's barely
been any progress on the current (last!) commitfest - where on earth
should the energy come to make this patch ready? And why would that be
fair against all the others that have submitted in time?

I realize and I apologize that it was irresponsible of me to have said
that; maybe got a bit too excited. I do not want to unduly draw people's
time on something that's not quite ready while there are other things
people have worked hard on to get in time. In all earnestness, I say we
spend time perfecting those things.

I'll add this into CF-JUN'15. I will keep posting updates meanwhile so
that when that commitfest finally starts, we will have something worth
considering.

I am _very_ glad you have started on this. There is a huge need for
this, and I am certainly excited about it.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers