pg_partman 3.0.0 - real-world usage of native partitioning and a case for native default

Started by Keith Fiskealmost 9 years ago5 messages

keith@omniti.com

almost 9 years ago

I've gotten pg_partman working with native partitioning already so I can
hopefully have things ready to work when 10 is released. I've got a branch
on github with this version for anyone to test and I'll hopefully have this
released in the next few weeks after I finish some more testing myself. Any
feedback would be appreciated!

https://github.com/keithf4/pg_partman/tree/v3.0.0

Thankfully since native partitioning still uses inheritance internally for
the most part, pg_partman works pretty well without nearly as much change
as I thought I would need. The biggest deficiency I'm seeing has to do with
not having a "default" partition to put data that doesn't match any
children. The fact that it throws an error is a concern, but it's not where
I see the main problem. Where this really comes into play is when someone
wants to make an existing table into a partitioned table. There's really no
easy way to do this outside of making a completely brand new partition set
and copying/moving the data from the old to the new. Yes there are
technically ways to do this fairly seamlessly to the user of the
partitioned table, but the complexity of those methods seems absurd in the
face of just allowing a default partition.

A default would basically allow a method similar to what pg_partman does
prior to native. Before I would just make the old table the parent and then
the user could move data to the children as needed, eventually leaving the
parent empty. All data is still accessible during this period and new
writes go to the new children. The old table can't be made the parent with
native obviously, but being able to define a default partition would allow
defining the old table as the default and pretty much give the exact same,
easy migration path. Without defining it as a default, you really can't
attach the old table since it would require a constraint interval that
would likely interfere with other children.

I recall reading before that there are performance implications of having
the default. I think as long as those performance issues are clearly
documented and have no affect if there is no default, it shouldn't be a
concern that would hold this feature up. I believe the benefit of the
default partition to the migration process makes it more than worth it and
I'm hoping it's possible to get into 10 so users can more easily use this
new feature without having to wait for the next major version. Spoke with
several core members at PGConf this year and they asked me to send an email
to hackers making my case, so here it is!

--
Keith Fiske
Database Administrator
OmniTI Computer Consulting, Inc.
http://www.keithf4.com

Ashutosh Bapat

ashutosh.bapat@enterprisedb.com

almost 9 years ago

In reply to: Keith Fiske (#1)

Re: pg_partman 3.0.0 - real-world usage of native partitioning and a case for native default

On Fri, Mar 31, 2017 at 9:00 PM, Keith Fiske <keith@omniti.com> wrote:

I've gotten pg_partman working with native partitioning already so I can
hopefully have things ready to work when 10 is released. I've got a branch
on github with this version for anyone to test and I'll hopefully have this
released in the next few weeks after I finish some more testing myself. Any
feedback would be appreciated!

https://github.com/keithf4/pg_partman/tree/v3.0.0

There's already a proposal to support default partition as [1]/messages/by-id/CAH2L28tbN4SYyhS7YV1YBWcitkqbhSWfQCy0G=apRcC_PEO-bg@mail.gmail.com -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company. That
proposal talks about default partition in list partitioned tables. For
range partitioned tables, we expect that a single partition with unbounded
bounds would serve as default partition.

Thankfully since native partitioning still uses inheritance internally for
the most part, pg_partman works pretty well without nearly as much change
as I thought I would need. The biggest deficiency I'm seeing has to do with
not having a "default" partition to put data that doesn't match any
children. The fact that it throws an error is a concern, but it's not where
I see the main problem. Where this really comes into play is when someone
wants to make an existing table into a partitioned table. There's really no
easy way to do this outside of making a completely brand new partition set
and copying/moving the data from the old to the new.

If there are multiple partitions, there is likely to be more data that
needs to be moved that is retained in the old table. So, creating complete
brand new partitioning and copying/moving data is annoying but not as much
as it sounds. Obviously, if we could avoid it, we should try to.

Yes there are technically ways to do this fairly seamlessly to the user of
the partitioned table, but the complexity of those methods seems absurd in
the face of just allowing a default partition.

A default would basically allow a method similar to what pg_partman does
prior to native. Before I would just make the old table the parent and then
the user could move data to the children as needed, eventually leaving the
parent empty. All data is still accessible during this period and new
writes go to the new children. The old table can't be made the parent with
native obviously, but being able to define a default partition would allow
defining the old table as the default and pretty much give the exact same,
easy migration path. Without defining it as a default, you really can't
attach the old table since it would require a constraint interval that
would likely interfere with other children.

Yes, I agree that default partition would make it easy to migrate the data.
But, I think, we need ability to SPLIT the partition along-with default
partitioning to convert an unpartitioned table to a partitioned one. We
don't have a SPLIT partition support right now.

I recall reading before that there are performance implications of having
the default. I think as long as those performance issues are clearly
documented and have no affect if there is no default, it shouldn't be a
concern that would hold this feature up. I believe the benefit of the
default partition to the migration process makes it more than worth it and
I'm hoping it's possible to get into 10 so users can more easily use this
new feature without having to wait for the next major version. Spoke with
several core members at PGConf this year and they asked me to send an email
to hackers making my case, so here it is!

I have not followed that mail thread closely, but it will help if you (if
interested) review/update the patch on thread and help the patch to get
into a committable state.

[1]: /messages/by-id/CAH2L28tbN4SYyhS7YV1YBWcitkqbhSWfQCy0G=apRcC_PEO-bg@mail.gmail.com -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
/messages/by-id/CAH2L28tbN4SYyhS7YV1YBWcitkqbhSWfQCy0G=apRcC_PEO-bg@mail.gmail.com
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Keith Fiske

keith@omniti.com

almost 9 years ago

In reply to: Ashutosh Bapat (#2)

Re: pg_partman 3.0.0 - real-world usage of native partitioning and a case for native default

On Mon, Apr 3, 2017 at 5:13 AM, Ashutosh Bapat <
ashutosh.bapat@enterprisedb.com> wrote:

On Fri, Mar 31, 2017 at 9:00 PM, Keith Fiske <keith@omniti.com> wrote:

I've gotten pg_partman working with native partitioning already so I can
hopefully have things ready to work when 10 is released. I've got a branch
on github with this version for anyone to test and I'll hopefully have this
released in the next few weeks after I finish some more testing myself. Any
feedback would be appreciated!

https://github.com/keithf4/pg_partman/tree/v3.0.0

There's already a proposal to support default partition as [1]. That
proposal talks about default partition in list partitioned tables. For
range partitioned tables, we expect that a single partition with unbounded
bounds would serve as default partition.

This would not work. The completely unbounded partition would overlap all
other possible partitions. How would it decide which child table to put
data in? Looks like this is stopped right from the start anyway.

keith@keith=# create table testing_range (id int, created_at timestamptz)
partition by range (created_at);
CREATE TABLE
Time: 41.987 ms

keith@keith=# create table testing_range_default partition of testing_range
for values from (unbounded) to (unbounded);
CREATE TABLE
Time: 8.625 ms

keith@keith=# create table testing_range_p2017_04 partition of
testing_range for values from ('2017-04-01 00:00:00') to ('2017-05-01
00:00:00');
ERROR: partition "testing_range_p2017_04" would overlap partition
"testing_range_default"
Time: 4.516 ms

Thankfully since native partitioning still uses inheritance internally
for the most part, pg_partman works pretty well without nearly as much
change as I thought I would need. The biggest deficiency I'm seeing has to
do with not having a "default" partition to put data that doesn't match any
children. The fact that it throws an error is a concern, but it's not where
I see the main problem. Where this really comes into play is when someone
wants to make an existing table into a partitioned table. There's really no
easy way to do this outside of making a completely brand new partition set
and copying/moving the data from the old to the new.

If there are multiple partitions, there is likely to be more data that
needs to be moved that is retained in the old table. So, creating complete
brand new partitioning and copying/moving data is annoying but not as much
as it sounds. Obviously, if we could avoid it, we should try to.

Not sure I follow what you're saying here. With pg_partman, making the old
table the parent and still containing all the data has caused no issues
when I've migrated clients to it, nor has anyone reported any issues to me
with this system. New data goes to the child tables as they should and old
data is then moved when convenient. It makes things work very smoothly and
the only outage encountered is a brief lock at creation time.

Yes there are technically ways to do this fairly seamlessly to the user
of the partitioned table, but the complexity of those methods seems absurd
in the face of just allowing a default partition.

A default would basically allow a method similar to what pg_partman does
prior to native. Before I would just make the old table the parent and then
the user could move data to the children as needed, eventually leaving the
parent empty. All data is still accessible during this period and new
writes go to the new children. The old table can't be made the parent with
native obviously, but being able to define a default partition would allow
defining the old table as the default and pretty much give the exact same,
easy migration path. Without defining it as a default, you really can't
attach the old table since it would require a constraint interval that
would likely interfere with other children.

Yes, I agree that default partition would make it easy to migrate the
data. But, I think, we need ability to SPLIT the partition along-with
default partitioning to convert an unpartitioned table to a partitioned
one. We don't have a SPLIT partition support right now.

SPLIT would be fantastic, I agree. But also not realistic to make the v10
code freeze, which is what I'm trying to make a case for here.

I recall reading before that there are performance implications of having
the default. I think as long as those performance issues are clearly
documented and have no affect if there is no default, it shouldn't be a
concern that would hold this feature up. I believe the benefit of the
default partition to the migration process makes it more than worth it and
I'm hoping it's possible to get into 10 so users can more easily use this
new feature without having to wait for the next major version. Spoke with
several core members at PGConf this year and they asked me to send an email
to hackers making my case, so here it is!

I have not followed that mail thread closely, but it will help if you (if
interested) review/update the patch on thread and help the patch to get
into a committable state.

[1] /messages/by-id/CAH2L28tbN4SYyhS7YV1YBWcitkqbh
SWfQCy0G=apRcC_PEO-bg@mail.gmail.com
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

I've been watching that thread as well and as soon as a fix is posted about
the latest concerns, I'll gladly look into reviewing it.

--
Keith Fiske
Database Administrator
OmniTI Computer Consulting, Inc.
http://www.keithf4.com

Ashutosh Bapat

ashutosh.bapat@enterprisedb.com

almost 9 years ago

In reply to: Keith Fiske (#3)

Re: pg_partman 3.0.0 - real-world usage of native partitioning and a case for native default

On Mon, Apr 3, 2017 at 10:45 PM, Keith Fiske <keith@omniti.com> wrote:

On Mon, Apr 3, 2017 at 5:13 AM, Ashutosh Bapat <
ashutosh.bapat@enterprisedb.com> wrote:

On Fri, Mar 31, 2017 at 9:00 PM, Keith Fiske <keith@omniti.com> wrote:

I've gotten pg_partman working with native partitioning already so I can
hopefully have things ready to work when 10 is released. I've got a branch
on github with this version for anyone to test and I'll hopefully have this
released in the next few weeks after I finish some more testing myself. Any
feedback would be appreciated!

https://github.com/keithf4/pg_partman/tree/v3.0.0

There's already a proposal to support default partition as [1]. That
proposal talks about default partition in list partitioned tables. For
range partitioned tables, we expect that a single partition with unbounded
bounds would serve as default partition.

This would not work. The completely unbounded partition would overlap all
other possible partitions. How would it decide which child table to put
data in? Looks like this is stopped right from the start anyway.

keith@keith=# create table testing_range (id int, created_at timestamptz)
partition by range (created_at);
CREATE TABLE
Time: 41.987 ms

keith@keith=# create table testing_range_default partition of
testing_range for values from (unbounded) to (unbounded);
CREATE TABLE
Time: 8.625 ms

keith@keith=# create table testing_range_p2017_04 partition of
testing_range for values from ('2017-04-01 00:00:00') to ('2017-05-01
00:00:00');
ERROR: partition "testing_range_p2017_04" would overlap partition
"testing_range_default"
Time: 4.516 ms

Hmm, looks like default partition for range would be helpful 1. in these
case and 2. to hold data in the holes in the existing partitioning scheme.

Thankfully since native partitioning still uses inheritance internally
for the most part, pg_partman works pretty well without nearly as much
change as I thought I would need. The biggest deficiency I'm seeing has to
do with not having a "default" partition to put data that doesn't match any
children. The fact that it throws an error is a concern, but it's not where
I see the main problem. Where this really comes into play is when someone
wants to make an existing table into a partitioned table. There's really no
easy way to do this outside of making a completely brand new partition set
and copying/moving the data from the old to the new.

If there are multiple partitions, there is likely to be more data that
needs to be moved that is retained in the old table. So, creating complete
brand new partitioning and copying/moving data is annoying but not as much
as it sounds. Obviously, if we could avoid it, we should try to.

Not sure I follow what you're saying here. With pg_partman, making the old
table the parent and still containing all the data has caused no issues
when I've migrated clients to it, nor has anyone reported any issues to me
with this system. New data goes to the child tables as they should and old
data is then moved when convenient. It makes things work very smoothly and
the only outage encountered is a brief lock at creation time.

In partitioning, partitioned table doesn't have any storage. Data that
belongs to a given partition is expected to be there from day one.

I've been watching that thread as well and as soon as a fix is posted
about the latest concerns, I'll gladly look into reviewing it.

Thanks.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Keith Fiske

keith@omniti.com

almost 9 years ago

In reply to: Ashutosh Bapat (#4)

Re: pg_partman 3.0.0 - real-world usage of native partitioning and a case for native default

On Mon, Apr 3, 2017 at 11:33 PM, Ashutosh Bapat <
ashutosh.bapat@enterprisedb.com> wrote:

Thankfully since native partitioning still uses inheritance internally
for the most part, pg_partman works pretty well without nearly as much
change as I thought I would need. The biggest deficiency I'm seeing has to
do with not having a "default" partition to put data that doesn't match any
children. The fact that it throws an error is a concern, but it's not where
I see the main problem. Where this really comes into play is when someone
wants to make an existing table into a partitioned table. There's really no
easy way to do this outside of making a completely brand new partition set
and copying/moving the data from the old to the new.

If there are multiple partitions, there is likely to be more data that
needs to be moved that is retained in the old table. So, creating complete
brand new partitioning and copying/moving data is annoying but not as much
as it sounds. Obviously, if we could avoid it, we should try to.

Not sure I follow what you're saying here. With pg_partman, making the
old table the parent and still containing all the data has caused no issues
when I've migrated clients to it, nor has anyone reported any issues to me
with this system. New data goes to the child tables as they should and old
data is then moved when convenient. It makes things work very smoothly and
the only outage encountered is a brief lock at creation time.

In partitioning, partitioned table doesn't have any storage. Data that
belongs to a given partition is expected to be there from day one.

--

Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

I understand that with native, the parent has no data and never will. I'm
just using pg_partman's method as an example. The DEFAULT would take the
place of the parent table in this situation. Yes, in an ideal world, all
the data would be in the children right from the beginning when you declare
the parent. But that hardly seems realistic, especially when you have to
partition an existing billion+ row table and keep it running at the same
time. Yes, the basic purpose of the default is to catch data that gets
inserted outside the current child existence. That's what I also do with
the parent in pg_partman. But it can also serve as a method to ease
migration, as the parent also does in pg_partman's trigger-based method.

Keith