reload-through-the-top-parent switch the partition table
/messages/by-id/CA+TgmoZFn7TJ7QBsFat
nuEE%3DGYGdZSNXqr9489n5JBsdy5rFfA%40mail.gmail.com
Above thread, it's been pointed out as important consideration
about adding reload-through-the-top-parent switch the partition
table. One small step toward making use of hash function was
adding a switch into pg_dump which reload through the top
parent for the partition table.
Attach patch does the implementation for the same:
- Added switch reload-through-root: (Open for suggestion for the switch
name)
- Fix dumpTableData to COPY to the Root table with the reload-through-root
switch.
- Fix dumpTableData_insert - to generate the proper insert statement with
the reload-through-root switch.
- Added switch reload-through-root into pg_dumpall
Testing:
- Performed test with multi level partition + different schema combination.
- Tested a case with parent and child attributes in different order.
Attaching the pg_dump output for below test with --reload-through-root for
COPY and INSERTS.
Testcase:
create schema a;
create schema b;
create schema c;
create table a.t1 ( a int, b int ) partition by list(a);
create table b.t1_p1 partition of a.t1 FOR VALUES in ( 1, 2, 3) partition
by list(b);
create table c.t1_p1_p1 partition of b.t1_p1 FOR VALUES in (10, 20 );
insert into a.t1 values ( 2 , 10 );
insert into a.t1 values ( 1 , 20 );
My colleague Robert and I had doubt about the order in of TABLE
and TABLE_DATA. We thought earlier that reload-thought-root might
might not solve the purpose which has been discussed in the above
mentioned thread. But later looking into code I realize the sort order
for DO_TABLE and DO_TABLE_DATA are different, so we don't need
to worry about that issue.
TODOs:
- Update documentation for pg_dump & pg_dumpall
Thanks,
Rushabh Lathia
www.EnterpriseDB.com
Attachments:
reload_through_root_inserts.sqltext/x-sql; charset=US-ASCII; name=reload_through_root_inserts.sqlDownload
reload_through_root_copy.sqltext/x-sql; charset=US-ASCII; name=reload_through_root_copy.sqlDownload
reload_through_root_v1_0001.patchtext/x-patch; charset=US-ASCII; name=reload_through_root_v1_0001.patchDownload+73-3
On Tue, Aug 1, 2017 at 5:34 AM, Rushabh Lathia <rushabh.lathia@gmail.com> wrote:
My colleague Robert and I had doubt about the order in of TABLE
and TABLE_DATA. We thought earlier that reload-thought-root might
might not solve the purpose which has been discussed in the above
mentioned thread. But later looking into code I realize the sort order
for DO_TABLE and DO_TABLE_DATA are different, so we don't need
to worry about that issue.
Hmm. Does that mean that table data restoration will *absolutely
always* follow all CREATE TABLE commands, or is that something that's
*normally* true but potentially false if dependency sorting switches
things around?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Aug 2, 2017 at 3:55 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Aug 1, 2017 at 5:34 AM, Rushabh Lathia <rushabh.lathia@gmail.com>
wrote:My colleague Robert and I had doubt about the order in of TABLE
and TABLE_DATA. We thought earlier that reload-thought-root might
might not solve the purpose which has been discussed in the above
mentioned thread. But later looking into code I realize the sort order
for DO_TABLE and DO_TABLE_DATA are different, so we don't need
to worry about that issue.Hmm. Does that mean that table data restoration will *absolutely
always* follow all CREATE TABLE commands, or is that something that's
*normally* true but potentially false if dependency sorting switches
things around?
Looking at the dbObjectTypePriority comments that seems like data
restoration
will *absolutely always* follow all CREATE TABLE commands.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Thanks,
Rushabh Lathia
www.EnterpriseDB.com
On Wed, Aug 2, 2017 at 1:01 AM, Rushabh Lathia <rushabh.lathia@gmail.com> wrote:
Looking at the dbObjectTypePriority comments that seems like data
restoration
will *absolutely always* follow all CREATE TABLE commands.
Hmm. I wasn't very convinced by those comments, but Tom's commit
a1ef01fe163b304760088e3e30eb22036910a495 convinces me that it has to
work that way. So I think we are OK on that score.
The patch itself looks just fine on a quick glance, modulo the lack of
documentation, but I think we need to bikeshed the name of the flag.
--reload-through-root is clear as daylight to me, but I'm not sure
users will agree. The lack of the word "partition" is perhaps a
significant flaw, and pg_dump doesn't really reload anything; it just
dumps.
The best thing I can come up with after brief thought is
--partition-data-via-root, but maybe somebody else has a better idea?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Robert Haas <robertmhaas@gmail.com> writes:
The patch itself looks just fine on a quick glance, modulo the lack of
documentation, but I think we need to bikeshed the name of the flag.
--reload-through-root is clear as daylight to me, but I'm not sure
users will agree. The lack of the word "partition" is perhaps a
significant flaw, and pg_dump doesn't really reload anything; it just
dumps.
The best thing I can come up with after brief thought is
--partition-data-via-root, but maybe somebody else has a better idea?
--restore-via-partition-root ?
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Aug 2, 2017 at 1:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
The patch itself looks just fine on a quick glance, modulo the lack of
documentation, but I think we need to bikeshed the name of the flag.
--reload-through-root is clear as daylight to me, but I'm not sure
users will agree. The lack of the word "partition" is perhaps a
significant flaw, and pg_dump doesn't really reload anything; it just
dumps.The best thing I can come up with after brief thought is
--partition-data-via-root, but maybe somebody else has a better idea?--restore-via-partition-root ?
I worry someone will think that pg_dump is now restoring stuff, but it isn't.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Robert Haas <robertmhaas@gmail.com> writes:
On Wed, Aug 2, 2017 at 1:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
--restore-via-partition-root ?
I worry someone will think that pg_dump is now restoring stuff, but it isn't.
Well, the point is that the commands it emits will cause the eventual
restore to go through the root. Anyway, I think trying to avoid using
a verb altogether is going to result in a very stilted option name.
I notice that the option list already includes some references to
"insert", so maybe "--insert-via-partition-root"? Although you could
argue that that's confusing when we're using COPY.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Aug 2, 2017 at 10:58 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
On Wed, Aug 2, 2017 at 1:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
--restore-via-partition-root ?
I worry someone will think that pg_dump is now restoring stuff, but it
isn't.
Well, the point is that the commands it emits will cause the eventual
restore to go through the root. Anyway, I think trying to avoid using
a verb altogether is going to result in a very stilted option name.I notice that the option list already includes some references to
"insert", so maybe "--insert-via-partition-root"? Although you could
argue that that's confusing when we're using COPY.
--use-partitioned-table [partition-name, ...] # if names are omitted it
defaults to all partitioned tables
I don't know that we need to use "root" in the argument name to communicate
"the top-most if partitioned tables are nested". We have the docs to
describe exactly what it does. "Partitioned Table" is what we are calling
the main routing table in the docs. "Use" seems adequate.
FWIW my first thought was "--insert-via-partition-root" and I didn't mind
that "COPY" commands would be affected implicitly.
David J.
On Wed, Aug 2, 2017 at 1:58 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
On Wed, Aug 2, 2017 at 1:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
--restore-via-partition-root ?
I worry someone will think that pg_dump is now restoring stuff, but it isn't.
Well, the point is that the commands it emits will cause the eventual
restore to go through the root. Anyway, I think trying to avoid using
a verb altogether is going to result in a very stilted option name.I notice that the option list already includes some references to
"insert", so maybe "--insert-via-partition-root"? Although you could
argue that that's confusing when we're using COPY.
Yeah, that's definitely confusing. I realize that my verbless version
is a little odd, but there are numerous precedents for it: --inserts,
--column-inserts, --if-exists, --strict-names, ...
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 8/2/17 13:58, Tom Lane wrote:
I notice that the option list already includes some references to
"insert", so maybe "--insert-via-partition-root"? Although you could
argue that that's confusing when we're using COPY.
"load" could be more general. But I'm also OK with "restore".
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Aug 2, 2017 at 11:47 PM, David G. Johnston
<david.g.johnston@gmail.com> wrote:
On Wed, Aug 2, 2017 at 10:58 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
On Wed, Aug 2, 2017 at 1:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
--restore-via-partition-root ?
I worry someone will think that pg_dump is now restoring stuff, but it
isn't.Well, the point is that the commands it emits will cause the eventual
restore to go through the root. Anyway, I think trying to avoid using
a verb altogether is going to result in a very stilted option name.I notice that the option list already includes some references to
"insert", so maybe "--insert-via-partition-root"? Although you could
argue that that's confusing when we're using COPY.--use-partitioned-table [partition-name, ...] # if names are omitted it
defaults to all partitioned tables
I like this idea since it allows using this feature for selected
tables e.g. hash tables. Otherwise, users will be forced to use this
option even when there is only a single hash partitioned table and
many other list/range partitioned tables.
What we are trying to do here is dump the data in a partitioned table
as if it's not partitioned. Combine that with --data-only dumps, and
one could use it to load partitioned data into unpartitioned or
differently partitioned table. So, how about naming the option as
--unpartition-partitioned-table [partitioned-table-name, ....] # if
names are omitted it defaults to all the partitioned tables.
That really says what dump is really doing without focusing on how the
data will be used like restoring/inserting/copying etc.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Aug 3, 2017 at 3:39 PM, Ashutosh Bapat <
ashutosh.bapat@enterprisedb.com> wrote:
On Wed, Aug 2, 2017 at 11:47 PM, David G. Johnston
<david.g.johnston@gmail.com> wrote:On Wed, Aug 2, 2017 at 10:58 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
On Wed, Aug 2, 2017 at 1:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
--restore-via-partition-root ?
I worry someone will think that pg_dump is now restoring stuff, but it
isn't.Well, the point is that the commands it emits will cause the eventual
restore to go through the root. Anyway, I think trying to avoid using
a verb altogether is going to result in a very stilted option name.I notice that the option list already includes some references to
"insert", so maybe "--insert-via-partition-root"? Although you could
argue that that's confusing when we're using COPY.--use-partitioned-table [partition-name, ...] # if names are omitted it
defaults to all partitioned tablesI like this idea since it allows using this feature for selected
tables e.g. hash tables. Otherwise, users will be forced to use this
option even when there is only a single hash partitioned table and
many other list/range partitioned tables.
+1.
What we are trying to do here is dump the data in a partitioned table
as if it's not partitioned. Combine that with --data-only dumps, and
one could use it to load partitioned data into unpartitioned or
differently partitioned table. So, how about naming the option as--unpartition-partitioned-table [partitioned-table-name, ....] # if
names are omitted it defaults to all the partitioned tables.
--unpartition-partitioned-table is very confusing.
I liked the previous option which is --use-partitioned-table
[partition-name, ...].
The Only problem with --use-partitioned-table is, a user needs to specify
the
partition-name in the options list. Imagine if someone having 100's of
partitions then specifying those name is pg_dump option is a pain.
Rather than that:
--use-partitioned-table [partitioned_name, ...] # if
names are omitted it defaults to all the partitioned tables.
Here user need to specify the root relation name in the option - and any
partition table have that as a ROOT, will load the data through
top-parent-relation.
That really says what dump is really doing without focusing on how the
data will be used like restoring/inserting/copying etc.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
--
Rushabh Lathia
On Thursday, August 3, 2017, Rushabh Lathia <rushabh.lathia@gmail.com>
wrote:
--use-partitioned-table [partitioned_name, ...] # if
names are omitted it defaults to all the partitioned tables.Here user need to specify the root relation name in the option - and any
partition table have that as a ROOT, will load the data through
top-parent-relation.
This is indeed what I intended to convey. Only specify "root" names.
David J.
On Thu, Aug 3, 2017 at 9:01 AM, Rushabh Lathia <rushabh.lathia@gmail.com> wrote:
--unpartition-partitioned-table is very confusing.
+1.
I liked the previous option which is --use-partitioned-table
[partition-name, ...].
The Only problem with --use-partitioned-table is, a user needs to specify
the
partition-name in the options list. Imagine if someone having 100's of
partitions then specifying those name is pg_dump option is a pain.
Yeah, that won't work.
Rather than that:
--use-partitioned-table [partitioned_name, ...] # if
names are omitted it defaults to all the partitioned tables.Here user need to specify the root relation name in the option - and any
partition table have that as a ROOT, will load the data through
top-parent-relation.
We could do that, but I'm not sure it's a good idea to use
getopt_long() with optional options. Sometimes that creates confusion
-- is pg_dump --use-partitioned-table salad an attempt to dump the
salad database with the --use-partitioned-table option, or an attempt
to apply --use-partitioned-table only to partitions whose parent is
the salad table? getopt_long() has an answer, but some people may
guess incorrectly about what it is.
I would be more inclined to make this a global option than something
that modifies the behavior for certain tables; the only per-table
flags we have right now are just to include/exclude individual tables.
You could make --inserts or --no-unlogged-table-data apply to some but
not all tables, but we didn't; why start here?
I don't like the specific name --use-partitioned-table much either.
Use it for what?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Aug 2, 2017 at 4:08 PM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
On 8/2/17 13:58, Tom Lane wrote:
I notice that the option list already includes some references to
"insert", so maybe "--insert-via-partition-root"? Although you could
argue that that's confusing when we're using COPY."load" could be more general. But I'm also OK with "restore".
"load" seems better than "restore" to me, both because it's shorter
and because it sounds less like pg_dump will be doing the job of
pg_restore.
So maybe --load-via-partition-root if nobody likes my previous
suggestion of --partition-data-via-root ?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Robert Haas <robertmhaas@gmail.com> writes:
So maybe --load-via-partition-root if nobody likes my previous
suggestion of --partition-data-via-root ?
WFM.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Aug 3, 2017 at 8:53 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
So maybe --load-via-partition-root if nobody likes my previous
suggestion of --partition-data-via-root ?WFM.
+1
David J.
On 2017/08/04 1:08, David G. Johnston wrote:
On Thu, Aug 3, 2017 at 8:53 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
So maybe --load-via-partition-root if nobody likes my previous
suggestion of --partition-data-via-root ?WFM.
+1
+1.
Thanks,
Amit
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Here is an update patch, now renamed the switch to
--load-via-partition-root
and also added the documentation for the new switch into pg_dump as well
as pg_dumpall.
On Fri, Aug 4, 2017 at 7:13 AM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>
wrote:
On 2017/08/04 1:08, David G. Johnston wrote:
On Thu, Aug 3, 2017 at 8:53 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
So maybe --load-via-partition-root if nobody likes my previous
suggestion of --partition-data-via-root ?WFM.
+1
+1.
Thanks,
Amit
Thanks,
Rushabh Lathia
www.EnterpriseDB.com
Attachments:
load_via_partition_root_v2.patchtext/x-patch; charset=US-ASCII; name=load_via_partition_root_v2.patchDownload+93-3
Hi Rushabh,
While testing latest v2 patch, I got a crash when using
--load-via-partition-root with --schema options. Below are steps to
reproduce.
--create below test data
create schema a;
create schema b;
create schema c;
create table t1 (a int,b text) partition by list(a);
create table a.t1_p1 partition of t1 FOR VALUES in (1,2,3,4) partition by
list(a);
create table b.t1_p1_p1 partition of a.t1_p1 FOR VALUES in (1,2);
create table c.t1_p1_p2 partition of a.t1_p1 FOR VALUES in (3,4);
create table b.t1_p2 partition of t1 FOR VALUES in (5,6,7,8) partition by
list(a);
create table a.t1_p2_p1 partition of b.t1_p2 FOR VALUES in (5,6);
create table t1_p2_p2 partition of b.t1_p2 FOR VALUES in (7,8);
insert into t1 values (8,'t1');
insert into a.t1_p1 values (2,'a.t1_p1');
insert into b.t1_p1_p1 values (1,'b.t1_p1_p1');
insert into c.t1_p1_p2 values (3,'c.t1_p1_p2');
insert into b.t1_p2 values (6,'b.t1_p2');
insert into a.t1_p2_p1 values (5,'a.t1_p2_p1');
insert into t1_p2_p2 values (7,'t1_p2_p1');
insert into t1 values (4 ,'t1');
--trying to take pg_dump
[edb@localhost bin]$ ./pg_dump -d postgres --schema=a -f d1.dump -Fp
[edb@localhost bin]$ ./pg_dump -d postgres --load-via-partition-root -f
d2.dump -Fp
[edb@localhost bin]$ ./pg_dump -d postgres --load-via-partition-root
--schema=a -f d3.dump -Fp
pg_dump: pg_dump.c:2063: getRootTableInfo: Assertion `tbinfo->numParents ==
1' failed.
Aborted (core dumped)
Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation
On Fri, Aug 4, 2017 at 3:01 PM, Rushabh Lathia <rushabh.lathia@gmail.com>
wrote:
Show quoted text
Here is an update patch, now renamed the switch to
--load-via-partition-root
and also added the documentation for the new switch into pg_dump as well
as pg_dumpall.On Fri, Aug 4, 2017 at 7:13 AM, Amit Langote <
Langote_Amit_f8@lab.ntt.co.jp> wrote:On 2017/08/04 1:08, David G. Johnston wrote:
On Thu, Aug 3, 2017 at 8:53 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
So maybe --load-via-partition-root if nobody likes my previous
suggestion of --partition-data-via-root ?WFM.
+1
+1.
Thanks,
AmitThanks,
Rushabh Lathia
www.EnterpriseDB.com--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers