speedup COPY TO for partitioned table.

Started by jian heover 1 year ago58 messageshackers
Jump to latest
#1jian he
jian.universality@gmail.com

hi.

COPY (select_query) generally slower than
table_beginscan.. table_scan_getnextslot ..table_endscan,
especially for partitioned tables.
so in the function DoCopyTo
trying to use table_beginscan.. table_scan_getnextslot ..table_endscan
for COPY TO when source table is a partitioned table.

----setup-----
CREATE TABLE t3 (a INT, b int ) PARTITION BY RANGE (a);
create table t3_1 partition of t3 for values from (1) to (11);
create table t3_2 partition of t3 for values from (11) to (15);
insert into t3 select g from generate_series(1, 3) g;
insert into t3 select g from generate_series(11, 11) g;

so now you can do:
copy t3 to stdout;

in the master, you will get:
ERROR: cannot copy from partitioned table "t3"
HINT: Try the COPY (SELECT ...) TO variant.

attached copy_par_regress_test.sql is a simple benchmark sql file,
a partitioned table with 10 partitions, 2 levels of indirection.
The simple benchmark shows around 7.7% improvement in my local environment.

local environment:
PostgreSQL 18devel_debug_build_382092a0cd on x86_64-linux, compiled by
gcc-14.1.0, 64-bit

Attachments:

v1-0001-speedup-COPY-TO-for-partitioned-table.patchtext/x-patch; charset=US-ASCII; name=v1-0001-speedup-COPY-TO-for-partitioned-table.patchDownload+61-9
copy_par_regress_test.sqlapplication/sql; name=copy_par_regress_test.sqlDownload
#2Melih Mutlu
m.melihmutlu@gmail.com
In reply to: jian he (#1)
Re: speedup COPY TO for partitioned table.

Hi Jian,

Thanks for the patch.

jian he <jian.universality@gmail.com>, 19 Ara 2024 Per, 15:03 tarihinde
şunu yazdı:

attached copy_par_regress_test.sql is a simple benchmark sql file,
a partitioned table with 10 partitions, 2 levels of indirection.
The simple benchmark shows around 7.7% improvement in my local environment.

I confirm that the patch introduces some improvement in simple cases like
the one you shared. I looked around a bit to understand whether there is an
obvious reason why copying from a partitioned table is not allowed, but
couldn't find one. It seems ok to me.
I realized that while both "COPY <partitioned_table> TO..." and "COPY
(SELECT..) TO..." can return the same set of rows, their orders may not be
the same. I guess that it's hard to guess in which
order find_all_inheritors() would return tables, and that might be
something we should be worried about with the patch. What do you think?

Thanks,
--
Melih Mutlu
Microsoft

#3jian he
jian.universality@gmail.com
In reply to: Melih Mutlu (#2)
Re: speedup COPY TO for partitioned table.

On Wed, Jan 22, 2025 at 6:54 AM Melih Mutlu <m.melihmutlu@gmail.com> wrote:

Hi Jian,

Thanks for the patch.

jian he <jian.universality@gmail.com>, 19 Ara 2024 Per, 15:03 tarihinde şunu yazdı:

attached copy_par_regress_test.sql is a simple benchmark sql file,
a partitioned table with 10 partitions, 2 levels of indirection.
The simple benchmark shows around 7.7% improvement in my local environment.

I confirm that the patch introduces some improvement in simple cases like the one you shared. I looked around a bit to understand whether there is an obvious reason why copying from a partitioned table is not allowed, but couldn't find one. It seems ok to me.

hi. melih mutlu
thanks for confirmation.

I realized that while both "COPY <partitioned_table> TO..." and "COPY (SELECT..) TO..." can return the same set of rows, their orders may not be the same. I guess that it's hard to guess in which order find_all_inheritors() would return tables, and that might be something we should be worried about with the patch. What do you think?

in the
find_all_inheritors->find_inheritance_children->find_inheritance_children_extended

find_inheritance_children_extended we have
"""
if (numoids > 1)
qsort(oidarr, numoids, sizeof(Oid), oid_cmp);
"""

so the find_all_inheritors output order is deterministic?

#4Nathan Bossart
nathandbossart@gmail.com
In reply to: Melih Mutlu (#2)
Re: speedup COPY TO for partitioned table.

On Wed, Jan 22, 2025 at 01:54:32AM +0300, Melih Mutlu wrote:

I confirm that the patch introduces some improvement in simple cases like
the one you shared. I looked around a bit to understand whether there is an
obvious reason why copying from a partitioned table is not allowed, but
couldn't find one. It seems ok to me.

From the original discussion [0]/messages/by-id/CAA4eK1LqTqZkPSoonF5_cOz94OUZG9j0PNfLdhi_nPtW82fFVA@mail.gmail.com, it seems like it was considered a
nonessential part of an otherwise massive patch set. Perhaps it's time to
revisit it.

[0]: /messages/by-id/CAA4eK1LqTqZkPSoonF5_cOz94OUZG9j0PNfLdhi_nPtW82fFVA@mail.gmail.com

--
nathan

#5Melih Mutlu
m.melihmutlu@gmail.com
In reply to: jian he (#3)
Re: speedup COPY TO for partitioned table.

Hi,

jian he <jian.universality@gmail.com>, 27 Oca 2025 Pzt, 04:47 tarihinde
şunu yazdı:

in the

find_all_inheritors->find_inheritance_children->find_inheritance_children_extended

find_inheritance_children_extended we have
"""
if (numoids > 1)
qsort(oidarr, numoids, sizeof(Oid), oid_cmp);
"""

so the find_all_inheritors output order is deterministic?

You're right that order in find_all_inheritors is deterministic. But it's
not always the same with the order of SELECT output. You can quickly see
what I mean by running a slightly modified version of the example that you
shared in your first email:

CREATE TABLE t3 (a INT, b int ) PARTITION BY RANGE (a);
-- change the order. first create t3_2 then t3_1
create table t3_2 partition of t3 for values from (11) to (15);
create table t3_1 partition of t3 for values from (1) to (11);
insert into t3 select g from generate_series(1, 3) g;
insert into t3 select g from generate_series(11, 11) g;

And the results of the two different COPY approaches would be:
postgres=# COPY t3 TO STDOUT;
11 \N
1 \N
2 \N
3 \N
postgres=# COPY (SELECT * FROM t3) TO STDOUT;
1 \N
2 \N
3 \N
11 \N

Notice that "COPY t3 TO STDOUT" changes the order since the partition t3_2
has been created first, hence it has a smaller OID. On the other hand,
SELECT sorts the partitions based on partition boundaries, not OIDs. That's
why we should always see the same order regardless of the OIDs of
partitions (you can see create_range_bounds() in partbounds.c if interested
in more details). One thing that might be useful in the COPY case would be
using a partition descriptor to access the correct order of partitions. I
believe something like (PartitionDesc) partdesc->oid should give us the
partition OIDs in order.

Thanks,
--
Melih Mutlu
Microsoft

#6David Rowley
dgrowleyml@gmail.com
In reply to: Melih Mutlu (#5)
Re: speedup COPY TO for partitioned table.

On Tue, 11 Feb 2025 at 08:10, Melih Mutlu <m.melihmutlu@gmail.com> wrote:

jian he <jian.universality@gmail.com>, 27 Oca 2025 Pzt, 04:47 tarihinde şunu yazdı:

so the find_all_inheritors output order is deterministic?

You're right that order in find_all_inheritors is deterministic. But it's not always the same with the order of SELECT output. You can quickly see what I mean by running a slightly modified version of the example that you shared in your first email:

I think it's fine to raise the question as to whether the order
changing matters, however, I don't personally think there should be
any concerns with this. The main reason I think this is that the
command isn't the same, so the user shouldn't expect the same
behaviour. They'll need to adjust their commands to get the new
behaviour and possibly a different order.

Another few reasons are:

1) In the subquery version, the Append children are sorted by cost, so
the order isn't that predictable in the first place. (See
create_append_path() -> list_sort(subpaths,
append_total_cost_compare))
2) The order tuples are copied with COPY TO on non-partitioned tables
isn't that well defined in the first place. Two reasons for this, a)
the heap is a heap and has no defined order; and b) sync scans might
be used and the scan might start at any point in the heap and circle
back around again to the page prior to the page where the scan
started. See (table_beginscan() adds SO_ALLOW_SYNC to the flags).

I think the main thing to be concerned about regarding order is to
ensure that all rows from the same partition are copied consecutively,
and that does not seem to be at risk of changing here. This is
important as 3592e0ff9 added caching of the last found partition when
partition lookups continually find the same partition.

David

#7jian he
jian.universality@gmail.com
In reply to: David Rowley (#6)
Re: speedup COPY TO for partitioned table.

hi.

rebased and polished patch attached, test case added.
However there is a case (the following) where
``COPY(partitioned_table)`` is much slower
(around 25% in some cases) than ``COPY (select * from partitioned_table)``.

If the partition attribute order is not the same as the partitioned table,
then for each output row, we need to create a template TupleTableSlot
and call execute_attr_map_slot,
i didn't find a work around to reduce the inefficiency.

Since the master doesn't have ``COPY(partitioned_table)``,
I guess this slowness case is allowed?

----------- the follow case is far slower than ``COPY(select * From pp) TO ``
drop table if exists pp;
CREATE TABLE pp (id INT, val int ) PARTITION BY RANGE (id);
create table pp_1 (val int, id int);
create table pp_2 (val int, id int);
ALTER TABLE pp ATTACH PARTITION pp_1 FOR VALUES FROM (1) TO (5);
ALTER TABLE pp ATTACH PARTITION pp_2 FOR VALUES FROM (5) TO (10);
insert into pp select g, 10 + g from generate_series(1,9) g;
copy pp to stdout(header);

Attachments:

v2-0001-support-COPY-partitioned_table-TO.patchtext/x-patch; charset=US-ASCII; name=v2-0001-support-COPY-partitioned_table-TO.patchDownload+101-7
#8jian he
jian.universality@gmail.com
In reply to: jian he (#7)
Re: speedup COPY TO for partitioned table.

On Fri, Mar 7, 2025 at 6:41 PM jian he <jian.universality@gmail.com> wrote:

hi.

rebased and polished patch attached, test case added.

hi.

I realized I need to change the doc/src/sgml/ref/copy.sgml
<title>Notes</title> section.

current doc note section:
COPY TO can be used only with plain tables, not views, and does not
copy rows from child tables or child partitions.
For example, COPY table TO copies the same rows as SELECT * FROM ONLY table.
The syntax COPY (SELECT * FROM table) TO ... can be used to dump all
of the rows in an inheritance hierarchy, partitioned table, or view.

after my change:
------------
COPY TO can be used only with plain tables, not views, and does not
copy rows from child tables,
however COPY TO can be used with partitioned tables.
For example, in a table inheritance hierarchy, COPY table TO copies
the same rows as SELECT * FROM ONLY table.
The syntax COPY (SELECT * FROM table) TO ... can be used to dump all
of the rows in an inheritance hierarchy, or view.
------------

Attachments:

v3-0001-support-COPY-partitioned_table-TO.patchtext/x-patch; charset=US-ASCII; name=v3-0001-support-COPY-partitioned_table-TO.patchDownload+105-11
#9newtglobal postgresql_contributors
postgresql_contributors@newtglobalcorp.com
In reply to: jian he (#8)
Re: speedup COPY TO for partitioned table.

Hi Jian,
Tested this patch with COPY sales TO STDOUT; ~ 1.909ms, improving performance over the older COPY (SELECT * FROM sales) TO STDOUT; ~ 3.80ms method. This eliminates query planning overhead and significantly speeds up data export from partitioned tables.
Our test setup involved creating a partitioned table(sales), inserted 500 records, and comparing execution times.

-- Step 1: Create Partitioned Parent Table
CREATE TABLE sales (
id SERIAL NOT NULL,
sale_date DATE NOT NULL,
region TEXT NOT NULL,
amount NUMERIC(10,2) NOT NULL,
category TEXT NOT NULL,
PRIMARY KEY (id, sale_date,region)
) PARTITION BY RANGE (sale_date);

-- Step 2: Create Range Partitions (2023 & 2024)
CREATE TABLE sales_2023 PARTITION OF sales
FOR VALUES FROM ('2023-01-01') TO ('2024-01-01')
PARTITION BY HASH (region);

CREATE TABLE sales_2024 PARTITION OF sales
FOR VALUES FROM ('2024-01-01') TO ('2025-01-01')
PARTITION BY HASH (region);

-- Step 3: Create Hash Partitions for sales_2023
CREATE TABLE sales_2023_part1 PARTITION OF sales_2023
FOR VALUES WITH (MODULUS 2, REMAINDER 0);

CREATE TABLE sales_2023_part2 PARTITION OF sales_2023
FOR VALUES WITH (MODULUS 2, REMAINDER 1);

-- Step 4: Create Hash Partitions for sales_2024
CREATE TABLE sales_2024_part1 PARTITION OF sales_2024
FOR VALUES WITH (MODULUS 2, REMAINDER 0);

CREATE TABLE sales_2024_part2 PARTITION OF sales_2024
FOR VALUES WITH (MODULUS 2, REMAINDER 1);

-- Step 5: Insert Data **AFTER** Creating Partitions
INSERT INTO sales (sale_date, region, amount, category)
SELECT
('2023-01-01'::DATE + (random() * 730)::int) AS sale_date, -- Random date in 2023-2024 range
CASE WHEN random() > 0.5 THEN 'North' ELSE 'South' END AS region, -- Random region
(random() * 1000)::NUMERIC(10,2) AS amount, -- Random amount (0 to 1000)
CASE WHEN random() > 0.5 THEN 'Electronics' ELSE 'Furniture' END AS category -- Random category
FROM generate_series(1, 500);

COPY (SELECT * FROM SALES) TO STDOUT; ~ 1.909ms

COPY SALES TO STDOUT; ~ 3.80ms

This change is recommended for better performance in PostgreSQL partitioned tables.

#10vignesh C
vignesh21@gmail.com
In reply to: jian he (#8)
Re: speedup COPY TO for partitioned table.

On Tue, 11 Mar 2025 at 18:24, jian he <jian.universality@gmail.com> wrote:

after my change:
------------
COPY TO can be used only with plain tables, not views, and does not
copy rows from child tables,
however COPY TO can be used with partitioned tables.
For example, in a table inheritance hierarchy, COPY table TO copies
the same rows as SELECT * FROM ONLY table.
The syntax COPY (SELECT * FROM table) TO ... can be used to dump all
of the rows in an inheritance hierarchy, or view.
------------

I find an issue with the patch:

-- Setup
CREATE SERVER myserver FOREIGN DATA WRAPPER postgres_fdw OPTIONS
(dbname 'testdb', port '5432');
CREATE TABLE t1(id int) PARTITION BY RANGE(id);
CREATE TABLE part1 PARTITION OF t1 FOR VALUES FROM (0) TO (5);
CREATE TABLE part2 PARTITION OF t1 FOR VALUES FROM (5) TO (15)
PARTITION BY RANGE(id);
CREATE FOREIGN TABLE part2_1 PARTITION OF part2 FOR VALUES FROM (10)
TO (15) SERVER myserver;

-- Create table in testdb
create table part2_1(id int);

-- Copy partitioned table data
postgres=# copy t1 to stdout(header);
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

Stack trace for the same is:
#0 table_beginscan (rel=0x72b109f9aad8, snapshot=0x5daafa77e000,
nkeys=0, key=0x0) at ../../../src/include/access/tableam.h:883
#1 0x00005daadf89eb9b in DoCopyTo (cstate=0x5daafa71e278) at copyto.c:1105
#2 0x00005daadf8913f4 in DoCopy (pstate=0x5daafa6c5fc0,
stmt=0x5daafa6f20c8, stmt_location=0, stmt_len=25,
processed=0x7ffd3799c2f0) at copy.c:316
#3 0x00005daadfc7a770 in standard_ProcessUtility
(pstmt=0x5daafa6f21e8, queryString=0x5daafa6f15c0 "copy t1 to
stdout(header);", readOnlyTree=false,
context=PROCESS_UTILITY_TOPLEVEL,
params=0x0, queryEnv=0x0, dest=0x5daafa6f25a8, qc=0x7ffd3799c660)
at utility.c:738

(gdb) f 0
#0 table_beginscan (rel=0x72b109f9aad8, snapshot=0x5daafa77e000,
nkeys=0, key=0x0) at ../../../src/include/access/tableam.h:883
883 return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);

The table access method is not available in this care
(gdb) p *rel->rd_tableam
Cannot access memory at address 0x0

This failure happens when we do table_beginscan on scan part2_1 table

Regards,
Vignesh

#11jian he
jian.universality@gmail.com
In reply to: vignesh C (#10)
Re: speedup COPY TO for partitioned table.

On Fri, Mar 21, 2025 at 6:13 PM vignesh C <vignesh21@gmail.com> wrote:

I find an issue with the patch:

-- Setup
CREATE SERVER myserver FOREIGN DATA WRAPPER postgres_fdw OPTIONS
(dbname 'testdb', port '5432');
CREATE TABLE t1(id int) PARTITION BY RANGE(id);
CREATE TABLE part1 PARTITION OF t1 FOR VALUES FROM (0) TO (5);
CREATE TABLE part2 PARTITION OF t1 FOR VALUES FROM (5) TO (15)
PARTITION BY RANGE(id);
CREATE FOREIGN TABLE part2_1 PARTITION OF part2 FOR VALUES FROM (10)
TO (15) SERVER myserver;

-- Create table in testdb
create table part2_1(id int);

-- Copy partitioned table data
postgres=# copy t1 to stdout(header);
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

I manually tested:
sequence, temp table, materialized view, index, view,
composite types, partitioned indexes.
all these above can not attach to partitioned tables.

We should care about the unlogged table, foreign table attached to the
partition.
an unlogged table should work just fine.
we should error out foreign tables.

so:
copy t1 to stdout(header);
ERROR: cannot copy from foreign table "t1"
DETAIL: partition "t1" is a foreign table
HINT: Try the COPY (SELECT ...) TO variant.

Attachments:

v4-0001-support-COPY-partitioned_table-TO.patchapplication/x-patch; name=v4-0001-support-COPY-partitioned_table-TO.patchDownload+114-11
#12jian he
jian.universality@gmail.com
In reply to: jian he (#11)
Re: speedup COPY TO for partitioned table.

hi.

I made a mistake.
The regress test sql file should have a new line at the end of the file.

Attachments:

v5-0001-support-COPY-partitioned_table-TO.patchtext/x-patch; charset=US-ASCII; name=v5-0001-support-COPY-partitioned_table-TO.patchDownload+114-11
#13vignesh C
vignesh21@gmail.com
In reply to: jian he (#12)
Re: speedup COPY TO for partitioned table.

On Fri, 28 Mar 2025 at 08:39, jian he <jian.universality@gmail.com> wrote:

hi.

I made a mistake.
The regress test sql file should have a new line at the end of the file.

Couple of suggestions:
1) Can you add some comments here, this is the only code that is
different from the regular table handling code:
+                       scan_tupdesc = RelationGetDescr(scan_rel);
+                       map = build_attrmap_by_name_if_req(tupDesc,
scan_tupdesc, false);
2) You can see if you can try to make a function add call it from both
the partitioned table and regular table case, that way you could
reduce the duplicate code:
+                       while (table_scan_getnextslot(scandesc,
ForwardScanDirection, slot))
+                       {
+                               CHECK_FOR_INTERRUPTS();
+
+                               /* Deconstruct the tuple ... */
+                               if (map != NULL)
+                               {
+                                       original_slot = slot;
+                                       root_slot =
MakeSingleTupleTableSlot(tupDesc, &TTSOpsBufferHeapTuple);
+                                       slot =
execute_attr_map_slot(map, slot, root_slot);
+                               }
+                               else
+                                       slot_getallattrs(slot);
+
+                               /* Format and send the data */
+                               CopyOneRowTo(cstate, slot);
+
+
pgstat_progress_update_param(PROGRESS_COPY_TUPLES_PROCESSED,
+
                  ++processed);
+
+                               if (original_slot != NULL)
+
ExecDropSingleTupleTableSlot(original_slot);
+                       };

Regards,
Vignesh

#14jian he
jian.universality@gmail.com
In reply to: vignesh C (#13)
Re: speedup COPY TO for partitioned table.

On Fri, Mar 28, 2025 at 9:03 PM vignesh C <vignesh21@gmail.com> wrote:

On Fri, 28 Mar 2025 at 08:39, jian he <jian.universality@gmail.com> wrote:

hi.

I made a mistake.
The regress test sql file should have a new line at the end of the file.

Couple of suggestions:
1) Can you add some comments here, this is the only code that is
different from the regular table handling code:
+                       scan_tupdesc = RelationGetDescr(scan_rel);
+                       map = build_attrmap_by_name_if_req(tupDesc,
scan_tupdesc, false);

I have added the following comments around build_attrmap_by_name_if_req.

/*
* partition's rowtype might differ from the root table's. We must
* convert it back to the root table's rowtype as we are export
* partitioned table data here.
*/

2) You can see if you can try to make a function add call it from both
the partitioned table and regular table case, that way you could
reduce the duplicate code:
+                       while (table_scan_getnextslot(scandesc,
ForwardScanDirection, slot))
+                       {
+                               CHECK_FOR_INTERRUPTS();
+
+                               /* Deconstruct the tuple ... */
+                               if (map != NULL)
+                               {
+                                       original_slot = slot;
+                                       root_slot =
MakeSingleTupleTableSlot(tupDesc, &TTSOpsBufferHeapTuple);
+                                       slot =
execute_attr_map_slot(map, slot, root_slot);
+                               }
+                               else
+                                       slot_getallattrs(slot);
+
+                               /* Format and send the data */
+                               CopyOneRowTo(cstate, slot);
+
+
pgstat_progress_update_param(PROGRESS_COPY_TUPLES_PROCESSED,
+
++processed);
+
+                               if (original_slot != NULL)
+
ExecDropSingleTupleTableSlot(original_slot);
+                       };

I consolidated it into a new function: CopyThisRelTo.

Attachments:

v6-0001-support-COPY-partitioned_table-TO.patchtext/x-patch; charset=US-ASCII; name=v6-0001-support-COPY-partitioned_table-TO.patchDownload+139-32
#15vignesh C
vignesh21@gmail.com
In reply to: jian he (#14)
Re: speedup COPY TO for partitioned table.

On Sat, 29 Mar 2025 at 12:08, jian he <jian.universality@gmail.com> wrote:

I consolidated it into a new function: CopyThisRelTo.

Few comments:
1) Here the error message is not correct, we are printing the original
table from where copy was done which is a regular table and not a
foreign table, we should use childreloid instead of rel.

+                               if (relkind == RELKIND_FOREIGN_TABLE)
+                                       ereport(ERROR,
+
errcode(ERRCODE_WRONG_OBJECT_TYPE),
+                                                       errmsg("cannot
copy from foreign table \"%s\"",
+
 RelationGetRelationName(rel)),
+
errdetail("partition \"%s\" is a foreign table",
RelationGetRelationName(rel)),
+                                                       errhint("Try
the COPY (SELECT ...) TO variant."));

In the error detail you can include the original table too.

postgres=# copy t1 to stdout(header);
ERROR: cannot copy from foreign table "t1"
DETAIL: partition "t1" is a foreign table
HINT: Try the COPY (SELECT ...) TO variant.

2) 2.a) I felt the comment should be "then copy partitioned rel to
destionation":
+ * rel: the relation to be copied to.
+ * root_rel: if not null, then the COPY TO partitioned rel.
+ * processed: number of tuple processed.
+*/
+static void
+CopyThisRelTo(CopyToState cstate, Relation rel, Relation root_rel,
uint64 *processed)
+{
+       TupleTableSlot *slot;

2.b) you can have processed argument in the next line for better readability

3) There is a small indentation issue here:
+       /*
+        * partition's rowtype might differ from the root table's.  We must
+        * convert it back to the root table's rowtype as we are export
+        * partitioned table data here.
+       */
+       if (root_rel != NULL)

Regards,
Vignesh

#16jian he
jian.universality@gmail.com
In reply to: vignesh C (#15)
Re: speedup COPY TO for partitioned table.

On Sun, Mar 30, 2025 at 9:14 AM vignesh C <vignesh21@gmail.com> wrote:

On Sat, 29 Mar 2025 at 12:08, jian he <jian.universality@gmail.com> wrote:

I consolidated it into a new function: CopyThisRelTo.

Few comments:
1) Here the error message is not correct, we are printing the original
table from where copy was done which is a regular table and not a
foreign table, we should use childreloid instead of rel.

+                               if (relkind == RELKIND_FOREIGN_TABLE)
+                                       ereport(ERROR,
+
errcode(ERRCODE_WRONG_OBJECT_TYPE),
+                                                       errmsg("cannot
copy from foreign table \"%s\"",
+
RelationGetRelationName(rel)),
+
errdetail("partition \"%s\" is a foreign table",
RelationGetRelationName(rel)),
+                                                       errhint("Try
the COPY (SELECT ...) TO variant."));

In the error detail you can include the original table too.

I changed it to:
if (relkind == RELKIND_FOREIGN_TABLE)
{
char *relation_name;

relation_name = get_rel_name(childreloid);
ereport(ERROR,
errcode(ERRCODE_WRONG_OBJECT_TYPE),
errmsg("cannot copy from foreign table
\"%s\"", relation_name),
errdetail("Partition \"%s\" is a foreign
table in the partitioned table \"%s.%s\"",
relation_name,
RelationGetRelationName(rel),

get_namespace_name(rel->rd_rel->relnamespace)),
errhint("Try the COPY (SELECT ...) TO variant."));
}

2) 2.a) I felt the comment should be "then copy partitioned rel to
destionation":
+ * rel: the relation to be copied to.
+ * root_rel: if not null, then the COPY TO partitioned rel.
+ * processed: number of tuple processed.
+*/
+static void
+CopyThisRelTo(CopyToState cstate, Relation rel, Relation root_rel,
uint64 *processed)
+{
+       TupleTableSlot *slot;
i changed it to:
+/*
+ * rel: the relation to be copied to.
+ * root_rel: if not null, then the COPY partitioned relation to destination.
+ * processed: number of tuple processed.
+*/
+static void
+CopyThisRelTo(CopyToState cstate, Relation rel, Relation root_rel,
+              uint64 *processed)
3) There is a small indentation issue here:
+       /*
+        * partition's rowtype might differ from the root table's.  We must
+        * convert it back to the root table's rowtype as we are export
+        * partitioned table data here.
+       */
+       if (root_rel != NULL)

I am not so sure.
can you check if the attached still has the indentation issue.

Attachments:

v7-0001-support-COPY-partitioned_table-TO.patchtext/x-patch; charset=US-ASCII; name=v7-0001-support-COPY-partitioned_table-TO.patchDownload+147-32
#17Kirill Reshke
reshkekirill@gmail.com
In reply to: jian he (#16)
Re: speedup COPY TO for partitioned table.

Hi!
I reviewed v7. Maybe we should add a multi-level partitioning case
into copy2.sql regression test?

I also did quick benchmarking for this patch:

==== DDL

create table ppp(i int) partition by range (i);

genddl.sh:

for i in `seq 0 200`; do echo "create table p$i partition of ppp for
values from ( $((10 * i)) ) to ( $((10 * (i + 1))) ); "; done

=== insert data data:
insert into ppp select i / 1000 from generate_series(0, 2000000)i;

=== results:

for 2000001 rows speedup is 1.40 times : 902.604 ms (patches) vs
1270.648 ms (unpatched)

for 4000002 rows speedup is 1.20 times : 1921.724 ms (patches) vs
2343.393 ms (unpatched)

for 8000004 rows speedup is 1.10 times : 3932.361 ms (patches) vs
4358.489ms (unpatched)

So, this patch indeed speeds up some cases, but with larger tables
speedup becomes negligible.

--
Best regards,
Kirill Reshke

#18jian he
jian.universality@gmail.com
In reply to: Kirill Reshke (#17)
Re: speedup COPY TO for partitioned table.

On Mon, Mar 31, 2025 at 4:05 PM Kirill Reshke <reshkekirill@gmail.com> wrote:

Hi!
I reviewed v7. Maybe we should add a multi-level partitioning case
into copy2.sql regression test?

sure.

I also did quick benchmarking for this patch:

==== DDL

create table ppp(i int) partition by range (i);

genddl.sh:

for i in `seq 0 200`; do echo "create table p$i partition of ppp for
values from ( $((10 * i)) ) to ( $((10 * (i + 1))) ); "; done

=== insert data data:
insert into ppp select i / 1000 from generate_series(0, 2000000)i;

=== results:

for 2000001 rows speedup is 1.40 times : 902.604 ms (patches) vs
1270.648 ms (unpatched)

for 4000002 rows speedup is 1.20 times : 1921.724 ms (patches) vs
2343.393 ms (unpatched)

for 8000004 rows speedup is 1.10 times : 3932.361 ms (patches) vs
4358.489ms (unpatched)

So, this patch indeed speeds up some cases, but with larger tables
speedup becomes negligible.

Thanks for doing the benchmark.

Attachments:

v8-0001-support-COPY-partitioned_table-TO.patchtext/x-patch; charset=US-ASCII; name=v8-0001-support-COPY-partitioned_table-TO.patchDownload+157-32
#19vignesh C
vignesh21@gmail.com
In reply to: jian he (#18)
Re: speedup COPY TO for partitioned table.

On Tue, 1 Apr 2025 at 06:31, jian he <jian.universality@gmail.com> wrote:

On Mon, Mar 31, 2025 at 4:05 PM Kirill Reshke <reshkekirill@gmail.com> wrote:

Thanks for doing the benchmark.

Few comments to improve the comments, error message and remove
redundant assignment:
1) How about we change below:
/*
* partition's rowtype might differ from the root table's. We must
* convert it back to the root table's rowtype as we are export
* partitioned table data here.
*/
To:
/*
* A partition's row type might differ from the root table's.
* Since we're exporting partitioned table data, we must
* convert it back to the root table's row type.
*/

2) How about we change below:
+                               if (relkind == RELKIND_FOREIGN_TABLE)
+                                       ereport(ERROR,
+
errcode(ERRCODE_WRONG_OBJECT_TYPE),
+                                                       errmsg("cannot
copy from foreign table \"%s\"",
+
 RelationGetRelationName(rel)),
+
errdetail("partition \"%s\" is a foreign table",
RelationGetRelationName(rel)),
+                                                       errhint("Try
the COPY (SELECT ...) TO variant."));
To:
+                               if (relkind == RELKIND_FOREIGN_TABLE)
+                                       ereport(ERROR,
+
errcode(ERRCODE_WRONG_OBJECT_TYPE),
+                                                       errmsg("cannot
copy from a partitioned table having foreign table partition \"%s\"",
+
 RelationGetRelationName(rel)),
+
errdetail("partition \"%s\" is a foreign table",
RelationGetRelationName(rel)),
+                                                       errhint("Try
the COPY (SELECT ...) TO variant."));

3) How about we change below:
/*
* rel: the relation to be copied to.
* root_rel: if not NULL, then the COPY partitioned relation to destination.
* processed: number of tuples processed.
*/
To:
/*
* rel: the relation from which data will be copied.
* root_rel: If not NULL, indicates that rel's row type must be
* converted to root_rel's row type.
* processed: number of tuples processed.
*/

 4)  You can initialize processed to 0 along with declaration in
DoCopyTo function and remove the below:
 +       if (cstate->rel && cstate->rel->rd_rel->relkind ==
RELKIND_PARTITIONED_TABLE)
        {
...
...
                processed = 0;
-               while (table_scan_getnextslot(scandesc,
ForwardScanDirection, slot))
...
...
-
-               ExecDropSingleTupleTableSlot(slot);
-               table_endscan(scandesc);
+       }
+       else if (cstate->rel)
+       {
+               processed = 0;
+               CopyThisRelTo(cstate, cstate->rel, NULL, &processed);
        }

Regards,
Vignesh

#20jian he
jian.universality@gmail.com
In reply to: vignesh C (#19)
Re: speedup COPY TO for partitioned table.

On Tue, Apr 1, 2025 at 1:38 PM vignesh C <vignesh21@gmail.com> wrote:

On Tue, 1 Apr 2025 at 06:31, jian he <jian.universality@gmail.com> wrote:

On Mon, Mar 31, 2025 at 4:05 PM Kirill Reshke <reshkekirill@gmail.com> wrote:

Thanks for doing the benchmark.

Few comments to improve the comments, error message and remove
redundant assignment:
1) How about we change below:
/*
* partition's rowtype might differ from the root table's. We must
* convert it back to the root table's rowtype as we are export
* partitioned table data here.
*/
To:
/*
* A partition's row type might differ from the root table's.
* Since we're exporting partitioned table data, we must
* convert it back to the root table's row type.
*/

i changed it to
/*
* A partition's rowtype might differ from the root table's.
* Since we are export partitioned table data here,
* we must convert it back to the root table's rowtype.
*/
Since many places use "rowtype",
using "rowtype" instead of "row type" should be fine.

2) How about we change below:
+                               if (relkind == RELKIND_FOREIGN_TABLE)
+                                       ereport(ERROR,
+
errcode(ERRCODE_WRONG_OBJECT_TYPE),
+                                                       errmsg("cannot
copy from foreign table \"%s\"",
+
RelationGetRelationName(rel)),
+
errdetail("partition \"%s\" is a foreign table",
RelationGetRelationName(rel)),
+                                                       errhint("Try
the COPY (SELECT ...) TO variant."));
To:
+                               if (relkind == RELKIND_FOREIGN_TABLE)
+                                       ereport(ERROR,
+
errcode(ERRCODE_WRONG_OBJECT_TYPE),
+                                                       errmsg("cannot
copy from a partitioned table having foreign table partition \"%s\"",
+
RelationGetRelationName(rel)),
+
errdetail("partition \"%s\" is a foreign table",
RelationGetRelationName(rel)),
+                                                       errhint("Try
the COPY (SELECT ...) TO variant."));

i am not so sure.
since the surrounding code we have

else if (rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
ereport(ERROR,
(errcode(ERRCODE_WRONG_OBJECT_TYPE),
errmsg("cannot copy from foreign table \"%s\"",
RelationGetRelationName(rel)),
errhint("Try the COPY (SELECT ...) TO variant.")));

let's see what others think.

3) How about we change below:
/*
* rel: the relation to be copied to.
* root_rel: if not NULL, then the COPY partitioned relation to destination.
* processed: number of tuples processed.
*/
To:
/*
* rel: the relation from which data will be copied.
* root_rel: If not NULL, indicates that rel's row type must be
* converted to root_rel's row type.
* processed: number of tuples processed.
*/

i changed it to

/*
* rel: the relation from which the actual data will be copied.
* root_rel: if not NULL, it indicates that we are copying partitioned relation
* data to the destination, and "rel" is the partition of "root_rel".
* processed: number of tuples processed.
*/

4)  You can initialize processed to 0 along with declaration in
DoCopyTo function and remove the below:
+       if (cstate->rel && cstate->rel->rd_rel->relkind ==
RELKIND_PARTITIONED_TABLE)
{
...
...
processed = 0;
-               while (table_scan_getnextslot(scandesc,
ForwardScanDirection, slot))
...
...
-
-               ExecDropSingleTupleTableSlot(slot);
-               table_endscan(scandesc);
+       }
+       else if (cstate->rel)
+       {
+               processed = 0;
+               CopyThisRelTo(cstate, cstate->rel, NULL, &processed);
}

ok.

Attachments:

v9-0001-support-COPY-partitioned_table-TO.patchapplication/x-patch; name=v9-0001-support-COPY-partitioned_table-TO.patchDownload+156-34
#21Kirill Reshke
reshkekirill@gmail.com
In reply to: jian he (#20)
#22Kirill Reshke
reshkekirill@gmail.com
In reply to: Kirill Reshke (#21)
#23Kirill Reshke
reshkekirill@gmail.com
In reply to: Kirill Reshke (#22)
#24jian he
jian.universality@gmail.com
In reply to: Kirill Reshke (#23)
#25Kirill Reshke
reshkekirill@gmail.com
In reply to: jian he (#24)
#26jian he
jian.universality@gmail.com
In reply to: Kirill Reshke (#25)
#27Kirill Reshke
reshkekirill@gmail.com
In reply to: jian he (#26)
#28jian he
jian.universality@gmail.com
In reply to: Kirill Reshke (#27)
#29torikoshia
torikoshia@oss.nttdata.com
In reply to: jian he (#28)
#30jian he
jian.universality@gmail.com
In reply to: torikoshia (#29)
#31torikoshia
torikoshia@oss.nttdata.com
In reply to: jian he (#30)
#32jian he
jian.universality@gmail.com
In reply to: torikoshia (#31)
#33torikoshia
torikoshia@oss.nttdata.com
In reply to: jian he (#32)
#34Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: jian he (#32)
#35jian he
jian.universality@gmail.com
In reply to: Alvaro Herrera (#34)
#36jian he
jian.universality@gmail.com
In reply to: torikoshia (#33)
#37torikoshia
torikoshia@oss.nttdata.com
In reply to: jian he (#35)
#38jian he
jian.universality@gmail.com
In reply to: torikoshia (#37)
#39torikoshia
torikoshia@oss.nttdata.com
In reply to: jian he (#38)
#40jian he
jian.universality@gmail.com
In reply to: torikoshia (#39)
#41Masahiko Sawada
sawada.mshk@gmail.com
In reply to: jian he (#40)
#42torikoshia
torikoshia@oss.nttdata.com
In reply to: Masahiko Sawada (#41)
#43jian he
jian.universality@gmail.com
In reply to: Masahiko Sawada (#41)
#44Chao Li
li.evan.chao@gmail.com
In reply to: jian he (#43)
#45jian he
jian.universality@gmail.com
In reply to: Chao Li (#44)
#46Masahiko Sawada
sawada.mshk@gmail.com
In reply to: jian he (#43)
#47Chao Li
li.evan.chao@gmail.com
In reply to: jian he (#45)
#48jian he
jian.universality@gmail.com
In reply to: Chao Li (#47)
#49Chao Li
li.evan.chao@gmail.com
In reply to: jian he (#48)
#50jian he
jian.universality@gmail.com
In reply to: Masahiko Sawada (#46)
#51Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: jian he (#50)
#52Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Alvaro Herrera (#51)
#53Masahiko Sawada
sawada.mshk@gmail.com
In reply to: jian he (#50)
#54jian he
jian.universality@gmail.com
In reply to: Masahiko Sawada (#53)
#55Masahiko Sawada
sawada.mshk@gmail.com
In reply to: jian he (#54)
#56jian he
jian.universality@gmail.com
In reply to: Masahiko Sawada (#55)
#57Masahiko Sawada
sawada.mshk@gmail.com
In reply to: jian he (#56)
#58Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Masahiko Sawada (#57)