speedup COPY TO for partitioned table.
hi.
COPY (select_query) TO is generally slower than a direct
table_beginscan .. table_scan_getnextslot .. table_endscan loop,
especially for partitioned tables.
So in the function DoCopyTo, this patch tries to use
table_beginscan .. table_scan_getnextslot .. table_endscan
for COPY TO when the source table is a partitioned table.
----setup-----
CREATE TABLE t3 (a INT, b int ) PARTITION BY RANGE (a);
create table t3_1 partition of t3 for values from (1) to (11);
create table t3_2 partition of t3 for values from (11) to (15);
insert into t3 select g from generate_series(1, 3) g;
insert into t3 select g from generate_series(11, 11) g;
With the patch, you can now do:
copy t3 to stdout;
On master, you will get:
ERROR: cannot copy from partitioned table "t3"
HINT: Try the COPY (SELECT ...) TO variant.
attached copy_par_regress_test.sql is a simple benchmark sql file,
a partitioned table with 10 partitions, 2 levels of indirection.
The simple benchmark shows around 7.7% improvement in my local environment.
local environment:
PostgreSQL 18devel_debug_build_382092a0cd on x86_64-linux, compiled by
gcc-14.1.0, 64-bit
Attachments:
v1-0001-speedup-COPY-TO-for-partitioned-table.patch (text/x-patch; charset=US-ASCII)
From ba2307cca8bd1d53e0febddaf11c932dae5d31e0 Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Thu, 19 Dec 2024 19:48:12 +0800
Subject: [PATCH v1 1/1] speedup COPY TO for partitioned table.
COPY (select_query) generally slower than
table_beginscan.. table_scan_getnextslot ..table_endscan.
especially for partitioned table.
so using table_beginscan.. table_scan_getnextslot ..table_endscan
for COPY TO when source table is a partitioned table.
environment:
PostgreSQL 18devel_debug_build_382092a0cd on x86_64-linux, compiled by gcc-14.1.0, 64-bit
shows around 7.7% improvement.
---
src/backend/commands/copyto.c | 69 +++++++++++++++++++++++++++++++----
1 file changed, 61 insertions(+), 8 deletions(-)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 161a0f8b0a..42dba2d4a8 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -19,6 +19,8 @@
#include <sys/stat.h>
#include "access/tableam.h"
+#include "access/table.h"
+#include "catalog/pg_inherits.h"
#include "commands/copy.h"
#include "commands/progress.h"
#include "executor/execdesc.h"
@@ -79,6 +81,7 @@ typedef struct CopyToStateData
List *attnumlist; /* integer list of attnums to copy */
char *filename; /* filename, or NULL for STDOUT */
bool is_program; /* is 'filename' a program to popen? */
+ bool is_partitioned; /* is the COPY source relation a partitioned table? */
copy_data_dest_cb data_dest_cb; /* function for writing data */
CopyFormatOptions opts;
@@ -396,13 +399,7 @@ BeginCopyTo(ParseState *pstate,
(errcode(ERRCODE_WRONG_OBJECT_TYPE),
errmsg("cannot copy from sequence \"%s\"",
RelationGetRelationName(rel))));
- else if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
- ereport(ERROR,
- (errcode(ERRCODE_WRONG_OBJECT_TYPE),
- errmsg("cannot copy from partitioned table \"%s\"",
- RelationGetRelationName(rel)),
- errhint("Try the COPY (SELECT ...) TO variant.")));
- else
+ else if (rel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
ereport(ERROR,
(errcode(ERRCODE_WRONG_OBJECT_TYPE),
errmsg("cannot copy from non-table relation \"%s\"",
@@ -426,6 +423,7 @@ BeginCopyTo(ParseState *pstate,
/* Extract options from the statement node tree */
ProcessCopyOptions(pstate, &cstate->opts, false /* is_from */ , options);
+ cstate->is_partitioned = false;
/* Process the source/target relation or query */
if (rel)
{
@@ -433,6 +431,8 @@ BeginCopyTo(ParseState *pstate,
cstate->rel = rel;
+ if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+ cstate->is_partitioned = true;
tupDesc = RelationGetDescr(cstate->rel);
}
else
@@ -847,7 +847,60 @@ DoCopyTo(CopyToState cstate)
}
}
- if (cstate->rel)
+ /*
+ * if COPY TO source table is a partitioned table, then open each
+ * partition and process each individual partition.
+ */
+ if (cstate->rel && cstate->is_partitioned)
+ {
+ List *children = NIL;
+ List *scan_oids = NIL;
+
+ processed = 0;
+ children = find_all_inheritors(RelationGetRelid(cstate->rel),
+ AccessShareLock,
+ NULL);
+
+ foreach_oid(childreloid, children)
+ {
+ char relkind = get_rel_relkind(childreloid);
+
+ if (RELKIND_HAS_PARTITIONS(relkind))
+ continue;
+
+ scan_oids = lappend_oid(scan_oids, childreloid);
+ }
+
+ foreach_oid(scan_oid, scan_oids)
+ {
+ TupleTableSlot *slot;
+ TableScanDesc scandesc;
+ Relation scan_rel;
+
+ scan_rel = table_open(scan_oid, AccessShareLock);
+ scandesc = table_beginscan(scan_rel, GetActiveSnapshot(), 0, NULL);
+ slot = table_slot_create(scan_rel, NULL);
+
+ while (table_scan_getnextslot(scandesc, ForwardScanDirection, slot))
+ {
+ CHECK_FOR_INTERRUPTS();
+
+ /* Deconstruct the tuple ... */
+ slot_getallattrs(slot);
+
+ /* Format and send the data */
+ CopyOneRowTo(cstate, slot);
+
+ pgstat_progress_update_param(PROGRESS_COPY_TUPLES_PROCESSED,
+ ++processed);
+ }
+
+ ExecDropSingleTupleTableSlot(slot);
+ table_endscan(scandesc);
+ table_close(scan_rel, AccessShareLock);
+ }
+ }
+ else if (cstate->rel && !cstate->is_partitioned)
{
TupleTableSlot *slot;
TableScanDesc scandesc;
--
2.34.1
Hi Jian,
Thanks for the patch.
On Thu, 19 Dec 2024 at 15:03, jian he <jian.universality@gmail.com> wrote:
> attached copy_par_regress_test.sql is a simple benchmark sql file,
> a partitioned table with 10 partitions, 2 levels of indirection.
> The simple benchmark shows around 7.7% improvement in my local environment.
I confirm that the patch introduces some improvement in simple cases like
the one you shared. I looked around a bit to understand whether there is an
obvious reason why copying from a partitioned table is not allowed, but
couldn't find one. It seems ok to me.
I realized that while both "COPY <partitioned_table> TO..." and "COPY
(SELECT..) TO..." can return the same set of rows, their order may not be
the same. It seems hard to predict the order in which
find_all_inheritors() returns tables, and that might be
something we should worry about with the patch. What do you think?
Thanks,
--
Melih Mutlu
Microsoft
On Wed, Jan 22, 2025 at 6:54 AM Melih Mutlu <m.melihmutlu@gmail.com> wrote:
> Hi Jian,
> Thanks for the patch.
>
> On Thu, 19 Dec 2024 at 15:03, jian he <jian.universality@gmail.com> wrote:
>> attached copy_par_regress_test.sql is a simple benchmark sql file,
>> a partitioned table with 10 partitions, 2 levels of indirection.
>> The simple benchmark shows around 7.7% improvement in my local environment.
>
> I confirm that the patch introduces some improvement in simple cases like the one you shared. I looked around a bit to understand whether there is an obvious reason why copying from a partitioned table is not allowed, but couldn't find one. It seems ok to me.
hi. melih mutlu
thanks for confirmation.
> I realized that while both "COPY <partitioned_table> TO..." and "COPY (SELECT..) TO..." can return the same set of rows, their orders may not be the same. I guess that it's hard to guess in which order find_all_inheritors() would return tables, and that might be something we should be worried about with the patch. What do you think?
In the call chain
find_all_inheritors -> find_inheritance_children -> find_inheritance_children_extended,
find_inheritance_children_extended has:
"""
if (numoids > 1)
qsort(oidarr, numoids, sizeof(Oid), oid_cmp);
"""
So the find_all_inheritors output order is deterministic?
On Wed, Jan 22, 2025 at 01:54:32AM +0300, Melih Mutlu wrote:
> I confirm that the patch introduces some improvement in simple cases like
> the one you shared. I looked around a bit to understand whether there is an
> obvious reason why copying from a partitioned table is not allowed, but
> couldn't find one. It seems ok to me.
From the original discussion [0], it seems like it was considered a
nonessential part of an otherwise massive patch set. Perhaps it's time to
revisit it.
[0]: /messages/by-id/CAA4eK1LqTqZkPSoonF5_cOz94OUZG9j0PNfLdhi_nPtW82fFVA@mail.gmail.com
--
nathan
Hi,
On Mon, 27 Jan 2025 at 04:47, jian he <jian.universality@gmail.com> wrote:
> in the
> find_all_inheritors->find_inheritance_children->find_inheritance_children_extended
> find_inheritance_children_extended we have
> """
> if (numoids > 1)
> qsort(oidarr, numoids, sizeof(Oid), oid_cmp);
> """
> so the find_all_inheritors output order is deterministic?
You're right that the order in find_all_inheritors is deterministic. But it's
not always the same as the order of SELECT output. You can quickly see
what I mean by running a slightly modified version of the example that you
shared in your first email:
CREATE TABLE t3 (a INT, b int ) PARTITION BY RANGE (a);
-- change the order. first create t3_2 then t3_1
create table t3_2 partition of t3 for values from (11) to (15);
create table t3_1 partition of t3 for values from (1) to (11);
insert into t3 select g from generate_series(1, 3) g;
insert into t3 select g from generate_series(11, 11) g;
And the results of the two different COPY approaches would be:
postgres=# COPY t3 TO STDOUT;
11 \N
1 \N
2 \N
3 \N
postgres=# COPY (SELECT * FROM t3) TO STDOUT;
1 \N
2 \N
3 \N
11 \N
Notice that "COPY t3 TO STDOUT" changes the order: the partition t3_2
was created first, so it has a smaller OID. SELECT, on the other hand,
orders the partitions by partition bounds, not OIDs. That's
why we should always see the same order regardless of the OIDs of
partitions (see create_range_bounds() in partbounds.c if you are interested
in more details). One thing that might be useful in the COPY case would be
using a partition descriptor to get the partitions in the correct order. I
believe something like partdesc->oids (from the PartitionDesc) should give us
the partition OIDs in bound order.
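
To make the difference concrete, here is a minimal standalone C sketch (not PostgreSQL code). The Part struct and the helpers sort_by_oid and sort_by_lower_bound are hypothetical names, and the OID values used with them are made up; they only mimic the case where t3_2 was created before t3_1. Sorting by OID corresponds to what find_all_inheritors does, while sorting by lower bound stands in for the order a partition descriptor would give.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a range partition: its OID and lower bound. */
typedef struct Part
{
    unsigned int oid;   /* assigned at creation time */
    int          lower; /* inclusive lower bound of the range */
} Part;

/* Compare two partitions by OID, ascending. */
static int
cmp_oid(const void *a, const void *b)
{
    const Part *pa = a;
    const Part *pb = b;

    return (pa->oid > pb->oid) - (pa->oid < pb->oid);
}

/* Compare two partitions by lower bound, ascending. */
static int
cmp_lower(const void *a, const void *b)
{
    const Part *pa = a;
    const Part *pb = b;

    return (pa->lower > pb->lower) - (pa->lower < pb->lower);
}

/* Order partitions the way an OID-sorted inheritor list would. */
void
sort_by_oid(Part *parts, size_t n)
{
    assert(parts != NULL);
    qsort(parts, n, sizeof(Part), cmp_oid);
}

/* Order partitions by their range bounds instead. */
void
sort_by_lower_bound(Part *parts, size_t n)
{
    assert(parts != NULL);
    qsort(parts, n, sizeof(Part), cmp_lower);
}
```

With a partition for [11, 15) that has the smaller OID and a partition for [1, 11) that has the larger one, sort_by_oid puts the [11, 15) partition first, while sort_by_lower_bound puts [1, 11) first, which matches the two COPY outputs above.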
Thanks,
--
Melih Mutlu
Microsoft
On Tue, 11 Feb 2025 at 08:10, Melih Mutlu <m.melihmutlu@gmail.com> wrote:
> On Mon, 27 Jan 2025 at 04:47, jian he <jian.universality@gmail.com> wrote:
>> so the find_all_inheritors output order is deterministic?
>
> You're right that order in find_all_inheritors is deterministic. But it's not always the same with the order of SELECT output. You can quickly see what I mean by running a slightly modified version of the example that you shared in your first email:
I think it's fine to raise the question as to whether the order
changing matters, however, I don't personally think there should be
any concerns with this. The main reason I think this is that the
command isn't the same, so the user shouldn't expect the same
behaviour. They'll need to adjust their commands to get the new
behaviour and possibly a different order.
Another few reasons are:
1) In the subquery version, the Append children are sorted by cost, so
the order isn't that predictable in the first place. (See
create_append_path() -> list_sort(subpaths,
append_total_cost_compare))
2) The order in which tuples are copied with COPY TO on non-partitioned
tables isn't that well defined in the first place. Two reasons for this: a)
the heap is a heap and has no defined order; and b) sync scans might
be used, so the scan might start at any point in the heap and circle
back around again to the page prior to the page where the scan
started. (See table_beginscan(), which adds SO_ALLOW_SYNC to the flags.)
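
As a rough illustration of point b), the following standalone C sketch (not PostgreSQL code) shows the block visiting order of a scan that starts somewhere in the middle and wraps around. The function name circular_scan_order is hypothetical, and how the real start block is chosen is not modeled here; the caller simply passes it in.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Fill 'order' with the sequence in which a wrap-around scan over
 * 'nblocks' blocks visits them when it starts at block 'start'.
 * 'order' must have room for 'nblocks' entries.
 */
void
circular_scan_order(unsigned int nblocks, unsigned int start,
                    unsigned int *order)
{
    assert(order != NULL);
    assert(nblocks > 0 && start < nblocks);

    for (unsigned int i = 0; i < nblocks; i++)
        order[i] = (start + i) % nblocks;
}
```

For a 5-block table and a start block of 3, the visiting order is 3, 4, 0, 1, 2, so two runs of the same COPY can emit the same rows in different orders.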
I think the main thing to be concerned about regarding order is to
ensure that all rows from the same partition are copied consecutively,
and that does not seem to be at risk of changing here. This is
important as 3592e0ff9 added caching of the last found partition when
partition lookups continually find the same partition.
David
hi.
A rebased and polished patch is attached, with a test case added.
However, there is one case (the following) where
``COPY partitioned_table TO`` is much slower
(around 25% in some cases) than ``COPY (select * from partitioned_table) TO``.
If a partition's attribute order is not the same as the partitioned table's,
then for each output row we need to create a template TupleTableSlot
and call execute_attr_map_slot.
I didn't find a workaround to reduce this inefficiency.
Since master doesn't support ``COPY partitioned_table TO`` at all,
I guess this slow case is acceptable?
----------- the following case is far slower than ``COPY (select * from pp) TO``
drop table if exists pp;
CREATE TABLE pp (id INT, val int ) PARTITION BY RANGE (id);
create table pp_1 (val int, id int);
create table pp_2 (val int, id int);
ALTER TABLE pp ATTACH PARTITION pp_1 FOR VALUES FROM (1) TO (5);
ALTER TABLE pp ATTACH PARTITION pp_2 FOR VALUES FROM (5) TO (10);
insert into pp select g, 10 + g from generate_series(1,9) g;
copy pp to stdout(header);
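
The column remapping in this case can be pictured with a small standalone C sketch (not PostgreSQL code). The function remap_row and the plain int arrays are hypothetical stand-ins for execute_attr_map_slot and tuple slots. The map uses 1-based source positions with 0 meaning "no source column", which is an assumption made for this sketch. The output buffer is supplied by the caller, so under this assumption it could be allocated once per partition rather than once per row.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Copy one row from partition column order into parent column order.
 * map[i] is the 1-based position in 'in' that feeds output column i;
 * 0 means "no source column", which this sketch renders as 0.
 * 'out' is caller-owned and can be reused for every row.
 */
void
remap_row(const int *map, size_t natts, const int *in, int *out)
{
    assert(map != NULL && in != NULL && out != NULL);

    for (size_t i = 0; i < natts; i++)
        out[i] = (map[i] > 0) ? in[map[i] - 1] : 0;
}
```

For pp (id, val) and pp_1 (val, id), the map would be {2, 1}: a partition row (val = 11, id = 1) comes out as (id = 1, val = 11).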
Attachments:
v2-0001-support-COPY-partitioned_table-TO.patch (text/x-patch; charset=US-ASCII)
From eaf3869c4fb5fdacba5efd562f73ca06a0251ac4 Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Fri, 7 Mar 2025 18:39:56 +0800
Subject: [PATCH v2 1/1] support "COPY partitioned_table TO"
drop table if exists pp;
CREATE TABLE pp (id INT, val int ) PARTITION BY RANGE (id);
create table pp_1 (val int, id int);
create table pp_2 (val int, id int);
ALTER TABLE pp ATTACH PARTITION pp_1 FOR VALUES FROM (1) TO (5);
ALTER TABLE pp ATTACH PARTITION pp_2 FOR VALUES FROM (5) TO (10);
insert into pp select g, 10 + g from generate_series(1,9) g;
copy pp to stdout(header);
the above case is much slower (around 25% in some cases) than
``COPY (select * from pp) to stdout(header);``
but this is still a new feature, since master does not
support ``COPY (partitioned_table)``.
discussion: https://postgr.es/m/CACJufxEZt+G19Ors3bQUq-42-61__C=y5k2wk=sHEFRusu7=iQ@mail.gmail.com
---
src/backend/commands/copyto.c | 80 ++++++++++++++++++++++++++---
src/test/regress/expected/copy2.out | 16 ++++++
src/test/regress/sql/copy2.sql | 11 ++++
3 files changed, 101 insertions(+), 6 deletions(-)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 84a3f3879a8..966b6741530 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -19,6 +19,8 @@
#include <sys/stat.h>
#include "access/tableam.h"
+#include "access/table.h"
+#include "catalog/pg_inherits.h"
#include "commands/copyapi.h"
#include "commands/progress.h"
#include "executor/execdesc.h"
@@ -82,6 +84,7 @@ typedef struct CopyToStateData
List *attnumlist; /* integer list of attnums to copy */
char *filename; /* filename, or NULL for STDOUT */
bool is_program; /* is 'filename' a program to popen? */
+ List *partitions; /* oid list of partition oid for copy to */
copy_data_dest_cb data_dest_cb; /* function for writing data */
CopyFormatOptions opts;
@@ -643,6 +646,8 @@ BeginCopyTo(ParseState *pstate,
PROGRESS_COPY_COMMAND_TO,
0
};
+ List *children = NIL;
+ List *scan_oids = NIL;
if (rel != NULL && rel->rd_rel->relkind != RELKIND_RELATION)
{
@@ -670,11 +675,19 @@ BeginCopyTo(ParseState *pstate,
errmsg("cannot copy from sequence \"%s\"",
RelationGetRelationName(rel))));
else if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
- ereport(ERROR,
- (errcode(ERRCODE_WRONG_OBJECT_TYPE),
- errmsg("cannot copy from partitioned table \"%s\"",
- RelationGetRelationName(rel)),
- errhint("Try the COPY (SELECT ...) TO variant.")));
+ {
+ children = find_all_inheritors(RelationGetRelid(rel),
+ AccessShareLock,
+ NULL);
+ foreach_oid(childreloid, children)
+ {
+ char relkind = get_rel_relkind(childreloid);
+ if (RELKIND_HAS_PARTITIONS(relkind))
+ continue;
+
+ scan_oids = lappend_oid(scan_oids, childreloid);
+ }
+ }
else
ereport(ERROR,
(errcode(ERRCODE_WRONG_OBJECT_TYPE),
@@ -710,6 +723,7 @@ BeginCopyTo(ParseState *pstate,
cstate->rel = rel;
tupDesc = RelationGetDescr(cstate->rel);
+ cstate->partitions = list_copy(scan_oids);
}
else
{
@@ -1066,7 +1080,61 @@ DoCopyTo(CopyToState cstate)
cstate->routine->CopyToStart(cstate, tupDesc);
- if (cstate->rel)
+ /*
+ * if COPY TO source table is a partitioned table, then open each
+ * partition and process each individual partition.
+ */
+ if (cstate->rel && cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+ {
+ processed = 0;
+
+ foreach_oid(scan_oid, cstate->partitions)
+ {
+ TupleTableSlot *slot;
+ TableScanDesc scandesc;
+ Relation scan_rel;
+ TupleDesc scan_tupdesc;
+ AttrMap *map;
+ TupleTableSlot *root_slot = NULL;
+ TupleTableSlot *original_slot = NULL;
+
+ scan_rel = table_open(scan_oid, AccessShareLock);
+ scan_tupdesc = RelationGetDescr(scan_rel);
+ map = build_attrmap_by_name_if_req(tupDesc, scan_tupdesc, false);
+
+ scandesc = table_beginscan(scan_rel, GetActiveSnapshot(), 0, NULL);
+ slot = table_slot_create(scan_rel, NULL);
+
+ while (table_scan_getnextslot(scandesc, ForwardScanDirection, slot))
+ {
+ CHECK_FOR_INTERRUPTS();
+
+ /* Deconstruct the tuple ... */
+ if (map != NULL)
+ {
+ original_slot = slot;
+ root_slot = MakeSingleTupleTableSlot(tupDesc, &TTSOpsBufferHeapTuple);
+ slot = execute_attr_map_slot(map, slot, root_slot);
+ }
+ else
+ slot_getallattrs(slot);
+
+ /* Format and send the data */
+ CopyOneRowTo(cstate, slot);
+
+ pgstat_progress_update_param(PROGRESS_COPY_TUPLES_PROCESSED,
+ ++processed);
+
+ if (original_slot != NULL)
+ ExecDropSingleTupleTableSlot(original_slot);
+ };
+
+ ExecDropSingleTupleTableSlot(slot);
+ table_endscan(scandesc);
+ table_close(scan_rel, AccessShareLock);
+ }
+ }
+ else if (cstate->rel)
{
TupleTableSlot *slot;
TableScanDesc scandesc;
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 64ea33aeae8..dcd97ae45b7 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -929,3 +929,19 @@ truncate copy_default;
-- DEFAULT cannot be used in COPY TO
copy (select 1 as test) TO stdout with (default '\D');
ERROR: COPY DEFAULT cannot be used with COPY TO
+-- COPY TO with partitioned table
+CREATE TABLE pp (id INT, val int ) PARTITION BY RANGE (id);
+create table pp_1 (val int, id int);
+create table pp_2 (val int, id int);
+ALTER TABLE pp ATTACH PARTITION pp_1 FOR VALUES FROM (1) TO (3);
+ALTER TABLE pp ATTACH PARTITION pp_2 FOR VALUES FROM (3) TO (7);
+insert into pp select g, 10 + g from generate_series(1,6) g;
+copy pp to stdout(header);
+id val
+1 11
+2 12
+3 13
+4 14
+5 15
+6 16
+DROP TABLE PP;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 45273557ce0..56d7c1ffc8f 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -707,3 +707,14 @@ truncate copy_default;
-- DEFAULT cannot be used in COPY TO
copy (select 1 as test) TO stdout with (default '\D');
+
+-- COPY TO with partitioned table
+CREATE TABLE pp (id INT, val int ) PARTITION BY RANGE (id);
+create table pp_1 (val int, id int);
+create table pp_2 (val int, id int);
+ALTER TABLE pp ATTACH PARTITION pp_1 FOR VALUES FROM (1) TO (3);
+ALTER TABLE pp ATTACH PARTITION pp_2 FOR VALUES FROM (3) TO (7);
+insert into pp select g, 10 + g from generate_series(1,6) g;
+copy pp to stdout(header);
+
+DROP TABLE PP;
\ No newline at end of file
--
2.34.1
On Fri, Mar 7, 2025 at 6:41 PM jian he <jian.universality@gmail.com> wrote:
> hi.
> rebased and polished patch attached, test case added.
hi.
I realized I also need to update the <title>Notes</title> section of
doc/src/sgml/ref/copy.sgml.
The current note reads:
COPY TO can be used only with plain tables, not views, and does not
copy rows from child tables or child partitions.
For example, COPY table TO copies the same rows as SELECT * FROM ONLY table.
The syntax COPY (SELECT * FROM table) TO ... can be used to dump all
of the rows in an inheritance hierarchy, partitioned table, or view.
after my change:
------------
COPY TO can be used only with plain tables, not views, and does not
copy rows from child tables,
however COPY TO can be used with partitioned tables.
For example, in a table inheritance hierarchy, COPY table TO copies
the same rows as SELECT * FROM ONLY table.
The syntax COPY (SELECT * FROM table) TO ... can be used to dump all
of the rows in an inheritance hierarchy, or view.
------------
Attachments:
v3-0001-support-COPY-partitioned_table-TO.patch (text/x-patch; charset=US-ASCII)
From f7376da47f51e385c5496b0cf7eb52e5340a39b9 Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Tue, 11 Mar 2025 20:51:30 +0800
Subject: [PATCH v3 1/1] support "COPY partitioned_table TO"
drop table if exists pp;
CREATE TABLE pp (id INT, val int ) PARTITION BY RANGE (id);
create table pp_1 (val int, id int);
create table pp_2 (val int, id int);
ALTER TABLE pp ATTACH PARTITION pp_1 FOR VALUES FROM (1) TO (5);
ALTER TABLE pp ATTACH PARTITION pp_2 FOR VALUES FROM (5) TO (10);
insert into pp select g, 10 + g from generate_series(1,9) g;
copy pp to stdout(header);
the above case is much slower (around 25% in some cases) than
``COPY (select * from pp) to stdout(header);``
but this is still a new feature, since master does not
support ``COPY (partitioned_table)``.
discussion: https://postgr.es/m/CACJufxEZt+G19Ors3bQUq-42-61__C=y5k2wk=sHEFRusu7=iQ@mail.gmail.com
---
doc/src/sgml/ref/copy.sgml | 8 +--
src/backend/commands/copyto.c | 80 ++++++++++++++++++++++++++---
src/test/regress/expected/copy2.out | 16 ++++++
src/test/regress/sql/copy2.sql | 11 ++++
4 files changed, 105 insertions(+), 10 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index df093da97c5..f86e0b7ec35 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -521,15 +521,15 @@ COPY <replaceable class="parameter">count</replaceable>
<para>
<command>COPY TO</command> can be used only with plain
- tables, not views, and does not copy rows from child tables
- or child partitions. For example, <literal>COPY <replaceable
+ tables, not views, and does not copy rows from child tables,
+ however <command>COPY TO</command> can be used with partitioned tables.
+ For example, in a table inheritance hierarchy, <literal>COPY <replaceable
class="parameter">table</replaceable> TO</literal> copies
the same rows as <literal>SELECT * FROM ONLY <replaceable
class="parameter">table</replaceable></literal>.
The syntax <literal>COPY (SELECT * FROM <replaceable
class="parameter">table</replaceable>) TO ...</literal> can be used to
- dump all of the rows in an inheritance hierarchy, partitioned table,
- or view.
+ dump all of the rows in an inheritance hierarchy, or view.
</para>
<para>
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 84a3f3879a8..966b6741530 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -19,6 +19,8 @@
#include <sys/stat.h>
#include "access/tableam.h"
+#include "access/table.h"
+#include "catalog/pg_inherits.h"
#include "commands/copyapi.h"
#include "commands/progress.h"
#include "executor/execdesc.h"
@@ -82,6 +84,7 @@ typedef struct CopyToStateData
List *attnumlist; /* integer list of attnums to copy */
char *filename; /* filename, or NULL for STDOUT */
bool is_program; /* is 'filename' a program to popen? */
+ List *partitions; /* oid list of partition oid for copy to */
copy_data_dest_cb data_dest_cb; /* function for writing data */
CopyFormatOptions opts;
@@ -643,6 +646,8 @@ BeginCopyTo(ParseState *pstate,
PROGRESS_COPY_COMMAND_TO,
0
};
+ List *children = NIL;
+ List *scan_oids = NIL;
if (rel != NULL && rel->rd_rel->relkind != RELKIND_RELATION)
{
@@ -670,11 +675,19 @@ BeginCopyTo(ParseState *pstate,
errmsg("cannot copy from sequence \"%s\"",
RelationGetRelationName(rel))));
else if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
- ereport(ERROR,
- (errcode(ERRCODE_WRONG_OBJECT_TYPE),
- errmsg("cannot copy from partitioned table \"%s\"",
- RelationGetRelationName(rel)),
- errhint("Try the COPY (SELECT ...) TO variant.")));
+ {
+ children = find_all_inheritors(RelationGetRelid(rel),
+ AccessShareLock,
+ NULL);
+ foreach_oid(childreloid, children)
+ {
+ char relkind = get_rel_relkind(childreloid);
+ if (RELKIND_HAS_PARTITIONS(relkind))
+ continue;
+
+ scan_oids = lappend_oid(scan_oids, childreloid);
+ }
+ }
else
ereport(ERROR,
(errcode(ERRCODE_WRONG_OBJECT_TYPE),
@@ -710,6 +723,7 @@ BeginCopyTo(ParseState *pstate,
cstate->rel = rel;
tupDesc = RelationGetDescr(cstate->rel);
+ cstate->partitions = list_copy(scan_oids);
}
else
{
@@ -1066,7 +1080,61 @@ DoCopyTo(CopyToState cstate)
cstate->routine->CopyToStart(cstate, tupDesc);
- if (cstate->rel)
+ /*
+ * if COPY TO source table is a partitioned table, then open each
+ * partition and process each individual partition.
+ */
+ if (cstate->rel && cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+ {
+ processed = 0;
+
+ foreach_oid(scan_oid, cstate->partitions)
+ {
+ TupleTableSlot *slot;
+ TableScanDesc scandesc;
+ Relation scan_rel;
+ TupleDesc scan_tupdesc;
+ AttrMap *map;
+ TupleTableSlot *root_slot = NULL;
+ TupleTableSlot *original_slot = NULL;
+
+ scan_rel = table_open(scan_oid, AccessShareLock);
+ scan_tupdesc = RelationGetDescr(scan_rel);
+ map = build_attrmap_by_name_if_req(tupDesc, scan_tupdesc, false);
+
+ scandesc = table_beginscan(scan_rel, GetActiveSnapshot(), 0, NULL);
+ slot = table_slot_create(scan_rel, NULL);
+
+ while (table_scan_getnextslot(scandesc, ForwardScanDirection, slot))
+ {
+ CHECK_FOR_INTERRUPTS();
+
+ /* Deconstruct the tuple ... */
+ if (map != NULL)
+ {
+ original_slot = slot;
+ root_slot = MakeSingleTupleTableSlot(tupDesc, &TTSOpsBufferHeapTuple);
+ slot = execute_attr_map_slot(map, slot, root_slot);
+ }
+ else
+ slot_getallattrs(slot);
+
+ /* Format and send the data */
+ CopyOneRowTo(cstate, slot);
+
+ pgstat_progress_update_param(PROGRESS_COPY_TUPLES_PROCESSED,
+ ++processed);
+
+ if (original_slot != NULL)
+ ExecDropSingleTupleTableSlot(original_slot);
+ };
+
+ ExecDropSingleTupleTableSlot(slot);
+ table_endscan(scandesc);
+ table_close(scan_rel, AccessShareLock);
+ }
+ }
+ else if (cstate->rel)
{
TupleTableSlot *slot;
TableScanDesc scandesc;
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 64ea33aeae8..dcd97ae45b7 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -929,3 +929,19 @@ truncate copy_default;
-- DEFAULT cannot be used in COPY TO
copy (select 1 as test) TO stdout with (default '\D');
ERROR: COPY DEFAULT cannot be used with COPY TO
+-- COPY TO with partitioned table
+CREATE TABLE pp (id INT, val int ) PARTITION BY RANGE (id);
+create table pp_1 (val int, id int);
+create table pp_2 (val int, id int);
+ALTER TABLE pp ATTACH PARTITION pp_1 FOR VALUES FROM (1) TO (3);
+ALTER TABLE pp ATTACH PARTITION pp_2 FOR VALUES FROM (3) TO (7);
+insert into pp select g, 10 + g from generate_series(1,6) g;
+copy pp to stdout(header);
+id val
+1 11
+2 12
+3 13
+4 14
+5 15
+6 16
+DROP TABLE PP;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 45273557ce0..56d7c1ffc8f 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -707,3 +707,14 @@ truncate copy_default;
-- DEFAULT cannot be used in COPY TO
copy (select 1 as test) TO stdout with (default '\D');
+
+-- COPY TO with partitioned table
+CREATE TABLE pp (id INT, val int ) PARTITION BY RANGE (id);
+create table pp_1 (val int, id int);
+create table pp_2 (val int, id int);
+ALTER TABLE pp ATTACH PARTITION pp_1 FOR VALUES FROM (1) TO (3);
+ALTER TABLE pp ATTACH PARTITION pp_2 FOR VALUES FROM (3) TO (7);
+insert into pp select g, 10 + g from generate_series(1,6) g;
+copy pp to stdout(header);
+
+DROP TABLE PP;
\ No newline at end of file
--
2.34.1
Hi Jian,
I tested this patch: COPY sales TO STDOUT; took ~1.909 ms, compared with ~3.80 ms for the older COPY (SELECT * FROM sales) TO STDOUT; method. This eliminates query planning overhead and significantly speeds up data export from partitioned tables.
Our test setup involved creating a partitioned table (sales), inserting 500 records, and comparing execution times.
-- Step 1: Create Partitioned Parent Table
CREATE TABLE sales (
id SERIAL NOT NULL,
sale_date DATE NOT NULL,
region TEXT NOT NULL,
amount NUMERIC(10,2) NOT NULL,
category TEXT NOT NULL,
PRIMARY KEY (id, sale_date,region)
) PARTITION BY RANGE (sale_date);
-- Step 2: Create Range Partitions (2023 & 2024)
CREATE TABLE sales_2023 PARTITION OF sales
FOR VALUES FROM ('2023-01-01') TO ('2024-01-01')
PARTITION BY HASH (region);
CREATE TABLE sales_2024 PARTITION OF sales
FOR VALUES FROM ('2024-01-01') TO ('2025-01-01')
PARTITION BY HASH (region);
-- Step 3: Create Hash Partitions for sales_2023
CREATE TABLE sales_2023_part1 PARTITION OF sales_2023
FOR VALUES WITH (MODULUS 2, REMAINDER 0);
CREATE TABLE sales_2023_part2 PARTITION OF sales_2023
FOR VALUES WITH (MODULUS 2, REMAINDER 1);
-- Step 4: Create Hash Partitions for sales_2024
CREATE TABLE sales_2024_part1 PARTITION OF sales_2024
FOR VALUES WITH (MODULUS 2, REMAINDER 0);
CREATE TABLE sales_2024_part2 PARTITION OF sales_2024
FOR VALUES WITH (MODULUS 2, REMAINDER 1);
-- Step 5: Insert Data **AFTER** Creating Partitions
INSERT INTO sales (sale_date, region, amount, category)
SELECT
('2023-01-01'::DATE + (random() * 730)::int) AS sale_date, -- Random date in 2023-2024 range
CASE WHEN random() > 0.5 THEN 'North' ELSE 'South' END AS region, -- Random region
(random() * 1000)::NUMERIC(10,2) AS amount, -- Random amount (0 to 1000)
CASE WHEN random() > 0.5 THEN 'Electronics' ELSE 'Furniture' END AS category -- Random category
FROM generate_series(1, 500);
COPY SALES TO STDOUT; ~ 1.909ms
COPY (SELECT * FROM SALES) TO STDOUT; ~ 3.80ms
This change is recommended for better performance when exporting PostgreSQL partitioned tables.
On Tue, 11 Mar 2025 at 18:24, jian he <jian.universality@gmail.com> wrote:
> after my change:
> ------------
> COPY TO can be used only with plain tables, not views, and does not
> copy rows from child tables,
> however COPY TO can be used with partitioned tables.
> For example, in a table inheritance hierarchy, COPY table TO copies
> the same rows as SELECT * FROM ONLY table.
> The syntax COPY (SELECT * FROM table) TO ... can be used to dump all
> of the rows in an inheritance hierarchy, or view.
> ------------
I found an issue with the patch:
-- Setup
CREATE SERVER myserver FOREIGN DATA WRAPPER postgres_fdw OPTIONS
(dbname 'testdb', port '5432');
CREATE TABLE t1(id int) PARTITION BY RANGE(id);
CREATE TABLE part1 PARTITION OF t1 FOR VALUES FROM (0) TO (5);
CREATE TABLE part2 PARTITION OF t1 FOR VALUES FROM (5) TO (15)
PARTITION BY RANGE(id);
CREATE FOREIGN TABLE part2_1 PARTITION OF part2 FOR VALUES FROM (10)
TO (15) SERVER myserver;
-- Create table in testdb
create table part2_1(id int);
-- Copy partitioned table data
postgres=# copy t1 to stdout(header);
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
Stack trace for the same is:
#0 table_beginscan (rel=0x72b109f9aad8, snapshot=0x5daafa77e000,
nkeys=0, key=0x0) at ../../../src/include/access/tableam.h:883
#1 0x00005daadf89eb9b in DoCopyTo (cstate=0x5daafa71e278) at copyto.c:1105
#2 0x00005daadf8913f4 in DoCopy (pstate=0x5daafa6c5fc0,
stmt=0x5daafa6f20c8, stmt_location=0, stmt_len=25,
processed=0x7ffd3799c2f0) at copy.c:316
#3 0x00005daadfc7a770 in standard_ProcessUtility
(pstmt=0x5daafa6f21e8, queryString=0x5daafa6f15c0 "copy t1 to
stdout(header);", readOnlyTree=false,
context=PROCESS_UTILITY_TOPLEVEL,
params=0x0, queryEnv=0x0, dest=0x5daafa6f25a8, qc=0x7ffd3799c660)
at utility.c:738
(gdb) f 0
#0 table_beginscan (rel=0x72b109f9aad8, snapshot=0x5daafa77e000,
nkeys=0, key=0x0) at ../../../src/include/access/tableam.h:883
883 return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
The table access method is not available in this case:
(gdb) p *rel->rd_tableam
Cannot access memory at address 0x0
This failure happens when we call table_beginscan on the foreign table part2_1.
Regards,
Vignesh
On Fri, Mar 21, 2025 at 6:13 PM vignesh C <vignesh21@gmail.com> wrote:
I found an issue with the patch:
-- Setup
CREATE SERVER myserver FOREIGN DATA WRAPPER postgres_fdw OPTIONS
(dbname 'testdb', port '5432');
CREATE TABLE t1(id int) PARTITION BY RANGE(id);
CREATE TABLE part1 PARTITION OF t1 FOR VALUES FROM (0) TO (5);
CREATE TABLE part2 PARTITION OF t1 FOR VALUES FROM (5) TO (15)
PARTITION BY RANGE(id);
CREATE FOREIGN TABLE part2_1 PARTITION OF part2 FOR VALUES FROM (10)
TO (15) SERVER myserver;
-- Create table in testdb
create table part2_1(id int);
-- Copy partitioned table data
postgres=# copy t1 to stdout(header);
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
I manually tested:
sequence, temp table, materialized view, index, view,
composite types, partitioned indexes.
None of the above can be attached to a partitioned table.
What we should care about is unlogged tables and foreign tables
attached as partitions.
An unlogged table should work just fine.
For foreign tables, we should error out.
So:
copy t1 to stdout(header);
ERROR: cannot copy from foreign table "t1"
DETAIL: partition "t1" is a foreign table
HINT: Try the COPY (SELECT ...) TO variant.
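The filtering rule described above (skip partitioned tables, reject foreign tables, scan everything else) can be sketched in standalone C. This is only a simplified model under stated assumptions: ChildRel and collect_scan_targets are hypothetical names, and the relkind letters 'r', 'p' and 'f' follow the pg_class.relkind convention for plain, partitioned and foreign tables.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one member of the inheritance tree. */
typedef struct ChildRel
{
    const char *name;
    char        relkind;   /* 'r' plain table, 'p' partitioned, 'f' foreign */
} ChildRel;

/*
 * Collect the relations that would actually be scanned into out[]
 * (caller provides room for nchildren entries).
 * Returns the number collected, or -1 with *offender set when a
 * foreign table is found anywhere in the tree.
 */
int
collect_scan_targets(const ChildRel *children, size_t nchildren,
                     const ChildRel **out, const ChildRel **offender)
{
    int n = 0;

    *offender = NULL;
    for (size_t i = 0; i < nchildren; i++)
    {
        if (children[i].relkind == 'f')
        {
            *offender = &children[i];   /* report the child, not the root */
            return -1;
        }
        if (children[i].relkind == 'p')
            continue;                   /* partitioned tables hold no rows */
        out[n++] = &children[i];
    }
    return n;
}
```

Returning the offending child separately makes it straightforward to name the right relation in the error message.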
Attachments:
v4-0001-support-COPY-partitioned_table-TO.patch (application/x-patch)
From a2db87abfe0e1a4dda0ace47c65a9778f29fe5f2 Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Fri, 28 Mar 2025 11:01:52 +0800
Subject: [PATCH v4 1/1] support COPY partitioned_table TO
CREATE TABLE pp (id INT, val int ) PARTITION BY RANGE (id);
create table pp_1 (val int, id int);
create table pp_2 (val int, id int);
ALTER TABLE pp ATTACH PARTITION pp_1 FOR VALUES FROM (1) TO (5);
ALTER TABLE pp ATTACH PARTITION pp_2 FOR VALUES FROM (5) TO (10);
insert into pp select g, 10 + g from generate_series(1,9) g;
copy pp to stdout(header);
The above case is much slower (around 25% in some cases) than
``COPY (select * from pp) to stdout(header);``,
because of column remapping. But this is still a new
feature, since master does not support ``COPY (partitioned_table)``.
reviewed by: vignesh C <vignesh21@gmail.com>
reviewed by: David Rowley <dgrowleyml@gmail.com>
reviewed by: Melih Mutlu <m.melihmutlu@gmail.com>
discussion: https://postgr.es/m/CACJufxEZt+G19Ors3bQUq-42-61__C=y5k2wk=sHEFRusu7=iQ@mail.gmail.com
commitfest entry: https://commitfest.postgresql.org/patch/5467/
---
doc/src/sgml/ref/copy.sgml | 8 +--
src/backend/commands/copyto.c | 89 +++++++++++++++++++++++++++--
src/test/regress/expected/copy2.out | 16 ++++++
src/test/regress/sql/copy2.sql | 11 ++++
4 files changed, 114 insertions(+), 10 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index df093da97c5..f86e0b7ec35 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -521,15 +521,15 @@ COPY <replaceable class="parameter">count</replaceable>
<para>
<command>COPY TO</command> can be used only with plain
- tables, not views, and does not copy rows from child tables
- or child partitions. For example, <literal>COPY <replaceable
+ tables, not views, and does not copy rows from child tables,
+ however <command>COPY TO</command> can be used with partitioned tables.
+ For example, in a table inheritance hierarchy, <literal>COPY <replaceable
class="parameter">table</replaceable> TO</literal> copies
the same rows as <literal>SELECT * FROM ONLY <replaceable
class="parameter">table</replaceable></literal>.
The syntax <literal>COPY (SELECT * FROM <replaceable
class="parameter">table</replaceable>) TO ...</literal> can be used to
- dump all of the rows in an inheritance hierarchy, partitioned table,
- or view.
+ dump all of the rows in an inheritance hierarchy, or view.
</para>
<para>
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 84a3f3879a8..0973dc9c14b 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -19,6 +19,8 @@
#include <sys/stat.h>
#include "access/tableam.h"
+#include "access/table.h"
+#include "catalog/pg_inherits.h"
#include "commands/copyapi.h"
#include "commands/progress.h"
#include "executor/execdesc.h"
@@ -82,6 +84,7 @@ typedef struct CopyToStateData
List *attnumlist; /* integer list of attnums to copy */
char *filename; /* filename, or NULL for STDOUT */
bool is_program; /* is 'filename' a program to popen? */
+ List *partitions; /* oid list of partition oid for copy to */
copy_data_dest_cb data_dest_cb; /* function for writing data */
CopyFormatOptions opts;
@@ -643,6 +646,8 @@ BeginCopyTo(ParseState *pstate,
PROGRESS_COPY_COMMAND_TO,
0
};
+ List *children = NIL;
+ List *scan_oids = NIL;
if (rel != NULL && rel->rd_rel->relkind != RELKIND_RELATION)
{
@@ -670,11 +675,28 @@ BeginCopyTo(ParseState *pstate,
errmsg("cannot copy from sequence \"%s\"",
RelationGetRelationName(rel))));
else if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
- ereport(ERROR,
- (errcode(ERRCODE_WRONG_OBJECT_TYPE),
- errmsg("cannot copy from partitioned table \"%s\"",
- RelationGetRelationName(rel)),
- errhint("Try the COPY (SELECT ...) TO variant.")));
+ {
+ children = find_all_inheritors(RelationGetRelid(rel),
+ AccessShareLock,
+ NULL);
+ foreach_oid(childreloid, children)
+ {
+ char relkind = get_rel_relkind(childreloid);
+
+ if (relkind == RELKIND_FOREIGN_TABLE)
+ ereport(ERROR,
+ errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("cannot copy from foreign table \"%s\"",
+ RelationGetRelationName(rel)),
+ errdetail("partition \"%s\" is a foreign table", RelationGetRelationName(rel)),
+ errhint("Try the COPY (SELECT ...) TO variant."));
+
+ if (RELKIND_HAS_PARTITIONS(relkind))
+ continue;
+
+ scan_oids = lappend_oid(scan_oids, childreloid);
+ }
+ }
else
ereport(ERROR,
(errcode(ERRCODE_WRONG_OBJECT_TYPE),
@@ -710,6 +732,7 @@ BeginCopyTo(ParseState *pstate,
cstate->rel = rel;
tupDesc = RelationGetDescr(cstate->rel);
+ cstate->partitions = list_copy(scan_oids);
}
else
{
@@ -1066,7 +1089,61 @@ DoCopyTo(CopyToState cstate)
cstate->routine->CopyToStart(cstate, tupDesc);
- if (cstate->rel)
+ /*
+ * if COPY TO source table is a partitioned table, then open each
+ * partition and process each individual partition.
+ */
+ if (cstate->rel && cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+ {
+ processed = 0;
+
+ foreach_oid(scan_oid, cstate->partitions)
+ {
+ TupleTableSlot *slot;
+ TableScanDesc scandesc;
+ Relation scan_rel;
+ TupleDesc scan_tupdesc;
+ AttrMap *map;
+ TupleTableSlot *root_slot = NULL;
+ TupleTableSlot *original_slot = NULL;
+
+ scan_rel = table_open(scan_oid, AccessShareLock);
+ scan_tupdesc = RelationGetDescr(scan_rel);
+ map = build_attrmap_by_name_if_req(tupDesc, scan_tupdesc, false);
+
+ scandesc = table_beginscan(scan_rel, GetActiveSnapshot(), 0, NULL);
+ slot = table_slot_create(scan_rel, NULL);
+
+ while (table_scan_getnextslot(scandesc, ForwardScanDirection, slot))
+ {
+ CHECK_FOR_INTERRUPTS();
+
+ /* Deconstruct the tuple ... */
+ if (map != NULL)
+ {
+ original_slot = slot;
+ root_slot = MakeSingleTupleTableSlot(tupDesc, &TTSOpsBufferHeapTuple);
+ slot = execute_attr_map_slot(map, slot, root_slot);
+ }
+ else
+ slot_getallattrs(slot);
+
+ /* Format and send the data */
+ CopyOneRowTo(cstate, slot);
+
+ pgstat_progress_update_param(PROGRESS_COPY_TUPLES_PROCESSED,
+ ++processed);
+
+ if (original_slot != NULL)
+ ExecDropSingleTupleTableSlot(original_slot);
+ };
+
+ ExecDropSingleTupleTableSlot(slot);
+ table_endscan(scandesc);
+ table_close(scan_rel, AccessShareLock);
+ }
+ }
+ else if (cstate->rel)
{
TupleTableSlot *slot;
TableScanDesc scandesc;
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 64ea33aeae8..dcd97ae45b7 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -929,3 +929,19 @@ truncate copy_default;
-- DEFAULT cannot be used in COPY TO
copy (select 1 as test) TO stdout with (default '\D');
ERROR: COPY DEFAULT cannot be used with COPY TO
+-- COPY TO with partitioned table
+CREATE TABLE pp (id INT, val int ) PARTITION BY RANGE (id);
+create table pp_1 (val int, id int);
+create table pp_2 (val int, id int);
+ALTER TABLE pp ATTACH PARTITION pp_1 FOR VALUES FROM (1) TO (3);
+ALTER TABLE pp ATTACH PARTITION pp_2 FOR VALUES FROM (3) TO (7);
+insert into pp select g, 10 + g from generate_series(1,6) g;
+copy pp to stdout(header);
+id val
+1 11
+2 12
+3 13
+4 14
+5 15
+6 16
+DROP TABLE PP;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 45273557ce0..56d7c1ffc8f 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -707,3 +707,14 @@ truncate copy_default;
-- DEFAULT cannot be used in COPY TO
copy (select 1 as test) TO stdout with (default '\D');
+
+-- COPY TO with partitioned table
+CREATE TABLE pp (id INT, val int ) PARTITION BY RANGE (id);
+create table pp_1 (val int, id int);
+create table pp_2 (val int, id int);
+ALTER TABLE pp ATTACH PARTITION pp_1 FOR VALUES FROM (1) TO (3);
+ALTER TABLE pp ATTACH PARTITION pp_2 FOR VALUES FROM (3) TO (7);
+insert into pp select g, 10 + g from generate_series(1,6) g;
+copy pp to stdout(header);
+
+DROP TABLE PP;
\ No newline at end of file
--
2.34.1
hi.
I made a mistake.
The regress test sql file should have a new line at the end of the file.
Attachments:
v5-0001-support-COPY-partitioned_table-TO.patch (text/x-patch; charset=US-ASCII)
From a4c643ac3a9f40bbdf07dcacc38527ef6e86f1bc Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Fri, 28 Mar 2025 11:05:53 +0800
Subject: [PATCH v5 1/1] support COPY partitioned_table TO
CREATE TABLE pp (id INT, val int ) PARTITION BY RANGE (id);
create table pp_1 (val int, id int);
create table pp_2 (val int, id int);
ALTER TABLE pp ATTACH PARTITION pp_1 FOR VALUES FROM (1) TO (5);
ALTER TABLE pp ATTACH PARTITION pp_2 FOR VALUES FROM (5) TO (10);
insert into pp select g, 10 + g from generate_series(1,9) g;
copy pp to stdout(header);
The above case is much slower (around 25% in some cases) than
``COPY (select * from pp) to stdout(header);``,
because of column remapping. But this is still a new
feature, since master does not support ``COPY (partitioned_table)``.
reviewed by: vignesh C <vignesh21@gmail.com>
reviewed by: David Rowley <dgrowleyml@gmail.com>
reviewed by: Melih Mutlu <m.melihmutlu@gmail.com>
discussion: https://postgr.es/m/CACJufxEZt+G19Ors3bQUq-42-61__C=y5k2wk=sHEFRusu7=iQ@mail.gmail.com
commitfest entry: https://commitfest.postgresql.org/patch/5467/
---
doc/src/sgml/ref/copy.sgml | 8 +--
src/backend/commands/copyto.c | 89 +++++++++++++++++++++++++++--
src/test/regress/expected/copy2.out | 16 ++++++
src/test/regress/sql/copy2.sql | 11 ++++
4 files changed, 114 insertions(+), 10 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index df093da97c5..f86e0b7ec35 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -521,15 +521,15 @@ COPY <replaceable class="parameter">count</replaceable>
<para>
<command>COPY TO</command> can be used only with plain
- tables, not views, and does not copy rows from child tables
- or child partitions. For example, <literal>COPY <replaceable
+ tables, not views, and does not copy rows from child tables,
+ however <command>COPY TO</command> can be used with partitioned tables.
+ For example, in a table inheritance hierarchy, <literal>COPY <replaceable
class="parameter">table</replaceable> TO</literal> copies
the same rows as <literal>SELECT * FROM ONLY <replaceable
class="parameter">table</replaceable></literal>.
The syntax <literal>COPY (SELECT * FROM <replaceable
class="parameter">table</replaceable>) TO ...</literal> can be used to
- dump all of the rows in an inheritance hierarchy, partitioned table,
- or view.
+ dump all of the rows in an inheritance hierarchy, or view.
</para>
<para>
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 84a3f3879a8..0973dc9c14b 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -19,6 +19,8 @@
#include <sys/stat.h>
#include "access/tableam.h"
+#include "access/table.h"
+#include "catalog/pg_inherits.h"
#include "commands/copyapi.h"
#include "commands/progress.h"
#include "executor/execdesc.h"
@@ -82,6 +84,7 @@ typedef struct CopyToStateData
List *attnumlist; /* integer list of attnums to copy */
char *filename; /* filename, or NULL for STDOUT */
bool is_program; /* is 'filename' a program to popen? */
+ List *partitions; /* oid list of partition oid for copy to */
copy_data_dest_cb data_dest_cb; /* function for writing data */
CopyFormatOptions opts;
@@ -643,6 +646,8 @@ BeginCopyTo(ParseState *pstate,
PROGRESS_COPY_COMMAND_TO,
0
};
+ List *children = NIL;
+ List *scan_oids = NIL;
if (rel != NULL && rel->rd_rel->relkind != RELKIND_RELATION)
{
@@ -670,11 +675,28 @@ BeginCopyTo(ParseState *pstate,
errmsg("cannot copy from sequence \"%s\"",
RelationGetRelationName(rel))));
else if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
- ereport(ERROR,
- (errcode(ERRCODE_WRONG_OBJECT_TYPE),
- errmsg("cannot copy from partitioned table \"%s\"",
- RelationGetRelationName(rel)),
- errhint("Try the COPY (SELECT ...) TO variant.")));
+ {
+ children = find_all_inheritors(RelationGetRelid(rel),
+ AccessShareLock,
+ NULL);
+ foreach_oid(childreloid, children)
+ {
+ char relkind = get_rel_relkind(childreloid);
+
+ if (relkind == RELKIND_FOREIGN_TABLE)
+ ereport(ERROR,
+ errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("cannot copy from foreign table \"%s\"",
+ RelationGetRelationName(rel)),
+ errdetail("partition \"%s\" is a foreign table", RelationGetRelationName(rel)),
+ errhint("Try the COPY (SELECT ...) TO variant."));
+
+ if (RELKIND_HAS_PARTITIONS(relkind))
+ continue;
+
+ scan_oids = lappend_oid(scan_oids, childreloid);
+ }
+ }
else
ereport(ERROR,
(errcode(ERRCODE_WRONG_OBJECT_TYPE),
@@ -710,6 +732,7 @@ BeginCopyTo(ParseState *pstate,
cstate->rel = rel;
tupDesc = RelationGetDescr(cstate->rel);
+ cstate->partitions = list_copy(scan_oids);
}
else
{
@@ -1066,7 +1089,61 @@ DoCopyTo(CopyToState cstate)
cstate->routine->CopyToStart(cstate, tupDesc);
- if (cstate->rel)
+ /*
+ * if COPY TO source table is a partitioned table, then open each
+ * partition and process each individual partition.
+ */
+ if (cstate->rel && cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+ {
+ processed = 0;
+
+ foreach_oid(scan_oid, cstate->partitions)
+ {
+ TupleTableSlot *slot;
+ TableScanDesc scandesc;
+ Relation scan_rel;
+ TupleDesc scan_tupdesc;
+ AttrMap *map;
+ TupleTableSlot *root_slot = NULL;
+ TupleTableSlot *original_slot = NULL;
+
+ scan_rel = table_open(scan_oid, AccessShareLock);
+ scan_tupdesc = RelationGetDescr(scan_rel);
+ map = build_attrmap_by_name_if_req(tupDesc, scan_tupdesc, false);
+
+ scandesc = table_beginscan(scan_rel, GetActiveSnapshot(), 0, NULL);
+ slot = table_slot_create(scan_rel, NULL);
+
+ while (table_scan_getnextslot(scandesc, ForwardScanDirection, slot))
+ {
+ CHECK_FOR_INTERRUPTS();
+
+ /* Deconstruct the tuple ... */
+ if (map != NULL)
+ {
+ original_slot = slot;
+ root_slot = MakeSingleTupleTableSlot(tupDesc, &TTSOpsBufferHeapTuple);
+ slot = execute_attr_map_slot(map, slot, root_slot);
+ }
+ else
+ slot_getallattrs(slot);
+
+ /* Format and send the data */
+ CopyOneRowTo(cstate, slot);
+
+ pgstat_progress_update_param(PROGRESS_COPY_TUPLES_PROCESSED,
+ ++processed);
+
+ if (original_slot != NULL)
+ ExecDropSingleTupleTableSlot(original_slot);
+ };
+
+ ExecDropSingleTupleTableSlot(slot);
+ table_endscan(scandesc);
+ table_close(scan_rel, AccessShareLock);
+ }
+ }
+ else if (cstate->rel)
{
TupleTableSlot *slot;
TableScanDesc scandesc;
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 64ea33aeae8..dcd97ae45b7 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -929,3 +929,19 @@ truncate copy_default;
-- DEFAULT cannot be used in COPY TO
copy (select 1 as test) TO stdout with (default '\D');
ERROR: COPY DEFAULT cannot be used with COPY TO
+-- COPY TO with partitioned table
+CREATE TABLE pp (id INT, val int ) PARTITION BY RANGE (id);
+create table pp_1 (val int, id int);
+create table pp_2 (val int, id int);
+ALTER TABLE pp ATTACH PARTITION pp_1 FOR VALUES FROM (1) TO (3);
+ALTER TABLE pp ATTACH PARTITION pp_2 FOR VALUES FROM (3) TO (7);
+insert into pp select g, 10 + g from generate_series(1,6) g;
+copy pp to stdout(header);
+id val
+1 11
+2 12
+3 13
+4 14
+5 15
+6 16
+DROP TABLE PP;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 45273557ce0..ba984388248 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -707,3 +707,14 @@ truncate copy_default;
-- DEFAULT cannot be used in COPY TO
copy (select 1 as test) TO stdout with (default '\D');
+
+-- COPY TO with partitioned table
+CREATE TABLE pp (id INT, val int ) PARTITION BY RANGE (id);
+create table pp_1 (val int, id int);
+create table pp_2 (val int, id int);
+ALTER TABLE pp ATTACH PARTITION pp_1 FOR VALUES FROM (1) TO (3);
+ALTER TABLE pp ATTACH PARTITION pp_2 FOR VALUES FROM (3) TO (7);
+insert into pp select g, 10 + g from generate_series(1,6) g;
+copy pp to stdout(header);
+
+DROP TABLE PP;
--
2.34.1
On Fri, 28 Mar 2025 at 08:39, jian he <jian.universality@gmail.com> wrote:
hi.
I made a mistake.
The regress test sql file should have a new line at the end of the file.
Couple of suggestions:
1) Can you add some comments here, this is the only code that is
different from the regular table handling code:
+ scan_tupdesc = RelationGetDescr(scan_rel);
+ map = build_attrmap_by_name_if_req(tupDesc, scan_tupdesc, false);
2) You can see if you can try to make a function and call it from both
the partitioned table and regular table case, that way you could
reduce the duplicate code:
+ while (table_scan_getnextslot(scandesc, ForwardScanDirection, slot))
+ {
+ CHECK_FOR_INTERRUPTS();
+
+ /* Deconstruct the tuple ... */
+ if (map != NULL)
+ {
+ original_slot = slot;
+ root_slot = MakeSingleTupleTableSlot(tupDesc, &TTSOpsBufferHeapTuple);
+ slot = execute_attr_map_slot(map, slot, root_slot);
+ }
+ else
+ slot_getallattrs(slot);
+
+ /* Format and send the data */
+ CopyOneRowTo(cstate, slot);
+
+ pgstat_progress_update_param(PROGRESS_COPY_TUPLES_PROCESSED,
+ ++processed);
+
+ if (original_slot != NULL)
+ ExecDropSingleTupleTableSlot(original_slot);
+ };
Regards,
Vignesh
On Fri, Mar 28, 2025 at 9:03 PM vignesh C <vignesh21@gmail.com> wrote:
On Fri, 28 Mar 2025 at 08:39, jian he <jian.universality@gmail.com> wrote:
hi.
I made a mistake.
The regress test sql file should have a new line at the end of the file.
Couple of suggestions:
1) Can you add some comments here, this is the only code that is
different from the regular table handling code:
+ scan_tupdesc = RelationGetDescr(scan_rel);
+ map = build_attrmap_by_name_if_req(tupDesc, scan_tupdesc, false);
I have added the following comments around build_attrmap_by_name_if_req.
/*
* partition's rowtype might differ from the root table's. We must
* convert it back to the root table's rowtype as we are export
* partitioned table data here.
*/
2) You can see if you can try to make a function and call it from both
the partitioned table and regular table case, that way you could
reduce the duplicate code:
+ while (table_scan_getnextslot(scandesc, ForwardScanDirection, slot))
+ {
+ CHECK_FOR_INTERRUPTS();
+
+ /* Deconstruct the tuple ... */
+ if (map != NULL)
+ {
+ original_slot = slot;
+ root_slot = MakeSingleTupleTableSlot(tupDesc, &TTSOpsBufferHeapTuple);
+ slot = execute_attr_map_slot(map, slot, root_slot);
+ }
+ else
+ slot_getallattrs(slot);
+
+ /* Format and send the data */
+ CopyOneRowTo(cstate, slot);
+
+ pgstat_progress_update_param(PROGRESS_COPY_TUPLES_PROCESSED,
+ ++processed);
+
+ if (original_slot != NULL)
+ ExecDropSingleTupleTableSlot(original_slot);
+ };
I consolidated it into a new function: CopyThisRelTo.
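The rowtype conversion mentioned in the comment above can be illustrated with the pp / pp_1 test case, where the root table has columns (id, val) and the partition has (val, id). The following standalone C sketch is only a simplified model of name-based remapping. build_map_by_name and remap_row are hypothetical helpers, not the PostgreSQL functions, which also have to deal with details such as dropped columns that are ignored here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * For each root column i, store in map[i] the index of the partition
 * column with the same name.
 * Returns 0 on success, -1 if a root column has no match.
 */
int
build_map_by_name(const char *const *root_cols, const char *const *part_cols,
                  size_t ncols, size_t *map)
{
    for (size_t i = 0; i < ncols; i++)
    {
        size_t j;

        for (j = 0; j < ncols; j++)
        {
            if (strcmp(root_cols[i], part_cols[j]) == 0)
                break;
        }
        if (j == ncols)
            return -1;
        map[i] = j;
    }
    return 0;
}

/* Reorder one partition row into the root table's column order. */
void
remap_row(const size_t *map, size_t ncols, const int *part_row, int *root_row)
{
    for (size_t i = 0; i < ncols; i++)
        root_row[i] = part_row[map[i]];
}
```

The map is built once per partition and then applied to every row, which is where the extra per-row cost noted in the commit message would come from.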
Attachments:
v6-0001-support-COPY-partitioned_table-TO.patch (text/x-patch; charset=US-ASCII)
From 3036d31163ffea4c0a605d9411bc46af3b1b6394 Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Sat, 29 Mar 2025 14:32:30 +0800
Subject: [PATCH v6 1/1] support COPY partitioned_table TO
CREATE TABLE pp (id INT, val int ) PARTITION BY RANGE (id);
create table pp_1 (val int, id int);
create table pp_2 (val int, id int);
ALTER TABLE pp ATTACH PARTITION pp_1 FOR VALUES FROM (1) TO (5);
ALTER TABLE pp ATTACH PARTITION pp_2 FOR VALUES FROM (5) TO (10);
insert into pp select g, 10 + g from generate_series(1,9) g;
copy pp to stdout(header);
The above case is much slower (around 25% in some cases) than
``COPY (select * from pp) to stdout(header);``,
because of column remapping. But this is still a new
feature, since master does not support ``COPY (partitioned_table)``.
reviewed by: vignesh C <vignesh21@gmail.com>
reviewed by: David Rowley <dgrowleyml@gmail.com>
reviewed by: Melih Mutlu <m.melihmutlu@gmail.com>
discussion: https://postgr.es/m/CACJufxEZt+G19Ors3bQUq-42-61__C=y5k2wk=sHEFRusu7=iQ@mail.gmail.com
commitfest entry: https://commitfest.postgresql.org/patch/5467/
---
doc/src/sgml/ref/copy.sgml | 8 +-
src/backend/commands/copyto.c | 135 ++++++++++++++++++++++------
src/test/regress/expected/copy2.out | 16 ++++
src/test/regress/sql/copy2.sql | 11 +++
4 files changed, 139 insertions(+), 31 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index df093da97c5..f86e0b7ec35 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -521,15 +521,15 @@ COPY <replaceable class="parameter">count</replaceable>
<para>
<command>COPY TO</command> can be used only with plain
- tables, not views, and does not copy rows from child tables
- or child partitions. For example, <literal>COPY <replaceable
+ tables, not views, and does not copy rows from child tables,
+ however <command>COPY TO</command> can be used with partitioned tables.
+ For example, in a table inheritance hierarchy, <literal>COPY <replaceable
class="parameter">table</replaceable> TO</literal> copies
the same rows as <literal>SELECT * FROM ONLY <replaceable
class="parameter">table</replaceable></literal>.
The syntax <literal>COPY (SELECT * FROM <replaceable
class="parameter">table</replaceable>) TO ...</literal> can be used to
- dump all of the rows in an inheritance hierarchy, partitioned table,
- or view.
+ dump all of the rows in an inheritance hierarchy, or view.
</para>
<para>
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 84a3f3879a8..facf87eb344 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -19,6 +19,8 @@
#include <sys/stat.h>
#include "access/tableam.h"
+#include "access/table.h"
+#include "catalog/pg_inherits.h"
#include "commands/copyapi.h"
#include "commands/progress.h"
#include "executor/execdesc.h"
@@ -82,6 +84,7 @@ typedef struct CopyToStateData
List *attnumlist; /* integer list of attnums to copy */
char *filename; /* filename, or NULL for STDOUT */
bool is_program; /* is 'filename' a program to popen? */
+ List *partitions; /* oid list of partition oid for copy to */
copy_data_dest_cb data_dest_cb; /* function for writing data */
CopyFormatOptions opts;
@@ -116,6 +119,8 @@ static void CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot);
static void CopyAttributeOutText(CopyToState cstate, const char *string);
static void CopyAttributeOutCSV(CopyToState cstate, const char *string,
bool use_quote);
+static void CopyThisRelTo(CopyToState cstate, Relation rel,
+ Relation root_rel, uint64 *processed);
/* built-in format-specific routines */
static void CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc);
@@ -643,6 +648,8 @@ BeginCopyTo(ParseState *pstate,
PROGRESS_COPY_COMMAND_TO,
0
};
+ List *children = NIL;
+ List *scan_oids = NIL;
if (rel != NULL && rel->rd_rel->relkind != RELKIND_RELATION)
{
@@ -670,11 +677,28 @@ BeginCopyTo(ParseState *pstate,
errmsg("cannot copy from sequence \"%s\"",
RelationGetRelationName(rel))));
else if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
- ereport(ERROR,
- (errcode(ERRCODE_WRONG_OBJECT_TYPE),
- errmsg("cannot copy from partitioned table \"%s\"",
- RelationGetRelationName(rel)),
- errhint("Try the COPY (SELECT ...) TO variant.")));
+ {
+ children = find_all_inheritors(RelationGetRelid(rel),
+ AccessShareLock,
+ NULL);
+ foreach_oid(childreloid, children)
+ {
+ char relkind = get_rel_relkind(childreloid);
+
+ if (relkind == RELKIND_FOREIGN_TABLE)
+ ereport(ERROR,
+ errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("cannot copy from foreign table \"%s\"",
+ RelationGetRelationName(rel)),
+ errdetail("partition \"%s\" is a foreign table", RelationGetRelationName(rel)),
+ errhint("Try the COPY (SELECT ...) TO variant."));
+
+ if (RELKIND_HAS_PARTITIONS(relkind))
+ continue;
+
+ scan_oids = lappend_oid(scan_oids, childreloid);
+ }
+ }
else
ereport(ERROR,
(errcode(ERRCODE_WRONG_OBJECT_TYPE),
@@ -710,6 +734,7 @@ BeginCopyTo(ParseState *pstate,
cstate->rel = rel;
tupDesc = RelationGetDescr(cstate->rel);
+ cstate->partitions = list_copy(scan_oids);
}
else
{
@@ -1066,35 +1091,28 @@ DoCopyTo(CopyToState cstate)
cstate->routine->CopyToStart(cstate, tupDesc);
- if (cstate->rel)
+ /*
+ * if COPY TO source table is a partitioned table, then open each
+ * partition and process each individual partition.
+ */
+ if (cstate->rel && cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
{
- TupleTableSlot *slot;
- TableScanDesc scandesc;
-
- scandesc = table_beginscan(cstate->rel, GetActiveSnapshot(), 0, NULL);
- slot = table_slot_create(cstate->rel, NULL);
-
processed = 0;
- while (table_scan_getnextslot(scandesc, ForwardScanDirection, slot))
+ foreach_oid(scan_oid, cstate->partitions)
{
- CHECK_FOR_INTERRUPTS();
+ Relation scan_rel;
- /* Deconstruct the tuple ... */
- slot_getallattrs(slot);
+ scan_rel = table_open(scan_oid, AccessShareLock);
- /* Format and send the data */
- CopyOneRowTo(cstate, slot);
+ CopyThisRelTo(cstate, scan_rel, cstate->rel, &processed);
- /*
- * Increment the number of processed tuples, and report the
- * progress.
- */
- pgstat_progress_update_param(PROGRESS_COPY_TUPLES_PROCESSED,
- ++processed);
+ table_close(scan_rel, AccessShareLock);
}
-
- ExecDropSingleTupleTableSlot(slot);
- table_endscan(scandesc);
+ }
+ else if (cstate->rel)
+ {
+ processed = 0;
+ CopyThisRelTo(cstate, cstate->rel, NULL, &processed);
}
else
{
@@ -1113,6 +1131,69 @@ DoCopyTo(CopyToState cstate)
return processed;
}
+/*
+ * rel: the relation to be copied to.
+ * root_rel: if not null, then the COPY TO partitioned rel.
+ * processed: number of tuple processed.
+*/
+static void
+CopyThisRelTo(CopyToState cstate, Relation rel, Relation root_rel, uint64 *processed)
+{
+ TupleTableSlot *slot;
+ TableScanDesc scandesc;
+ AttrMap *map = NULL;
+ TupleTableSlot *root_slot = NULL;
+ TupleTableSlot *original_slot = NULL;
+ TupleDesc scan_tupdesc;
+ TupleDesc rootdesc = NULL;
+
+ scan_tupdesc = RelationGetDescr(rel);
+ scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+ slot = table_slot_create(rel, NULL);
+
+ /*
+ * partition's rowtype might differ from the root table's. We must
+ * convert it back to the root table's rowtype as we are export
+ * partitioned table data here.
+ */
+ if (root_rel != NULL)
+ {
+ rootdesc = RelationGetDescr(root_rel);
+ map = build_attrmap_by_name_if_req(rootdesc, scan_tupdesc, false);
+ }
+
+ while (table_scan_getnextslot(scandesc, ForwardScanDirection, slot))
+ {
+ CHECK_FOR_INTERRUPTS();
+
+ /* Deconstruct the tuple ... */
+ if (map != NULL)
+ {
+ original_slot = slot;
+ root_slot = MakeSingleTupleTableSlot(rootdesc, &TTSOpsBufferHeapTuple);
+ slot = execute_attr_map_slot(map, slot, root_slot);
+ }
+ else
+ slot_getallattrs(slot);
+
+ /* Format and send the data */
+ CopyOneRowTo(cstate, slot);
+
+ /*
+ * Increment the number of processed tuples, and report the
+ * progress.
+ */
+ pgstat_progress_update_param(PROGRESS_COPY_TUPLES_PROCESSED,
+ ++(*processed));
+
+ if (original_slot != NULL)
+ ExecDropSingleTupleTableSlot(original_slot);
+ }
+
+ ExecDropSingleTupleTableSlot(slot);
+ table_endscan(scandesc);
+}
+
/*
* Emit one row during DoCopyTo().
*/
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 64ea33aeae8..dcd97ae45b7 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -929,3 +929,19 @@ truncate copy_default;
-- DEFAULT cannot be used in COPY TO
copy (select 1 as test) TO stdout with (default '\D');
ERROR: COPY DEFAULT cannot be used with COPY TO
+-- COPY TO with partitioned table
+CREATE TABLE pp (id INT, val int ) PARTITION BY RANGE (id);
+create table pp_1 (val int, id int);
+create table pp_2 (val int, id int);
+ALTER TABLE pp ATTACH PARTITION pp_1 FOR VALUES FROM (1) TO (3);
+ALTER TABLE pp ATTACH PARTITION pp_2 FOR VALUES FROM (3) TO (7);
+insert into pp select g, 10 + g from generate_series(1,6) g;
+copy pp to stdout(header);
+id val
+1 11
+2 12
+3 13
+4 14
+5 15
+6 16
+DROP TABLE PP;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 45273557ce0..ba984388248 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -707,3 +707,14 @@ truncate copy_default;
-- DEFAULT cannot be used in COPY TO
copy (select 1 as test) TO stdout with (default '\D');
+
+-- COPY TO with partitioned table
+CREATE TABLE pp (id INT, val int ) PARTITION BY RANGE (id);
+create table pp_1 (val int, id int);
+create table pp_2 (val int, id int);
+ALTER TABLE pp ATTACH PARTITION pp_1 FOR VALUES FROM (1) TO (3);
+ALTER TABLE pp ATTACH PARTITION pp_2 FOR VALUES FROM (3) TO (7);
+insert into pp select g, 10 + g from generate_series(1,6) g;
+copy pp to stdout(header);
+
+DROP TABLE PP;
--
2.34.1
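In the v6 patch, one running counter is passed by pointer into every per-relation call, so the row count accumulates across partitions. A minimal standalone sketch of that pattern follows. Partition, copy_one_rel and copy_partitioned are hypothetical names, and rows are modelled as plain ints rather than tuples.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one leaf partition's contents. */
typedef struct Partition
{
    const int *rows;
    size_t     nrows;
} Partition;

/* Emit every row of one relation, advancing the shared counter. */
void
copy_one_rel(const Partition *part, int *out, uint64_t *processed)
{
    for (size_t i = 0; i < part->nrows; i++)
    {
        out[*processed] = part->rows[i];
        ++(*processed);
    }
}

/* Walk all partitions in order; returns the total number of rows copied. */
uint64_t
copy_partitioned(const Partition *parts, size_t nparts, int *out)
{
    uint64_t processed = 0;

    for (size_t i = 0; i < nparts; i++)
        copy_one_rel(&parts[i], out, &processed);
    return processed;
}
```

With two partitions holding ids 1..2 and 3..6, as in the regression test, the total comes to six rows emitted in partition order.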
On Sat, 29 Mar 2025 at 12:08, jian he <jian.universality@gmail.com> wrote:
I consolidated it into a new function: CopyThisRelTo.
Few comments:
1) Here the error message is not correct, we are printing the original
table from where copy was done which is a regular table and not a
foreign table, we should use childreloid instead of rel.
+ if (relkind == RELKIND_FOREIGN_TABLE)
+ ereport(ERROR,
+ errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("cannot copy from foreign table \"%s\"",
+ RelationGetRelationName(rel)),
+ errdetail("partition \"%s\" is a foreign table", RelationGetRelationName(rel)),
+ errhint("Try the COPY (SELECT ...) TO variant."));
In the error detail you can include the original table too.
postgres=# copy t1 to stdout(header);
ERROR: cannot copy from foreign table "t1"
DETAIL: partition "t1" is a foreign table
HINT: Try the COPY (SELECT ...) TO variant.
2) 2.a) I felt the comment should be "then copy partitioned rel to
destination":
+ * rel: the relation to be copied to.
+ * root_rel: if not null, then the COPY TO partitioned rel.
+ * processed: number of tuple processed.
+*/
+static void
+CopyThisRelTo(CopyToState cstate, Relation rel, Relation root_rel, uint64 *processed)
+{
+ TupleTableSlot *slot;
2.b) You can put the processed argument on the next line for better readability.
3) There is a small indentation issue here:
+ /*
+ * partition's rowtype might differ from the root table's. We must
+ * convert it back to the root table's rowtype as we are export
+ * partitioned table data here.
+ */
+ if (root_rel != NULL)
Regards,
Vignesh
On Sun, Mar 30, 2025 at 9:14 AM vignesh C <vignesh21@gmail.com> wrote:
On Sat, 29 Mar 2025 at 12:08, jian he <jian.universality@gmail.com> wrote:
I consolidated it into a new function: CopyThisRelTo.
Few comments:
1) Here the error message is not correct, we are printing the original
table from where copy was done which is a regular table and not a
foreign table, we should use childreloid instead of rel.
+ if (relkind == RELKIND_FOREIGN_TABLE)
+ ereport(ERROR,
+ errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("cannot copy from foreign table \"%s\"",
+ RelationGetRelationName(rel)),
+ errdetail("partition \"%s\" is a foreign table", RelationGetRelationName(rel)),
+ errhint("Try the COPY (SELECT ...) TO variant."));
In the error detail you can include the original table too.
I changed it to:
if (relkind == RELKIND_FOREIGN_TABLE)
{
    char *relation_name;

    relation_name = get_rel_name(childreloid);
    ereport(ERROR,
            errcode(ERRCODE_WRONG_OBJECT_TYPE),
            errmsg("cannot copy from foreign table \"%s\"", relation_name),
            errdetail("Partition \"%s\" is a foreign table in the partitioned table \"%s.%s\"",
                      relation_name,
                      RelationGetRelationName(rel),
                      get_namespace_name(rel->rd_rel->relnamespace)),
            errhint("Try the COPY (SELECT ...) TO variant."));
}
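To make the intent of this check easier to follow outside the backend, here is a minimal standalone sketch of the filtering step in plain C. It is not PostgreSQL code: child_t and collect_scan_targets are hypothetical names made up for illustration, and only the relkind letters ('r' plain table, 'p' partitioned table, 'f' foreign table) are taken from pg_class.relkind. It assumes the children list already contains the whole partition tree, as find_all_inheritors() returns in the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one entry of the list built by find_all_inheritors(). */
typedef struct
{
    const char *name;
    char        relkind;   /* 'r' plain table, 'p' partitioned table, 'f' foreign table */
} child_t;

/*
 * Collect the names of the children that can be scanned directly.
 *
 * Returns the number of names stored in out[] (which must have room for n
 * entries), or -1 if a foreign table is found; in that case *bad points at
 * the offending child so the caller can report it by name.
 */
static int
collect_scan_targets(const child_t *children, size_t n,
                     const char **out, const child_t **bad)
{
    int count = 0;

    assert(children != NULL && out != NULL && bad != NULL);

    for (size_t i = 0; i < n; i++)
    {
        if (children[i].relkind == 'f')
        {
            /* A foreign partition cannot be scanned this way: give up. */
            *bad = &children[i];
            return -1;
        }

        if (children[i].relkind == 'p')
            continue;           /* partitioned parents hold no rows themselves */

        out[count++] = children[i].name;
    }

    return count;
}
```

With a two-level tree such as pp -> pp_1 -> pp_15 and pp -> pp_2 -> pp_510, only pp_15 and pp_510 end up in out[]; if any child were a foreign table, the caller would get -1 and the name to put in the error message.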
2) 2.a) I felt the comment should be "then copy partitioned rel to destination":
+ * rel: the relation to be copied to.
+ * root_rel: if not null, then the COPY TO partitioned rel.
+ * processed: number of tuple processed.
+*/
+static void
+CopyThisRelTo(CopyToState cstate, Relation rel, Relation root_rel, uint64 *processed)
+{
+ TupleTableSlot *slot;
I changed it to:
+/*
+ * rel: the relation to be copied to.
+ * root_rel: if not null, then the COPY partitioned relation to destination.
+ * processed: number of tuple processed.
+*/
+static void
+CopyThisRelTo(CopyToState cstate, Relation rel, Relation root_rel,
+ uint64 *processed)
3) There is a small indentation issue here:
+ /*
+ * partition's rowtype might differ from the root table's. We must
+ * convert it back to the root table's rowtype as we are export
+ * partitioned table data here.
+ */
+ if (root_rel != NULL)
I am not so sure.
Could you check whether the attached patch still has the indentation issue?
Attachments:
v7-0001-support-COPY-partitioned_table-TO.patch (text/x-patch)
From 4e501b7a8e67cffbf6432bdb43985b21d2b635b8 Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Sun, 30 Mar 2025 21:47:12 +0800
Subject: [PATCH v7 1/1] support COPY partitioned_table TO
CREATE TABLE pp (id INT, val int ) PARTITION BY RANGE (id);
create table pp_1 (val int, id int);
create table pp_2 (val int, id int);
ALTER TABLE pp ATTACH PARTITION pp_1 FOR VALUES FROM (1) TO (5);
ALTER TABLE pp ATTACH PARTITION pp_2 FOR VALUES FROM (5) TO (10);
insert into pp select g, 10 + g from generate_series(1,9) g;
copy pp to stdout(header);
The above case is much slower (around 25% in some cases) than
``COPY (select * from pp) to stdout(header);``,
because of column remapping. But this is still a new
feature, since master does not support ``COPY (partitioned_table)``.
reviewed by: vignesh C <vignesh21@gmail.com>
reviewed by: David Rowley <dgrowleyml@gmail.com>
reviewed by: Melih Mutlu <m.melihmutlu@gmail.com>
discussion: https://postgr.es/m/CACJufxEZt+G19Ors3bQUq-42-61__C=y5k2wk=sHEFRusu7=iQ@mail.gmail.com
commitfest entry: https://commitfest.postgresql.org/patch/5467/
---
doc/src/sgml/ref/copy.sgml | 8 +-
src/backend/commands/copyto.c | 143 ++++++++++++++++++++++------
src/test/regress/expected/copy2.out | 16 ++++
src/test/regress/sql/copy2.sql | 11 +++
4 files changed, 147 insertions(+), 31 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index df093da97c5..f86e0b7ec35 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -521,15 +521,15 @@ COPY <replaceable class="parameter">count</replaceable>
<para>
<command>COPY TO</command> can be used only with plain
- tables, not views, and does not copy rows from child tables
- or child partitions. For example, <literal>COPY <replaceable
+ tables, not views, and does not copy rows from child tables,
+ however <command>COPY TO</command> can be used with partitioned tables.
+ For example, in a table inheritance hierarchy, <literal>COPY <replaceable
class="parameter">table</replaceable> TO</literal> copies
the same rows as <literal>SELECT * FROM ONLY <replaceable
class="parameter">table</replaceable></literal>.
The syntax <literal>COPY (SELECT * FROM <replaceable
class="parameter">table</replaceable>) TO ...</literal> can be used to
- dump all of the rows in an inheritance hierarchy, partitioned table,
- or view.
+ dump all of the rows in an inheritance hierarchy, or view.
</para>
<para>
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 84a3f3879a8..b75bbfea6a4 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -19,6 +19,8 @@
#include <sys/stat.h>
#include "access/tableam.h"
+#include "access/table.h"
+#include "catalog/pg_inherits.h"
#include "commands/copyapi.h"
#include "commands/progress.h"
#include "executor/execdesc.h"
@@ -82,6 +84,7 @@ typedef struct CopyToStateData
List *attnumlist; /* integer list of attnums to copy */
char *filename; /* filename, or NULL for STDOUT */
bool is_program; /* is 'filename' a program to popen? */
+ List *partitions; /* oid list of partition oid for copy to */
copy_data_dest_cb data_dest_cb; /* function for writing data */
CopyFormatOptions opts;
@@ -116,6 +119,8 @@ static void CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot);
static void CopyAttributeOutText(CopyToState cstate, const char *string);
static void CopyAttributeOutCSV(CopyToState cstate, const char *string,
bool use_quote);
+static void CopyThisRelTo(CopyToState cstate, Relation rel,
+ Relation root_rel, uint64 *processed);
/* built-in format-specific routines */
static void CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc);
@@ -643,6 +648,8 @@ BeginCopyTo(ParseState *pstate,
PROGRESS_COPY_COMMAND_TO,
0
};
+ List *children = NIL;
+ List *scan_oids = NIL;
if (rel != NULL && rel->rd_rel->relkind != RELKIND_RELATION)
{
@@ -670,11 +677,35 @@ BeginCopyTo(ParseState *pstate,
errmsg("cannot copy from sequence \"%s\"",
RelationGetRelationName(rel))));
else if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
- ereport(ERROR,
- (errcode(ERRCODE_WRONG_OBJECT_TYPE),
- errmsg("cannot copy from partitioned table \"%s\"",
- RelationGetRelationName(rel)),
- errhint("Try the COPY (SELECT ...) TO variant.")));
+ {
+ children = find_all_inheritors(RelationGetRelid(rel),
+ AccessShareLock,
+ NULL);
+
+ foreach_oid(childreloid, children)
+ {
+ char relkind = get_rel_relkind(childreloid);
+
+ if (relkind == RELKIND_FOREIGN_TABLE)
+ {
+ char *relation_name;
+
+ relation_name = get_rel_name(childreloid);
+ ereport(ERROR,
+ errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("cannot copy from foreign table \"%s\"", relation_name),
+ errdetail("Partition \"%s\" is a foreign table in the partitioned table \"%s.%s\"",
+ relation_name, RelationGetRelationName(rel),
+ get_namespace_name(rel->rd_rel->relnamespace)),
+ errhint("Try the COPY (SELECT ...) TO variant."));
+ }
+
+ if (RELKIND_HAS_PARTITIONS(relkind))
+ continue;
+
+ scan_oids = lappend_oid(scan_oids, childreloid);
+ }
+ }
else
ereport(ERROR,
(errcode(ERRCODE_WRONG_OBJECT_TYPE),
@@ -710,6 +741,7 @@ BeginCopyTo(ParseState *pstate,
cstate->rel = rel;
tupDesc = RelationGetDescr(cstate->rel);
+ cstate->partitions = list_copy(scan_oids);
}
else
{
@@ -1066,35 +1098,28 @@ DoCopyTo(CopyToState cstate)
cstate->routine->CopyToStart(cstate, tupDesc);
- if (cstate->rel)
+ /*
+ * if COPY TO source table is a partitioned table, then open each
+ * partition and process each individual partition.
+ */
+ if (cstate->rel && cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
{
- TupleTableSlot *slot;
- TableScanDesc scandesc;
-
- scandesc = table_beginscan(cstate->rel, GetActiveSnapshot(), 0, NULL);
- slot = table_slot_create(cstate->rel, NULL);
-
processed = 0;
- while (table_scan_getnextslot(scandesc, ForwardScanDirection, slot))
+ foreach_oid(scan_oid, cstate->partitions)
{
- CHECK_FOR_INTERRUPTS();
+ Relation scan_rel;
- /* Deconstruct the tuple ... */
- slot_getallattrs(slot);
+ scan_rel = table_open(scan_oid, AccessShareLock);
- /* Format and send the data */
- CopyOneRowTo(cstate, slot);
+ CopyThisRelTo(cstate, scan_rel, cstate->rel, &processed);
- /*
- * Increment the number of processed tuples, and report the
- * progress.
- */
- pgstat_progress_update_param(PROGRESS_COPY_TUPLES_PROCESSED,
- ++processed);
+ table_close(scan_rel, AccessShareLock);
}
-
- ExecDropSingleTupleTableSlot(slot);
- table_endscan(scandesc);
+ }
+ else if (cstate->rel)
+ {
+ processed = 0;
+ CopyThisRelTo(cstate, cstate->rel, NULL, &processed);
}
else
{
@@ -1113,6 +1138,70 @@ DoCopyTo(CopyToState cstate)
return processed;
}
+/*
+ * rel: the relation to be copied to.
+ * root_rel: if not null, then the COPY partitioned relation to destination.
+ * processed: number of tuple processed.
+*/
+static void
+CopyThisRelTo(CopyToState cstate, Relation rel, Relation root_rel,
+ uint64 *processed)
+{
+ TupleTableSlot *slot;
+ TableScanDesc scandesc;
+ AttrMap *map = NULL;
+ TupleTableSlot *root_slot = NULL;
+ TupleTableSlot *original_slot = NULL;
+ TupleDesc scan_tupdesc;
+ TupleDesc rootdesc = NULL;
+
+ scan_tupdesc = RelationGetDescr(rel);
+ scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+ slot = table_slot_create(rel, NULL);
+
+ /*
+ * partition's rowtype might differ from the root table's. We must
+ * convert it back to the root table's rowtype as we are export
+ * partitioned table data here.
+ */
+ if (root_rel != NULL)
+ {
+ rootdesc = RelationGetDescr(root_rel);
+ map = build_attrmap_by_name_if_req(rootdesc, scan_tupdesc, false);
+ }
+
+ while (table_scan_getnextslot(scandesc, ForwardScanDirection, slot))
+ {
+ CHECK_FOR_INTERRUPTS();
+
+ /* Deconstruct the tuple ... */
+ if (map != NULL)
+ {
+ original_slot = slot;
+ root_slot = MakeSingleTupleTableSlot(rootdesc, &TTSOpsBufferHeapTuple);
+ slot = execute_attr_map_slot(map, slot, root_slot);
+ }
+ else
+ slot_getallattrs(slot);
+
+ /* Format and send the data */
+ CopyOneRowTo(cstate, slot);
+
+ /*
+ * Increment the number of processed tuples, and report the
+ * progress.
+ */
+ pgstat_progress_update_param(PROGRESS_COPY_TUPLES_PROCESSED,
+ ++(*processed));
+
+ if (original_slot != NULL)
+ ExecDropSingleTupleTableSlot(original_slot);
+ }
+
+ ExecDropSingleTupleTableSlot(slot);
+ table_endscan(scandesc);
+}
+
/*
* Emit one row during DoCopyTo().
*/
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 64ea33aeae8..dcd97ae45b7 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -929,3 +929,19 @@ truncate copy_default;
-- DEFAULT cannot be used in COPY TO
copy (select 1 as test) TO stdout with (default '\D');
ERROR: COPY DEFAULT cannot be used with COPY TO
+-- COPY TO with partitioned table
+CREATE TABLE pp (id INT, val int ) PARTITION BY RANGE (id);
+create table pp_1 (val int, id int);
+create table pp_2 (val int, id int);
+ALTER TABLE pp ATTACH PARTITION pp_1 FOR VALUES FROM (1) TO (3);
+ALTER TABLE pp ATTACH PARTITION pp_2 FOR VALUES FROM (3) TO (7);
+insert into pp select g, 10 + g from generate_series(1,6) g;
+copy pp to stdout(header);
+id val
+1 11
+2 12
+3 13
+4 14
+5 15
+6 16
+DROP TABLE PP;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 45273557ce0..ba984388248 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -707,3 +707,14 @@ truncate copy_default;
-- DEFAULT cannot be used in COPY TO
copy (select 1 as test) TO stdout with (default '\D');
+
+-- COPY TO with partitioned table
+CREATE TABLE pp (id INT, val int ) PARTITION BY RANGE (id);
+create table pp_1 (val int, id int);
+create table pp_2 (val int, id int);
+ALTER TABLE pp ATTACH PARTITION pp_1 FOR VALUES FROM (1) TO (3);
+ALTER TABLE pp ATTACH PARTITION pp_2 FOR VALUES FROM (3) TO (7);
+insert into pp select g, 10 + g from generate_series(1,6) g;
+copy pp to stdout(header);
+
+DROP TABLE PP;
--
2.34.1
Hi!
I reviewed v7. Maybe we should add a multi-level partitioning case
into copy2.sql regression test?
I also did quick benchmarking for this patch:
==== DDL
create table ppp(i int) partition by range (i);
genddl.sh:
for i in `seq 0 200`; do echo "create table p$i partition of ppp for values from ( $((10 * i)) ) to ( $((10 * (i + 1))) ); "; done
=== insert data:
insert into ppp select i / 1000 from generate_series(0, 2000000)i;
=== results:
for 2000001 rows speedup is 1.40 times : 902.604 ms (patched) vs 1270.648 ms (unpatched)
for 4000002 rows speedup is 1.20 times : 1921.724 ms (patched) vs 2343.393 ms (unpatched)
for 8000004 rows speedup is 1.10 times : 3932.361 ms (patched) vs 4358.489 ms (unpatched)
So, this patch indeed speeds up some cases, but with larger tables
the speedup becomes negligible.
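For anyone who wants to recompute the ratios from the raw timings, a small sketch in C follows. The helper names speedup and saved_ms are made up for this illustration; the only inputs are the millisecond figures quoted above.

```c
#include <assert.h>

/* Ratio of unpatched to patched runtime; a value above 1.0 means the patch is faster. */
static double
speedup(double unpatched_ms, double patched_ms)
{
    assert(patched_ms > 0.0);
    return unpatched_ms / patched_ms;
}

/* Absolute time saved by the patch, in milliseconds. */
static double
saved_ms(double unpatched_ms, double patched_ms)
{
    return unpatched_ms - patched_ms;
}
```

Applied to the three runs, speedup() gives roughly 1.41, 1.22 and 1.11, while saved_ms() gives roughly 368 ms, 422 ms and 426 ms. So the ratio shrinks as the table grows, but the absolute saving does not go away.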
--
Best regards,
Kirill Reshke
On Mon, Mar 31, 2025 at 4:05 PM Kirill Reshke <reshkekirill@gmail.com> wrote:
Hi!
I reviewed v7. Maybe we should add a multi-level partitioning case
into copy2.sql regression test?
Sure.
I also did quick benchmarking for this patch:
==== DDL
create table ppp(i int) partition by range (i);
genddl.sh:
for i in `seq 0 200`; do echo "create table p$i partition of ppp for values from ( $((10 * i)) ) to ( $((10 * (i + 1))) ); "; done
=== insert data:
insert into ppp select i / 1000 from generate_series(0, 2000000)i;
=== results:
for 2000001 rows speedup is 1.40 times : 902.604 ms (patched) vs 1270.648 ms (unpatched)
for 4000002 rows speedup is 1.20 times : 1921.724 ms (patched) vs 2343.393 ms (unpatched)
for 8000004 rows speedup is 1.10 times : 3932.361 ms (patched) vs 4358.489 ms (unpatched)
So, this patch indeed speeds up some cases, but with larger tables
the speedup becomes negligible.
Thanks for doing the benchmark.
Attachments:
v8-0001-support-COPY-partitioned_table-TO.patch (text/x-patch)
From b27371ca4ff132e7d2803406f9e3f371c51c96df Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Tue, 1 Apr 2025 08:56:22 +0800
Subject: [PATCH v8 1/1] support COPY partitioned_table TO
CREATE TABLE pp (id INT, val int ) PARTITION BY RANGE (id);
create table pp_1 (val int, id int);
create table pp_2 (val int, id int);
ALTER TABLE pp ATTACH PARTITION pp_1 FOR VALUES FROM (1) TO (5);
ALTER TABLE pp ATTACH PARTITION pp_2 FOR VALUES FROM (5) TO (10);
insert into pp select g, 10 + g from generate_series(1,9) g;
copy pp to stdout(header);
The above case is much slower (around 25% in some cases) than
``COPY (select * from pp) to stdout(header);``,
because of column remapping. But this is still a new
feature, since master does not support ``COPY (partitioned_table)``.
reviewed by: vignesh C <vignesh21@gmail.com>
reviewed by: David Rowley <dgrowleyml@gmail.com>
reviewed by: Melih Mutlu <m.melihmutlu@gmail.com>
reviewed by: Kirill Reshke <reshkekirill@gmail.com>
discussion: https://postgr.es/m/CACJufxEZt+G19Ors3bQUq-42-61__C=y5k2wk=sHEFRusu7=iQ@mail.gmail.com
commitfest entry: https://commitfest.postgresql.org/patch/5467/
---
doc/src/sgml/ref/copy.sgml | 8 +-
src/backend/commands/copyto.c | 143 ++++++++++++++++++++++------
src/test/regress/expected/copy2.out | 20 ++++
src/test/regress/sql/copy2.sql | 17 ++++
4 files changed, 157 insertions(+), 31 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index df093da97c5..f86e0b7ec35 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -521,15 +521,15 @@ COPY <replaceable class="parameter">count</replaceable>
<para>
<command>COPY TO</command> can be used only with plain
- tables, not views, and does not copy rows from child tables
- or child partitions. For example, <literal>COPY <replaceable
+ tables, not views, and does not copy rows from child tables,
+ however <command>COPY TO</command> can be used with partitioned tables.
+ For example, in a table inheritance hierarchy, <literal>COPY <replaceable
class="parameter">table</replaceable> TO</literal> copies
the same rows as <literal>SELECT * FROM ONLY <replaceable
class="parameter">table</replaceable></literal>.
The syntax <literal>COPY (SELECT * FROM <replaceable
class="parameter">table</replaceable>) TO ...</literal> can be used to
- dump all of the rows in an inheritance hierarchy, partitioned table,
- or view.
+ dump all of the rows in an inheritance hierarchy, or view.
</para>
<para>
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 84a3f3879a8..6fc940bddbc 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -19,6 +19,8 @@
#include <sys/stat.h>
#include "access/tableam.h"
+#include "access/table.h"
+#include "catalog/pg_inherits.h"
#include "commands/copyapi.h"
#include "commands/progress.h"
#include "executor/execdesc.h"
@@ -82,6 +84,7 @@ typedef struct CopyToStateData
List *attnumlist; /* integer list of attnums to copy */
char *filename; /* filename, or NULL for STDOUT */
bool is_program; /* is 'filename' a program to popen? */
+ List *partitions; /* oid list of partition oid for copy to */
copy_data_dest_cb data_dest_cb; /* function for writing data */
CopyFormatOptions opts;
@@ -116,6 +119,8 @@ static void CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot);
static void CopyAttributeOutText(CopyToState cstate, const char *string);
static void CopyAttributeOutCSV(CopyToState cstate, const char *string,
bool use_quote);
+static void CopyThisRelTo(CopyToState cstate, Relation rel,
+ Relation root_rel, uint64 *processed);
/* built-in format-specific routines */
static void CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc);
@@ -643,6 +648,8 @@ BeginCopyTo(ParseState *pstate,
PROGRESS_COPY_COMMAND_TO,
0
};
+ List *children = NIL;
+ List *scan_oids = NIL;
if (rel != NULL && rel->rd_rel->relkind != RELKIND_RELATION)
{
@@ -670,11 +677,35 @@ BeginCopyTo(ParseState *pstate,
errmsg("cannot copy from sequence \"%s\"",
RelationGetRelationName(rel))));
else if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
- ereport(ERROR,
- (errcode(ERRCODE_WRONG_OBJECT_TYPE),
- errmsg("cannot copy from partitioned table \"%s\"",
- RelationGetRelationName(rel)),
- errhint("Try the COPY (SELECT ...) TO variant.")));
+ {
+ children = find_all_inheritors(RelationGetRelid(rel),
+ AccessShareLock,
+ NULL);
+
+ foreach_oid(childreloid, children)
+ {
+ char relkind = get_rel_relkind(childreloid);
+
+ if (relkind == RELKIND_FOREIGN_TABLE)
+ {
+ char *relation_name;
+
+ relation_name = get_rel_name(childreloid);
+ ereport(ERROR,
+ errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("cannot copy from foreign table \"%s\"", relation_name),
+ errdetail("Partition \"%s\" is a foreign table in the partitioned table \"%s.%s\"",
+ relation_name, RelationGetRelationName(rel),
+ get_namespace_name(rel->rd_rel->relnamespace)),
+ errhint("Try the COPY (SELECT ...) TO variant."));
+ }
+
+ if (RELKIND_HAS_PARTITIONS(relkind))
+ continue;
+
+ scan_oids = lappend_oid(scan_oids, childreloid);
+ }
+ }
else
ereport(ERROR,
(errcode(ERRCODE_WRONG_OBJECT_TYPE),
@@ -710,6 +741,7 @@ BeginCopyTo(ParseState *pstate,
cstate->rel = rel;
tupDesc = RelationGetDescr(cstate->rel);
+ cstate->partitions = list_copy(scan_oids);
}
else
{
@@ -1066,35 +1098,28 @@ DoCopyTo(CopyToState cstate)
cstate->routine->CopyToStart(cstate, tupDesc);
- if (cstate->rel)
+ /*
+ * if COPY TO source table is a partitioned table, then open each
+ * partition and process each individual partition.
+ */
+ if (cstate->rel && cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
{
- TupleTableSlot *slot;
- TableScanDesc scandesc;
-
- scandesc = table_beginscan(cstate->rel, GetActiveSnapshot(), 0, NULL);
- slot = table_slot_create(cstate->rel, NULL);
-
processed = 0;
- while (table_scan_getnextslot(scandesc, ForwardScanDirection, slot))
+ foreach_oid(scan_oid, cstate->partitions)
{
- CHECK_FOR_INTERRUPTS();
+ Relation scan_rel;
- /* Deconstruct the tuple ... */
- slot_getallattrs(slot);
+ scan_rel = table_open(scan_oid, AccessShareLock);
- /* Format and send the data */
- CopyOneRowTo(cstate, slot);
+ CopyThisRelTo(cstate, scan_rel, cstate->rel, &processed);
- /*
- * Increment the number of processed tuples, and report the
- * progress.
- */
- pgstat_progress_update_param(PROGRESS_COPY_TUPLES_PROCESSED,
- ++processed);
+ table_close(scan_rel, AccessShareLock);
}
-
- ExecDropSingleTupleTableSlot(slot);
- table_endscan(scandesc);
+ }
+ else if (cstate->rel)
+ {
+ processed = 0;
+ CopyThisRelTo(cstate, cstate->rel, NULL, &processed);
}
else
{
@@ -1113,6 +1138,70 @@ DoCopyTo(CopyToState cstate)
return processed;
}
+/*
+ * rel: the relation to be copied to.
+ * root_rel: if not NULL, then the COPY partitioned relation to destination.
+ * processed: number of tuple processed.
+*/
+static void
+CopyThisRelTo(CopyToState cstate, Relation rel, Relation root_rel,
+ uint64 *processed)
+{
+ TupleTableSlot *slot;
+ TableScanDesc scandesc;
+ AttrMap *map = NULL;
+ TupleTableSlot *root_slot = NULL;
+ TupleTableSlot *original_slot = NULL;
+ TupleDesc scan_tupdesc;
+ TupleDesc rootdesc = NULL;
+
+ scan_tupdesc = RelationGetDescr(rel);
+ scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+ slot = table_slot_create(rel, NULL);
+
+ /*
+ * partition's rowtype might differ from the root table's. We must
+ * convert it back to the root table's rowtype as we are export
+ * partitioned table data here.
+ */
+ if (root_rel != NULL)
+ {
+ rootdesc = RelationGetDescr(root_rel);
+ map = build_attrmap_by_name_if_req(rootdesc, scan_tupdesc, false);
+ }
+
+ while (table_scan_getnextslot(scandesc, ForwardScanDirection, slot))
+ {
+ CHECK_FOR_INTERRUPTS();
+
+ /* Deconstruct the tuple ... */
+ if (map != NULL)
+ {
+ original_slot = slot;
+ root_slot = MakeSingleTupleTableSlot(rootdesc, &TTSOpsBufferHeapTuple);
+ slot = execute_attr_map_slot(map, slot, root_slot);
+ }
+ else
+ slot_getallattrs(slot);
+
+ /* Format and send the data */
+ CopyOneRowTo(cstate, slot);
+
+ /*
+ * Increment the number of processed tuples, and report the
+ * progress.
+ */
+ pgstat_progress_update_param(PROGRESS_COPY_TUPLES_PROCESSED,
+ ++(*processed));
+
+ if (original_slot != NULL)
+ ExecDropSingleTupleTableSlot(original_slot);
+ }
+
+ ExecDropSingleTupleTableSlot(slot);
+ table_endscan(scandesc);
+}
+
/*
* Emit one row during DoCopyTo().
*/
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 64ea33aeae8..b01389df44c 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -929,3 +929,23 @@ truncate copy_default;
-- DEFAULT cannot be used in COPY TO
copy (select 1 as test) TO stdout with (default '\D');
ERROR: COPY DEFAULT cannot be used with COPY TO
+-- COPY TO with partitioned table
+CREATE TABLE pp (id int,val int) PARTITION BY RANGE (id);
+CREATE TABLE pp_1 (val int, id int) PARTITION BY RANGE (id);
+CREATE TABLE pp_2 (val int, id int) PARTITION BY RANGE (id);
+ALTER TABLE pp ATTACH PARTITION pp_1 FOR VALUES FROM (1) TO (5);
+ALTER TABLE pp ATTACH PARTITION pp_2 FOR VALUES FROM (5) TO (10);
+CREATE TABLE pp_15 (val int, id int);
+CREATE TABLE pp_510 (val int,id int);
+ALTER TABLE pp_1 ATTACH PARTITION pp_15 FOR VALUES FROM (1) TO (5);
+ALTER TABLE pp_2 ATTACH PARTITION pp_510 FOR VALUES FROM (5) TO (10);
+INSERT INTO pp SELECT g, 10 + g FROM generate_series(1, 6) g;
+COPY pp TO stdout(header);
+id val
+1 11
+2 12
+3 13
+4 14
+5 15
+6 16
+DROP TABLE PP;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 45273557ce0..3730f8af4e5 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -707,3 +707,20 @@ truncate copy_default;
-- DEFAULT cannot be used in COPY TO
copy (select 1 as test) TO stdout with (default '\D');
+
+-- COPY TO with partitioned table
+CREATE TABLE pp (id int,val int) PARTITION BY RANGE (id);
+CREATE TABLE pp_1 (val int, id int) PARTITION BY RANGE (id);
+CREATE TABLE pp_2 (val int, id int) PARTITION BY RANGE (id);
+ALTER TABLE pp ATTACH PARTITION pp_1 FOR VALUES FROM (1) TO (5);
+ALTER TABLE pp ATTACH PARTITION pp_2 FOR VALUES FROM (5) TO (10);
+
+CREATE TABLE pp_15 (val int, id int);
+CREATE TABLE pp_510 (val int,id int);
+ALTER TABLE pp_1 ATTACH PARTITION pp_15 FOR VALUES FROM (1) TO (5);
+ALTER TABLE pp_2 ATTACH PARTITION pp_510 FOR VALUES FROM (5) TO (10);
+
+INSERT INTO pp SELECT g, 10 + g FROM generate_series(1, 6) g;
+
+COPY pp TO stdout(header);
+DROP TABLE PP;
--
2.34.1
On Tue, 1 Apr 2025 at 06:31, jian he <jian.universality@gmail.com> wrote:
On Mon, Mar 31, 2025 at 4:05 PM Kirill Reshke <reshkekirill@gmail.com> wrote:
Thanks for doing the benchmark.
A few comments to improve the code comments and the error message, and
to remove a redundant assignment:
1) How about we change below:
/*
* partition's rowtype might differ from the root table's. We must
* convert it back to the root table's rowtype as we are export
* partitioned table data here.
*/
To:
/*
* A partition's row type might differ from the root table's.
* Since we're exporting partitioned table data, we must
* convert it back to the root table's row type.
*/
2) How about we change below:
+ if (relkind == RELKIND_FOREIGN_TABLE)
+     ereport(ERROR,
+             errcode(ERRCODE_WRONG_OBJECT_TYPE),
+             errmsg("cannot copy from foreign table \"%s\"",
+                    RelationGetRelationName(rel)),
+             errdetail("partition \"%s\" is a foreign table",
+                       RelationGetRelationName(rel)),
+             errhint("Try the COPY (SELECT ...) TO variant."));
To:
+ if (relkind == RELKIND_FOREIGN_TABLE)
+     ereport(ERROR,
+             errcode(ERRCODE_WRONG_OBJECT_TYPE),
+             errmsg("cannot copy from a partitioned table having foreign table partition \"%s\"",
+                    RelationGetRelationName(rel)),
+             errdetail("partition \"%s\" is a foreign table",
+                       RelationGetRelationName(rel)),
+             errhint("Try the COPY (SELECT ...) TO variant."));
3) How about we change below:
/*
* rel: the relation to be copied to.
* root_rel: if not NULL, then the COPY partitioned relation to destination.
* processed: number of tuples processed.
*/
To:
/*
* rel: the relation from which data will be copied.
* root_rel: If not NULL, indicates that rel's row type must be
* converted to root_rel's row type.
* processed: number of tuples processed.
*/
4) You can initialize processed to 0 at its declaration in the
DoCopyTo function and remove the assignments below:
+ if (cstate->rel && cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
{
...
...
processed = 0;
- while (table_scan_getnextslot(scandesc, ForwardScanDirection, slot))
...
...
-
- ExecDropSingleTupleTableSlot(slot);
- table_endscan(scandesc);
+ }
+ else if (cstate->rel)
+ {
+ processed = 0;
+ CopyThisRelTo(cstate, cstate->rel, NULL, &processed);
}
Regards,
Vignesh
On Tue, Apr 1, 2025 at 1:38 PM vignesh C <vignesh21@gmail.com> wrote:
On Tue, 1 Apr 2025 at 06:31, jian he <jian.universality@gmail.com> wrote:
On Mon, Mar 31, 2025 at 4:05 PM Kirill Reshke <reshkekirill@gmail.com> wrote:
Thanks for doing the benchmark.
Few comments to improve the comments, error message and remove
redundant assignment:
1) How about we change below:
/*
* partition's rowtype might differ from the root table's. We must
* convert it back to the root table's rowtype as we are export
* partitioned table data here.
*/
To:
/*
* A partition's row type might differ from the root table's.
* Since we're exporting partitioned table data, we must
* convert it back to the root table's row type.
*/
I changed it to:
/*
* A partition's rowtype might differ from the root table's.
* Since we are export partitioned table data here,
* we must convert it back to the root table's rowtype.
*/
Since many places use "rowtype",
using "rowtype" instead of "row type" should be fine.
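As a rough illustration of why the conversion mentioned in that comment is needed, consider the test case from the patch, where the root table is pp(id, val) but the partition was created as pp_1(val, id). The sketch below is standalone C, not backend code: build_map_by_name and remap_row are hypothetical helpers that only mimic the idea of matching columns by name, and they assume both sides have the same number of integer columns with no dropped columns.

```c
#include <assert.h>
#include <string.h>

/*
 * For each root column, find the position of the partition column with the
 * same name. map[i] receives the partition index for root column i.
 * Returns 0 on success, -1 if some root column has no match.
 */
static int
build_map_by_name(const char *const *root_cols, const char *const *part_cols,
                  int ncols, int *map)
{
    for (int i = 0; i < ncols; i++)
    {
        map[i] = -1;
        for (int j = 0; j < ncols; j++)
        {
            if (strcmp(root_cols[i], part_cols[j]) == 0)
            {
                map[i] = j;
                break;
            }
        }
        if (map[i] < 0)
            return -1;
    }
    return 0;
}

/* Reorder one partition row into the root table's column order. */
static void
remap_row(const int *map, int ncols, const int *part_row, int *root_row)
{
    assert(part_row != root_row);   /* in-place remapping is not supported */

    for (int i = 0; i < ncols; i++)
        root_row[i] = part_row[map[i]];
}
```

For pp and pp_1 the map comes out as {1, 0}, so a row stored in the partition as (val = 11, id = 1) is emitted as (id = 1, val = 11). Without this step, COPY would print the columns in the partition's physical order under the root table's header.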
2) How about we change below:
+ if (relkind == RELKIND_FOREIGN_TABLE)
+     ereport(ERROR,
+             errcode(ERRCODE_WRONG_OBJECT_TYPE),
+             errmsg("cannot copy from foreign table \"%s\"",
+                    RelationGetRelationName(rel)),
+             errdetail("partition \"%s\" is a foreign table",
+                       RelationGetRelationName(rel)),
+             errhint("Try the COPY (SELECT ...) TO variant."));
To:
+ if (relkind == RELKIND_FOREIGN_TABLE)
+     ereport(ERROR,
+             errcode(ERRCODE_WRONG_OBJECT_TYPE),
+             errmsg("cannot copy from a partitioned table having foreign table partition \"%s\"",
+                    RelationGetRelationName(rel)),
+             errdetail("partition \"%s\" is a foreign table",
+                       RelationGetRelationName(rel)),
+             errhint("Try the COPY (SELECT ...) TO variant."));
I am not so sure,
since the surrounding code has:
else if (rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
ereport(ERROR,
(errcode(ERRCODE_WRONG_OBJECT_TYPE),
errmsg("cannot copy from foreign table \"%s\"",
RelationGetRelationName(rel)),
errhint("Try the COPY (SELECT ...) TO variant.")));
Let's see what others think.
3) How about we change below:
/*
* rel: the relation to be copied to.
* root_rel: if not NULL, then the COPY partitioned relation to destination.
* processed: number of tuples processed.
*/
To:
/*
* rel: the relation from which data will be copied.
* root_rel: If not NULL, indicates that rel's row type must be
* converted to root_rel's row type.
* processed: number of tuples processed.
*/
I changed it to:
/*
* rel: the relation from which the actual data will be copied.
* root_rel: if not NULL, it indicates that we are copying partitioned relation
* data to the destination, and "rel" is the partition of "root_rel".
* processed: number of tuples processed.
*/
4) You can initialize processed to 0 along with declaration in
DoCopyTo function and remove the below:
+ if (cstate->rel && cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
{
...
...
processed = 0;
- while (table_scan_getnextslot(scandesc, ForwardScanDirection, slot))
...
...
-
- ExecDropSingleTupleTableSlot(slot);
- table_endscan(scandesc);
+ }
+ else if (cstate->rel)
+ {
+ processed = 0;
+ CopyThisRelTo(cstate, cstate->rel, NULL, &processed);
}
OK.
Attachments:
v9-0001-support-COPY-partitioned_table-TO.patch (application/x-patch)
From 374f38d7187e92882e5fa4fe6e8b9dd7d7785ed9 Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Fri, 4 Apr 2025 11:09:04 +0800
Subject: [PATCH v9 1/1] support COPY partitioned_table TO
CREATE TABLE pp (id INT, val int ) PARTITION BY RANGE (id);
create table pp_1 (val int, id int);
create table pp_2 (val int, id int);
ALTER TABLE pp ATTACH PARTITION pp_1 FOR VALUES FROM (1) TO (5);
ALTER TABLE pp ATTACH PARTITION pp_2 FOR VALUES FROM (5) TO (10);
insert into pp select g, 10 + g from generate_series(1,9) g;
copy pp to stdout(header);
The above case is much slower (around 25% in some cases) than
``COPY (select * from pp) to stdout(header);``,
because of column remapping. But this is still a new
feature, since master does not support ``COPY (partitioned_table)``.
reviewed by: vignesh C <vignesh21@gmail.com>
reviewed by: David Rowley <dgrowleyml@gmail.com>
reviewed by: Melih Mutlu <m.melihmutlu@gmail.com>
reviewed by: Kirill Reshke <reshkekirill@gmail.com>
discussion: https://postgr.es/m/CACJufxEZt+G19Ors3bQUq-42-61__C=y5k2wk=sHEFRusu7=iQ@mail.gmail.com
commitfest entry: https://commitfest.postgresql.org/patch/5467/
---
doc/src/sgml/ref/copy.sgml | 8 +-
src/backend/commands/copyto.c | 144 ++++++++++++++++++++++------
src/test/regress/expected/copy2.out | 20 ++++
src/test/regress/sql/copy2.sql | 17 ++++
4 files changed, 156 insertions(+), 33 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index df093da97c5..b7d24f5d271 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -521,15 +521,15 @@ COPY <replaceable class="parameter">count</replaceable>
<para>
<command>COPY TO</command> can be used only with plain
- tables, not views, and does not copy rows from child tables
- or child partitions. For example, <literal>COPY <replaceable
+ tables, not views, and does not copy rows from child tables,
+ however <command>COPY TO</command> can be used with partitioned tables.
+ For example, in a table inheritance hierarchy, <literal>COPY <replaceable
class="parameter">table</replaceable> TO</literal> copies
the same rows as <literal>SELECT * FROM ONLY <replaceable
class="parameter">table</replaceable></literal>.
The syntax <literal>COPY (SELECT * FROM <replaceable
class="parameter">table</replaceable>) TO ...</literal> can be used to
- dump all of the rows in an inheritance hierarchy, partitioned table,
- or view.
+ dump all of the rows in a table inheritance hierarchy, or view.
</para>
<para>
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 84a3f3879a8..d0aab894c36 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -19,6 +19,8 @@
#include <sys/stat.h>
#include "access/tableam.h"
+#include "access/table.h"
+#include "catalog/pg_inherits.h"
#include "commands/copyapi.h"
#include "commands/progress.h"
#include "executor/execdesc.h"
@@ -82,6 +84,7 @@ typedef struct CopyToStateData
List *attnumlist; /* integer list of attnums to copy */
char *filename; /* filename, or NULL for STDOUT */
bool is_program; /* is 'filename' a program to popen? */
+ List *partitions; /* oid list of partition oid for copy to */
copy_data_dest_cb data_dest_cb; /* function for writing data */
CopyFormatOptions opts;
@@ -116,6 +119,8 @@ static void CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot);
static void CopyAttributeOutText(CopyToState cstate, const char *string);
static void CopyAttributeOutCSV(CopyToState cstate, const char *string,
bool use_quote);
+static void CopyThisRelTo(CopyToState cstate, Relation rel,
+ Relation root_rel, uint64 *processed);
/* built-in format-specific routines */
static void CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc);
@@ -643,6 +648,8 @@ BeginCopyTo(ParseState *pstate,
PROGRESS_COPY_COMMAND_TO,
0
};
+ List *children = NIL;
+ List *scan_oids = NIL;
if (rel != NULL && rel->rd_rel->relkind != RELKIND_RELATION)
{
@@ -670,11 +677,35 @@ BeginCopyTo(ParseState *pstate,
errmsg("cannot copy from sequence \"%s\"",
RelationGetRelationName(rel))));
else if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
- ereport(ERROR,
- (errcode(ERRCODE_WRONG_OBJECT_TYPE),
- errmsg("cannot copy from partitioned table \"%s\"",
- RelationGetRelationName(rel)),
- errhint("Try the COPY (SELECT ...) TO variant.")));
+ {
+ children = find_all_inheritors(RelationGetRelid(rel),
+ AccessShareLock,
+ NULL);
+
+ foreach_oid(childreloid, children)
+ {
+ char relkind = get_rel_relkind(childreloid);
+
+ if (relkind == RELKIND_FOREIGN_TABLE)
+ {
+ char *relation_name;
+
+ relation_name = get_rel_name(childreloid);
+ ereport(ERROR,
+ errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("cannot copy from foreign table \"%s\"", relation_name),
+ errdetail("Partition \"%s\" is a foreign table in the partitioned table \"%s.%s\"",
+ relation_name, RelationGetRelationName(rel),
+ get_namespace_name(rel->rd_rel->relnamespace)),
+ errhint("Try the COPY (SELECT ...) TO variant."));
+ }
+
+ if (RELKIND_HAS_PARTITIONS(relkind))
+ continue;
+
+ scan_oids = lappend_oid(scan_oids, childreloid);
+ }
+ }
else
ereport(ERROR,
(errcode(ERRCODE_WRONG_OBJECT_TYPE),
@@ -710,6 +741,7 @@ BeginCopyTo(ParseState *pstate,
cstate->rel = rel;
tupDesc = RelationGetDescr(cstate->rel);
+ cstate->partitions = list_copy(scan_oids);
}
else
{
@@ -1028,7 +1060,7 @@ DoCopyTo(CopyToState cstate)
TupleDesc tupDesc;
int num_phys_attrs;
ListCell *cur;
- uint64 processed;
+ uint64 processed = 0;
if (fe_copy)
SendCopyBegin(cstate);
@@ -1066,36 +1098,25 @@ DoCopyTo(CopyToState cstate)
cstate->routine->CopyToStart(cstate, tupDesc);
- if (cstate->rel)
+ /*
+ * if COPY TO source table is a partitioned table, then open each
+ * partition and process each individual partition.
+ */
+ if (cstate->rel && cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
{
- TupleTableSlot *slot;
- TableScanDesc scandesc;
-
- scandesc = table_beginscan(cstate->rel, GetActiveSnapshot(), 0, NULL);
- slot = table_slot_create(cstate->rel, NULL);
-
- processed = 0;
- while (table_scan_getnextslot(scandesc, ForwardScanDirection, slot))
+ foreach_oid(scan_oid, cstate->partitions)
{
- CHECK_FOR_INTERRUPTS();
+ Relation scan_rel;
- /* Deconstruct the tuple ... */
- slot_getallattrs(slot);
+ scan_rel = table_open(scan_oid, AccessShareLock);
- /* Format and send the data */
- CopyOneRowTo(cstate, slot);
+ CopyThisRelTo(cstate, scan_rel, cstate->rel, &processed);
- /*
- * Increment the number of processed tuples, and report the
- * progress.
- */
- pgstat_progress_update_param(PROGRESS_COPY_TUPLES_PROCESSED,
- ++processed);
+ table_close(scan_rel, AccessShareLock);
}
-
- ExecDropSingleTupleTableSlot(slot);
- table_endscan(scandesc);
}
+ else if (cstate->rel)
+ CopyThisRelTo(cstate, cstate->rel, NULL, &processed);
else
{
/* run the plan --- the dest receiver will send tuples */
@@ -1113,6 +1134,71 @@ DoCopyTo(CopyToState cstate)
return processed;
}
+/*
+ * rel: the relation from which the actual data will be copied.
+ * root_rel: if not NULL, it indicates that we are copying partitioned relation
+ * data to the destination, and "rel" is the partition of "root_rel".
+ * processed: number of tuples processed.
+*/
+static void
+CopyThisRelTo(CopyToState cstate, Relation rel, Relation root_rel,
+ uint64 *processed)
+{
+ TupleTableSlot *slot;
+ TableScanDesc scandesc;
+ AttrMap *map = NULL;
+ TupleTableSlot *root_slot = NULL;
+ TupleTableSlot *original_slot = NULL;
+ TupleDesc scan_tupdesc;
+ TupleDesc rootdesc = NULL;
+
+ scan_tupdesc = RelationGetDescr(rel);
+ scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+ slot = table_slot_create(rel, NULL);
+
+ /*
+ * A partition's rowtype might differ from the root table's.
+ * Since we are export partitioned table data here,
+ * we must convert it back to the root table's rowtype.
+ */
+ if (root_rel != NULL)
+ {
+ rootdesc = RelationGetDescr(root_rel);
+ map = build_attrmap_by_name_if_req(rootdesc, scan_tupdesc, false);
+ }
+
+ while (table_scan_getnextslot(scandesc, ForwardScanDirection, slot))
+ {
+ CHECK_FOR_INTERRUPTS();
+
+ /* Deconstruct the tuple ... */
+ if (map != NULL)
+ {
+ original_slot = slot;
+ root_slot = MakeSingleTupleTableSlot(rootdesc, &TTSOpsBufferHeapTuple);
+ slot = execute_attr_map_slot(map, slot, root_slot);
+ }
+ else
+ slot_getallattrs(slot);
+
+ /* Format and send the data */
+ CopyOneRowTo(cstate, slot);
+
+ /*
+ * Increment the number of processed tuples, and report the
+ * progress.
+ */
+ pgstat_progress_update_param(PROGRESS_COPY_TUPLES_PROCESSED,
+ ++(*processed));
+
+ if (original_slot != NULL)
+ ExecDropSingleTupleTableSlot(original_slot);
+ }
+
+ ExecDropSingleTupleTableSlot(slot);
+ table_endscan(scandesc);
+}
+
/*
* Emit one row during DoCopyTo().
*/
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 64ea33aeae8..b01389df44c 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -929,3 +929,23 @@ truncate copy_default;
-- DEFAULT cannot be used in COPY TO
copy (select 1 as test) TO stdout with (default '\D');
ERROR: COPY DEFAULT cannot be used with COPY TO
+-- COPY TO with partitioned table
+CREATE TABLE pp (id int,val int) PARTITION BY RANGE (id);
+CREATE TABLE pp_1 (val int, id int) PARTITION BY RANGE (id);
+CREATE TABLE pp_2 (val int, id int) PARTITION BY RANGE (id);
+ALTER TABLE pp ATTACH PARTITION pp_1 FOR VALUES FROM (1) TO (5);
+ALTER TABLE pp ATTACH PARTITION pp_2 FOR VALUES FROM (5) TO (10);
+CREATE TABLE pp_15 (val int, id int);
+CREATE TABLE pp_510 (val int,id int);
+ALTER TABLE pp_1 ATTACH PARTITION pp_15 FOR VALUES FROM (1) TO (5);
+ALTER TABLE pp_2 ATTACH PARTITION pp_510 FOR VALUES FROM (5) TO (10);
+INSERT INTO pp SELECT g, 10 + g FROM generate_series(1, 6) g;
+COPY pp TO stdout(header);
+id val
+1 11
+2 12
+3 13
+4 14
+5 15
+6 16
+DROP TABLE PP;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 45273557ce0..3730f8af4e5 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -707,3 +707,20 @@ truncate copy_default;
-- DEFAULT cannot be used in COPY TO
copy (select 1 as test) TO stdout with (default '\D');
+
+-- COPY TO with partitioned table
+CREATE TABLE pp (id int,val int) PARTITION BY RANGE (id);
+CREATE TABLE pp_1 (val int, id int) PARTITION BY RANGE (id);
+CREATE TABLE pp_2 (val int, id int) PARTITION BY RANGE (id);
+ALTER TABLE pp ATTACH PARTITION pp_1 FOR VALUES FROM (1) TO (5);
+ALTER TABLE pp ATTACH PARTITION pp_2 FOR VALUES FROM (5) TO (10);
+
+CREATE TABLE pp_15 (val int, id int);
+CREATE TABLE pp_510 (val int,id int);
+ALTER TABLE pp_1 ATTACH PARTITION pp_15 FOR VALUES FROM (1) TO (5);
+ALTER TABLE pp_2 ATTACH PARTITION pp_510 FOR VALUES FROM (5) TO (10);
+
+INSERT INTO pp SELECT g, 10 + g FROM generate_series(1, 6) g;
+
+COPY pp TO stdout(header);
+DROP TABLE PP;
--
2.34.1
Hi!
First of all, a commit message does not need to contain SQL examples
of what it does. We should provide human-readable explanations and
that's it.
Next, about the changes to src/test/regress/sql/copy2.sql. I find the SQL
you used for the test really unintuitive. How about the CREATE TABLE ...
PARTITION OF syntax? It is also one command instead of two (create +
alter). It is also hard to tell what the partition structure is, because
the column names on different partition levels are the same, just the order is
switched. Let's change it to something more intuitive too?
--
Best regards,
Kirill Reshke
On Fri, 4 Apr 2025, 15:17 Kirill Reshke, <reshkekirill@gmail.com> wrote:
Hi!
First of all, a commit message does not need to contain SQL examples
of what it does. We should provide human-readable explanations and
that's it.
Next, about changes to src/test/regress/sql/copy2.sql. I find the sql
you used to test really unintuitive. How about CREATE TABLE ...
PARTITION OF syntax? It is also one command instead of two (create +
alter). It is also hard to say what partition structure is, because
column names on different partition levels are the same, just order is
switched. Let's change it to something more intuitive too?
--
Best regards,
Kirill Reshke
Maybe we can tab-complete here if the prefix matches pg_%? Does that make
a good use case?
Sorry, wrong thread
Best regards,
Kirill Reshke
On Mon, 7 Apr 2025, 19:54 Kirill Reshke, <reshkekirill@gmail.com> wrote:
Maybe we can tab-complete here if the prefix matches pg_%? Does that make
a good use case?
hi.
rebase and simplify regress tests.
Attachments:
v10-0001-support-COPY-partitioned_table-TO.patch (text/x-patch; charset=US-ASCII)
From f56c94ccb018928e41cc35e162174831cb016c1d Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Thu, 10 Apr 2025 10:41:40 +0800
Subject: [PATCH v10 1/1] support COPY partitioned_table TO
CREATE TABLE pp (id int, val int ) PARTITION BY RANGE (id);
CREATE TABLE pp_1 (val int, id int);
CREATE TABLE pp_2 (val int, id int);
ALTER TABLE pp ATTACH PARTITION pp_1 FOR VALUES FROM (1) TO (5);
ALTER TABLE pp ATTACH PARTITION pp_2 FOR VALUES FROM (5) TO (10);
insert into pp select g, 10 + g from generate_series(1,9) g;
copy pp to stdout(header);
the above case is much slower (around 25% in some cases) than
``COPY (select * from pp) to stdout(header);``,
because of column remapping. but this is still a new
feature, since master does not support ``COPY (partitioned_table)``.
reviewed by: vignesh C <vignesh21@gmail.com>
reviewed by: David Rowley <dgrowleyml@gmail.com>
reviewed by: Melih Mutlu <m.melihmutlu@gmail.com>
reviewed by: Kirill Reshke <reshkekirill@gmail.com>
discussion: https://postgr.es/m/CACJufxEZt+G19Ors3bQUq-42-61__C=y5k2wk=sHEFRusu7=iQ@mail.gmail.com
commitfest entry: https://commitfest.postgresql.org/patch/5467/
---
doc/src/sgml/ref/copy.sgml | 6 +-
src/backend/commands/copyto.c | 144 +++++++++++++++++++++++------
src/test/regress/expected/copy.out | 18 ++++
src/test/regress/sql/copy.sql | 15 +++
4 files changed, 151 insertions(+), 32 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index d6859276bed..293251a76a0 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -521,13 +521,13 @@ COPY <replaceable class="parameter">count</replaceable>
<para>
<command>COPY TO</command> can be used with plain
- tables and populated materialized views.
- For example,
+ tables, populated materialized views and partitioned tables.
+ For example, if <replaceable class="parameter">table</replaceable> is not partitioned table,
<literal>COPY <replaceable class="parameter">table</replaceable>
TO</literal> copies the same rows as
<literal>SELECT * FROM ONLY <replaceable class="parameter">table</replaceable></literal>.
However it doesn't directly support other relation types,
- such as partitioned tables, inheritance child tables, or views.
+ such as inheritance child tables, or views.
To copy all rows from such relations, use <literal>COPY (SELECT * FROM
<replaceable class="parameter">table</replaceable>) TO</literal>.
</para>
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index f87e405351d..5f6cf92f86e 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -19,6 +19,8 @@
#include <sys/stat.h>
#include "access/tableam.h"
+#include "access/table.h"
+#include "catalog/pg_inherits.h"
#include "commands/copyapi.h"
#include "commands/progress.h"
#include "executor/execdesc.h"
@@ -82,6 +84,7 @@ typedef struct CopyToStateData
List *attnumlist; /* integer list of attnums to copy */
char *filename; /* filename, or NULL for STDOUT */
bool is_program; /* is 'filename' a program to popen? */
+ List *partitions; /* oid list of partition oid for copy to */
copy_data_dest_cb data_dest_cb; /* function for writing data */
CopyFormatOptions opts;
@@ -116,6 +119,8 @@ static void CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot);
static void CopyAttributeOutText(CopyToState cstate, const char *string);
static void CopyAttributeOutCSV(CopyToState cstate, const char *string,
bool use_quote);
+static void CopyThisRelTo(CopyToState cstate, Relation rel,
+ Relation root_rel, uint64 *processed);
/* built-in format-specific routines */
static void CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc);
@@ -643,6 +648,8 @@ BeginCopyTo(ParseState *pstate,
PROGRESS_COPY_COMMAND_TO,
0
};
+ List *children = NIL;
+ List *scan_oids = NIL;
if (rel != NULL && rel->rd_rel->relkind != RELKIND_RELATION)
{
@@ -673,11 +680,35 @@ BeginCopyTo(ParseState *pstate,
errmsg("cannot copy from sequence \"%s\"",
RelationGetRelationName(rel))));
else if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
- ereport(ERROR,
- (errcode(ERRCODE_WRONG_OBJECT_TYPE),
- errmsg("cannot copy from partitioned table \"%s\"",
- RelationGetRelationName(rel)),
- errhint("Try the COPY (SELECT ...) TO variant.")));
+ {
+ children = find_all_inheritors(RelationGetRelid(rel),
+ AccessShareLock,
+ NULL);
+
+ foreach_oid(childreloid, children)
+ {
+ char relkind = get_rel_relkind(childreloid);
+
+ if (relkind == RELKIND_FOREIGN_TABLE)
+ {
+ char *relation_name;
+
+ relation_name = get_rel_name(childreloid);
+ ereport(ERROR,
+ errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("cannot copy from foreign table \"%s\"", relation_name),
+ errdetail("Partition \"%s\" is a foreign table in the partitioned table \"%s.%s\"",
+ relation_name, RelationGetRelationName(rel),
+ get_namespace_name(rel->rd_rel->relnamespace)),
+ errhint("Try the COPY (SELECT ...) TO variant."));
+ }
+
+ if (RELKIND_HAS_PARTITIONS(relkind))
+ continue;
+
+ scan_oids = lappend_oid(scan_oids, childreloid);
+ }
+ }
else
ereport(ERROR,
(errcode(ERRCODE_WRONG_OBJECT_TYPE),
@@ -713,6 +744,7 @@ BeginCopyTo(ParseState *pstate,
cstate->rel = rel;
tupDesc = RelationGetDescr(cstate->rel);
+ cstate->partitions = list_copy(scan_oids);
}
else
{
@@ -1031,7 +1063,7 @@ DoCopyTo(CopyToState cstate)
TupleDesc tupDesc;
int num_phys_attrs;
ListCell *cur;
- uint64 processed;
+ uint64 processed = 0;
if (fe_copy)
SendCopyBegin(cstate);
@@ -1069,36 +1101,25 @@ DoCopyTo(CopyToState cstate)
cstate->routine->CopyToStart(cstate, tupDesc);
- if (cstate->rel)
+ /*
+ * if COPY TO source table is a partitioned table, then open each
+ * partition and process each individual partition.
+ */
+ if (cstate->rel && cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
{
- TupleTableSlot *slot;
- TableScanDesc scandesc;
-
- scandesc = table_beginscan(cstate->rel, GetActiveSnapshot(), 0, NULL);
- slot = table_slot_create(cstate->rel, NULL);
-
- processed = 0;
- while (table_scan_getnextslot(scandesc, ForwardScanDirection, slot))
+ foreach_oid(scan_oid, cstate->partitions)
{
- CHECK_FOR_INTERRUPTS();
+ Relation scan_rel;
- /* Deconstruct the tuple ... */
- slot_getallattrs(slot);
+ scan_rel = table_open(scan_oid, AccessShareLock);
- /* Format and send the data */
- CopyOneRowTo(cstate, slot);
+ CopyThisRelTo(cstate, scan_rel, cstate->rel, &processed);
- /*
- * Increment the number of processed tuples, and report the
- * progress.
- */
- pgstat_progress_update_param(PROGRESS_COPY_TUPLES_PROCESSED,
- ++processed);
+ table_close(scan_rel, AccessShareLock);
}
-
- ExecDropSingleTupleTableSlot(slot);
- table_endscan(scandesc);
}
+ else if (cstate->rel)
+ CopyThisRelTo(cstate, cstate->rel, NULL, &processed);
else
{
/* run the plan --- the dest receiver will send tuples */
@@ -1116,6 +1137,71 @@ DoCopyTo(CopyToState cstate)
return processed;
}
+/*
+ * rel: the relation from which the actual data will be copied.
+ * root_rel: if not NULL, it indicates that we are copying partitioned relation
+ * data to the destination, and "rel" is the partition of "root_rel".
+ * processed: number of tuples processed.
+*/
+static void
+CopyThisRelTo(CopyToState cstate, Relation rel, Relation root_rel,
+ uint64 *processed)
+{
+ TupleTableSlot *slot;
+ TableScanDesc scandesc;
+ AttrMap *map = NULL;
+ TupleTableSlot *root_slot = NULL;
+ TupleTableSlot *original_slot = NULL;
+ TupleDesc scan_tupdesc;
+ TupleDesc rootdesc = NULL;
+
+ scan_tupdesc = RelationGetDescr(rel);
+ scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+ slot = table_slot_create(rel, NULL);
+
+ /*
+ * A partition's rowtype might differ from the root table's.
+ * Since we are export partitioned table data here,
+ * we must convert it back to the root table's rowtype.
+ */
+ if (root_rel != NULL)
+ {
+ rootdesc = RelationGetDescr(root_rel);
+ map = build_attrmap_by_name_if_req(rootdesc, scan_tupdesc, false);
+ }
+
+ while (table_scan_getnextslot(scandesc, ForwardScanDirection, slot))
+ {
+ CHECK_FOR_INTERRUPTS();
+
+ /* Deconstruct the tuple ... */
+ if (map != NULL)
+ {
+ original_slot = slot;
+ root_slot = MakeSingleTupleTableSlot(rootdesc, &TTSOpsBufferHeapTuple);
+ slot = execute_attr_map_slot(map, slot, root_slot);
+ }
+ else
+ slot_getallattrs(slot);
+
+ /* Format and send the data */
+ CopyOneRowTo(cstate, slot);
+
+ /*
+ * Increment the number of processed tuples, and report the
+ * progress.
+ */
+ pgstat_progress_update_param(PROGRESS_COPY_TUPLES_PROCESSED,
+ ++(*processed));
+
+ if (original_slot != NULL)
+ ExecDropSingleTupleTableSlot(original_slot);
+ }
+
+ ExecDropSingleTupleTableSlot(slot);
+ table_endscan(scandesc);
+}
+
/*
* Emit one row during DoCopyTo().
*/
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index 8d5a06563c4..05e5649c1bc 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -350,3 +350,21 @@ COPY copytest_mv(id) TO stdout WITH (header);
id
1
DROP MATERIALIZED VIEW copytest_mv;
+-- Tests for COPY TO with partitioned tables.
+CREATE TABLE pp (id int,val int) PARTITION BY RANGE (id);
+CREATE TABLE pp_1 (val int, id int) PARTITION BY RANGE (id);
+CREATE TABLE pp_2 (val int, id int) PARTITION BY RANGE (id);
+ALTER TABLE pp ATTACH PARTITION pp_1 FOR VALUES FROM (1) TO (5);
+ALTER TABLE pp ATTACH PARTITION pp_2 FOR VALUES FROM (5) TO (10);
+CREATE TABLE pp_15 PARTITION OF pp_1 FOR VALUES FROM (1) TO (5);
+CREATE TABLE pp_510 PARTITION OF pp_2 FOR VALUES FROM (5) TO (10);
+INSERT INTO pp SELECT g, 10 + g FROM generate_series(1,6) g;
+COPY pp TO stdout(header);
+id val
+1 11
+2 12
+3 13
+4 14
+5 15
+6 16
+DROP TABLE PP;
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index f0b88a23db8..fd2627deefa 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -375,3 +375,18 @@ COPY copytest_mv(id) TO stdout WITH (header);
REFRESH MATERIALIZED VIEW copytest_mv;
COPY copytest_mv(id) TO stdout WITH (header);
DROP MATERIALIZED VIEW copytest_mv;
+
+-- Tests for COPY TO with partitioned tables.
+CREATE TABLE pp (id int,val int) PARTITION BY RANGE (id);
+CREATE TABLE pp_1 (val int, id int) PARTITION BY RANGE (id);
+CREATE TABLE pp_2 (val int, id int) PARTITION BY RANGE (id);
+ALTER TABLE pp ATTACH PARTITION pp_1 FOR VALUES FROM (1) TO (5);
+ALTER TABLE pp ATTACH PARTITION pp_2 FOR VALUES FROM (5) TO (10);
+
+CREATE TABLE pp_15 PARTITION OF pp_1 FOR VALUES FROM (1) TO (5);
+CREATE TABLE pp_510 PARTITION OF pp_2 FOR VALUES FROM (5) TO (10);
+
+INSERT INTO pp SELECT g, 10 + g FROM generate_series(1,6) g;
+
+COPY pp TO stdout(header);
+DROP TABLE PP;
\ No newline at end of file
--
2.34.1
On Thu, 10 Apr 2025 at 07:45, jian he <jian.universality@gmail.com> wrote:
hi.
rebase and simplify regress tests.
Hi!
You used the CREATE TABLE ... PARTITION OF syntax for the second level of
the partitioning scheme, but not for the first level. Is there any reason?
Also, about the column names, how about
+CREATE TABLE pp (year int, day int) PARTITION BY RANGE (year);
+CREATE TABLE pp_1 (year int, day int) PARTITION BY RANGE (day);
+CREATE TABLE pp_2 (year int, day int) PARTITION BY RANGE (day);
??
--
Best regards,
Kirill Reshke
On Thu, Apr 10, 2025 at 4:25 PM Kirill Reshke <reshkekirill@gmail.com> wrote:
On Thu, 10 Apr 2025 at 07:45, jian he <jian.universality@gmail.com> wrote:
hi.
rebase and simplify regress tests.
HI!
You used CREATE TABLE PARTITION OF syntax for the second level of
partitioning scheme, but not for the first level. Is there any reason?
hi.
I want the column positions in the partitioned table and in its partitions to be different.
Here, the partitioned table's column order is "(id int, val int)",
but the actual partition column order is "(val int, id int)".
Also about column names. how about
+CREATE TABLE pp (year int, day int) PARTITION BY RANGE (year);
+CREATE TABLE pp_1 (year int, day int) PARTITION BY RANGE (day);
+CREATE TABLE pp_2 (year int, day int) PARTITION BY RANGE (day);
??
I think the current test example is fine.
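The column-order mismatch described in this message, a root table declared as (id, val) over partitions stored as (val, id), is what the patch's name-based attribute map handles. The following is only a toy sketch of that idea in plain C, not PostgreSQL code: ToyDesc, toy_build_map_by_name and toy_execute_map are hypothetical names invented here, loosely modelled on what build_attrmap_by_name_if_req and execute_attr_map_slot are used for in the patch.

```c
#include <assert.h>
#include <string.h>

#define NATTS 2

/* Hypothetical stand-in for a tuple descriptor: only the column names. */
typedef struct ToyDesc
{
    const char *attname[NATTS];
} ToyDesc;

/*
 * For each root column, record the 0-based position of the same-named
 * column in the partition.  Returns 0 on success, -1 if a name is missing.
 */
static int
toy_build_map_by_name(const ToyDesc *root, const ToyDesc *part, int map[NATTS])
{
    for (int i = 0; i < NATTS; i++)
    {
        map[i] = -1;
        for (int j = 0; j < NATTS; j++)
        {
            if (strcmp(root->attname[i], part->attname[j]) == 0)
            {
                map[i] = j;
                break;
            }
        }
        if (map[i] < 0)
            return -1;
    }
    return 0;
}

/* Reorder one partition row into the root table's column order. */
static void
toy_execute_map(const int map[NATTS], const int in[NATTS], int out[NATTS])
{
    for (int i = 0; i < NATTS; i++)
    {
        assert(map[i] >= 0 && map[i] < NATTS);
        out[i] = in[map[i]];
    }
}
```

With root columns (id, val) and partition columns (val, id), the map comes out as {1, 0}, so a row stored as (11, 1) is emitted as (1, 11), matching the first data row of the regression test output.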
On Thu, 10 Apr 2025 at 17:37, jian he <jian.universality@gmail.com> wrote:
I think the current test example is fine.
Ok, let it be so. I changed the status to RFC as I have no more input
here, and the other reviewers in the thread remain silent (so I assume they
are fine with v10)
--
Best regards,
Kirill Reshke
hi.
In the v10 patch, there will be some performance regression if the partition column
ordering is different from the root partitioned table,
because of this loop in v10's CopyThisRelTo:
+ while (table_scan_getnextslot(scandesc, ForwardScanDirection, slot))
+ {
+ if (map != NULL)
+ {
+ original_slot = slot;
+ root_slot = MakeSingleTupleTableSlot(rootdesc, &TTSOpsBufferHeapTuple);
+ slot = execute_attr_map_slot(map, slot, root_slot);
+ }
+ else
+ slot_getallattrs(slot);
+
+ if (original_slot != NULL)
+ ExecDropSingleTupleTableSlot(original_slot);
+}
as you can see, for each tuple in the partition, I called
MakeSingleTupleTableSlot to get the dummy root_slot,
and ExecDropSingleTupleTableSlot too.
that causes overhead.
instead, we can create root_slot once, before the main while loop,
like the following:
+ if (root_rel != NULL)
+ {
+ rootdesc = RelationGetDescr(root_rel);
+ root_slot = table_slot_create(root_rel, NULL);
+ map = build_attrmap_by_name_if_req(rootdesc, tupdesc, false);
+ }
....
+ while (table_scan_getnextslot(scandesc, ForwardScanDirection, slot))
+ {
+ TupleTableSlot *copyslot;
+ if (map != NULL)
+ copyslot = execute_attr_map_slot(map, slot, root_slot);
+ else
+ {
+ slot_getallattrs(slot);
+ copyslot = slot;
+ }
+
please check CopyThisRelTo in v11. with v11, there is no
regression for the case where the
partition column ordering differs from the partitioned table.
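The change from v10 to v11 described above amounts to hoisting the conversion-slot creation out of the per-row loop. Below is a minimal plain-C sketch of that difference, assuming malloc/free as a stand-in for slot creation and destruction; ToyRow, ToySlot and all toy_* functions are hypothetical and exist only for illustration, they are not PostgreSQL APIs.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins: a stored partition row and a conversion slot. */
typedef struct ToyRow { int val; int id; } ToyRow;    /* partition order */
typedef struct ToySlot { int id; int val; } ToySlot;  /* root order */

/* Counts slot creations so the two loop shapes can be compared. */
static long toy_slot_allocs = 0;

static ToySlot *
toy_make_slot(void)
{
    ToySlot *slot = malloc(sizeof(ToySlot));

    assert(slot != NULL);
    toy_slot_allocs++;
    return slot;
}

static void
toy_drop_slot(ToySlot *slot)
{
    free(slot);
}

/* Convert one row into root column order inside the given slot. */
static void
toy_convert(const ToyRow *row, ToySlot *slot)
{
    slot->id = row->id;
    slot->val = row->val;
}

/* v10-like shape: create and drop the conversion slot for every row. */
static long
toy_copy_slot_per_row(const ToyRow *rows, int nrows)
{
    long checksum = 0;

    for (int i = 0; i < nrows; i++)
    {
        ToySlot *root_slot = toy_make_slot();

        toy_convert(&rows[i], root_slot);
        checksum += root_slot->id + root_slot->val;
        toy_drop_slot(root_slot);
    }
    return checksum;
}

/* v11-like shape: create the slot once and reuse it for every row. */
static long
toy_copy_slot_reused(const ToyRow *rows, int nrows)
{
    ToySlot *root_slot = toy_make_slot();
    long checksum = 0;

    for (int i = 0; i < nrows; i++)
    {
        toy_convert(&rows[i], root_slot);
        checksum += root_slot->id + root_slot->val;
    }
    toy_drop_slot(root_slot);
    return checksum;
}
```

Both functions produce the same result for the same rows; the only difference is that the first performs one allocation per row while the second performs one per relation, which is the overhead the v11 loop avoids.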
I have tested with 30 partitions and 10 columns, where every partition's column ordering
differs from the root partitioned table.
copy pp to '/tmp/2.txt'
is still faster than
copy (select * from pp) to '/tmp/1.txt';
(359.463 ms versus 376.371 ms)
I am using -Dbuildtype=release
PostgreSQL 18beta1_release_build on x86_64-linux, compiled by gcc-14.1.0, 64-bit
you may see the attached test file.
Attachments:
v11-0001-support-COPY-partitioned_table-TO.patch (text/x-patch; charset=US-ASCII)
From fca7b87718264cb5ea52f3b4462f4d6e52d58cdc Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Thu, 5 Jun 2025 08:44:13 +0800
Subject: [PATCH v11 1/1] support COPY partitioned_table TO
this implements ``COPY (partitioned_table) TO``. it will be
faster than ``COPY (select * from partitioned_table) TO``.
reviewed by: vignesh C <vignesh21@gmail.com>
reviewed by: David Rowley <dgrowleyml@gmail.com>
reviewed by: Melih Mutlu <m.melihmutlu@gmail.com>
reviewed by: Kirill Reshke <reshkekirill@gmail.com>
discussion: https://postgr.es/m/CACJufxEZt+G19Ors3bQUq-42-61__C=y5k2wk=sHEFRusu7=iQ@mail.gmail.com
commitfest entry: https://commitfest.postgresql.org/patch/5467/
---
doc/src/sgml/ref/copy.sgml | 6 +-
src/backend/commands/copyto.c | 146 +++++++++++++++++++++++------
src/test/regress/expected/copy.out | 18 ++++
src/test/regress/sql/copy.sql | 15 +++
4 files changed, 153 insertions(+), 32 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 8433344e5b6..d750a2fef7c 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -521,13 +521,13 @@ COPY <replaceable class="parameter">count</replaceable>
<para>
<command>COPY TO</command> can be used with plain
- tables and populated materialized views.
- For example,
+ tables, populated materialized views and partitioned tables.
+ For example, if <replaceable class="parameter">table</replaceable> is not partitioned table,
<literal>COPY <replaceable class="parameter">table</replaceable>
TO</literal> copies the same rows as
<literal>SELECT * FROM ONLY <replaceable class="parameter">table</replaceable></literal>.
However it doesn't directly support other relation types,
- such as partitioned tables, inheritance child tables, or views.
+ such as inheritance child tables, or views.
To copy all rows from such relations, use <literal>COPY (SELECT * FROM
<replaceable class="parameter">table</replaceable>) TO</literal>.
</para>
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index ea6f18f2c80..a718ad02960 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -19,6 +19,8 @@
#include <sys/stat.h>
#include "access/tableam.h"
+#include "access/table.h"
+#include "catalog/pg_inherits.h"
#include "commands/copyapi.h"
#include "commands/progress.h"
#include "executor/execdesc.h"
@@ -82,6 +84,7 @@ typedef struct CopyToStateData
List *attnumlist; /* integer list of attnums to copy */
char *filename; /* filename, or NULL for STDOUT */
bool is_program; /* is 'filename' a program to popen? */
+ List *partitions; /* oid list of partition oid for copy to */
copy_data_dest_cb data_dest_cb; /* function for writing data */
CopyFormatOptions opts;
@@ -116,6 +119,8 @@ static void CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot);
static void CopyAttributeOutText(CopyToState cstate, const char *string);
static void CopyAttributeOutCSV(CopyToState cstate, const char *string,
bool use_quote);
+static void CopyThisRelTo(CopyToState cstate, Relation rel,
+ Relation root_rel, uint64 *processed);
/* built-in format-specific routines */
static void CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc);
@@ -643,6 +648,8 @@ BeginCopyTo(ParseState *pstate,
PROGRESS_COPY_COMMAND_TO,
0
};
+ List *children = NIL;
+ List *scan_oids = NIL;
if (rel != NULL && rel->rd_rel->relkind != RELKIND_RELATION)
{
@@ -673,11 +680,35 @@ BeginCopyTo(ParseState *pstate,
errmsg("cannot copy from sequence \"%s\"",
RelationGetRelationName(rel))));
else if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
- ereport(ERROR,
- (errcode(ERRCODE_WRONG_OBJECT_TYPE),
- errmsg("cannot copy from partitioned table \"%s\"",
- RelationGetRelationName(rel)),
- errhint("Try the COPY (SELECT ...) TO variant.")));
+ {
+ children = find_all_inheritors(RelationGetRelid(rel),
+ AccessShareLock,
+ NULL);
+
+ foreach_oid(childreloid, children)
+ {
+ char relkind = get_rel_relkind(childreloid);
+
+ if (relkind == RELKIND_FOREIGN_TABLE)
+ {
+ char *relation_name;
+
+ relation_name = get_rel_name(childreloid);
+ ereport(ERROR,
+ errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("cannot copy from foreign table \"%s\"", relation_name),
+ errdetail("Partition \"%s\" is a foreign table in the partitioned table \"%s.%s\"",
+ relation_name, RelationGetRelationName(rel),
+ get_namespace_name(rel->rd_rel->relnamespace)),
+ errhint("Try the COPY (SELECT ...) TO variant."));
+ }
+
+ if (RELKIND_HAS_PARTITIONS(relkind))
+ continue;
+
+ scan_oids = lappend_oid(scan_oids, childreloid);
+ }
+ }
else
ereport(ERROR,
(errcode(ERRCODE_WRONG_OBJECT_TYPE),
@@ -713,6 +744,7 @@ BeginCopyTo(ParseState *pstate,
cstate->rel = rel;
tupDesc = RelationGetDescr(cstate->rel);
+ cstate->partitions = list_copy(scan_oids);
}
else
{
@@ -722,6 +754,7 @@ BeginCopyTo(ParseState *pstate,
DestReceiver *dest;
cstate->rel = NULL;
+ cstate->partitions = NIL;
/*
* Run parse analysis and rewrite. Note this also acquires sufficient
@@ -1030,7 +1063,7 @@ DoCopyTo(CopyToState cstate)
TupleDesc tupDesc;
int num_phys_attrs;
ListCell *cur;
- uint64 processed;
+ uint64 processed = 0;
if (fe_copy)
SendCopyBegin(cstate);
@@ -1068,36 +1101,25 @@ DoCopyTo(CopyToState cstate)
cstate->routine->CopyToStart(cstate, tupDesc);
- if (cstate->rel)
+ /*
+ * if COPY TO source table is a partitioned table, then open each
+ * partition and process each individual partition.
+ */
+ if (cstate->rel && cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
{
- TupleTableSlot *slot;
- TableScanDesc scandesc;
-
- scandesc = table_beginscan(cstate->rel, GetActiveSnapshot(), 0, NULL);
- slot = table_slot_create(cstate->rel, NULL);
-
- processed = 0;
- while (table_scan_getnextslot(scandesc, ForwardScanDirection, slot))
+ foreach_oid(scan_oid, cstate->partitions)
{
- CHECK_FOR_INTERRUPTS();
+ Relation scan_rel;
- /* Deconstruct the tuple ... */
- slot_getallattrs(slot);
+ scan_rel = table_open(scan_oid, AccessShareLock);
- /* Format and send the data */
- CopyOneRowTo(cstate, slot);
+ CopyThisRelTo(cstate, scan_rel, cstate->rel, &processed);
- /*
- * Increment the number of processed tuples, and report the
- * progress.
- */
- pgstat_progress_update_param(PROGRESS_COPY_TUPLES_PROCESSED,
- ++processed);
+ table_close(scan_rel, AccessShareLock);
}
-
- ExecDropSingleTupleTableSlot(slot);
- table_endscan(scandesc);
}
+ else if (cstate->rel)
+ CopyThisRelTo(cstate, cstate->rel, NULL, &processed);
else
{
/* run the plan --- the dest receiver will send tuples */
@@ -1115,6 +1137,72 @@ DoCopyTo(CopyToState cstate)
return processed;
}
+/*
+ * rel: the relation from which the actual data will be copied.
+ * root_rel: if not NULL, it indicates that we are copying partitioned relation
+ * data to the destination, and "rel" is the partition of "root_rel".
+ * processed: number of tuples processed.
+*/
+static void
+CopyThisRelTo(CopyToState cstate, Relation rel, Relation root_rel,
+ uint64 *processed)
+{
+ TupleTableSlot *slot;
+ TableScanDesc scandesc;
+ AttrMap *map = NULL;
+ TupleTableSlot *root_slot = NULL;
+ TupleDesc tupdesc;
+ TupleDesc rootdesc;
+
+ tupdesc = RelationGetDescr(rel);
+ scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+ slot = table_slot_create(rel, NULL);
+
+ /*
+ * A partition's rowtype might differ from the root table's. Since we are
+ * export partitioned table data here, we must convert it back to the root
+ * table's rowtype.
+ */
+ if (root_rel != NULL)
+ {
+ rootdesc = RelationGetDescr(root_rel);
+ root_slot = table_slot_create(root_rel, NULL);
+ map = build_attrmap_by_name_if_req(rootdesc, tupdesc, false);
+ }
+
+ while (table_scan_getnextslot(scandesc, ForwardScanDirection, slot))
+ {
+ TupleTableSlot *copyslot;
+
+ CHECK_FOR_INTERRUPTS();
+
+ /* Deconstruct the tuple ... */
+ if (map != NULL)
+ copyslot = execute_attr_map_slot(map, slot, root_slot);
+ else
+ {
+ slot_getallattrs(slot);
+ copyslot = slot;
+ }
+
+ /* Format and send the data */
+ CopyOneRowTo(cstate, copyslot);
+
+ /*
+ * Increment the number of processed tuples, and report the
+ * progress.
+ */
+ pgstat_progress_update_param(PROGRESS_COPY_TUPLES_PROCESSED,
+ ++(*processed));
+ }
+
+ ExecDropSingleTupleTableSlot(slot);
+
+ if (root_slot != NULL)
+ ExecDropSingleTupleTableSlot(root_slot);
+ table_endscan(scandesc);
+}
+
/*
* Emit one row during DoCopyTo().
*/
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index 8d5a06563c4..05e5649c1bc 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -350,3 +350,21 @@ COPY copytest_mv(id) TO stdout WITH (header);
id
1
DROP MATERIALIZED VIEW copytest_mv;
+-- Tests for COPY TO with partitioned tables.
+CREATE TABLE pp (id int,val int) PARTITION BY RANGE (id);
+CREATE TABLE pp_1 (val int, id int) PARTITION BY RANGE (id);
+CREATE TABLE pp_2 (val int, id int) PARTITION BY RANGE (id);
+ALTER TABLE pp ATTACH PARTITION pp_1 FOR VALUES FROM (1) TO (5);
+ALTER TABLE pp ATTACH PARTITION pp_2 FOR VALUES FROM (5) TO (10);
+CREATE TABLE pp_15 PARTITION OF pp_1 FOR VALUES FROM (1) TO (5);
+CREATE TABLE pp_510 PARTITION OF pp_2 FOR VALUES FROM (5) TO (10);
+INSERT INTO pp SELECT g, 10 + g FROM generate_series(1,6) g;
+COPY pp TO stdout(header);
+id val
+1 11
+2 12
+3 13
+4 14
+5 15
+6 16
+DROP TABLE PP;
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index f0b88a23db8..fd2627deefa 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -375,3 +375,18 @@ COPY copytest_mv(id) TO stdout WITH (header);
REFRESH MATERIALIZED VIEW copytest_mv;
COPY copytest_mv(id) TO stdout WITH (header);
DROP MATERIALIZED VIEW copytest_mv;
+
+-- Tests for COPY TO with partitioned tables.
+CREATE TABLE pp (id int,val int) PARTITION BY RANGE (id);
+CREATE TABLE pp_1 (val int, id int) PARTITION BY RANGE (id);
+CREATE TABLE pp_2 (val int, id int) PARTITION BY RANGE (id);
+ALTER TABLE pp ATTACH PARTITION pp_1 FOR VALUES FROM (1) TO (5);
+ALTER TABLE pp ATTACH PARTITION pp_2 FOR VALUES FROM (5) TO (10);
+
+CREATE TABLE pp_15 PARTITION OF pp_1 FOR VALUES FROM (1) TO (5);
+CREATE TABLE pp_510 PARTITION OF pp_2 FOR VALUES FROM (5) TO (10);
+
+INSERT INTO pp SELECT g, 10 + g FROM generate_series(1,6) g;
+
+COPY pp TO stdout(header);
+DROP TABLE PP;
\ No newline at end of file
--
2.34.1
On 2025-06-05 09:45, jian he wrote:
hi.
In the V10 patch, there will be some regression if the partition column
ordering is different from the root partitioned table, because in V10 CopyThisRelTo:
+ while (table_scan_getnextslot(scandesc, ForwardScanDirection, slot))
+ {
+ if (map != NULL)
+ {
+ original_slot = slot;
+ root_slot = MakeSingleTupleTableSlot(rootdesc, &TTSOpsBufferHeapTuple);
+ slot = execute_attr_map_slot(map, slot, root_slot);
+ }
+ else
+ slot_getallattrs(slot);
+
+ if (original_slot != NULL)
+ ExecDropSingleTupleTableSlot(original_slot);
+}
As you can see, for each slot in the partition, I called MakeSingleTupleTableSlot
to get the dummy root_slot, and ExecDropSingleTupleTableSlot too.
That causes overhead. We can produce root_slot before the main while loop,
like the following:
+ if (root_rel != NULL)
+ {
+ rootdesc = RelationGetDescr(root_rel);
+ root_slot = table_slot_create(root_rel, NULL);
+ map = build_attrmap_by_name_if_req(rootdesc, tupdesc, false);
+ }
....
+ while (table_scan_getnextslot(scandesc, ForwardScanDirection, slot))
+ {
+ TupleTableSlot *copyslot;
+ if (map != NULL)
+ copyslot = execute_attr_map_slot(map, slot, root_slot);
+ else
+ {
+ slot_getallattrs(slot);
+ copyslot = slot;
+ }
+
Please check CopyThisRelTo in v11. So, with v11, there is no regression for the
case when the partition column ordering differs from the partitioned table.
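For readers who want to see the idea in isolation, here is a minimal standalone sketch of the mapping step, assuming plain int rows and string column names. It is not PostgreSQL code: build_map_by_name and remap_row are hypothetical stand-ins for build_attrmap_by_name_if_req and execute_attr_map_slot, and the caller-owned root_row buffer plays the role of the root_slot that v11 creates once before the loop.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for build_attrmap_by_name_if_req(): for each root
 * column, record the index of the same-named column in the partition.
 * Both relations are assumed to have the same set of column names.
 */
static void
build_map_by_name(const char **root_cols, const char **part_cols,
                  size_t ncols, size_t *map)
{
    for (size_t i = 0; i < ncols; i++)
    {
        size_t j;

        for (j = 0; j < ncols; j++)
        {
            if (strcmp(root_cols[i], part_cols[j]) == 0)
                break;
        }
        assert(j < ncols);      /* every root column must exist in the partition */
        map[i] = j;
    }
}

/*
 * Hypothetical stand-in for execute_attr_map_slot(): rewrite one partition
 * row into root column order. The output buffer is owned by the caller, so
 * it can be allocated once and reused for every row of the scan.
 */
static void
remap_row(const size_t *map, const int *part_row, int *root_row, size_t ncols)
{
    for (size_t i = 0; i < ncols; i++)
        root_row[i] = part_row[map[i]];
}
```

With root columns (id, val) and partition columns (val, id), as in the pp/pp_1 regression test, the map comes out as {1, 0}, and a partition row (11, 1) is emitted as (1, 11).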
Thanks for working on this improvement.
Here are some minor comments on v11 patch:
+ For example, if <replaceable class="parameter">table</replaceable>
is not partitioned table,
<literal>COPY <replaceable class="parameter">table</replaceable>
TO</literal> copies the same rows as
<literal>SELECT * FROM ONLY <replaceable
class="parameter">table</replaceable></literal>
This describes the behavior when the table is not partitioned, but would
it also be helpful to mention the behavior when the table is a
partitioned table?
For example:
If table is a partitioned table, then COPY table TO copies the same
rows as SELECT * FROM table.
+ * if COPY TO source table is a partitioned table, then open each
if -> If
+ scan_rel = table_open(scan_oid, AccessShareLock);
- /* Format and send the data */
- CopyOneRowTo(cstate, slot);
+ CopyThisRelTo(cstate, scan_rel, cstate->rel, &processed);
- /*
- * Increment the number of processed tuples, and report the
- * progress.
- */
- pgstat_progress_update_param(PROGRESS_COPY_TUPLES_PROCESSED,
- ++processed);
+ table_close(scan_rel, AccessShareLock);
After applying the patch, blank lines exist between these statements as
below. Do we really need these blank lines?
```
scan_rel = table_open(scan_oid, AccessShareLock);

CopyThisRelTo(cstate, scan_rel, cstate->rel, &processed);

table_close(scan_rel, AccessShareLock);
```
+/*
+ * rel: the relation from which the actual data will be copied.
+ * root_rel: if not NULL, it indicates that we are copying partitioned relation
+ * data to the destination, and "rel" is the partition of "root_rel".
+ * processed: number of tuples processed.
+*/
+static void
+CopyThisRelTo(CopyToState cstate, Relation rel, Relation root_rel,
This comment only describes the parameters. Wouldn't it be better to add a
brief summary of what this function does overall?
+ * A partition's rowtype might differ from the root table's. Since we are
+ * export partitioned table data here, we must convert it back to the root
are export -> are exporting?
--
Regards,
--
Atsushi Torikoshi
Seconded from NTT DATA Japan Corporation to SRA OSS K.K.
On Thu, Jun 26, 2025 at 9:43 AM torikoshia <torikoshia@oss.nttdata.com> wrote:
After applying the patch, blank lines exist between these statements as
below. Do we really need these blank lines?
```
scan_rel = table_open(scan_oid, AccessShareLock);

CopyThisRelTo(cstate, scan_rel, cstate->rel, &processed);

table_close(scan_rel, AccessShareLock);
```
We can remove these empty new lines.
Actually, I realized we don't need to use AccessShareLock here; we can use NoLock
instead, since BeginCopyTo has already acquired AccessShareLock via
find_all_inheritors.
+/*
+ * rel: the relation from which the actual data will be copied.
+ * root_rel: if not NULL, it indicates that we are copying partitioned relation
+ * data to the destination, and "rel" is the partition of "root_rel".
+ * processed: number of tuples processed.
+*/
+static void
+CopyThisRelTo(CopyToState cstate, Relation rel, Relation root_rel,
This comment only describes the parameters. Wouldn't it be better to add a
brief summary of what this function does overall?
What do you think of the following?
/*
* CopyThisRelTo:
* This will scanning a single table (which may be a partition) and exporting
* its rows to a COPY destination.
*
* rel: the relation from which the actual data will be copied.
* root_rel: if not NULL, it indicates that we are copying partitioned relation
* data to the destination, and "rel" is the partition of "root_rel".
* processed: number of tuples processed.
*/
static void
CopyThisRelTo(CopyToState cstate, Relation rel, Relation root_rel,
uint64 *processed)
Attachments:
v12-0001-support-COPY-partitioned_table-TO.patch (text/x-patch)
From 9257c14d08c0c0a53262cf3d7be70dfc2cfa62df Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Fri, 27 Jun 2025 15:12:24 +0800
Subject: [PATCH v12 1/1] support COPY partitioned_table TO
This is the implementation of ``COPY partitioned_table TO``. It will be
faster than ``COPY (select * from partitioned_table) TO``.
If the source table is a partitioned table, COPY table TO copies the same rows
as SELECT * FROM table.
reviewed by: vignesh C <vignesh21@gmail.com>
reviewed by: David Rowley <dgrowleyml@gmail.com>
reviewed by: Melih Mutlu <m.melihmutlu@gmail.com>
reviewed by: Kirill Reshke <reshkekirill@gmail.com>
reviewed by: Atsushi Torikoshi <torikoshia@oss.nttdata.com>
discussion: https://postgr.es/m/CACJufxEZt+G19Ors3bQUq-42-61__C=y5k2wk=sHEFRusu7=iQ@mail.gmail.com
commitfest entry: https://commitfest.postgresql.org/patch/5467
---
doc/src/sgml/ref/copy.sgml | 9 +-
src/backend/commands/copyto.c | 152 +++++++++++++++++++++++------
src/test/regress/expected/copy.out | 18 ++++
src/test/regress/sql/copy.sql | 15 +++
4 files changed, 160 insertions(+), 34 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 8433344e5b6..0775a799a5e 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -521,13 +521,16 @@ COPY <replaceable class="parameter">count</replaceable>
<para>
<command>COPY TO</command> can be used with plain
- tables and populated materialized views.
- For example,
+ tables, populated materialized views and partitioned tables.
+ For example, if <replaceable class="parameter">table</replaceable> is not partitioned table,
<literal>COPY <replaceable class="parameter">table</replaceable>
TO</literal> copies the same rows as
<literal>SELECT * FROM ONLY <replaceable class="parameter">table</replaceable></literal>.
+ If <replaceable class="parameter">table</replaceable> is a partitioned table,
+ <literal>COPY <replaceable class="parameter">table</replaceable> TO</literal>
+ copies the same rows as <literal>SELECT * FROM <replaceable class="parameter">table</replaceable></literal>.
However it doesn't directly support other relation types,
- such as partitioned tables, inheritance child tables, or views.
+ such as inheritance child tables, or views.
To copy all rows from such relations, use <literal>COPY (SELECT * FROM
<replaceable class="parameter">table</replaceable>) TO</literal>.
</para>
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index ea6f18f2c80..fbfe6d926d0 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -19,6 +19,8 @@
#include <sys/stat.h>
#include "access/tableam.h"
+#include "access/table.h"
+#include "catalog/pg_inherits.h"
#include "commands/copyapi.h"
#include "commands/progress.h"
#include "executor/execdesc.h"
@@ -82,6 +84,7 @@ typedef struct CopyToStateData
List *attnumlist; /* integer list of attnums to copy */
char *filename; /* filename, or NULL for STDOUT */
bool is_program; /* is 'filename' a program to popen? */
+ List *partitions; /* oid list of partition oid for copy to */
copy_data_dest_cb data_dest_cb; /* function for writing data */
CopyFormatOptions opts;
@@ -116,6 +119,8 @@ static void CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot);
static void CopyAttributeOutText(CopyToState cstate, const char *string);
static void CopyAttributeOutCSV(CopyToState cstate, const char *string,
bool use_quote);
+static void CopyThisRelTo(CopyToState cstate, Relation rel,
+ Relation root_rel, uint64 *processed);
/* built-in format-specific routines */
static void CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc);
@@ -643,6 +648,8 @@ BeginCopyTo(ParseState *pstate,
PROGRESS_COPY_COMMAND_TO,
0
};
+ List *children = NIL;
+ List *scan_oids = NIL;
if (rel != NULL && rel->rd_rel->relkind != RELKIND_RELATION)
{
@@ -673,11 +680,35 @@ BeginCopyTo(ParseState *pstate,
errmsg("cannot copy from sequence \"%s\"",
RelationGetRelationName(rel))));
else if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
- ereport(ERROR,
- (errcode(ERRCODE_WRONG_OBJECT_TYPE),
- errmsg("cannot copy from partitioned table \"%s\"",
- RelationGetRelationName(rel)),
- errhint("Try the COPY (SELECT ...) TO variant.")));
+ {
+ children = find_all_inheritors(RelationGetRelid(rel),
+ AccessShareLock,
+ NULL);
+
+ foreach_oid(childreloid, children)
+ {
+ char relkind = get_rel_relkind(childreloid);
+
+ if (relkind == RELKIND_FOREIGN_TABLE)
+ {
+ char *relation_name;
+
+ relation_name = get_rel_name(childreloid);
+ ereport(ERROR,
+ errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("cannot copy from foreign table \"%s\"", relation_name),
+ errdetail("Partition \"%s\" is a foreign table in the partitioned table \"%s.%s\"",
+ relation_name, RelationGetRelationName(rel),
+ get_namespace_name(rel->rd_rel->relnamespace)),
+ errhint("Try the COPY (SELECT ...) TO variant."));
+ }
+
+ if (RELKIND_HAS_PARTITIONS(relkind))
+ continue;
+
+ scan_oids = lappend_oid(scan_oids, childreloid);
+ }
+ }
else
ereport(ERROR,
(errcode(ERRCODE_WRONG_OBJECT_TYPE),
@@ -713,6 +744,7 @@ BeginCopyTo(ParseState *pstate,
cstate->rel = rel;
tupDesc = RelationGetDescr(cstate->rel);
+ cstate->partitions = list_copy(scan_oids);
}
else
{
@@ -722,6 +754,7 @@ BeginCopyTo(ParseState *pstate,
DestReceiver *dest;
cstate->rel = NULL;
+ cstate->partitions = NIL;
/*
* Run parse analysis and rewrite. Note this also acquires sufficient
@@ -1030,7 +1063,7 @@ DoCopyTo(CopyToState cstate)
TupleDesc tupDesc;
int num_phys_attrs;
ListCell *cur;
- uint64 processed;
+ uint64 processed = 0;
if (fe_copy)
SendCopyBegin(cstate);
@@ -1068,36 +1101,23 @@ DoCopyTo(CopyToState cstate)
cstate->routine->CopyToStart(cstate, tupDesc);
- if (cstate->rel)
+ /*
+ * If COPY TO source table is a partitioned table, then open each
+ * partition and process each individual partition.
+ */
+ if (cstate->rel && cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
{
- TupleTableSlot *slot;
- TableScanDesc scandesc;
-
- scandesc = table_beginscan(cstate->rel, GetActiveSnapshot(), 0, NULL);
- slot = table_slot_create(cstate->rel, NULL);
-
- processed = 0;
- while (table_scan_getnextslot(scandesc, ForwardScanDirection, slot))
+ foreach_oid(scan_oid, cstate->partitions)
{
- CHECK_FOR_INTERRUPTS();
+ Relation scan_rel;
- /* Deconstruct the tuple ... */
- slot_getallattrs(slot);
-
- /* Format and send the data */
- CopyOneRowTo(cstate, slot);
-
- /*
- * Increment the number of processed tuples, and report the
- * progress.
- */
- pgstat_progress_update_param(PROGRESS_COPY_TUPLES_PROCESSED,
- ++processed);
+ scan_rel = table_open(scan_oid, NoLock);
+ CopyThisRelTo(cstate, scan_rel, cstate->rel, &processed);
+ table_close(scan_rel, NoLock);
}
-
- ExecDropSingleTupleTableSlot(slot);
- table_endscan(scandesc);
}
+ else if (cstate->rel)
+ CopyThisRelTo(cstate, cstate->rel, NULL, &processed);
else
{
/* run the plan --- the dest receiver will send tuples */
@@ -1115,6 +1135,76 @@ DoCopyTo(CopyToState cstate)
return processed;
}
+/*
+ * CopyThisRelTo:
+ * This will scanning a single table (which may be a partition) and exporting
+ * its rows to a COPY destination.
+ *
+ * rel: the relation from which the actual data will be copied.
+ * root_rel: if not NULL, it indicates that we are copying partitioned relation
+ * data to the destination, and "rel" is the partition of "root_rel".
+ * processed: number of tuples processed.
+*/
+static void
+CopyThisRelTo(CopyToState cstate, Relation rel, Relation root_rel,
+ uint64 *processed)
+{
+ TupleTableSlot *slot;
+ TableScanDesc scandesc;
+ AttrMap *map = NULL;
+ TupleTableSlot *root_slot = NULL;
+ TupleDesc tupdesc;
+ TupleDesc rootdesc;
+
+ tupdesc = RelationGetDescr(rel);
+ scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+ slot = table_slot_create(rel, NULL);
+
+ /*
+ * A partition's rowtype might differ from the root table's. Since we are
+ * exporting partitioned table data here, we must convert it back to the
+ * root table's rowtype.
+ */
+ if (root_rel != NULL)
+ {
+ rootdesc = RelationGetDescr(root_rel);
+ root_slot = table_slot_create(root_rel, NULL);
+ map = build_attrmap_by_name_if_req(rootdesc, tupdesc, false);
+ }
+
+ while (table_scan_getnextslot(scandesc, ForwardScanDirection, slot))
+ {
+ TupleTableSlot *copyslot;
+
+ CHECK_FOR_INTERRUPTS();
+
+ /* Deconstruct the tuple ... */
+ if (map != NULL)
+ copyslot = execute_attr_map_slot(map, slot, root_slot);
+ else
+ {
+ slot_getallattrs(slot);
+ copyslot = slot;
+ }
+
+ /* Format and send the data */
+ CopyOneRowTo(cstate, copyslot);
+
+ /*
+ * Increment the number of processed tuples, and report the
+ * progress.
+ */
+ pgstat_progress_update_param(PROGRESS_COPY_TUPLES_PROCESSED,
+ ++(*processed));
+ }
+
+ ExecDropSingleTupleTableSlot(slot);
+
+ if (root_slot != NULL)
+ ExecDropSingleTupleTableSlot(root_slot);
+ table_endscan(scandesc);
+}
+
/*
* Emit one row during DoCopyTo().
*/
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index 8d5a06563c4..05e5649c1bc 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -350,3 +350,21 @@ COPY copytest_mv(id) TO stdout WITH (header);
id
1
DROP MATERIALIZED VIEW copytest_mv;
+-- Tests for COPY TO with partitioned tables.
+CREATE TABLE pp (id int,val int) PARTITION BY RANGE (id);
+CREATE TABLE pp_1 (val int, id int) PARTITION BY RANGE (id);
+CREATE TABLE pp_2 (val int, id int) PARTITION BY RANGE (id);
+ALTER TABLE pp ATTACH PARTITION pp_1 FOR VALUES FROM (1) TO (5);
+ALTER TABLE pp ATTACH PARTITION pp_2 FOR VALUES FROM (5) TO (10);
+CREATE TABLE pp_15 PARTITION OF pp_1 FOR VALUES FROM (1) TO (5);
+CREATE TABLE pp_510 PARTITION OF pp_2 FOR VALUES FROM (5) TO (10);
+INSERT INTO pp SELECT g, 10 + g FROM generate_series(1,6) g;
+COPY pp TO stdout(header);
+id val
+1 11
+2 12
+3 13
+4 14
+5 15
+6 16
+DROP TABLE PP;
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index f0b88a23db8..9be7cb6c8dc 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -375,3 +375,18 @@ COPY copytest_mv(id) TO stdout WITH (header);
REFRESH MATERIALIZED VIEW copytest_mv;
COPY copytest_mv(id) TO stdout WITH (header);
DROP MATERIALIZED VIEW copytest_mv;
+
+-- Tests for COPY TO with partitioned tables.
+CREATE TABLE pp (id int,val int) PARTITION BY RANGE (id);
+CREATE TABLE pp_1 (val int, id int) PARTITION BY RANGE (id);
+CREATE TABLE pp_2 (val int, id int) PARTITION BY RANGE (id);
+ALTER TABLE pp ATTACH PARTITION pp_1 FOR VALUES FROM (1) TO (5);
+ALTER TABLE pp ATTACH PARTITION pp_2 FOR VALUES FROM (5) TO (10);
+
+CREATE TABLE pp_15 PARTITION OF pp_1 FOR VALUES FROM (1) TO (5);
+CREATE TABLE pp_510 PARTITION OF pp_2 FOR VALUES FROM (5) TO (10);
+
+INSERT INTO pp SELECT g, 10 + g FROM generate_series(1,6) g;
+
+COPY pp TO stdout(header);
+DROP TABLE PP;
--
2.34.1
On 2025-06-27 16:14, jian he wrote:
Thanks for updating the patch!
On Thu, Jun 26, 2025 at 9:43 AM torikoshia <torikoshia@oss.nttdata.com> wrote:
After applying the patch, blank lines exist between these statements as
below. Do we really need these blank lines?
```
scan_rel = table_open(scan_oid, AccessShareLock);

CopyThisRelTo(cstate, scan_rel, cstate->rel, &processed);

table_close(scan_rel, AccessShareLock);
```
We can remove these empty new lines.
Actually, I realized we don't need to use AccessShareLock here; we can use NoLock
instead, since BeginCopyTo has already acquired AccessShareLock via
find_all_inheritors.
That makes sense.
I think it would be helpful to add a comment explaining why NoLock is
safe here — for example:
/* We already got the needed lock */
In fact, in other places where table_open(..., NoLock) is used, similar
explanatory comments are often included (the comment above is one of them).
+/*
+ * rel: the relation from which the actual data will be copied.
+ * root_rel: if not NULL, it indicates that we are copying partitioned relation
+ * data to the destination, and "rel" is the partition of "root_rel".
+ * processed: number of tuples processed.
+*/
+static void
+CopyThisRelTo(CopyToState cstate, Relation rel, Relation root_rel,
This comment only describes the parameters. Wouldn't it be better to add a
brief summary of what this function does overall?
What do you think of the following?
/*
* CopyThisRelTo:
* This will scanning a single table (which may be a partition) and exporting
* its rows to a COPY destination.
*
* rel: the relation from which the actual data will be copied.
* root_rel: if not NULL, it indicates that we are copying partitioned relation
* data to the destination, and "rel" is the partition of "root_rel".
* processed: number of tuples processed.
*/
static void
CopyThisRelTo(CopyToState cstate, Relation rel, Relation root_rel,
uint64 *processed)
I think it would be better to follow the style of nearby functions in
the same file. For example:
/*
* Scan a single table (which may be a partition) and export
* its rows to the COPY destination.
*/
Also, regarding the function name CopyThisRelTo() — I wonder if the
"This" is really necessary?
Maybe something simpler like CopyRelTo() is enough.
What do you think?
Regards,
--
Atsushi Torikoshi
Seconded from NTT DATA Japan Corporation to SRA OSS K.K.
On Mon, Jun 30, 2025 at 3:57 PM torikoshia <torikoshia@oss.nttdata.com> wrote:
```
scan_rel = table_open(scan_oid, AccessShareLock);

CopyThisRelTo(cstate, scan_rel, cstate->rel, &processed);

table_close(scan_rel, AccessShareLock);
```
We can remove these empty new lines.
Actually, I realized we don't need to use AccessShareLock here; we can use NoLock
instead, since BeginCopyTo has already acquired AccessShareLock via
find_all_inheritors.
That makes sense.
I think it would be helpful to add a comment explaining why NoLock is
safe here, for example:
/* We already got the needed lock */
In fact, in other places where table_open(..., NoLock) is used, similar
explanatory comments are often included (the comment above is one of them).
hi.
I changed it to:
foreach_oid(scan_oid, cstate->partitions)
{
Relation scan_rel;
/* We already got the needed lock in BeginCopyTo */
scan_rel = table_open(scan_oid, NoLock);
CopyRelTo(cstate, scan_rel, cstate->rel, &processed);
table_close(scan_rel, NoLock);
}
What do you think of the following?
/*
* CopyThisRelTo:
* This will scanning a single table (which may be a partition) and exporting
* its rows to a COPY destination.
*
* rel: the relation from which the actual data will be copied.
* root_rel: if not NULL, it indicates that we are copying partitioned relation
* data to the destination, and "rel" is the partition of "root_rel".
* processed: number of tuples processed.
*/
static void
CopyThisRelTo(CopyToState cstate, Relation rel, Relation root_rel,
uint64 *processed)
I think it would be better to follow the style of nearby functions in
the same file. For example:
/*
* Scan a single table (which may be a partition) and export
* its rows to the COPY destination.
*/
now it is:
/*
* Scan a single table (which may be a partition) and export its rows to the
* COPY destination.
*
* rel: the relation from which the actual data will be copied.
* root_rel: if not NULL, it indicates that we are copying partitioned relation
* data to the destination, and "rel" is the partition of "root_rel".
* processed: number of tuples processed.
*/
static void
CopyRelTo(CopyToState cstate, Relation rel, Relation root_rel,
uint64 *processed)
Also, regarding the function name CopyThisRelTo() — I wonder if the
"This" is really necessary?
Maybe something simpler like CopyRelTo() is enough.
What do you think?
sure. CopyRelTo looks good to me.
While at it, I found that these two ereport calls in BeginCopyTo:
ereport(ERROR,
(errcode(ERRCODE_WRONG_OBJECT_TYPE),
errmsg("cannot copy from foreign table \"%s\"",
RelationGetRelationName(rel)),
errhint("Try the COPY (SELECT ...) TO variant.")));
ereport(ERROR,
errcode(ERRCODE_WRONG_OBJECT_TYPE),
errmsg("cannot copy from foreign table \"%s\"", relation_name),
errdetail("Partition \"%s\" is a foreign table in the partitioned table \"%s\"",
relation_name, RelationGetRelationName(rel)),
errhint("Try the COPY (SELECT ...) TO variant."));
don't have any regression tests.
See https://coverage.postgresql.org/src/backend/commands/copyto.c.gcov.html
So I added some tests to contrib/postgres_fdw/sql/postgres_fdw.sql.
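The check those tests exercise can be sketched standalone, under some loud assumptions: ToyRel and collect_scan_oids are hypothetical names, the relkind letters are the ones stored in pg_class.relkind, and a -1 return stands in for the ereport(ERROR) the patch raises. The loop mirrors what BeginCopyTo does with the find_all_inheritors result: fail on any foreign table, skip partitioned parents, keep the leaf partitions to scan.

```c
#include <assert.h>
#include <stddef.h>

/* relkind letters as stored in pg_class.relkind */
#define KIND_RELATION    'r'
#define KIND_PARTITIONED 'p'
#define KIND_FOREIGN     'f'

/* Hypothetical, simplified view of one entry returned by find_all_inheritors */
typedef struct ToyRel
{
    unsigned    oid;
    char        relkind;
} ToyRel;

/*
 * Collect the OIDs that COPY TO would scan. Returns the number of OIDs
 * written to scan_oids, or -1 if a foreign table is found (the patch raises
 * an ERROR at that point); in that case *bad_oid names the offender.
 */
static int
collect_scan_oids(const ToyRel *inheritors, size_t n,
                  unsigned *scan_oids, unsigned *bad_oid)
{
    int         nscan = 0;

    assert(scan_oids != NULL && bad_oid != NULL);

    for (size_t i = 0; i < n; i++)
    {
        if (inheritors[i].relkind == KIND_FOREIGN)
        {
            *bad_oid = inheritors[i].oid;
            return -1;
        }

        /* partitioned tables hold no rows themselves, so skip them */
        if (inheritors[i].relkind == KIND_PARTITIONED)
            continue;

        scan_oids[nscan++] = inheritors[i].oid;
    }
    return nscan;
}
```

Given a two-level hierarchy with only local leaves, just the leaves are returned; as soon as one partition is foreign, the whole COPY is rejected, which is what the new `COPY async_pt TO stdout` test expects.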
Attachments:
v13-0001-support-COPY-partitioned_table-TO.patch (text/x-patch)
From 26eb0aa22c091d9f3de6db0433b5202ebe564bdd Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Wed, 2 Jul 2025 12:07:19 +0800
Subject: [PATCH v13 1/1] support COPY partitioned_table TO
This is the implementation of ``COPY partitioned_table TO``. It will be
faster than ``COPY (select * from partitioned_table) TO``.
If the source table is a partitioned table, COPY table TO copies the same
rows as SELECT * FROM table.
reviewed by: vignesh C <vignesh21@gmail.com>
reviewed by: David Rowley <dgrowleyml@gmail.com>
reviewed by: Melih Mutlu <m.melihmutlu@gmail.com>
reviewed by: Kirill Reshke <reshkekirill@gmail.com>
reviewed by: Atsushi Torikoshi <torikoshia@oss.nttdata.com>
discussion: https://postgr.es/m/CACJufxEZt+G19Ors3bQUq-42-61__C=y5k2wk=sHEFRusu7=iQ@mail.gmail.com
commitfest entry: https://commitfest.postgresql.org/patch/5467
---
.../postgres_fdw/expected/postgres_fdw.out | 8 +
contrib/postgres_fdw/sql/postgres_fdw.sql | 4 +
doc/src/sgml/ref/copy.sgml | 9 +-
src/backend/commands/copyto.c | 151 ++++++++++++++----
src/test/regress/expected/copy.out | 18 +++
src/test/regress/sql/copy.sql | 15 ++
6 files changed, 171 insertions(+), 34 deletions(-)
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 2185b42bb4f..05f64157832 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -11475,6 +11475,14 @@ SELECT * FROM result_tbl ORDER BY a;
(3 rows)
DELETE FROM result_tbl;
+-- Test COPY TO with a foreign table or when the foreign table is a partition
+COPY async_p3 TO stdout; --error
+ERROR: cannot copy from foreign table "async_p3"
+HINT: Try the COPY (SELECT ...) TO variant.
+COPY async_pt TO stdout; --error
+ERROR: cannot copy from foreign table "async_p1"
+DETAIL: Partition "async_p1" is a foreign table in the partitioned table "async_pt"
+HINT: Try the COPY (SELECT ...) TO variant.
DROP FOREIGN TABLE async_p3;
DROP TABLE base_tbl3;
-- Check case where the partitioned table has local/remote partitions
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index e534b40de3c..d11105a20dc 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -3872,6 +3872,10 @@ INSERT INTO result_tbl SELECT * FROM async_pt WHERE b === 505;
SELECT * FROM result_tbl ORDER BY a;
DELETE FROM result_tbl;
+-- Test COPY TO with a foreign table or when the foreign table is a partition
+COPY async_p3 TO stdout; --error
+COPY async_pt TO stdout; --error
+
DROP FOREIGN TABLE async_p3;
DROP TABLE base_tbl3;
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 8433344e5b6..0775a799a5e 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -521,13 +521,16 @@ COPY <replaceable class="parameter">count</replaceable>
<para>
<command>COPY TO</command> can be used with plain
- tables and populated materialized views.
- For example,
+ tables, populated materialized views and partitioned tables.
+ For example, if <replaceable class="parameter">table</replaceable> is not partitioned table,
<literal>COPY <replaceable class="parameter">table</replaceable>
TO</literal> copies the same rows as
<literal>SELECT * FROM ONLY <replaceable class="parameter">table</replaceable></literal>.
+ If <replaceable class="parameter">table</replaceable> is a partitioned table,
+ <literal>COPY <replaceable class="parameter">table</replaceable> TO</literal>
+ copies the same rows as <literal>SELECT * FROM <replaceable class="parameter">table</replaceable></literal>.
However it doesn't directly support other relation types,
- such as partitioned tables, inheritance child tables, or views.
+ such as inheritance child tables, or views.
To copy all rows from such relations, use <literal>COPY (SELECT * FROM
<replaceable class="parameter">table</replaceable>) TO</literal>.
</para>
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index ea6f18f2c80..ca25cba2a15 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -19,6 +19,8 @@
#include <sys/stat.h>
#include "access/tableam.h"
+#include "access/table.h"
+#include "catalog/pg_inherits.h"
#include "commands/copyapi.h"
#include "commands/progress.h"
#include "executor/execdesc.h"
@@ -82,6 +84,7 @@ typedef struct CopyToStateData
List *attnumlist; /* integer list of attnums to copy */
char *filename; /* filename, or NULL for STDOUT */
bool is_program; /* is 'filename' a program to popen? */
+ List *partitions; /* oid list of partition oid for copy to */
copy_data_dest_cb data_dest_cb; /* function for writing data */
CopyFormatOptions opts;
@@ -116,6 +119,8 @@ static void CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot);
static void CopyAttributeOutText(CopyToState cstate, const char *string);
static void CopyAttributeOutCSV(CopyToState cstate, const char *string,
bool use_quote);
+static void CopyRelTo(CopyToState cstate, Relation rel, Relation root_rel,
+ uint64 *processed);
/* built-in format-specific routines */
static void CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc);
@@ -643,6 +648,8 @@ BeginCopyTo(ParseState *pstate,
PROGRESS_COPY_COMMAND_TO,
0
};
+ List *children = NIL;
+ List *scan_oids = NIL;
if (rel != NULL && rel->rd_rel->relkind != RELKIND_RELATION)
{
@@ -673,11 +680,34 @@ BeginCopyTo(ParseState *pstate,
errmsg("cannot copy from sequence \"%s\"",
RelationGetRelationName(rel))));
else if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
- ereport(ERROR,
- (errcode(ERRCODE_WRONG_OBJECT_TYPE),
- errmsg("cannot copy from partitioned table \"%s\"",
- RelationGetRelationName(rel)),
- errhint("Try the COPY (SELECT ...) TO variant.")));
+ {
+ children = find_all_inheritors(RelationGetRelid(rel),
+ AccessShareLock,
+ NULL);
+
+ foreach_oid(childreloid, children)
+ {
+ char relkind = get_rel_relkind(childreloid);
+
+ if (relkind == RELKIND_FOREIGN_TABLE)
+ {
+ char *relation_name;
+
+ relation_name = get_rel_name(childreloid);
+ ereport(ERROR,
+ errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("cannot copy from foreign table \"%s\"", relation_name),
+ errdetail("Partition \"%s\" is a foreign table in the partitioned table \"%s\"",
+ relation_name, RelationGetRelationName(rel)),
+ errhint("Try the COPY (SELECT ...) TO variant."));
+ }
+
+ if (RELKIND_HAS_PARTITIONS(relkind))
+ continue;
+
+ scan_oids = lappend_oid(scan_oids, childreloid);
+ }
+ }
else
ereport(ERROR,
(errcode(ERRCODE_WRONG_OBJECT_TYPE),
@@ -713,6 +743,7 @@ BeginCopyTo(ParseState *pstate,
cstate->rel = rel;
tupDesc = RelationGetDescr(cstate->rel);
+ cstate->partitions = list_copy(scan_oids);
}
else
{
@@ -722,6 +753,7 @@ BeginCopyTo(ParseState *pstate,
DestReceiver *dest;
cstate->rel = NULL;
+ cstate->partitions = NIL;
/*
* Run parse analysis and rewrite. Note this also acquires sufficient
@@ -1030,7 +1062,7 @@ DoCopyTo(CopyToState cstate)
TupleDesc tupDesc;
int num_phys_attrs;
ListCell *cur;
- uint64 processed;
+ uint64 processed = 0;
if (fe_copy)
SendCopyBegin(cstate);
@@ -1068,36 +1100,24 @@ DoCopyTo(CopyToState cstate)
cstate->routine->CopyToStart(cstate, tupDesc);
- if (cstate->rel)
+ /*
+ * If COPY TO source table is a partitioned table, then open each
+ * partition and process each individual partition.
+ */
+ if (cstate->rel && cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
{
- TupleTableSlot *slot;
- TableScanDesc scandesc;
-
- scandesc = table_beginscan(cstate->rel, GetActiveSnapshot(), 0, NULL);
- slot = table_slot_create(cstate->rel, NULL);
-
- processed = 0;
- while (table_scan_getnextslot(scandesc, ForwardScanDirection, slot))
+ foreach_oid(scan_oid, cstate->partitions)
{
- CHECK_FOR_INTERRUPTS();
+ Relation scan_rel;
- /* Deconstruct the tuple ... */
- slot_getallattrs(slot);
-
- /* Format and send the data */
- CopyOneRowTo(cstate, slot);
-
- /*
- * Increment the number of processed tuples, and report the
- * progress.
- */
- pgstat_progress_update_param(PROGRESS_COPY_TUPLES_PROCESSED,
- ++processed);
+ /* We already got the needed lock in BeginCopyTo */
+ scan_rel = table_open(scan_oid, NoLock);
+ CopyRelTo(cstate, scan_rel, cstate->rel, &processed);
+ table_close(scan_rel, NoLock);
}
-
- ExecDropSingleTupleTableSlot(slot);
- table_endscan(scandesc);
}
+ else if (cstate->rel)
+ CopyRelTo(cstate, cstate->rel, NULL, &processed);
else
{
/* run the plan --- the dest receiver will send tuples */
@@ -1115,6 +1135,75 @@ DoCopyTo(CopyToState cstate)
return processed;
}
+/*
+ * Scan a single table (which may be a partition) and export its rows to the
+ * COPY destination.
+ *
+ * rel: the relation from which the actual data will be copied.
+ * root_rel: if not NULL, it indicates that we are copying partitioned relation
+ * data to the destination, and "rel" is the partition of "root_rel".
+ * processed: number of tuples processed.
+*/
+static void
+CopyRelTo(CopyToState cstate, Relation rel, Relation root_rel,
+ uint64 *processed)
+{
+ TupleTableSlot *slot;
+ TableScanDesc scandesc;
+ AttrMap *map = NULL;
+ TupleTableSlot *root_slot = NULL;
+ TupleDesc tupdesc;
+ TupleDesc rootdesc;
+
+ tupdesc = RelationGetDescr(rel);
+ scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+ slot = table_slot_create(rel, NULL);
+
+ /*
+ * A partition's rowtype might differ from the root table's. Since we are
+ * exporting partitioned table data here, we must convert it back to the
+ * root table's rowtype.
+ */
+ if (root_rel != NULL)
+ {
+ rootdesc = RelationGetDescr(root_rel);
+ root_slot = table_slot_create(root_rel, NULL);
+ map = build_attrmap_by_name_if_req(rootdesc, tupdesc, false);
+ }
+
+ while (table_scan_getnextslot(scandesc, ForwardScanDirection, slot))
+ {
+ TupleTableSlot *copyslot;
+
+ CHECK_FOR_INTERRUPTS();
+
+ /* Deconstruct the tuple ... */
+ if (map != NULL)
+ copyslot = execute_attr_map_slot(map, slot, root_slot);
+ else
+ {
+ slot_getallattrs(slot);
+ copyslot = slot;
+ }
+
+ /* Format and send the data */
+ CopyOneRowTo(cstate, copyslot);
+
+ /*
+ * Increment the number of processed tuples, and report the
+ * progress.
+ */
+ pgstat_progress_update_param(PROGRESS_COPY_TUPLES_PROCESSED,
+ ++(*processed));
+ }
+
+ ExecDropSingleTupleTableSlot(slot);
+
+ if (root_slot != NULL)
+ ExecDropSingleTupleTableSlot(root_slot);
+ table_endscan(scandesc);
+}
+
/*
* Emit one row during DoCopyTo().
*/
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index 8d5a06563c4..05e5649c1bc 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -350,3 +350,21 @@ COPY copytest_mv(id) TO stdout WITH (header);
id
1
DROP MATERIALIZED VIEW copytest_mv;
+-- Tests for COPY TO with partitioned tables.
+CREATE TABLE pp (id int,val int) PARTITION BY RANGE (id);
+CREATE TABLE pp_1 (val int, id int) PARTITION BY RANGE (id);
+CREATE TABLE pp_2 (val int, id int) PARTITION BY RANGE (id);
+ALTER TABLE pp ATTACH PARTITION pp_1 FOR VALUES FROM (1) TO (5);
+ALTER TABLE pp ATTACH PARTITION pp_2 FOR VALUES FROM (5) TO (10);
+CREATE TABLE pp_15 PARTITION OF pp_1 FOR VALUES FROM (1) TO (5);
+CREATE TABLE pp_510 PARTITION OF pp_2 FOR VALUES FROM (5) TO (10);
+INSERT INTO pp SELECT g, 10 + g FROM generate_series(1,6) g;
+COPY pp TO stdout(header);
+id val
+1 11
+2 12
+3 13
+4 14
+5 15
+6 16
+DROP TABLE PP;
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index f0b88a23db8..9be7cb6c8dc 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -375,3 +375,18 @@ COPY copytest_mv(id) TO stdout WITH (header);
REFRESH MATERIALIZED VIEW copytest_mv;
COPY copytest_mv(id) TO stdout WITH (header);
DROP MATERIALIZED VIEW copytest_mv;
+
+-- Tests for COPY TO with partitioned tables.
+CREATE TABLE pp (id int,val int) PARTITION BY RANGE (id);
+CREATE TABLE pp_1 (val int, id int) PARTITION BY RANGE (id);
+CREATE TABLE pp_2 (val int, id int) PARTITION BY RANGE (id);
+ALTER TABLE pp ATTACH PARTITION pp_1 FOR VALUES FROM (1) TO (5);
+ALTER TABLE pp ATTACH PARTITION pp_2 FOR VALUES FROM (5) TO (10);
+
+CREATE TABLE pp_15 PARTITION OF pp_1 FOR VALUES FROM (1) TO (5);
+CREATE TABLE pp_510 PARTITION OF pp_2 FOR VALUES FROM (5) TO (10);
+
+INSERT INTO pp SELECT g, 10 + g FROM generate_series(1,6) g;
+
+COPY pp TO stdout(header);
+DROP TABLE PP;
--
2.34.1
On 2025-07-02 13:10, jian he wrote:
Thanks for updating the patch.
now it is:
/*
* Scan a single table (which may be a partition) and export its rows
to the
* COPY destination.
Based on the explanations in the glossary, should 'partition' be
partitioned table/relation?
| -- https://www.postgresql.org/docs/devel/glossary.html
| partition: One of several disjoint (not overlapping) subsets of a
larger set
| Partitioned table(relation): A relation that is in semantic terms the
same as a table, but whose storage is distributed across several
partitions
Also, the terms "table" and "relation" seem to be used somewhat
interchangeably in this patch.
For consistency, perhaps it's better to pick one term and use it
consistently throughout the comments.
249 + * root_rel: if not NULL, it indicates that we are copying
partitioned relation
270 + * exporting partitioned table data here, we must convert it
back to the
*
* rel: the relation from which the actual data will be copied.
* root_rel: if not NULL, it indicates that we are copying partitioned
relation
* data to the destination, and "rel" is the partition of "root_rel".
* processed: number of tuples processed.
*/
static void
CopyRelTo(CopyToState cstate, Relation rel, Relation root_rel,
uint64 *processed)
Also, regarding the function name CopyThisRelTo(): I wonder if the
"This" is really necessary?
Maybe something simpler like CopyRelTo() is enough.
What do you think?
sure. CopyRelTo looks good to me.
while at it.
I found that in BeginCopyTo:
ereport(ERROR,
(errcode(ERRCODE_WRONG_OBJECT_TYPE),
errmsg("cannot copy from foreign table \"%s\"",
RelationGetRelationName(rel)),
errhint("Try the COPY (SELECT ...) TO variant.")));
ereport(ERROR,
errcode(ERRCODE_WRONG_OBJECT_TYPE),
errmsg("cannot copy from foreign table \"%s\"", relation_name),
errdetail("Partition \"%s\" is a foreign table in the partitioned table \"%s\"",
relation_name, RelationGetRelationName(rel)),
errhint("Try the COPY (SELECT ...) TO variant."));
Neither of these has any regression tests.
Hmm, I agree there are no regression tests for this, but this is about
copying foreign tables, isn't it?
Since this patch is primarily about supporting COPY on partitioned
tables, I’m not sure adding regression tests for foreign tables is in
scope here.
It might be better handled in a follow-up patch focused on improving
test coverage for such unsupported cases, if we decide that's
worthwhile.
-- https://coverage.postgresql.org/src/backend/commands/copyto.c.gcov.html
670 0 : else if (rel->rd_rel->relkind == RELKIND_SEQUENCE)
671 0 : ereport(ERROR,
672 : (errcode(ERRCODE_WRONG_OBJECT_TYPE),
673 : errmsg("cannot copy from sequence \"%s\"",
674 : RelationGetRelationName(rel))));
Also, I’m not entirely sure how much value such tests would bring,
especially if the error paths are straightforward and unlikely to
regress.
Regarding performance: it's already confirmed that COPY
partitioned_table performs better than COPY (SELECT * FROM
partitioned_table) as expected [1].
I was a bit curious, though, whether this patch might introduce any
performance regression when copying a regular (non-partitioned) table.
To check this, I ran a simple benchmark and did not observe any
degradation.
To minimize I/O overhead, I used a tmpfs mount:
% mkdir /tmp/mem
% sudo mount_tmpfs -s 500M /tmp/mem
% pgbench -i
Then I ran the following command several times on both patched and
unpatched builds:
=# COPY pgbench_accounts TO '/tmp/mem/accounts';
[1]: /messages/by-id/174219852967.294107.6195385625494034792.pgcf@coridan.postgresql.org
--
Regards,
--
Atsushi Torikoshi
Seconded from NTT DATA Japan Corporation to SRA OSS K.K.
On 2025-Jul-02, jian he wrote:
@@ -673,11 +680,34 @@ BeginCopyTo(ParseState *pstate,
errmsg("cannot copy from sequence \"%s\"",
RelationGetRelationName(rel))));
else if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
- ereport(ERROR,
- (errcode(ERRCODE_WRONG_OBJECT_TYPE),
- errmsg("cannot copy from partitioned table \"%s\"",
- RelationGetRelationName(rel)),
- errhint("Try the COPY (SELECT ...) TO variant.")));
+ {
+ children = find_all_inheritors(RelationGetRelid(rel),
+ AccessShareLock,
+ NULL);
+
+ foreach_oid(childreloid, children)
+ {
+ char relkind = get_rel_relkind(childreloid);
+
+ if (relkind == RELKIND_FOREIGN_TABLE)
+ {
+ char *relation_name;
+
+ relation_name = get_rel_name(childreloid);
+ ereport(ERROR,
+ errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("cannot copy from foreign table \"%s\"", relation_name),
+ errdetail("Partition \"%s\" is a foreign table in the partitioned table \"%s\"",
+ relation_name, RelationGetRelationName(rel)),
+ errhint("Try the COPY (SELECT ...) TO variant."));
+ }
This code looks like it's duplicating what you could obtain by using
RelationGetPartitionDesc and then observe the ->isleaf bits. Maybe have
a look at the function RelationHasForeignPartition() in the patch at
/messages/by-id/CANhcyEW_s2LD6RiDSMHtWQnpYB67EWXqf7N8mn7dOrnaKMfROg@mail.gmail.com
which looks very similar to what you need here. I think that would also
have the (maybe dubious) advantage that the rows will be output in
partition bound order rather than breadth-first (partition hierarchy)
OID order.
--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"The saddest aspect of life right now is that science gathers knowledge faster
than society gathers wisdom." (Isaac Asimov)
On Mon, Jul 14, 2025 at 10:02 PM Álvaro Herrera <alvherre@kurilemu.de> wrote:
On 2025-Jul-02, jian he wrote:
@@ -673,11 +680,34 @@ BeginCopyTo(ParseState *pstate,
errmsg("cannot copy from sequence \"%s\"",
RelationGetRelationName(rel))));
else if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
- ereport(ERROR,
- (errcode(ERRCODE_WRONG_OBJECT_TYPE),
- errmsg("cannot copy from partitioned table \"%s\"",
- RelationGetRelationName(rel)),
- errhint("Try the COPY (SELECT ...) TO variant.")));
+ {
+ children = find_all_inheritors(RelationGetRelid(rel),
+ AccessShareLock,
+ NULL);
+
+ foreach_oid(childreloid, children)
+ {
+ char relkind = get_rel_relkind(childreloid);
+
+ if (relkind == RELKIND_FOREIGN_TABLE)
+ {
+ char *relation_name;
+
+ relation_name = get_rel_name(childreloid);
+ ereport(ERROR,
+ errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("cannot copy from foreign table \"%s\"", relation_name),
+ errdetail("Partition \"%s\" is a foreign table in the partitioned table \"%s\"",
+ relation_name, RelationGetRelationName(rel)),
+ errhint("Try the COPY (SELECT ...) TO variant."));
+ }
This code looks like it's duplicating what you could obtain by using
RelationGetPartitionDesc and then observe the ->isleaf bits. Maybe have
a look at the function RelationHasForeignPartition() in the patch at
/messages/by-id/CANhcyEW_s2LD6RiDSMHtWQnpYB67EWXqf7N8mn7dOrnaKMfROg@mail.gmail.com
which looks very similar to what you need here. I think that would also
have the (maybe dubious) advantage that the rows will be output in
partition bound order rather than breadth-first (partition hierarchy)
OID order.
hi.
else if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
{
PartitionDesc pd = RelationGetPartitionDesc(rel, true);
for (int i = 0; i < pd->nparts; i++)
{
Relation partRel;
if (!pd->is_leaf[i])
continue;
partRel = table_open(pd->oids[i], AccessShareLock);
if (partRel->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
ereport(ERROR,
errcode(ERRCODE_WRONG_OBJECT_TYPE),
errmsg("cannot copy from foreign table \"%s\"", RelationGetRelationName(partRel)),
errdetail("Partition \"%s\" is a foreign table in the partitioned table \"%s\"",
RelationGetRelationName(partRel), RelationGetRelationName(rel)),
errhint("Try the COPY (SELECT ...) TO variant."));
table_close(partRel, NoLock);
scan_oids = lappend_oid(scan_oids, RelationGetRelid(partRel));
}
}
I tried the above code, but it doesn't work: RelationGetPartitionDesc
only retrieves the immediate partition descriptor of a partitioned relation
and doesn't recurse to the lowest level.
Actually Melih Mutlu raised this question at
/messages/by-id/CAGPVpCQou3hWQYUqXNTLKdcuO6envsWJYSJqbZZQnRCjZA6nkQ@mail.gmail.com
I kind of ignored it...
I guess we have to stick with find_all_inheritors here?
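The difference between a one-level walk and a full descent can be illustrated outside PostgreSQL. What follows is a minimal standalone sketch in plain C, not backend code: ToyRel, collect_immediate_leaves and collect_all_leaves are hypothetical names invented for this illustration, the toy tree stands in for the catalog, and it assumes each node stores its children in partition bound order.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for relkind values; not PostgreSQL's definitions. */
enum { KIND_TABLE = 'r', KIND_PARTITIONED = 'p', KIND_FOREIGN = 'f' };

#define MAX_CHILDREN 8

/* A toy relation: an id, a kind, and its immediate partitions. */
typedef struct ToyRel
{
    int         id;
    char        kind;
    size_t      nchildren;
    const struct ToyRel *children[MAX_CHILDREN];
} ToyRel;

/*
 * One-level walk: append only those immediate children that are leaves.
 * Partitioned children are skipped without descending into them.
 * Returns the new number of entries in "out".
 */
static size_t
collect_immediate_leaves(const ToyRel *parent, int *out, size_t n, size_t cap)
{
    for (size_t i = 0; i < parent->nchildren; i++)
    {
        const ToyRel *child = parent->children[i];

        if (child->kind == KIND_PARTITIONED)
            continue;
        assert(n < cap);
        out[n++] = child->id;
    }
    return n;
}

/*
 * Recursive walk: descend through partitioned children so that leaves at
 * any depth are collected, in depth-first order.
 */
static size_t
collect_all_leaves(const ToyRel *parent, int *out, size_t n, size_t cap)
{
    for (size_t i = 0; i < parent->nchildren; i++)
    {
        const ToyRel *child = parent->children[i];

        if (child->kind == KIND_PARTITIONED)
            n = collect_all_leaves(child, out, n, cap);
        else
        {
            assert(n < cap);
            out[n++] = child->id;
        }
    }
    return n;
}
```

With a two-level hierarchy shaped like the pp test table in the patch (a root, two partitioned children, one leaf under each), the one-level walk finds no leaves at all, while the recursive walk returns both.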
On Mon, Jul 14, 2025 at 9:38 PM torikoshia <torikoshia@oss.nttdata.com> wrote:
Based on the explanations in the glossary, should 'partition' be
partitioned table/relation?
I think in "Scan a single table (which may be a partition) and export its
rows to the...."
the word "partition" is correct.
| -- https://www.postgresql.org/docs/devel/glossary.html
| partition: One of several disjoint (not overlapping) subsets of a
larger set
| Partitioned table(relation): A relation that is in semantic terms the
same as a table, but whose storage is distributed across several
partitions
Also, the terms "table" and "relation" seem to be used somewhat
interchangeably in this patch.
For consistency, perhaps it's better to pick one term and use it
consistently throughout the comments.
249 + * root_rel: if not NULL, it indicates that we are copying
partitioned relation
270 + * exporting partitioned table data here, we must convert it
back to the
now it's:
+/*
+ * Scan a single table (which may be a partition) and export its rows to the
+ * COPY destination.
+ *
+ * rel: the table from which the actual data will be copied.
+ * root_rel: if not NULL, it indicates that COPY TO command copy partitioned
+ * table data to the destination, and "rel" is the partition of "root_rel".
+ * processed: number of tuples processed.
+*/
+static void
+CopyRelTo(CopyToState cstate, Relation rel, Relation root_rel,
+ uint64 *processed)
+{
+
+ tupdesc = RelationGetDescr(rel);
+ scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+ slot = table_slot_create(rel, NULL);
+
+ /*
+ * A partition's rowtype might differ from the root table's. If we are
+ * exporting partition data here, we must convert it back to the root
+ * table's rowtype.
+ */
+ if (root_rel != NULL)
+ {
+ rootdesc = RelationGetDescr(root_rel);
+ root_slot = table_slot_create(root_rel, NULL);
+ map = build_attrmap_by_name_if_req(rootdesc, tupdesc, false);
+ }
+
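The rowtype conversion described in the comment above can be modelled outside PostgreSQL. This is a minimal standalone sketch in plain C under stated assumptions: ToyDesc, toy_build_map and toy_apply_map are hypothetical names that only imitate the idea of matching columns by name, rows are plain int arrays, and none of this is the backend's AttrMap code.

```c
#include <assert.h>
#include <string.h>

#define MAX_ATTRS 8

/* Toy row descriptor: column names in physical order. */
typedef struct ToyDesc
{
    int         natts;
    const char *names[MAX_ATTRS];
} ToyDesc;

/*
 * For each root column i, store in map[i] the index of the same-named
 * column in the partition.  Returns 1 on success, 0 if a root column has
 * no counterpart in the partition.
 */
static int
toy_build_map(const ToyDesc *root, const ToyDesc *part, int *map)
{
    for (int i = 0; i < root->natts; i++)
    {
        map[i] = -1;
        for (int j = 0; j < part->natts; j++)
        {
            if (strcmp(root->names[i], part->names[j]) == 0)
            {
                map[i] = j;
                break;
            }
        }
        if (map[i] < 0)
            return 0;
    }
    return 1;
}

/* Reorder one partition row into the root table's column order. */
static void
toy_apply_map(const int *map, int natts, const int *part_row, int *root_row)
{
    for (int i = 0; i < natts; i++)
    {
        assert(map[i] >= 0);
        root_row[i] = part_row[map[i]];
    }
}
```

In the regression test from the patch, the root pp is declared as (id, val) while pp_1 is declared as (val, id), so a row stored as (11, 1) in the partition has to come out as (1, 11). In the patch itself the map is NULL when no conversion is needed, and the scan slot is then used directly.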
while at it.
I found that in BeginCopyTo:
ereport(ERROR,
(errcode(ERRCODE_WRONG_OBJECT_TYPE),
errmsg("cannot copy from foreign table \"%s\"",
RelationGetRelationName(rel)),
errhint("Try the COPY (SELECT ...) TO variant.")));
ereport(ERROR,
errcode(ERRCODE_WRONG_OBJECT_TYPE),
errmsg("cannot copy from foreign table \"%s\"", relation_name),
errdetail("Partition \"%s\" is a foreign table in the partitioned table \"%s\"",
relation_name, RelationGetRelationName(rel)),
errhint("Try the COPY (SELECT ...) TO variant."));
Neither of these has any regression tests.
Hmm, I agree there are no regression tests for this, but this is about
copying foreign tables, isn't it?
Since this patch is primarily about supporting COPY on partitioned
tables, I’m not sure adding regression tests for foreign tables is in
scope here.
It might be better handled in a follow-up patch focused on improving
test coverage for such unsupported cases, if we decide that's
worthwhile.
I guess it should be fine,
since we are only adding one loosely related test case.
+-- Test COPY TO with a foreign table or when the foreign table is a partition
+COPY async_p3 TO stdout; --error
+COPY async_pt TO stdout; --error
Attachments:
v14-0001-support-COPY-partitioned_table-TO.patch (text/x-patch; charset=UTF-8)
From a6064e7943d791329ef6e73b48e4695f5883a8eb Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Tue, 15 Jul 2025 12:11:35 +0800
Subject: [PATCH v14 1/1] support COPY partitioned_table TO
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
This implements ``COPY partitioned_table TO``. It will be
faster than ``COPY (select * from partitioned_table) TO``.
If the source table is a partitioned table, COPY table TO copies the same
rows as SELECT * FROM table.
Reviewed by: vignesh C <vignesh21@gmail.com>
Reviewed by: David Rowley <dgrowleyml@gmail.com>
Reviewed by: Melih Mutlu <m.melihmutlu@gmail.com>
Reviewed by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed by: Atsushi Torikoshi <torikoshia@oss.nttdata.com>
Reviewed by: Álvaro Herrera <alvherre@kurilemu.de>
discussion: https://postgr.es/m/CACJufxEZt+G19Ors3bQUq-42-61__C=y5k2wk=sHEFRusu7=iQ@mail.gmail.com
commitfest entry: https://commitfest.postgresql.org/patch/5467
---
.../postgres_fdw/expected/postgres_fdw.out | 8 +
contrib/postgres_fdw/sql/postgres_fdw.sql | 4 +
doc/src/sgml/ref/copy.sgml | 9 +-
src/backend/commands/copyto.c | 151 ++++++++++++++----
src/test/regress/expected/copy.out | 18 +++
src/test/regress/sql/copy.sql | 15 ++
6 files changed, 171 insertions(+), 34 deletions(-)
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 2185b42bb4f..05f64157832 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -11475,6 +11475,14 @@ SELECT * FROM result_tbl ORDER BY a;
(3 rows)
DELETE FROM result_tbl;
+-- Test COPY TO with a foreign table or when the foreign table is a partition
+COPY async_p3 TO stdout; --error
+ERROR: cannot copy from foreign table "async_p3"
+HINT: Try the COPY (SELECT ...) TO variant.
+COPY async_pt TO stdout; --error
+ERROR: cannot copy from foreign table "async_p1"
+DETAIL: Partition "async_p1" is a foreign table in the partitioned table "async_pt"
+HINT: Try the COPY (SELECT ...) TO variant.
DROP FOREIGN TABLE async_p3;
DROP TABLE base_tbl3;
-- Check case where the partitioned table has local/remote partitions
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index e534b40de3c..d11105a20dc 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -3872,6 +3872,10 @@ INSERT INTO result_tbl SELECT * FROM async_pt WHERE b === 505;
SELECT * FROM result_tbl ORDER BY a;
DELETE FROM result_tbl;
+-- Test COPY TO with a foreign table or when the foreign table is a partition
+COPY async_p3 TO stdout; --error
+COPY async_pt TO stdout; --error
+
DROP FOREIGN TABLE async_p3;
DROP TABLE base_tbl3;
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index c2d1fbc1fbe..f91bc9740ec 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -539,13 +539,16 @@ COPY <replaceable class="parameter">count</replaceable>
<para>
<command>COPY TO</command> can be used with plain
- tables and populated materialized views.
- For example,
+ tables, populated materialized views and partitioned tables.
+ For example, if <replaceable class="parameter">table</replaceable> is not partitioned table,
<literal>COPY <replaceable class="parameter">table</replaceable>
TO</literal> copies the same rows as
<literal>SELECT * FROM ONLY <replaceable class="parameter">table</replaceable></literal>.
+ If <replaceable class="parameter">table</replaceable> is a partitioned table,
+ <literal>COPY <replaceable class="parameter">table</replaceable> TO</literal>
+ copies the same rows as <literal>SELECT * FROM <replaceable class="parameter">table</replaceable></literal>.
However it doesn't directly support other relation types,
- such as partitioned tables, inheritance child tables, or views.
+ such as inheritance child tables, or views.
To copy all rows from such relations, use <literal>COPY (SELECT * FROM
<replaceable class="parameter">table</replaceable>) TO</literal>.
</para>
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 67b94b91cae..fa61032491e 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -19,6 +19,8 @@
#include <sys/stat.h>
#include "access/tableam.h"
+#include "access/table.h"
+#include "catalog/pg_inherits.h"
#include "commands/copyapi.h"
#include "commands/progress.h"
#include "executor/execdesc.h"
@@ -82,6 +84,7 @@ typedef struct CopyToStateData
List *attnumlist; /* integer list of attnums to copy */
char *filename; /* filename, or NULL for STDOUT */
bool is_program; /* is 'filename' a program to popen? */
+ List *partitions; /* oid list of partition oid for copy to */
copy_data_dest_cb data_dest_cb; /* function for writing data */
CopyFormatOptions opts;
@@ -116,6 +119,8 @@ static void CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot);
static void CopyAttributeOutText(CopyToState cstate, const char *string);
static void CopyAttributeOutCSV(CopyToState cstate, const char *string,
bool use_quote);
+static void CopyRelTo(CopyToState cstate, Relation rel, Relation root_rel,
+ uint64 *processed);
/* built-in format-specific routines */
static void CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc);
@@ -643,6 +648,8 @@ BeginCopyTo(ParseState *pstate,
PROGRESS_COPY_COMMAND_TO,
0
};
+ List *children = NIL;
+ List *scan_oids = NIL;
if (rel != NULL && rel->rd_rel->relkind != RELKIND_RELATION)
{
@@ -673,11 +680,34 @@ BeginCopyTo(ParseState *pstate,
errmsg("cannot copy from sequence \"%s\"",
RelationGetRelationName(rel))));
else if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
- ereport(ERROR,
- (errcode(ERRCODE_WRONG_OBJECT_TYPE),
- errmsg("cannot copy from partitioned table \"%s\"",
- RelationGetRelationName(rel)),
- errhint("Try the COPY (SELECT ...) TO variant.")));
+ {
+ children = find_all_inheritors(RelationGetRelid(rel),
+ AccessShareLock,
+ NULL);
+
+ foreach_oid(childreloid, children)
+ {
+ char relkind = get_rel_relkind(childreloid);
+
+ if (relkind == RELKIND_FOREIGN_TABLE)
+ {
+ char *relation_name;
+
+ relation_name = get_rel_name(childreloid);
+ ereport(ERROR,
+ errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("cannot copy from foreign table \"%s\"", relation_name),
+ errdetail("Partition \"%s\" is a foreign table in the partitioned table \"%s\"",
+ relation_name, RelationGetRelationName(rel)),
+ errhint("Try the COPY (SELECT ...) TO variant."));
+ }
+
+ if (RELKIND_HAS_PARTITIONS(relkind))
+ continue;
+
+ scan_oids = lappend_oid(scan_oids, childreloid);
+ }
+ }
else
ereport(ERROR,
(errcode(ERRCODE_WRONG_OBJECT_TYPE),
@@ -713,6 +743,7 @@ BeginCopyTo(ParseState *pstate,
cstate->rel = rel;
tupDesc = RelationGetDescr(cstate->rel);
+ cstate->partitions = list_copy(scan_oids);
}
else
{
@@ -722,6 +753,7 @@ BeginCopyTo(ParseState *pstate,
DestReceiver *dest;
cstate->rel = NULL;
+ cstate->partitions = NIL;
/*
* Run parse analysis and rewrite. Note this also acquires sufficient
@@ -1030,7 +1062,7 @@ DoCopyTo(CopyToState cstate)
TupleDesc tupDesc;
int num_phys_attrs;
ListCell *cur;
- uint64 processed;
+ uint64 processed = 0;
if (fe_copy)
SendCopyBegin(cstate);
@@ -1068,36 +1100,24 @@ DoCopyTo(CopyToState cstate)
cstate->routine->CopyToStart(cstate, tupDesc);
- if (cstate->rel)
+ /*
+ * If COPY TO source table is a partitioned table, then open each
+ * partition and process each individual partition.
+ */
+ if (cstate->rel && cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
{
- TupleTableSlot *slot;
- TableScanDesc scandesc;
-
- scandesc = table_beginscan(cstate->rel, GetActiveSnapshot(), 0, NULL);
- slot = table_slot_create(cstate->rel, NULL);
-
- processed = 0;
- while (table_scan_getnextslot(scandesc, ForwardScanDirection, slot))
+ foreach_oid(scan_oid, cstate->partitions)
{
- CHECK_FOR_INTERRUPTS();
+ Relation scan_rel;
- /* Deconstruct the tuple ... */
- slot_getallattrs(slot);
-
- /* Format and send the data */
- CopyOneRowTo(cstate, slot);
-
- /*
- * Increment the number of processed tuples, and report the
- * progress.
- */
- pgstat_progress_update_param(PROGRESS_COPY_TUPLES_PROCESSED,
- ++processed);
+ /* We already got the needed lock in BeginCopyTo */
+ scan_rel = table_open(scan_oid, NoLock);
+ CopyRelTo(cstate, scan_rel, cstate->rel, &processed);
+ table_close(scan_rel, NoLock);
}
-
- ExecDropSingleTupleTableSlot(slot);
- table_endscan(scandesc);
}
+ else if (cstate->rel)
+ CopyRelTo(cstate, cstate->rel, NULL, &processed);
else
{
/* run the plan --- the dest receiver will send tuples */
@@ -1115,6 +1135,75 @@ DoCopyTo(CopyToState cstate)
return processed;
}
+/*
+ * Scan a single table (which may be a partition) and export its rows to the
+ * COPY destination.
+ *
+ * rel: the table from which the actual data will be copied.
+ * root_rel: if not NULL, it indicates that COPY TO command copy partitioned
+ * table data to the destination, and "rel" is the partition of "root_rel".
+ * processed: number of tuples processed.
+*/
+static void
+CopyRelTo(CopyToState cstate, Relation rel, Relation root_rel,
+ uint64 *processed)
+{
+ TupleTableSlot *slot;
+ TableScanDesc scandesc;
+ AttrMap *map = NULL;
+ TupleTableSlot *root_slot = NULL;
+ TupleDesc tupdesc;
+ TupleDesc rootdesc;
+
+ tupdesc = RelationGetDescr(rel);
+ scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+ slot = table_slot_create(rel, NULL);
+
+ /*
+ * A partition's rowtype might differ from the root table's. If we are
+ * exporting partition data here, we must convert it back to the root
+ * table's rowtype.
+ */
+ if (root_rel != NULL)
+ {
+ rootdesc = RelationGetDescr(root_rel);
+ root_slot = table_slot_create(root_rel, NULL);
+ map = build_attrmap_by_name_if_req(rootdesc, tupdesc, false);
+ }
+
+ while (table_scan_getnextslot(scandesc, ForwardScanDirection, slot))
+ {
+ TupleTableSlot *copyslot;
+
+ CHECK_FOR_INTERRUPTS();
+
+ /* Deconstruct the tuple ... */
+ if (map != NULL)
+ copyslot = execute_attr_map_slot(map, slot, root_slot);
+ else
+ {
+ slot_getallattrs(slot);
+ copyslot = slot;
+ }
+
+ /* Format and send the data */
+ CopyOneRowTo(cstate, copyslot);
+
+ /*
+ * Increment the number of processed tuples, and report the
+ * progress.
+ */
+ pgstat_progress_update_param(PROGRESS_COPY_TUPLES_PROCESSED,
+ ++(*processed));
+ }
+
+ ExecDropSingleTupleTableSlot(slot);
+
+ if (root_slot != NULL)
+ ExecDropSingleTupleTableSlot(root_slot);
+ table_endscan(scandesc);
+}
+
/*
* Emit one row during DoCopyTo().
*/
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index ac66eb55aee..3bf5ecf469e 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -373,3 +373,21 @@ COPY copytest_mv(id) TO stdout WITH (header);
id
1
DROP MATERIALIZED VIEW copytest_mv;
+-- Tests for COPY TO with partitioned tables.
+CREATE TABLE pp (id int,val int) PARTITION BY RANGE (id);
+CREATE TABLE pp_1 (val int, id int) PARTITION BY RANGE (id);
+CREATE TABLE pp_2 (val int, id int) PARTITION BY RANGE (id);
+ALTER TABLE pp ATTACH PARTITION pp_1 FOR VALUES FROM (1) TO (5);
+ALTER TABLE pp ATTACH PARTITION pp_2 FOR VALUES FROM (5) TO (10);
+CREATE TABLE pp_15 PARTITION OF pp_1 FOR VALUES FROM (1) TO (5);
+CREATE TABLE pp_510 PARTITION OF pp_2 FOR VALUES FROM (5) TO (10);
+INSERT INTO pp SELECT g, 10 + g FROM generate_series(1,6) g;
+COPY pp TO stdout(header);
+id val
+1 11
+2 12
+3 13
+4 14
+5 15
+6 16
+DROP TABLE PP;
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index a1316c73bac..3d84764c65f 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -405,3 +405,18 @@ COPY copytest_mv(id) TO stdout WITH (header);
REFRESH MATERIALIZED VIEW copytest_mv;
COPY copytest_mv(id) TO stdout WITH (header);
DROP MATERIALIZED VIEW copytest_mv;
+
+-- Tests for COPY TO with partitioned tables.
+CREATE TABLE pp (id int,val int) PARTITION BY RANGE (id);
+CREATE TABLE pp_1 (val int, id int) PARTITION BY RANGE (id);
+CREATE TABLE pp_2 (val int, id int) PARTITION BY RANGE (id);
+ALTER TABLE pp ATTACH PARTITION pp_1 FOR VALUES FROM (1) TO (5);
+ALTER TABLE pp ATTACH PARTITION pp_2 FOR VALUES FROM (5) TO (10);
+
+CREATE TABLE pp_15 PARTITION OF pp_1 FOR VALUES FROM (1) TO (5);
+CREATE TABLE pp_510 PARTITION OF pp_2 FOR VALUES FROM (5) TO (10);
+
+INSERT INTO pp SELECT g, 10 + g FROM generate_series(1,6) g;
+
+COPY pp TO stdout(header);
+DROP TABLE PP;
--
2.34.1
On 2025-07-15 12:31, jian he wrote:
On Mon, Jul 14, 2025 at 10:02 PM Álvaro Herrera <alvherre@kurilemu.de> wrote:
On 2025-Jul-02, jian he wrote:
@@ -673,11 +680,34 @@ BeginCopyTo(ParseState *pstate,
errmsg("cannot copy from sequence \"%s\"",
RelationGetRelationName(rel))));
else if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
- ereport(ERROR,
- (errcode(ERRCODE_WRONG_OBJECT_TYPE),
- errmsg("cannot copy from partitioned table \"%s\"",
- RelationGetRelationName(rel)),
- errhint("Try the COPY (SELECT ...) TO variant.")));
+ {
+ children = find_all_inheritors(RelationGetRelid(rel),
+ AccessShareLock,
+ NULL);
+
+ foreach_oid(childreloid, children)
+ {
+ char relkind = get_rel_relkind(childreloid);
+
+ if (relkind == RELKIND_FOREIGN_TABLE)
+ {
+ char *relation_name;
+
+ relation_name = get_rel_name(childreloid);
+ ereport(ERROR,
+ errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("cannot copy from foreign table \"%s\"", relation_name),
+ errdetail("Partition \"%s\" is a foreign table in the partitioned table \"%s\"",
+ relation_name, RelationGetRelationName(rel)),
+ errhint("Try the COPY (SELECT ...) TO variant."));
+ }
This code looks like it's duplicating what you could obtain by using
RelationGetPartitionDesc and then observe the ->isleaf bits. Maybe have
a look at the function RelationHasForeignPartition() in the patch at
/messages/by-id/CANhcyEW_s2LD6RiDSMHtWQnpYB67EWXqf7N8mn7dOrnaKMfROg@mail.gmail.com
which looks very similar to what you need here. I think that would also
have the (maybe dubious) advantage that the rows will be output in
partition bound order rather than breadth-first (partition hierarchy)
OID order.
hi.
else if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
{
PartitionDesc pd = RelationGetPartitionDesc(rel, true);
for (int i = 0; i < pd->nparts; i++)
{
Relation partRel;
if (!pd->is_leaf[i])
continue;
partRel = table_open(pd->oids[i], AccessShareLock);
if (partRel->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
ereport(ERROR,
errcode(ERRCODE_WRONG_OBJECT_TYPE),
errmsg("cannot copy from foreign table \"%s\"", RelationGetRelationName(partRel)),
errdetail("Partition \"%s\" is a foreign table in the partitioned table \"%s\"",
RelationGetRelationName(partRel), RelationGetRelationName(rel)),
errhint("Try the COPY (SELECT ...) TO variant."));
table_close(partRel, NoLock);
scan_oids = lappend_oid(scan_oids, RelationGetRelid(partRel));
}
}
I tried the above code, but it doesn't work: RelationGetPartitionDesc
only retrieves the immediate partition descriptor of a partitioned relation
and doesn't recurse to the lowest level.
Actually Melih Mutlu raised this question at
/messages/by-id/CAGPVpCQou3hWQYUqXNTLKdcuO6envsWJYSJqbZZQnRCjZA6nkQ@mail.gmail.com
I kind of ignored it...
I guess we have to stick with find_all_inheritors here?
That might be the case.
I thought we could consider using RelationHasForeignPartition() instead,
if [1] gets committed.
However, since that function only tells us whether any foreign
partitions exist, whereas the current patch outputs the specific
problematic partitions or foreign tables in the log, I think the current
approach is more user-friendly.
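To make the difference between a one-level lookup and a full hierarchy walk concrete, here is a minimal standalone sketch. It is not PostgreSQL code: ToyRel, immediate_children() and collect_leaves() are hypothetical names, and a small array stands in for the catalog. immediate_children() plays the role of a one-level partition descriptor lookup, while collect_leaves() walks the whole tree breadth-first and keeps only the leaves, which is the set a find_all_inheritors-style traversal ends up scanning.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 16

/* Hypothetical toy relation: an id, a parent index, and a "partitioned" flag. */
typedef struct ToyRel
{
    int id;             /* stand-in for a relation OID */
    int parent;         /* index of the parent in the array, -1 for the root */
    int is_partitioned; /* 1 if this relation has partitions of its own */
} ToyRel;

/* One-level lookup: ids of the direct children of rels[root] only. */
static size_t
immediate_children(const ToyRel *rels, size_t nrels, size_t root, int *out)
{
    size_t n = 0;

    for (size_t i = 0; i < nrels; i++)
        if (rels[i].parent == (int) root)
            out[n++] = rels[i].id;
    return n;
}

/* Breadth-first walk of the whole hierarchy, keeping only leaf relations. */
static size_t
collect_leaves(const ToyRel *rels, size_t nrels, size_t root, int *out)
{
    size_t queue[MAX_NODES];
    size_t head = 0, tail = 0, n = 0;

    assert(nrels <= MAX_NODES);
    queue[tail++] = root;
    while (head < tail)
    {
        size_t cur = queue[head++];

        if (!rels[cur].is_partitioned)
        {
            out[n++] = rels[cur].id;    /* leaf: this is what gets scanned */
            continue;
        }
        /* partitioned: enqueue its direct children and keep going */
        for (size_t i = 0; i < nrels; i++)
            if (rels[i].parent == (int) cur)
                queue[tail++] = i;
    }
    return n;
}
```

With a root that has two partitioned children, each holding one leaf, the one-level lookup returns only the two intermediate relations, while the walk returns the two leaves.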
<command>COPY TO</command> can be used with plain
- tables and populated materialized views.
- For example,
+ tables, populated materialized views and partitioned tables.
+ For example, if <replaceable
class="parameter">table</replaceable> is not partitioned table,
<literal>COPY <replaceable class="parameter">table</replaceable>
TO</literal> copies the same rows as
<literal>SELECT * FROM ONLY <replaceable
class="parameter">table</replaceable></literal>.
I believe "is not a partitioned table" here is intended to refer to both
plain tables and materialized views.
However, as far as I understand, using ONLY with a materialized view has
no effect.
So, wouldn’t it be better and clearer to say "if the table is a plain
table" instead?
I think the behavior for materialized views can be described along with
that for partitioned tables. For example:
<command>COPY TO</command> can be used with plain
tables, populated materialized views and partitioned tables.
For example, if <replaceable class="parameter">table</replaceable>
is a plain table,
<literal>COPY <replaceable class="parameter">table</replaceable>
TO</literal> copies the same rows as
<literal>SELECT * FROM ONLY <replaceable
class="parameter">table</replaceable></literal>.
If <replaceable class="parameter">table</replaceable> is a
partitioned table or a materialized view,
<literal>COPY <replaceable class="parameter">table</replaceable>
TO</literal>
copies the same rows as <literal>SELECT * FROM <replaceable
class="parameter">table</replaceable></literal>.
+ List *children = NIL;
...
@@ -673,11 +680,34 @@ BeginCopyTo(ParseState *pstate,
errmsg("cannot copy from sequence \"%s\"",
RelationGetRelationName(rel))));
else if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
- ereport(ERROR,
- (errcode(ERRCODE_WRONG_OBJECT_TYPE),
- errmsg("cannot copy from partitioned table
\"%s\"",
- RelationGetRelationName(rel)),
- errhint("Try the COPY (SELECT ...) TO
variant.")));
+ {
+ children = find_all_inheritors(RelationGetRelid(rel),
Since 'children' is only used inside the else if block, I think we don't
need the separate "List *children = NIL;" declaration.
Instead, it could just be "List *children = find_all_inheritors(...)".
[1]: /messages/by-id/CANhcyEW_s2LD6RiDSMHtWQnpYB67EWXqf7N8mn7dOrnaKMfROg@mail.gmail.com
--
Regards,
--
Atsushi Torikoshi
Seconded from NTT DATA Japan Corporation to SRA OSS K.K.
On Mon, Jul 28, 2025 at 9:22 AM torikoshia <torikoshia@oss.nttdata.com> wrote:
I think the behavior for materialized views can be described along with
that for partitioned tables. For example:
<command>COPY TO</command> can be used with plain
tables, populated materialized views and partitioned tables.
For example, if <replaceable class="parameter">table</replaceable>
is a plain table,
<literal>COPY <replaceable class="parameter">table</replaceable>
TO</literal> copies the same rows as
<literal>SELECT * FROM ONLY <replaceable
class="parameter">table</replaceable></literal>.
If <replaceable class="parameter">table</replaceable> is a
partitioned table or a materialized view,
<literal>COPY <replaceable class="parameter">table</replaceable>
TO</literal>
copies the same rows as <literal>SELECT * FROM <replaceable
class="parameter">table</replaceable></literal>.
Your description seems ok to me.
Let's see if anyone else has a different take.
+ List *children = NIL;
...
+ {
+ children = find_all_inheritors(RelationGetRelid(rel),
Since 'children' is only used inside the else if block, I think we don't
need the separate "List *children = NIL;" declaration.
Instead, it could just be "List *children = find_all_inheritors(...)".
you are right.
"List *children = find_all_inheritors(...)" should be ok.
On 2025-07-30 12:21, jian he wrote:
Hi, Jian
On Mon, Jul 28, 2025 at 9:22 AM torikoshia <torikoshia@oss.nttdata.com> wrote:
I think the behavior for materialized views can be described along with
that for partitioned tables. For example:
<command>COPY TO</command> can be used with plain
tables, populated materialized views and partitioned tables.
For example, if <replaceable class="parameter">table</replaceable>
is a plain table,
<literal>COPY <replaceable class="parameter">table</replaceable>
TO</literal> copies the same rows as
<literal>SELECT * FROM ONLY <replaceable
class="parameter">table</replaceable></literal>.
If <replaceable class="parameter">table</replaceable> is a
partitioned table or a materialized view,
<literal>COPY <replaceable class="parameter">table</replaceable>
TO</literal>
copies the same rows as <literal>SELECT * FROM <replaceable
class="parameter">table</replaceable></literal>.
Your description seems ok to me.
Let's see if anyone else has a different take.
It’s been about two months since this discussion, and there don’t seem
to be any further comments.
How about updating the patch accordingly?
If there are no new remarks, I’d like to mark the patch as Ready for
Committer.
+ List *children = NIL;
...
+ {
+ children = find_all_inheritors(RelationGetRelid(rel),
Since 'children' is only used inside the else if block, I think we don't
need the separate "List *children = NIL;" declaration.
Instead, it could just be "List *children = find_all_inheritors(...)".
you are right.
"List *children = find_all_inheritors(...)" should be ok.
--
Regards,
--
Atsushi Torikoshi
Seconded from NTT DATA Japan Corporation to SRA OSS K.K.
On Fri, Oct 3, 2025 at 8:31 AM torikoshia <torikoshia@oss.nttdata.com> wrote:
It’s been about two months since this discussion, and there don’t seem
to be any further comments.
How about updating the patch accordingly?
If there are no new remarks, I’d like to mark the patch as Ready for
Committer.
+ List *children = NIL;
...
+ {
+ children = find_all_inheritors(RelationGetRelid(rel),
Since 'children' is only used inside the else if block, I think we don't
need the separate "List *children = NIL;" declaration.
Instead, it could just be "List *children = find_all_inheritors(...)".
you are right.
"List *children = find_all_inheritors(...)" should be ok.
Hi.
Please check the attached v15.
It contains only minor adjustments based on the comments in
/messages/by-id/c507919d8c8219ab6cfd8376a4f9a887@oss.nttdata.com
Attachments:
v15-0001-support-COPY-partitioned_table-TO.patch (text/x-patch; charset=UTF-8)
From bdfc161494ce6a5fedfaa901fde7dba0dcb79c2c Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Mon, 6 Oct 2025 17:39:57 +0800
Subject: [PATCH v15 1/1] support COPY partitioned_table TO
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
This implements ``COPY partitioned_table TO``. It will be
faster than ``COPY (SELECT * FROM partitioned_table) TO``.
If the source table is a partitioned table, COPY table TO copies the same
rows as SELECT * FROM table.
Reviewed-by: vignesh C <vignesh21@gmail.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Melih Mutlu <m.melihmutlu@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Atsushi Torikoshi <torikoshia@oss.nttdata.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/CACJufxEZt+G19Ors3bQUq-42-61__C=y5k2wk=sHEFRusu7=iQ@mail.gmail.com
Commitfest entry: https://commitfest.postgresql.org/patch/5467
---
.../postgres_fdw/expected/postgres_fdw.out | 8 +
contrib/postgres_fdw/sql/postgres_fdw.sql | 4 +
doc/src/sgml/ref/copy.sgml | 9 +-
src/backend/commands/copyto.c | 152 ++++++++++++++----
src/test/regress/expected/copy.out | 18 +++
src/test/regress/sql/copy.sql | 15 ++
6 files changed, 172 insertions(+), 34 deletions(-)
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 6dc04e916dc..99827de2dd8 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -11596,6 +11596,14 @@ SELECT * FROM result_tbl ORDER BY a;
(3 rows)
DELETE FROM result_tbl;
+-- Test COPY TO with a foreign table or when the foreign table is a partition
+COPY async_p3 TO stdout; --error
+ERROR: cannot copy from foreign table "async_p3"
+HINT: Try the COPY (SELECT ...) TO variant.
+COPY async_pt TO stdout; --error
+ERROR: cannot copy from foreign table "async_p1"
+DETAIL: Partition "async_p1" is a foreign table in the partitioned table "async_pt"
+HINT: Try the COPY (SELECT ...) TO variant.
DROP FOREIGN TABLE async_p3;
DROP TABLE base_tbl3;
-- Check case where the partitioned table has local/remote partitions
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 3b7da128519..8a672f05039 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -3941,6 +3941,10 @@ INSERT INTO result_tbl SELECT * FROM async_pt WHERE b === 505;
SELECT * FROM result_tbl ORDER BY a;
DELETE FROM result_tbl;
+-- Test COPY TO with a foreign table or when the foreign table is a partition
+COPY async_p3 TO stdout; --error
+COPY async_pt TO stdout; --error
+
DROP FOREIGN TABLE async_p3;
DROP TABLE base_tbl3;
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index c2d1fbc1fbe..ecd300097fc 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -539,13 +539,16 @@ COPY <replaceable class="parameter">count</replaceable>
<para>
<command>COPY TO</command> can be used with plain
- tables and populated materialized views.
- For example,
+ tables, populated materialized views and partitioned tables.
+ For example, if <replaceable class="parameter">table</replaceable> is a plain table,
<literal>COPY <replaceable class="parameter">table</replaceable>
TO</literal> copies the same rows as
<literal>SELECT * FROM ONLY <replaceable class="parameter">table</replaceable></literal>.
+ If <replaceable class="parameter">table</replaceable> is a partitioned table or a materialized view,
+ <literal>COPY <replaceable class="parameter">table</replaceable> TO</literal>
+ copies the same rows as <literal>SELECT * FROM <replaceable class="parameter">table</replaceable></literal>.
However it doesn't directly support other relation types,
- such as partitioned tables, inheritance child tables, or views.
+ such as inheritance child tables or views.
To copy all rows from such relations, use <literal>COPY (SELECT * FROM
<replaceable class="parameter">table</replaceable>) TO</literal>.
</para>
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 67b94b91cae..73f54ed5f62 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -19,6 +19,8 @@
#include <sys/stat.h>
#include "access/tableam.h"
+#include "access/table.h"
+#include "catalog/pg_inherits.h"
#include "commands/copyapi.h"
#include "commands/progress.h"
#include "executor/execdesc.h"
@@ -82,6 +84,7 @@ typedef struct CopyToStateData
List *attnumlist; /* integer list of attnums to copy */
char *filename; /* filename, or NULL for STDOUT */
bool is_program; /* is 'filename' a program to popen? */
+ List *partitions; /* OIDs of the leaf partitions to scan for COPY TO */
copy_data_dest_cb data_dest_cb; /* function for writing data */
CopyFormatOptions opts;
@@ -116,6 +119,8 @@ static void CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot);
static void CopyAttributeOutText(CopyToState cstate, const char *string);
static void CopyAttributeOutCSV(CopyToState cstate, const char *string,
bool use_quote);
+static void CopyRelTo(CopyToState cstate, Relation rel, Relation root_rel,
+ uint64 *processed);
/* built-in format-specific routines */
static void CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc);
@@ -643,6 +648,7 @@ BeginCopyTo(ParseState *pstate,
PROGRESS_COPY_COMMAND_TO,
0
};
+ List *scan_oids = NIL;
if (rel != NULL && rel->rd_rel->relkind != RELKIND_RELATION)
{
@@ -673,11 +679,36 @@ BeginCopyTo(ParseState *pstate,
errmsg("cannot copy from sequence \"%s\"",
RelationGetRelationName(rel))));
else if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
- ereport(ERROR,
- (errcode(ERRCODE_WRONG_OBJECT_TYPE),
- errmsg("cannot copy from partitioned table \"%s\"",
- RelationGetRelationName(rel)),
- errhint("Try the COPY (SELECT ...) TO variant.")));
+ {
+ List *children = NIL;
+
+ children = find_all_inheritors(RelationGetRelid(rel),
+ AccessShareLock,
+ NULL);
+
+ foreach_oid(childreloid, children)
+ {
+ char relkind = get_rel_relkind(childreloid);
+
+ if (relkind == RELKIND_FOREIGN_TABLE)
+ {
+ char *relation_name;
+
+ relation_name = get_rel_name(childreloid);
+ ereport(ERROR,
+ errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("cannot copy from foreign table \"%s\"", relation_name),
+ errdetail("Partition \"%s\" is a foreign table in the partitioned table \"%s\"",
+ relation_name, RelationGetRelationName(rel)),
+ errhint("Try the COPY (SELECT ...) TO variant."));
+ }
+
+ if (RELKIND_HAS_PARTITIONS(relkind))
+ continue;
+
+ scan_oids = lappend_oid(scan_oids, childreloid);
+ }
+ }
else
ereport(ERROR,
(errcode(ERRCODE_WRONG_OBJECT_TYPE),
@@ -713,6 +744,7 @@ BeginCopyTo(ParseState *pstate,
cstate->rel = rel;
tupDesc = RelationGetDescr(cstate->rel);
+ cstate->partitions = list_copy(scan_oids);
}
else
{
@@ -722,6 +754,7 @@ BeginCopyTo(ParseState *pstate,
DestReceiver *dest;
cstate->rel = NULL;
+ cstate->partitions = NIL;
/*
* Run parse analysis and rewrite. Note this also acquires sufficient
@@ -1030,7 +1063,7 @@ DoCopyTo(CopyToState cstate)
TupleDesc tupDesc;
int num_phys_attrs;
ListCell *cur;
- uint64 processed;
+ uint64 processed = 0;
if (fe_copy)
SendCopyBegin(cstate);
@@ -1068,36 +1101,24 @@ DoCopyTo(CopyToState cstate)
cstate->routine->CopyToStart(cstate, tupDesc);
- if (cstate->rel)
+ /*
+ * If COPY TO source table is a partitioned table, then open each
+ * partition and process each individual partition.
+ */
+ if (cstate->rel && cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
{
- TupleTableSlot *slot;
- TableScanDesc scandesc;
-
- scandesc = table_beginscan(cstate->rel, GetActiveSnapshot(), 0, NULL);
- slot = table_slot_create(cstate->rel, NULL);
-
- processed = 0;
- while (table_scan_getnextslot(scandesc, ForwardScanDirection, slot))
+ foreach_oid(scan_oid, cstate->partitions)
{
- CHECK_FOR_INTERRUPTS();
+ Relation scan_rel;
- /* Deconstruct the tuple ... */
- slot_getallattrs(slot);
-
- /* Format and send the data */
- CopyOneRowTo(cstate, slot);
-
- /*
- * Increment the number of processed tuples, and report the
- * progress.
- */
- pgstat_progress_update_param(PROGRESS_COPY_TUPLES_PROCESSED,
- ++processed);
+ /* We already got the needed lock in BeginCopyTo */
+ scan_rel = table_open(scan_oid, NoLock);
+ CopyRelTo(cstate, scan_rel, cstate->rel, &processed);
+ table_close(scan_rel, NoLock);
}
-
- ExecDropSingleTupleTableSlot(slot);
- table_endscan(scandesc);
}
+ else if (cstate->rel)
+ CopyRelTo(cstate, cstate->rel, NULL, &processed);
else
{
/* run the plan --- the dest receiver will send tuples */
@@ -1115,6 +1136,75 @@ DoCopyTo(CopyToState cstate)
return processed;
}
+/*
+ * Scan a single table (which may be a partition) and export its rows to the
+ * COPY destination.
+ *
+ * rel: the table from which the actual data will be copied.
+ * root_rel: if not NULL, the COPY TO command is copying a partitioned
+ * table, and "rel" is a partition of "root_rel".
+ * processed: running count of tuples processed, updated in place.
+ */
+static void
+CopyRelTo(CopyToState cstate, Relation rel, Relation root_rel,
+ uint64 *processed)
+{
+ TupleTableSlot *slot;
+ TableScanDesc scandesc;
+ AttrMap *map = NULL;
+ TupleTableSlot *root_slot = NULL;
+ TupleDesc tupdesc;
+ TupleDesc rootdesc;
+
+ tupdesc = RelationGetDescr(rel);
+ scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+ slot = table_slot_create(rel, NULL);
+
+ /*
+ * A partition's rowtype might differ from the root table's. If we are
+ * exporting partition data here, we must convert it back to the root
+ * table's rowtype.
+ */
+ if (root_rel != NULL)
+ {
+ rootdesc = RelationGetDescr(root_rel);
+ root_slot = table_slot_create(root_rel, NULL);
+ map = build_attrmap_by_name_if_req(rootdesc, tupdesc, false);
+ }
+
+ while (table_scan_getnextslot(scandesc, ForwardScanDirection, slot))
+ {
+ TupleTableSlot *copyslot;
+
+ CHECK_FOR_INTERRUPTS();
+
+ /* Deconstruct the tuple ... */
+ if (map != NULL)
+ copyslot = execute_attr_map_slot(map, slot, root_slot);
+ else
+ {
+ slot_getallattrs(slot);
+ copyslot = slot;
+ }
+
+ /* Format and send the data */
+ CopyOneRowTo(cstate, copyslot);
+
+ /*
+ * Increment the number of processed tuples, and report the
+ * progress.
+ */
+ pgstat_progress_update_param(PROGRESS_COPY_TUPLES_PROCESSED,
+ ++(*processed));
+ }
+
+ ExecDropSingleTupleTableSlot(slot);
+
+ if (root_slot != NULL)
+ ExecDropSingleTupleTableSlot(root_slot);
+ table_endscan(scandesc);
+}
+
/*
* Emit one row during DoCopyTo().
*/
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index ac66eb55aee..3bf5ecf469e 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -373,3 +373,21 @@ COPY copytest_mv(id) TO stdout WITH (header);
id
1
DROP MATERIALIZED VIEW copytest_mv;
+-- Tests for COPY TO with partitioned tables.
+CREATE TABLE pp (id int,val int) PARTITION BY RANGE (id);
+CREATE TABLE pp_1 (val int, id int) PARTITION BY RANGE (id);
+CREATE TABLE pp_2 (val int, id int) PARTITION BY RANGE (id);
+ALTER TABLE pp ATTACH PARTITION pp_1 FOR VALUES FROM (1) TO (5);
+ALTER TABLE pp ATTACH PARTITION pp_2 FOR VALUES FROM (5) TO (10);
+CREATE TABLE pp_15 PARTITION OF pp_1 FOR VALUES FROM (1) TO (5);
+CREATE TABLE pp_510 PARTITION OF pp_2 FOR VALUES FROM (5) TO (10);
+INSERT INTO pp SELECT g, 10 + g FROM generate_series(1,6) g;
+COPY pp TO stdout(header);
+id val
+1 11
+2 12
+3 13
+4 14
+5 15
+6 16
+DROP TABLE PP;
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index a1316c73bac..3d84764c65f 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -405,3 +405,18 @@ COPY copytest_mv(id) TO stdout WITH (header);
REFRESH MATERIALIZED VIEW copytest_mv;
COPY copytest_mv(id) TO stdout WITH (header);
DROP MATERIALIZED VIEW copytest_mv;
+
+-- Tests for COPY TO with partitioned tables.
+CREATE TABLE pp (id int,val int) PARTITION BY RANGE (id);
+CREATE TABLE pp_1 (val int, id int) PARTITION BY RANGE (id);
+CREATE TABLE pp_2 (val int, id int) PARTITION BY RANGE (id);
+ALTER TABLE pp ATTACH PARTITION pp_1 FOR VALUES FROM (1) TO (5);
+ALTER TABLE pp ATTACH PARTITION pp_2 FOR VALUES FROM (5) TO (10);
+
+CREATE TABLE pp_15 PARTITION OF pp_1 FOR VALUES FROM (1) TO (5);
+CREATE TABLE pp_510 PARTITION OF pp_2 FOR VALUES FROM (5) TO (10);
+
+INSERT INTO pp SELECT g, 10 + g FROM generate_series(1,6) g;
+
+COPY pp TO stdout(header);
+DROP TABLE PP;
--
2.34.1
Hi,
On Mon, Oct 6, 2025 at 2:49 AM jian he <jian.universality@gmail.com> wrote:
On Fri, Oct 3, 2025 at 8:31 AM torikoshia <torikoshia@oss.nttdata.com> wrote:
It’s been about two months since this discussion, and there don’t seem
to be any further comments.
How about updating the patch accordingly?
If there are no new remarks, I’d like to mark the patch as Ready for
Committer.
+ List *children = NIL;
...
+ {
+ children = find_all_inheritors(RelationGetRelid(rel),
Since 'children' is only used inside the else if block, I think we don't
need the separate "List *children = NIL;" declaration.
Instead, it could just be "List *children = find_all_inheritors(...)".
you are right.
"List *children = find_all_inheritors(...)" should be ok.
hi.
please check the attached v15.
Thank you for working on this! I've reviewed the v15 patch, and here
are review comments:
---
+ children = find_all_inheritors(RelationGetRelid(rel),
+ AccessShareLock,
+ NULL);
+
+ foreach_oid(childreloid, children)
+ {
+ char relkind = get_rel_relkind(childreloid);
I think it's better to write some comments summarizing what we're
doing in the loop.
---
+ relation_name = get_rel_name(childreloid);
+ ereport(ERROR,
+ errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("cannot copy from foreign table \"%s\"", relation_name),
+ errdetail("Partition \"%s\" is a foreign table in the partitioned table \"%s\"",
+ relation_name, RelationGetRelationName(rel)),
+ errhint("Try the COPY (SELECT ...) TO variant."));
I think we don't need "the" in the error message.
It's conventional to put all err*() macros in parentheses (i.e.,
"(errcode(), ...)"), though they can technically be omitted.
---
+ if (RELKIND_HAS_PARTITIONS(relkind))
+ continue;
+
+ scan_oids = lappend_oid(scan_oids, childreloid);
find_all_inheritors() returns a list of OIDs of child relations. I
think we can delete the relations for which RELKIND_HAS_PARTITIONS() is
true from that list instead of creating a new list scan_oids. Then, we can
set cstate->partitions to the list.
---
tupDesc = RelationGetDescr(cstate->rel);
+ cstate->partitions = list_copy(scan_oids);
}
Why do we need to copy the list here?
---
With the patch we have:
/*
* If COPY TO source table is a partitioned table, then open each
* partition and process each individual partition.
*/
if (cstate->rel && cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
{
foreach_oid(scan_oid, cstate->partitions)
{
Relation scan_rel;
/* We already got the needed lock in BeginCopyTo */
scan_rel = table_open(scan_oid, NoLock);
CopyRelTo(cstate, scan_rel, cstate->rel, &processed);
table_close(scan_rel, NoLock);
}
}
else if (cstate->rel)
CopyRelTo(cstate, cstate->rel, NULL, &processed);
else
{
/* run the plan --- the dest receiver will send tuples */
I think we can refactor the code structure as follows:
if (cstate->rel)
{
if (cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
{
/* call CopyRelTo() for each OID in cstate->partitions here */
}
else
CopyRelTo(cstate, cstate->rel, NULL, &processed);
}
else
{
...
---
+ if (root_rel != NULL)
+ {
+ rootdesc = RelationGetDescr(root_rel);
+ root_slot = table_slot_create(root_rel, NULL);
+ map = build_attrmap_by_name_if_req(rootdesc, tupdesc, false);
+ }
rootdesc can be declared inside this if statement or we can directly
pass 'RelationGetDescr(root_rel)' to build_attrmap_by_name_if_req().
---
+ /* Deconstruct the tuple ... */
+ if (map != NULL)
+ copyslot = execute_attr_map_slot(map, slot, root_slot);
+ else
+ {
+ slot_getallattrs(slot);
+ copyslot = slot;
+ }
ISTM that the comment "Deconstruct the tuple" needs to move to before
slot_getallattrs(slot).
How about doing "slot = execute_attr_map_slot(map, slot, root_slot);"
instead? (i.e., no need to have 'copyslot')
---
+ if (root_slot != NULL)
+ ExecDropSingleTupleTableSlot(root_slot);
+ table_endscan(scandesc);
We might want to pfree 'map' if we create it.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
<jian.universality@gmail.com> wrote:
On Fri, Oct 3, 2025 at 8:31 AM torikoshia <torikoshia@oss.nttdata.com> wrote:
It’s been about two months since this discussion, and there don’t seem
to be any further comments.
How about updating the patch accordingly?
If there are no new remarks, I’d like to mark the patch as Ready for
Committer.
+ List *children = NIL;
...
+ {
+ children = find_all_inheritors(RelationGetRelid(rel),
Since 'children' is only used inside the else if block, I think we don't
need the separate "List *children = NIL;" declaration.
Instead, it could just be "List *children = find_all_inheritors(...)".
you are right.
"List *children = find_all_inheritors(...)" should be ok.
hi.
please check the attached v15.
Thanks for updating the patch!
Here are some minor comments.
#include "access/tableam.h"
+#include "access/table.h"
As in partbounds.c, I think table.h should come before tableam.h in the
include order.
+ char relkind = get_rel_relkind(childreloid);
+
+ if (relkind == RELKIND_FOREIGN_TABLE)
+ {
+ char *relation_name;
+
+ relation_name = get_rel_name(childreloid);
Similar to how relkind is declared, it might be simpler to combine the
declaration and assignment here.
On 2025-10-09 10:13, Masahiko Sawada wrote:
Thanks for your review!
RelationGetRelationName(rel)),
+ errhint("Try the COPY (SELECT ...) TO variant."));
I think we don't need "the" in the error message.
I agree. However, I noticed that some existing messages use “the” in
similar contexts. For example:
if (rel->rd_rel->relkind == RELKIND_VIEW)
ereport(ERROR,
(errcode(ERRCODE_WRONG_OBJECT_TYPE),
errmsg("cannot copy from view \"%s\"",
RelationGetRelationName(rel)),
errhint("Try the COPY (SELECT ...) TO variant.")));
If we want to fix it, I think we should update all similar messages
together for consistency.
--
Regards,
--
Atsushi Torikoshi
Seconded from NTT DATA Japan Corporation to SRA OSS K.K.
On Thu, Oct 9, 2025 at 9:14 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
please check the attached v15.
Thank you for working on this! I've reviewed the v15 patch, and here
are review comments:
---
+ children = find_all_inheritors(RelationGetRelid(rel),
+ AccessShareLock,
+ NULL);
+
+ foreach_oid(childreloid, children)
+ {
+ char relkind = get_rel_relkind(childreloid);
I think it's better to write some comments summarizing what we're
doing in the loop.
sure.
---
+ relation_name = get_rel_name(childreloid);
+ ereport(ERROR,
+ errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("cannot copy from foreign table \"%s\"", relation_name),
+ errdetail("Partition \"%s\" is a foreign table in the partitioned table \"%s\"",
+ relation_name, RelationGetRelationName(rel)),
+ errhint("Try the COPY (SELECT ...) TO variant."));
I think we don't need "the" in the error message.
It's conventional to put all err*() macros in parentheses (i.e.,
"(errcode(), ...)"), though they can technically be omitted.
https://www.postgresql.org/docs/current/error-message-reporting.html
QUOTE:
<<<<>>>>>
The extra parentheses were required before PostgreSQL version 12, but
are now optional.
Here is a more complex example:
.....
<<<<>>>>>
related commit:
https://git.postgresql.org/cgit/postgresql.git/commit/?id=e3a87b4991cc2d00b7a3082abb54c5f12baedfd1
Fewer parentheses are generally more readable, I think.
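As a rough illustration of why the extra parentheses can be optional, here is a toy macro. It is not PostgreSQL's ereport() definition; TOY_REPORT and the toy_* helpers are hypothetical. The sketch assumes a variadic macro that joins its arguments to a final call with the comma operator, which accepts the auxiliary calls either bare or wrapped in one extra pair of parentheses.

```c
#include <assert.h>

static int calls = 0;   /* counts how many toy_err*() helpers ran */

static int toy_errcode(int code) { calls++; return code; }
static int toy_errmsg(const char *msg) { calls++; return msg != 0; }
static int toy_finish(void) { return calls; }

/*
 * Toy reporting macro: the auxiliary calls are chained to toy_finish() with
 * the comma operator, so they are evaluated left to right and the value of
 * the whole expression is toy_finish()'s result.
 */
#define TOY_REPORT(...) (__VA_ARGS__, toy_finish())

/* Newer style: auxiliary calls passed bare. */
static int
report_without_parens(void)
{
    calls = 0;
    return TOY_REPORT(toy_errcode(1), toy_errmsg("cannot copy"));
}

/* Older style: auxiliary calls wrapped in an extra pair of parentheses. */
static int
report_with_parens(void)
{
    calls = 0;
    return TOY_REPORT((toy_errcode(1), toy_errmsg("cannot copy")));
}
```

Both forms run the same two helper calls before the final one, so the choice between them is purely a matter of style.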
---
+ if (RELKIND_HAS_PARTITIONS(relkind))
+ continue;
+
+ scan_oids = lappend_oid(scan_oids, childreloid);
find_all_inheritors() returns a list of OIDs of child relations. I
think we can delete relations whose kind is RELKIND_HAS_PARTITIONS()
from the list instead of creating a new list scan_oids. Then, we can
set cstate->partitions to the list.
Yeah, we can use foreach_delete_current to delete list elements on the fly.
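The idea of pruning the list while iterating, instead of building a second list, can be sketched on a plain array. This is only an analogy under stated assumptions: ToyChild and keep_leaves_in_place() are hypothetical, and an array stands in for the PostgreSQL List that foreach_delete_current operates on.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a child relation: an id plus a flag. */
typedef struct ToyChild
{
    int id;             /* stand-in for a relation OID */
    int has_partitions; /* 1 if the relation is itself partitioned */
} ToyChild;

/*
 * Remove, in place, every element that has partitions of its own, keeping
 * the remaining (leaf) elements in their original order. Returns the new
 * length of the array.
 */
static size_t
keep_leaves_in_place(ToyChild *children, size_t n)
{
    size_t kept = 0;

    for (size_t i = 0; i < n; i++)
    {
        if (children[i].has_partitions)
            continue;                   /* drop: not a scan target */
        children[kept++] = children[i]; /* compact the survivors */
    }
    return kept;
}
```

After the call, the same array can be used directly as the list of relations to scan, so no copy is needed.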
---
tupDesc = RelationGetDescr(cstate->rel);
+ cstate->partitions = list_copy(scan_oids);
}
Why do we need to copy the list here?
Yeah, list_copy is not needed.
---
With the patch we have:
/*
* If COPY TO source table is a partitioned table, then open each
* partition and process each individual partition.
*/
if (cstate->rel && cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
{
foreach_oid(scan_oid, cstate->partitions)
{
Relation scan_rel;
/* We already got the needed lock in BeginCopyTo */
scan_rel = table_open(scan_oid, NoLock);
CopyRelTo(cstate, scan_rel, cstate->rel, &processed);
table_close(scan_rel, NoLock);
}
}
else if (cstate->rel)
CopyRelTo(cstate, cstate->rel, NULL, &processed);
else
{
/* run the plan --- the dest receiver will send tuples */
I think we can refactor the code structure as follows:
if (cstate->rel)
{
if (cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
{
/* call CopyRelTo() for each OID in cstate->partitions here */
}
else
CopyRelTo(cstate, cstate->rel, NULL, &processed);
}
else
{
...
sure, this may increase readability.
+ if (root_rel != NULL)
+ {
+ rootdesc = RelationGetDescr(root_rel);
+ root_slot = table_slot_create(root_rel, NULL);
+ map = build_attrmap_by_name_if_req(rootdesc, tupdesc, false);
+ }
rootdesc can be declared inside this if statement or we can directly
pass 'RelationGetDescr(root_rel)' to build_attrmap_by_name_if_req().
sure. good idea.
---
+ /* Deconstruct the tuple ... */
+ if (map != NULL)
+ copyslot = execute_attr_map_slot(map, slot, root_slot);
+ else
+ {
+ slot_getallattrs(slot);
+ copyslot = slot;
+ }
ISTM that the comment "Deconstruct the tuple" needs to move to before
slot_getallattrs(slot).
ok.
How about doing "slot = execute_attr_map_slot(map, slot, root_slot);"
instead? (i.e., no need to have 'copyslot')
I tried, but it does not seem possible.
table_scan_getnextslot() requires a particular type of slot. If we do
"slot = execute_attr_map_slot(map, slot, root_slot);"
then "slot" points to a virtual slot, and the second call to
table_scan_getnextslot() will fail.
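A minimal standalone sketch of the two ideas in play here: mapping columns by name when a partition's column order differs from the root's, and writing the converted row into a separate output buffer so the scan buffer is left untouched for the next fetch. Plain int arrays stand in for tuple slots, and toy_build_map() and toy_convert_row() are hypothetical names, only loosely modeled on what the patch does.

```c
#include <assert.h>
#include <string.h>

#define NATTS 2

/*
 * Build a by-name column map: map[i] is the index in "part" of the column
 * named root[i]. Returns 1 if any column sits at a different position (a
 * conversion is needed), 0 if the two layouts are identical.
 */
static int
toy_build_map(const char *const root[NATTS], const char *const part[NATTS],
              int map[NATTS])
{
    int needed = 0;

    for (int i = 0; i < NATTS; i++)
    {
        map[i] = -1;
        for (int j = 0; j < NATTS; j++)
            if (strcmp(root[i], part[j]) == 0)
                map[i] = j;
        assert(map[i] >= 0);    /* every root column must exist in the partition */
        if (map[i] != i)
            needed = 1;
    }
    return needed;
}

/*
 * Convert one scanned row into root column order. The scan buffer is only
 * read; the result goes into a separate output buffer, so the scan buffer
 * can be handed back to the scan unchanged for the next row.
 */
static const int *
toy_convert_row(const int map[NATTS], const int scan_row[NATTS],
                int out_row[NATTS])
{
    for (int i = 0; i < NATTS; i++)
        out_row[i] = scan_row[map[i]];
    return out_row;
}
```

Keeping the scan buffer and the output buffer as two distinct variables mirrors the role of 'slot' and 'copyslot' in the patch: the scan keeps receiving the buffer it created, and only the copy step sees the converted row.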
---
+ if (root_slot != NULL)
+ ExecDropSingleTupleTableSlot(root_slot);
+ table_endscan(scandesc);
We might want to pfree 'map' if we create it.
ok.
Please check the attached v16.
Attachments:
v16-0001-support-COPY-partitioned_table-TO.patch (text/x-patch; charset=UTF-8)
From a9189e99050a0a9387df797d3a0bfb11afed5887 Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Thu, 9 Oct 2025 15:05:52 +0800
Subject: [PATCH v16 1/1] support COPY partitioned_table TO
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
This implements ``COPY partitioned_table TO``. It will be
faster than ``COPY (SELECT * FROM partitioned_table) TO``.
If the source table is a partitioned table, COPY table TO copies the same
rows as SELECT * FROM table.
Reviewed-by: vignesh C <vignesh21@gmail.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Melih Mutlu <m.melihmutlu@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Atsushi Torikoshi <torikoshia@oss.nttdata.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/CACJufxEZt+G19Ors3bQUq-42-61__C=y5k2wk=sHEFRusu7=iQ@mail.gmail.com
Commitfest entry: https://commitfest.postgresql.org/patch/5467
---
.../postgres_fdw/expected/postgres_fdw.out | 8 +
contrib/postgres_fdw/sql/postgres_fdw.sql | 4 +
doc/src/sgml/ref/copy.sgml | 9 +-
src/backend/commands/copyto.c | 155 ++++++++++++++----
src/test/regress/expected/copy.out | 18 ++
src/test/regress/sql/copy.sql | 15 ++
6 files changed, 176 insertions(+), 33 deletions(-)
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 91bbd0d8c73..2f9e315fd57 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -11599,6 +11599,14 @@ SELECT * FROM result_tbl ORDER BY a;
(3 rows)
DELETE FROM result_tbl;
+-- Test COPY TO with a foreign table or when the foreign table is a partition
+COPY async_p3 TO stdout; --error
+ERROR: cannot copy from foreign table "async_p3"
+HINT: Try the COPY (SELECT ...) TO variant.
+COPY async_pt TO stdout; --error
+ERROR: cannot copy from foreign table "async_p1"
+DETAIL: Partition "async_p1" is a foreign table in the partitioned table "async_pt"
+HINT: Try the COPY (SELECT ...) TO variant.
DROP FOREIGN TABLE async_p3;
DROP TABLE base_tbl3;
-- Check case where the partitioned table has local/remote partitions
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 3b7da128519..8a672f05039 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -3941,6 +3941,10 @@ INSERT INTO result_tbl SELECT * FROM async_pt WHERE b === 505;
SELECT * FROM result_tbl ORDER BY a;
DELETE FROM result_tbl;
+-- Test COPY TO with a foreign table or when the foreign table is a partition
+COPY async_p3 TO stdout; --error
+COPY async_pt TO stdout; --error
+
DROP FOREIGN TABLE async_p3;
DROP TABLE base_tbl3;
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index c2d1fbc1fbe..ecd300097fc 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -539,13 +539,16 @@ COPY <replaceable class="parameter">count</replaceable>
<para>
<command>COPY TO</command> can be used with plain
- tables and populated materialized views.
- For example,
+ tables, populated materialized views and partitioned tables.
+ For example, if <replaceable class="parameter">table</replaceable> is a plain table,
<literal>COPY <replaceable class="parameter">table</replaceable>
TO</literal> copies the same rows as
<literal>SELECT * FROM ONLY <replaceable class="parameter">table</replaceable></literal>.
+ If <replaceable class="parameter">table</replaceable> is a partitioned table or a materialized view
+ <literal>COPY <replaceable class="parameter">table</replaceable> TO</literal>
+ copies the same rows as <literal>SELECT * FROM <replaceable class="parameter">table</replaceable></literal>.
However it doesn't directly support other relation types,
- such as partitioned tables, inheritance child tables, or views.
+ such as inheritance child tables, or views.
To copy all rows from such relations, use <literal>COPY (SELECT * FROM
<replaceable class="parameter">table</replaceable>) TO</literal>.
</para>
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index e5781155cdf..aba76eb0173 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -18,7 +18,9 @@
#include <unistd.h>
#include <sys/stat.h>
+#include "access/table.h"
#include "access/tableam.h"
+#include "catalog/pg_inherits.h"
#include "commands/copyapi.h"
#include "commands/progress.h"
#include "executor/execdesc.h"
@@ -82,6 +84,7 @@ typedef struct CopyToStateData
List *attnumlist; /* integer list of attnums to copy */
char *filename; /* filename, or NULL for STDOUT */
bool is_program; /* is 'filename' a program to popen? */
+ List *partitions; /* oid list of partition oid for copy to */
copy_data_dest_cb data_dest_cb; /* function for writing data */
CopyFormatOptions opts;
@@ -116,6 +119,8 @@ static void CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot);
static void CopyAttributeOutText(CopyToState cstate, const char *string);
static void CopyAttributeOutCSV(CopyToState cstate, const char *string,
bool use_quote);
+static void CopyRelTo(CopyToState cstate, Relation rel, Relation root_rel,
+ uint64 *processed);
/* built-in format-specific routines */
static void CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc);
@@ -643,6 +648,7 @@ BeginCopyTo(ParseState *pstate,
PROGRESS_COPY_COMMAND_TO,
0
};
+ List *children = NIL;
if (rel != NULL && rel->rd_rel->relkind != RELKIND_RELATION)
{
@@ -673,11 +679,36 @@ BeginCopyTo(ParseState *pstate,
errmsg("cannot copy from sequence \"%s\"",
RelationGetRelationName(rel))));
else if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
- ereport(ERROR,
- (errcode(ERRCODE_WRONG_OBJECT_TYPE),
- errmsg("cannot copy from partitioned table \"%s\"",
- RelationGetRelationName(rel)),
- errhint("Try the COPY (SELECT ...) TO variant.")));
+ {
+ /*
+ * Collect a list of partitions containing data, so that later
+ * DoCopyTo can copy the data from them.
+ */
+ children = find_all_inheritors(RelationGetRelid(rel),
+ AccessShareLock,
+ NULL);
+
+ foreach_oid(childreloid, children)
+ {
+ char relkind = get_rel_relkind(childreloid);
+
+ if (relkind == RELKIND_FOREIGN_TABLE)
+ {
+ char *relation_name;
+
+ relation_name = get_rel_name(childreloid);
+ ereport(ERROR,
+ errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("cannot copy from foreign table \"%s\"", relation_name),
+ errdetail("Partition \"%s\" is a foreign table in the partitioned table \"%s\"",
+ relation_name, RelationGetRelationName(rel)),
+ errhint("Try the COPY (SELECT ...) TO variant."));
+ }
+
+ if (RELKIND_HAS_PARTITIONS(relkind))
+ children = foreach_delete_current(children, childreloid);
+ }
+ }
else
ereport(ERROR,
(errcode(ERRCODE_WRONG_OBJECT_TYPE),
@@ -713,6 +744,7 @@ BeginCopyTo(ParseState *pstate,
cstate->rel = rel;
tupDesc = RelationGetDescr(cstate->rel);
+ cstate->partitions = children;
}
else
{
@@ -722,6 +754,7 @@ BeginCopyTo(ParseState *pstate,
DestReceiver *dest;
cstate->rel = NULL;
+ cstate->partitions = NIL;
/*
* Run parse analysis and rewrite. Note this also acquires sufficient
@@ -1030,7 +1063,7 @@ DoCopyTo(CopyToState cstate)
TupleDesc tupDesc;
int num_phys_attrs;
ListCell *cur;
- uint64 processed;
+ uint64 processed = 0;
if (fe_copy)
SendCopyBegin(cstate);
@@ -1070,33 +1103,24 @@ DoCopyTo(CopyToState cstate)
if (cstate->rel)
{
- TupleTableSlot *slot;
- TableScanDesc scandesc;
-
- scandesc = table_beginscan(cstate->rel, GetActiveSnapshot(), 0, NULL);
- slot = table_slot_create(cstate->rel, NULL);
-
- processed = 0;
- while (table_scan_getnextslot(scandesc, ForwardScanDirection, slot))
+ /*
+ * If COPY TO source table is a partitioned table, then open each
+ * partition and process each individual partition.
+ */
+ if (cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
{
- CHECK_FOR_INTERRUPTS();
+ foreach_oid(scan_oid, cstate->partitions)
+ {
+ Relation scan_rel;
- /* Deconstruct the tuple ... */
- slot_getallattrs(slot);
-
- /* Format and send the data */
- CopyOneRowTo(cstate, slot);
-
- /*
- * Increment the number of processed tuples, and report the
- * progress.
- */
- pgstat_progress_update_param(PROGRESS_COPY_TUPLES_PROCESSED,
- ++processed);
+ /* We already got the needed lock in BeginCopyTo */
+ scan_rel = table_open(scan_oid, NoLock);
+ CopyRelTo(cstate, scan_rel, cstate->rel, &processed);
+ table_close(scan_rel, NoLock);
+ }
}
-
- ExecDropSingleTupleTableSlot(slot);
- table_endscan(scandesc);
+ else
+ CopyRelTo(cstate, cstate->rel, NULL, &processed);
}
else
{
@@ -1115,6 +1139,77 @@ DoCopyTo(CopyToState cstate)
return processed;
}
+/*
+ * Scan a single table (which may be a partition) and export its rows to the
+ * COPY destination.
+ *
+ * rel: the table from which the actual data will be copied.
+ * root_rel: if not NULL, it indicates that COPY TO command copy partitioned
+ * table data to the destination, and "rel" is the partition of "root_rel".
+ * processed: number of tuples processed.
+*/
+static void
+CopyRelTo(CopyToState cstate, Relation rel, Relation root_rel,
+ uint64 *processed)
+{
+ TupleTableSlot *slot;
+ TableScanDesc scandesc;
+ AttrMap *map = NULL;
+ TupleTableSlot *root_slot = NULL;
+
+ scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+ slot = table_slot_create(rel, NULL);
+
+ /*
+ * A partition's rowtype might differ from the root table's. If we are
+ * exporting partition data here, we must convert it back to the root
+ * table's rowtype.
+ */
+ if (root_rel != NULL)
+ {
+ root_slot = table_slot_create(root_rel, NULL);
+ map = build_attrmap_by_name_if_req(RelationGetDescr(root_rel),
+ RelationGetDescr(rel),
+ false);
+ }
+
+ while (table_scan_getnextslot(scandesc, ForwardScanDirection, slot))
+ {
+ TupleTableSlot *copyslot;
+
+ CHECK_FOR_INTERRUPTS();
+
+ if (map != NULL)
+ copyslot = execute_attr_map_slot(map, slot, root_slot);
+ else
+ {
+ /* Deconstruct the tuple ... */
+ slot_getallattrs(slot);
+ copyslot = slot;
+ }
+
+ /* Format and send the data */
+ CopyOneRowTo(cstate, copyslot);
+
+ /*
+ * Increment the number of processed tuples, and report the
+ * progress.
+ */
+ pgstat_progress_update_param(PROGRESS_COPY_TUPLES_PROCESSED,
+ ++(*processed));
+ }
+
+ ExecDropSingleTupleTableSlot(slot);
+
+ if (root_slot != NULL)
+ ExecDropSingleTupleTableSlot(root_slot);
+
+ if (map != NULL)
+ free_attrmap(map);
+
+ table_endscan(scandesc);
+}
+
/*
* Emit one row during DoCopyTo().
*/
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index ac66eb55aee..3bf5ecf469e 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -373,3 +373,21 @@ COPY copytest_mv(id) TO stdout WITH (header);
id
1
DROP MATERIALIZED VIEW copytest_mv;
+-- Tests for COPY TO with partitioned tables.
+CREATE TABLE pp (id int,val int) PARTITION BY RANGE (id);
+CREATE TABLE pp_1 (val int, id int) PARTITION BY RANGE (id);
+CREATE TABLE pp_2 (val int, id int) PARTITION BY RANGE (id);
+ALTER TABLE pp ATTACH PARTITION pp_1 FOR VALUES FROM (1) TO (5);
+ALTER TABLE pp ATTACH PARTITION pp_2 FOR VALUES FROM (5) TO (10);
+CREATE TABLE pp_15 PARTITION OF pp_1 FOR VALUES FROM (1) TO (5);
+CREATE TABLE pp_510 PARTITION OF pp_2 FOR VALUES FROM (5) TO (10);
+INSERT INTO pp SELECT g, 10 + g FROM generate_series(1,6) g;
+COPY pp TO stdout(header);
+id val
+1 11
+2 12
+3 13
+4 14
+5 15
+6 16
+DROP TABLE PP;
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index a1316c73bac..3d84764c65f 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -405,3 +405,18 @@ COPY copytest_mv(id) TO stdout WITH (header);
REFRESH MATERIALIZED VIEW copytest_mv;
COPY copytest_mv(id) TO stdout WITH (header);
DROP MATERIALIZED VIEW copytest_mv;
+
+-- Tests for COPY TO with partitioned tables.
+CREATE TABLE pp (id int,val int) PARTITION BY RANGE (id);
+CREATE TABLE pp_1 (val int, id int) PARTITION BY RANGE (id);
+CREATE TABLE pp_2 (val int, id int) PARTITION BY RANGE (id);
+ALTER TABLE pp ATTACH PARTITION pp_1 FOR VALUES FROM (1) TO (5);
+ALTER TABLE pp ATTACH PARTITION pp_2 FOR VALUES FROM (5) TO (10);
+
+CREATE TABLE pp_15 PARTITION OF pp_1 FOR VALUES FROM (1) TO (5);
+CREATE TABLE pp_510 PARTITION OF pp_2 FOR VALUES FROM (5) TO (10);
+
+INSERT INTO pp SELECT g, 10 + g FROM generate_series(1,6) g;
+
+COPY pp TO stdout(header);
+DROP TABLE PP;
--
2.34.1
Hi Jian,
Thanks for the patch. After reviewing it, I have a few small comments:
On Oct 9, 2025, at 15:10, jian he <jian.universality@gmail.com> wrote:
Please check the attached v16.
<v16-0001-support-COPY-partitioned_table-TO.patch>
1
```
+ List *partitions; /* oid list of partition oid for copy to */
```
The comment could be improved. First, it repeats “oid”; second, since “List *partitions” implies multiple partitions, the comment should use the plural “OIDs”. Maybe change it to “/* list of partition OIDs for COPY TO */”.
2
```
+ /*
+ * Collect a list of partitions containing data, so that later
+ * DoCopyTo can copy the data from them.
+ */
+ children = find_all_inheritors(RelationGetRelid(rel),
+ AccessShareLock,
+ NULL);
+
+ foreach_oid(childreloid, children)
+ {
+ char relkind = get_rel_relkind(childreloid);
+
+ if (relkind == RELKIND_FOREIGN_TABLE)
+ {
+ char *relation_name;
+
+ relation_name = get_rel_name(childreloid);
+ ereport(ERROR,
+ errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("cannot copy from foreign table \"%s\"", relation_name),
+ errdetail("Partition \"%s\" is a foreign table in the partitioned table \"%s\"",
+ relation_name, RelationGetRelationName(rel)),
+ errhint("Try the COPY (SELECT ...) TO variant."));
+ }
+
+ if (RELKIND_HAS_PARTITIONS(relkind))
+ children = foreach_delete_current(children, childreloid);
+ }
```
Would it be better to move the RELKIND_HAS_PARTITIONS() check before the RELKIND_FOREIGN_TABLE check, and continue after foreach_delete_current()? Currently every childreloid goes through both checks; with this change, a child that has partitions would go through only one. This is a tiny optimization.
3
```
+ if (RELKIND_HAS_PARTITIONS(relkind))
+ children = foreach_delete_current(children, childreloid);
+ }
```
Is there any special reason for using RELKIND_HAS_PARTITIONS() here? According to the function comment of find_all_inheritors(), it only returns OIDs of relations, while RELKIND_HAS_PARTITIONS() checks for both partitioned tables and partitioned indexes. Logically the macro works, but it may confuse readers of the code.
4
```
@@ -722,6 +754,7 @@ BeginCopyTo(ParseState *pstate,
DestReceiver *dest;
cstate->rel = NULL;
+ cstate->partitions = NIL;
```
Neither NULL assignment is needed, since cstate is allocated by palloc0().
5
```
+static void
+CopyRelTo(CopyToState cstate, Relation rel, Relation root_rel,
+ uint64 *processed)
```
Instead of using a pointer to pass out the processed count, I think it’s better to return the processed count. I understand the current implementation allows the count to keep incrementing when the function is called in a loop. However, it’s a bit error-prone: the caller must make sure “processed” is properly initialized. With a uint64 return value, the caller’s code is still simple:
```
processed += CopyRelTo(cstate, …);
```
6. In BeginCopyTo(), the “children” list is created before “cstate”, so it is not allocated under “cstate->copycontext”. Therefore, in EndCopyTo(), we should also free the memory of “cstate->partitions”.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
On Thu, Oct 9, 2025 at 4:14 PM Chao Li <li.evan.chao@gmail.com> wrote:
Hi Jian,
Thanks for the patch. After reviewing it, I have a few small comments:
On Oct 9, 2025, at 15:10, jian he <jian.universality@gmail.com> wrote:
Please check the attached v16.
<v16-0001-support-COPY-partitioned_table-TO.patch>
1
```
+ List *partitions; /* oid list of partition oid for copy to */
```
The comment could be improved. First, it repeats “oid”; second, since “List *partitions” implies multiple partitions, the comment should use the plural “OIDs”. Maybe change it to “/* list of partition OIDs for COPY TO */”.
Yeah, I need to improve these comments.
2
```
+ /*
+ * Collect a list of partitions containing data, so that later
+ * DoCopyTo can copy the data from them.
+ */
+ children = find_all_inheritors(RelationGetRelid(rel),
+ AccessShareLock,
+ NULL);
+
+ foreach_oid(childreloid, children)
+ {
+ char relkind = get_rel_relkind(childreloid);
+
+ if (relkind == RELKIND_FOREIGN_TABLE)
+ {
+ char *relation_name;
+
+ relation_name = get_rel_name(childreloid);
+ ereport(ERROR,
+ errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("cannot copy from foreign table \"%s\"", relation_name),
+ errdetail("Partition \"%s\" is a foreign table in the partitioned table \"%s\"",
+ relation_name, RelationGetRelationName(rel)),
+ errhint("Try the COPY (SELECT ...) TO variant."));
+ }
+
+ if (RELKIND_HAS_PARTITIONS(relkind))
+ children = foreach_delete_current(children, childreloid);
+ }
```
Would it be better to move the RELKIND_HAS_PARTITIONS() check before the RELKIND_FOREIGN_TABLE check, and continue after foreach_delete_current()? Currently every childreloid goes through both checks; with this change, a child that has partitions would go through only one. This is a tiny optimization.
I think the current handling is fine.
3
```
+ if (RELKIND_HAS_PARTITIONS(relkind))
+ children = foreach_delete_current(children, childreloid);
+ }
```
Is there any special reason for using RELKIND_HAS_PARTITIONS() here? According to the function comment of find_all_inheritors(), it only returns OIDs of relations, while RELKIND_HAS_PARTITIONS() checks for both partitioned tables and partitioned indexes. Logically the macro works, but it may confuse readers of the code.
The find_all_inheritors comment says:
* Returns a list of relation OIDs including the given rel plus
* all relations that inherit from it, directly or indirectly.
CREATE TABLE pp (id int,val int) PARTITION BY RANGE (id);
CREATE TABLE pp_1 (val int, id int) PARTITION BY RANGE (id);
ALTER TABLE pp ATTACH PARTITION pp_1 FOR VALUES FROM (1) TO (5);
If we copy the data of partitioned table "pp" out, the partitioned table "pp_1"
has no storage of its own, so we have to skip it. Using RELKIND_HAS_PARTITIONS
to skip it should be fine.
4
```
@@ -722,6 +754,7 @@ BeginCopyTo(ParseState *pstate,
DestReceiver *dest;
cstate->rel = NULL;
+ cstate->partitions = NIL;
```
Neither NULL assignment is needed, since cstate is allocated by palloc0().
I guess this is just a code convention. Such unnecessary assignments are
quite common within the codebase.
5
```
+static void
+CopyRelTo(CopyToState cstate, Relation rel, Relation root_rel,
+ uint64 *processed)
```
Instead of using a pointer to pass out the processed count, I think it’s better to return the processed count. I understand the current implementation allows the count to keep incrementing when the function is called in a loop. However, it’s a bit error-prone: the caller must make sure “processed” is properly initialized. With a uint64 return value, the caller’s code is still simple:
```
processed += CopyRelTo(cstate, …);
```
pgstat_progress_update_param is called inside CopyRelTo,
so we have to pass (uint64 *processed) to CopyRelTo.
Am I missing something?
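A minimal sketch of why the running total has to be visible inside the per-relation function. This is not PostgreSQL code: report_progress, copy_rel_to and copy_all are hypothetical stand-ins for pgstat_progress_update_param, CopyRelTo and the loop in DoCopyTo, and "nrows" stands in for the rows a real scan would return.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for pgstat_progress_update_param():
 * it just remembers the last value that was reported. */
static uint64_t last_reported = 0;

static void
report_progress(uint64_t total)
{
	last_reported = total;
}

/*
 * Toy version of CopyRelTo().  *processed is in/out: it carries the running
 * total across calls, so every per-row progress report is cumulative over
 * all partitions copied so far.
 */
static void
copy_rel_to(size_t nrows, uint64_t *processed)
{
	assert(processed != NULL);

	for (size_t i = 0; i < nrows; i++)
		report_progress(++(*processed));
}

/* Toy version of the partition loop in DoCopyTo(). */
static uint64_t
copy_all(const size_t *rows_per_partition, size_t npartitions)
{
	uint64_t	processed = 0;	/* the caller must initialize the counter */

	for (size_t i = 0; i < npartitions; i++)
		copy_rel_to(rows_per_partition[i], &processed);

	return processed;
}
```

If copy_rel_to only returned a per-relation count, each progress report inside it would restart from zero for every partition unless the starting total were passed in some other way.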
6. In BeginCopyTo(), the “children” list is created before “cstate”, so it is not allocated under “cstate->copycontext”. Therefore, in EndCopyTo(), we should also free the memory of “cstate->partitions”.
I think so.
On Thu, Oct 9, 2025 at 12:10 AM jian he <jian.universality@gmail.com> wrote:
On Thu, Oct 9, 2025 at 9:14 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
---
+ relation_name = get_rel_name(childreloid);
+ ereport(ERROR,
+ errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("cannot copy from foreign table \"%s\"", relation_name),
+ errdetail("Partition \"%s\" is a foreign table in the partitioned table \"%s\"",
+ relation_name, RelationGetRelationName(rel)),
+ errhint("Try the COPY (SELECT ...) TO variant."));
I think we don't need "the" in the error message.
It's conventional to put all err*() macros in parentheses (i.e.,
"(errcode(), ...)"), though they can technically be omitted.
https://www.postgresql.org/docs/current/error-message-reporting.html
QUOTE:
<<<<>>>>>
The extra parentheses were required before PostgreSQL version 12, but
are now optional.
Here is a more complex example:
.....
<<<<>>>>>
related commit:
https://git.postgresql.org/cgit/postgresql.git/commit/?id=e3a87b4991cc2d00b7a3082abb54c5f12baedfd1
Fewer parentheses are generally more readable, I think.
Yes, but I think it's more consistent given that we use the
parentheses in all other places in copyto.c.
How about doing "slot = execute_attr_map_slot(map, slot, root_slot);"
instead? (i.e., no need to have 'copyslot')
I tried, but it does not seem possible.
table_scan_getnextslot requires a particular type of "slot". If we do
"slot = execute_attr_map_slot(map, slot, root_slot);"
then the pointer "slot" refers to a virtual slot, and
the second call to table_scan_getnextslot will fail.
Right. Let's keep as it is.
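The slot-type problem can be illustrated with a toy model. This is only a sketch under assumptions: ToySlot, toy_scan_getnextslot, toy_convert and toy_copy are hypothetical stand-ins, not PostgreSQL APIs, and the failure of the second scan call is modeled simply as the scan stopping early.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy slot kinds: the scan only accepts the kind it was set up with. */
typedef enum { SLOT_BUFFER_HEAP, SLOT_VIRTUAL } SlotKind;

typedef struct
{
	SlotKind	kind;
	int			value;
} ToySlot;

/* Toy scan: refuses any slot that is not of the expected kind. */
static bool
toy_scan_getnextslot(const int *rows, size_t nrows, size_t *pos, ToySlot *slot)
{
	assert(pos != NULL && slot != NULL);

	if (slot->kind != SLOT_BUFFER_HEAP)
		return false;			/* stand-in for the real failure */
	if (*pos >= nrows)
		return false;			/* end of scan */
	slot->value = rows[(*pos)++];
	return true;
}

/* Toy conversion: fills the output slot and returns it. */
static ToySlot *
toy_convert(const ToySlot *in, ToySlot *out)
{
	out->value = in->value;
	return out;
}

/*
 * Returns the number of rows copied.  reuse_slot_var = true models
 * "slot = convert(slot, root_slot)"; false models a separate copyslot.
 */
static size_t
toy_copy(const int *rows, size_t nrows, bool reuse_slot_var)
{
	ToySlot		scan_slot = {SLOT_BUFFER_HEAP, 0};
	ToySlot		root_slot = {SLOT_VIRTUAL, 0};
	ToySlot    *slot = &scan_slot;
	size_t		pos = 0;
	size_t		copied = 0;

	while (toy_scan_getnextslot(rows, nrows, &pos, slot))
	{
		ToySlot    *copyslot = toy_convert(slot, &root_slot);

		if (reuse_slot_var)
			slot = copyslot;	/* the scan now sees the wrong slot kind */
		copied++;
	}
	return copied;
}
```

In this toy, keeping a separate copyslot copies every row, while overwriting the scan's slot variable stops after the first row.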
I've attached a patch for cosmetic changes including comment updates,
indent fixes by pgindent, and renaming variable names. Some fixes are
just my taste, so please check the changes.
Also I have a few comments on new tests:
+-- Tests for COPY TO with partitioned tables.
+CREATE TABLE pp (id int,val int) PARTITION BY RANGE (id);
+CREATE TABLE pp_1 (val int, id int) PARTITION BY RANGE (id);
+CREATE TABLE pp_2 (val int, id int) PARTITION BY RANGE (id);
+ALTER TABLE pp ATTACH PARTITION pp_1 FOR VALUES FROM (1) TO (5);
+ALTER TABLE pp ATTACH PARTITION pp_2 FOR VALUES FROM (5) TO (10);
+
+CREATE TABLE pp_15 PARTITION OF pp_1 FOR VALUES FROM (1) TO (5);
+CREATE TABLE pp_510 PARTITION OF pp_2 FOR VALUES FROM (5) TO (10);
+
+INSERT INTO pp SELECT g, 10 + g FROM generate_series(1,6) g;
+
I think it's better to have both cases: one where the partitions' rowtype
matches the root's rowtype, and one where a partition's rowtype doesn't
match it.
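To make the mismatching case concrete, here is a minimal sketch of mapping partition columns back to the root's column order by name. It only loosely mirrors the idea behind build_attrmap_by_name_if_req and execute_attr_map_slot; build_map_by_name and apply_map are hypothetical helpers, and rows are simplified to arrays of int.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * For each root column, record the index of the partition column with the
 * same name, or -1 if the partition has no such column.
 */
static void
build_map_by_name(const char *const *root_cols, const char *const *part_cols,
				  size_t ncols, int *map)
{
	for (size_t i = 0; i < ncols; i++)
	{
		map[i] = -1;
		for (size_t j = 0; j < ncols; j++)
		{
			if (strcmp(root_cols[i], part_cols[j]) == 0)
			{
				map[i] = (int) j;
				break;
			}
		}
	}
}

/* Reorder one partition row into the root's column order. */
static void
apply_map(const int *map, const int *part_row, size_t ncols, int *root_row)
{
	for (size_t i = 0; i < ncols; i++)
	{
		assert(map[i] >= 0);	/* every root column must have a match */
		root_row[i] = part_row[map[i]];
	}
}
```

With root columns (id, val) and partition columns (val, id), as in the pp and pp_1 tables of the test, the map swaps the two positions; when the column orders match, the map is the identity and the conversion could be skipped.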
---
+-- Test COPY TO with a foreign table or when the foreign table is a partition
+COPY async_p3 TO stdout; --error
+ERROR: cannot copy from foreign table "async_p3"
+HINT: Try the COPY (SELECT ...) TO variant.
async_p3 is a foreign table, so this test does not seem related to this patch.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Attachments:
v16_fix_masahiko.patch.txt (text/plain; charset=US-ASCII)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index aba76eb0173..7d5067d90b5 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -119,8 +119,8 @@ static void CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot);
static void CopyAttributeOutText(CopyToState cstate, const char *string);
static void CopyAttributeOutCSV(CopyToState cstate, const char *string,
bool use_quote);
-static void CopyRelTo(CopyToState cstate, Relation rel, Relation root_rel,
- uint64 *processed);
+static void CopyRelationTo(CopyToState cstate, Relation rel, Relation root_rel,
+ uint64 *processed);
/* built-in format-specific routines */
static void CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc);
@@ -681,32 +681,30 @@ BeginCopyTo(ParseState *pstate,
else if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
{
/*
- * Collect a list of partitions containing data, so that later
+ * Collect OIDs of relation containing data, so that later
* DoCopyTo can copy the data from them.
- */
- children = find_all_inheritors(RelationGetRelid(rel),
- AccessShareLock,
- NULL);
+ */
+ children = find_all_inheritors(RelationGetRelid(rel), AccessShareLock, NULL);
- foreach_oid(childreloid, children)
+ foreach_oid(child, children)
{
- char relkind = get_rel_relkind(childreloid);
+ char relkind = get_rel_relkind(child);
if (relkind == RELKIND_FOREIGN_TABLE)
{
- char *relation_name;
+ char *relation_name = get_rel_name(child);
- relation_name = get_rel_name(childreloid);
ereport(ERROR,
- errcode(ERRCODE_WRONG_OBJECT_TYPE),
- errmsg("cannot copy from foreign table \"%s\"", relation_name),
- errdetail("Partition \"%s\" is a foreign table in the partitioned table \"%s\"",
- relation_name, RelationGetRelationName(rel)),
- errhint("Try the COPY (SELECT ...) TO variant."));
+ (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("cannot copy from foreign table \"%s\"", relation_name),
+ errdetail("Partition \"%s\" is a foreign table in partitioned table \"%s\"",
+ relation_name, RelationGetRelationName(rel)),
+ errhint("Try the COPY (SELECT ...) TO variant.")));
}
+ /* Exclude tables with no data */
if (RELKIND_HAS_PARTITIONS(relkind))
- children = foreach_delete_current(children, childreloid);
+ children = foreach_delete_current(children, child);
}
}
else
@@ -1106,21 +1104,21 @@ DoCopyTo(CopyToState cstate)
/*
* If COPY TO source table is a partitioned table, then open each
* partition and process each individual partition.
- */
+ */
if (cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
{
- foreach_oid(scan_oid, cstate->partitions)
+ foreach_oid(child, cstate->partitions)
{
- Relation scan_rel;
+ Relation scan_rel;
- /* We already got the needed lock in BeginCopyTo */
- scan_rel = table_open(scan_oid, NoLock);
- CopyRelTo(cstate, scan_rel, cstate->rel, &processed);
+ /* We already got the lock in BeginCopyTo */
+ scan_rel = table_open(child, NoLock);
+ CopyRelationTo(cstate, scan_rel, cstate->rel, &processed);
table_close(scan_rel, NoLock);
}
}
else
- CopyRelTo(cstate, cstate->rel, NULL, &processed);
+ CopyRelationTo(cstate, cstate->rel, NULL, &processed);
}
else
{
@@ -1140,22 +1138,19 @@ DoCopyTo(CopyToState cstate)
}
/*
- * Scan a single table (which may be a partition) and export its rows to the
- * COPY destination.
+ * Scans a single table and exports its rows to the COPY destination.
*
- * rel: the table from which the actual data will be copied.
- * root_rel: if not NULL, it indicates that COPY TO command copy partitioned
- * table data to the destination, and "rel" is the partition of "root_rel".
- * processed: number of tuples processed.
+ * root_rel can be set to the root table of rel if rel is a partition
+ * table so that we can send tuples in root_rel's rowtype, which might
+ * differ from individual partitions.
*/
static void
-CopyRelTo(CopyToState cstate, Relation rel, Relation root_rel,
- uint64 *processed)
+CopyRelationTo(CopyToState cstate, Relation rel, Relation root_rel, uint64 *processed)
{
TupleTableSlot *slot;
TableScanDesc scandesc;
- AttrMap *map = NULL;
- TupleTableSlot *root_slot = NULL;
+ AttrMap *map = NULL;
+ TupleTableSlot *root_slot = NULL;
scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
slot = table_slot_create(rel, NULL);
@@ -1164,7 +1159,7 @@ CopyRelTo(CopyToState cstate, Relation rel, Relation root_rel,
* A partition's rowtype might differ from the root table's. If we are
* exporting partition data here, we must convert it back to the root
* table's rowtype.
- */
+ */
if (root_rel != NULL)
{
root_slot = table_slot_create(root_rel, NULL);
@@ -1192,8 +1187,7 @@ CopyRelTo(CopyToState cstate, Relation rel, Relation root_rel,
CopyOneRowTo(cstate, copyslot);
/*
- * Increment the number of processed tuples, and report the
- * progress.
+ * Increment the number of processed tuples, and report the progress.
*/
pgstat_progress_update_param(PROGRESS_COPY_TUPLES_PROCESSED,
++(*processed));
On Oct 9, 2025, at 22:50, jian he <jian.universality@gmail.com> wrote:
3
```
+ if (RELKIND_HAS_PARTITIONS(relkind))
+ children = foreach_delete_current(children, childreloid);
+ }
```
Is there any special reason for using RELKIND_HAS_PARTITIONS() here? According to the function comment of find_all_inheritors(), it only returns OIDs of relations, while RELKIND_HAS_PARTITIONS() checks for both partitioned tables and partitioned indexes. Logically the macro works, but it may confuse readers of the code.
The find_all_inheritors comment says:
* Returns a list of relation OIDs including the given rel plus
* all relations that inherit from it, directly or indirectly.
CREATE TABLE pp (id int,val int) PARTITION BY RANGE (id);
CREATE TABLE pp_1 (val int, id int) PARTITION BY RANGE (id);
ALTER TABLE pp ATTACH PARTITION pp_1 FOR VALUES FROM (1) TO (5);
If we copy the data of partitioned table "pp" out, the partitioned table "pp_1"
has no storage of its own, so we have to skip it. Using RELKIND_HAS_PARTITIONS
to skip it should be fine.
My point is that RELKIND_HAS_PARTITIONS is defined as:
#define RELKIND_HAS_PARTITIONS(relkind) \
((relkind) == RELKIND_PARTITIONED_TABLE || \
(relkind) == RELKIND_PARTITIONED_INDEX)
It only checks whether relkind is a partitioned table or a partitioned index. The example in your explanation does not seem to address my concern: why do we need to check against partitioned indexes?
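For reference, a small self-contained sketch of the filtering step under discussion. The relkind letters and the macro are copied from catalog/pg_class.h; keep_leaf_partitions is a hypothetical helper that works on a plain array of relkind characters instead of an OID list, and returning -1 stands in for the ereport(ERROR) on a foreign table.

```c
#include <assert.h>
#include <stddef.h>

/* relkind letters as defined in PostgreSQL's catalog/pg_class.h */
#define RELKIND_RELATION			'r'
#define RELKIND_FOREIGN_TABLE		'f'
#define RELKIND_PARTITIONED_TABLE	'p'
#define RELKIND_PARTITIONED_INDEX	'I'

#define RELKIND_HAS_PARTITIONS(relkind) \
	((relkind) == RELKIND_PARTITIONED_TABLE || \
	 (relkind) == RELKIND_PARTITIONED_INDEX)

/*
 * Compact "relkinds" in place, keeping only entries that hold data.
 * Returns the new length, or -1 if a foreign table is found.
 */
static int
keep_leaf_partitions(char *relkinds, size_t n)
{
	size_t		kept = 0;

	assert(relkinds != NULL);

	for (size_t i = 0; i < n; i++)
	{
		if (relkinds[i] == RELKIND_FOREIGN_TABLE)
			return -1;
		if (RELKIND_HAS_PARTITIONS(relkinds[i]))
			continue;			/* partitioned tables have no storage */
		relkinds[kept++] = relkinds[i];
	}
	return (int) kept;
}
```

Since the inheritors of a partitioned table are tables, the RELKIND_PARTITIONED_INDEX arm of the macro is never taken here, so testing relkind == RELKIND_PARTITIONED_TABLE directly would give the same result.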
4
```
@@ -722,6 +754,7 @@ BeginCopyTo(ParseState *pstate,
DestReceiver *dest;
cstate->rel = NULL;
+ cstate->partitions = NIL;
```
Neither NULL assignment is needed, since cstate is allocated by palloc0().
I guess this is just a code convention. Such unnecessary assignments are
quite common within the codebase.
I don’t agree. cstate has many more pointer-type fields; why aren’t they set to NULL as well?
5
```
+static void
+CopyRelTo(CopyToState cstate, Relation rel, Relation root_rel,
+ uint64 *processed)
```
Instead of using a pointer to pass out the processed count, I think it’s better to return the processed count. I understand the current implementation allows the count to keep incrementing when the function is called in a loop. However, it’s a bit error-prone: the caller must make sure “processed” is properly initialized. With a uint64 return value, the caller’s code is still simple:
```
processed += CopyRelTo(cstate, …);
```
pgstat_progress_update_param is called inside CopyRelTo,
so we have to pass (uint64 *processed) to CopyRelTo.
Am I missing something?
Makes sense. I didn’t notice pgstat_progress_update_param. So “processed” is both an input and an output. In that case, I think the comment for the “processed” parameter should be enhanced, for example:
```
* processed: on entry, contains the current count of processed tuples;
* this function increments it by the number of rows copied
* from this relation and writes back the updated total.
```
Or a short version:
```
* processed: input/output; cumulative count of tuples processed, incremented here.
```
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
On Fri, Oct 10, 2025 at 9:02 AM Chao Li <li.evan.chao@gmail.com> wrote:
On Oct 9, 2025, at 22:50, jian he <jian.universality@gmail.com> wrote:
3
```
+ if (RELKIND_HAS_PARTITIONS(relkind))
+ children = foreach_delete_current(children, childreloid);
+ }
```
Is there any special reason for using RELKIND_HAS_PARTITIONS() here? According to the function comment of find_all_inheritors(), it only returns OIDs of relations, while RELKIND_HAS_PARTITIONS() checks for both partitioned tables and partitioned indexes. Logically the macro works, but it may confuse readers of the code.
The find_all_inheritors comment says:
* Returns a list of relation OIDs including the given rel plus
* all relations that inherit from it, directly or indirectly.
CREATE TABLE pp (id int,val int) PARTITION BY RANGE (id);
CREATE TABLE pp_1 (val int, id int) PARTITION BY RANGE (id);
ALTER TABLE pp ATTACH PARTITION pp_1 FOR VALUES FROM (1) TO (5);
If we copy the data of partitioned table "pp" out, the partitioned table "pp_1"
has no storage of its own, so we have to skip it. Using RELKIND_HAS_PARTITIONS
to skip it should be fine.
My point is that RELKIND_HAS_PARTITIONS is defined as:
#define RELKIND_HAS_PARTITIONS(relkind) \
((relkind) == RELKIND_PARTITIONED_TABLE || \
(relkind) == RELKIND_PARTITIONED_INDEX)
It only checks whether relkind is a partitioned table or a partitioned index. The example in your explanation does not seem to address my concern: why do we need to check against partitioned indexes?
I think the macro name RELKIND_HAS_PARTITIONS improves readability.
Also, we don't need to worry about partitioned indexes here, because
we are inside the
``
else if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
{
}
``
branch.
Sure, we can change it to ``if (relkind == RELKIND_PARTITIONED_TABLE)``.
5
```
+static void
+CopyRelTo(CopyToState cstate, Relation rel, Relation root_rel,
+ uint64 *processed)
```
Instead of using a pointer to pass out the processed count, I think it’s better to return the processed count. I understand the current implementation allows the count to keep incrementing when the function is called in a loop. However, it’s a bit error-prone: the caller must make sure “processed” is properly initialized. With a uint64 return value, the caller’s code is still simple:
```
processed += CopyRelTo(cstate, …);
```
pgstat_progress_update_param is called inside CopyRelTo,
so we have to pass (uint64 *processed) to CopyRelTo.
Am I missing something?
Makes sense. I didn’t notice pgstat_progress_update_param. So “processed” is both an input and an output. In that case, I think the comment for the “processed” parameter should be enhanced, for example:
If your function is:
static uint64 CopyRelationTo(CopyToState cstate, Relation rel,
Relation root_rel, uint64 *processed);
where the function's return value is also passed as a function argument,
I think it will lead to more confusion.
On Oct 10, 2025, at 10:54, jian he <jian.universality@gmail.com> wrote:
5
```
+static void
+CopyRelTo(CopyToState cstate, Relation rel, Relation root_rel,
+ uint64 *processed)
```
Instead of using a pointer to pass out the processed count, I think it’s better to return the processed count. I understand the current implementation allows the count to keep incrementing when the function is called in a loop. However, it’s a bit error-prone: the caller must make sure “processed” is properly initialized. With a uint64 return value, the caller’s code is still simple:
```
processed += CopyRelTo(cstate, …);
```
pgstat_progress_update_param is called inside CopyRelTo,
so we have to pass (uint64 *processed) to CopyRelTo.
Am I missing something?
Makes sense. I didn’t notice pgstat_progress_update_param. So “processed” is both an input and an output. In that case, I think the comment for the “processed” parameter should be enhanced, for example:
If your function is:
static uint64 CopyRelationTo(CopyToState cstate, Relation rel,
Relation root_rel, uint64 *processed);
where the function's return value is also passed as a function argument,
I think it will lead to more confusion.
I am not suggesting adding a return value to the function. My comment was only about improving the parameter comment for “processed”.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
On Fri, Oct 10, 2025 at 6:03 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
---
+ relation_name = get_rel_name(childreloid);
+ ereport(ERROR,
+         errcode(ERRCODE_WRONG_OBJECT_TYPE),
+         errmsg("cannot copy from foreign table \"%s\"", relation_name),
+         errdetail("Partition \"%s\" is a foreign table in the partitioned table \"%s\"",
+                   relation_name, RelationGetRelationName(rel)),
+         errhint("Try the COPY (SELECT ...) TO variant."));

I think we don't need "the" in the error message.
It's conventional to put all err*() macros in parentheses (i.e.,
"(errcode(), ...)"), it's technically omittable though.

https://www.postgresql.org/docs/current/error-message-reporting.html
QUOTE:
<<<<>>>>>
The extra parentheses were required before PostgreSQL version 12, but
are now optional.
Here is a more complex example:
.....
<<<<>>>>>

related commit:
https://git.postgresql.org/cgit/postgresql.git/commit/?id=e3a87b4991cc2d00b7a3082abb54c5f12baedfd1
Less parenthesis is generally more readable, I think.

Yes, but I think it's more consistent given that we use the
parentheses in all other places in copyto.c.
If you look at tablecmds.c, like ATExecSetNotNull, there are
parentheses and no parentheses cases.
Overall, I think less parentheses improves readability and makes the
code more future-proof.
How about doing "slot = execute_attr_map_slot(map, slot, root_slot);"
instead? (i.e., no need to have 'copyslot')

I tried but it seems not possible.
The table_scan_getnextslot function requires a certain type of "slot". If we do
"slot = execute_attr_map_slot(map, slot, root_slot);"
then the pointer "slot" becomes a virtual slot, and
the second call to table_scan_getnextslot will fail.

Right. Let's keep as it is.
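The exchange above turns on two points: columns are matched between partition and root by name, and the converted row is written to a separate buffer so that the buffer the scan keeps refilling is left alone. A minimal standalone sketch of that idea follows. It uses plain arrays rather than PostgreSQL slots, and build_map_by_name and convert_row are hypothetical names chosen for illustration, not the server's functions.

```c
#include <assert.h>
#include <string.h>

#define NCOLS 2

/*
 * For each root column, find the position of the same-named column in the
 * child. map[r] is the child index that feeds root column r.
 */
static void
build_map_by_name(const char *const root_cols[NCOLS],
                  const char *const child_cols[NCOLS],
                  int map[NCOLS])
{
    for (int r = 0; r < NCOLS; r++)
    {
        map[r] = -1;
        for (int c = 0; c < NCOLS; c++)
        {
            if (strcmp(root_cols[r], child_cols[c]) == 0)
                map[r] = c;
        }
        assert(map[r] >= 0);    /* every root column must exist in the child */
    }
}

/*
 * Convert one child row into root column order. The result goes into
 * root_row; child_row (the "scan buffer") is only read, never replaced.
 */
static void
convert_row(const int map[NCOLS], const int child_row[NCOLS],
            int root_row[NCOLS])
{
    for (int r = 0; r < NCOLS; r++)
        root_row[r] = child_row[map[r]];
}
```

With a root declared as (id, val) and a child declared as (val, id), the map comes out as {1, 0}, and a child row {11, 1} is emitted as {1, 11} while the child buffer stays as it was.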
I've attached a patch for cosmetic changes including comment updates,
indent fixes by pgindent, and renaming variable names. Some fixes are
just my taste, so please check the changes.
thanks!
I have applied most of it, except the points I mentioned in this email.
Also I have a few comments on new tests:
+-- Tests for COPY TO with partitioned tables.
+CREATE TABLE pp (id int,val int) PARTITION BY RANGE (id);
+CREATE TABLE pp_1 (val int, id int) PARTITION BY RANGE (id);
+CREATE TABLE pp_2 (val int, id int) PARTITION BY RANGE (id);
+ALTER TABLE pp ATTACH PARTITION pp_1 FOR VALUES FROM (1) TO (5);
+ALTER TABLE pp ATTACH PARTITION pp_2 FOR VALUES FROM (5) TO (10);
+
+CREATE TABLE pp_15 PARTITION OF pp_1 FOR VALUES FROM (1) TO (5);
+CREATE TABLE pp_510 PARTITION OF pp_2 FOR VALUES FROM (5) TO (10);
+
+INSERT INTO pp SELECT g, 10 + g FROM generate_series(1,6) g;
+

I think it's better to have both cases: partitions' rowtype match the
root's rowtype and partition's rowtype doesn't match the root's
rowtype.
sure.
---
+-- Test COPY TO with a foreign table or when the foreign table is a partition
+COPY async_p3 TO stdout; --error
+ERROR: cannot copy from foreign table "async_p3"
+HINT: Try the COPY (SELECT ...) TO variant.

async_p3 is a foreign table so it seems not related to this patch.
I replied in
/messages/by-id/CACJufxGkkMtRUJEbLczRnWp7x2YWqu4r1gEJEv9Po1UPxS6kGQ@mail.gmail.com
I kind of doubt anyone would submit a patch just to rewrite a coverage test for
the sake of coverage itself. While we're here, adding nearby coverage tests
should be fine?
i just found out I ignored the case when partitioned tables have RLS.
when exporting a partitioned table,
find_all_inheritors will sort the returned partition by oid.
in DoCopy, we can do the same:
make a SortBy node for SelectStmt->sortClause also mark the
RangeVar->inh as true.
OR
ereport(ERRCODE_FEATURE_NOT_SUPPORTED...) for partitioned tables with RLS.
please see the change I made in DoCopy.
Attachments:
v17-0001-support-COPY-partitioned_table-TO.patch (text/x-patch; charset=UTF-8)
From 63cd35bf535cbc06b09d07171cecf57ed21e89cc Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Fri, 10 Oct 2025 14:36:15 +0800
Subject: [PATCH v17 1/1] support COPY partitioned_table TO
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
This is for implementation of ``COPY (partitioned_table) TO``. It will be
faster than ``COPY (select * from partitioned_table) TO``.
If the source table is a partitioned table, COPY table TO copies the same
rows as SELECT * FROM table.
reviewed by: vignesh C <vignesh21@gmail.com>
reviewed by: David Rowley <dgrowleyml@gmail.com>
reviewed by: Melih Mutlu <m.melihmutlu@gmail.com>
reviewed by: Kirill Reshke <reshkekirill@gmail.com>
reviewed by: Atsushi Torikoshi <torikoshia@oss.nttdata.com>
reviewed by: Álvaro Herrera <alvherre@kurilemu.de>
reviewed by: Masahiko Sawada <sawada.mshk@gmail.com>
reviewed by: Chao Li <li.evan.chao@gmail.com>
discussion: https://postgr.es/m/CACJufxEZt+G19Ors3bQUq-42-61__C=y5k2wk=sHEFRusu7=iQ@mail.gmail.com
commitfest entry: https://commitfest.postgresql.org/patch/5467
---
.../postgres_fdw/expected/postgres_fdw.out | 8 +
contrib/postgres_fdw/sql/postgres_fdw.sql | 4 +
doc/src/sgml/ref/copy.sgml | 9 +-
src/backend/commands/copy.c | 32 ++++
src/backend/commands/copyto.c | 153 ++++++++++++++----
src/test/regress/expected/copy.out | 18 +++
src/test/regress/expected/rowsecurity.out | 54 +++++++
src/test/regress/sql/copy.sql | 15 ++
src/test/regress/sql/rowsecurity.sql | 6 +
9 files changed, 266 insertions(+), 33 deletions(-)
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 91bbd0d8c73..aa1329eee37 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -11599,6 +11599,14 @@ SELECT * FROM result_tbl ORDER BY a;
(3 rows)
DELETE FROM result_tbl;
+-- Test COPY TO with a foreign table or when the foreign table is a partition
+COPY async_p3 TO stdout; --error
+ERROR: cannot copy from foreign table "async_p3"
+HINT: Try the COPY (SELECT ...) TO variant.
+COPY async_pt TO stdout; --error
+ERROR: cannot copy from foreign table "async_p1"
+DETAIL: Partition "async_p1" is a foreign table in partitioned table "async_pt"
+HINT: Try the COPY (SELECT ...) TO variant.
DROP FOREIGN TABLE async_p3;
DROP TABLE base_tbl3;
-- Check case where the partitioned table has local/remote partitions
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 3b7da128519..8a672f05039 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -3941,6 +3941,10 @@ INSERT INTO result_tbl SELECT * FROM async_pt WHERE b === 505;
SELECT * FROM result_tbl ORDER BY a;
DELETE FROM result_tbl;
+-- Test COPY TO with a foreign table or when the foreign table is a partition
+COPY async_p3 TO stdout; --error
+COPY async_pt TO stdout; --error
+
DROP FOREIGN TABLE async_p3;
DROP TABLE base_tbl3;
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index c2d1fbc1fbe..ecd300097fc 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -539,13 +539,16 @@ COPY <replaceable class="parameter">count</replaceable>
<para>
<command>COPY TO</command> can be used with plain
- tables and populated materialized views.
- For example,
+ tables, populated materialized views and partitioned tables.
+ For example, if <replaceable class="parameter">table</replaceable> is a plain table,
<literal>COPY <replaceable class="parameter">table</replaceable>
TO</literal> copies the same rows as
<literal>SELECT * FROM ONLY <replaceable class="parameter">table</replaceable></literal>.
+ If <replaceable class="parameter">table</replaceable> is a partitioned table or a materialized view
+ <literal>COPY <replaceable class="parameter">table</replaceable> TO</literal>
+ copies the same rows as <literal>SELECT * FROM <replaceable class="parameter">table</replaceable></literal>.
However it doesn't directly support other relation types,
- such as partitioned tables, inheritance child tables, or views.
+ such as inheritance child tables, or views.
To copy all rows from such relations, use <literal>COPY (SELECT * FROM
<replaceable class="parameter">table</replaceable>) TO</literal>.
</para>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index fae9c41db65..d09b54308bd 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -186,6 +186,7 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
ResTarget *target;
RangeVar *from;
List *targetList = NIL;
+ bool rel_is_partitioned;
if (is_from)
ereport(ERROR,
@@ -193,6 +194,8 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
errmsg("COPY FROM not supported with row-level security"),
errhint("Use INSERT statements instead.")));
+ rel_is_partitioned = (get_rel_relkind(relid) == RELKIND_PARTITIONED_TABLE);
+
/*
* Build target list
*
@@ -251,17 +254,46 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
* relation which we have opened and locked. Use "ONLY" so that
* COPY retrieves rows from only the target table not any
* inheritance children, the same as when RLS doesn't apply.
+ *
+ * However, when COPY data from a partitioned table, we should not
+ * use "ONLY", since we also need to retrieve rows from its child
+ * partitions too.
*/
from = makeRangeVar(get_namespace_name(RelationGetNamespace(rel)),
pstrdup(RelationGetRelationName(rel)),
-1);
from->inh = false; /* apply ONLY */
+ if (rel_is_partitioned)
+ from->inh = true;
/* Build query */
select = makeNode(SelectStmt);
select->targetList = targetList;
select->fromClause = list_make1(from);
+ /*
+ * To COPY data from multiple partitions, we rely on the order of
+ * the partitions' tableoids, which matches the order produced by
+ * find_all_inheritors.
+ */
+ if (rel_is_partitioned)
+ {
+ SortBy *sortby;
+ ColumnRef *colref;
+ List *orderlist = NIL;
+
+ colref = makeNode(ColumnRef);
+ colref->fields = list_make1(makeString("tableoid"));
+ colref->location = -1;
+
+ sortby = makeNode(SortBy);
+ sortby->node = (Node *) colref;
+ sortby->location = -1;
+
+ orderlist = lappend(orderlist, sortby);
+ select->sortClause = orderlist;
+ }
+
query = makeNode(RawStmt);
query->stmt = (Node *) select;
query->stmt_location = stmt_location;
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index e5781155cdf..74497240105 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -18,7 +18,9 @@
#include <unistd.h>
#include <sys/stat.h>
+#include "access/table.h"
#include "access/tableam.h"
+#include "catalog/pg_inherits.h"
#include "commands/copyapi.h"
#include "commands/progress.h"
#include "executor/execdesc.h"
@@ -82,6 +84,7 @@ typedef struct CopyToStateData
List *attnumlist; /* integer list of attnums to copy */
char *filename; /* filename, or NULL for STDOUT */
bool is_program; /* is 'filename' a program to popen? */
+ List *partitions; /* OID list of partitions to copy data from */
copy_data_dest_cb data_dest_cb; /* function for writing data */
CopyFormatOptions opts;
@@ -116,6 +119,8 @@ static void CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot);
static void CopyAttributeOutText(CopyToState cstate, const char *string);
static void CopyAttributeOutCSV(CopyToState cstate, const char *string,
bool use_quote);
+static void CopyRelationTo(CopyToState cstate, Relation rel, Relation root_rel,
+ uint64 *processed);
/* built-in format-specific routines */
static void CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc);
@@ -602,6 +607,10 @@ EndCopy(CopyToState cstate)
pgstat_progress_end_command();
MemoryContextDelete(cstate->copycontext);
+
+ if (cstate->partitions)
+ list_free(cstate->partitions);
+
pfree(cstate);
}
@@ -643,6 +652,7 @@ BeginCopyTo(ParseState *pstate,
PROGRESS_COPY_COMMAND_TO,
0
};
+ List *children = NIL;
if (rel != NULL && rel->rd_rel->relkind != RELKIND_RELATION)
{
@@ -673,11 +683,34 @@ BeginCopyTo(ParseState *pstate,
errmsg("cannot copy from sequence \"%s\"",
RelationGetRelationName(rel))));
else if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
- ereport(ERROR,
- (errcode(ERRCODE_WRONG_OBJECT_TYPE),
- errmsg("cannot copy from partitioned table \"%s\"",
- RelationGetRelationName(rel)),
- errhint("Try the COPY (SELECT ...) TO variant.")));
+ {
+ /*
+ * Collect OIDs of relation containing data, so that later
+ * DoCopyTo can copy the data from them.
+ */
+ children = find_all_inheritors(RelationGetRelid(rel), AccessShareLock, NULL);
+
+ foreach_oid(child, children)
+ {
+ char relkind = get_rel_relkind(child);
+
+ if (relkind == RELKIND_FOREIGN_TABLE)
+ {
+ char *relation_name = get_rel_name(child);
+
+ ereport(ERROR,
+ errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("cannot copy from foreign table \"%s\"", relation_name),
+ errdetail("Partition \"%s\" is a foreign table in partitioned table \"%s\"",
+ relation_name, RelationGetRelationName(rel)),
+ errhint("Try the COPY (SELECT ...) TO variant."));
+ }
+
+ /* Exclude tables with no data */
+ if (RELKIND_HAS_PARTITIONS(relkind))
+ children = foreach_delete_current(children, child);
+ }
+ }
else
ereport(ERROR,
(errcode(ERRCODE_WRONG_OBJECT_TYPE),
@@ -713,6 +746,7 @@ BeginCopyTo(ParseState *pstate,
cstate->rel = rel;
tupDesc = RelationGetDescr(cstate->rel);
+ cstate->partitions = children;
}
else
{
@@ -722,6 +756,7 @@ BeginCopyTo(ParseState *pstate,
DestReceiver *dest;
cstate->rel = NULL;
+ cstate->partitions = NIL;
/*
* Run parse analysis and rewrite. Note this also acquires sufficient
@@ -1030,7 +1065,7 @@ DoCopyTo(CopyToState cstate)
TupleDesc tupDesc;
int num_phys_attrs;
ListCell *cur;
- uint64 processed;
+ uint64 processed = 0;
if (fe_copy)
SendCopyBegin(cstate);
@@ -1070,33 +1105,24 @@ DoCopyTo(CopyToState cstate)
if (cstate->rel)
{
- TupleTableSlot *slot;
- TableScanDesc scandesc;
-
- scandesc = table_beginscan(cstate->rel, GetActiveSnapshot(), 0, NULL);
- slot = table_slot_create(cstate->rel, NULL);
-
- processed = 0;
- while (table_scan_getnextslot(scandesc, ForwardScanDirection, slot))
+ /*
+ * If COPY TO source table is a partitioned table, then open each
+ * partition and process each individual partition.
+ */
+ if (cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
{
- CHECK_FOR_INTERRUPTS();
+ foreach_oid(child, cstate->partitions)
+ {
+ Relation scan_rel;
- /* Deconstruct the tuple ... */
- slot_getallattrs(slot);
-
- /* Format and send the data */
- CopyOneRowTo(cstate, slot);
-
- /*
- * Increment the number of processed tuples, and report the
- * progress.
- */
- pgstat_progress_update_param(PROGRESS_COPY_TUPLES_PROCESSED,
- ++processed);
+ /* We already got the lock in BeginCopyTo */
+ scan_rel = table_open(child, NoLock);
+ CopyRelationTo(cstate, scan_rel, cstate->rel, &processed);
+ table_close(scan_rel, NoLock);
+ }
}
-
- ExecDropSingleTupleTableSlot(slot);
- table_endscan(scandesc);
+ else
+ CopyRelationTo(cstate, cstate->rel, NULL, &processed);
}
else
{
@@ -1115,6 +1141,73 @@ DoCopyTo(CopyToState cstate)
return processed;
}
+/*
+ * Scans a single table and exports its rows to the COPY destination.
+ *
+ * root_rel can be set to the root table of rel if rel is a partition
+ * table so that we can send tuples in root_rel's rowtype, which might
+ * differ from individual partitions.
+*/
+static void
+CopyRelationTo(CopyToState cstate, Relation rel, Relation root_rel, uint64 *processed)
+{
+ TupleTableSlot *slot;
+ TableScanDesc scandesc;
+ AttrMap *map = NULL;
+ TupleTableSlot *root_slot = NULL;
+
+ scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+ slot = table_slot_create(rel, NULL);
+
+ /*
+ * A partition's rowtype might differ from the root table's. If we are
+ * exporting partition data here, we must convert it back to the root
+ * table's rowtype.
+ */
+ if (root_rel != NULL)
+ {
+ root_slot = table_slot_create(root_rel, NULL);
+ map = build_attrmap_by_name_if_req(RelationGetDescr(root_rel),
+ RelationGetDescr(rel),
+ false);
+ }
+
+ while (table_scan_getnextslot(scandesc, ForwardScanDirection, slot))
+ {
+ TupleTableSlot *copyslot;
+
+ CHECK_FOR_INTERRUPTS();
+
+ if (map != NULL)
+ copyslot = execute_attr_map_slot(map, slot, root_slot);
+ else
+ {
+ /* Deconstruct the tuple ... */
+ slot_getallattrs(slot);
+ copyslot = slot;
+ }
+
+ /* Format and send the data */
+ CopyOneRowTo(cstate, copyslot);
+
+ /*
+ * Increment the number of processed tuples, and report the progress.
+ */
+ pgstat_progress_update_param(PROGRESS_COPY_TUPLES_PROCESSED,
+ ++(*processed));
+ }
+
+ ExecDropSingleTupleTableSlot(slot);
+
+ if (root_slot != NULL)
+ ExecDropSingleTupleTableSlot(root_slot);
+
+ if (map != NULL)
+ free_attrmap(map);
+
+ table_endscan(scandesc);
+}
+
/*
* Emit one row during DoCopyTo().
*/
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index ac66eb55aee..af01e84cea1 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -373,3 +373,21 @@ COPY copytest_mv(id) TO stdout WITH (header);
id
1
DROP MATERIALIZED VIEW copytest_mv;
+-- Tests for COPY TO with partitioned tables.
+CREATE TABLE pp (id int,val int) PARTITION BY RANGE (id);
+CREATE TABLE pp_1 (val int, id int) PARTITION BY RANGE (id);
+CREATE TABLE pp_2 (id int, val int) PARTITION BY RANGE (id);
+ALTER TABLE pp ATTACH PARTITION pp_1 FOR VALUES FROM (1) TO (5);
+ALTER TABLE pp ATTACH PARTITION pp_2 FOR VALUES FROM (5) TO (10);
+CREATE TABLE pp_15 PARTITION OF pp_1 FOR VALUES FROM (1) TO (5);
+CREATE TABLE pp_510 PARTITION OF pp_2 FOR VALUES FROM (5) TO (10);
+INSERT INTO pp SELECT g, 10 + g FROM generate_series(1,6) g;
+COPY pp TO stdout(header);
+id val
+1 11
+2 12
+3 13
+4 14
+5 15
+6 16
+DROP TABLE PP;
diff --git a/src/test/regress/expected/rowsecurity.out b/src/test/regress/expected/rowsecurity.out
index 5a172c5d91c..0b88b1eed44 100644
--- a/src/test/regress/expected/rowsecurity.out
+++ b/src/test/regress/expected/rowsecurity.out
@@ -986,6 +986,20 @@ NOTICE: f_leak => my first satire
9 | 11 | 1 | regress_rls_dave | awesome science fiction
(4 rows)
+SELECT * FROM part_document ORDER BY tableoid;
+ did | cid | dlevel | dauthor | dtitle
+-----+-----+--------+-------------------+-------------------------
+ 1 | 11 | 1 | regress_rls_bob | my first novel
+ 6 | 11 | 1 | regress_rls_carol | great science fiction
+ 9 | 11 | 1 | regress_rls_dave | awesome science fiction
+ 4 | 55 | 1 | regress_rls_bob | my first satire
+(4 rows)
+
+COPY part_document TO stdout WITH (DELIMITER ',');
+1,11,1,regress_rls_bob,my first novel
+6,11,1,regress_rls_carol,great science fiction
+9,11,1,regress_rls_dave,awesome science fiction
+4,55,1,regress_rls_bob,my first satire
EXPLAIN (COSTS OFF) SELECT * FROM part_document WHERE f_leak(dtitle);
QUERY PLAN
-------------------------------------------------------------------------
@@ -1028,6 +1042,32 @@ NOTICE: f_leak => awesome technology book
10 | 99 | 2 | regress_rls_dave | awesome technology book
(10 rows)
+SELECT * FROM part_document ORDER BY tableoid;
+ did | cid | dlevel | dauthor | dtitle
+-----+-----+--------+-------------------+-------------------------
+ 1 | 11 | 1 | regress_rls_bob | my first novel
+ 2 | 11 | 2 | regress_rls_bob | my second novel
+ 6 | 11 | 1 | regress_rls_carol | great science fiction
+ 9 | 11 | 1 | regress_rls_dave | awesome science fiction
+ 4 | 55 | 1 | regress_rls_bob | my first satire
+ 8 | 55 | 2 | regress_rls_carol | great satire
+ 3 | 99 | 2 | regress_rls_bob | my science textbook
+ 5 | 99 | 2 | regress_rls_bob | my history book
+ 7 | 99 | 2 | regress_rls_carol | great technology book
+ 10 | 99 | 2 | regress_rls_dave | awesome technology book
+(10 rows)
+
+COPY part_document TO stdout WITH (DELIMITER ',');
+1,11,1,regress_rls_bob,my first novel
+2,11,2,regress_rls_bob,my second novel
+6,11,1,regress_rls_carol,great science fiction
+9,11,1,regress_rls_dave,awesome science fiction
+4,55,1,regress_rls_bob,my first satire
+8,55,2,regress_rls_carol,great satire
+3,99,2,regress_rls_bob,my science textbook
+5,99,2,regress_rls_bob,my history book
+7,99,2,regress_rls_carol,great technology book
+10,99,2,regress_rls_dave,awesome technology book
EXPLAIN (COSTS OFF) SELECT * FROM part_document WHERE f_leak(dtitle);
QUERY PLAN
-------------------------------------------------------------------------
@@ -1058,6 +1098,20 @@ NOTICE: f_leak => awesome science fiction
9 | 11 | 1 | regress_rls_dave | awesome science fiction
(4 rows)
+SELECT * FROM part_document ORDER BY tableoid;
+ did | cid | dlevel | dauthor | dtitle
+-----+-----+--------+-------------------+-------------------------
+ 1 | 11 | 1 | regress_rls_bob | my first novel
+ 2 | 11 | 2 | regress_rls_bob | my second novel
+ 6 | 11 | 1 | regress_rls_carol | great science fiction
+ 9 | 11 | 1 | regress_rls_dave | awesome science fiction
+(4 rows)
+
+COPY part_document TO stdout WITH (DELIMITER ',');
+1,11,1,regress_rls_bob,my first novel
+2,11,2,regress_rls_bob,my second novel
+6,11,1,regress_rls_carol,great science fiction
+9,11,1,regress_rls_dave,awesome science fiction
EXPLAIN (COSTS OFF) SELECT * FROM part_document WHERE f_leak(dtitle);
QUERY PLAN
----------------------------------------------------------------------------------
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index a1316c73bac..56d506ad4c6 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -405,3 +405,18 @@ COPY copytest_mv(id) TO stdout WITH (header);
REFRESH MATERIALIZED VIEW copytest_mv;
COPY copytest_mv(id) TO stdout WITH (header);
DROP MATERIALIZED VIEW copytest_mv;
+
+-- Tests for COPY TO with partitioned tables.
+CREATE TABLE pp (id int,val int) PARTITION BY RANGE (id);
+CREATE TABLE pp_1 (val int, id int) PARTITION BY RANGE (id);
+CREATE TABLE pp_2 (id int, val int) PARTITION BY RANGE (id);
+ALTER TABLE pp ATTACH PARTITION pp_1 FOR VALUES FROM (1) TO (5);
+ALTER TABLE pp ATTACH PARTITION pp_2 FOR VALUES FROM (5) TO (10);
+
+CREATE TABLE pp_15 PARTITION OF pp_1 FOR VALUES FROM (1) TO (5);
+CREATE TABLE pp_510 PARTITION OF pp_2 FOR VALUES FROM (5) TO (10);
+
+INSERT INTO pp SELECT g, 10 + g FROM generate_series(1,6) g;
+
+COPY pp TO stdout(header);
+DROP TABLE PP;
diff --git a/src/test/regress/sql/rowsecurity.sql b/src/test/regress/sql/rowsecurity.sql
index 21ac0ca51ee..d1306071070 100644
--- a/src/test/regress/sql/rowsecurity.sql
+++ b/src/test/regress/sql/rowsecurity.sql
@@ -362,16 +362,22 @@ SELECT * FROM pg_policies WHERE schemaname = 'regress_rls_schema' AND tablename
SET SESSION AUTHORIZATION regress_rls_bob;
SET row_security TO ON;
SELECT * FROM part_document WHERE f_leak(dtitle) ORDER BY did;
+SELECT * FROM part_document ORDER BY tableoid;
+COPY part_document TO stdout WITH (DELIMITER ',');
EXPLAIN (COSTS OFF) SELECT * FROM part_document WHERE f_leak(dtitle);
-- viewpoint from regress_rls_carol
SET SESSION AUTHORIZATION regress_rls_carol;
SELECT * FROM part_document WHERE f_leak(dtitle) ORDER BY did;
+SELECT * FROM part_document ORDER BY tableoid;
+COPY part_document TO stdout WITH (DELIMITER ',');
EXPLAIN (COSTS OFF) SELECT * FROM part_document WHERE f_leak(dtitle);
-- viewpoint from regress_rls_dave
SET SESSION AUTHORIZATION regress_rls_dave;
SELECT * FROM part_document WHERE f_leak(dtitle) ORDER BY did;
+SELECT * FROM part_document ORDER BY tableoid;
+COPY part_document TO stdout WITH (DELIMITER ',');
EXPLAIN (COSTS OFF) SELECT * FROM part_document WHERE f_leak(dtitle);
-- pp1 ERROR
--
2.34.1
On 2025-Oct-10, jian he wrote:
On Fri, Oct 10, 2025 at 6:03 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
Yes, but I think it's more consistent given that we use the
parentheses in all other places in copyto.c.

If you look at tablecmds.c, like ATExecSetNotNull, there are
parentheses and no parentheses cases.
Overall, I think less parentheses improves readability and makes the
code more future-proof.
I strive to remove those extra parentheses when I edit some part of the
code (though I may forget at times), but leave them alone from other
places that I'm not editing. I don't add them in new code. Most
likely, this is why ATExecSetNotNull has some cases with them and other
cases without.
Given sufficient time, the Postgres of Theseus would eventually have
zero of those extra parens.
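As the quoted documentation says, the extra parentheses around the err*() calls have been optional since PostgreSQL 12. A minimal sketch of why both spellings can compile, assuming the reporting macro is variadic and joins its auxiliary calls into a single comma expression. REPORT, set_code, set_msg and finish are hypothetical stand-ins for illustration, not the real ereport machinery.

```c
#include <assert.h>

static int aux_calls = 0;

/* Hypothetical auxiliary calls, playing the role of errcode()/errmsg(). */
static int
set_code(int code)
{
    aux_calls++;
    return code;
}

static int
set_msg(const char *msg)
{
    assert(msg != 0);
    aux_calls++;
    return 1;
}

/* Hypothetical final step, playing the role of finishing the report. */
static int
finish(void)
{
    return aux_calls;
}

/*
 * The auxiliary calls are pasted in front of finish() as one comma
 * expression. If the caller wraps them in an extra pair of parentheses,
 * they form a parenthesized comma expression that slots in the same way.
 */
#define REPORT(...) ((void) (__VA_ARGS__, finish()))

static int
report_both_styles(void)
{
    REPORT(set_code(1), set_msg("no extra parentheses"));
    REPORT((set_code(2), set_msg("extra parentheses")));
    return aux_calls;
}
```

Both calls evaluate the same two auxiliary functions, so the choice between the two spellings is purely one of style, which is what the thread is debating.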
--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"Niemand ist mehr Sklave, als der sich für frei hält, ohne es zu sein."
Nadie está tan esclavizado como el que se cree libre no siéndolo
(Johann Wolfgang von Goethe)
On Fri, Oct 10, 2025 at 3:04 AM Álvaro Herrera <alvherre@kurilemu.de> wrote:
On 2025-Oct-10, jian he wrote:
On Fri, Oct 10, 2025 at 6:03 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
Yes, but I think it's more consistent given that we use the
parentheses in all other places in copyto.c.

If you look at tablecmds.c, like ATExecSetNotNull, there are
parentheses and no parentheses cases.
Overall, I think less parentheses improves readability and makes the
code more future-proof.

I strive to remove those extra parentheses when I edit some part of the
code (though I may forget at times), but leave them alone from other
places that I'm not editing. I don't add them in new code. Most
likely, this is why ATExecSetNotNull has some cases with them and other
cases without.

Given sufficient time, the Postgres of Theseus would eventually have
zero of those extra parens.
Thank you for the input. I didn't know some files or functions already
have mixed style. I'll use this style for future changes.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
On Fri, Oct 10, 2025 at 12:10 AM jian he <jian.universality@gmail.com> wrote:
On Fri, Oct 10, 2025 at 6:03 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
---
+ relation_name = get_rel_name(childreloid);
+ ereport(ERROR,
+         errcode(ERRCODE_WRONG_OBJECT_TYPE),
+         errmsg("cannot copy from foreign table \"%s\"", relation_name),
+         errdetail("Partition \"%s\" is a foreign table in the partitioned table \"%s\"",
+                   relation_name, RelationGetRelationName(rel)),
+         errhint("Try the COPY (SELECT ...) TO variant."));

I think we don't need "the" in the error message.
It's conventional to put all err*() macros in parentheses (i.e.,
"(errcode(), ...)"), it's technically omittable though.

https://www.postgresql.org/docs/current/error-message-reporting.html
QUOTE:
<<<<>>>>>
The extra parentheses were required before PostgreSQL version 12, but
are now optional.
Here is a more complex example:
.....
<<<<>>>>>

related commit:
https://git.postgresql.org/cgit/postgresql.git/commit/?id=e3a87b4991cc2d00b7a3082abb54c5f12baedfd1
Less parenthesis is generally more readable, I think.

Yes, but I think it's more consistent given that we use the
parentheses in all other places in copyto.c.

If you look at tablecmds.c, like ATExecSetNotNull, there are
parentheses and no parentheses cases.
Overall, I think less parentheses improves readability and makes the
code more future-proof.
Understood.
---
+-- Test COPY TO with a foreign table or when the foreign table is a partition
+COPY async_p3 TO stdout; --error
+ERROR: cannot copy from foreign table "async_p3"
+HINT: Try the COPY (SELECT ...) TO variant.

async_p3 is a foreign table so it seems not related to this patch.
I replied in
/messages/by-id/CACJufxGkkMtRUJEbLczRnWp7x2YWqu4r1gEJEv9Po1UPxS6kGQ@mail.gmail.com
I kind of doubt anyone would submit a patch just to rewrite a coverage test for
the sake of coverage itself. While we're here, adding nearby coverage tests
should be fine?
For me, it's perfectly fine to have patches just for improving the
test coverage and I think we have had such patches before. Given this
patch expands the supported relation kind, I guess it makes sense to
cover other cases as well in this patch (i.e., foreign tables and
sequences) or to have a separate patch to increase the overall test
coverage of copyto.c.
i just found out I ignored the case when partitioned tables have RLS.
when exporting a partitioned table,
find_all_inheritors will sort the returned partition by oid.
in DoCopy, we can do the same:
make a SortBy node for SelectStmt->sortClause also mark the
RangeVar->inh as true.
OR
ereport(ERRCODE_FEATURE_NOT_SUPPORTED...) for partitioned tables with RLS.
please see the change I made in DoCopy.
Good catch. However, I guess adding a SortBy node with "tableoid"
doesn't necessarily work in the same way as the 'COPY TO' using
find_all_inheritors():
+ /*
+ * To COPY data from multiple partitions, we rely on the order of
+ * the partitions' tableoids, which matches the order produced by
+ * find_all_inheritors.
+ */
The table list returned by find_all_inheritors() is deterministic, but
it doesn't sort the whole list by their OIDs. If I understand
correctly, it retrieves all descendant tables in a BFS order. For
example, if I create the tables in the following sequence:
create table p (i int) partition by list (i);
create table c12 partition of p for values in (1, 2) partition by list (i);
create table c12_1 partition of c12 for values in (1);
create table c12_2 partition of c12 for values in (2);
create table c3 partition of p for values in (3);
insert into p values (1), (2), (3);
alter table p enable row level security;
create policy policy_p on p using (i > 0);
create user test_user;
grant select on table p to test_user;
I got the result without RLS:
copy p to stdout;
3
1
2
whereas I got the results with RLS:
copy p to stdout;
1
2
3
I think that adding SortBy doesn't help more than making the results
deterministic. Or we can re-sort the OID list returned by
find_all_inheritors() to match it. However, I'm not sure that we need
to make COPY TO results deterministic in the first place. It's not
guaranteed that the order of tuples returned from 'COPY TO rel' where
rel is not a partitioned table is sorted nor even deterministic (e.g.,
due to sync scans). If 'rel' is a partitioned table without RLS, the
order of tables to scan is deterministic but the order of tuples returned
within a single partition is not. Given that sorting the whole result is
not free, I'm not sure that guaranteeing this ordering also for
partitioned tables with RLS would be useful for users.
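The difference between the two orderings in the example above can be reproduced with a small standalone model. This is a sketch under assumptions: the OIDs 100 to 104 are made up and merely follow creation order (p, c12, c12_1, c12_2, c3), and bfs_leaves and sorted_leaves are hypothetical helpers, not find_all_inheritors itself. It assumes, as described above, a breadth-first walk that visits the children of each parent in OID order.

```c
#include <assert.h>
#include <stdlib.h>

#define NREL 5

/* Hypothetical catalog row; OIDs are assigned in creation order. */
typedef struct Rel
{
    unsigned    oid;
    unsigned    parent;          /* 0 for the root */
    int         has_partitions;  /* 1 if partitioned, so it holds no data */
} Rel;

static const Rel catalog[NREL] = {
    {100, 0, 1},                /* p */
    {101, 100, 1},              /* c12 */
    {102, 101, 0},              /* c12_1 */
    {103, 101, 0},              /* c12_2 */
    {104, 100, 0},              /* c3 */
};

/* Breadth-first walk from root; writes data-holding leaves in visit order. */
static int
bfs_leaves(unsigned root, unsigned *out)
{
    unsigned    queue[NREL];
    int         head = 0, tail = 0, n = 0;

    queue[tail++] = root;
    while (head < tail)
    {
        unsigned    cur = queue[head++];

        for (int i = 0; i < NREL; i++)
        {
            if (catalog[i].oid == cur && !catalog[i].has_partitions)
                out[n++] = cur;
            if (catalog[i].parent == cur)
            {
                assert(tail < NREL);
                queue[tail++] = catalog[i].oid;
            }
        }
    }
    return n;
}

static int
cmp_oid(const void *a, const void *b)
{
    unsigned    x = *(const unsigned *) a;
    unsigned    y = *(const unsigned *) b;

    return (x > y) - (x < y);
}

/* The same leaves, ordered the way ORDER BY tableoid would order them. */
static int
sorted_leaves(unsigned root, unsigned *out)
{
    int         n = bfs_leaves(root, out);

    qsort(out, n, sizeof(unsigned), cmp_oid);
    return n;
}
```

In this model the breadth-first walk yields c3, c12_1, c12_2 (rows 3, 1, 2) while sorting by OID yields c12_1, c12_2, c3 (rows 1, 2, 3), matching the two outputs shown above.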
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
On Tue, Oct 14, 2025 at 4:08 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
+-- Test COPY TO with a foreign table or when the foreign table is a partition
+COPY async_p3 TO stdout; --error
+ERROR: cannot copy from foreign table "async_p3"
+HINT: Try the COPY (SELECT ...) TO variant.

async_p3 is a foreign table so it seems not related to this patch.
I replied in
/messages/by-id/CACJufxGkkMtRUJEbLczRnWp7x2YWqu4r1gEJEv9Po1UPxS6kGQ@mail.gmail.com
I kind of doubt anyone would submit a patch just to rewrite a coverage test for
the sake of coverage itself. While we're here, adding nearby coverage tests
should be fine?

For me, it's perfectly fine to have patches just for improving the
test coverage and I think we have had such patches before. Given this
patch expands the supported relation kind, I guess it makes sense to
cover other cases as well in this patch (i.e., foreign tables and
sequences) or to have a separate patch to increase the overall test
coverage of copyto.c.
Let's have a separate patch to handle COPY test coverage.
i just found out I ignored the case when partitioned tables have RLS.
when exporting a partitioned table,
find_all_inheritors will sort the returned partition by oid.
in DoCopy, we can do the same:
make a SortBy node for SelectStmt->sortClause also mark the
RangeVar->inh as true.
OR
ereport(ERRCODE_FEATURE_NOT_SUPPORTED...) for partitioned tables with RLS.
please see the change I made in DoCopy.
Good catch. However, I guess adding a SortBy node with "tableoid"
doesn't necessarily work in the same way as the 'COPY TO' using
find_all_inheritors():

+ /*
+  * To COPY data from multiple partitions, we rely on the order of
+  * the partitions' tableoids, which matches the order produced by
+  * find_all_inheritors.
+  */

The table list returned by find_all_inheritors() is deterministic, but
it doesn't sort the whole list by their OIDs. If I understand
correctly, it retrieves all descendant tables in a BFS order. For
example, if I create the tables in the following sequence:

create table p (i int) partition by list (i);
create table c12 partition of p for values in (1, 2) partition by list (i);
create table c12_1 partition of c12 for values in (1);
create table c12_2 partition of c12 for values in (2);
create table c3 partition of p for values in (3);
insert into p values (1), (2), (3);
alter table p enable row level security;
create policy policy_p on p using (i > 0);
create user test_user;
grant select on table p to test_user;
I got the result without RLS:
copy p to stdout;
3
1
2
whereas I got the results with RLS:
copy p to stdout;
1
2
3
I think that adding SortBy doesn't help more than making the results
deterministic. Or we can re-sort the OID list returned by
find_all_inheritors() to match it. However, I'm not sure that we need
to make COPY TO results deterministic in the first place. It's not
guaranteed that the order of tuples returned from 'COPY TO rel' where
rel is not a partitioned table is sorted nor even deterministic (e.g.,
due to sync scans). If 'rel' is a partitioned table without RLS, the
order of tables to scan is deterministic but returned tuples within a
single partition is not. Given that sorting the whole results is not
cost free, I'm not sure that guaranteeing this ordering also for
partitioned tables with RLS would be useful for users.
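The ordering difference described above can be reproduced with a small stand-alone sketch. This is only a toy model, not PostgreSQL code: the ToyRel struct, the toy_* function names, and the OIDs 100..104 are all made up for illustration, and it assumes (as the message above does) that find_all_inheritors() walks the tree breadth-first and visits the children of one parent in OID order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy model of the partition tree from the example; OIDs are hypothetical. */
typedef struct ToyRel
{
	unsigned	oid;		/* made-up OID, increasing in creation order */
	unsigned	parent;		/* parent OID, 0 for the root */
	int			has_data;	/* 0 for partitioned tables, 1 for leaf partitions */
	const char *name;
} ToyRel;

/* Sorted by OID, i.e. in the CREATE TABLE order used in the example. */
static const ToyRel toy_rels[] = {
	{100, 0, 0, "p"},
	{101, 100, 0, "c12"},
	{102, 101, 1, "c12_1"},
	{103, 101, 1, "c12_2"},
	{104, 100, 1, "c3"},
};

#define TOY_NRELS (sizeof(toy_rels) / sizeof(toy_rels[0]))

/*
 * Breadth-first walk starting at root.  Leaf OIDs are written to out in
 * visiting order; the return value is the number of leaves found.
 * out must have room for TOY_NRELS entries.
 */
static size_t
toy_bfs_leaves(unsigned root, unsigned *out)
{
	unsigned	queue[TOY_NRELS];
	size_t		head = 0;
	size_t		tail = 0;
	size_t		nout = 0;

	queue[tail++] = root;
	while (head < tail)
	{
		unsigned	cur = queue[head++];

		for (size_t i = 0; i < TOY_NRELS; i++)
		{
			/* Only relations that hold data are kept in the result. */
			if (toy_rels[i].oid == cur && toy_rels[i].has_data)
				out[nout++] = cur;

			/* Children are appended to the end of the queue (BFS). */
			if (toy_rels[i].parent == cur)
			{
				assert(tail < TOY_NRELS);
				queue[tail++] = toy_rels[i].oid;
			}
		}
	}
	return nout;
}

/* qsort comparator: ascending OID, like an ORDER BY tableoid would give. */
static int
toy_cmp_oid(const void *a, const void *b)
{
	unsigned	x = *(const unsigned *) a;
	unsigned	y = *(const unsigned *) b;

	return (x > y) - (x < y);
}
```

Calling toy_bfs_leaves(100, out) yields c3, c12_1, c12_2 in this model, which corresponds to the 3, 1, 2 output without RLS; running qsort() with toy_cmp_oid over the same array yields c12_1, c12_2, c3, which corresponds to the 1, 2, 3 output with RLS.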
I removed the "SortBy node", and also double checked the patch again.
Please check the attached v18.
Attachments:
v18-0001-support-COPY-partitioned_table-TO.patch (text/x-patch; charset=UTF-8)
From fc87e123872b60ccaa37c02b80bae6765d27f4a8 Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Wed, 15 Oct 2025 09:44:00 +0800
Subject: [PATCH v18 1/1] support COPY partitioned_table TO
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
This implements ``COPY partitioned_table TO``. It will be
faster than ``COPY (select * from partitioned_table) TO``.
If the source table is a partitioned table, COPY table TO copies the same
rows as SELECT * FROM table.
reviewed by: vignesh C <vignesh21@gmail.com>
reviewed by: David Rowley <dgrowleyml@gmail.com>
reviewed by: Melih Mutlu <m.melihmutlu@gmail.com>
reviewed by: Kirill Reshke <reshkekirill@gmail.com>
reviewed by: Atsushi Torikoshi <torikoshia@oss.nttdata.com>
reviewed by: Álvaro Herrera <alvherre@kurilemu.de>
reviewed by: Masahiko Sawada <sawada.mshk@gmail.com>
reviewed by: Chao Li <li.evan.chao@gmail.com>
discussion: https://postgr.es/m/CACJufxEZt+G19Ors3bQUq-42-61__C=y5k2wk=sHEFRusu7=iQ@mail.gmail.com
commitfest entry: https://commitfest.postgresql.org/patch/5467
---
.../postgres_fdw/expected/postgres_fdw.out | 5 +
contrib/postgres_fdw/sql/postgres_fdw.sql | 3 +
doc/src/sgml/ref/copy.sgml | 9 +-
src/backend/commands/copy.c | 6 +
src/backend/commands/copyto.c | 153 ++++++++++++++----
src/test/regress/expected/copy.out | 18 +++
src/test/regress/expected/rowsecurity.out | 21 +++
src/test/regress/sql/copy.sql | 15 ++
src/test/regress/sql/rowsecurity.sql | 3 +
9 files changed, 200 insertions(+), 33 deletions(-)
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 91bbd0d8c73..cd28126049d 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -11599,6 +11599,11 @@ SELECT * FROM result_tbl ORDER BY a;
(3 rows)
DELETE FROM result_tbl;
+-- Test COPY TO when foreign table is partition
+COPY async_pt TO stdout; --error
+ERROR: cannot copy from foreign table "async_p1"
+DETAIL: Partition "async_p1" is a foreign table in partitioned table "async_pt"
+HINT: Try the COPY (SELECT ...) TO variant.
DROP FOREIGN TABLE async_p3;
DROP TABLE base_tbl3;
-- Check case where the partitioned table has local/remote partitions
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 3b7da128519..9a8f9e28135 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -3941,6 +3941,9 @@ INSERT INTO result_tbl SELECT * FROM async_pt WHERE b === 505;
SELECT * FROM result_tbl ORDER BY a;
DELETE FROM result_tbl;
+-- Test COPY TO when foreign table is partition
+COPY async_pt TO stdout; --error
+
DROP FOREIGN TABLE async_p3;
DROP TABLE base_tbl3;
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index c2d1fbc1fbe..ecd300097fc 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -539,13 +539,16 @@ COPY <replaceable class="parameter">count</replaceable>
<para>
<command>COPY TO</command> can be used with plain
- tables and populated materialized views.
- For example,
+ tables, populated materialized views and partitioned tables.
+ For example, if <replaceable class="parameter">table</replaceable> is a plain table,
<literal>COPY <replaceable class="parameter">table</replaceable>
TO</literal> copies the same rows as
<literal>SELECT * FROM ONLY <replaceable class="parameter">table</replaceable></literal>.
+ If <replaceable class="parameter">table</replaceable> is a partitioned table or a materialized view
+ <literal>COPY <replaceable class="parameter">table</replaceable> TO</literal>
+ copies the same rows as <literal>SELECT * FROM <replaceable class="parameter">table</replaceable></literal>.
However it doesn't directly support other relation types,
- such as partitioned tables, inheritance child tables, or views.
+ such as inheritance child tables, or views.
To copy all rows from such relations, use <literal>COPY (SELECT * FROM
<replaceable class="parameter">table</replaceable>) TO</literal>.
</para>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index fae9c41db65..9e12adb81a1 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -251,11 +251,17 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
* relation which we have opened and locked. Use "ONLY" so that
* COPY retrieves rows from only the target table not any
* inheritance children, the same as when RLS doesn't apply.
+ *
+ * However, when COPY data from a partitioned table, we should not
+ * use "ONLY", since we also need to retrieve rows from its child
+ * partitions too.
*/
from = makeRangeVar(get_namespace_name(RelationGetNamespace(rel)),
pstrdup(RelationGetRelationName(rel)),
-1);
from->inh = false; /* apply ONLY */
+ if (get_rel_relkind(relid) == RELKIND_PARTITIONED_TABLE)
+ from->inh = true;
/* Build query */
select = makeNode(SelectStmt);
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index e5781155cdf..74497240105 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -18,7 +18,9 @@
#include <unistd.h>
#include <sys/stat.h>
+#include "access/table.h"
#include "access/tableam.h"
+#include "catalog/pg_inherits.h"
#include "commands/copyapi.h"
#include "commands/progress.h"
#include "executor/execdesc.h"
@@ -82,6 +84,7 @@ typedef struct CopyToStateData
List *attnumlist; /* integer list of attnums to copy */
char *filename; /* filename, or NULL for STDOUT */
bool is_program; /* is 'filename' a program to popen? */
+ List *partitions; /* OID list of partitions to copy data from */
copy_data_dest_cb data_dest_cb; /* function for writing data */
CopyFormatOptions opts;
@@ -116,6 +119,8 @@ static void CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot);
static void CopyAttributeOutText(CopyToState cstate, const char *string);
static void CopyAttributeOutCSV(CopyToState cstate, const char *string,
bool use_quote);
+static void CopyRelationTo(CopyToState cstate, Relation rel, Relation root_rel,
+ uint64 *processed);
/* built-in format-specific routines */
static void CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc);
@@ -602,6 +607,10 @@ EndCopy(CopyToState cstate)
pgstat_progress_end_command();
MemoryContextDelete(cstate->copycontext);
+
+ if (cstate->partitions)
+ list_free(cstate->partitions);
+
pfree(cstate);
}
@@ -643,6 +652,7 @@ BeginCopyTo(ParseState *pstate,
PROGRESS_COPY_COMMAND_TO,
0
};
+ List *children = NIL;
if (rel != NULL && rel->rd_rel->relkind != RELKIND_RELATION)
{
@@ -673,11 +683,34 @@ BeginCopyTo(ParseState *pstate,
errmsg("cannot copy from sequence \"%s\"",
RelationGetRelationName(rel))));
else if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
- ereport(ERROR,
- (errcode(ERRCODE_WRONG_OBJECT_TYPE),
- errmsg("cannot copy from partitioned table \"%s\"",
- RelationGetRelationName(rel)),
- errhint("Try the COPY (SELECT ...) TO variant.")));
+ {
+ /*
+ * Collect OIDs of relation containing data, so that later
+ * DoCopyTo can copy the data from them.
+ */
+ children = find_all_inheritors(RelationGetRelid(rel), AccessShareLock, NULL);
+
+ foreach_oid(child, children)
+ {
+ char relkind = get_rel_relkind(child);
+
+ if (relkind == RELKIND_FOREIGN_TABLE)
+ {
+ char *relation_name = get_rel_name(child);
+
+ ereport(ERROR,
+ errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("cannot copy from foreign table \"%s\"", relation_name),
+ errdetail("Partition \"%s\" is a foreign table in partitioned table \"%s\"",
+ relation_name, RelationGetRelationName(rel)),
+ errhint("Try the COPY (SELECT ...) TO variant."));
+ }
+
+ /* Exclude tables with no data */
+ if (RELKIND_HAS_PARTITIONS(relkind))
+ children = foreach_delete_current(children, child);
+ }
+ }
else
ereport(ERROR,
(errcode(ERRCODE_WRONG_OBJECT_TYPE),
@@ -713,6 +746,7 @@ BeginCopyTo(ParseState *pstate,
cstate->rel = rel;
tupDesc = RelationGetDescr(cstate->rel);
+ cstate->partitions = children;
}
else
{
@@ -722,6 +756,7 @@ BeginCopyTo(ParseState *pstate,
DestReceiver *dest;
cstate->rel = NULL;
+ cstate->partitions = NIL;
/*
* Run parse analysis and rewrite. Note this also acquires sufficient
@@ -1030,7 +1065,7 @@ DoCopyTo(CopyToState cstate)
TupleDesc tupDesc;
int num_phys_attrs;
ListCell *cur;
- uint64 processed;
+ uint64 processed = 0;
if (fe_copy)
SendCopyBegin(cstate);
@@ -1070,33 +1105,24 @@ DoCopyTo(CopyToState cstate)
if (cstate->rel)
{
- TupleTableSlot *slot;
- TableScanDesc scandesc;
-
- scandesc = table_beginscan(cstate->rel, GetActiveSnapshot(), 0, NULL);
- slot = table_slot_create(cstate->rel, NULL);
-
- processed = 0;
- while (table_scan_getnextslot(scandesc, ForwardScanDirection, slot))
+ /*
+ * If COPY TO source table is a partitioned table, then open each
+ * partition and process each individual partition.
+ */
+ if (cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
{
- CHECK_FOR_INTERRUPTS();
+ foreach_oid(child, cstate->partitions)
+ {
+ Relation scan_rel;
- /* Deconstruct the tuple ... */
- slot_getallattrs(slot);
-
- /* Format and send the data */
- CopyOneRowTo(cstate, slot);
-
- /*
- * Increment the number of processed tuples, and report the
- * progress.
- */
- pgstat_progress_update_param(PROGRESS_COPY_TUPLES_PROCESSED,
- ++processed);
+ /* We already got the lock in BeginCopyTo */
+ scan_rel = table_open(child, NoLock);
+ CopyRelationTo(cstate, scan_rel, cstate->rel, &processed);
+ table_close(scan_rel, NoLock);
+ }
}
-
- ExecDropSingleTupleTableSlot(slot);
- table_endscan(scandesc);
+ else
+ CopyRelationTo(cstate, cstate->rel, NULL, &processed);
}
else
{
@@ -1115,6 +1141,73 @@ DoCopyTo(CopyToState cstate)
return processed;
}
+/*
+ * Scans a single table and exports its rows to the COPY destination.
+ *
+ * root_rel can be set to the root table of rel if rel is a partition
+ * table so that we can send tuples in root_rel's rowtype, which might
+ * differ from individual partitions.
+*/
+static void
+CopyRelationTo(CopyToState cstate, Relation rel, Relation root_rel, uint64 *processed)
+{
+ TupleTableSlot *slot;
+ TableScanDesc scandesc;
+ AttrMap *map = NULL;
+ TupleTableSlot *root_slot = NULL;
+
+ scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+ slot = table_slot_create(rel, NULL);
+
+ /*
+ * A partition's rowtype might differ from the root table's. If we are
+ * exporting partition data here, we must convert it back to the root
+ * table's rowtype.
+ */
+ if (root_rel != NULL)
+ {
+ root_slot = table_slot_create(root_rel, NULL);
+ map = build_attrmap_by_name_if_req(RelationGetDescr(root_rel),
+ RelationGetDescr(rel),
+ false);
+ }
+
+ while (table_scan_getnextslot(scandesc, ForwardScanDirection, slot))
+ {
+ TupleTableSlot *copyslot;
+
+ CHECK_FOR_INTERRUPTS();
+
+ if (map != NULL)
+ copyslot = execute_attr_map_slot(map, slot, root_slot);
+ else
+ {
+ /* Deconstruct the tuple ... */
+ slot_getallattrs(slot);
+ copyslot = slot;
+ }
+
+ /* Format and send the data */
+ CopyOneRowTo(cstate, copyslot);
+
+ /*
+ * Increment the number of processed tuples, and report the progress.
+ */
+ pgstat_progress_update_param(PROGRESS_COPY_TUPLES_PROCESSED,
+ ++(*processed));
+ }
+
+ ExecDropSingleTupleTableSlot(slot);
+
+ if (root_slot != NULL)
+ ExecDropSingleTupleTableSlot(root_slot);
+
+ if (map != NULL)
+ free_attrmap(map);
+
+ table_endscan(scandesc);
+}
+
/*
* Emit one row during DoCopyTo().
*/
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index ac66eb55aee..af01e84cea1 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -373,3 +373,21 @@ COPY copytest_mv(id) TO stdout WITH (header);
id
1
DROP MATERIALIZED VIEW copytest_mv;
+-- Tests for COPY TO with partitioned tables.
+CREATE TABLE pp (id int,val int) PARTITION BY RANGE (id);
+CREATE TABLE pp_1 (val int, id int) PARTITION BY RANGE (id);
+CREATE TABLE pp_2 (id int, val int) PARTITION BY RANGE (id);
+ALTER TABLE pp ATTACH PARTITION pp_1 FOR VALUES FROM (1) TO (5);
+ALTER TABLE pp ATTACH PARTITION pp_2 FOR VALUES FROM (5) TO (10);
+CREATE TABLE pp_15 PARTITION OF pp_1 FOR VALUES FROM (1) TO (5);
+CREATE TABLE pp_510 PARTITION OF pp_2 FOR VALUES FROM (5) TO (10);
+INSERT INTO pp SELECT g, 10 + g FROM generate_series(1,6) g;
+COPY pp TO stdout(header);
+id val
+1 11
+2 12
+3 13
+4 14
+5 15
+6 16
+DROP TABLE PP;
diff --git a/src/test/regress/expected/rowsecurity.out b/src/test/regress/expected/rowsecurity.out
index 5a172c5d91c..42b78a24603 100644
--- a/src/test/regress/expected/rowsecurity.out
+++ b/src/test/regress/expected/rowsecurity.out
@@ -986,6 +986,11 @@ NOTICE: f_leak => my first satire
9 | 11 | 1 | regress_rls_dave | awesome science fiction
(4 rows)
+COPY part_document TO stdout WITH (DELIMITER ',');
+1,11,1,regress_rls_bob,my first novel
+6,11,1,regress_rls_carol,great science fiction
+9,11,1,regress_rls_dave,awesome science fiction
+4,55,1,regress_rls_bob,my first satire
EXPLAIN (COSTS OFF) SELECT * FROM part_document WHERE f_leak(dtitle);
QUERY PLAN
-------------------------------------------------------------------------
@@ -1028,6 +1033,17 @@ NOTICE: f_leak => awesome technology book
10 | 99 | 2 | regress_rls_dave | awesome technology book
(10 rows)
+COPY part_document TO stdout WITH (DELIMITER ',');
+1,11,1,regress_rls_bob,my first novel
+2,11,2,regress_rls_bob,my second novel
+6,11,1,regress_rls_carol,great science fiction
+9,11,1,regress_rls_dave,awesome science fiction
+4,55,1,regress_rls_bob,my first satire
+8,55,2,regress_rls_carol,great satire
+3,99,2,regress_rls_bob,my science textbook
+5,99,2,regress_rls_bob,my history book
+7,99,2,regress_rls_carol,great technology book
+10,99,2,regress_rls_dave,awesome technology book
EXPLAIN (COSTS OFF) SELECT * FROM part_document WHERE f_leak(dtitle);
QUERY PLAN
-------------------------------------------------------------------------
@@ -1058,6 +1074,11 @@ NOTICE: f_leak => awesome science fiction
9 | 11 | 1 | regress_rls_dave | awesome science fiction
(4 rows)
+COPY part_document TO stdout WITH (DELIMITER ',');
+1,11,1,regress_rls_bob,my first novel
+2,11,2,regress_rls_bob,my second novel
+6,11,1,regress_rls_carol,great science fiction
+9,11,1,regress_rls_dave,awesome science fiction
EXPLAIN (COSTS OFF) SELECT * FROM part_document WHERE f_leak(dtitle);
QUERY PLAN
----------------------------------------------------------------------------------
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index a1316c73bac..56d506ad4c6 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -405,3 +405,18 @@ COPY copytest_mv(id) TO stdout WITH (header);
REFRESH MATERIALIZED VIEW copytest_mv;
COPY copytest_mv(id) TO stdout WITH (header);
DROP MATERIALIZED VIEW copytest_mv;
+
+-- Tests for COPY TO with partitioned tables.
+CREATE TABLE pp (id int,val int) PARTITION BY RANGE (id);
+CREATE TABLE pp_1 (val int, id int) PARTITION BY RANGE (id);
+CREATE TABLE pp_2 (id int, val int) PARTITION BY RANGE (id);
+ALTER TABLE pp ATTACH PARTITION pp_1 FOR VALUES FROM (1) TO (5);
+ALTER TABLE pp ATTACH PARTITION pp_2 FOR VALUES FROM (5) TO (10);
+
+CREATE TABLE pp_15 PARTITION OF pp_1 FOR VALUES FROM (1) TO (5);
+CREATE TABLE pp_510 PARTITION OF pp_2 FOR VALUES FROM (5) TO (10);
+
+INSERT INTO pp SELECT g, 10 + g FROM generate_series(1,6) g;
+
+COPY pp TO stdout(header);
+DROP TABLE PP;
diff --git a/src/test/regress/sql/rowsecurity.sql b/src/test/regress/sql/rowsecurity.sql
index 21ac0ca51ee..2d1be543391 100644
--- a/src/test/regress/sql/rowsecurity.sql
+++ b/src/test/regress/sql/rowsecurity.sql
@@ -362,16 +362,19 @@ SELECT * FROM pg_policies WHERE schemaname = 'regress_rls_schema' AND tablename
SET SESSION AUTHORIZATION regress_rls_bob;
SET row_security TO ON;
SELECT * FROM part_document WHERE f_leak(dtitle) ORDER BY did;
+COPY part_document TO stdout WITH (DELIMITER ',');
EXPLAIN (COSTS OFF) SELECT * FROM part_document WHERE f_leak(dtitle);
-- viewpoint from regress_rls_carol
SET SESSION AUTHORIZATION regress_rls_carol;
SELECT * FROM part_document WHERE f_leak(dtitle) ORDER BY did;
+COPY part_document TO stdout WITH (DELIMITER ',');
EXPLAIN (COSTS OFF) SELECT * FROM part_document WHERE f_leak(dtitle);
-- viewpoint from regress_rls_dave
SET SESSION AUTHORIZATION regress_rls_dave;
SELECT * FROM part_document WHERE f_leak(dtitle) ORDER BY did;
+COPY part_document TO stdout WITH (DELIMITER ',');
EXPLAIN (COSTS OFF) SELECT * FROM part_document WHERE f_leak(dtitle);
-- pp1 ERROR
--
2.34.1
On Tue, Oct 14, 2025 at 6:53 PM jian he <jian.universality@gmail.com> wrote:
On Tue, Oct 14, 2025 at 4:08 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
+-- Test COPY TO with a foreign table or when the foreign table is a partition
+COPY async_p3 TO stdout; --error
+ERROR: cannot copy from foreign table "async_p3"
+HINT: Try the COPY (SELECT ...) TO variant.
async_p3 is a foreign table so it seems not related to this patch.
I replied in
/messages/by-id/CACJufxGkkMtRUJEbLczRnWp7x2YWqu4r1gEJEv9Po1UPxS6kGQ@mail.gmail.com
I kind of doubt anyone would submit a patch just to rewrite a coverage test for
the sake of coverage itself. While we're here, adding nearby coverage tests
should be fine?
For me, it's perfectly fine to have patches just for improving the
test coverage and I think we have had such patches before. Given this
patch expands the supported relation kind, I guess it makes sense to
cover other cases as well in this patch (i.e., foreign tables and
sequences) or to have a separate patch to increase the overall test
coverage of copyto.c.
Let's have a separate patch to handle COPY test coverage.
I just found out I ignored the case when partitioned tables have RLS.
when exporting a partitioned table,
find_all_inheritors will sort the returned partition by oid.
in DoCopy, we can do the same:
make a SortBy node for SelectStmt->sortClause also mark the
RangeVar->inh as true.
OR
ereport(ERRCODE_FEATURE_NOT_SUPPORTED...) for partitioned tables with RLS.
Please see the change I made in DoCopy.
Good catch. However, I guess adding a SortBy node with "tableoid"
doesn't necessarily work in the same way as the 'COPY TO' using
find_all_inheritors():
+ /*
+ * To COPY data from multiple partitions, we rely on the order of
+ * the partitions' tableoids, which matches the order produced by
+ * find_all_inheritors.
+ */
The table list returned by find_all_inheritors() is deterministic, but
it doesn't sort the whole list by their OIDs. If I understand
correctly, it retrieves all descendant tables in a BFS order. For
example, if I create the tables in the following sequence:
create table p (i int) partition by list (i);
create table c12 partition of p for values in (1, 2) partition by list (i);
create table c12_1 partition of c12 for values in (1);
create table c12_2 partition of c12 for values in (2);
create table c3 partition of p for values in (3);
insert into p values (1), (2), (3);
alter table p enable row level security;
create policy policy_p on p using (i > 0);
create user test_user;
grant select on table p to test_user;
I got the result without RLS:
copy p to stdout;
3
1
2
whereas I got the results with RLS:
copy p to stdout;
1
2
3
I think that adding SortBy doesn't help more than making the results
deterministic. Or we can re-sort the OID list returned by
find_all_inheritors() to match it. However, I'm not sure that we need
to make COPY TO results deterministic in the first place. It's not
guaranteed that the order of tuples returned from 'COPY TO rel' where
rel is not a partitioned table is sorted nor even deterministic (e.g.,
due to sync scans). If 'rel' is a partitioned table without RLS, the
order of tables to scan is deterministic but returned tuples within a
single partition is not. Given that sorting the whole results is not
cost free, I'm not sure that guaranteeing this ordering also for
partitioned tables with RLS would be useful for users.
I removed the "SortBy node", and also double checked the patch again.
Please check the attached v18.
Thank you for updating the patch!
I've reviewed the patch and here is one review comment:
from->inh = false; /* apply ONLY */
+ if (get_rel_relkind(relid) == RELKIND_PARTITIONED_TABLE)
+ from->inh = true;
It's better to check rel->rd_rel->relkind instead of calling
get_rel_relkind(), since the latter does a syscache lookup.
I've attached a patch that fixes the above and includes some cosmetic
changes. Please review it.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Attachments:
v18_masahiko_fix.patch (application/octet-stream)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 9e12adb81a1..eac501753c8 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -252,16 +252,14 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
* COPY retrieves rows from only the target table not any
* inheritance children, the same as when RLS doesn't apply.
*
- * However, when COPY data from a partitioned table, we should not
- * use "ONLY", since we also need to retrieve rows from its child
- * partitions too.
+ * However, when copying data from a partitioned table, we don't
+ * not use "ONLY", since we need to retrieve rows from its
+ * descendant tables too.
*/
from = makeRangeVar(get_namespace_name(RelationGetNamespace(rel)),
pstrdup(RelationGetRelationName(rel)),
-1);
- from->inh = false; /* apply ONLY */
- if (get_rel_relkind(relid) == RELKIND_PARTITIONED_TABLE)
- from->inh = true;
+ from->inh = (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE);
/* Build query */
select = makeNode(SelectStmt);
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 74497240105..a1919c6db43 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -84,11 +84,11 @@ typedef struct CopyToStateData
List *attnumlist; /* integer list of attnums to copy */
char *filename; /* filename, or NULL for STDOUT */
bool is_program; /* is 'filename' a program to popen? */
- List *partitions; /* OID list of partitions to copy data from */
copy_data_dest_cb data_dest_cb; /* function for writing data */
CopyFormatOptions opts;
Node *whereClause; /* WHERE condition (or NULL) */
+ List *partitions; /* OID list of partitions to copy data from */
/*
* Working state
@@ -1160,9 +1160,9 @@ CopyRelationTo(CopyToState cstate, Relation rel, Relation root_rel, uint64 *proc
slot = table_slot_create(rel, NULL);
/*
- * A partition's rowtype might differ from the root table's. If we are
- * exporting partition data here, we must convert it back to the root
- * table's rowtype.
+ * If we are exporting partition data here, we check if converting tuples
+ * to the root table's rowtype, because a partition might have column
+ * order different than its root table.
*/
if (root_rel != NULL)
{
@@ -1182,7 +1182,7 @@ CopyRelationTo(CopyToState cstate, Relation rel, Relation root_rel, uint64 *proc
copyslot = execute_attr_map_slot(map, slot, root_slot);
else
{
- /* Deconstruct the tuple ... */
+ /* Deconstruct the tuple */
slot_getallattrs(slot);
copyslot = slot;
}
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index af01e84cea1..24e0f472f14 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -374,6 +374,8 @@ id
1
DROP MATERIALIZED VIEW copytest_mv;
-- Tests for COPY TO with partitioned tables.
+-- The child table pp_2 has a different column order than the root table pp.
+-- Check if COPY TO exports tuples as the root table's column order.
CREATE TABLE pp (id int,val int) PARTITION BY RANGE (id);
CREATE TABLE pp_1 (val int, id int) PARTITION BY RANGE (id);
CREATE TABLE pp_2 (id int, val int) PARTITION BY RANGE (id);
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index 56d506ad4c6..676a8b342b5 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -407,16 +407,15 @@ COPY copytest_mv(id) TO stdout WITH (header);
DROP MATERIALIZED VIEW copytest_mv;
-- Tests for COPY TO with partitioned tables.
+-- The child table pp_2 has a different column order than the root table pp.
+-- Check if COPY TO exports tuples as the root table's column order.
CREATE TABLE pp (id int,val int) PARTITION BY RANGE (id);
CREATE TABLE pp_1 (val int, id int) PARTITION BY RANGE (id);
CREATE TABLE pp_2 (id int, val int) PARTITION BY RANGE (id);
ALTER TABLE pp ATTACH PARTITION pp_1 FOR VALUES FROM (1) TO (5);
ALTER TABLE pp ATTACH PARTITION pp_2 FOR VALUES FROM (5) TO (10);
-
CREATE TABLE pp_15 PARTITION OF pp_1 FOR VALUES FROM (1) TO (5);
CREATE TABLE pp_510 PARTITION OF pp_2 FOR VALUES FROM (5) TO (10);
-
INSERT INTO pp SELECT g, 10 + g FROM generate_series(1,6) g;
-
COPY pp TO stdout(header);
DROP TABLE PP;
On Thu, Oct 16, 2025 at 9:21 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
Please check the attached v18.
Thank you for updating the patch!
I've reviewed the patch and here is one review comment:
from->inh = false; /* apply ONLY */
+ if (get_rel_relkind(relid) == RELKIND_PARTITIONED_TABLE)
+ from->inh = true;
It's better to check rel->rd_rel->relkind instead of calling
get_rel_relkind(), since the latter does a syscache lookup.
I've attached a patch that fixes the above and includes some cosmetic
changes. Please review it.
hi.
overall looks good to me, thanks for polishing it.
+ * However, when copying data from a partitioned table, we don't
+ * not use "ONLY", since we need to retrieve rows from its
+ * descendant tables too.
I guess here it should be
"we don't use "ONLY""?
I’ve incorporated your changes into v19.
Attachments:
v19-0001-support-COPY-partitioned_table-TO.patch (text/x-patch; charset=UTF-8)
From 663cb20c6db29765421b8f0cd386d594e208e236 Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Thu, 16 Oct 2025 10:53:51 +0800
Subject: [PATCH v19 1/1] support COPY partitioned_table TO
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
This implements ``COPY partitioned_table TO``. It will be
faster than ``COPY (select * from partitioned_table) TO``.
If the source table is a partitioned table, COPY table TO copies the same
rows as SELECT * FROM table.
reviewed by: vignesh C <vignesh21@gmail.com>
reviewed by: David Rowley <dgrowleyml@gmail.com>
reviewed by: Melih Mutlu <m.melihmutlu@gmail.com>
reviewed by: Kirill Reshke <reshkekirill@gmail.com>
reviewed by: Atsushi Torikoshi <torikoshia@oss.nttdata.com>
reviewed by: Álvaro Herrera <alvherre@kurilemu.de>
reviewed by: Masahiko Sawada <sawada.mshk@gmail.com>
reviewed by: Chao Li <li.evan.chao@gmail.com>
discussion: https://postgr.es/m/CACJufxEZt+G19Ors3bQUq-42-61__C=y5k2wk=sHEFRusu7=iQ@mail.gmail.com
commitfest entry: https://commitfest.postgresql.org/patch/5467
---
.../postgres_fdw/expected/postgres_fdw.out | 5 +
contrib/postgres_fdw/sql/postgres_fdw.sql | 3 +
doc/src/sgml/ref/copy.sgml | 9 +-
src/backend/commands/copy.c | 6 +-
src/backend/commands/copyto.c | 153 ++++++++++++++----
src/test/regress/expected/copy.out | 20 +++
src/test/regress/expected/rowsecurity.out | 21 +++
src/test/regress/sql/copy.sql | 14 ++
src/test/regress/sql/rowsecurity.sql | 3 +
9 files changed, 200 insertions(+), 34 deletions(-)
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 91bbd0d8c73..cd28126049d 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -11599,6 +11599,11 @@ SELECT * FROM result_tbl ORDER BY a;
(3 rows)
DELETE FROM result_tbl;
+-- Test COPY TO when foreign table is partition
+COPY async_pt TO stdout; --error
+ERROR: cannot copy from foreign table "async_p1"
+DETAIL: Partition "async_p1" is a foreign table in partitioned table "async_pt"
+HINT: Try the COPY (SELECT ...) TO variant.
DROP FOREIGN TABLE async_p3;
DROP TABLE base_tbl3;
-- Check case where the partitioned table has local/remote partitions
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 3b7da128519..9a8f9e28135 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -3941,6 +3941,9 @@ INSERT INTO result_tbl SELECT * FROM async_pt WHERE b === 505;
SELECT * FROM result_tbl ORDER BY a;
DELETE FROM result_tbl;
+-- Test COPY TO when foreign table is partition
+COPY async_pt TO stdout; --error
+
DROP FOREIGN TABLE async_p3;
DROP TABLE base_tbl3;
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index c2d1fbc1fbe..ecd300097fc 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -539,13 +539,16 @@ COPY <replaceable class="parameter">count</replaceable>
<para>
<command>COPY TO</command> can be used with plain
- tables and populated materialized views.
- For example,
+ tables, populated materialized views and partitioned tables.
+ For example, if <replaceable class="parameter">table</replaceable> is a plain table,
<literal>COPY <replaceable class="parameter">table</replaceable>
TO</literal> copies the same rows as
<literal>SELECT * FROM ONLY <replaceable class="parameter">table</replaceable></literal>.
+ If <replaceable class="parameter">table</replaceable> is a partitioned table or a materialized view
+ <literal>COPY <replaceable class="parameter">table</replaceable> TO</literal>
+ copies the same rows as <literal>SELECT * FROM <replaceable class="parameter">table</replaceable></literal>.
However it doesn't directly support other relation types,
- such as partitioned tables, inheritance child tables, or views.
+ such as inheritance child tables, or views.
To copy all rows from such relations, use <literal>COPY (SELECT * FROM
<replaceable class="parameter">table</replaceable>) TO</literal>.
</para>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index fae9c41db65..ff443fb5605 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -251,11 +251,15 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
* relation which we have opened and locked. Use "ONLY" so that
* COPY retrieves rows from only the target table not any
* inheritance children, the same as when RLS doesn't apply.
+ *
+ * However, when copying data from a partitioned table, we don't use
+ * "ONLY", since we need to retrieve rows from its descendant tables
+ * too.
*/
from = makeRangeVar(get_namespace_name(RelationGetNamespace(rel)),
pstrdup(RelationGetRelationName(rel)),
-1);
- from->inh = false; /* apply ONLY */
+ from->inh = (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE);
/* Build query */
select = makeNode(SelectStmt);
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index e5781155cdf..a1919c6db43 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -18,7 +18,9 @@
#include <unistd.h>
#include <sys/stat.h>
+#include "access/table.h"
#include "access/tableam.h"
+#include "catalog/pg_inherits.h"
#include "commands/copyapi.h"
#include "commands/progress.h"
#include "executor/execdesc.h"
@@ -86,6 +88,7 @@ typedef struct CopyToStateData
CopyFormatOptions opts;
Node *whereClause; /* WHERE condition (or NULL) */
+ List *partitions; /* OID list of partitions to copy data from */
/*
* Working state
@@ -116,6 +119,8 @@ static void CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot);
static void CopyAttributeOutText(CopyToState cstate, const char *string);
static void CopyAttributeOutCSV(CopyToState cstate, const char *string,
bool use_quote);
+static void CopyRelationTo(CopyToState cstate, Relation rel, Relation root_rel,
+ uint64 *processed);
/* built-in format-specific routines */
static void CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc);
@@ -602,6 +607,10 @@ EndCopy(CopyToState cstate)
pgstat_progress_end_command();
MemoryContextDelete(cstate->copycontext);
+
+ if (cstate->partitions)
+ list_free(cstate->partitions);
+
pfree(cstate);
}
@@ -643,6 +652,7 @@ BeginCopyTo(ParseState *pstate,
PROGRESS_COPY_COMMAND_TO,
0
};
+ List *children = NIL;
if (rel != NULL && rel->rd_rel->relkind != RELKIND_RELATION)
{
@@ -673,11 +683,34 @@ BeginCopyTo(ParseState *pstate,
errmsg("cannot copy from sequence \"%s\"",
RelationGetRelationName(rel))));
else if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
- ereport(ERROR,
- (errcode(ERRCODE_WRONG_OBJECT_TYPE),
- errmsg("cannot copy from partitioned table \"%s\"",
- RelationGetRelationName(rel)),
- errhint("Try the COPY (SELECT ...) TO variant.")));
+ {
+ /*
+ * Collect OIDs of relation containing data, so that later
+ * DoCopyTo can copy the data from them.
+ */
+ children = find_all_inheritors(RelationGetRelid(rel), AccessShareLock, NULL);
+
+ foreach_oid(child, children)
+ {
+ char relkind = get_rel_relkind(child);
+
+ if (relkind == RELKIND_FOREIGN_TABLE)
+ {
+ char *relation_name = get_rel_name(child);
+
+ ereport(ERROR,
+ errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("cannot copy from foreign table \"%s\"", relation_name),
+ errdetail("Partition \"%s\" is a foreign table in partitioned table \"%s\"",
+ relation_name, RelationGetRelationName(rel)),
+ errhint("Try the COPY (SELECT ...) TO variant."));
+ }
+
+ /* Exclude tables with no data */
+ if (RELKIND_HAS_PARTITIONS(relkind))
+ children = foreach_delete_current(children, child);
+ }
+ }
else
ereport(ERROR,
(errcode(ERRCODE_WRONG_OBJECT_TYPE),
@@ -713,6 +746,7 @@ BeginCopyTo(ParseState *pstate,
cstate->rel = rel;
tupDesc = RelationGetDescr(cstate->rel);
+ cstate->partitions = children;
}
else
{
@@ -722,6 +756,7 @@ BeginCopyTo(ParseState *pstate,
DestReceiver *dest;
cstate->rel = NULL;
+ cstate->partitions = NIL;
/*
* Run parse analysis and rewrite. Note this also acquires sufficient
@@ -1030,7 +1065,7 @@ DoCopyTo(CopyToState cstate)
TupleDesc tupDesc;
int num_phys_attrs;
ListCell *cur;
- uint64 processed;
+ uint64 processed = 0;
if (fe_copy)
SendCopyBegin(cstate);
@@ -1070,33 +1105,24 @@ DoCopyTo(CopyToState cstate)
if (cstate->rel)
{
- TupleTableSlot *slot;
- TableScanDesc scandesc;
-
- scandesc = table_beginscan(cstate->rel, GetActiveSnapshot(), 0, NULL);
- slot = table_slot_create(cstate->rel, NULL);
-
- processed = 0;
- while (table_scan_getnextslot(scandesc, ForwardScanDirection, slot))
+ /*
+ * If COPY TO source table is a partitioned table, then open each
+ * partition and process each individual partition.
+ */
+ if (cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
{
- CHECK_FOR_INTERRUPTS();
+ foreach_oid(child, cstate->partitions)
+ {
+ Relation scan_rel;
- /* Deconstruct the tuple ... */
- slot_getallattrs(slot);
-
- /* Format and send the data */
- CopyOneRowTo(cstate, slot);
-
- /*
- * Increment the number of processed tuples, and report the
- * progress.
- */
- pgstat_progress_update_param(PROGRESS_COPY_TUPLES_PROCESSED,
- ++processed);
+ /* We already got the lock in BeginCopyTo */
+ scan_rel = table_open(child, NoLock);
+ CopyRelationTo(cstate, scan_rel, cstate->rel, &processed);
+ table_close(scan_rel, NoLock);
+ }
}
-
- ExecDropSingleTupleTableSlot(slot);
- table_endscan(scandesc);
+ else
+ CopyRelationTo(cstate, cstate->rel, NULL, &processed);
}
else
{
@@ -1115,6 +1141,73 @@ DoCopyTo(CopyToState cstate)
return processed;
}
+/*
+ * Scans a single table and exports its rows to the COPY destination.
+ *
+ * root_rel can be set to the root table of rel if rel is a partition
+ * table so that we can send tuples in root_rel's rowtype, which might
+ * differ from individual partitions.
+*/
+static void
+CopyRelationTo(CopyToState cstate, Relation rel, Relation root_rel, uint64 *processed)
+{
+ TupleTableSlot *slot;
+ TableScanDesc scandesc;
+ AttrMap *map = NULL;
+ TupleTableSlot *root_slot = NULL;
+
+ scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+ slot = table_slot_create(rel, NULL);
+
+ /*
+ * If we are exporting partition data here, we check if converting tuples
+ * to the root table's rowtype, because a partition might have column
+ * order different than its root table.
+ */
+ if (root_rel != NULL)
+ {
+ root_slot = table_slot_create(root_rel, NULL);
+ map = build_attrmap_by_name_if_req(RelationGetDescr(root_rel),
+ RelationGetDescr(rel),
+ false);
+ }
+
+ while (table_scan_getnextslot(scandesc, ForwardScanDirection, slot))
+ {
+ TupleTableSlot *copyslot;
+
+ CHECK_FOR_INTERRUPTS();
+
+ if (map != NULL)
+ copyslot = execute_attr_map_slot(map, slot, root_slot);
+ else
+ {
+ /* Deconstruct the tuple */
+ slot_getallattrs(slot);
+ copyslot = slot;
+ }
+
+ /* Format and send the data */
+ CopyOneRowTo(cstate, copyslot);
+
+ /*
+ * Increment the number of processed tuples, and report the progress.
+ */
+ pgstat_progress_update_param(PROGRESS_COPY_TUPLES_PROCESSED,
+ ++(*processed));
+ }
+
+ ExecDropSingleTupleTableSlot(slot);
+
+ if (root_slot != NULL)
+ ExecDropSingleTupleTableSlot(root_slot);
+
+ if (map != NULL)
+ free_attrmap(map);
+
+ table_endscan(scandesc);
+}
+
/*
* Emit one row during DoCopyTo().
*/
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index ac66eb55aee..24e0f472f14 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -373,3 +373,23 @@ COPY copytest_mv(id) TO stdout WITH (header);
id
1
DROP MATERIALIZED VIEW copytest_mv;
+-- Tests for COPY TO with partitioned tables.
+-- The child table pp_2 has a different column order than the root table pp.
+-- Check if COPY TO exports tuples as the root table's column order.
+CREATE TABLE pp (id int,val int) PARTITION BY RANGE (id);
+CREATE TABLE pp_1 (val int, id int) PARTITION BY RANGE (id);
+CREATE TABLE pp_2 (id int, val int) PARTITION BY RANGE (id);
+ALTER TABLE pp ATTACH PARTITION pp_1 FOR VALUES FROM (1) TO (5);
+ALTER TABLE pp ATTACH PARTITION pp_2 FOR VALUES FROM (5) TO (10);
+CREATE TABLE pp_15 PARTITION OF pp_1 FOR VALUES FROM (1) TO (5);
+CREATE TABLE pp_510 PARTITION OF pp_2 FOR VALUES FROM (5) TO (10);
+INSERT INTO pp SELECT g, 10 + g FROM generate_series(1,6) g;
+COPY pp TO stdout(header);
+id val
+1 11
+2 12
+3 13
+4 14
+5 15
+6 16
+DROP TABLE PP;
diff --git a/src/test/regress/expected/rowsecurity.out b/src/test/regress/expected/rowsecurity.out
index 5a172c5d91c..42b78a24603 100644
--- a/src/test/regress/expected/rowsecurity.out
+++ b/src/test/regress/expected/rowsecurity.out
@@ -986,6 +986,11 @@ NOTICE: f_leak => my first satire
9 | 11 | 1 | regress_rls_dave | awesome science fiction
(4 rows)
+COPY part_document TO stdout WITH (DELIMITER ',');
+1,11,1,regress_rls_bob,my first novel
+6,11,1,regress_rls_carol,great science fiction
+9,11,1,regress_rls_dave,awesome science fiction
+4,55,1,regress_rls_bob,my first satire
EXPLAIN (COSTS OFF) SELECT * FROM part_document WHERE f_leak(dtitle);
QUERY PLAN
-------------------------------------------------------------------------
@@ -1028,6 +1033,17 @@ NOTICE: f_leak => awesome technology book
10 | 99 | 2 | regress_rls_dave | awesome technology book
(10 rows)
+COPY part_document TO stdout WITH (DELIMITER ',');
+1,11,1,regress_rls_bob,my first novel
+2,11,2,regress_rls_bob,my second novel
+6,11,1,regress_rls_carol,great science fiction
+9,11,1,regress_rls_dave,awesome science fiction
+4,55,1,regress_rls_bob,my first satire
+8,55,2,regress_rls_carol,great satire
+3,99,2,regress_rls_bob,my science textbook
+5,99,2,regress_rls_bob,my history book
+7,99,2,regress_rls_carol,great technology book
+10,99,2,regress_rls_dave,awesome technology book
EXPLAIN (COSTS OFF) SELECT * FROM part_document WHERE f_leak(dtitle);
QUERY PLAN
-------------------------------------------------------------------------
@@ -1058,6 +1074,11 @@ NOTICE: f_leak => awesome science fiction
9 | 11 | 1 | regress_rls_dave | awesome science fiction
(4 rows)
+COPY part_document TO stdout WITH (DELIMITER ',');
+1,11,1,regress_rls_bob,my first novel
+2,11,2,regress_rls_bob,my second novel
+6,11,1,regress_rls_carol,great science fiction
+9,11,1,regress_rls_dave,awesome science fiction
EXPLAIN (COSTS OFF) SELECT * FROM part_document WHERE f_leak(dtitle);
QUERY PLAN
----------------------------------------------------------------------------------
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index a1316c73bac..676a8b342b5 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -405,3 +405,17 @@ COPY copytest_mv(id) TO stdout WITH (header);
REFRESH MATERIALIZED VIEW copytest_mv;
COPY copytest_mv(id) TO stdout WITH (header);
DROP MATERIALIZED VIEW copytest_mv;
+
+-- Tests for COPY TO with partitioned tables.
+-- The child table pp_2 has a different column order than the root table pp.
+-- Check if COPY TO exports tuples as the root table's column order.
+CREATE TABLE pp (id int,val int) PARTITION BY RANGE (id);
+CREATE TABLE pp_1 (val int, id int) PARTITION BY RANGE (id);
+CREATE TABLE pp_2 (id int, val int) PARTITION BY RANGE (id);
+ALTER TABLE pp ATTACH PARTITION pp_1 FOR VALUES FROM (1) TO (5);
+ALTER TABLE pp ATTACH PARTITION pp_2 FOR VALUES FROM (5) TO (10);
+CREATE TABLE pp_15 PARTITION OF pp_1 FOR VALUES FROM (1) TO (5);
+CREATE TABLE pp_510 PARTITION OF pp_2 FOR VALUES FROM (5) TO (10);
+INSERT INTO pp SELECT g, 10 + g FROM generate_series(1,6) g;
+COPY pp TO stdout(header);
+DROP TABLE PP;
diff --git a/src/test/regress/sql/rowsecurity.sql b/src/test/regress/sql/rowsecurity.sql
index 21ac0ca51ee..2d1be543391 100644
--- a/src/test/regress/sql/rowsecurity.sql
+++ b/src/test/regress/sql/rowsecurity.sql
@@ -362,16 +362,19 @@ SELECT * FROM pg_policies WHERE schemaname = 'regress_rls_schema' AND tablename
SET SESSION AUTHORIZATION regress_rls_bob;
SET row_security TO ON;
SELECT * FROM part_document WHERE f_leak(dtitle) ORDER BY did;
+COPY part_document TO stdout WITH (DELIMITER ',');
EXPLAIN (COSTS OFF) SELECT * FROM part_document WHERE f_leak(dtitle);
-- viewpoint from regress_rls_carol
SET SESSION AUTHORIZATION regress_rls_carol;
SELECT * FROM part_document WHERE f_leak(dtitle) ORDER BY did;
+COPY part_document TO stdout WITH (DELIMITER ',');
EXPLAIN (COSTS OFF) SELECT * FROM part_document WHERE f_leak(dtitle);
-- viewpoint from regress_rls_dave
SET SESSION AUTHORIZATION regress_rls_dave;
SELECT * FROM part_document WHERE f_leak(dtitle) ORDER BY did;
+COPY part_document TO stdout WITH (DELIMITER ',');
EXPLAIN (COSTS OFF) SELECT * FROM part_document WHERE f_leak(dtitle);
-- pp1 ERROR
--
2.34.1
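The foreach_oid loop that the patch adds to BeginCopyTo can be read as a filter over the inheritor list: stop on a foreign table, drop partitioned tables (which hold no data themselves), and keep the rest. A minimal standalone sketch of that filtering follows. It does not use PostgreSQL's List API; the ChildRel struct, the KIND_* macros and keep_leaf_partitions() are hypothetical names invented for illustration, and only the relkind letters are taken from pg_class.relkind.

```c
#include <assert.h>
#include <stddef.h>

/* relkind letters as stored in pg_class.relkind */
#define KIND_RELATION           'r'
#define KIND_PARTITIONED_TABLE  'p'
#define KIND_FOREIGN_TABLE      'f'

/* Hypothetical stand-in for one entry of the inheritor list. */
typedef struct ChildRel
{
    unsigned int oid;
    char         relkind;
} ChildRel;

/*
 * Compact children[] in place so that only relations holding data remain.
 * Returns the number of entries kept, or -1 if a foreign table is found
 * (the patch raises an error in that case); *bad_oid then identifies it.
 * After a -1 return the array contents should not be relied on.
 */
static int
keep_leaf_partitions(ChildRel *children, size_t n, unsigned int *bad_oid)
{
    size_t kept = 0;

    assert(children != NULL || n == 0);

    for (size_t i = 0; i < n; i++)
    {
        if (children[i].relkind == KIND_FOREIGN_TABLE)
        {
            if (bad_oid != NULL)
                *bad_oid = children[i].oid;
            return -1;
        }

        /* Partitioned tables have no storage of their own; skip them. */
        if (children[i].relkind == KIND_PARTITIONED_TABLE)
            continue;

        children[kept++] = children[i];
    }

    return (int) kept;
}
```

Applied to the pp hierarchy from the regression test (pp, pp_1 and pp_2 partitioned, pp_15 and pp_510 plain), such a filter would keep only the two leaf partitions.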
On Wed, Oct 15, 2025 at 7:57 PM jian he <jian.universality@gmail.com> wrote:
On Thu, Oct 16, 2025 at 9:21 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
Please check the attached v18.
Thank you for updating the patch!
I've reviewed the patch and here is one review comment:
from->inh = false; /* apply ONLY */
+ if (get_rel_relkind(relid) == RELKIND_PARTITIONED_TABLE)
+ from->inh = true;
It's better to check rel->rd_rel->relkind instead of calling
get_rel_relkind() as it checks syscache.
I've attached a patch to fix the above and includes some cosmetic
changes. Please review it.
hi.
overall looks good to me, thanks for polishing it.
+ * However, when copying data from a partitioned table, we don't
+ * not use "ONLY", since we need to retrieve rows from its
+ * descendant tables too.
I guess here it should be
"we don't use "ONLY"
?
Right, thank you for pointing it out.
I’ve incorporated your changes into v19.
Thank you!
I think the patch is in good shape. I've slightly changed the
documentation changes and updated the commit message. I'm going to
push the attached patch, if there are no objections or further review
comments.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Attachments:
v20-0001-Support-COPY-TO-for-partitioned-tables.patch (application/octet-stream)
From 00c6d688da820f1d9794e96280d97c954e57e7be Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Thu, 16 Oct 2025 10:53:51 +0800
Subject: [PATCH v20] Support COPY TO for partitioned tables.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Previously, COPY TO command didn't support directly specifying
partitioned tables so users had to use COPY (SELECT ...) TO variant.
This commit adds direct COPY TO support for partitioned
tables, improving both usability and performance. Performance tests
show it's faster than the COPY (SELECT ...) TO variant as it avoids
the overheads of query processing and sending results to the COPY TO
command.
When used with partitioned tables, COPY TO copies the same rows as
SELECT * FROM table. Row-level security policies of the partitioned
table are applied in the same way as when executing COPY TO on a plain
table.
Author: jian he <jian.universality@gmail.com>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Melih Mutlu <m.melihmutlu@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Atsushi Torikoshi <torikoshia@oss.nttdata.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CACJufxEZt%2BG19Ors3bQUq-42-61__C%3Dy5k2wk%3DsHEFRusu7%3DiQ%40mail.gmail.com
---
.../postgres_fdw/expected/postgres_fdw.out | 5 +
contrib/postgres_fdw/sql/postgres_fdw.sql | 3 +
doc/src/sgml/ref/copy.sgml | 11 +-
src/backend/commands/copy.c | 6 +-
src/backend/commands/copyto.c | 153 ++++++++++++++----
src/test/regress/expected/copy.out | 20 +++
src/test/regress/expected/rowsecurity.out | 21 +++
src/test/regress/sql/copy.sql | 14 ++
src/test/regress/sql/rowsecurity.sql | 3 +
9 files changed, 200 insertions(+), 36 deletions(-)
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 91bbd0d8c73..cd28126049d 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -11599,6 +11599,11 @@ SELECT * FROM result_tbl ORDER BY a;
(3 rows)
DELETE FROM result_tbl;
+-- Test COPY TO when foreign table is partition
+COPY async_pt TO stdout; --error
+ERROR: cannot copy from foreign table "async_p1"
+DETAIL: Partition "async_p1" is a foreign table in partitioned table "async_pt"
+HINT: Try the COPY (SELECT ...) TO variant.
DROP FOREIGN TABLE async_p3;
DROP TABLE base_tbl3;
-- Check case where the partitioned table has local/remote partitions
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 3b7da128519..9a8f9e28135 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -3941,6 +3941,9 @@ INSERT INTO result_tbl SELECT * FROM async_pt WHERE b === 505;
SELECT * FROM result_tbl ORDER BY a;
DELETE FROM result_tbl;
+-- Test COPY TO when foreign table is partition
+COPY async_pt TO stdout; --error
+
DROP FOREIGN TABLE async_p3;
DROP TABLE base_tbl3;
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index c2d1fbc1fbe..fdc24b36bb8 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -539,13 +539,14 @@ COPY <replaceable class="parameter">count</replaceable>
<para>
<command>COPY TO</command> can be used with plain
- tables and populated materialized views.
- For example,
- <literal>COPY <replaceable class="parameter">table</replaceable>
- TO</literal> copies the same rows as
+ tables, populated materialized views, and partitioned tables.
+ For non-partitioned tables, COPY <replaceable class="parameter">table</replaceable>
+ copies the same rows as
<literal>SELECT * FROM ONLY <replaceable class="parameter">table</replaceable></literal>.
+ For partitioned tables, it copies the same rows as
+ <literal>SELECT * FROM <replaceable class="parameter">table</replaceable></literal>.
However it doesn't directly support other relation types,
- such as partitioned tables, inheritance child tables, or views.
+ such as inheritance child tables, or views.
To copy all rows from such relations, use <literal>COPY (SELECT * FROM
<replaceable class="parameter">table</replaceable>) TO</literal>.
</para>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index fae9c41db65..44020d0ae80 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -251,11 +251,15 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
* relation which we have opened and locked. Use "ONLY" so that
* COPY retrieves rows from only the target table not any
* inheritance children, the same as when RLS doesn't apply.
+ *
+ * However, when copying data from a partitioned table, we don't
+ * use "ONLY", since we need to retrieve rows from its descendant
+ * tables too.
*/
from = makeRangeVar(get_namespace_name(RelationGetNamespace(rel)),
pstrdup(RelationGetRelationName(rel)),
-1);
- from->inh = false; /* apply ONLY */
+ from->inh = (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE);
/* Build query */
select = makeNode(SelectStmt);
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index e5781155cdf..a1919c6db43 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -18,7 +18,9 @@
#include <unistd.h>
#include <sys/stat.h>
+#include "access/table.h"
#include "access/tableam.h"
+#include "catalog/pg_inherits.h"
#include "commands/copyapi.h"
#include "commands/progress.h"
#include "executor/execdesc.h"
@@ -86,6 +88,7 @@ typedef struct CopyToStateData
CopyFormatOptions opts;
Node *whereClause; /* WHERE condition (or NULL) */
+ List *partitions; /* OID list of partitions to copy data from */
/*
* Working state
@@ -116,6 +119,8 @@ static void CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot);
static void CopyAttributeOutText(CopyToState cstate, const char *string);
static void CopyAttributeOutCSV(CopyToState cstate, const char *string,
bool use_quote);
+static void CopyRelationTo(CopyToState cstate, Relation rel, Relation root_rel,
+ uint64 *processed);
/* built-in format-specific routines */
static void CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc);
@@ -602,6 +607,10 @@ EndCopy(CopyToState cstate)
pgstat_progress_end_command();
MemoryContextDelete(cstate->copycontext);
+
+ if (cstate->partitions)
+ list_free(cstate->partitions);
+
pfree(cstate);
}
@@ -643,6 +652,7 @@ BeginCopyTo(ParseState *pstate,
PROGRESS_COPY_COMMAND_TO,
0
};
+ List *children = NIL;
if (rel != NULL && rel->rd_rel->relkind != RELKIND_RELATION)
{
@@ -673,11 +683,34 @@ BeginCopyTo(ParseState *pstate,
errmsg("cannot copy from sequence \"%s\"",
RelationGetRelationName(rel))));
else if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
- ereport(ERROR,
- (errcode(ERRCODE_WRONG_OBJECT_TYPE),
- errmsg("cannot copy from partitioned table \"%s\"",
- RelationGetRelationName(rel)),
- errhint("Try the COPY (SELECT ...) TO variant.")));
+ {
+ /*
+ * Collect OIDs of relation containing data, so that later
+ * DoCopyTo can copy the data from them.
+ */
+ children = find_all_inheritors(RelationGetRelid(rel), AccessShareLock, NULL);
+
+ foreach_oid(child, children)
+ {
+ char relkind = get_rel_relkind(child);
+
+ if (relkind == RELKIND_FOREIGN_TABLE)
+ {
+ char *relation_name = get_rel_name(child);
+
+ ereport(ERROR,
+ errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("cannot copy from foreign table \"%s\"", relation_name),
+ errdetail("Partition \"%s\" is a foreign table in partitioned table \"%s\"",
+ relation_name, RelationGetRelationName(rel)),
+ errhint("Try the COPY (SELECT ...) TO variant."));
+ }
+
+ /* Exclude tables with no data */
+ if (RELKIND_HAS_PARTITIONS(relkind))
+ children = foreach_delete_current(children, child);
+ }
+ }
else
ereport(ERROR,
(errcode(ERRCODE_WRONG_OBJECT_TYPE),
@@ -713,6 +746,7 @@ BeginCopyTo(ParseState *pstate,
cstate->rel = rel;
tupDesc = RelationGetDescr(cstate->rel);
+ cstate->partitions = children;
}
else
{
@@ -722,6 +756,7 @@ BeginCopyTo(ParseState *pstate,
DestReceiver *dest;
cstate->rel = NULL;
+ cstate->partitions = NIL;
/*
* Run parse analysis and rewrite. Note this also acquires sufficient
@@ -1030,7 +1065,7 @@ DoCopyTo(CopyToState cstate)
TupleDesc tupDesc;
int num_phys_attrs;
ListCell *cur;
- uint64 processed;
+ uint64 processed = 0;
if (fe_copy)
SendCopyBegin(cstate);
@@ -1070,33 +1105,24 @@ DoCopyTo(CopyToState cstate)
if (cstate->rel)
{
- TupleTableSlot *slot;
- TableScanDesc scandesc;
-
- scandesc = table_beginscan(cstate->rel, GetActiveSnapshot(), 0, NULL);
- slot = table_slot_create(cstate->rel, NULL);
-
- processed = 0;
- while (table_scan_getnextslot(scandesc, ForwardScanDirection, slot))
+ /*
+ * If COPY TO source table is a partitioned table, then open each
+ * partition and process each individual partition.
+ */
+ if (cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
{
- CHECK_FOR_INTERRUPTS();
-
- /* Deconstruct the tuple ... */
- slot_getallattrs(slot);
-
- /* Format and send the data */
- CopyOneRowTo(cstate, slot);
+ foreach_oid(child, cstate->partitions)
+ {
+ Relation scan_rel;
- /*
- * Increment the number of processed tuples, and report the
- * progress.
- */
- pgstat_progress_update_param(PROGRESS_COPY_TUPLES_PROCESSED,
- ++processed);
+ /* We already got the lock in BeginCopyTo */
+ scan_rel = table_open(child, NoLock);
+ CopyRelationTo(cstate, scan_rel, cstate->rel, &processed);
+ table_close(scan_rel, NoLock);
+ }
}
-
- ExecDropSingleTupleTableSlot(slot);
- table_endscan(scandesc);
+ else
+ CopyRelationTo(cstate, cstate->rel, NULL, &processed);
}
else
{
@@ -1115,6 +1141,73 @@ DoCopyTo(CopyToState cstate)
return processed;
}
+/*
+ * Scans a single table and exports its rows to the COPY destination.
+ *
+ * root_rel can be set to the root table of rel if rel is a partition
+ * table so that we can send tuples in root_rel's rowtype, which might
+ * differ from individual partitions.
+*/
+static void
+CopyRelationTo(CopyToState cstate, Relation rel, Relation root_rel, uint64 *processed)
+{
+ TupleTableSlot *slot;
+ TableScanDesc scandesc;
+ AttrMap *map = NULL;
+ TupleTableSlot *root_slot = NULL;
+
+ scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+ slot = table_slot_create(rel, NULL);
+
+ /*
+ * If we are exporting partition data here, we check if converting tuples
+ * to the root table's rowtype, because a partition might have column
+ * order different than its root table.
+ */
+ if (root_rel != NULL)
+ {
+ root_slot = table_slot_create(root_rel, NULL);
+ map = build_attrmap_by_name_if_req(RelationGetDescr(root_rel),
+ RelationGetDescr(rel),
+ false);
+ }
+
+ while (table_scan_getnextslot(scandesc, ForwardScanDirection, slot))
+ {
+ TupleTableSlot *copyslot;
+
+ CHECK_FOR_INTERRUPTS();
+
+ if (map != NULL)
+ copyslot = execute_attr_map_slot(map, slot, root_slot);
+ else
+ {
+ /* Deconstruct the tuple */
+ slot_getallattrs(slot);
+ copyslot = slot;
+ }
+
+ /* Format and send the data */
+ CopyOneRowTo(cstate, copyslot);
+
+ /*
+ * Increment the number of processed tuples, and report the progress.
+ */
+ pgstat_progress_update_param(PROGRESS_COPY_TUPLES_PROCESSED,
+ ++(*processed));
+ }
+
+ ExecDropSingleTupleTableSlot(slot);
+
+ if (root_slot != NULL)
+ ExecDropSingleTupleTableSlot(root_slot);
+
+ if (map != NULL)
+ free_attrmap(map);
+
+ table_endscan(scandesc);
+}
+
/*
* Emit one row during DoCopyTo().
*/
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index ac66eb55aee..24e0f472f14 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -373,3 +373,23 @@ COPY copytest_mv(id) TO stdout WITH (header);
id
1
DROP MATERIALIZED VIEW copytest_mv;
+-- Tests for COPY TO with partitioned tables.
+-- The child table pp_2 has a different column order than the root table pp.
+-- Check if COPY TO exports tuples as the root table's column order.
+CREATE TABLE pp (id int,val int) PARTITION BY RANGE (id);
+CREATE TABLE pp_1 (val int, id int) PARTITION BY RANGE (id);
+CREATE TABLE pp_2 (id int, val int) PARTITION BY RANGE (id);
+ALTER TABLE pp ATTACH PARTITION pp_1 FOR VALUES FROM (1) TO (5);
+ALTER TABLE pp ATTACH PARTITION pp_2 FOR VALUES FROM (5) TO (10);
+CREATE TABLE pp_15 PARTITION OF pp_1 FOR VALUES FROM (1) TO (5);
+CREATE TABLE pp_510 PARTITION OF pp_2 FOR VALUES FROM (5) TO (10);
+INSERT INTO pp SELECT g, 10 + g FROM generate_series(1,6) g;
+COPY pp TO stdout(header);
+id val
+1 11
+2 12
+3 13
+4 14
+5 15
+6 16
+DROP TABLE PP;
diff --git a/src/test/regress/expected/rowsecurity.out b/src/test/regress/expected/rowsecurity.out
index 5a172c5d91c..42b78a24603 100644
--- a/src/test/regress/expected/rowsecurity.out
+++ b/src/test/regress/expected/rowsecurity.out
@@ -986,6 +986,11 @@ NOTICE: f_leak => my first satire
9 | 11 | 1 | regress_rls_dave | awesome science fiction
(4 rows)
+COPY part_document TO stdout WITH (DELIMITER ',');
+1,11,1,regress_rls_bob,my first novel
+6,11,1,regress_rls_carol,great science fiction
+9,11,1,regress_rls_dave,awesome science fiction
+4,55,1,regress_rls_bob,my first satire
EXPLAIN (COSTS OFF) SELECT * FROM part_document WHERE f_leak(dtitle);
QUERY PLAN
-------------------------------------------------------------------------
@@ -1028,6 +1033,17 @@ NOTICE: f_leak => awesome technology book
10 | 99 | 2 | regress_rls_dave | awesome technology book
(10 rows)
+COPY part_document TO stdout WITH (DELIMITER ',');
+1,11,1,regress_rls_bob,my first novel
+2,11,2,regress_rls_bob,my second novel
+6,11,1,regress_rls_carol,great science fiction
+9,11,1,regress_rls_dave,awesome science fiction
+4,55,1,regress_rls_bob,my first satire
+8,55,2,regress_rls_carol,great satire
+3,99,2,regress_rls_bob,my science textbook
+5,99,2,regress_rls_bob,my history book
+7,99,2,regress_rls_carol,great technology book
+10,99,2,regress_rls_dave,awesome technology book
EXPLAIN (COSTS OFF) SELECT * FROM part_document WHERE f_leak(dtitle);
QUERY PLAN
-------------------------------------------------------------------------
@@ -1058,6 +1074,11 @@ NOTICE: f_leak => awesome science fiction
9 | 11 | 1 | regress_rls_dave | awesome science fiction
(4 rows)
+COPY part_document TO stdout WITH (DELIMITER ',');
+1,11,1,regress_rls_bob,my first novel
+2,11,2,regress_rls_bob,my second novel
+6,11,1,regress_rls_carol,great science fiction
+9,11,1,regress_rls_dave,awesome science fiction
EXPLAIN (COSTS OFF) SELECT * FROM part_document WHERE f_leak(dtitle);
QUERY PLAN
----------------------------------------------------------------------------------
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index a1316c73bac..676a8b342b5 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -405,3 +405,17 @@ COPY copytest_mv(id) TO stdout WITH (header);
REFRESH MATERIALIZED VIEW copytest_mv;
COPY copytest_mv(id) TO stdout WITH (header);
DROP MATERIALIZED VIEW copytest_mv;
+
+-- Tests for COPY TO with partitioned tables.
+-- The child table pp_2 has a different column order than the root table pp.
+-- Check if COPY TO exports tuples as the root table's column order.
+CREATE TABLE pp (id int,val int) PARTITION BY RANGE (id);
+CREATE TABLE pp_1 (val int, id int) PARTITION BY RANGE (id);
+CREATE TABLE pp_2 (id int, val int) PARTITION BY RANGE (id);
+ALTER TABLE pp ATTACH PARTITION pp_1 FOR VALUES FROM (1) TO (5);
+ALTER TABLE pp ATTACH PARTITION pp_2 FOR VALUES FROM (5) TO (10);
+CREATE TABLE pp_15 PARTITION OF pp_1 FOR VALUES FROM (1) TO (5);
+CREATE TABLE pp_510 PARTITION OF pp_2 FOR VALUES FROM (5) TO (10);
+INSERT INTO pp SELECT g, 10 + g FROM generate_series(1,6) g;
+COPY pp TO stdout(header);
+DROP TABLE PP;
diff --git a/src/test/regress/sql/rowsecurity.sql b/src/test/regress/sql/rowsecurity.sql
index 21ac0ca51ee..2d1be543391 100644
--- a/src/test/regress/sql/rowsecurity.sql
+++ b/src/test/regress/sql/rowsecurity.sql
@@ -362,16 +362,19 @@ SELECT * FROM pg_policies WHERE schemaname = 'regress_rls_schema' AND tablename
SET SESSION AUTHORIZATION regress_rls_bob;
SET row_security TO ON;
SELECT * FROM part_document WHERE f_leak(dtitle) ORDER BY did;
+COPY part_document TO stdout WITH (DELIMITER ',');
EXPLAIN (COSTS OFF) SELECT * FROM part_document WHERE f_leak(dtitle);
-- viewpoint from regress_rls_carol
SET SESSION AUTHORIZATION regress_rls_carol;
SELECT * FROM part_document WHERE f_leak(dtitle) ORDER BY did;
+COPY part_document TO stdout WITH (DELIMITER ',');
EXPLAIN (COSTS OFF) SELECT * FROM part_document WHERE f_leak(dtitle);
-- viewpoint from regress_rls_dave
SET SESSION AUTHORIZATION regress_rls_dave;
SELECT * FROM part_document WHERE f_leak(dtitle) ORDER BY did;
+COPY part_document TO stdout WITH (DELIMITER ',');
EXPLAIN (COSTS OFF) SELECT * FROM part_document WHERE f_leak(dtitle);
-- pp1 ERROR
--
2.47.3
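CopyRelationTo in the patch above converts each partition tuple to the root table's rowtype when the column layouts differ, as with pp_1 (val, id) under pp (id, val) in the regression test. The idea of a by-name attribute map can be sketched in standalone C as follows. This is only an illustration under simplifying assumptions (same number of columns, integer values, no dropped columns); build_map_by_name() and remap_row() are hypothetical helpers and are not PostgreSQL's build_attrmap_by_name_if_req() or execute_attr_map_slot().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: for each column of the root table, find the index
 * of the same-named column in the partition and store it in map[].
 * Returns 1 if the layouts differ (a conversion is needed), 0 if they are
 * identical, and -1 if a root column has no match in the partition.
 */
static int
build_map_by_name(const char *const *root_cols,
                  const char *const *part_cols,
                  size_t ncols, size_t *map)
{
    int needed = 0;

    for (size_t i = 0; i < ncols; i++)
    {
        size_t j;

        for (j = 0; j < ncols; j++)
        {
            if (strcmp(root_cols[i], part_cols[j]) == 0)
                break;
        }

        if (j == ncols)
            return -1;          /* no matching column name */

        map[i] = j;
        if (i != j)
            needed = 1;
    }

    return needed;
}

/* Reorder one partition row into the root table's column order. */
static void
remap_row(const size_t *map, const int *part_row, int *root_row,
          size_t ncols)
{
    assert(map != NULL && part_row != NULL && root_row != NULL);

    for (size_t i = 0; i < ncols; i++)
        root_row[i] = part_row[map[i]];
}
```

With root columns {"id", "val"} and partition columns {"val", "id"}, the map comes out as {1, 0}, so a partition row stored as (11, 1) would be emitted as (1, 11). When the layouts match, the sketch reports that no conversion is needed, which corresponds to the branch in the patch that sends the scanned slot directly.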
On Thu, Oct 16, 2025 at 3:01 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Wed, Oct 15, 2025 at 7:57 PM jian he <jian.universality@gmail.com> wrote:
On Thu, Oct 16, 2025 at 9:21 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
Please check the attached v18.
Thank you for updating the patch!
I've reviewed the patch and here is one review comment:
from->inh = false; /* apply ONLY */
+ if (get_rel_relkind(relid) == RELKIND_PARTITIONED_TABLE)
+ from->inh = true;
It's better to check rel->rd_rel->relkind instead of calling
get_rel_relkind() as it checks syscache.
I've attached a patch to fix the above and includes some cosmetic
changes. Please review it.
hi.
overall looks good to me, thanks for polishing it.
+ * However, when copying data from a partitioned table, we don't
+ * not use "ONLY", since we need to retrieve rows from its
+ * descendant tables too.
I guess here it should be
"we don't use "ONLY"
?
Right, thank you for pointing it out.
I’ve incorporated your changes into v19.
Thank you!
I think the patch is in good shape. I've slightly changed the
documentation changes and updated the commit message. I'm going to
push the attached patch, if there are no objections or further review
comments.
Pushed.
Regards
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com