Add support for tuple routing to foreign partitions
Hi,
Here is a patch for $subject. This is based on Amit's original patch
(patch '0007-Tuple-routing-for-partitioned-tables-15.patch' in [1]/messages/by-id/aa874eaf-cd3b-0f75-c647-6c0ea823781e@lab.ntt.co.jp).
Main changes are:
* I like Amit's idea that for each partition that is a foreign table,
the FDW performs query planning in the same way as in non-partition
cases, but the changes to make_modifytable() in the original patch that
creates a working-copy of Query to pass to PlanForeignModify() seem
unacceptable. So, I modified the changes so that we create
more-valid-looking copies of Query and ModifyTable with the foreign
partition as target, as I said before. In relation to this, I also
allowed expand_inherited_rtentry() to build an RTE and AppendRelInfo for
each partition of a partitioned table that is an INSERT target, as
mentioned by Amit in [1]/messages/by-id/aa874eaf-cd3b-0f75-c647-6c0ea823781e@lab.ntt.co.jp, by modifying transformInsertStmt() so that we
set the inh flag for the target table's RTE to true when the target
table is a partitioned table. The partition RTEs are not only
referenced by FDWs, but used in selecting relation aliases for EXPLAIN
(see below) as well as extracting plan dependencies in setref.c so that
we force rebuilding of the plan on any change to partitions. (We do
replanning on FDW table-option changes to foreign partitions, currently.
See regression tests in the attached patch.)
* The original patch added tuple routing to foreign partitions in COPY
FROM, but I'm not sure the changes to BeginCopy() in the patch are the
right way to go. For example, the patch passes a dummy empty Query to
PlanForeignModify() to make FDWs perform query planning, but I think
FDWs would be surprised by the Query. Currently, we don't support COPY
to foreign tables, so ISTM that it's better to revisit this issue when
adding support for COPY to foreign tables. So, I dropped the COPY part.
* I modified explain.c so that EXPLAIN for an INSERT into a partitioned
table displays partitions just below the ModifyTable node, and shows
remote queries for foreign partitions in the same way as EXPLAIN for
UPDATE/DELETE cases. Here is an example:
postgres=# explain verbose insert into pt values (1), (2);
QUERY PLAN
-------------------------------------------------------------------
Insert on public.pt (cost=0.00..0.03 rows=2 width=4)
Foreign Insert on public.fp1
Remote SQL: INSERT INTO public.t1(a) VALUES ($1)
Foreign Insert on public.fp2
Remote SQL: INSERT INTO public.t2(a) VALUES ($1)
-> Values Scan on "*VALUES*" (cost=0.00..0.03 rows=2 width=4)
Output: "*VALUES*".column1
(7 rows)
Comments are welcome! Anyway, I'll add this to the next commitfest.
Best regards,
Etsuro Fujita
[1]: /messages/by-id/aa874eaf-cd3b-0f75-c647-6c0ea823781e@lab.ntt.co.jp
/messages/by-id/aa874eaf-cd3b-0f75-c647-6c0ea823781e@lab.ntt.co.jp
Attachments:
tuple-routing-to-foreign-partitions-v1.patchtext/plain; charset=UTF-8; name=tuple-routing-to-foreign-partitions-v1.patchDownload+552-108
Fujita-san,
On 2017/06/29 20:20, Etsuro Fujita wrote:
Hi,
Here is a patch for $subject. This is based on Amit's original patch
(patch '0007-Tuple-routing-for-partitioned-tables-15.patch' in [1]).
Thanks a lot for taking this up.
Main changes are:
* I like Amit's idea that for each partition that is a foreign table, the
FDW performs query planning in the same way as in non-partition cases, but
the changes to make_modifytable() in the original patch that creates a
working-copy of Query to pass to PlanForeignModify() seem unacceptable.
So, I modified the changes so that we create more-valid-looking copies of
Query and ModifyTable with the foreign partition as target, as I said
before.
This sounds good.
In relation to this, I also allowed expand_inherited_rtentry() to
build an RTE and AppendRelInfo for each partition of a partitioned table
that is an INSERT target, as mentioned by Amit in [1], by modifying
transformInsertStmt() so that we set the inh flag for the target table's
RTE to true when the target table is a partitioned table.
About this part, Robert sounded a little unsure back then [1]/messages/by-id/CA+TgmoYvp6DD1jpwb9sNe08E7jSFEFky32TJU+sJ8OStHYW3nA@mail.gmail.com; his words:
"Doing more inheritance expansion early is probably not a good idea
because it likely sucks for performance, and that's especially unfortunate
if it's only needed for foreign tables."
But...
The partition
RTEs are not only referenced by FDWs, but used in selecting relation
aliases for EXPLAIN (see below) as well as extracting plan dependencies in
setref.c so that we force rebuilding of the plan on any change to
partitions. (We do replanning on FDW table-option changes to foreign
partitions, currently. See regression tests in the attached patch.)
...this part means we cannot really avoid locking at least the foreign
partitions during the planning stage, which we currently don't, as we
don't look at partitions at all before ExecSetupPartitionTupleRouting() is
called by ExecInitModifyTable().
Since we are locking *all* the partitions anyway, it may be better to
shift the cost of locking to the planner by doing the inheritance
expansion even in the insert case (for partitioned tables) and avoid
calling the lock manager again in the executor when setting up
tuple-routing data structures (that is, in ExecSetupPartitionTupleRouting).
An idea that Robert recently mentioned on the nearby "UPDATE partition
key" thread [2]/messages/by-id/CA+TgmoajC0J50=2FqnZLvB10roY+68HgFWhso=V_StkC6PWujQ@mail.gmail.com seems relevant in this regard. By that idea,
expand_inherited_rtentry(), instead of reading in the partition OIDs in
the old catalog order, will instead call
RelationBuildPartitionDispatchInfo(), which locks the partitions and
returns their OIDs in the canonical order. If we store the foreign
partition RT indexes in that order in ModifyTable.partition_rels
introduced by this patch, then the following code added by the patch could
be made more efficient (as also explained in aforementioned Robert's email):
+ foreach(l, node->partition_rels)
+ {
+ Index rti = lfirst_int(l);
+ Oid relid = getrelid(rti, estate->es_range_table);
+ bool found;
+ int j;
+
+ /* First, find the ResultRelInfo for the partition */
+ found = false;
+ for (j = 0; j < mtstate->mt_num_partitions; j++)
+ {
+ resultRelInfo = partitions + j;
+
+ if (RelationGetRelid(resultRelInfo->ri_RelationDesc) ==
relid)
+ {
+ Assert(mtstate->mt_partition_indexes[i] == 0);
+ mtstate->mt_partition_indexes[i] = j;
+ found = true;
+ break;
+ }
+ }
+ if (!found)
+ elog(ERROR, "failed to find ResultRelInfo for rangetable
index %u", rti);
* The original patch added tuple routing to foreign partitions in COPY
FROM, but I'm not sure the changes to BeginCopy() in the patch are the
right way to go. For example, the patch passes a dummy empty Query to
PlanForeignModify() to make FDWs perform query planning, but I think FDWs
would be surprised by the Query. Currently, we don't support COPY to
foreign tables, so ISTM that it's better to revisit this issue when adding
support for COPY to foreign tables. So, I dropped the COPY part.
I agree.
Thanks,
Amit
[1]: /messages/by-id/CA+TgmoYvp6DD1jpwb9sNe08E7jSFEFky32TJU+sJ8OStHYW3nA@mail.gmail.com
/messages/by-id/CA+TgmoYvp6DD1jpwb9sNe08E7jSFEFky32TJU+sJ8OStHYW3nA@mail.gmail.com
[2]: /messages/by-id/CA+TgmoajC0J50=2FqnZLvB10roY+68HgFWhso=V_StkC6PWujQ@mail.gmail.com
/messages/by-id/CA+TgmoajC0J50=2FqnZLvB10roY+68HgFWhso=V_StkC6PWujQ@mail.gmail.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2017/07/05 9:13, Amit Langote wrote:
On 2017/06/29 20:20, Etsuro Fujita wrote:
In relation to this, I also allowed expand_inherited_rtentry() to
build an RTE and AppendRelInfo for each partition of a partitioned table
that is an INSERT target, as mentioned by Amit in [1], by modifying
transformInsertStmt() so that we set the inh flag for the target table's
RTE to true when the target table is a partitioned table.About this part, Robert sounded a little unsure back then [1]; his words:
"Doing more inheritance expansion early is probably not a good idea
because it likely sucks for performance, and that's especially unfortunate
if it's only needed for foreign tables."But...
The partition
RTEs are not only referenced by FDWs, but used in selecting relation
aliases for EXPLAIN (see below) as well as extracting plan dependencies in
setref.c so that we force rebuilding of the plan on any change to
partitions. (We do replanning on FDW table-option changes to foreign
partitions, currently. See regression tests in the attached patch.)...this part means we cannot really avoid locking at least the foreign
partitions during the planning stage, which we currently don't, as we
don't look at partitions at all before ExecSetupPartitionTupleRouting() is
called by ExecInitModifyTable().
Another case where we need inheritance expansion during the planning
stage would probably be INSERT .. ON CONFLICT into a partitioned table,
I guess. We don't support that yet, but if implementing that, I guess
we would probably need to look at each partition and do something like
infer_arbiter_indexes() for each partition during the planner stage.
Not sure.
Since we are locking *all* the partitions anyway, it may be better to
shift the cost of locking to the planner by doing the inheritance
expansion even in the insert case (for partitioned tables) and avoid
calling the lock manager again in the executor when setting up
tuple-routing data structures (that is, in ExecSetupPartitionTupleRouting).
We need the execution-time lock, anyway. See the comments from Robert
in [3]/messages/by-id/CA+TgmoYiwviCDRi3Zk+QuXj1r7uMu9T_kDNq+17PCWgzrbzw8A@mail.gmail.com.
An idea that Robert recently mentioned on the nearby "UPDATE partition
key" thread [2] seems relevant in this regard. By that idea,
expand_inherited_rtentry(), instead of reading in the partition OIDs in
the old catalog order, will instead call
RelationBuildPartitionDispatchInfo(), which locks the partitions and
returns their OIDs in the canonical order. If we store the foreign
partition RT indexes in that order in ModifyTable.partition_rels
introduced by this patch, then the following code added by the patch could
be made more efficient (as also explained in aforementioned Robert's email):+ foreach(l, node->partition_rels) + { + Index rti = lfirst_int(l); + Oid relid = getrelid(rti, estate->es_range_table); + bool found; + int j; + + /* First, find the ResultRelInfo for the partition */ + found = false; + for (j = 0; j < mtstate->mt_num_partitions; j++) + { + resultRelInfo = partitions + j; + + if (RelationGetRelid(resultRelInfo->ri_RelationDesc) == relid) + { + Assert(mtstate->mt_partition_indexes[i] == 0); + mtstate->mt_partition_indexes[i] = j; + found = true; + break; + } + } + if (!found) + elog(ERROR, "failed to find ResultRelInfo for rangetable index %u", rti);
Agreed. I really want to improve this based on that idea.
Thank you for the comments!
Best regards,
Etsuro Fujita
[3]: /messages/by-id/CA+TgmoYiwviCDRi3Zk+QuXj1r7uMu9T_kDNq+17PCWgzrbzw8A@mail.gmail.com
/messages/by-id/CA+TgmoYiwviCDRi3Zk+QuXj1r7uMu9T_kDNq+17PCWgzrbzw8A@mail.gmail.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Jun 29, 2017 at 6:20 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
So, I dropped the COPY part.
Ouch. I think we should try to figure out how the COPY part will be
handled before we commit to a design.
I have to admit that I'm a little bit fuzzy about why foreign insert
routing requires all of these changes. I think this patch would
benefit from being accompanied by several paragraphs of explanation
outlining the rationale for each part of the patch.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2017/07/11 6:56, Robert Haas wrote:
On Thu, Jun 29, 2017 at 6:20 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:So, I dropped the COPY part.
Ouch. I think we should try to figure out how the COPY part will be
handled before we commit to a design.
I spent some time on this. To handle that, I'd like to propose doing
something similar to \copy (frontend copy): submit a COPY query "COPY
... FROM STDIN" to the remote server and route data from a file to the
remote server. For that, I'd like to add new FDW APIs called during
CopyFrom that allow us to copy to foreign tables:
* BeginForeignCopyIn: this would be called after creating a
ResultRelInfo for the target table (or each leaf partition of the target
partitioned table) if it's a foreign table, and perform any
initialization needed before the remote copy can start. In the
postgres_fdw case, I think this function would be a good place to send
"COPY ... FROM STDIN" to the remote server.
* ExecForeignCopyInOneRow: this would be called instead of heap_insert
if the target is a foreign table, and route the tuple read from the file
by NextCopyFrom to the remote server. In the postgres_fdw case, I think
this function would convert the tuple to text format for portability,
and then send the data to the remote server using PQputCopyData.
* EndForeignCopyIn: this would be called at the bottom of CopyFrom, and
release resources such as connections to the remote server. In the
postgres_fdw case, this function would do PQputCopyEnd to terminate data
transfer.
I think that would be much more efficient than INSERTing tuples into the
remote server one by one. What do you think about that?
I have to admit that I'm a little bit fuzzy about why foreign insert
routing requires all of these changes. I think this patch would
benefit from being accompanied by several paragraphs of explanation
outlining the rationale for each part of the patch.
Will do.
Best regards,
Etsuro Fujita
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Aug 17, 2017 at 1:57 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
On 2017/07/11 6:56, Robert Haas wrote:
On Thu, Jun 29, 2017 at 6:20 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:So, I dropped the COPY part.
Ouch. I think we should try to figure out how the COPY part will be
handled before we commit to a design.I spent some time on this. To handle that, I'd like to propose doing
something similar to \copy (frontend copy): submit a COPY query "COPY ...
FROM STDIN" to the remote server and route data from a file to the remote
server. For that, I'd like to add new FDW APIs called during CopyFrom that
allow us to copy to foreign tables:
The description seems to support only COPY to a foreign table from a
file, but probably we need the support other way round as well. This
looks like a feature (support copy to and from a foreign table) to be
handled by itself.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2017/08/17 20:37, Ashutosh Bapat wrote:
On Thu, Aug 17, 2017 at 1:57 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:I spent some time on this. To handle that, I'd like to propose doing
something similar to \copy (frontend copy): submit a COPY query "COPY ...
FROM STDIN" to the remote server and route data from a file to the remote
server. For that, I'd like to add new FDW APIs called during CopyFrom that
allow us to copy to foreign tables:The description seems to support only COPY to a foreign table from a
file, but probably we need the support other way round as well. This
looks like a feature (support copy to and from a foreign table) to be
handled by itself.
Agreed. I'll consider how to handle copy-from-a-foreign-table as well.
Best regards,
Etsuro Fujita
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Aug 17, 2017 at 05:27:05PM +0900, Etsuro Fujita wrote:
On 2017/07/11 6:56, Robert Haas wrote:
On Thu, Jun 29, 2017 at 6:20 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:So, I dropped the COPY part.
Ouch. I think we should try to figure out how the COPY part will be
handled before we commit to a design.I spent some time on this. To handle that, I'd like to propose doing
something similar to \copy (frontend copy): submit a COPY query "COPY ...
FROM STDIN" to the remote server and route data from a file to the remote
server. For that, I'd like to add new FDW APIs called during CopyFrom that
allow us to copy to foreign tables:* BeginForeignCopyIn: this would be called after creating a ResultRelInfo
for the target table (or each leaf partition of the target partitioned
table) if it's a foreign table, and perform any initialization needed before
the remote copy can start. In the postgres_fdw case, I think this function
would be a good place to send "COPY ... FROM STDIN" to the remote server.* ExecForeignCopyInOneRow: this would be called instead of heap_insert if
the target is a foreign table, and route the tuple read from the file by
NextCopyFrom to the remote server. In the postgres_fdw case, I think this
function would convert the tuple to text format for portability, and then
send the data to the remote server using PQputCopyData.* EndForeignCopyIn: this would be called at the bottom of CopyFrom, and
release resources such as connections to the remote server. In the
postgres_fdw case, this function would do PQputCopyEnd to terminate data
transfer.
These primitives look good. I know it seems unlikely at first blush,
but do we know of bulk load APIs for non-PostgreSQL data stores that
this would be unable to serve?
Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2017/08/17 23:48, David Fetter wrote:
On Thu, Aug 17, 2017 at 05:27:05PM +0900, Etsuro Fujita wrote:
On 2017/07/11 6:56, Robert Haas wrote:
On Thu, Jun 29, 2017 at 6:20 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:So, I dropped the COPY part.
Ouch. I think we should try to figure out how the COPY part will be
handled before we commit to a design.I spent some time on this. To handle that, I'd like to propose doing
something similar to \copy (frontend copy): submit a COPY query "COPY ...
FROM STDIN" to the remote server and route data from a file to the remote
server. For that, I'd like to add new FDW APIs called during CopyFrom that
allow us to copy to foreign tables:* BeginForeignCopyIn: this would be called after creating a ResultRelInfo
for the target table (or each leaf partition of the target partitioned
table) if it's a foreign table, and perform any initialization needed before
the remote copy can start. In the postgres_fdw case, I think this function
would be a good place to send "COPY ... FROM STDIN" to the remote server.* ExecForeignCopyInOneRow: this would be called instead of heap_insert if
the target is a foreign table, and route the tuple read from the file by
NextCopyFrom to the remote server. In the postgres_fdw case, I think this
function would convert the tuple to text format for portability, and then
send the data to the remote server using PQputCopyData.* EndForeignCopyIn: this would be called at the bottom of CopyFrom, and
release resources such as connections to the remote server. In the
postgres_fdw case, this function would do PQputCopyEnd to terminate data
transfer.These primitives look good. I know it seems unlikely at first blush,
but do we know of bulk load APIs for non-PostgreSQL data stores that
this would be unable to serve?
Maybe I'm missing something, but I think these would allow the FDW to do
the remote copy the way it would like; in ExecForeignCopyInOneRow, for
example, the FDW could buffer tuples in a buffer memory and transmit the
buffered data to the remote server if the data size exceeds a threshold.
The naming is not so good? Suggestions are welcome.
Best regards,
Etsuro Fujita
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Aug 18, 2017 at 05:10:29PM +0900, Etsuro Fujita wrote:
On 2017/08/17 23:48, David Fetter wrote:
On Thu, Aug 17, 2017 at 05:27:05PM +0900, Etsuro Fujita wrote:
On 2017/07/11 6:56, Robert Haas wrote:
On Thu, Jun 29, 2017 at 6:20 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:So, I dropped the COPY part.
Ouch. I think we should try to figure out how the COPY part will be
handled before we commit to a design.I spent some time on this. To handle that, I'd like to propose doing
something similar to \copy (frontend copy): submit a COPY query "COPY ...
FROM STDIN" to the remote server and route data from a file to the remote
server. For that, I'd like to add new FDW APIs called during CopyFrom that
allow us to copy to foreign tables:* BeginForeignCopyIn: this would be called after creating a ResultRelInfo
for the target table (or each leaf partition of the target partitioned
table) if it's a foreign table, and perform any initialization needed before
the remote copy can start. In the postgres_fdw case, I think this function
would be a good place to send "COPY ... FROM STDIN" to the remote server.* ExecForeignCopyInOneRow: this would be called instead of heap_insert if
the target is a foreign table, and route the tuple read from the file by
NextCopyFrom to the remote server. In the postgres_fdw case, I think this
function would convert the tuple to text format for portability, and then
send the data to the remote server using PQputCopyData.* EndForeignCopyIn: this would be called at the bottom of CopyFrom, and
release resources such as connections to the remote server. In the
postgres_fdw case, this function would do PQputCopyEnd to terminate data
transfer.These primitives look good. I know it seems unlikely at first
blush, but do we know of bulk load APIs for non-PostgreSQL data
stores that this would be unable to serve?Maybe I'm missing something, but I think these would allow the FDW
to do the remote copy the way it would like; in
ExecForeignCopyInOneRow, for example, the FDW could buffer tuples in
a buffer memory and transmit the buffered data to the remote server
if the data size exceeds a threshold. The naming is not so good?
Suggestions are welcome.
The naming seems reasonable.
I was trying to figure out whether this fits well enough with the bulk
load APIs for databases other than PostgreSQL. I'm guessing it's
"well enough" based on checking MySQL, Oracle, and MS SQL Server.
Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Aug 17, 2017 at 7:58 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
The description seems to support only COPY to a foreign table from a
file, but probably we need the support other way round as well. This
looks like a feature (support copy to and from a foreign table) to be
handled by itself.Agreed. I'll consider how to handle copy-from-a-foreign-table as well.
That's a completely different feature which has nothing to do with
tuple routing.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Aug 17, 2017 at 4:27 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
I think that would be much more efficient than INSERTing tuples into the
remote server one by one. What do you think about that?
I agree, but I wonder if we ought to make it work first using the
existing APIs and then add these new APIs as an optimization.
postgres_fdw isn't the only FDW in the world, and it's better if
getting a working implementation doesn't require too many interfaces.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2017/08/18 22:41, David Fetter wrote:
On Fri, Aug 18, 2017 at 05:10:29PM +0900, Etsuro Fujita wrote:
On 2017/08/17 23:48, David Fetter wrote:
On Thu, Aug 17, 2017 at 05:27:05PM +0900, Etsuro Fujita wrote:
On 2017/07/11 6:56, Robert Haas wrote:
On Thu, Jun 29, 2017 at 6:20 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:So, I dropped the COPY part.
Ouch. I think we should try to figure out how the COPY part will be
handled before we commit to a design.I spent some time on this. To handle that, I'd like to propose doing
something similar to \copy (frontend copy): submit a COPY query "COPY ...
FROM STDIN" to the remote server and route data from a file to the remote
server. For that, I'd like to add new FDW APIs called during CopyFrom that
allow us to copy to foreign tables:* BeginForeignCopyIn: this would be called after creating a ResultRelInfo
for the target table (or each leaf partition of the target partitioned
table) if it's a foreign table, and perform any initialization needed before
the remote copy can start. In the postgres_fdw case, I think this function
would be a good place to send "COPY ... FROM STDIN" to the remote server.* ExecForeignCopyInOneRow: this would be called instead of heap_insert if
the target is a foreign table, and route the tuple read from the file by
NextCopyFrom to the remote server. In the postgres_fdw case, I think this
function would convert the tuple to text format for portability, and then
send the data to the remote server using PQputCopyData.* EndForeignCopyIn: this would be called at the bottom of CopyFrom, and
release resources such as connections to the remote server. In the
postgres_fdw case, this function would do PQputCopyEnd to terminate data
transfer.These primitives look good. I know it seems unlikely at first
blush, but do we know of bulk load APIs for non-PostgreSQL data
stores that this would be unable to serve?Maybe I'm missing something, but I think these would allow the FDW
to do the remote copy the way it would like; in
ExecForeignCopyInOneRow, for example, the FDW could buffer tuples in
a buffer memory and transmit the buffered data to the remote server
if the data size exceeds a threshold. The naming is not so good?
Suggestions are welcome.The naming seems reasonable.
I was trying to figure out whether this fits well enough with the bulk
load APIs for databases other than PostgreSQL. I'm guessing it's
"well enough" based on checking MySQL, Oracle, and MS SQL Server.
Good to know.
Best regards,
Etsuro Fujita
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2017/08/19 2:12, Robert Haas wrote:
On Thu, Aug 17, 2017 at 4:27 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:I think that would be much more efficient than INSERTing tuples into the
remote server one by one. What do you think about that?I agree, but I wonder if we ought to make it work first using the
existing APIs and then add these new APIs as an optimization.
I'm not sure that's a good idea because that once we support INSERT
tuple-routing for foreign partitions, we would have a workaround: INSERT
INTO partitioned_table SELECT * from data_table where data_table is a
foreign table defined for an input file using file_fdw.
postgres_fdw isn't the only FDW in the world, and it's better if
getting a working implementation doesn't require too many interfaces.
Agreed. My vote would be to leave the COPY part as-is until we have
these new APIs.
Best regards,
Etsuro Fujita
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Sun, Aug 20, 2017 at 11:25 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
I agree, but I wonder if we ought to make it work first using the
existing APIs and then add these new APIs as an optimization.I'm not sure that's a good idea because that once we support INSERT
tuple-routing for foreign partitions, we would have a workaround: INSERT
INTO partitioned_table SELECT * from data_table where data_table is a
foreign table defined for an input file using file_fdw.
That's true, but I don't see how it refutes the point I was trying to raise.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2017/08/26 1:43, Robert Haas wrote:
On Sun, Aug 20, 2017 at 11:25 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:I agree, but I wonder if we ought to make it work first using the
existing APIs and then add these new APIs as an optimization.I'm not sure that's a good idea because that once we support INSERT
tuple-routing for foreign partitions, we would have a workaround: INSERT
INTO partitioned_table SELECT * from data_table where data_table is a
foreign table defined for an input file using file_fdw.That's true, but I don't see how it refutes the point I was trying to raise.
My concern is: the existing APIs can really work well for any FDW to do
COPY tuple-routing? I know the original version of the patch showed the
existing APIs would work well for postgres_fdw but nothing beyond. For
example: the original version made a dummy Query/Plan only containing a
leaf partition as its result relation, and passed it to
PlanForeignModify, IIRC. That would work well for postgres_fdw because
it doesn't look at the Query/Plan except the result relation, but might
not for other FDWs that look at more stuff of the given Query/Plan and
do something based on that information. Some FDW might check to see
whether the Plan is to do INSERT .. VALUES with a single VALUES sublist
or INSERT .. VALUES with multiple VALUES sublists, for optimizing remote
INSERT, for example.
Best regards,
Etsuro Fujita
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2017/08/17 17:27, Etsuro Fujita wrote:
On 2017/07/11 6:56, Robert Haas wrote:
I have to admit that I'm a little bit fuzzy about why foreign insert
routing requires all of these changes. I think this patch would
benefit from being accompanied by several paragraphs of explanation
outlining the rationale for each part of the patch.Will do.
Here is an updated version of the patch.
* Query planning: the patch creates copies of Query/Plan with a foreign
partition as target from the original Query/Plan for each foreign
partition and invokes PlanForeignModify with those copies, to allow the
FDW to do query planning for remote INSERT with the existing API. To
make such Queries the similar way inheritance_planner does, I modified
transformInsertStmt so that the inh flag for the partitioned table's RTE
is set to true, which allows (1) expand_inherited_rtentry to build an
RTE and AppendRelInfo for each partition in the partitioned table and
(2) make_modifytable to build such Queries using adjust_appendrel_attrs
and those AppendRelInfos.
* explain.c: I modified show_modifytable_info so that we can show remote
queries for foreign partitions in EXPLAIN for INSERT into a partitioned
table the same way as for inherited UPDATE/DELETE cases. Here is an
example:
postgres=# explain verbose insert into pt values (1), (2);
QUERY PLAN
-------------------------------------------------------------------
Insert on public.pt (cost=0.00..0.03 rows=2 width=4)
Foreign Insert on public.fp1
Remote SQL: INSERT INTO public.t1(a) VALUES ($1)
Foreign Insert on public.fp2
Remote SQL: INSERT INTO public.t2(a) VALUES ($1)
-> Values Scan on "*VALUES*" (cost=0.00..0.03 rows=2 width=4)
Output: "*VALUES*".column1
(7 rows)
I think I should add more explanation about the patch, but I don't have
time today, so I'll write additional explanation in the next email.
Sorry about that.
Best regards,
Etsuro Fujita
Attachments:
tuple-routing-to-foreign-partitions-v2.patchtext/plain; charset=UTF-8; name=tuple-routing-to-foreign-partitions-v2.patchDownload+762-275
Hi Fujita-san!
On 11.09.2017 16:01, Etsuro Fujita wrote:
Here is an updated version of the patch.
* Query planning: the patch creates copies of Query/Plan with a
foreign partition as target from the original Query/Plan for each
foreign partition and invokes PlanForeignModify with those copies, to
allow the FDW to do query planning for remote INSERT with the existing
API. To make such Queries the similar way inheritance_planner does, I
modified transformInsertStmt so that the inh flag for the partitioned
table's RTE is set to true, which allows (1) expand_inherited_rtentry
to build an RTE and AppendRelInfo for each partition in the
partitioned table and (2) make_modifytable to build such Queries using
adjust_appendrel_attrs and those AppendRelInfos.* explain.c: I modified show_modifytable_info so that we can show
remote queries for foreign partitions in EXPLAIN for INSERT into a
partitioned table the same way as for inherited UPDATE/DELETE cases.
Here is an example:postgres=# explain verbose insert into pt values (1), (2);
QUERY PLAN
-------------------------------------------------------------------
Insert on public.pt (cost=0.00..0.03 rows=2 width=4)
Foreign Insert on public.fp1
Remote SQL: INSERT INTO public.t1(a) VALUES ($1)
Foreign Insert on public.fp2
Remote SQL: INSERT INTO public.t2(a) VALUES ($1)
-> Values Scan on "*VALUES*" (cost=0.00..0.03 rows=2 width=4)
Output: "*VALUES*".column1
(7 rows)I think I should add more explanation about the patch, but I don't
have time today, so I'll write additional explanation in the next
email. Sorry about that.
Could you update your patch, it isn't applied on the actual state of
master. Namely conflict in src/backend/executor/execMain.c
--
Regards,
Maksim Milyutin
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hi Maksim,
On 2017/10/02 21:37, Maksim Milyutin wrote:
On 11.09.2017 16:01, Etsuro Fujita wrote:
* Query planning: the patch creates copies of Query/Plan with a
foreign partition as target from the original Query/Plan for each
foreign partition and invokes PlanForeignModify with those copies, to
allow the FDW to do query planning for remote INSERT with the existing
API. To make such Queries the similar way inheritance_planner does, I
modified transformInsertStmt so that the inh flag for the partitioned
table's RTE is set to true, which allows (1) expand_inherited_rtentry
to build an RTE and AppendRelInfo for each partition in the
partitioned table and (2) make_modifytable to build such Queries using
adjust_appendrel_attrs and those AppendRelInfos.* explain.c: I modified show_modifytable_info so that we can show
remote queries for foreign partitions in EXPLAIN for INSERT into a
partitioned table the same way as for inherited UPDATE/DELETE cases.
Here is an example:postgres=# explain verbose insert into pt values (1), (2);
QUERY PLAN
-------------------------------------------------------------------
Insert on public.pt (cost=0.00..0.03 rows=2 width=4)
Foreign Insert on public.fp1
Remote SQL: INSERT INTO public.t1(a) VALUES ($1)
Foreign Insert on public.fp2
Remote SQL: INSERT INTO public.t2(a) VALUES ($1)
-> Values Scan on "*VALUES*" (cost=0.00..0.03 rows=2 width=4)
Output: "*VALUES*".column1
(7 rows)I think I should add more explanation about the patch, but I don't
have time today, so I'll write additional explanation in the next
email. Sorry about that.Could you update your patch, it isn't applied on the actual state of
master. Namely conflict in src/backend/executor/execMain.c
Attached is an updated version.
* As mentioned in "Query planning", the patch builds an RTE for each
partition so that the FDW can make reference to that RTE in eg,
PlanForeignModify. set_plan_references also uses such RTEs to record
plan dependencies for plancache.c to invalidate cached plans. See an
example for that added to the regression tests in postgres_fdw.
* As mentioned in "explain.c", the EXPLAIN shows all partitions beneath
the ModifyTable node. One merit of that is we can show remote queries
for foreign partitions in the output as shown above. Another one I can
think of is when reporting execution stats for triggers. Here is an
example for that:
postgres=# explain analyze insert into list_parted values (1, 'hi
there'), (2, 'hi there');
QUERY PLAN
--------------------------------------------------------------------------------------------------------------
Insert on list_parted (cost=0.00..0.03 rows=2 width=36) (actual
time=0.375..0.375 rows=0 loops=1)
Insert on part1
Insert on part2
-> Values Scan on "*VALUES*" (cost=0.00..0.03 rows=2 width=36)
(actual time=0.004..0.007 rows=2 loops=1)
Planning time: 0.089 ms
Trigger part1brtrig on part1: time=0.059 calls=1
Trigger part2brtrig on part2: time=0.021 calls=1
Execution time: 0.422 ms
(8 rows)
This would allow the user to understand easily that "part1" and "part2"
in the trigger lines are the partitions of list_parted. So, I think
it's useful to show partition info in the ModifyTable node even in the
case where a partitioned table only contains plain tables.
* The patch modifies make_modifytable and ExecInitModifyTable so that
the former can allow the FDW to construct private plan data for each
foreign partition and accumulate it all into a list, and that the latter
can perform BeginForeignModify for each partition using that plan data
stored in the list passed from make_modifytable. Other changes I made
to the executor are: (1) currently, we set the RT index for the root
partitioned table to ri_RangeTableIndex of partitions' ResultRelInfos,
but the proposed EXPLAIN requires that the partition's
ri_RangeTableIndex is set to the RT index for that partition's RTE, to
show that partition info in the output. So, I made that change.
Because of that, ExecPartitionCheck, ExecConstraints, and
ExecWithCheckOptions are adjusted accordingly. (2) I added a new member
to ResultRelInfo (ie, ri_PartitionIsValid), and modified
CheckValidResultRel so that it checks a given partition without throwing
an error and save the result in that flag so that ExecInsert determines
using that flag whether a partition chosen by ExecFindPartition is valid
for tuple-routing as proposed before.
* copy.c: I still don't think it's a good idea to implement COPY
tuple-routing for foreign partitions using PlanForeignModify. (I plan
to propose new FDW APIs for bulkload as discussed before, to implement
this feature.) So, I kept that as-is. Two things I changed there are:
(1) currently, ExecSetupPartitionTupleRouting verifies partitions using
CheckValidResultRel, but I don't think we need the CheckValidResultRel
check in the COPY case. So, I refactored that function and checked
partitions directly. (2) I think it'd be better to distinguish the
error message "cannot route inserted tuples to a foreign partition" in
the COPY case from the INSERT case, I changed it to "cannot route copied
tuples to a foreign partition".
* Fixed some bugs, revised comments, added a bit more regression tests,
and rebased the patch.
Comments are welcome!
My apologies for the very late reply.
Best regards,
Etsuro Fujita
Attachments:
tuple-routing-to-foreign-partitions-v3.patchtext/plain; charset=UTF-8; name=tuple-routing-to-foreign-partitions-v3.patchDownload+924-329
On 2017/10/26 16:40, Etsuro Fujita wrote:
Other changes I made
to the executor are: (1) currently, we set the RT index for the root
partitioned table to ri_RangeTableIndex of partitions' ResultRelInfos,
but the proposed EXPLAIN requires that the partition's
ri_RangeTableIndex is set to the RT index for that partition's RTE, to
show that partition info in the output. So, I made that change.
One thing I forgot to mention is: that would be also required to call
BeginForeignModify, ExecForeignInsert, and EndForeignModify with the
partition's ResultRelInfo.
I updated docs in doc/src/sgml/ddl.sgml the same way as [1]/messages/by-id/b19a8e2b-e000-f592-3e0b-3e90ba0fa816@lab.ntt.co.jp. (I used
only the ddl.sgml change proposed by [1]/messages/by-id/b19a8e2b-e000-f592-3e0b-3e90ba0fa816@lab.ntt.co.jp, not all the changes.) I did
some cleanup as well. Please find attached an updated version of the patch.
Best regards,
Etsuro Fujita
[1]: /messages/by-id/b19a8e2b-e000-f592-3e0b-3e90ba0fa816@lab.ntt.co.jp
/messages/by-id/b19a8e2b-e000-f592-3e0b-3e90ba0fa816@lab.ntt.co.jp