Add support for tuple routing to foreign partitions

Started by Etsuro Fujitaalmost 9 years ago69 messageshackers

fujita.etsuro@lab.ntt.co.jp

almost 9 years ago

Hi,

Here is a patch for $subject. This is based on Amit's original patch
(patch '0007-Tuple-routing-for-partitioned-tables-15.patch' in [1]/messages/by-id/aa874eaf-cd3b-0f75-c647-6c0ea823781e@lab.ntt.co.jp).
Main changes are:

* I like Amit's idea that for each partition that is a foreign table,
the FDW performs query planning in the same way as in non-partition
cases, but the changes to make_modifytable() in the original patch that
creates a working-copy of Query to pass to PlanForeignModify() seem
unacceptable. So, I modified the changes so that we create
more-valid-looking copies of Query and ModifyTable with the foreign
partition as target, as I said before. In relation to this, I also
allowed expand_inherited_rtentry() to build an RTE and AppendRelInfo for
each partition of a partitioned table that is an INSERT target, as
mentioned by Amit in [1]/messages/by-id/aa874eaf-cd3b-0f75-c647-6c0ea823781e@lab.ntt.co.jp, by modifying transformInsertStmt() so that we
set the inh flag for the target table's RTE to true when the target
table is a partitioned table. The partition RTEs are not only
referenced by FDWs, but used in selecting relation aliases for EXPLAIN
(see below) as well as extracting plan dependencies in setref.c so that
we force rebuilding of the plan on any change to partitions. (We do
replanning on FDW table-option changes to foreign partitions, currently.
See regression tests in the attached patch.)

* The original patch added tuple routing to foreign partitions in COPY
FROM, but I'm not sure the changes to BeginCopy() in the patch are the
right way to go. For example, the patch passes a dummy empty Query to
PlanForeignModify() to make FDWs perform query planning, but I think
FDWs would be surprised by the Query. Currently, we don't support COPY
to foreign tables, so ISTM that it's better to revisit this issue when
adding support for COPY to foreign tables. So, I dropped the COPY part.

* I modified explain.c so that EXPLAIN for an INSERT into a partitioned
table displays partitions just below the ModifyTable node, and shows
remote queries for foreign partitions in the same way as EXPLAIN for
UPDATE/DELETE cases. Here is an example:

postgres=# explain verbose insert into pt values (1), (2);
QUERY PLAN
-------------------------------------------------------------------
Insert on public.pt (cost=0.00..0.03 rows=2 width=4)
Foreign Insert on public.fp1
Remote SQL: INSERT INTO public.t1(a) VALUES ($1)
Foreign Insert on public.fp2
Remote SQL: INSERT INTO public.t2(a) VALUES ($1)
-> Values Scan on "*VALUES*" (cost=0.00..0.03 rows=2 width=4)
Output: "*VALUES*".column1
(7 rows)

Comments are welcome! Anyway, I'll add this to the next commitfest.

Best regards,
Etsuro Fujita

[1]: /messages/by-id/aa874eaf-cd3b-0f75-c647-6c0ea823781e@lab.ntt.co.jp
/messages/by-id/aa874eaf-cd3b-0f75-c647-6c0ea823781e@lab.ntt.co.jp

Amit Langote

Langote_Amit_f8@lab.ntt.co.jp

almost 9 years ago

In reply to: Etsuro Fujita (#1)

Re: Add support for tuple routing to foreign partitions

Fujita-san,

On 2017/06/29 20:20, Etsuro Fujita wrote:

Hi,

Here is a patch for $subject. This is based on Amit's original patch
(patch '0007-Tuple-routing-for-partitioned-tables-15.patch' in [1]).

Thanks a lot for taking this up.

Main changes are:

* I like Amit's idea that for each partition that is a foreign table, the
FDW performs query planning in the same way as in non-partition cases, but
the changes to make_modifytable() in the original patch that creates a
working-copy of Query to pass to PlanForeignModify() seem unacceptable.
So, I modified the changes so that we create more-valid-looking copies of
Query and ModifyTable with the foreign partition as target, as I said
before.

This sounds good.

In relation to this, I also allowed expand_inherited_rtentry() to
build an RTE and AppendRelInfo for each partition of a partitioned table
that is an INSERT target, as mentioned by Amit in [1], by modifying
transformInsertStmt() so that we set the inh flag for the target table's
RTE to true when the target table is a partitioned table.

About this part, Robert sounded a little unsure back then [1]/messages/by-id/CA+TgmoYvp6DD1jpwb9sNe08E7jSFEFky32TJU+sJ8OStHYW3nA@mail.gmail.com; his words:

"Doing more inheritance expansion early is probably not a good idea
because it likely sucks for performance, and that's especially unfortunate
if it's only needed for foreign tables."

But...

The partition
RTEs are not only referenced by FDWs, but used in selecting relation
aliases for EXPLAIN (see below) as well as extracting plan dependencies in
setref.c so that we force rebuilding of the plan on any change to
partitions. (We do replanning on FDW table-option changes to foreign
partitions, currently. See regression tests in the attached patch.)

...this part means we cannot really avoid locking at least the foreign
partitions during the planning stage, which we currently don't, as we
don't look at partitions at all before ExecSetupPartitionTupleRouting() is
called by ExecInitModifyTable().

Since we are locking *all* the partitions anyway, it may be better to
shift the cost of locking to the planner by doing the inheritance
expansion even in the insert case (for partitioned tables) and avoid
calling the lock manager again in the executor when setting up
tuple-routing data structures (that is, in ExecSetupPartitionTupleRouting).

An idea that Robert recently mentioned on the nearby "UPDATE partition
key" thread [2]/messages/by-id/CA+TgmoajC0J50=2FqnZLvB10roY+68HgFWhso=V_StkC6PWujQ@mail.gmail.com seems relevant in this regard. By that idea,
expand_inherited_rtentry(), instead of reading in the partition OIDs in
the old catalog order, will instead call
RelationBuildPartitionDispatchInfo(), which locks the partitions and
returns their OIDs in the canonical order. If we store the foreign
partition RT indexes in that order in ModifyTable.partition_rels
introduced by this patch, then the following code added by the patch could
be made more efficient (as also explained in aforementioned Robert's email):

+        foreach(l, node->partition_rels)
+        {
+            Index       rti = lfirst_int(l);
+            Oid         relid = getrelid(rti, estate->es_range_table);
+            bool        found;
+            int         j;
+
+            /* First, find the ResultRelInfo for the partition */
+            found = false;
+            for (j = 0; j < mtstate->mt_num_partitions; j++)
+            {
+                resultRelInfo = partitions + j;
+
+                if (RelationGetRelid(resultRelInfo->ri_RelationDesc) ==
relid)
+                {
+                    Assert(mtstate->mt_partition_indexes[i] == 0);
+                    mtstate->mt_partition_indexes[i] = j;
+                    found = true;
+                    break;
+                }
+            }
+            if (!found)
+                elog(ERROR, "failed to find ResultRelInfo for rangetable
index %u", rti);

* The original patch added tuple routing to foreign partitions in COPY
FROM, but I'm not sure the changes to BeginCopy() in the patch are the
right way to go. For example, the patch passes a dummy empty Query to
PlanForeignModify() to make FDWs perform query planning, but I think FDWs
would be surprised by the Query. Currently, we don't support COPY to
foreign tables, so ISTM that it's better to revisit this issue when adding
support for COPY to foreign tables. So, I dropped the COPY part.

I agree.

Thanks,
Amit

[1]: /messages/by-id/CA+TgmoYvp6DD1jpwb9sNe08E7jSFEFky32TJU+sJ8OStHYW3nA@mail.gmail.com
/messages/by-id/CA+TgmoYvp6DD1jpwb9sNe08E7jSFEFky32TJU+sJ8OStHYW3nA@mail.gmail.com
[2]: /messages/by-id/CA+TgmoajC0J50=2FqnZLvB10roY+68HgFWhso=V_StkC6PWujQ@mail.gmail.com
/messages/by-id/CA+TgmoajC0J50=2FqnZLvB10roY+68HgFWhso=V_StkC6PWujQ@mail.gmail.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Add support for tuple routing to foreign partitions

Attachments:

Attachments:

Attachments:

Attachments: