Push down more full joins in postgres_fdw
Hi,
The postgres_fdw join pushdown in 9.6 is great, but it can't handle full
joins on relations with restrictions. The reason for that is,
postgres_fdw can't support deparsing subqueries when creating a remote
join query. So, by adding the deparsing logic to it, I removed that
limitation. Attached is a patch for that, which is based on the patch I
posted before [1]/messages/by-id/5710D7E2.7010302@lab.ntt.co.jp.
Also, by the deparsing logic, I improved the handling of PHVs so that
PHVs are evaluated remotely if it's safe, as discussed in [2]/messages/by-id/b4549406-909f-7d15-dc34-499835a8f0b3@lab.ntt.co.jp. Here is
an example from the regression test, which pushes down multiple levels
of PHVs to the remote:
EXPLAIN (VERBOSE, COSTS OFF)
SELECT ss.*, ft2.c1 FROM ft2 LEFT JOIN (SELECT 13, q.a, ft2.c1 FROM
(SELECT 13 FROM ft1 WHERE c1 = 13) q(a) RIGHT JOIN ft2 ON (q.a = ft2.c1)
WHERE ft2.\
c1 BETWEEN 10 AND 15) ss(f1, f2, f3) ON (ft2.c1 = ss.f1) WHERE ft2.c1
BETWEEN 10 AND 15;
\
QUERY PLAN \
-------------------------------------------------------------------------------------------------------------------------------------------------------\
-------------------------------------------------------------------------------------------------------------------------------------------------------\
---------------------------------------------------------------
Foreign Scan
Output: (13), (13), ft2_1.c1, ft2.c1
Relations: (public.ft2) LEFT JOIN ((public.ft2) LEFT JOIN (public.ft1))
Remote SQL: SELECT r1."C 1", ss2.c1, ss2.c2, ss2.c3 FROM ("S 1"."T
1" r1 LEFT JOIN (SELECT r5."C 1", 13, ss1.c1 FROM ("S 1"."T 1" r5 LEFT
JOIN (SELE\
CT 13 FROM "S 1"."T 1" WHERE (("C 1" = 13))) ss1(c1) ON (((13 = r5."C
1")))) WHERE ((r5."C 1" >= 10)) AND ((r5."C 1" <= 15))) ss2(c1, c2, c3)
ON (((r1.\
"C 1" = 13)))) WHERE ((r1."C 1" >= 10)) AND ((r1."C 1" <= 15))
(4 rows)
Also, in the same way as the PHV handling, I improved the handling of
whole-row reference (and system columns other than ctid), as proposed in
[3]: /messages/by-id/a1fa1c4c-bf96-8ea5-cff5-85b927298e73@lab.ntt.co.jp
conditions any more, as shown in the below example:
EXPLAIN (VERBOSE, COSTS OFF)
SELECT t1.ctid, t1, t2, t1.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)
ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
\
QUERY PLAN \
-------------------------------------------------------------------------------------------------------------------------------------------------------\
-------------------------------------------------------------------------------------------------------------------------------------------------------\
-----------------------------------------------------------
Limit
Output: t1.ctid, t1.*, t2.*, t1.c1, t1.c3
-> Foreign Scan
Output: t1.ctid, t1.*, t2.*, t1.c1, t1.c3
Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
Remote SQL: SELECT ss1.c1, ss1.c2, ss1.c3, ss1.c4, ss2.c1
FROM ((SELECT ctid, ROW("C 1", c2, c3, c4, c5, c6, c7, c8), "C 1", c3
FROM "S 1"."T \
1") ss1(c1, c2, c3, c4) INNER JOIN (SELECT ROW("C 1", c2, c3, c4, c5,
c6, c7, c8), "C 1" FROM "S 1"."T 1") ss2(c1, c2) ON (TRUE)) WHERE
((ss1.c3 = ss2.\
c2)) ORDER BY ss1.c4 ASC NULLS LAST, ss1.c3 ASC NULLS LAST
(6 rows)
I'll add this to the next CF. Comments are welcome!
Best regards,
Etsuro Fujita
[1]: /messages/by-id/5710D7E2.7010302@lab.ntt.co.jp
[2]: /messages/by-id/b4549406-909f-7d15-dc34-499835a8f0b3@lab.ntt.co.jp
/messages/by-id/b4549406-909f-7d15-dc34-499835a8f0b3@lab.ntt.co.jp
[3]: /messages/by-id/a1fa1c4c-bf96-8ea5-cff5-85b927298e73@lab.ntt.co.jp
/messages/by-id/a1fa1c4c-bf96-8ea5-cff5-85b927298e73@lab.ntt.co.jp
Attachments:
postgres-fdw-more-full-join-pushdown.patchbinary/octet-stream; name=postgres-fdw-more-full-join-pushdown.patchDownload+827-553
Thanks Fujita-san for working on this. I took a quick look at the patch.
Here are some quick comments.
1. deparsePlaceHolderVar looks odd - each of the deparse* function is named
as deparse + <name of the parser node the string would parse into>.
PlaceHolderVar is not a parser node, so no string is going to be parsed as
a PlaceHolderVar. May be you want to name it as
deparseExerFromPlaceholderVar or something like that.
2. You will need to check phlevelsup member while assessing whether a PHV
is safe to push down.
3. I think registerAlias stuff should happen really at the time of creating
paths and should be stored in fpinfo. Without that it will be computed
every time we deparse the query. This means every time we try to EXPLAIN
the query at various levels in the join tree and once for the query to be
sent to the foreign server.
4. The changes related to retrieved_attrs look unrelated to the patch.
Either your patch should use the current method of handling retrieved_attrs
or there should be a separate patch for retrieved_attrs changes. May be you
want to take a look at the discussion in join pushdown thread as to why we
assume retrieved_attrs to be non-NIL always.
5. The blocks related to inner and outer relations in
deparseFromExprForRel() look same. We should probably separate that code
out into a function and call it at two places.
6.
! if (is_placeholder)
! errcontext("placeholder expression at position %d in select list",
! errpos->cur_attno);
A user wouldn't know what a placeholder expression is, there is no such
term in the documentation. We have to device a better way to provide an
error context for an expression in general.
On Fri, Aug 19, 2016 at 2:50 PM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>
wrote:
Hi,
The postgres_fdw join pushdown in 9.6 is great, but it can't handle full
joins on relations with restrictions. The reason for that is, postgres_fdw
can't support deparsing subqueries when creating a remote join query. So,
by adding the deparsing logic to it, I removed that limitation. Attached
is a patch for that, which is based on the patch I posted before [1].Also, by the deparsing logic, I improved the handling of PHVs so that PHVs
are evaluated remotely if it's safe, as discussed in [2]. Here is an
example from the regression test, which pushes down multiple levels of PHVs
to the remote:EXPLAIN (VERBOSE, COSTS OFF)
SELECT ss.*, ft2.c1 FROM ft2 LEFT JOIN (SELECT 13, q.a, ft2.c1 FROM
(SELECT 13 FROM ft1 WHERE c1 = 13) q(a) RIGHT JOIN ft2 ON (q.a = ft2.c1)
WHERE ft2.\
c1 BETWEEN 10 AND 15) ss(f1, f2, f3) ON (ft2.c1 = ss.f1) WHERE ft2.c1
BETWEEN 10 AND 15;
\
QUERY PLAN \------------------------------------------------------------
------------------------------------------------------------
-------------------------------\
------------------------------------------------------------
------------------------------------------------------------
-------------------------------\
---------------------------------------------------------------
Foreign Scan
Output: (13), (13), ft2_1.c1, ft2.c1
Relations: (public.ft2) LEFT JOIN ((public.ft2) LEFT JOIN (public.ft1))
Remote SQL: SELECT r1."C 1", ss2.c1, ss2.c2, ss2.c3 FROM ("S 1"."T 1"
r1 LEFT JOIN (SELECT r5."C 1", 13, ss1.c1 FROM ("S 1"."T 1" r5 LEFT JOIN
(SELE\
CT 13 FROM "S 1"."T 1" WHERE (("C 1" = 13))) ss1(c1) ON (((13 = r5."C
1")))) WHERE ((r5."C 1" >= 10)) AND ((r5."C 1" <= 15))) ss2(c1, c2, c3) ON
(((r1.\
"C 1" = 13)))) WHERE ((r1."C 1" >= 10)) AND ((r1."C 1" <= 15))
(4 rows)Also, in the same way as the PHV handling, I improved the handling of
whole-row reference (and system columns other than ctid), as proposed in
[3]. We don't need the ""CASE WHEN ... IS NOT NULL THEN ROW(...) END"
conditions any more, as shown in the below example:EXPLAIN (VERBOSE, COSTS OFF)
SELECT t1.ctid, t1, t2, t1.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)
ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
\
QUERY PLAN \------------------------------------------------------------
------------------------------------------------------------
-------------------------------\
------------------------------------------------------------
------------------------------------------------------------
-------------------------------\
-----------------------------------------------------------
Limit
Output: t1.ctid, t1.*, t2.*, t1.c1, t1.c3
-> Foreign Scan
Output: t1.ctid, t1.*, t2.*, t1.c1, t1.c3
Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
Remote SQL: SELECT ss1.c1, ss1.c2, ss1.c3, ss1.c4, ss2.c1 FROM
((SELECT ctid, ROW("C 1", c2, c3, c4, c5, c6, c7, c8), "C 1", c3 FROM "S
1"."T \
1") ss1(c1, c2, c3, c4) INNER JOIN (SELECT ROW("C 1", c2, c3, c4, c5, c6,
c7, c8), "C 1" FROM "S 1"."T 1") ss2(c1, c2) ON (TRUE)) WHERE ((ss1.c3 =
ss2.\
c2)) ORDER BY ss1.c4 ASC NULLS LAST, ss1.c3 ASC NULLS LAST
(6 rows)I'll add this to the next CF. Comments are welcome!
Best regards,
Etsuro Fujita[1] /messages/by-id/5710D7E2.7010302@lab.ntt.co.jp
[2] /messages/by-id/b4549406-909f-7d15-dc3
4-499835a8f0b3%40lab.ntt.co.jp
[3] /messages/by-id/a1fa1c4c-bf96-8ea5-cff
5-85b927298e73%40lab.ntt.co.jp--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Hi Ashutosh,
On 2016/08/22 15:49, Ashutosh Bapat wrote:
1. deparsePlaceHolderVar looks odd - each of the deparse* function is
named as deparse + <name of the parser node the string would parse
into>. PlaceHolderVar is not a parser node, so no string is going to be
parsed as a PlaceHolderVar. May be you want to name it as
deparseExerFromPlaceholderVar or something like that.
The name "deparseExerFromPlaceholderVar" seems long to me. How about
"deparsePlaceHolderExpr"?
2. You will need to check phlevelsup member while assessing whether a
PHV is safe to push down.
Good catch! In addition to that, I added another check that the eval_at
set for the PHV should be included in the relids set for the foreign
relation. I think that would make the shippability check more robust.
3. I think registerAlias stuff should happen really at the time of
creating paths and should be stored in fpinfo. Without that it will be
computed every time we deparse the query. This means every time we try
to EXPLAIN the query at various levels in the join tree and once for the
query to be sent to the foreign server.
Hmm. I think the overhead in calculating aliases would be negligible
compared to the overhead in explaining each remote query for costing or
sending the remote query for execution. So, I created aliases in the
same way as remote params created and stored into params_list in
deparse_expr_cxt. I'm not sure it's worth complicating the code.
4. The changes related to retrieved_attrs look unrelated to the patch.
Either your patch should use the current method of handling
retrieved_attrs or there should be a separate patch for retrieved_attrs
changes. May be you want to take a look at the discussion in join
pushdown thread as to why we assume retrieved_attrs to be non-NIL always.
OK, I removed those changes from the patch.
5. The blocks related to inner and outer relations in
deparseFromExprForRel() look same. We should probably separate that code
out into a function and call it at two places.
Done.
6.
! if (is_placeholder)
! errcontext("placeholder expression at position %d in select list",
! errpos->cur_attno);
A user wouldn't know what a placeholder expression is, there is no such
term in the documentation. We have to device a better way to provide an
error context for an expression in general.
Though I proposed that, I don't think that it's that important to let
users know that the expression is from a PHV. How about just saying
"expression", not "placeholder expression", so that we have the message
"expression at position %d in select list" in the context?
Attached is an updated version of the patch.
Other changes:
* Add a bit more regression test
* Revise code/comments
* Cleanups
Thanks for the comments!
Best regards,
Etsuro Fujita
Attachments:
postgres-fdw-more-full-join-pushdown-v2.patchbinary/octet-stream; name=postgres-fdw-more-full-join-pushdown-v2.patchDownload+881-555
On Fri, Sep 2, 2016 at 3:55 PM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>
wrote:
Hi Ashutosh,
On 2016/08/22 15:49, Ashutosh Bapat wrote:
1. deparsePlaceHolderVar looks odd - each of the deparse* function is
named as deparse + <name of the parser node the string would parse
into>. PlaceHolderVar is not a parser node, so no string is going to be
parsed as a PlaceHolderVar. May be you want to name it as
deparseExerFromPlaceholderVar or something like that.The name "deparseExerFromPlaceholderVar" seems long to me. How about
"deparsePlaceHolderExpr"?
There isn't any node with name PlaceHolderExpr.
2. You will need to check phlevelsup member while assessing whether a
PHV is safe to push down.
Good catch! In addition to that, I added another check that the eval_at
set for the PHV should be included in the relids set for the foreign
relation. I think that would make the shippability check more robust.
Thanks.
3. I think registerAlias stuff should happen really at the time of
creating paths and should be stored in fpinfo. Without that it will be
computed every time we deparse the query. This means every time we try
to EXPLAIN the query at various levels in the join tree and once for the
query to be sent to the foreign server.Hmm. I think the overhead in calculating aliases would be negligible
compared to the overhead in explaining each remote query for costing or
sending the remote query for execution. So, I created aliases in the same
way as remote params created and stored into params_list in
deparse_expr_cxt. I'm not sure it's worth complicating the code.
We defer remote parameter creation till deparse time since the the
parameter numbers are dependent upon the sequence in which we deparse the
query. Creating them at the time of path creation and storing them in
fpinfo is not possible because a. those present in the joining relations
will conflict with each other and need some kind of serialization at the
time of deparsing b. those defer for differently parameterized paths or
paths with different pathkeys. We don't defer it because it's very light on
performance.
That's not true with the alias information. As long as we detect which
relations need subqueries, their RTIs are enough to create unique aliases
e.g. if a relation involves RTIs 1, 2, 3 corresponding subquery can have
alias r123 without conflicting with any other alias.
However minimum overhead it might have, we will deparse the query every
time we create a path involving that kind of relation i.e. for different
pathkeys, different parameterization and different joins in which that
relation participates in. This number can be significant. We want to avoid
this overhead if we can.
5. The blocks related to inner and outer relations in
deparseFromExprForRel() look same. We should probably separate that code
out into a function and call it at two places.Done.
Thanks. I see you have created function deparseOperandRelation() for the
same. I guess, this should be renamed as deparseRangeTblRef() since it will
create RangeTblRef node when parsed back.
6.
! if (is_placeholder)
! errcontext("placeholder expression at position %d in select
list",
! errpos->cur_attno);
A user wouldn't know what a placeholder expression is, there is no such
term in the documentation. We have to device a better way to provide an
error context for an expression in general.Though I proposed that, I don't think that it's that important to let
users know that the expression is from a PHV. How about just saying
"expression", not "placeholder expression", so that we have the message
"expression at position %d in select list" in the context?Hmm, that's better than placeholder expression, but not as explanatory as
it should be since we won't be printing the "select list" in the error
context and user won't have a clue as to what error context is about. But I
don't have any good suggestions right now. May be we should print the whole
expression? But that can be very long, I don't know.
This patch tries to do two things at a time 1. support join pushdown for
FULL join when the joining relations have remote conditions 2. better
support for fetching placeholder vars, whole row references and some system
columns. To make reviews easy, I think we should split the patch into two
1. supporting subqueries to be deparsed with support for one of the above
(I will suggest FULL join support) 2. support for the other.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
On 2016/09/06 22:07, Ashutosh Bapat wrote:
On Fri, Sep 2, 2016 at 3:55 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp <mailto:fujita.etsuro@lab.ntt.co.jp>> wrote:
On 2016/08/22 15:49, Ashutosh Bapat wrote:
1. deparsePlaceHolderVar looks odd - each of the deparse*
function is
named as deparse + <name of the parser node the string would parse
into>. PlaceHolderVar is not a parser node, so no string is
going to be
parsed as a PlaceHolderVar. May be you want to name it as
deparseExerFromPlaceholderVar or something like that.
The name "deparseExerFromPlaceholderVar" seems long to me. How
about "deparsePlaceHolderExpr"?
There isn't any node with name PlaceHolderExpr.
I'll rename it to "deparseExerInPlaceholderVar", then.
3. I think registerAlias stuff should happen really at the time of
creating paths and should be stored in fpinfo. Without that it
will be
computed every time we deparse the query. This means every time
we try
to EXPLAIN the query at various levels in the join tree and once
for the
query to be sent to the foreign server.
Hmm. I think the overhead in calculating aliases would be
negligible compared to the overhead in explaining each remote query
for costing or sending the remote query for execution. So, I
created aliases in the same way as remote params created and stored
into params_list in deparse_expr_cxt. I'm not sure it's worth
complicating the code.
We defer remote parameter creation till deparse time since the the
parameter numbers are dependent upon the sequence in which we deparse
the query. Creating them at the time of path creation and storing them
in fpinfo is not possible because a. those present in the joining
relations will conflict with each other and need some kind of
serialization at the time of deparsing b. those defer for differently
parameterized paths or paths with different pathkeys. We don't defer it
because it's very light on performance.That's not true with the alias information. As long as we detect which
relations need subqueries, their RTIs are enough to create unique
aliases e.g. if a relation involves RTIs 1, 2, 3 corresponding subquery
can have alias r123 without conflicting with any other alias.
Hmm. But another concern about the approach of generating an subselect
alias for a path, if needed, at the path creation time would be that the
path might be rejected by add_path, which would result in totally
wasting cycles for generating the subselect alias for the path.
However minimum overhead it might have, we will deparse the query every
time we create a path involving that kind of relation i.e. for different
pathkeys, different parameterization and different joins in which that
relation participates in. This number can be significant. We want to
avoid this overhead if we can.
Exactly. As I said above, I think the overhead would be negligible
compared to the overhead in explaining each remote query for costing or
the overhead in sending the final remote query for execution, though.
5. The blocks related to inner and outer relations in
deparseFromExprForRel() look same. We should probably separate
that code
out into a function and call it at two places.
Done.
I see you have created function deparseOperandRelation() for the
same. I guess, this should be renamed as deparseRangeTblRef() since it
will create RangeTblRef node when parsed back.
OK, if there no opinions of others, I'll rename it to the name proposed
by you, "deparseRangeTblRef".
6.
! if (is_placeholder)
! errcontext("placeholder expression at position %d in
select list",
! errpos->cur_attno);
A user wouldn't know what a placeholder expression is, there is
no such
term in the documentation. We have to device a better way to
provide an
error context for an expression in general.
Though I proposed that, I don't think that it's that important to
let users know that the expression is from a PHV. How about just
saying "expression", not "placeholder expression", so that we have
the message "expression at position %d in select list" in the context?
Hmm, that's better than placeholder expression, but not as explanatory
as it should be since we won't be printing the "select list" in the
error context and user won't have a clue as to what error context is
about.
I don't think so. Consider an example of the conversion error message,
which is from the regression test:
SELECT ft1.c1, ft2.c2, ft1 FROM ft1, ft2 WHERE ft1.c1 = ft2.c1 AND
ft1.c1 = 1;
ERROR: invalid input syntax for integer: "foo"
CONTEXT: whole-row reference to foreign table "ft1"
As shown in the example, the error message is displayed under a remote
query for execution. So, ISTM it's reasonable to print something like
"expression at position %d in select list" in the context if an
expression in a PHV.
But I don't have any good suggestions right now. May be we should
print the whole expression? But that can be very long, I don't know.
ISTM that it's a bit too expensive to print the whole expression in the
error context.
This patch tries to do two things at a time 1. support join pushdown for
FULL join when the joining relations have remote conditions 2. better
support for fetching placeholder vars, whole row references and some
system columns. To make reviews easy, I think we should split the patch
into two 1. supporting subqueries to be deparsed with support for one of
the above (I will suggest FULL join support) 2. support for the other.
OK, will try.
Best regards,
Etsuro Fujita
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2016/09/08 19:51, Etsuro Fujita wrote:
On 2016/09/06 22:07, Ashutosh Bapat wrote:
This patch tries to do two things at a time 1. support join pushdown for
FULL join when the joining relations have remote conditions 2. better
support for fetching placeholder vars, whole row references and some
system columns. To make reviews easy, I think we should split the patch
into two 1. supporting subqueries to be deparsed with support for one of
the above (I will suggest FULL join support) 2. support for the other.
OK, will try.
I extracted #1 from the patch. Attached is a patch for that. If that
patch is reasonable, I'll create a patch for #2 on top of it.
Comments welcome!
Best regards,
Etsuro Fujita
Attachments:
postgres-fdw-more-full-join-pushdown-v3.patchbinary/octet-stream; name=postgres-fdw-more-full-join-pushdown-v3.patchDownload+511-270
On 2016/09/09 21:35, Etsuro Fujita wrote:
On 2016/09/08 19:51, Etsuro Fujita wrote:
On 2016/09/06 22:07, Ashutosh Bapat wrote:
This patch tries to do two things at a time 1. support join pushdown for
FULL join when the joining relations have remote conditions 2. better
support for fetching placeholder vars, whole row references and some
system columns. To make reviews easy, I think we should split the patch
into two 1. supporting subqueries to be deparsed with support for one of
the above (I will suggest FULL join support) 2. support for the other.
OK, will try.
I extracted #1 from the patch. Attached is a patch for that. If that
patch is reasonable, I'll create a patch for #2 on top of it.
Attached is a patch for #2. In that patch I fixed some bugs and added a
bit more comments. For testing, please apply the patch for #1 first.
Best regards,
Etsuro Fujita
Attachments:
postgres-fdw-phv-pushdown-v1.patchbinary/octet-stream; name=postgres-fdw-phv-pushdown-v1.patchDownload+461-357
3. I think registerAlias stuff should happen really at the time of
creating paths and should be stored in fpinfo. Without that it
will be
computed every time we deparse the query. This means every time
we try
to EXPLAIN the query at various levels in the join tree and once
for the
query to be sent to the foreign server.Hmm. I think the overhead in calculating aliases would be
negligible compared to the overhead in explaining each remote query
for costing or sending the remote query for execution. So, I
created aliases in the same way as remote params created and stored
into params_list in deparse_expr_cxt. I'm not sure it's worth
complicating the code.We defer remote parameter creation till deparse time since the the
parameter numbers are dependent upon the sequence in which we deparse
the query. Creating them at the time of path creation and storing them
in fpinfo is not possible because a. those present in the joining
relations will conflict with each other and need some kind of
serialization at the time of deparsing b. those defer for differently
parameterized paths or paths with different pathkeys. We don't defer it
because it's very light on performance.That's not true with the alias information. As long as we detect which
relations need subqueries, their RTIs are enough to create unique
aliases e.g. if a relation involves RTIs 1, 2, 3 corresponding subquery
can have alias r123 without conflicting with any other alias.Hmm. But another concern about the approach of generating an subselect
alias for a path, if needed, at the path creation time would be that the
path might be rejected by add_path, which would result in totally wasting
cycles for generating the subselect alias for the path.
A path may get rejected but the relation is not rejected. The alias applies
to a relation and its targetlist, which will remain same for all paths
created for that relation, esp when it's a base relation or join. So, AFAIU
that work never gets wasted. Also, for costing paths with
use_remote_estimate, we deparse the query, which builds the alias
information again and again for very path. That's much worse than building
it once for a given relation.
However minimum overhead it might have, we will deparse the query every
time we create a path involving that kind of relation i.e. for different
pathkeys, different parameterization and different joins in which that
relation participates in. This number can be significant. We want to
avoid this overhead if we can.Exactly. As I said above, I think the overhead would be negligible
compared to the overhead in explaining each remote query for costing or the
overhead in sending the final remote query for execution, though.
It won't remain minimal as the number of paths created increases,
increasing the number of times a query is deparsed. We deparse query every
time time we cost a path for a relation with use_remote_estimates true. As
we try to push down more and more stuff, we will create more paths and
deparse the query more time.
Also, that makes the interface better. Right now, in your patch, you have
changed the order of deparsing in the existing code, so that the aliases
are registered while deparsing FROM clause and before any Var nodes are
deparsed. If we create aliases at the time of path creation, only once in
GetForeignJoinPaths or GetForeignPaths as appropriate, that would require
less code churn and would save some CPU cycles as well.
6.
! if (is_placeholder)
! errcontext("placeholder expression at position %d in
select list",
! errpos->cur_attno);
A user wouldn't know what a placeholder expression is, there is
no such
term in the documentation. We have to device a better way to
provide an
error context for an expression in general.Though I proposed that, I don't think that it's that important to
let users know that the expression is from a PHV. How about just
saying "expression", not "placeholder expression", so that we have
the message "expression at position %d in select list" in the context?Hmm, that's better than placeholder expression, but not as explanatory
as it should be since we won't be printing the "select list" in the
error context and user won't have a clue as to what error context is
about.I don't think so. Consider an example of the conversion error message,
which is from the regression test:SELECT ft1.c1, ft2.c2, ft1 FROM ft1, ft2 WHERE ft1.c1 = ft2.c1 AND
ft1.c1 = 1;
ERROR: invalid input syntax for integer: "foo"
CONTEXT: whole-row reference to foreign table "ft1"As shown in the example, the error message is displayed under a remote
query for execution. So, ISTM it's reasonable to print something like
"expression at position %d in select list" in the context if an expression
in a PHV.
I missed it. Sorry. Looks ok.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
On Tue, Sep 6, 2016 at 9:07 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
That's not true with the alias information. As long as we detect which
relations need subqueries, their RTIs are enough to create unique aliases
e.g. if a relation involves RTIs 1, 2, 3 corresponding subquery can have
alias r123 without conflicting with any other alias.
What if RTI 123 is also used in the query?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Sep 13, 2016 at 10:28 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Sep 6, 2016 at 9:07 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:That's not true with the alias information. As long as we detect which
relations need subqueries, their RTIs are enough to create unique aliases
e.g. if a relation involves RTIs 1, 2, 3 corresponding subquery can have
alias r123 without conflicting with any other alias.What if RTI 123 is also used in the query?
Good catch. I actually meant some combination of 1, 2 and 3, which is
unique for a join between r1, r2 and r3. How about r1_2_3 or
r1_r2_r3?
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Sep 13, 2016 at 11:38 PM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
On Tue, Sep 13, 2016 at 10:28 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Sep 6, 2016 at 9:07 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:That's not true with the alias information. As long as we detect which
relations need subqueries, their RTIs are enough to create unique aliases
e.g. if a relation involves RTIs 1, 2, 3 corresponding subquery can have
alias r123 without conflicting with any other alias.What if RTI 123 is also used in the query?
Good catch. I actually meant some combination of 1, 2 and 3, which is
unique for a join between r1, r2 and r3. How about r1_2_3 or
r1_r2_r3?
Sure, something like that can work, but if you have a big enough join
the identifier might get too long. I'm not sure why it wouldn't work
to just use the lowest RTI involved in the join, though; the others
won't appear anywhere else at that query level.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Sep 14, 2016 at 8:52 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Sep 13, 2016 at 11:38 PM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:On Tue, Sep 13, 2016 at 10:28 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Sep 6, 2016 at 9:07 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:That's not true with the alias information. As long as we detect which
relations need subqueries, their RTIs are enough to create unique aliases
e.g. if a relation involves RTIs 1, 2, 3 corresponding subquery can have
alias r123 without conflicting with any other alias.What if RTI 123 is also used in the query?
Good catch. I actually meant some combination of 1, 2 and 3, which is
unique for a join between r1, r2 and r3. How about r1_2_3 or
r1_r2_r3?Sure, something like that can work, but if you have a big enough join
the identifier might get too long. I'm not sure why it wouldn't work
to just use the lowest RTI involved in the join, though; the others
won't appear anywhere else at that query level.
Yes, that will work too and is much more preferred than long r1_2_3.
The idea is to come with a unique alias name from RTIs involved in
that relation. Thinking loudly, r1_2_3 is more descriptive to debug
issues. It tells that the subquery it refers to covers RTIs 1, 2 and
3. That information may be quite helpful to understand the remote SQL.
r1 on the other hand can refer to relation with RTI 1 or a join
relation where least RTI is 1. That can be solved by using s<RTI> for
subquery and r<RTI> for a bare relation. But it still doesn't tell us
which all relations are involved.
But since the subquery is part of remote SQL and we have to take a
look at it to find out meaning of individual columns, that benefit may
be smaller compared to the convenience of smaller alias. So +1 for
using the smallest RTI to indicate a subquery.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2016/09/13 14:17, Ashutosh Bapat wrote:
But another concern about the approach of generating an
subselect alias for a path, if needed, at the path creation time
would be that the path might be rejected by add_path, which would
result in totally wasting cycles for generating the subselect alias
for the path.
A path may get rejected but the relation is not rejected. The alias
applies to a relation and its targetlist, which will remain same for all
paths created for that relation, esp when it's a base relation or join.
So, AFAIU that work never gets wasted.
I think there is no guarantee that add_path won't reject foreign join
paths. The possibility would be very low, though.
However minimum overhead it might have, we will deparse the
query every
time we create a path involving that kind of relation i.e. for
different
pathkeys, different parameterization and different joins in
which that
relation participates in. This number can be significant. We want to
avoid this overhead if we can.
Exactly. As I said above, I think the overhead would be negligible
compared to the overhead in explaining each remote query for costing
or the overhead in sending the final remote query for execution, though.
It won't remain minimal as the number of paths created increases,
increasing the number of times a query is deparsed. We deparse query
every time time we cost a path for a relation with use_remote_estimates
true. As we try to push down more and more stuff, we will create more
paths and deparse the query more time.
Also, that makes the interface better. Right now, in your patch, you
have changed the order of deparsing in the existing code, so that the
aliases are registered while deparsing FROM clause and before any Var
nodes are deparsed. If we create aliases at the time of path creation,
only once in GetForeignJoinPaths or GetForeignPaths as appropriate, that
would require less code churn and would save some CPU cycles as well.
Agreed. Will fix.
Best regards,
Etsuro Fujita
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2016/09/15 15:29, Ashutosh Bapat wrote:
On Wed, Sep 14, 2016 at 8:52 PM, Robert Haas <robertmhaas@gmail.com> wrote:
I'm not sure why it wouldn't work
to just use the lowest RTI involved in the join, though; the others
won't appear anywhere else at that query level.
So +1 for
using the smallest RTI to indicate a subquery.
+1 for the general idea.
ISTM that the use of the same RTI for subqueries in multi-levels in a
remote SQL makes the SQL a bit difficult to read. How about using the
position of the join rel in join_rel_list, (more precisely, the position
plus list_length(root->parse->rtable)), instead?
Best regards,
Etsuro Fujita
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Sep 26, 2016 at 1:05 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
On 2016/09/15 15:29, Ashutosh Bapat wrote:
On Wed, Sep 14, 2016 at 8:52 PM, Robert Haas <robertmhaas@gmail.com>
wrote:I'm not sure why it wouldn't work
to just use the lowest RTI involved in the join, though; the others
won't appear anywhere else at that query level.So +1 for
using the smallest RTI to indicate a subquery.+1 for the general idea.
ISTM that the use of the same RTI for subqueries in multi-levels in a remote
SQL makes the SQL a bit difficult to read. How about using the position of
the join rel in join_rel_list, (more precisely, the position plus
list_length(root->parse->rtable)), instead?
We switch to hash table to maintain the join RelOptInfos when the
number of joins grows larger, where the position won't make much
sense. We might differentiate between a base relation alias and
subquery alias by using different prefixes like "r" and "s" resp.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2016/09/26 18:06, Ashutosh Bapat wrote:
On Mon, Sep 26, 2016 at 1:05 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
ISTM that the use of the same RTI for subqueries in multi-levels in a remote
SQL makes the SQL a bit difficult to read. How about using the position of
the join rel in join_rel_list, (more precisely, the position plus
list_length(root->parse->rtable)), instead?
We switch to hash table to maintain the join RelOptInfos when the
number of joins grows larger, where the position won't make much
sense.
That's right, but we still store the joinrel into join_rel_list after
creating that hash table. That hash table is just for fast lookup. See
build_join_rel.
We might differentiate between a base relation alias and
subquery alias by using different prefixes like "r" and "s" resp.
Hmm. My concern about that would the recursive use of "s" with the same
RTI as subquery aliases for different level subqueries in a single
remote SQL. For example, if those subqueries include a base rel with
RTI=1, then "s1" would be used again and again within that SQL. That
would be logically correct, but seems confusing to me.
Best regards,
Etsuro Fujita
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Sep 26, 2016 at 4:06 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
On 2016/09/26 18:06, Ashutosh Bapat wrote:
On Mon, Sep 26, 2016 at 1:05 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:ISTM that the use of the same RTI for subqueries in multi-levels in a
remote
SQL makes the SQL a bit difficult to read. How about using the position
of
the join rel in join_rel_list, (more precisely, the position plus
list_length(root->parse->rtable)), instead?We switch to hash table to maintain the join RelOptInfos when the
number of joins grows larger, where the position won't make much
sense.That's right, but we still store the joinrel into join_rel_list after
creating that hash table. That hash table is just for fast lookup. See
build_join_rel.
Sorry, I wasn't clear in my reply. As the list grows, it will take
longer to locate the RelOptInfo of the given join. Doing that for
creating an alias seems an overkill.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2016/09/26 20:20, Ashutosh Bapat wrote:
On Mon, Sep 26, 2016 at 4:06 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:On 2016/09/26 18:06, Ashutosh Bapat wrote:
On Mon, Sep 26, 2016 at 1:05 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
ISTM that the use of the same RTI for subqueries in multi-levels in a
remote
SQL makes the SQL a bit difficult to read. How about using the position
of
the join rel in join_rel_list, (more precisely, the position plus
list_length(root->parse->rtable)), instead?
We switch to hash table to maintain the join RelOptInfos when the
number of joins grows larger, where the position won't make much
sense.
That's right, but we still store the joinrel into join_rel_list after
creating that hash table.
As the list grows, it will take
longer to locate the RelOptInfo of the given join. Doing that for
creating an alias seems an overkill.
The join rel is appended to the end of the list, so I was thinking to
get the position info by list_length during postgresGetForeignJoinPaths.
Best regards,
Etsuro Fujita
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Sep 27, 2016 at 8:48 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
On 2016/09/26 20:20, Ashutosh Bapat wrote:
On Mon, Sep 26, 2016 at 4:06 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:On 2016/09/26 18:06, Ashutosh Bapat wrote:
On Mon, Sep 26, 2016 at 1:05 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:ISTM that the use of the same RTI for subqueries in multi-levels in a
remote
SQL makes the SQL a bit difficult to read. How about using the
position
of
the join rel in join_rel_list, (more precisely, the position plus
list_length(root->parse->rtable)), instead?We switch to hash table to maintain the join RelOptInfos when the
number of joins grows larger, where the position won't make much
sense.That's right, but we still store the joinrel into join_rel_list after
creating that hash table.As the list grows, it will take
longer to locate the RelOptInfo of the given join. Doing that for
creating an alias seems an overkill.The join rel is appended to the end of the list, so I was thinking to get
the position info by list_length during postgresGetForeignJoinPaths.
That's true only when the paths are being added to a newly created
joinrel. But that's not true always. We may add paths with different
joining order to an existing joinrel, in which case list_length would
not give its position. Am I missing something?
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2016/09/27 13:33, Ashutosh Bapat wrote:
I wrote:
ISTM that the use of the same RTI for subqueries in multi-levels in a
remote
SQL makes the SQL a bit difficult to read. How about using the
position
of
the join rel in join_rel_list, (more precisely, the position plus
list_length(root->parse->rtable)), instead?
I wrote:
The join rel is appended to the end of the list, so I was thinking to get
the position info by list_length during postgresGetForeignJoinPaths.
That's true only when the paths are being added to a newly created
joinrel. But that's not true always. We may add paths with different
joining order to an existing joinrel, in which case list_length would
not give its position. Am I missing something?
I think you are right, but postgresGetForeignJoinPaths only allows us to
add a foreign join path to a newly created joinrel. The reason is
because that routine skips all its work after the first call for that
joinrel, by checking to see if joinrel->fdw_private is not NULL. So, I
think it's reasonable to get the position by list_length in that routine.
Best regards,
Etsuro Fujita
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers