Pulling up direct-correlated ANY_SUBLINK
Hi,
Currently we do not try to pull up sub-select of type ANY_SUBLINK if it
refers to any Vars of the parent query, as indicated in the code snippet
below:
JoinExpr *
convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink,
Relids available_rels)
{
...
if (contain_vars_of_level((Node *) subselect, 1))
return NULL;
Why do we have this check?
Can we try to pull up direct-correlated ANY SubLink with the help of
LATERAL? That is, do the pull up in the same way as uncorrelated ANY
SubLink, by adding the SubLink's subselect to the query's rangetable,
but explicitly set LATERAL for its RangeTblEntry, like:
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -1226,13 +1226,6 @@ convert_ANY_sublink_to_join(PlannerInfo *root,
SubLink *sublink,
Assert(sublink->subLinkType == ANY_SUBLINK);
/*
- * The sub-select must not refer to any Vars of the parent query.
(Vars of
- * higher levels should be okay, though.)
- */
- if (contain_vars_of_level((Node *) subselect, 1))
- return NULL;
-
- /*
* The test expression must contain some Vars of the parent query,
else
* it's not gonna be a join. (Note that it won't have Vars
referring to
* the subquery, rather Params.)
@@ -1267,7 +1260,7 @@ convert_ANY_sublink_to_join(PlannerInfo *root,
SubLink *sublink,
rte = addRangeTableEntryForSubquery(pstate,
subselect,
makeAlias("ANY_subquery", NIL),
- false,
+ contain_vars_of_level((Node *)
subselect, 1), /* lateral */
false);
parse->rtable = lappend(parse->rtable, rte);
rtindex = list_length(parse->rtable);
By this way, we can convert the query:
select * from a where a.i = ANY(select i from b where a.j > b.j);
To:
select * from a SEMI JOIN lateral(select * from b where a.j > b.j) sub on
a.i = sub.i;
Does this make sense?
Thanks
Richard
Richard Guo <riguo@pivotal.io> wrote:
Can we try to pull up direct-correlated ANY SubLink with the help of
LATERAL?
By this way, we can convert the query:
select * from a where a.i = ANY(select i from b where a.j > b.j);
To:
select * from a SEMI JOIN lateral(select * from b where a.j > b.j)
sub on a.i = sub.i;
I tried this a few years ago. This is where the problems started:
/messages/by-id/1386716782.5203.YahooMailNeo@web162905.mail.bf1.yahoo.com
I'm not sure I remember enough, but the problem has something to do with one
possible strategy to plan SEMI JOIN: unique-ify the inner path and then
perform plain INNER JOIN instead.
I think the problemm was that the WHERE clause of the subquery didn't
participate in the SEMI JOIN evaluation and was used as filter instead. Thus
the clause's Vars were not used in unique keys of the inner path and so the
SEMI JOIN didn't work well.
--
Antonin Houska
Web: https://www.cybertec-postgresql.com
Richard Guo <riguo@pivotal.io> writes:
Currently we do not try to pull up sub-select of type ANY_SUBLINK if it
refers to any Vars of the parent query, as indicated in the code snippet
below:
if (contain_vars_of_level((Node *) subselect, 1))
return NULL;
Why do we have this check?
Because the result would not be a join between two independent tables.
Can we try to pull up direct-correlated ANY SubLink with the help of
LATERAL?
Perhaps. But what's the argument that you'd end up with a better
plan? LATERAL pretty much constrains things to use a nestloop,
so I'm not sure there's anything fundamentally different.
regards, tom lane
Hi Antonin,
On Tue, Sep 10, 2019 at 4:31 PM Antonin Houska <ah@cybertec.at> wrote:
Richard Guo <riguo@pivotal.io> wrote:
Can we try to pull up direct-correlated ANY SubLink with the help of
LATERAL?By this way, we can convert the query:
select * from a where a.i = ANY(select i from b where a.j > b.j);
To:
select * from a SEMI JOIN lateral(select * from b where a.j > b.j)
sub on a.i = sub.i;I tried this a few years ago. This is where the problems started:
/messages/by-id/1386716782.5203.YahooMailNeo@web162905.mail.bf1.yahoo.com
Thank you for this link. Good to know the discussions years ago.
I'm not sure I remember enough, but the problem has something to do with
one
possible strategy to plan SEMI JOIN: unique-ify the inner path and then
perform plain INNER JOIN instead.I think the problemm was that the WHERE clause of the subquery didn't
participate in the SEMI JOIN evaluation and was used as filter instead.
Thus
the clause's Vars were not used in unique keys of the inner path and so the
SEMI JOIN didn't work well.
This used to be a problem until it was fixed by commit 043f6ff0, which
includes the postponed qual from a LATERAL subquery into the quals seen
by make_outerjoininfo().
Thanks
Richard
Tom Lane <tgl@sss.pgh.pa.us> wrote:
Can we try to pull up direct-correlated ANY SubLink with the help of
LATERAL?Perhaps. But what's the argument that you'd end up with a better
plan? LATERAL pretty much constrains things to use a nestloop,
so I'm not sure there's anything fundamentally different.
I think that subquery pull-up is most beneficial when the queries (both the
subquery and the upper query) contain more than a few tables. In such a case,
if only a few tables reference the upper query (or if just a single one does),
the constraints imposed by LATERAL might be less significant.
Nevertheless, I don't know how to overcome the problems that I mentioned
upthread.
--
Antonin Houska
Web: https://www.cybertec-postgresql.com
Hi Antonin,
On Wed, Sep 11, 2019 at 3:25 PM Antonin Houska <ah@cybertec.at> wrote:
Nevertheless, I don't know how to overcome the problems that I mentioned
upthread.
Do you mean the problem "the WHERE clause of the subquery didn't
participate in the SEMI JOIN evaluation"? Good news is it has been fixed
by commit 043f6ff0 as I mentioned upthread.
Thanks
Richard
Hi Tom,
On Tue, Sep 10, 2019 at 9:48 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Can we try to pull up direct-correlated ANY SubLink with the help of
LATERAL?Perhaps. But what's the argument that you'd end up with a better
plan? LATERAL pretty much constrains things to use a nestloop,
so I'm not sure there's anything fundamentally different.
This is a point I didn't think of. In that case if the pull-up mostly
results in a nestloop then we cannot make sure we will get a better
plan. Thank you for pointing it out.
Thanks
Richard
Richard Guo <riguo@pivotal.io> wrote:
On Wed, Sep 11, 2019 at 3:25 PM Antonin Houska <ah@cybertec.at>
wrote:Nevertheless, I don't know how to overcome the problems that I
mentioned
upthread.Do you mean the problem "the WHERE clause of the subquery didn't
participate in the SEMI JOIN evaluation"? Good news is it has been
fixed
by commit 043f6ff0 as I mentioned upthread.
Do you say that my old patch (rebased) no longer breaks the regression tests?
(I noticed your other email in the thread which seems to indicate that you're
no lo longer interested to work on the feature, but asking out of curiosity.)
--
Antonin Houska
Web: https://www.cybertec-postgresql.com
On Thu, Sep 12, 2019 at 11:35 PM Antonin Houska <ah@cybertec.at> wrote:
Richard Guo <riguo@pivotal.io> wrote:
On Wed, Sep 11, 2019 at 3:25 PM Antonin Houska <ah@cybertec.at>
wrote:Nevertheless, I don't know how to overcome the problems that I
mentioned
upthread.Do you mean the problem "the WHERE clause of the subquery didn't
participate in the SEMI JOIN evaluation"? Good news is it has been
fixed
by commit 043f6ff0 as I mentioned upthread.Do you say that my old patch (rebased) no longer breaks the regression
tests?
I think so.
(I noticed your other email in the thread which seems to indicate that
you're
no lo longer interested to work on the feature, but asking out of
curiosity.)
Tom pointed out that even if we pull up the subquery with the help of
LATERAL, we cannot make sure we will end up with a better plan, since
LATERAL pretty much constrains things to use a nestloop. Hmm, I think
what he said makes sense.
Thanks
Richard
On Tue, Sep 17, 2019 at 4:41 PM Richard Guo <riguo@pivotal.io> wrote:
On Thu, Sep 12, 2019 at 11:35 PM Antonin Houska <ah@cybertec.at> wrote:
Richard Guo <riguo@pivotal.io> wrote:
On Wed, Sep 11, 2019 at 3:25 PM Antonin Houska <ah@cybertec.at>
wrote:Nevertheless, I don't know how to overcome the problems that I
mentioned
upthread.Do you mean the problem "the WHERE clause of the subquery didn't
participate in the SEMI JOIN evaluation"? Good news is it has been
fixed
by commit 043f6ff0 as I mentioned upthread.Do you say that my old patch (rebased) no longer breaks the regression
tests?I think so.
(I noticed your other email in the thread which seems to indicate that
you're
no lo longer interested to work on the feature, but asking out of
curiosity.)Tom pointed out that even if we pull up the subquery with the help of
LATERAL, we cannot make sure we will end up with a better plan, since
LATERAL pretty much constrains things to use a nestloop. Hmm, I think
what he said makes sense.Thanks
Richard
Even if we can't do this for the general case, I still think we can do
something
for some special cases, for example:
select count(*) from j1 where (i) *in* (select i from j2 where* j2.im5 =
j1.im5*);
can be converted to
select count(*) from t1 where (i, im5) in (select i, im5 from j2);
The conversion can happen just before the convert_ANY_sublink_to_join.
@@ -399,6 +483,7 @@ pull_up_sublinks_qual_recurse(PlannerInfo *root, Node
*node,
/* Is it a convertible ANY or EXISTS clause? */
if (sublink->subLinkType == ANY_SUBLINK)
{
+ reduce_sublink_correlation_exprs(root, sublink);
if ((j = convert_ANY_sublink_to_join(root, sublink,
available_rels1)) != NULL)
However we have to do lots of pre checking for this, the below is
something I can think for now.
1). It must be an in-subquery.
2). The op in correlation_expr must be a mergeable op.
3). no aggregation call in subquery->targetList and subquery->havingQual.
4). no limit/offset cause.
5). No volatile function involved for safety.
I can't tell how often it is, I just run into this by my own and search the
maillist and get only 1 report [1]/messages/by-id/3691.1342650974@sss.pgh.pa.us. Is it something worth doing or do we
have
a better strategy to handle it? Thanks!
[1]: /messages/by-id/3691.1342650974@sss.pgh.pa.us
--
Best Regards
Andy Fan
On Tue, Sep 10, 2019 at 9:49 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Can we try to pull up direct-correlated ANY SubLink with the help of
LATERAL?Perhaps. But what's the argument that you'd end up with a better
plan? LATERAL pretty much constrains things to use a nestloop,
so I'm not sure there's anything fundamentally different.
Sorry for the noise on replying such an old thread, but recently I
realized that pulling up direct-correlated ANY SubLink with LATERAL may
cause another problem that we cannot find any legal join order due to
the constraints imposed by LATERAL references. Below is an example:
select * from A where exists
(select * from B where A.i in (select C.i from C where C.j = B.j));
For this query, after we converting the ANY SubLink to a LATERAL
subquery, the subquery cannot be pulled up further into the parent query
because its qual contains lateral reference to 'B', which is outside a
higher semi join. When considering the join of 'A' and the 'subquery',
we decide it's not legal due to the LATERAL reference. As a result, we
end up with not finding any legal join order for level 2.
Thanks
Richard
Hi All:
On Tue, Sep 10, 2019 at 9:49 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Richard Guo <riguo@pivotal.io> writes:
Currently we do not try to pull up sub-select of type ANY_SUBLINK if it
refers to any Vars of the parent query, as indicated in the code snippet
below:
if (contain_vars_of_level((Node *) subselect, 1))
return NULL;
Why do we have this check?Because the result would not be a join between two independent tables.
I think this situation is caused by we pull-up the ANY-sublink with 2
steps, the first step is to pull up the sublink as a subquery, and the
next step is to pull up the subquery if it is allowed. The benefits of
this method are obvious, pulling up the subquery has more requirements,
even if we can just finish the first step, we still get huge benefits.
However the bad stuff happens if varlevelsup = 1 involves, step 1 fails!
The solution here is to use the lateral join to overcome the two
independent tables, the issue of this solution includes:
1. LATERAL pretty much constrains things to use a nestloop like below,
but this reason is questioned since if we can pull-up the subquery, if so
the
constraint gone. [1]/messages/by-id/60794.1568104308@antos
2. It has something with unique-ify the inner path. [2]/messages/by-id/60794.1568104308@antos , but Richard
thought
it should be fixed but without an agreement for all people [3]/messages/by-id/CAN_9JTzqa-3RmHAw3wZv099Rk8xX480YdEvGy+JAdVw8dTnHRA@mail.gmail.com.
3. Richard [4]/messages/by-id/CAMbWs49cvkF9akbomz_fCCKS=D5TY=4KGHEQcfHPZCXS1GVhkA@mail.gmail.com found it would fail to get a plan for some query. (the
error is
below per my testing)
ERROR: failed to build any 3-way joins
So back to the root cause of this issue, IIUC, if varlevelsup = 1
involves,
can we just bypass the 2-steps method, just as what we do for EXISTS
sublinks? If so, we just need to convert the ANY-SUBLINK to EXIST-SUBLINK
under the case.
The attached is the one commit which includes the 2 methods discussed
here, controlled by different GUC separately, for easy testing. Per my
test,
Query 2 choosed the Unique Join with the IN-to-EXISTS method, but not
with the Lateral method, and query 3 raises error with the lateral method,
but not with the IN-to-EXISTS method.
[1]: /messages/by-id/60794.1568104308@antos
/messages/by-id/60794.1568104308@antos
[2]: /messages/by-id/60794.1568104308@antos
[3]: /messages/by-id/CAN_9JTzqa-3RmHAw3wZv099Rk8xX480YdEvGy+JAdVw8dTnHRA@mail.gmail.com
/messages/by-id/CAN_9JTzqa-3RmHAw3wZv099Rk8xX480YdEvGy+JAdVw8dTnHRA@mail.gmail.com
[4]: /messages/by-id/CAMbWs49cvkF9akbomz_fCCKS=D5TY=4KGHEQcfHPZCXS1GVhkA@mail.gmail.com
/messages/by-id/CAMbWs49cvkF9akbomz_fCCKS=D5TY=4KGHEQcfHPZCXS1GVhkA@mail.gmail.com
Can we try to pull up direct-correlated ANY SubLink with the help of
LATERAL?Perhaps. But what's the argument that you'd end up with a better
plan? LATERAL pretty much constrains things to use a nestloop,
so I'm not sure there's anything fundamentally different.regards, tom lane
--
Best Regards
Andy Fan
Attachments:
v1-0001-2-methods-for-Pulling-up-direct-correlated-ANY_SU.patchapplication/x-patch; name=v1-0001-2-methods-for-Pulling-up-direct-correlated-ANY_SU.patchDownload
From 1bdc3b9098851ab1f6897f497daf90c601eca27e Mon Sep 17 00:00:00 2001
From: Andy Fan <zhihui.fan1213@gmail.com>
Date: Sun, 9 Oct 2022 17:47:23 +0800
Subject: [PATCH v1] 2 methods for Pulling up direct-correlated ANY_SUBLINK
---
.../postgres_fdw/expected/postgres_fdw.out | 24 +-
src/backend/optimizer/plan/subselect.c | 7 +-
src/backend/optimizer/prep/prepjointree.c | 291 ++++++++++++++++++
src/backend/utils/misc/guc_tables.c | 23 ++
src/test/regress/expected/join.out | 35 ++-
src/test/regress/expected/subselect.out | 121 ++++++++
src/test/regress/sql/subselect.sql | 61 ++++
7 files changed, 531 insertions(+), 31 deletions(-)
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index b3c8ce01313..870590df562 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -11377,19 +11377,19 @@ CREATE FOREIGN TABLE foreign_tbl2 () INHERITS (foreign_tbl)
SERVER loopback OPTIONS (table_name 'base_tbl');
EXPLAIN (VERBOSE, COSTS OFF)
SELECT a FROM base_tbl WHERE a IN (SELECT a FROM foreign_tbl);
- QUERY PLAN
------------------------------------------------------------------------------
- Seq Scan on public.base_tbl
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Nested Loop Semi Join
Output: base_tbl.a
- Filter: (SubPlan 1)
- SubPlan 1
- -> Result
- Output: base_tbl.a
- -> Append
- -> Async Foreign Scan on public.foreign_tbl foreign_tbl_1
- Remote SQL: SELECT NULL FROM public.base_tbl
- -> Async Foreign Scan on public.foreign_tbl2 foreign_tbl_2
- Remote SQL: SELECT NULL FROM public.base_tbl
+ -> Seq Scan on public.base_tbl
+ Output: base_tbl.a, base_tbl.b
+ Filter: (base_tbl.a IS NOT NULL)
+ -> Materialize
+ -> Append
+ -> Async Foreign Scan on public.foreign_tbl foreign_tbl_1
+ Remote SQL: SELECT NULL FROM public.base_tbl
+ -> Async Foreign Scan on public.foreign_tbl2 foreign_tbl_2
+ Remote SQL: SELECT NULL FROM public.base_tbl
(11 rows)
SELECT a FROM base_tbl WHERE a IN (SELECT a FROM foreign_tbl);
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 92e33385842..c1e8d8e0c2c 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -1264,6 +1264,8 @@ inline_cte_walker(Node *node, inline_cte_walker_context *context)
* subselect to the query's rangetable, so that it can be referenced in
* the JoinExpr's rarg.
*/
+extern bool enable_lateral_pullup;
+
JoinExpr *
convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink,
Relids available_rels)
@@ -1286,7 +1288,7 @@ convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink,
* The sub-select must not refer to any Vars of the parent query. (Vars of
* higher levels should be okay, though.)
*/
- if (contain_vars_of_level((Node *) subselect, 1))
+ if (contain_vars_of_level((Node *) subselect, 1) && !enable_lateral_pullup)
return NULL;
/*
@@ -1324,7 +1326,8 @@ convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink,
nsitem = addRangeTableEntryForSubquery(pstate,
subselect,
makeAlias("ANY_subquery", NIL),
- false,
+ /* lateral */
+ enable_lateral_pullup ? contain_vars_of_level((Node *) subselect, 1) : false,
false);
rte = nsitem->p_rte;
parse->rtable = lappend(parse->rtable, rte);
diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c
index 41c7066d90a..a8470be1c20 100644
--- a/src/backend/optimizer/prep/prepjointree.c
+++ b/src/backend/optimizer/prep/prepjointree.c
@@ -130,6 +130,7 @@ static void substitute_phv_relids(Node *node,
static void fix_append_rel_relids(List *append_rel_list, int varno,
Relids subrelids);
static Node *find_jointree_node_for_rel(Node *jtnode, int relid);
+static void transform_IN_sublink_to_EXIST_recurse(PlannerInfo *root, Node *jtnode);
/*
@@ -256,6 +257,291 @@ replace_empty_jointree(Query *parse)
parse->jointree->fromlist = list_make1(rtr);
}
+
+/*
+ * sublink_should_be_transformed
+ *
+ * Check if the sublink is a simple IN sublink.
+ */
+static bool
+sublink_should_be_transformed(PlannerInfo *root, SubLink *sublink)
+{
+ const char * operName;
+ Query *subselect = (Query *) sublink->subselect;
+ Node *whereClause;
+
+ if (sublink->subLinkType != ANY_SUBLINK || list_length(sublink->operName) != 1)
+ return false;
+
+ operName = linitial_node(String, sublink->operName)->sval;
+
+ if (strcmp(operName, "=") != 0)
+ return false;
+
+ if (!contain_vars_of_level((Node *) subselect, 1))
+ /* The existing framework can handle it well, so no action needed. */
+ return false;
+
+ if (list_length(subselect->rtable) == 0)
+ /* Get rid of this special cases for safety. */
+ return false;
+
+ /*
+ * The below checks are similar with the checks in convert_EXISTS_sublink_to_join
+ * so that that the new EXISTS-Sublink can be pull-up later.
+ */
+ if (subselect->cteList)
+ return false;
+
+ /* See simplify_EXISTS_query */
+
+ if (subselect->commandType != CMD_SELECT ||
+ subselect->setOperations ||
+ subselect->hasAggs ||
+ subselect->groupingSets ||
+ subselect->hasWindowFuncs ||
+ subselect->hasTargetSRFs ||
+ subselect->hasModifyingCTE ||
+ subselect->havingQual ||
+ subselect->limitOffset ||
+ subselect->rowMarks)
+ return false;
+
+ if (subselect->limitCount)
+ {
+ /*
+ * The LIMIT clause has not yet been through eval_const_expressions,
+ * so we have to apply that here. It might seem like this is a waste
+ * of cycles, since the only case plausibly worth worrying about is
+ * "LIMIT 1" ... but what we'll actually see is "LIMIT int8(1::int4)",
+ * so we have to fold constants or we're not going to recognize it.
+ */
+ Node *node = eval_const_expressions(root, subselect->limitCount);
+ Const *limit;
+
+ /* Might as well update the query if we simplified the clause. */
+
+ /* XXX: we do have the modification, but it is not harmful. */
+ subselect->limitCount = node;
+
+ if (!IsA(node, Const))
+ return false;
+
+ limit = (Const *) node;
+ Assert(limit->consttype == INT8OID);
+ if (!limit->constisnull && DatumGetInt64(limit->constvalue) <= 0)
+ return false;
+ }
+
+ whereClause = subselect->jointree->quals;
+ subselect->jointree->quals = NULL;
+
+ if (contain_vars_of_level((Node *) subselect, 1) ||
+ !contain_vars_of_level(whereClause, 1) ||
+ contain_volatile_functions(whereClause))
+ {
+ subselect->jointree->quals = whereClause;
+ return false;
+ }
+
+ /* Restore the whereClause. */
+ subselect->jointree->quals = whereClause;
+
+ /*
+ * No need to check the avaiable_rels like convert_EXISTS_sublink_to_join
+ * since here we just transform the sublinks type, no SEMIJOIN related.
+ */
+ return true;
+}
+
+/*
+ * replace_param_sublink_node
+ *
+ * Replace the PARAM_SUBLINK in src with target.
+ */
+static Node *
+replace_param_sublink_node(Node *src, Node *target)
+{
+
+ if (IsA(src, Param))
+ return target;
+
+ switch (nodeTag(src))
+ {
+ case T_RelabelType:
+ {
+ RelabelType *rtype = castNode(RelabelType, src);
+ rtype->arg = (Expr *)target;
+ break;
+ }
+ case T_FuncExpr:
+ {
+ FuncExpr *fexpr = castNode(FuncExpr, src);
+ Assert(list_length(fexpr->args));
+ Assert(linitial_node(Param, fexpr->args)->paramkind == PARAM_SUBLINK);
+ linitial(fexpr->args) = target;
+ break;
+ }
+ default:
+ {
+ Assert(false);
+ elog(ERROR, "Unexpected node type: %d", nodeTag(src));
+ }
+ }
+
+ /* src is in-placed updated. */
+ return src;
+
+}
+
+/*
+ * transform_IN_sublink_to_EXIST_qual_recurse
+ *
+ * Transform IN-SUBLINK with level-1 var to EXISTS-SUBLINK recursly.
+ */
+static Node *
+transform_IN_sublink_to_EXIST_qual_recurse(PlannerInfo *root, Node *node)
+{
+ if (node == NULL)
+ return NULL;
+
+ if (IsA(node, SubLink))
+ {
+ SubLink *sublink = (SubLink *) node;
+ Query *subselect = (Query *)sublink->subselect;
+ FromExpr *sub_fromexpr;
+
+ Assert(IsA(subselect, Query));
+
+ if (!sublink_should_be_transformed(root, sublink))
+ {
+ /* We still need to transform the subselect->jointree. */
+ transform_IN_sublink_to_EXIST_recurse(root, (Node *) subselect->jointree);
+ return node;
+ }
+
+ /*
+ * Make up the push-downed node from sublink->testexpr, the testexpr
+ * will be set to NULL later, so in-place update would be OK.
+ */
+ IncrementVarSublevelsUp(sublink->testexpr, 1, 0);
+
+ if (is_andclause(sublink->testexpr))
+ {
+ BoolExpr *and_expr = castNode(BoolExpr, sublink->testexpr);
+ ListCell *l1, *l2;
+ forboth(l1, and_expr->args, l2, subselect->targetList)
+ {
+ OpExpr *opexpr = lfirst_node(OpExpr, l1);
+ TargetEntry *tle = lfirst_node(TargetEntry, l2);
+ lsecond(opexpr->args) = replace_param_sublink_node(lsecond(opexpr->args),
+ (Node *) tle->expr);
+ }
+ }
+ else
+ {
+ OpExpr *opexpr = (OpExpr *) sublink->testexpr;
+ TargetEntry *tle = linitial_node(TargetEntry, subselect->targetList);
+ Assert(IsA(sublink->testexpr, OpExpr));
+ lsecond(opexpr->args) = replace_param_sublink_node(lsecond(opexpr->args),
+ (Node *) tle->expr);
+ }
+
+ /* Push down the transformed testexpr into subselect */
+ sub_fromexpr = subselect->jointree;
+ if (sub_fromexpr->quals == NULL)
+ sub_fromexpr->quals = sublink->testexpr;
+ else
+ sub_fromexpr->quals = make_and_qual(sub_fromexpr->quals,
+ (Node *) sublink->testexpr);
+
+ /*
+ * Turn the IN-Sublink to exist-SUBLINK for the parent query.
+ * sublink->subselect has already been modified.
+ */
+ sublink->subLinkType = EXISTS_SUBLINK;
+ sublink->operName = NIL;
+ sublink->testexpr = NULL;
+
+ /* Now transform the FromExpr in the subselect->jointree. */
+ transform_IN_sublink_to_EXIST_recurse(root, (Node *)sub_fromexpr);
+ return node;
+ }
+
+ if (is_andclause(node))
+ {
+ List *newclauses = NIL;
+ ListCell *l;
+ foreach(l, ((BoolExpr *) node)->args)
+ {
+ Node *oldclause = (Node *) lfirst(l);
+ Node *newclause;
+
+ newclause = transform_IN_sublink_to_EXIST_qual_recurse(root, oldclause);
+ newclauses = lappend(newclauses, newclause);
+ }
+
+ if (newclauses == NIL)
+ return NULL;
+ else if (list_length(newclauses) == 1)
+ return (Node *) linitial(newclauses);
+ else
+ return (Node *) make_andclause(newclauses);
+ }
+ else if (is_notclause(node))
+ {
+ /*
+ * NOT-IN can't be converted into NOT-exists.
+ */
+ return node;
+ }
+
+ return node;
+}
+
+/*
+ * transform_IN_sublink_to_EXIST_recurse
+ *
+ * Transform IN sublink to EXIST sublink if it benefits for sublink
+ * pull-ups.
+ */
+extern bool enable_geqo;
+static void
+transform_IN_sublink_to_EXIST_recurse(PlannerInfo *root, Node *jtnode)
+{
+ if (!enable_geqo)
+ return;
+
+ if (jtnode == NULL || IsA(jtnode, RangeTblRef))
+ {
+ return;
+ }
+ else if (IsA(jtnode, FromExpr))
+ {
+ FromExpr *f = (FromExpr *) jtnode;
+ ListCell *l;
+ foreach(l, f->fromlist)
+ {
+ transform_IN_sublink_to_EXIST_recurse(root, lfirst(l));
+ }
+ f->quals = transform_IN_sublink_to_EXIST_qual_recurse(root, f->quals);
+ }
+ else if (IsA(jtnode, JoinExpr))
+ {
+ JoinExpr *j = (JoinExpr *) jtnode;
+ transform_IN_sublink_to_EXIST_recurse(root, j->larg);
+ transform_IN_sublink_to_EXIST_recurse(root, j->rarg);
+
+ j->quals = transform_IN_sublink_to_EXIST_qual_recurse(root, j->quals);
+ }
+ else
+ {
+ elog(ERROR, "unrecognized node type: %d",
+ (int) nodeTag(jtnode));
+ }
+}
+
+
/*
* pull_up_sublinks
* Attempt to pull up ANY and EXISTS SubLinks to be treated as
@@ -284,12 +570,17 @@ replace_empty_jointree(Query *parse)
* to be AND/OR-flat either. That means we need to recursively search through
* explicit AND clauses. We stop as soon as we hit a non-AND item.
*/
+extern bool enable_in_exists_transfrom;
void
pull_up_sublinks(PlannerInfo *root)
{
Node *jtnode;
Relids relids;
+ if (enable_in_exists_transfrom)
+ transform_IN_sublink_to_EXIST_recurse(root,
+ (Node *)root->parse->jointree);
+
/* Begin recursion through the jointree */
jtnode = pull_up_sublinks_jointree_recurse(root,
(Node *) root->parse->jointree,
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 05ab087934c..600f0506043 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -763,8 +763,31 @@ StaticAssertDecl(lengthof(config_type_names) == (PGC_ENUM + 1),
* variable_is_guc_list_quote() in src/bin/pg_dump/dumputils.c.
*/
+bool enable_in_exists_transfrom = false;
+bool enable_lateral_pullup = false;
+
struct config_bool ConfigureNamesBool[] =
{
+ {
+ {"enable_in_exists_transfrom", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enables the transform from in to exists."),
+ NULL,
+ GUC_EXPLAIN
+ },
+ &enable_in_exists_transfrom,
+ false,
+ NULL, NULL, NULL
+ },
+ {
+ {"enable_lateral_pullup", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("pull up any sublink with lateral"),
+ NULL,
+ GUC_EXPLAIN
+ },
+ &enable_lateral_pullup,
+ false,
+ NULL, NULL, NULL
+ },
{
{"enable_seqscan", PGC_USERSET, QUERY_TUNING_METHOD,
gettext_noop("Enables the planner's use of sequential-scan plans."),
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 08334761ae6..6d5ccb7de2f 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -5983,8 +5983,8 @@ lateral (select * from int8_tbl t1,
where q2 = (select greatest(t1.q1,t2.q2))
and (select v.id=0)) offset 0) ss2) ss
where t1.q1 = ss.q2) ss0;
- QUERY PLAN
------------------------------------------------------------------
+ QUERY PLAN
+---------------------------------------------------------------------
Nested Loop
Output: "*VALUES*".column1, t1.q1, t1.q2, ss2.q1, ss2.q2
-> Seq Scan on public.int8_tbl t1
@@ -5996,23 +5996,24 @@ lateral (select * from int8_tbl t1,
-> Subquery Scan on ss2
Output: ss2.q1, ss2.q2
Filter: (t1.q1 = ss2.q2)
- -> Seq Scan on public.int8_tbl t2
+ -> Result
Output: t2.q1, t2.q2
- Filter: (SubPlan 3)
- SubPlan 3
+ One-Time Filter: $3
+ InitPlan 2 (returns $3)
-> Result
- Output: t3.q2
- One-Time Filter: $4
- InitPlan 1 (returns $2)
- -> Result
- Output: GREATEST($0, t2.q2)
- InitPlan 2 (returns $4)
- -> Result
- Output: ($3 = 0)
- -> Seq Scan on public.int8_tbl t3
- Output: t3.q1, t3.q2
- Filter: (t3.q2 = $2)
-(27 rows)
+ Output: ($2 = 0)
+ -> Nested Loop Semi Join
+ Output: t2.q1, t2.q2
+ Join Filter: (t2.q1 = t3.q2)
+ -> Seq Scan on public.int8_tbl t2
+ Output: t2.q1, t2.q2
+ Filter: ((SubPlan 1) = t2.q1)
+ SubPlan 1
+ -> Result
+ Output: GREATEST($0, t2.q2)
+ -> Seq Scan on public.int8_tbl t3
+ Output: t3.q1, t3.q2
+(28 rows)
select * from (values (0), (1)) v(id),
lateral (select * from int8_tbl t1,
diff --git a/src/test/regress/expected/subselect.out b/src/test/regress/expected/subselect.out
index 63d26d44fc3..5e0ce397233 100644
--- a/src/test/regress/expected/subselect.out
+++ b/src/test/regress/expected/subselect.out
@@ -1926,3 +1926,124 @@ select * from x for update;
Output: subselect_tbl.f1, subselect_tbl.f2, subselect_tbl.f3
(2 rows)
+-- Test transform the level-1 in-sublink to existing sublink.
+create temp table temp_t1 (a int, b int, c int) on commit delete rows;
+create temp table temp_t2 (a int, b int, c int) on commit delete rows;
+create temp table temp_t3 (a int, b int, c int) on commit delete rows;
+create temp table temp_t4 (a int, b int, c int, d int) on commit delete rows;
+begin;
+insert into temp_t1 values (1, 1, 1), (2, 2, null), (3, null, null);
+insert into temp_t2 values (1, 1, 1), (2, 2, null), (3, null, null);
+insert into temp_t3 values (1, 1, 1), (2, 2, null), (3, null, null);
+insert into temp_t4 values (1, 1, 1, 1), (2, 2, null, null), (3, null, null, null);
+analyze temp_t1;
+analyze temp_t2;
+analyze temp_t3;
+-- one-elem in subquery
+select * from temp_t1 t1 where a in (select a from temp_t2 t2 where t2.b > t1.b);
+ a | b | c
+---+---+---
+(0 rows)
+
+explain (costs off)
+select * from temp_t1 t1 where a in (select a from temp_t2 t2 where t2.b > t1.b);
+ QUERY PLAN
+------------------------------------
+ Hash Semi Join
+ Hash Cond: (t1.a = t2.a)
+ Join Filter: (t2.b > t1.b)
+ -> Seq Scan on temp_t1 t1
+ -> Hash
+ -> Seq Scan on temp_t2 t2
+(6 rows)
+
+-- two-elem in subquery
+select * from temp_t1 t1 where (a, b) in (select a, b from temp_t2 t2 where t2.c = t1.c);
+ a | b | c
+---+---+---
+ 1 | 1 | 1
+(1 row)
+
+explain (costs off)
+select * from temp_t1 t1 where (a, b) in (select a, b from temp_t2 t2 where t2.c = t1.c);
+ QUERY PLAN
+------------------------------------------------------------------
+ Hash Semi Join
+ Hash Cond: ((t1.c = t2.c) AND (t1.a = t2.a) AND (t1.b = t2.b))
+ -> Seq Scan on temp_t1 t1
+ -> Hash
+ -> Seq Scan on temp_t2 t2
+(5 rows)
+
+-- sublink in sublink
+select * from temp_t1 t1
+where (a, b) in (select a, b from temp_t2 t2
+ where t2.c < t1.c
+ and t2.c in (select c from temp_t3 t3 where t3.b = t2.b));
+ a | b | c
+---+---+---
+(0 rows)
+
+explain (costs off)
+select * from temp_t1 t1
+where (a, b) in (select a, b from temp_t2 t2
+ where t2.c < t1.c
+ and t2.c in (select c from temp_t3 t3 where t3.b = t2.b));
+ QUERY PLAN
+--------------------------------------------------------------------
+ Nested Loop Semi Join
+ Join Filter: ((t2.c < t1.c) AND (t1.b = t3.b) AND (t1.a = t2.a))
+ -> Seq Scan on temp_t1 t1
+ -> Materialize
+ -> Hash Semi Join
+ Hash Cond: ((t2.b = t3.b) AND (t2.c = t3.c))
+ -> Seq Scan on temp_t2 t2
+ -> Hash
+ -> Seq Scan on temp_t3 t3
+(9 rows)
+
+-- sublink in not-in sublinks. not in will not be transformed but the in-clause
+-- in the subselect should be transformed.
+explain (costs off)
+select * from temp_t1 t1
+where (a, b) not in (select a, b from temp_t2 t2
+ where t2.c < t1.c
+ and t1.c in (SELECT c from temp_t3 t3 where t3.b = t2.b ))
+and c > 3;
+ QUERY PLAN
+-------------------------------------------
+ Seq Scan on temp_t1 t1
+ Filter: ((c > 3) AND (NOT (SubPlan 1)))
+ SubPlan 1
+ -> Nested Loop Semi Join
+ Join Filter: (t2.b = t3.b)
+ -> Seq Scan on temp_t2 t2
+ Filter: (c < t1.c)
+ -> Seq Scan on temp_t3 t3
+ Filter: (t1.c = c)
+(9 rows)
+
+-- The clause in the ON-clause should be transformed.
+explain (costs off)
+select * from temp_t1 t1, (temp_t2 t2 join temp_t4 t4
+ on t2.a in (select a from temp_t3 t3 where t4.b = t3.b)) v
+where t1.a = v.d;
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Join Filter: (t3.a = t2.a)
+ -> Hash Join
+ Hash Cond: (t4.d = t1.a)
+ -> Hash Join
+ Hash Cond: (t4.b = t3.b)
+ -> Seq Scan on temp_t4 t4
+ -> Hash
+ -> HashAggregate
+ Group Key: t3.b, t3.a
+ -> Seq Scan on temp_t3 t3
+ -> Hash
+ -> Seq Scan on temp_t1 t1
+ -> Seq Scan on temp_t2 t2
+(14 rows)
+
+commit;
diff --git a/src/test/regress/sql/subselect.sql b/src/test/regress/sql/subselect.sql
index 40276708c99..83ad18cf9b1 100644
--- a/src/test/regress/sql/subselect.sql
+++ b/src/test/regress/sql/subselect.sql
@@ -968,3 +968,64 @@ select * from (with x as (select 2 as y) select * from x) ss;
explain (verbose, costs off)
with x as (select * from subselect_tbl)
select * from x for update;
+
+
+-- Test transform the level-1 in-sublink to existing sublink.
+create temp table temp_t1 (a int, b int, c int) on commit delete rows;
+create temp table temp_t2 (a int, b int, c int) on commit delete rows;
+create temp table temp_t3 (a int, b int, c int) on commit delete rows;
+create temp table temp_t4 (a int, b int, c int, d int) on commit delete rows;
+
+begin;
+insert into temp_t1 values (1, 1, 1), (2, 2, null), (3, null, null);
+insert into temp_t2 values (1, 1, 1), (2, 2, null), (3, null, null);
+insert into temp_t3 values (1, 1, 1), (2, 2, null), (3, null, null);
+insert into temp_t4 values (1, 1, 1, 1), (2, 2, null, null), (3, null, null, null);
+
+analyze temp_t1;
+analyze temp_t2;
+analyze temp_t3;
+
+-- one-elem in subquery
+select * from temp_t1 t1 where a in (select a from temp_t2 t2 where t2.b > t1.b);
+explain (costs off)
+select * from temp_t1 t1 where a in (select a from temp_t2 t2 where t2.b > t1.b);
+
+-- two-elem in subquery
+select * from temp_t1 t1 where (a, b) in (select a, b from temp_t2 t2 where t2.c = t1.c);
+explain (costs off)
+select * from temp_t1 t1 where (a, b) in (select a, b from temp_t2 t2 where t2.c = t1.c);
+
+-- sublink in sublink
+select * from temp_t1 t1
+where (a, b) in (select a, b from temp_t2 t2
+ where t2.c < t1.c
+ and t2.c in (select c from temp_t3 t3 where t3.b = t2.b));
+explain (costs off)
+select * from temp_t1 t1
+where (a, b) in (select a, b from temp_t2 t2
+ where t2.c < t1.c
+ and t2.c in (select c from temp_t3 t3 where t3.b = t2.b));
+
+
+-- sublink in not-in sublinks. not in will not be transformed but the in-clause
+-- in the subselect should be transformed.
+explain (costs off)
+select * from temp_t1 t1
+where (a, b) not in (select a, b from temp_t2 t2
+ where t2.c < t1.c
+ and t1.c in (SELECT c from temp_t3 t3 where t3.b = t2.b ))
+and c > 3;
+
+
+-- The clause in the ON-clause should be transformed.
+explain (costs off)
+select * from temp_t1 t1, (temp_t2 t2 join temp_t4 t4
+ on t2.a in (select a from temp_t3 t3 where t4.b = t3.b)) v
+where t1.a = v.d;
+
+commit;
+
+
+
+
--
2.21.0