Check SubPlan clause for nonnullable rels/Vars
Hi hackers,
While wandering around the codes of reducing outer joins, I noticed that
when determining which base rels/Vars are forced nonnullable by given
clause, we don't take SubPlan into consideration. Does anyone happen to
know what is the concern behind that?
IMO, for SubPlans of type ALL/ANY/ROWCOMPARE, we should be able to find
additional nonnullable rels/Vars by descending through their testexpr.
As we know, ALL_SUBLINK/ANY_SUBLINK combine results across tuples
produced by the subplan using AND/OR semantics. ROWCOMPARE_SUBLINK
doesn't allow multiple tuples from the subplan. So we can tell whether
the subplan is strict or not by checking its testexpr, leveraging the
existing codes in find_nonnullable_rels/vars_walker. Below is an
example:
# explain (costs off)
select * from a left join b on a.i = b.i where b.i = ANY (select i from c
where c.j = b.j);
QUERY PLAN
-----------------------------------
Hash Join
Hash Cond: (b.i = a.i)
-> Seq Scan on b
Filter: (SubPlan 1)
SubPlan 1
-> Seq Scan on c
Filter: (j = b.j)
-> Hash
-> Seq Scan on a
(9 rows)
BTW, this change would also have impact on SpecialJoinInfo, especially
for the case of identity 3, because find_nonnullable_rels() is also used
to determine strict_relids from the clause. As an example, consider
select * from a left join b on a.i = b.i
left join c on b.j = ANY (select j from c);
Now we can know the SubPlan is strict for 'b'. Thus the b/c join would
be considered to be legal.
Thanks
Richard
Attachments:
v1-0001-Check-SubPlan-clause-for-nonnullable-rels-Vars.patchapplication/octet-stream; name=v1-0001-Check-SubPlan-clause-for-nonnullable-rels-Vars.patchDownload
From 79f59201c076b9e7f9e6370683e8f0b324931d84 Mon Sep 17 00:00:00 2001
From: pgsql-guo <richard.guo@openpie.com>
Date: Sun, 11 Sep 2022 10:32:52 +0000
Subject: [PATCH v1] Check SubPlan clause for nonnullable rels/Vars
---
src/backend/optimizer/util/clauses.c | 14 +++++++
src/test/regress/expected/join.out | 56 ++++++++++++++++++++++++++++
src/test/regress/sql/join.sql | 28 ++++++++++++++
3 files changed, 98 insertions(+)
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index 5c54171fee..c525ad9f71 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -1534,6 +1534,13 @@ find_nonnullable_rels_walker(Node *node, bool top_level)
bms_membership(phv->phrels) == BMS_SINGLETON)
result = bms_add_members(result, phv->phrels);
}
+ else if (IsA(node, SubPlan))
+ {
+ /* descend through testexpr for ALL/ANY/ROWCOMPARE */
+ SubPlan *splan = (SubPlan *) node;
+
+ result = find_nonnullable_rels_walker(splan->testexpr, top_level);
+ }
return result;
}
@@ -1742,6 +1749,13 @@ find_nonnullable_vars_walker(Node *node, bool top_level)
result = find_nonnullable_vars_walker((Node *) phv->phexpr, top_level);
}
+ else if (IsA(node, SubPlan))
+ {
+ /* descend through testexpr for ALL/ANY/ROWCOMPARE */
+ SubPlan *splan = (SubPlan *) node;
+
+ result = find_nonnullable_vars_walker(splan->testexpr, top_level);
+ }
return result;
}
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 2ed2e542a4..cc110eaf20 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -6654,3 +6654,59 @@ where exists (select 1 from j3
(13 rows)
drop table j3;
+--
+-- test check of SubPlan clause for nonnullable rels/Vars when reducing outer joins
+--
+begin;
+create temp table a (i int, j int);
+create temp table b (i int, j int);
+create temp table c (i int, j int);
+insert into a values (1,1), (2,2), (3,3);
+insert into b values (2,2), (3,3), (4,4);
+insert into c values (3,3), (4,4), (5,5);
+analyze a;
+analyze b;
+analyze c;
+explain (costs off)
+select * from a left join b on a.i = b.i where b.i = ANY (select i from c where c.j = b.j);
+ QUERY PLAN
+-----------------------------------
+ Hash Join
+ Hash Cond: (b.i = a.i)
+ -> Seq Scan on b
+ Filter: (SubPlan 1)
+ SubPlan 1
+ -> Seq Scan on c
+ Filter: (j = b.j)
+ -> Hash
+ -> Seq Scan on a
+(9 rows)
+
+select * from a left join b on a.i = b.i where b.i = ANY (select i from c where c.j = b.j);
+ i | j | i | j
+---+---+---+---
+ 3 | 3 | 3 | 3
+(1 row)
+
+explain (costs off)
+select * from a left join b on b.i = ANY (select i from c where c.j = a.j) where b.i is null;
+ QUERY PLAN
+-----------------------------
+ Nested Loop Anti Join
+ Join Filter: (SubPlan 1)
+ -> Seq Scan on a
+ -> Materialize
+ -> Seq Scan on b
+ SubPlan 1
+ -> Seq Scan on c
+ Filter: (j = a.j)
+(8 rows)
+
+select * from a left join b on b.i = ANY (select i from c where c.j = a.j) where b.i is null;
+ i | j | i | j
+---+---+---+---
+ 1 | 1 | |
+ 2 | 2 | |
+(2 rows)
+
+rollback;
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 27e7e741a1..7fef18dbd7 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -2341,3 +2341,31 @@ where exists (select 1 from j3
and t1.unique1 < 1;
drop table j3;
+
+--
+-- test check of SubPlan clause for nonnullable rels/Vars when reducing outer joins
+--
+
+begin;
+
+create temp table a (i int, j int);
+create temp table b (i int, j int);
+create temp table c (i int, j int);
+
+insert into a values (1,1), (2,2), (3,3);
+insert into b values (2,2), (3,3), (4,4);
+insert into c values (3,3), (4,4), (5,5);
+
+analyze a;
+analyze b;
+analyze c;
+
+explain (costs off)
+select * from a left join b on a.i = b.i where b.i = ANY (select i from c where c.j = b.j);
+select * from a left join b on a.i = b.i where b.i = ANY (select i from c where c.j = b.j);
+
+explain (costs off)
+select * from a left join b on b.i = ANY (select i from c where c.j = a.j) where b.i is null;
+select * from a left join b on b.i = ANY (select i from c where c.j = a.j) where b.i is null;
+
+rollback;
--
2.25.1
Richard Guo <guofenglinux@gmail.com> writes:
While wandering around the codes of reducing outer joins, I noticed that
when determining which base rels/Vars are forced nonnullable by given
clause, we don't take SubPlan into consideration. Does anyone happen to
know what is the concern behind that?
Probably just didn't bother with the case at the time.
IMO, for SubPlans of type ALL/ANY/ROWCOMPARE, we should be able to find
additional nonnullable rels/Vars by descending through their testexpr.
I think you can make something of this, but you need to be a lot more
paranoid than this patch is.
* I don't believe you can prove anything from an ALL_SUBLINK SubPlan,
because it will return true if the sub-query returns zero rows, no
matter what the testexpr is. (Maybe if you could prove the sub-query
does return a row, but I doubt it's worth going there.)
* You need to explicitly check the subLinkType; as written this'll
consider EXPR_SUBLINK and so on. I'm not really on board with
assuming that nothing bad will happen with sublink types other than
the ones the code is expecting.
* It's not apparent to me that it's okay to pass down "top_level"
rather than "false". Maybe it's all right, but it could do with
a comment.
regards, tom lane
On Thu, Nov 3, 2022 at 4:26 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
* I don't believe you can prove anything from an ALL_SUBLINK SubPlan,
because it will return true if the sub-query returns zero rows, no
matter what the testexpr is. (Maybe if you could prove the sub-query
does return a row, but I doubt it's worth going there.)
Thanks for pointing this out. You're right. I didn't consider the case
that the subplan produces zero rows. In this case ALL_SUBLINK will
always return true, and ANY_SUBLINK will always return false. That
makes ALL_SUBLINK not strict, and ANY_SUBLINK can be strict only at top
level.
* You need to explicitly check the subLinkType; as written this'll
consider EXPR_SUBLINK and so on. I'm not really on board with
assuming that nothing bad will happen with sublink types other than
the ones the code is expecting.
Yes, I need to check for ANY_SUBLINK and ROWCOMPARE_SUBLINK here. The
testexpr is only meaningful for ALL/ANY/ROWCOMPARE, and ALL_SUBLINK has
been proven not strict.
* It's not apparent to me that it's okay to pass down "top_level"
rather than "false". Maybe it's all right, but it could do with
a comment.
The 'top_level' param is one point that I'm not very confident about.
I've added comments in the v2 patch.
Thanks for reviewing this patch!
Thanks
Richard
Attachments:
v2-0001-Check-SubPlan-clause-for-nonnullable-rels-Vars.patchapplication/octet-stream; name=v2-0001-Check-SubPlan-clause-for-nonnullable-rels-Vars.patchDownload
From 8001e52ebe22c775179c9432ec91220a0d089cce Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Thu, 3 Nov 2022 16:12:54 +0800
Subject: [PATCH v2] Check SubPlan clause for nonnullable rels/Vars
---
src/backend/optimizer/util/clauses.c | 46 +++++++++++++++++++++++
src/test/regress/expected/join.out | 56 ++++++++++++++++++++++++++++
src/test/regress/sql/join.sql | 28 ++++++++++++++
3 files changed, 130 insertions(+)
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index 7fb32a0710..ef3d3ab617 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -1534,6 +1534,29 @@ find_nonnullable_rels_walker(Node *node, bool top_level)
bms_membership(phv->phrels) == BMS_SINGLETON)
result = bms_add_members(result, phv->phrels);
}
+ else if (IsA(node, SubPlan))
+ {
+ /* descend through testexpr for ALL/ANY/ROWCOMPARE */
+ SubPlan *splan = (SubPlan *) node;
+
+ /*
+ * For ALL_SUBLINK, if the subplan produces zero rows, the result is
+ * always TRUE. So ALL_SUBLINK is not strict.
+ *
+ * For ANY_SUBLINK, if the subplan produces zero rows, the result is
+ * always FALSE. If the subplan produces more than one rows, the
+ * per-row results are combined using OR semantics. So ANY_SUBLINK can
+ * be strict only at top level.
+ *
+ * For ROWCOMPARE_SUBLINK, if the subplan produces zero rows, the
+ * result is always NULL. Otherwise, the subplan is only allowed to
+ * produce one row, and the result for ROWCOMPARE_SUBLINK is the same
+ * as its testexpr's.
+ */
+ if ((top_level && splan->subLinkType == ANY_SUBLINK) ||
+ splan->subLinkType == ROWCOMPARE_SUBLINK)
+ result = find_nonnullable_rels_walker(splan->testexpr, top_level);
+ }
return result;
}
@@ -1742,6 +1765,29 @@ find_nonnullable_vars_walker(Node *node, bool top_level)
result = find_nonnullable_vars_walker((Node *) phv->phexpr, top_level);
}
+ else if (IsA(node, SubPlan))
+ {
+ /* descend through testexpr for ALL/ANY/ROWCOMPARE */
+ SubPlan *splan = (SubPlan *) node;
+
+ /*
+ * For ALL_SUBLINK, if the subplan produces zero rows, the result is
+ * always TRUE. So ALL_SUBLINK is not strict.
+ *
+ * For ANY_SUBLINK, if the subplan produces zero rows, the result is
+ * always FALSE. If the subplan produces more than one rows, the
+ * per-row results are combined using OR semantics. So ANY_SUBLINK can
+ * be strict only at top level.
+ *
+ * For ROWCOMPARE_SUBLINK, if the subplan produces zero rows, the
+ * result is always NULL. Otherwise, the subplan is only allowed to
+ * produce one row, and the result for ROWCOMPARE_SUBLINK is the same
+ * as its testexpr's.
+ */
+ if ((top_level && splan->subLinkType == ANY_SUBLINK) ||
+ splan->subLinkType == ROWCOMPARE_SUBLINK)
+ result = find_nonnullable_vars_walker(splan->testexpr, top_level);
+ }
return result;
}
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index b901d7299f..b365af97db 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -6720,3 +6720,59 @@ where exists (select 1 from j3
(13 rows)
drop table j3;
+--
+-- test check of SubPlan clause for nonnullable rels/Vars when reducing outer joins
+--
+begin;
+create temp table a (i int, j int);
+create temp table b (i int, j int);
+create temp table c (i int, j int);
+insert into a values (1,1), (2,2), (3,3);
+insert into b values (2,2), (3,3), (4,4);
+insert into c values (3,3), (4,4), (5,5);
+analyze a;
+analyze b;
+analyze c;
+explain (costs off)
+select * from a left join b on a.i = b.i where b.i = ANY (select i from c where c.j = b.j);
+ QUERY PLAN
+-----------------------------------
+ Hash Join
+ Hash Cond: (b.i = a.i)
+ -> Seq Scan on b
+ Filter: (SubPlan 1)
+ SubPlan 1
+ -> Seq Scan on c
+ Filter: (j = b.j)
+ -> Hash
+ -> Seq Scan on a
+(9 rows)
+
+select * from a left join b on a.i = b.i where b.i = ANY (select i from c where c.j = b.j);
+ i | j | i | j
+---+---+---+---
+ 3 | 3 | 3 | 3
+(1 row)
+
+explain (costs off)
+select * from a left join b on b.i = ANY (select i from c where c.j = a.j) where b.i is null;
+ QUERY PLAN
+-----------------------------
+ Nested Loop Anti Join
+ Join Filter: (SubPlan 1)
+ -> Seq Scan on a
+ -> Materialize
+ -> Seq Scan on b
+ SubPlan 1
+ -> Seq Scan on c
+ Filter: (j = a.j)
+(8 rows)
+
+select * from a left join b on b.i = ANY (select i from c where c.j = a.j) where b.i is null;
+ i | j | i | j
+---+---+---+---
+ 1 | 1 | |
+ 2 | 2 | |
+(2 rows)
+
+rollback;
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index ccbbe5454c..f218f92675 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -2366,3 +2366,31 @@ where exists (select 1 from j3
and t1.unique1 < 1;
drop table j3;
+
+--
+-- test check of SubPlan clause for nonnullable rels/Vars when reducing outer joins
+--
+
+begin;
+
+create temp table a (i int, j int);
+create temp table b (i int, j int);
+create temp table c (i int, j int);
+
+insert into a values (1,1), (2,2), (3,3);
+insert into b values (2,2), (3,3), (4,4);
+insert into c values (3,3), (4,4), (5,5);
+
+analyze a;
+analyze b;
+analyze c;
+
+explain (costs off)
+select * from a left join b on a.i = b.i where b.i = ANY (select i from c where c.j = b.j);
+select * from a left join b on a.i = b.i where b.i = ANY (select i from c where c.j = b.j);
+
+explain (costs off)
+select * from a left join b on b.i = ANY (select i from c where c.j = a.j) where b.i is null;
+select * from a left join b on b.i = ANY (select i from c where c.j = a.j) where b.i is null;
+
+rollback;
--
2.31.0
Richard Guo <guofenglinux@gmail.com> writes:
[ v2-0001-Check-SubPlan-clause-for-nonnullable-rels-Vars.patch ]
Pushed with cosmetic changes:
* I don't believe in "add at the end" as a principle for placement
of new code. There's usually some other logic that will give more
consistent results. In cases like this, ordering the treatment of
Node types in the same way as they appear in the include/nodes/
headers is the standard answer. (Not that everybody's been totally
consistent about that :-( ... but that's not an argument for
introducing even more entropy.)
* I rewrote the comments a bit.
* I didn't like the test case too much: spinning up a whole new set
of tables seems like a lot of useless cycles. Plus it makes it
harder to experiment with the test query manually. I usually like
to write such queries using the regression database's standard tables,
so I rewrote this example that way.
regards, tom lane
On Sun, Nov 6, 2022 at 3:33 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Richard Guo <guofenglinux@gmail.com> writes:
[ v2-0001-Check-SubPlan-clause-for-nonnullable-rels-Vars.patch ]
Pushed with cosmetic changes:
* I don't believe in "add at the end" as a principle for placement
of new code. There's usually some other logic that will give more
consistent results. In cases like this, ordering the treatment of
Node types in the same way as they appear in the include/nodes/
headers is the standard answer. (Not that everybody's been totally
consistent about that :-( ... but that's not an argument for
introducing even more entropy.)* I rewrote the comments a bit.
* I didn't like the test case too much: spinning up a whole new set
of tables seems like a lot of useless cycles. Plus it makes it
harder to experiment with the test query manually. I usually like
to write such queries using the regression database's standard tables,
so I rewrote this example that way.
Thanks for the changes. They make the patch look better. And thanks for
pushing it.
Thanks
Richard