Inconsistent RestrictInfo serial numbers
I ran into an "ERROR: variable not found in subplan target lists"
error, which can be reproduced with the following query.
create table t (a int primary key, b int);
insert into t select i, i from generate_series(1, 10)i;
analyze t;
explain (costs off)
select 1 from t t1
left join
(t t2 left join t t3 on t2.a = t3.a) on true
left join t t4 on t1.a is null and t1.b = 1
right join t t5 on t1.b = t5.b;
ERROR: variable not found in subplan target lists
The first bad commit is a3179ab69.
commit a3179ab692be4314d5ee5cd56598976c487d5ef2
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date: Fri Sep 27 16:04:04 2024 -0400
Recalculate where-needed data accurately after a join removal.
However, after digging into it further, it seems to me that the issue
originated in b262ad440, and the changes in a3179ab69 made it easier
to surface.
commit b262ad440edecda0b1aba81d967ab560a83acb8a
Author: David Rowley <drowley@postgresql.org>
Date: Tue Jan 23 18:09:18 2024 +1300
Add better handling of redundant IS [NOT] NULL quals
When we generate multiple clones of the same qual condition to cope
with outer join identity 3, we need to ensure that all the clones get
the same serial number. To achieve this, we reset the
root->last_rinfo_serial counter each time we produce RestrictInfo(s)
from the qual (see deconstruct_distribute_oj_quals). This approach
works only if we ensure that we are not changing the qual list in
any way that'd affect the number of RestrictInfos built from it.
However, with b262ad440, an IS NULL qual on a NOT NULL column might
result in an additional constant-FALSE RestrictInfo. This can
unexpectedly increase root->last_rinfo_serial, causing inconsistent
RestrictInfo serial numbers across multiple clones of the same qual,
which can confuse users of 'rinfo_serial', such as
rebuild_joinclause_attr_needed, and lead to planner errors.
In the query above, the has_clone version of qual 't1.a is null' would
be reduced to constant-FALSE, while the is_clone version would not.
This results in differing serial numbers for subsequent quals (such as
qual 't1.b = 1') across different versions.
To fix, I think we can reset the root->last_rinfo_serial counter after
generating the additional constant-FALSE RestrictInfo. Please see
attached.
Thanks
Richard
Attachments:
v1-0001-Fix-inconsistent-RestrictInfo-serial-numbers.patchapplication/octet-stream; name=v1-0001-Fix-inconsistent-RestrictInfo-serial-numbers.patchDownload
From dbf9093c642a42d9831cad0d8914bb096750c625 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Tue, 8 Oct 2024 18:59:03 +0900
Subject: [PATCH v1] Fix inconsistent RestrictInfo serial numbers
When we generate multiple clones of the same qual condition to cope
with outer join identity 3, we need to ensure that all the clones get
the same serial number. To achieve this, we reset the
root->last_rinfo_serial counter each time we produce RestrictInfo(s)
from the qual (see deconstruct_distribute_oj_quals). This approach
works only if we ensure that we are not changing the qual list in
any way that'd affect the number of RestrictInfos built from it.
However, with b262ad440, an IS NULL qual on a NOT NULL column might
result in an additional constant-FALSE RestrictInfo. This can
unexpectedly increase root->last_rinfo_serial, causing inconsistent
RestrictInfo serial numbers across multiple clones of the same qual,
which can confuse users of these serial numbers, such as
rebuild_joinclause_attr_needed, and lead to planner errors.
To fix, reset the root->last_rinfo_serial counter after generating the
additional constant-FALSE RestrictInfo.
---
src/backend/optimizer/plan/initsplan.c | 10 ++++++++-
src/backend/optimizer/util/joininfo.c | 12 +++++++++--
src/test/regress/expected/predicate.out | 28 +++++++++++++++++++++++++
src/test/regress/sql/predicate.sql | 15 +++++++++++++
4 files changed, 62 insertions(+), 3 deletions(-)
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index c5bc0f51e9..903c397d40 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -2767,11 +2767,18 @@ add_base_clause_to_rel(PlannerInfo *root, Index relid,
/*
* Substitute the origin qual with constant-FALSE if it is provably
- * always false. Note that we keep the same rinfo_serial.
+ * always false.
+ *
+ * Note that we need to keep the same rinfo_serial, since it is in
+ * practice the same condition. We also need to reset the
+ * last_rinfo_serial counter, which is essential to ensure that the
+ * RestrictInfos for the "same" qual condition get identical serial
+ * numbers (see deconstruct_distribute_oj_quals).
*/
if (restriction_is_always_false(root, restrictinfo))
{
int save_rinfo_serial = restrictinfo->rinfo_serial;
+ int save_last_rinfo_serial = root->last_rinfo_serial;
restrictinfo = make_restrictinfo(root,
(Expr *) makeBoolConst(false, false),
@@ -2784,6 +2791,7 @@ add_base_clause_to_rel(PlannerInfo *root, Index relid,
restrictinfo->incompatible_relids,
restrictinfo->outer_relids);
restrictinfo->rinfo_serial = save_rinfo_serial;
+ root->last_rinfo_serial = save_last_rinfo_serial;
}
}
diff --git a/src/backend/optimizer/util/joininfo.c b/src/backend/optimizer/util/joininfo.c
index 5fb0c17630..65993bd659 100644
--- a/src/backend/optimizer/util/joininfo.c
+++ b/src/backend/optimizer/util/joininfo.c
@@ -106,12 +106,19 @@ add_join_clause_to_rels(PlannerInfo *root,
return;
/*
- * Substitute constant-FALSE for the origin qual if it is always false.
- * Note that we keep the same rinfo_serial.
+ * Substitute the origin qual with constant-FALSE if it is provably always
+ * false.
+ *
+ * Note that we need to keep the same rinfo_serial, since it is in
+ * practice the same condition. We also need to reset the
+ * last_rinfo_serial counter, which is essential to ensure that the
+ * RestrictInfos for the "same" qual condition get identical serial
+ * numbers (see deconstruct_distribute_oj_quals).
*/
if (restriction_is_always_false(root, restrictinfo))
{
int save_rinfo_serial = restrictinfo->rinfo_serial;
+ int save_last_rinfo_serial = root->last_rinfo_serial;
restrictinfo = make_restrictinfo(root,
(Expr *) makeBoolConst(false, false),
@@ -124,6 +131,7 @@ add_join_clause_to_rels(PlannerInfo *root,
restrictinfo->incompatible_relids,
restrictinfo->outer_relids);
restrictinfo->rinfo_serial = save_rinfo_serial;
+ root->last_rinfo_serial = save_last_rinfo_serial;
}
cur_relid = -1;
diff --git a/src/test/regress/expected/predicate.out b/src/test/regress/expected/predicate.out
index 6f1cc0d54c..965a3a7616 100644
--- a/src/test/regress/expected/predicate.out
+++ b/src/test/regress/expected/predicate.out
@@ -290,3 +290,31 @@ SELECT * FROM pred_parent WHERE a IS NULL;
(2 rows)
DROP TABLE pred_parent, pred_child;
+-- Validate the additional constant-FALSE qual does not cause inconsistent
+-- RestrictInfo serial numbers
+CREATE TABLE pred_tab (a int PRIMARY KEY, b int);
+INSERT INTO pred_tab SELECT i, i FROM generate_series(1, 10)i;
+ANALYZE pred_tab;
+EXPLAIN (COSTS OFF)
+SELECT 1 FROM pred_tab t1
+ LEFT JOIN
+ (pred_tab t2 LEFT JOIN pred_tab t3 ON t2.a = t3.a) ON TRUE
+ LEFT JOIN pred_tab t4 ON t1.a IS NULL AND t1.b = 1
+ RIGHT JOIN pred_tab t5 ON t1.b = t5.b;
+ QUERY PLAN
+---------------------------------------------------
+ Hash Right Join
+ Hash Cond: (t1.b = t5.b)
+ -> Nested Loop Left Join
+ -> Nested Loop Left Join
+ Join Filter: (false AND (t1.b = 1))
+ -> Seq Scan on pred_tab t1
+ -> Result
+ One-Time Filter: false
+ -> Materialize
+ -> Seq Scan on pred_tab t2
+ -> Hash
+ -> Seq Scan on pred_tab t5
+(12 rows)
+
+DROP TABLE pred_tab;
diff --git a/src/test/regress/sql/predicate.sql b/src/test/regress/sql/predicate.sql
index 63f6a7786f..661013ff7e 100644
--- a/src/test/regress/sql/predicate.sql
+++ b/src/test/regress/sql/predicate.sql
@@ -147,3 +147,18 @@ EXPLAIN (COSTS OFF)
SELECT * FROM pred_parent WHERE a IS NULL;
DROP TABLE pred_parent, pred_child;
+
+-- Validate the additional constant-FALSE qual does not cause inconsistent
+-- RestrictInfo serial numbers
+CREATE TABLE pred_tab (a int PRIMARY KEY, b int);
+INSERT INTO pred_tab SELECT i, i FROM generate_series(1, 10)i;
+ANALYZE pred_tab;
+
+EXPLAIN (COSTS OFF)
+SELECT 1 FROM pred_tab t1
+ LEFT JOIN
+ (pred_tab t2 LEFT JOIN pred_tab t3 ON t2.a = t3.a) ON TRUE
+ LEFT JOIN pred_tab t4 ON t1.a IS NULL AND t1.b = 1
+ RIGHT JOIN pred_tab t5 ON t1.b = t5.b;
+
+DROP TABLE pred_tab;
--
2.43.0
On Tue, Oct 8, 2024 at 4:50 PM Richard Guo <guofenglinux@gmail.com> wrote:
I ran into an "ERROR: variable not found in subplan target lists"
error, which can be reproduced with the following query.create table t (a int primary key, b int);
insert into t select i, i from generate_series(1, 10)i;
analyze t;explain (costs off)
select 1 from t t1
left join
(t t2 left join t t3 on t2.a = t3.a) on true
left join t t4 on t1.a is null and t1.b = 1
right join t t5 on t1.b = t5.b;
ERROR: variable not found in subplan target listsThe first bad commit is a3179ab69.
commit a3179ab692be4314d5ee5cd56598976c487d5ef2
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date: Fri Sep 27 16:04:04 2024 -0400Recalculate where-needed data accurately after a join removal.
However, after digging into it further, it seems to me that the issue
originated in b262ad440, and the changes in a3179ab69 made it easier
to surface.commit b262ad440edecda0b1aba81d967ab560a83acb8a
Author: David Rowley <drowley@postgresql.org>
Date: Tue Jan 23 18:09:18 2024 +1300Add better handling of redundant IS [NOT] NULL quals
When we generate multiple clones of the same qual condition to cope
with outer join identity 3, we need to ensure that all the clones get
the same serial number. To achieve this, we reset the
root->last_rinfo_serial counter each time we produce RestrictInfo(s)
from the qual (see deconstruct_distribute_oj_quals). This approach
works only if we ensure that we are not changing the qual list in
any way that'd affect the number of RestrictInfos built from it.However, with b262ad440, an IS NULL qual on a NOT NULL column might
result in an additional constant-FALSE RestrictInfo. This can
unexpectedly increase root->last_rinfo_serial, causing inconsistent
RestrictInfo serial numbers across multiple clones of the same qual,
which can confuse users of 'rinfo_serial', such as
rebuild_joinclause_attr_needed, and lead to planner errors.In the query above, the has_clone version of qual 't1.a is null' would
be reduced to constant-FALSE, while the is_clone version would not.
This results in differing serial numbers for subsequent quals (such as
qual 't1.b = 1') across different versions.To fix, I think we can reset the root->last_rinfo_serial counter after
generating the additional constant-FALSE RestrictInfo. Please see
attached.
A naive question: Why don't we just reuse the same RestrictInfo
instead of creating a new one. I understand that the original
RestrictInfo was passed from somewhere else to add it to a given
relation. But I don't see any relation specific information being
considered while deciding whether the clause is constant false. So may
be we should do this processing elsewhere and replace the original
clause itself?
--
Best Wishes,
Ashutosh Bapat
On Tue, Oct 8, 2024 at 9:02 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
But I don't see any relation specific information being
considered while deciding whether the clause is constant false.
Isn't rel->notnullattnums relation specific information? We need this
information to decide if a Var cannot be NULL.
So may
be we should do this processing elsewhere and replace the original
clause itself?
I’m not sure about this. Different versions of the same qual clause
can lead to different conclusions about whether it can be reduced to
constant-FALSE. I don't think it is possible to replace the original
clause; we need to do this processing on a version-by-version basis.
Thanks
Richard
On 10/8/24 18:20, Richard Guo wrote:
To fix, I think we can reset the root->last_rinfo_serial counter after
generating the additional constant-FALSE RestrictInfo. Please see
attached.
Thanks for the job!
The approach will work, no doubt. But why are you using such a wordy
approach with save_last_rinfo_serial? I'd say it is better to invent a
function like make_rinfo_variant() or something like that and explicitly
set rinfo_serial. It will be easier to read and understand in the
future, won't it?
--
regards, Andrei Lepikhov
On Wed, Oct 9, 2024 at 11:15 AM Andrei Lepikhov <lepihov@gmail.com> wrote:
On 10/8/24 18:20, Richard Guo wrote:
To fix, I think we can reset the root->last_rinfo_serial counter after
generating the additional constant-FALSE RestrictInfo. Please see
attached.
The approach will work, no doubt. But why are you using such a wordy
approach with save_last_rinfo_serial? I'd say it is better to invent a
function like make_rinfo_variant() or something like that and explicitly
set rinfo_serial. It will be easier to read and understand in the
future, won't it?
Maybe, but I don't think it's worth creating a new function for such a
straightforward operation, nor do I think it's worth changing the
internals of make_restrictinfo() regarding how root->last_rinfo_serial
is incremented, as that would require updating all the callers of this
function.
Thanks
Richard
On Wed, Oct 9, 2024 at 7:45 AM Richard Guo <guofenglinux@gmail.com> wrote:
On Tue, Oct 8, 2024 at 9:02 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:But I don't see any relation specific information being
considered while deciding whether the clause is constant false.Isn't rel->notnullattnums relation specific information? We need this
information to decide if a Var cannot be NULL.
I am talking about restriction_is_always_false() being called from
add_base_clause_to_rel(). The later doesn't pass the given relid or
the corresponding rel to the first. So restriction_is_always_false()
is independent of relid passed to add_base_clause_to_rel() and thus
restriction_is_always_false() being called from outside (say caller of
add_base_clause_to_rel()) will have same result for the same
RestrictInfo.
So may
be we should do this processing elsewhere and replace the original
clause itself?I’m not sure about this. Different versions of the same qual clause
can lead to different conclusions about whether it can be reduced to
constant-FALSE.
Can you please provide an example?
I don't think it is possible to replace the original
clause; we need to do this processing on a version-by-version basis.
Either way, each version will be represented by only one RestrictInfo.
Why can't we replace that version instead of creating a new
RestrictInfo?
--
Best Wishes,
Ashutosh Bapat
On Wed, Oct 9, 2024 at 9:30 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
On Wed, Oct 9, 2024 at 7:45 AM Richard Guo <guofenglinux@gmail.com> wrote:
On Tue, Oct 8, 2024 at 9:02 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:But I don't see any relation specific information being
considered while deciding whether the clause is constant false.Isn't rel->notnullattnums relation specific information? We need this
information to decide if a Var cannot be NULL.I am talking about restriction_is_always_false() being called from
add_base_clause_to_rel(). The later doesn't pass the given relid or
the corresponding rel to the first. So restriction_is_always_false()
is independent of relid passed to add_base_clause_to_rel() and thus
restriction_is_always_false() being called from outside (say caller of
add_base_clause_to_rel()) will have same result for the same
RestrictInfo.
I see what you meant now. You're right that the result of
restriction_is_always_true/false is independent of the relid.
However, I still don't think it's a good idea to move it elsewhere
because: 1) we need the relid to determine whether the relation is an
inheritance parent table, as we must avoid this type of simplification
for those tables; 2) I recall there was a discussion during the work
on commit b262ad440 about potentially extending the simplification
logic in the future to consider the relation's constraints.
So may
be we should do this processing elsewhere and replace the original
clause itself?I’m not sure about this. Different versions of the same qual clause
can lead to different conclusions about whether it can be reduced to
constant-FALSE.Can you please provide an example?
I think the query I provided in my initial email serves as an example.
To quote what I said there:
"
In the query above, the has_clone version of qual 't1.a is null' would
be reduced to constant-FALSE, while the is_clone version would not.
"
I don't think it is possible to replace the original
clause; we need to do this processing on a version-by-version basis.Either way, each version will be represented by only one RestrictInfo.
Why can't we replace that version instead of creating a new
RestrictInfo?
Sure we can do that. However, I find that manually altering a
well-formed RestrictInfo's clause (along with its left_relids,
right_relids, clause_relids, num_base_rels, etc.) seems too
hacky to me.
Thanks
Richard
On Thu, Oct 10, 2024 at 7:37 AM Richard Guo <guofenglinux@gmail.com> wrote:
So may
be we should do this processing elsewhere and replace the original
clause itself?I’m not sure about this. Different versions of the same qual clause
can lead to different conclusions about whether it can be reduced to
constant-FALSE.Can you please provide an example?
I think the query I provided in my initial email serves as an example.
To quote what I said there:"
In the query above, the has_clone version of qual 't1.a is null' would
be reduced to constant-FALSE, while the is_clone version would not.
"I don't think it is possible to replace the original
clause; we need to do this processing on a version-by-version basis.Either way, each version will be represented by only one RestrictInfo.
Why can't we replace that version instead of creating a new
RestrictInfo?Sure we can do that. However, I find that manually altering a
well-formed RestrictInfo's clause (along with its left_relids,
right_relids, clause_relids, num_base_rels, etc.) seems too
hacky to me.
That's fair.
1. What strikes me as odd in your patch is that the last_rinfo_serial
is reset so far away from the new clause that will be created with the
next last_rinfo_serial OR the clause whose clones are being created.
It either indicates another site where we are missing (possibly
future) to restore rinfo_serial in a clone OR reset of
last_serial_rinfo needs to happen somewhere else. Why isn't resetting
last_rinfo_serial in deconstruct_distribute_oj_quals() enough?
2. This change would also prevent add_base_clause_to_rel() and
add_join_clause_to_rels() from being used anywhere outside the context
of deconstruct_distribute_oj_quals() since these functions would reset
last_rinfo_serial when it's expected to be incremented. Moreover
another make_restrictinfo() call somewhere in the minions of
distribute_qual_to_rels() or distribute_quals_to_rels() would cause a
similar bug.
3. Do we need to reset last_serial_rinfo everywhere we reset
rinfo_serial e.g. create_join_clause()?
I suspect that b262ad440edecda0b1aba81d967ab560a83acb8a didn't do
anything wrong. It's effectively copying rinfo_serial of original
clause into the derived clause. I would have liked it better if it
would have used the same method as create_join_clause(). Some other
commti failed to notice the some minion of
distribute_quals_to_rels()->distribute_qual_to_rels() may increment
last_rinfo_serial while creating an alternate RestrictInfo for the
qual passed to distribute_qual_to_rels(). I think a robust fix is to
reset last_rinfo_serial in distribute_quals_to_rels() for every qual
and add an Assert(root->last_rinfo_serial == saved_last_rinfo_serial +
list_length(quals)) before resetting last_rinfo_serial in
deconstruct_distribute_oj_quals() to catch any future digressions.
--
Best Wishes,
Ashutosh Bapat
On Fri, Oct 11, 2024 at 3:02 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
1. What strikes me as odd in your patch is that the last_rinfo_serial
is reset so far away from the new clause that will be created with the
next last_rinfo_serial OR the clause whose clones are being created.
It either indicates another site where we are missing (possibly
future) to restore rinfo_serial in a clone OR reset of
last_serial_rinfo needs to happen somewhere else. Why isn't resetting
last_rinfo_serial in deconstruct_distribute_oj_quals() enough?
Well, the resetting of the last_rinfo_serial counter in
deconstruct_distribute_oj_quals is to ensure that multiple clones of
the same qual receive the same serial number. As the comment there
explains, it works only if we ensure that we are not changing the qual
list in any way that'd affect the number of RestrictInfos built from
it.
However, in b262ad440, we did not get this memo promptly. As I
explained earlier, the has_clone version of qual 't1.a is null'
would be reduced to constant-FALSE, while the is_clone version would
not. That is to say, for the has_clone version, we'd build three
RestrictInfos, whereas for the is_clone version, we'd build only two.
This discrepancy in numbers is the root cause of the issue.
So, resetting last_rinfo_serial in deconstruct_distribute_oj_quals()
is not enough.
2. This change would also prevent add_base_clause_to_rel() and
add_join_clause_to_rels() from being used anywhere outside the context
of deconstruct_distribute_oj_quals() since these functions would reset
last_rinfo_serial when it's expected to be incremented. Moreover
another make_restrictinfo() call somewhere in the minions of
distribute_qual_to_rels() or distribute_quals_to_rels() would cause a
similar bug.
I don't see how this change would prevent add_base_clause_to_rel and
add_join_clause_to_rels from being used in other context than
deconstruct_distribute_oj_quals. Could you please provide an example?
3. Do we need to reset last_serial_rinfo everywhere we reset
rinfo_serial e.g. create_join_clause()?
Hmm, I don't think that's necessary. As long as we'd build the same
number of RestrictInfos from the qual list for different versions,
we're good.
I suspect that b262ad440edecda0b1aba81d967ab560a83acb8a didn't do
anything wrong. It's effectively copying rinfo_serial of original
clause into the derived clause. I would have liked it better if it
would have used the same method as create_join_clause(). Some other
commti failed to notice the some minion of
distribute_quals_to_rels()->distribute_qual_to_rels() may increment
last_rinfo_serial while creating an alternate RestrictInfo for the
qual passed to distribute_qual_to_rels(). I think a robust fix is to
reset last_rinfo_serial in distribute_quals_to_rels() for every qual
and add an Assert(root->last_rinfo_serial == saved_last_rinfo_serial +
list_length(quals)) before resetting last_rinfo_serial in
deconstruct_distribute_oj_quals() to catch any future digressions.
Would you mind proposing a patch for this method so we can discuss the
details?
Thanks
Richard
On Tue, Oct 8, 2024 at 8:20 PM Richard Guo <guofenglinux@gmail.com> wrote:
I ran into an "ERROR: variable not found in subplan target lists"
error, which can be reproduced with the following query.
To fix, I think we can reset the root->last_rinfo_serial counter after
generating the additional constant-FALSE RestrictInfo. Please see
attached.
I intend to push this patch shortly, barring any objections, so that
it can catch up with the next minor release (Nov. 14).
Thanks
Richard
On Wed, Nov 6, 2024 at 4:10 PM Richard Guo <guofenglinux@gmail.com> wrote:
On Tue, Oct 8, 2024 at 8:20 PM Richard Guo <guofenglinux@gmail.com> wrote:
I ran into an "ERROR: variable not found in subplan target lists"
error, which can be reproduced with the following query.To fix, I think we can reset the root->last_rinfo_serial counter after
generating the additional constant-FALSE RestrictInfo. Please see
attached.I intend to push this patch shortly, barring any objections, so that
it can catch up with the next minor release (Nov. 14).
I didn't get time to respond to your email and create a patch based on
my proposal. Sorry. I may not get time for another week or so. Please
feel free to push the patch, if you feel that it's in a committable
state.
--
Best Wishes,
Ashutosh Bapat
On Wed, Nov 6, 2024 at 11:06 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
On Wed, Nov 6, 2024 at 4:10 PM Richard Guo <guofenglinux@gmail.com> wrote:
I intend to push this patch shortly, barring any objections, so that
it can catch up with the next minor release (Nov. 14).I didn't get time to respond to your email and create a patch based on
my proposal. Sorry. I may not get time for another week or so. Please
feel free to push the patch, if you feel that it's in a committable
state.
Pushed with some tweaks to the commit message.
Thanks
Richard