BUG #19435: Error: "No relation entry for relid 2" Triggered by Complex Join with Self-Referencing Tables
The following bug has been logged on the website:
Bug reference: 19435
Logged by: Hang ammmkilo
Email address: ammmkilo@163.com
PostgreSQL version: 18.3
Operating system: Ubuntu 22.04
Description:
A user encountered an error when attempting to execute a query involving
multiple RIGHT JOIN operations and a NATURAL JOIN on the same table
(pg_table_a). The error message returned was:
[XX000]ERROR: no relation entry for relid 2
This error seems to be an internal one and should not be triggered by users.
It might be a bug.
```sql
DROP TABLE IF EXISTS pg_table_a;
CREATE TABLE pg_table_a (
id INTEGER PRIMARY KEY,
col_bool BOOLEAN
);
INSERT INTO pg_table_a (id, col_bool)
VALUES (5, TRUE);
SELECT 1 AS c1
FROM (
pg_table_a AS tom0
RIGHT JOIN (
(pg_table_a AS tom1 NATURAL JOIN pg_table_a AS tom2)
RIGHT JOIN pg_table_a AS tom3
ON tom1.col_bool IS NOT NULL
)
ON tom1.col_bool
);
```
On Tue, Mar 17, 2026 at 7:34 PM PG Bug reporting form
<noreply@postgresql.org> wrote:
The following bug has been logged on the website:
Bug reference: 19435
Logged by: Hang ammmkilo
Email address: ammmkilo@163.com
PostgreSQL version: 18.3
Operating system: Ubuntu 22.04
Description:A user encountered an error when attempting to execute a query involving
multiple RIGHT JOIN operations and a NATURAL JOIN on the same table
(pg_table_a). The error message returned was:
[XX000]ERROR: no relation entry for relid 2
This error seems to be an internal one and should not be triggered by users.
It might be a bug.
```sql
DROP TABLE IF EXISTS pg_table_a;CREATE TABLE pg_table_a (
id INTEGER PRIMARY KEY,
col_bool BOOLEAN
);INSERT INTO pg_table_a (id, col_bool)
VALUES (5, TRUE);SELECT 1 AS c1
FROM (
pg_table_a AS tom0
RIGHT JOIN (
(pg_table_a AS tom1 NATURAL JOIN pg_table_a AS tom2)
RIGHT JOIN pg_table_a AS tom3
ON tom1.col_bool IS NOT NULL
)
ON tom1.col_bool
);
```
Thanks for the report!
I was able to reproduce this issue on the master. git bisect that I ran pointed
to commit fc069a3a631 as the likely cause. So I've CC'd its committer,
Alexander, on this thread.
Regards,
--
Fujii Masao
On Tue, Mar 17, 2026 at 2:14 PM Fujii Masao <masao.fujii@gmail.com> wrote:
On Tue, Mar 17, 2026 at 7:34 PM PG Bug reporting form
<noreply@postgresql.org> wrote:The following bug has been logged on the website:
Bug reference: 19435
Logged by: Hang ammmkilo
Email address: ammmkilo@163.com
PostgreSQL version: 18.3
Operating system: Ubuntu 22.04
Description:A user encountered an error when attempting to execute a query involving
multiple RIGHT JOIN operations and a NATURAL JOIN on the same table
(pg_table_a). The error message returned was:
[XX000]ERROR: no relation entry for relid 2
This error seems to be an internal one and should not be triggered by users.
It might be a bug.
```sql
DROP TABLE IF EXISTS pg_table_a;CREATE TABLE pg_table_a (
id INTEGER PRIMARY KEY,
col_bool BOOLEAN
);INSERT INTO pg_table_a (id, col_bool)
VALUES (5, TRUE);SELECT 1 AS c1
FROM (
pg_table_a AS tom0
RIGHT JOIN (
(pg_table_a AS tom1 NATURAL JOIN pg_table_a AS tom2)
RIGHT JOIN pg_table_a AS tom3
ON tom1.col_bool IS NOT NULL
)
ON tom1.col_bool
);
```Thanks for the report!
I was able to reproduce this issue on the master. git bisect that I ran pointed
to commit fc069a3a631 as the likely cause. So I've CC'd its committer,
Alexander, on this thread.
Thank you for adding me to the thread. I'm lookin at this.
------
Regards,
Alexander Korotkov
Supabase
On Tue, 17 Mar 2026 at 17:15, Fujii Masao <masao.fujii@gmail.com> wrote:
On Tue, Mar 17, 2026 at 7:34 PM PG Bug reporting form
<noreply@postgresql.org> wrote:The following bug has been logged on the website:
Bug reference: 19435
Logged by: Hang ammmkilo
Email address: ammmkilo@163.com
PostgreSQL version: 18.3
Operating system: Ubuntu 22.04
Description:A user encountered an error when attempting to execute a query involving
multiple RIGHT JOIN operations and a NATURAL JOIN on the same table
(pg_table_a). The error message returned was:
[XX000]ERROR: no relation entry for relid 2
This error seems to be an internal one and should not be triggered by users.
It might be a bug.
```sql
DROP TABLE IF EXISTS pg_table_a;CREATE TABLE pg_table_a (
id INTEGER PRIMARY KEY,
col_bool BOOLEAN
);INSERT INTO pg_table_a (id, col_bool)
VALUES (5, TRUE);SELECT 1 AS c1
FROM (
pg_table_a AS tom0
RIGHT JOIN (
(pg_table_a AS tom1 NATURAL JOIN pg_table_a AS tom2)
RIGHT JOIN pg_table_a AS tom3
ON tom1.col_bool IS NOT NULL
)
ON tom1.col_bool
);
```Thanks for the report!
I was able to reproduce this issue on the master. git bisect that I ran pointed
to commit fc069a3a631 as the likely cause. So I've CC'd its committer,
Alexander, on this thread.Regards,
--
Fujii Masao
My git bisect shows the same commit
also, after "set enable_self_join_elimination to false;" query executes ok
--
Best regards,
Kirill Reshke
Alexander Korotkov <aekorotkov@gmail.com> 于2026年3月17日周二 20:26写道:
On Tue, Mar 17, 2026 at 2:14 PM Fujii Masao <masao.fujii@gmail.com> wrote:
On Tue, Mar 17, 2026 at 7:34 PM PG Bug reporting form
<noreply@postgresql.org> wrote:The following bug has been logged on the website:
Bug reference: 19435
Logged by: Hang ammmkilo
Email address: ammmkilo@163.com
PostgreSQL version: 18.3
Operating system: Ubuntu 22.04
Description:A user encountered an error when attempting to execute a query involving
multiple RIGHT JOIN operations and a NATURAL JOIN on the same table
(pg_table_a). The error message returned was:
[XX000]ERROR: no relation entry for relid 2
This error seems to be an internal one and should not be triggered by users.
It might be a bug.
```sql
DROP TABLE IF EXISTS pg_table_a;CREATE TABLE pg_table_a (
id INTEGER PRIMARY KEY,
col_bool BOOLEAN
);INSERT INTO pg_table_a (id, col_bool)
VALUES (5, TRUE);SELECT 1 AS c1
FROM (
pg_table_a AS tom0
RIGHT JOIN (
(pg_table_a AS tom1 NATURAL JOIN pg_table_a AS tom2)
RIGHT JOIN pg_table_a AS tom3
ON tom1.col_bool IS NOT NULL
)
ON tom1.col_bool
);
```Thanks for the report!
I was able to reproduce this issue on the master. git bisect that I ran pointed
to commit fc069a3a631 as the likely cause. So I've CC'd its committer,
Alexander, on this thread.Thank you for adding me to the thread. I'm lookin at this.
The error was reported in rebuild_joinclause_attr_needed() when
processing Relid = 1(rtindex =1),
When processing its joininfo" ON tom1.col_bool IS NOT NULL",
(gdb) pgprint rinfo->clause
Var [varno=2 varattno=2 vartype=16
varreturningtype=VAR_RETURNING_DEFAULT varnosyn=2 varattnosyn=2]
The varno=2, rtindex=2(tom1) has been removed. In
add_vars_to_attr_needed(), to find the base_rel, but the
root->simple_rel_array[2] is NULL.
So the error is reporting.
It seems the joininfo should be replaced by rtindex = 3, because the
rtindex=2 would be removed.
--
Thanks,
Tender Wang
Tender Wang <tndrwang@gmail.com> 于2026年3月17日周二 20:59写道:
The error was reported in rebuild_joinclause_attr_needed() when
processing Relid = 1(rtindex =1),
When processing its joininfo" ON tom1.col_bool IS NOT NULL",
(gdb) pgprint rinfo->clause
Var [varno=2 varattno=2 vartype=16
varreturningtype=VAR_RETURNING_DEFAULT varnosyn=2 varattnosyn=2]The varno=2, rtindex=2(tom1) has been removed. In
add_vars_to_attr_needed(), to find the base_rel, but the
root->simple_rel_array[2] is NULL.
So the error is reporting.
It seems the joininfo should be replaced by rtindex = 3, because the
rtindex=2 would be removed.
--
(gdb) pgprint rinfo
RestrictInfo [is_pushed_down=false can_join=false pseudoconstant=false
has_clone=true is_clone=false leakproof=false
has_volatile=VOLATILITY_UNKNOWN security_level=0
num_base_rels=1 rinfo_serial=4 eval_cost={startup = -1,
per_tuple = 0} norm_selec=-1 outer_selec=-1 outer_is_left=false
hashjoinoperator=0 left_bucketsize=-1
right_bucketsize=-1 left_mcvfreq=-1 right_mcvfreq=-1
left_hasheqoperator=0 right_hasheqoperator=0]
[clause] Var [varno=2 varattno=2 vartype=16
varreturningtype=VAR_RETURNING_DEFAULT varnosyn=2 varattnosyn=2]
[clause_relids] Bitmapset [3]
[required_relids] Bitmapset [3 1]
[incompatible_relids] Bitmapset [7 6]
[outer_relids] Bitmapset [6 5 3]
The above is the joininfo of the rtindex=1(tom0), we can see that the
required_relids is changed to [3 1], but the clause is still
rtindex=2(varno=2).
I guess the current logic in remove_self_join_rel() may forget to
process the rinfo->clause.
--
Thanks,
Tender Wang
On Tue, 17 Mar 2026 at 18:20, Tender Wang <tndrwang@gmail.com> wrote:
Tender Wang <tndrwang@gmail.com> 于2026年3月17日周二 20:59写道:
The error was reported in rebuild_joinclause_attr_needed() when
processing Relid = 1(rtindex =1),
When processing its joininfo" ON tom1.col_bool IS NOT NULL",
(gdb) pgprint rinfo->clause
Var [varno=2 varattno=2 vartype=16
varreturningtype=VAR_RETURNING_DEFAULT varnosyn=2 varattnosyn=2]The varno=2, rtindex=2(tom1) has been removed. In
add_vars_to_attr_needed(), to find the base_rel, but the
root->simple_rel_array[2] is NULL.
So the error is reporting.
It seems the joininfo should be replaced by rtindex = 3, because the
rtindex=2 would be removed.
--(gdb) pgprint rinfo
RestrictInfo [is_pushed_down=false can_join=false pseudoconstant=false
has_clone=true is_clone=false leakproof=false
has_volatile=VOLATILITY_UNKNOWN security_level=0
num_base_rels=1 rinfo_serial=4 eval_cost={startup = -1,
per_tuple = 0} norm_selec=-1 outer_selec=-1 outer_is_left=false
hashjoinoperator=0 left_bucketsize=-1
right_bucketsize=-1 left_mcvfreq=-1 right_mcvfreq=-1
left_hasheqoperator=0 right_hasheqoperator=0]
[clause] Var [varno=2 varattno=2 vartype=16
varreturningtype=VAR_RETURNING_DEFAULT varnosyn=2 varattnosyn=2]
[clause_relids] Bitmapset [3]
[required_relids] Bitmapset [3 1]
[incompatible_relids] Bitmapset [7 6]
[outer_relids] Bitmapset [6 5 3]The above is the joininfo of the rtindex=1(tom0), we can see that the
required_relids is changed to [3 1], but the clause is still
rtindex=2(varno=2).
I guess the current logic in remove_self_join_rel() may forget to
process the rinfo->clause.
Yes, it looks like your analysis is valid. Will you share a patch for
updating `clause` ?
--
Best regards,
Kirill Reshke
Kirill Reshke <reshkekirill@gmail.com> 于2026年3月17日周二 21:24写道:
On Tue, 17 Mar 2026 at 18:20, Tender Wang <tndrwang@gmail.com> wrote:
Yes, it looks like your analysis is valid. Will you share a patch for
updating `clause` ?
Sorry, it's already the middle of the night here, so I'm afraid I
don't have much time to work on this right now.
--
Thanks,
Tender Wang
On Tue, Mar 17, 2026 at 3:30 PM Tender Wang <tndrwang@gmail.com> wrote:
Kirill Reshke <reshkekirill@gmail.com> 于2026年3月17日周二 21:24写道:
On Tue, 17 Mar 2026 at 18:20, Tender Wang <tndrwang@gmail.com> wrote:
Yes, it looks like your analysis is valid. Will you share a patch for
updating `clause` ?Sorry, it's already the middle of the night here, so I'm afraid I
don't have much time to work on this right now.
Thank you for your research. I've written a simple draft patch. It
fixes the reported case, but I doubt it is correct in general. I'll
continue the investigation.
------
Regards,
Alexander Korotkov
Supabase
Attachments:
sje_joininfo_fix.patchapplication/octet-stream; name=sje_joininfo_fix.patchDownload+6-0
Alexander Korotkov <aekorotkov@gmail.com> 于2026年3月18日周三 01:46写道:
On Tue, Mar 17, 2026 at 3:30 PM Tender Wang <tndrwang@gmail.com> wrote:
Kirill Reshke <reshkekirill@gmail.com> 于2026年3月17日周二 21:24写道:
On Tue, 17 Mar 2026 at 18:20, Tender Wang <tndrwang@gmail.com> wrote:
Yes, it looks like your analysis is valid. Will you share a patch for
updating `clause` ?Sorry, it's already the middle of the night here, so I'm afraid I
don't have much time to work on this right now.Thank you for your research. I've written a simple draft patch. It
fixes the reported case, but I doubt it is correct in general. I'll
continue the investigation.
The cause of this bug is doing the following statement:
...
ChangeVarNodesWalkExpression((Node *) rinfo->clause, context);
...
in replace_relid_callback().
The rinfo->clause is only Var(tom1.col_bool), sorry I typoed it to "
ON tom1.col_bool IS NOT NULL", in my first reply email.
See expression_tree_walker_impl(), it does nothing if the node is just
a Var node.
So I think whether we can add logic in ChangeVarNodesWalkExpression() as below:
...
if (node && IsA(node, Var))
{
Var *var = (Var *) node;
if (var->varlevelsup == context->sublevels_up)
{
if (var->varno == context->rt_index)
var->varno = context->new_index;
var->varnullingrels = adjust_relid_set(var->varnullingrels,
context->rt_index,
context->new_index);
if (var->varnosyn == context->rt_index)
var->varnosyn = context->new_index;
}
return false;
}
else
return expression_tree_walker(node,
ChangeVarNodes_walker,
(void *) context);
...
I tried the above fix, no error again. But I got a plan like this:
QUERY PLAN
-------------------------------------------------------------------------------------------
Nested Loop Left Join (cost=0.00..115164616.71 rows=7458350250 width=4)
-> Seq Scan on pg_table_a tom3 (cost=0.00..34.70 rows=2470 width=0)
-> Materialize (cost=0.00..99509.82 rows=3019575 width=0)
-> Nested Loop Left Join (cost=0.00..75564.95 rows=3019575 width=0)
Join Filter: tom2.col_bool
-> Seq Scan on pg_table_a tom2 (cost=0.00..34.70
rows=2445 width=5)
Filter: ((col_bool IS NOT NULL) AND (col_bool IS NOT NULL))
-> Materialize (cost=0.00..47.05 rows=2470 width=0)
-> Seq Scan on pg_table_a tom0
(cost=0.00..34.70 rows=2470 width=0)
(9 rows)
Filter: ((col_bool IS NOT NULL) AND (col_bool IS NOT NULL))
This makes me unhappy.
Your patch gets the same plan.
--
Thanks,
Tender Wang
Tender Wang <tndrwang@gmail.com> 于2026年3月18日周三 09:12写道:
Alexander Korotkov <aekorotkov@gmail.com> 于2026年3月18日周三 01:46写道:
On Tue, Mar 17, 2026 at 3:30 PM Tender Wang <tndrwang@gmail.com> wrote:
Kirill Reshke <reshkekirill@gmail.com> 于2026年3月17日周二 21:24写道:
On Tue, 17 Mar 2026 at 18:20, Tender Wang <tndrwang@gmail.com> wrote:
Yes, it looks like your analysis is valid. Will you share a patch for
updating `clause` ?Sorry, it's already the middle of the night here, so I'm afraid I
don't have much time to work on this right now.Thank you for your research. I've written a simple draft patch. It
fixes the reported case, but I doubt it is correct in general. I'll
continue the investigation.I tried the above fix, no error again. But I got a plan like this:
QUERY PLAN
-------------------------------------------------------------------------------------------
Nested Loop Left Join (cost=0.00..115164616.71 rows=7458350250 width=4)
-> Seq Scan on pg_table_a tom3 (cost=0.00..34.70 rows=2470 width=0)
-> Materialize (cost=0.00..99509.82 rows=3019575 width=0)
-> Nested Loop Left Join (cost=0.00..75564.95 rows=3019575 width=0)
Join Filter: tom2.col_bool
-> Seq Scan on pg_table_a tom2 (cost=0.00..34.70
rows=2445 width=5)
Filter: ((col_bool IS NOT NULL) AND (col_bool IS NOT NULL))
-> Materialize (cost=0.00..47.05 rows=2470 width=0)
-> Seq Scan on pg_table_a tom0
(cost=0.00..34.70 rows=2470 width=0)
(9 rows)Filter: ((col_bool IS NOT NULL) AND (col_bool IS NOT NULL))
This makes me unhappy.
Your patch gets the same plan.
In replace_relid_callback(), we add NullTest to rinfo, but it is not a
logical equal check by restrict_infos_logically_equal().
I think for baserestrictinfo, we can just use rinfo->clause, no need
to check the equality of RestrictInfo.
I tried this way, the plan looks as follows:
QUERY PLAN
-------------------------------------------------------------------------------------------
Nested Loop Left Join (cost=0.00..115776846.35 rows=7498006100 width=4)
-> Seq Scan on pg_table_a tom3 (cost=0.00..34.70 rows=2470 width=0)
-> Materialize (cost=0.00..100038.47 rows=3035630 width=0)
-> Nested Loop Left Join (cost=0.00..75966.32 rows=3035630 width=0)
Join Filter: tom2.col_bool
-> Seq Scan on pg_table_a tom2 (cost=0.00..34.70
rows=2458 width=5)
Filter: (col_bool IS NOT NULL)
-> Materialize (cost=0.00..47.05 rows=2470 width=0)
-> Seq Scan on pg_table_a tom0
(cost=0.00..34.70 rows=2470 width=0)
(9 rows)
No redundant filter anymore.
Please see the attached patch.
--
Thanks,
Tender Wang
Attachments:
0001-Fix-sje-bug.patchapplication/octet-stream; name=0001-Fix-sje-bug.patchDownload+31-7
On Wed, 18 Mar 2026 at 10:44, Tender Wang <tndrwang@gmail.com> wrote:
Tender Wang <tndrwang@gmail.com> 于2026年3月18日周三 09:12写道:
Alexander Korotkov <aekorotkov@gmail.com> 于2026年3月18日周三 01:46写道:
On Tue, Mar 17, 2026 at 3:30 PM Tender Wang <tndrwang@gmail.com> wrote:
Kirill Reshke <reshkekirill@gmail.com> 于2026年3月17日周二 21:24写道:
On Tue, 17 Mar 2026 at 18:20, Tender Wang <tndrwang@gmail.com> wrote:
Yes, it looks like your analysis is valid. Will you share a patch for
updating `clause` ?Sorry, it's already the middle of the night here, so I'm afraid I
don't have much time to work on this right now.Thank you for your research. I've written a simple draft patch. It
fixes the reported case, but I doubt it is correct in general. I'll
continue the investigation.I tried the above fix, no error again. But I got a plan like this:
QUERY PLAN
-------------------------------------------------------------------------------------------
Nested Loop Left Join (cost=0.00..115164616.71 rows=7458350250 width=4)
-> Seq Scan on pg_table_a tom3 (cost=0.00..34.70 rows=2470 width=0)
-> Materialize (cost=0.00..99509.82 rows=3019575 width=0)
-> Nested Loop Left Join (cost=0.00..75564.95 rows=3019575 width=0)
Join Filter: tom2.col_bool
-> Seq Scan on pg_table_a tom2 (cost=0.00..34.70
rows=2445 width=5)
Filter: ((col_bool IS NOT NULL) AND (col_bool IS NOT NULL))
-> Materialize (cost=0.00..47.05 rows=2470 width=0)
-> Seq Scan on pg_table_a tom0
(cost=0.00..34.70 rows=2470 width=0)
(9 rows)Filter: ((col_bool IS NOT NULL) AND (col_bool IS NOT NULL))
This makes me unhappy.
Your patch gets the same plan.In replace_relid_callback(), we add NullTest to rinfo, but it is not a
logical equal check by restrict_infos_logically_equal().
I think for baserestrictinfo, we can just use rinfo->clause, no need
to check the equality of RestrictInfo.I tried this way, the plan looks as follows:
QUERY PLAN
-------------------------------------------------------------------------------------------
Nested Loop Left Join (cost=0.00..115776846.35 rows=7498006100 width=4)
-> Seq Scan on pg_table_a tom3 (cost=0.00..34.70 rows=2470 width=0)
-> Materialize (cost=0.00..100038.47 rows=3035630 width=0)
-> Nested Loop Left Join (cost=0.00..75966.32 rows=3035630 width=0)
Join Filter: tom2.col_bool
-> Seq Scan on pg_table_a tom2 (cost=0.00..34.70
rows=2458 width=5)
Filter: (col_bool IS NOT NULL)
-> Materialize (cost=0.00..47.05 rows=2470 width=0)
-> Seq Scan on pg_table_a tom0
(cost=0.00..34.70 rows=2470 width=0)
(9 rows)No redundant filter anymore.
Please see the attached patch.
--
Thanks,
Tender Wang
Hi!
Your patch looks solid. ChangeVarNodesWalkExpression looks like a
generic rewriter utility function, so I was wondering why we never got
complaints about bugs related it. But this functions is reachable only
when SJE optimisation is allowed, so looks like this explains the
issue.
--
Best regards,
Kirill Reshke
Hi, Tender!
On Wed, Mar 18, 2026 at 7:44 AM Tender Wang <tndrwang@gmail.com> wrote:
Tender Wang <tndrwang@gmail.com> 于2026年3月18日周三 09:12写道:
Alexander Korotkov <aekorotkov@gmail.com> 于2026年3月18日周三 01:46写道:
On Tue, Mar 17, 2026 at 3:30 PM Tender Wang <tndrwang@gmail.com> wrote:
Kirill Reshke <reshkekirill@gmail.com> 于2026年3月17日周二 21:24写道:
On Tue, 17 Mar 2026 at 18:20, Tender Wang <tndrwang@gmail.com> wrote:
Yes, it looks like your analysis is valid. Will you share a patch for
updating `clause` ?Sorry, it's already the middle of the night here, so I'm afraid I
don't have much time to work on this right now.Thank you for your research. I've written a simple draft patch. It
fixes the reported case, but I doubt it is correct in general. I'll
continue the investigation.I tried the above fix, no error again. But I got a plan like this:
QUERY PLAN
-------------------------------------------------------------------------------------------
Nested Loop Left Join (cost=0.00..115164616.71 rows=7458350250 width=4)
-> Seq Scan on pg_table_a tom3 (cost=0.00..34.70 rows=2470 width=0)
-> Materialize (cost=0.00..99509.82 rows=3019575 width=0)
-> Nested Loop Left Join (cost=0.00..75564.95 rows=3019575 width=0)
Join Filter: tom2.col_bool
-> Seq Scan on pg_table_a tom2 (cost=0.00..34.70
rows=2445 width=5)
Filter: ((col_bool IS NOT NULL) AND (col_bool IS NOT NULL))
-> Materialize (cost=0.00..47.05 rows=2470 width=0)
-> Seq Scan on pg_table_a tom0
(cost=0.00..34.70 rows=2470 width=0)
(9 rows)Filter: ((col_bool IS NOT NULL) AND (col_bool IS NOT NULL))
This makes me unhappy.
Your patch gets the same plan.In replace_relid_callback(), we add NullTest to rinfo, but it is not a
logical equal check by restrict_infos_logically_equal().
I think for baserestrictinfo, we can just use rinfo->clause, no need
to check the equality of RestrictInfo.I tried this way, the plan looks as follows:
QUERY PLAN
-------------------------------------------------------------------------------------------
Nested Loop Left Join (cost=0.00..115776846.35 rows=7498006100 width=4)
-> Seq Scan on pg_table_a tom3 (cost=0.00..34.70 rows=2470 width=0)
-> Materialize (cost=0.00..100038.47 rows=3035630 width=0)
-> Nested Loop Left Join (cost=0.00..75966.32 rows=3035630 width=0)
Join Filter: tom2.col_bool
-> Seq Scan on pg_table_a tom2 (cost=0.00..34.70
rows=2458 width=5)
Filter: (col_bool IS NOT NULL)
-> Materialize (cost=0.00..47.05 rows=2470 width=0)
-> Seq Scan on pg_table_a tom0
(cost=0.00..34.70 rows=2470 width=0)
(9 rows)No redundant filter anymore.
Please see the attached patch.
What about being more generic and call ChangeVarNodes_walker() for the
node in ChangeVarNodesWalkExpression()? It also works with out case
and avoids code duplication.
Changes in restrict_infos_logically_equal() makes me a bit uneasy. I
see, restictinfo's are different by their outer_relids. Why
outer_relids doesn't matter when required_relids is singleton? More
general when do outer_relids matter for add_non_redundant_clauses() if
we're putting restictinfo's into a single list anyway?
------
Regards,
Alexander Korotkov
Supabase
Attachments:
0001-Fix-sje-bug-v2.patchapplication/octet-stream; name=0001-Fix-sje-bug-v2.patchDownload+14-3
Alexander Korotkov <aekorotkov@gmail.com> 于2026年3月18日周三 16:40写道:
What about being more generic and call ChangeVarNodes_walker() for the
node in ChangeVarNodesWalkExpression()? It also works with out case
and avoids code duplication.
Works for me.
Changes in restrict_infos_logically_equal() makes me a bit uneasy. I
see, restictinfo's are different by their outer_relids. Why
outer_relids doesn't matter when required_relids is singleton? More
general when do outer_relids matter for add_non_redundant_clauses() if
we're putting restictinfo's into a single list anyway?
(gdb) call nodeToString(binfo_candidates )
{RESTRICTINFO :clause {NULLTEST :arg {VAR :varno 3 :varattno 2
:vartype 16 :vartypmod -1 :varcollid 0 :varnullingrels (b)
:varlevelsup 0 :varreturningtype 0 :varnosyn 3 :varattnosyn 2
:location -1} :nulltesttype 1 :argisrow false :location -1}
:is_pushed_down true :can_join false :pseudoconstant false :has_clone
false :is_clone false :leakproof false :has_volatile 0 :security_level
0 :num_base_rels 1 :clause_relids (b 3) :required_relids (b 3)
:incompatible_relids (b) :outer_relids (b 5) :left_relids (b)
:right_relids (b) :orclause <> :rinfo_serial 3 :eval_cost.startup -1
:eval_cost.per_tuple 0 :norm_selec -1 :outer_selec -1 :mergeopfamilies
<> :left_em <> :right_em <> :outer_is_left false :hashjoinoperator 0
:left_bucketsize -1 :right_bucketsize -1 :left_mcvfreq -1
:right_mcvfreq -1 :left_hasheqoperator 0 :right_hasheqoperator 0}
{RESTRICTINFO :clause {NULLTEST :arg {VAR :varno 3 :varattno 2
:vartype 16 :vartypmod -1 :varcollid 0 :varnullingrels (b)
:varlevelsup 0 :varreturningtype 0 :varnosyn 3 :varattnosyn 2
:location -1} :nulltesttype 1 :argisrow false :location -1}
:is_pushed_down true :can_join true :pseudoconstant false :has_clone
false :is_clone false :leakproof false :has_volatile 2 :security_level
0 :num_base_rels 1 :clause_relids (b 3) :required_relids (b 3)
:incompatible_relids (b) :outer_relids (b) :left_relids (b 3)
:right_relids (b 3) :orclause <> :rinfo_serial 6 :eval_cost.startup -1
:eval_cost.per_tuple 0 :norm_selec -1 :outer_selec -1 :mergeopfamilies
<> :left_em <> :right_em <> :outer_is_left true :hashjoinoperator 91
:left_bucketsize -1 :right_bucketsize -1 :left_mcvfreq -1
:right_mcvfreq -1 :left_hasheqoperator 91 :right_hasheqoperator 91})"
The first is tom1.col_bool IS NOT NULL, its outer_relids (b 5) is
not empty because it's in the nullable side.
The second is added in replace_relid_callback(). Some fields do not match.
In this case, tom1.col_bool IS NOT NULL becomes a filter clause, not a
join clause, and is safe to only check rinfo->clause.
I think it's better to add is_pushed_down == true check, for example:
...
- if (bms_membership(a->required_relids) == BMS_SINGLETON)
+ if (bms_membership(a->required_relids) == BMS_SINGLETON &&
+ a->is_pushed_down &&
+ b->is_pushed_down)
...
Any thought?
--
Thanks,
Tender Wang
On 18/3/26 09:40, Alexander Korotkov wrote:
What about being more generic and call ChangeVarNodes_walker() for the
node in ChangeVarNodesWalkExpression()? It also works with out case
and avoids code duplication.
I’ve reached the same conclusion. We lost a possible case when the
RestrictInfo→clause contains a bare Var that isn’t pushed into either
the left or right subtree.
I think we can fix this by replacing the expression walker with
ChangeVarNodes_walker().
What is the reason for the second change? Tender, can you show us how to
reproduce the issue so we can support your update to
restrict_infos_logically_equal? If we include it, we should add a test.
--
regards, Andrei Lepikhov,
pgEdge
Attachments:
fix.difftext/plain; charset=UTF-8; name=fix.diffDownload+1-3
On Wed, Mar 18, 2026 at 2:18 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
On 18/3/26 09:40, Alexander Korotkov wrote:
What about being more generic and call ChangeVarNodes_walker() for the
node in ChangeVarNodesWalkExpression()? It also works with out case
and avoids code duplication.I’ve reached the same conclusion. We lost a possible case when the
RestrictInfo→clause contains a bare Var that isn’t pushed into either
the left or right subtree.
I think we can fix this by replacing the expression walker with
ChangeVarNodes_walker().
I see. This is even better.
What is the reason for the second change? Tender, can you show us how to
reproduce the issue so we can support your update to
restrict_infos_logically_equal? If we include it, we should add a test.
I think Tender already shown this in [1]. The same qual is present
twice in the plan.
Links.
1. /messages/by-id/CAHewXN=7kDJjUcgEm+6qhaKOXuqzvhRqAAKdafNCRgn0yH7BGg@mail.gmail.com
------
Regards,
Alexander Korotkov
Supabase
On 18/3/26 13:21, Alexander Korotkov wrote:
On Wed, Mar 18, 2026 at 2:18 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
What is the reason for the second change? Tender, can you show us how to
reproduce the issue so we can support your update to
restrict_infos_logically_equal? If we include it, we should add a test.I think Tender already shown this in [1]. The same qual is present
twice in the plan.
Got it. I suggest making this a separate commit to keep the history
clear. Let me share a draft with a test case for the bug fix first.
--
regards, Andrei Lepikhov,
pgEdge
Attachments:
0001-Fix-self-join-removal-to-update-bare-Var-references-.patchtext/plain; charset=UTF-8; name=0001-Fix-self-join-removal-to-update-bare-Var-references-.patchDownload+23-4
On Wed, Mar 18, 2026 at 3:31 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
On 18/3/26 13:21, Alexander Korotkov wrote:
On Wed, Mar 18, 2026 at 2:18 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
What is the reason for the second change? Tender, can you show us how to
reproduce the issue so we can support your update to
restrict_infos_logically_equal? If we include it, we should add a test.I think Tender already shown this in [1]. The same qual is present
twice in the plan.Got it. I suggest making this a separate commit to keep the history
clear. Let me share a draft with a test case for the bug fix first.
Yes, I was also thinking about splitting this into two distinct
commit. The patch you've attached looks good for me. I'm going to
push and backpatch it if no objections. And let's continue the
investigation on restrict_infos_logically_equal().
------
Regards,
Alexander Korotkov
Supabase
On Wed, 18 Mar 2026 at 18:31, Andrei Lepikhov <lepihov@gmail.com> wrote:
On 18/3/26 13:21, Alexander Korotkov wrote:
On Wed, Mar 18, 2026 at 2:18 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
What is the reason for the second change? Tender, can you show us how to
reproduce the issue so we can support your update to
restrict_infos_logically_equal? If we include it, we should add a test.I think Tender already shown this in [1]. The same qual is present
twice in the plan.Got it. I suggest making this a separate commit to keep the history
clear. Let me share a draft with a test case for the bug fix first.--
regards, Andrei Lepikhov,
pgEdge
Hi!
Is `cool_bool` a typo of `col_bool` in regression test ?
--
Best regards,
Kirill Reshke
Alexander Korotkov <aekorotkov@gmail.com> 于2026年3月18日周三 22:38写道:
On Wed, Mar 18, 2026 at 3:31 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
On 18/3/26 13:21, Alexander Korotkov wrote:
On Wed, Mar 18, 2026 at 2:18 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
What is the reason for the second change? Tender, can you show us how to
reproduce the issue so we can support your update to
restrict_infos_logically_equal? If we include it, we should add a test.I think Tender already shown this in [1]. The same qual is present
twice in the plan.Got it. I suggest making this a separate commit to keep the history
clear. Let me share a draft with a test case for the bug fix first.Yes, I was also thinking about splitting this into two distinct
commit. The patch you've attached looks good for me. I'm going to
push and backpatch it if no objections. And let's continue the
investigation on restrict_infos_logically_equal().
Agree.
And the patch seems to have forgotten to add "Reported by".
--
Thanks,
Tender Wang