BUG #19435: Error: "No relation entry for relid 2" Triggered by Complex Join with Self-Referencing Tables

Started by PG Bug reporting form20 days ago30 messagesbugs
Jump to latest
#1PG Bug reporting form
noreply@postgresql.org

The following bug has been logged on the website:

Bug reference: 19435
Logged by: Hang ammmkilo
Email address: ammmkilo@163.com
PostgreSQL version: 18.3
Operating system: Ubuntu 22.04
Description:

A user encountered an error when attempting to execute a query involving
multiple RIGHT JOIN operations and a NATURAL JOIN on the same table
(pg_table_a). The error message returned was:
[XX000]ERROR: no relation entry for relid 2
This error seems to be an internal one and should not be triggered by users.
It might be a bug.
```sql
DROP TABLE IF EXISTS pg_table_a;

CREATE TABLE pg_table_a (
id INTEGER PRIMARY KEY,
col_bool BOOLEAN
);

INSERT INTO pg_table_a (id, col_bool)
VALUES (5, TRUE);

SELECT 1 AS c1
FROM (
pg_table_a AS tom0
RIGHT JOIN (
(pg_table_a AS tom1 NATURAL JOIN pg_table_a AS tom2)
RIGHT JOIN pg_table_a AS tom3
ON tom1.col_bool IS NOT NULL
)
ON tom1.col_bool
);
```

#2Fujii Masao
masao.fujii@gmail.com
In reply to: PG Bug reporting form (#1)
Re: BUG #19435: Error: "No relation entry for relid 2" Triggered by Complex Join with Self-Referencing Tables

On Tue, Mar 17, 2026 at 7:34 PM PG Bug reporting form
<noreply@postgresql.org> wrote:

The following bug has been logged on the website:

Bug reference: 19435
Logged by: Hang ammmkilo
Email address: ammmkilo@163.com
PostgreSQL version: 18.3
Operating system: Ubuntu 22.04
Description:

A user encountered an error when attempting to execute a query involving
multiple RIGHT JOIN operations and a NATURAL JOIN on the same table
(pg_table_a). The error message returned was:
[XX000]ERROR: no relation entry for relid 2
This error seems to be an internal one and should not be triggered by users.
It might be a bug.
```sql
DROP TABLE IF EXISTS pg_table_a;

CREATE TABLE pg_table_a (
id INTEGER PRIMARY KEY,
col_bool BOOLEAN
);

INSERT INTO pg_table_a (id, col_bool)
VALUES (5, TRUE);

SELECT 1 AS c1
FROM (
pg_table_a AS tom0
RIGHT JOIN (
(pg_table_a AS tom1 NATURAL JOIN pg_table_a AS tom2)
RIGHT JOIN pg_table_a AS tom3
ON tom1.col_bool IS NOT NULL
)
ON tom1.col_bool
);
```

Thanks for the report!

I was able to reproduce this issue on the master. git bisect that I ran pointed
to commit fc069a3a631 as the likely cause. So I've CC'd its committer,
Alexander, on this thread.

Regards,

--
Fujii Masao

#3Alexander Korotkov
aekorotkov@gmail.com
In reply to: Fujii Masao (#2)
Re: BUG #19435: Error: "No relation entry for relid 2" Triggered by Complex Join with Self-Referencing Tables

On Tue, Mar 17, 2026 at 2:14 PM Fujii Masao <masao.fujii@gmail.com> wrote:

On Tue, Mar 17, 2026 at 7:34 PM PG Bug reporting form
<noreply@postgresql.org> wrote:

The following bug has been logged on the website:

Bug reference: 19435
Logged by: Hang ammmkilo
Email address: ammmkilo@163.com
PostgreSQL version: 18.3
Operating system: Ubuntu 22.04
Description:

A user encountered an error when attempting to execute a query involving
multiple RIGHT JOIN operations and a NATURAL JOIN on the same table
(pg_table_a). The error message returned was:
[XX000]ERROR: no relation entry for relid 2
This error seems to be an internal one and should not be triggered by users.
It might be a bug.
```sql
DROP TABLE IF EXISTS pg_table_a;

CREATE TABLE pg_table_a (
id INTEGER PRIMARY KEY,
col_bool BOOLEAN
);

INSERT INTO pg_table_a (id, col_bool)
VALUES (5, TRUE);

SELECT 1 AS c1
FROM (
pg_table_a AS tom0
RIGHT JOIN (
(pg_table_a AS tom1 NATURAL JOIN pg_table_a AS tom2)
RIGHT JOIN pg_table_a AS tom3
ON tom1.col_bool IS NOT NULL
)
ON tom1.col_bool
);
```

Thanks for the report!

I was able to reproduce this issue on the master. git bisect that I ran pointed
to commit fc069a3a631 as the likely cause. So I've CC'd its committer,
Alexander, on this thread.

Thank you for adding me to the thread. I'm lookin at this.

------
Regards,
Alexander Korotkov
Supabase

#4Kirill Reshke
reshkekirill@gmail.com
In reply to: Fujii Masao (#2)
Re: BUG #19435: Error: "No relation entry for relid 2" Triggered by Complex Join with Self-Referencing Tables

On Tue, 17 Mar 2026 at 17:15, Fujii Masao <masao.fujii@gmail.com> wrote:

On Tue, Mar 17, 2026 at 7:34 PM PG Bug reporting form
<noreply@postgresql.org> wrote:

The following bug has been logged on the website:

Bug reference: 19435
Logged by: Hang ammmkilo
Email address: ammmkilo@163.com
PostgreSQL version: 18.3
Operating system: Ubuntu 22.04
Description:

A user encountered an error when attempting to execute a query involving
multiple RIGHT JOIN operations and a NATURAL JOIN on the same table
(pg_table_a). The error message returned was:
[XX000]ERROR: no relation entry for relid 2
This error seems to be an internal one and should not be triggered by users.
It might be a bug.
```sql
DROP TABLE IF EXISTS pg_table_a;

CREATE TABLE pg_table_a (
id INTEGER PRIMARY KEY,
col_bool BOOLEAN
);

INSERT INTO pg_table_a (id, col_bool)
VALUES (5, TRUE);

SELECT 1 AS c1
FROM (
pg_table_a AS tom0
RIGHT JOIN (
(pg_table_a AS tom1 NATURAL JOIN pg_table_a AS tom2)
RIGHT JOIN pg_table_a AS tom3
ON tom1.col_bool IS NOT NULL
)
ON tom1.col_bool
);
```

Thanks for the report!

I was able to reproduce this issue on the master. git bisect that I ran pointed
to commit fc069a3a631 as the likely cause. So I've CC'd its committer,
Alexander, on this thread.

Regards,

--
Fujii Masao

My git bisect shows the same commit
also, after "set enable_self_join_elimination to false;" query executes ok

--
Best regards,
Kirill Reshke

#5Tender Wang
tndrwang@gmail.com
In reply to: Alexander Korotkov (#3)
Re: BUG #19435: Error: "No relation entry for relid 2" Triggered by Complex Join with Self-Referencing Tables

Alexander Korotkov <aekorotkov@gmail.com> 于2026年3月17日周二 20:26写道:

On Tue, Mar 17, 2026 at 2:14 PM Fujii Masao <masao.fujii@gmail.com> wrote:

On Tue, Mar 17, 2026 at 7:34 PM PG Bug reporting form
<noreply@postgresql.org> wrote:

The following bug has been logged on the website:

Bug reference: 19435
Logged by: Hang ammmkilo
Email address: ammmkilo@163.com
PostgreSQL version: 18.3
Operating system: Ubuntu 22.04
Description:

A user encountered an error when attempting to execute a query involving
multiple RIGHT JOIN operations and a NATURAL JOIN on the same table
(pg_table_a). The error message returned was:
[XX000]ERROR: no relation entry for relid 2
This error seems to be an internal one and should not be triggered by users.
It might be a bug.
```sql
DROP TABLE IF EXISTS pg_table_a;

CREATE TABLE pg_table_a (
id INTEGER PRIMARY KEY,
col_bool BOOLEAN
);

INSERT INTO pg_table_a (id, col_bool)
VALUES (5, TRUE);

SELECT 1 AS c1
FROM (
pg_table_a AS tom0
RIGHT JOIN (
(pg_table_a AS tom1 NATURAL JOIN pg_table_a AS tom2)
RIGHT JOIN pg_table_a AS tom3
ON tom1.col_bool IS NOT NULL
)
ON tom1.col_bool
);
```

Thanks for the report!

I was able to reproduce this issue on the master. git bisect that I ran pointed
to commit fc069a3a631 as the likely cause. So I've CC'd its committer,
Alexander, on this thread.

Thank you for adding me to the thread. I'm lookin at this.

The error was reported in rebuild_joinclause_attr_needed() when
processing Relid = 1(rtindex =1),
When processing its joininfo" ON tom1.col_bool IS NOT NULL",
(gdb) pgprint rinfo->clause
Var [varno=2 varattno=2 vartype=16
varreturningtype=VAR_RETURNING_DEFAULT varnosyn=2 varattnosyn=2]

The varno=2, rtindex=2(tom1) has been removed. In
add_vars_to_attr_needed(), to find the base_rel, but the
root->simple_rel_array[2] is NULL.
So the error is reporting.
It seems the joininfo should be replaced by rtindex = 3, because the
rtindex=2 would be removed.
--
Thanks,
Tender Wang

#6Tender Wang
tndrwang@gmail.com
In reply to: Tender Wang (#5)
Re: BUG #19435: Error: "No relation entry for relid 2" Triggered by Complex Join with Self-Referencing Tables

Tender Wang <tndrwang@gmail.com> 于2026年3月17日周二 20:59写道:

The error was reported in rebuild_joinclause_attr_needed() when
processing Relid = 1(rtindex =1),
When processing its joininfo" ON tom1.col_bool IS NOT NULL",
(gdb) pgprint rinfo->clause
Var [varno=2 varattno=2 vartype=16
varreturningtype=VAR_RETURNING_DEFAULT varnosyn=2 varattnosyn=2]

The varno=2, rtindex=2(tom1) has been removed. In
add_vars_to_attr_needed(), to find the base_rel, but the
root->simple_rel_array[2] is NULL.
So the error is reporting.
It seems the joininfo should be replaced by rtindex = 3, because the
rtindex=2 would be removed.
--

(gdb) pgprint rinfo
RestrictInfo [is_pushed_down=false can_join=false pseudoconstant=false
has_clone=true is_clone=false leakproof=false
has_volatile=VOLATILITY_UNKNOWN security_level=0
num_base_rels=1 rinfo_serial=4 eval_cost={startup = -1,
per_tuple = 0} norm_selec=-1 outer_selec=-1 outer_is_left=false
hashjoinoperator=0 left_bucketsize=-1
right_bucketsize=-1 left_mcvfreq=-1 right_mcvfreq=-1
left_hasheqoperator=0 right_hasheqoperator=0]
[clause] Var [varno=2 varattno=2 vartype=16
varreturningtype=VAR_RETURNING_DEFAULT varnosyn=2 varattnosyn=2]
[clause_relids] Bitmapset [3]
[required_relids] Bitmapset [3 1]
[incompatible_relids] Bitmapset [7 6]
[outer_relids] Bitmapset [6 5 3]

The above is the joininfo of the rtindex=1(tom0), we can see that the
required_relids is changed to [3 1], but the clause is still
rtindex=2(varno=2).
I guess the current logic in remove_self_join_rel() may forget to
process the rinfo->clause.

--
Thanks,
Tender Wang

#7Kirill Reshke
reshkekirill@gmail.com
In reply to: Tender Wang (#6)
Re: BUG #19435: Error: "No relation entry for relid 2" Triggered by Complex Join with Self-Referencing Tables

On Tue, 17 Mar 2026 at 18:20, Tender Wang <tndrwang@gmail.com> wrote:

Tender Wang <tndrwang@gmail.com> 于2026年3月17日周二 20:59写道:

The error was reported in rebuild_joinclause_attr_needed() when
processing Relid = 1(rtindex =1),
When processing its joininfo" ON tom1.col_bool IS NOT NULL",
(gdb) pgprint rinfo->clause
Var [varno=2 varattno=2 vartype=16
varreturningtype=VAR_RETURNING_DEFAULT varnosyn=2 varattnosyn=2]

The varno=2, rtindex=2(tom1) has been removed. In
add_vars_to_attr_needed(), to find the base_rel, but the
root->simple_rel_array[2] is NULL.
So the error is reporting.
It seems the joininfo should be replaced by rtindex = 3, because the
rtindex=2 would be removed.
--

(gdb) pgprint rinfo
RestrictInfo [is_pushed_down=false can_join=false pseudoconstant=false
has_clone=true is_clone=false leakproof=false
has_volatile=VOLATILITY_UNKNOWN security_level=0
num_base_rels=1 rinfo_serial=4 eval_cost={startup = -1,
per_tuple = 0} norm_selec=-1 outer_selec=-1 outer_is_left=false
hashjoinoperator=0 left_bucketsize=-1
right_bucketsize=-1 left_mcvfreq=-1 right_mcvfreq=-1
left_hasheqoperator=0 right_hasheqoperator=0]
[clause] Var [varno=2 varattno=2 vartype=16
varreturningtype=VAR_RETURNING_DEFAULT varnosyn=2 varattnosyn=2]
[clause_relids] Bitmapset [3]
[required_relids] Bitmapset [3 1]
[incompatible_relids] Bitmapset [7 6]
[outer_relids] Bitmapset [6 5 3]

The above is the joininfo of the rtindex=1(tom0), we can see that the
required_relids is changed to [3 1], but the clause is still
rtindex=2(varno=2).
I guess the current logic in remove_self_join_rel() may forget to
process the rinfo->clause.

Yes, it looks like your analysis is valid. Will you share a patch for
updating `clause` ?

--
Best regards,
Kirill Reshke

#8Tender Wang
tndrwang@gmail.com
In reply to: Kirill Reshke (#7)
Re: BUG #19435: Error: "No relation entry for relid 2" Triggered by Complex Join with Self-Referencing Tables

Kirill Reshke <reshkekirill@gmail.com> 于2026年3月17日周二 21:24写道:

On Tue, 17 Mar 2026 at 18:20, Tender Wang <tndrwang@gmail.com> wrote:
Yes, it looks like your analysis is valid. Will you share a patch for
updating `clause` ?

Sorry, it's already the middle of the night here, so I'm afraid I
don't have much time to work on this right now.

--
Thanks,
Tender Wang

#9Alexander Korotkov
aekorotkov@gmail.com
In reply to: Tender Wang (#8)
Re: BUG #19435: Error: "No relation entry for relid 2" Triggered by Complex Join with Self-Referencing Tables

On Tue, Mar 17, 2026 at 3:30 PM Tender Wang <tndrwang@gmail.com> wrote:

Kirill Reshke <reshkekirill@gmail.com> 于2026年3月17日周二 21:24写道:

On Tue, 17 Mar 2026 at 18:20, Tender Wang <tndrwang@gmail.com> wrote:
Yes, it looks like your analysis is valid. Will you share a patch for
updating `clause` ?

Sorry, it's already the middle of the night here, so I'm afraid I
don't have much time to work on this right now.

Thank you for your research. I've written a simple draft patch. It
fixes the reported case, but I doubt it is correct in general. I'll
continue the investigation.

------
Regards,
Alexander Korotkov
Supabase

Attachments:

sje_joininfo_fix.patchapplication/octet-stream; name=sje_joininfo_fix.patchDownload+6-0
#10Tender Wang
tndrwang@gmail.com
In reply to: Alexander Korotkov (#9)
Re: BUG #19435: Error: "No relation entry for relid 2" Triggered by Complex Join with Self-Referencing Tables

Alexander Korotkov <aekorotkov@gmail.com> 于2026年3月18日周三 01:46写道:

On Tue, Mar 17, 2026 at 3:30 PM Tender Wang <tndrwang@gmail.com> wrote:

Kirill Reshke <reshkekirill@gmail.com> 于2026年3月17日周二 21:24写道:

On Tue, 17 Mar 2026 at 18:20, Tender Wang <tndrwang@gmail.com> wrote:
Yes, it looks like your analysis is valid. Will you share a patch for
updating `clause` ?

Sorry, it's already the middle of the night here, so I'm afraid I
don't have much time to work on this right now.

Thank you for your research. I've written a simple draft patch. It
fixes the reported case, but I doubt it is correct in general. I'll
continue the investigation.

The cause of this bug is doing the following statement:
...
ChangeVarNodesWalkExpression((Node *) rinfo->clause, context);
...
in replace_relid_callback().

The rinfo->clause is only Var(tom1.col_bool), sorry I typoed it to "
ON tom1.col_bool IS NOT NULL", in my first reply email.
See expression_tree_walker_impl(), it does nothing if the node is just
a Var node.
So I think whether we can add logic in ChangeVarNodesWalkExpression() as below:

...
if (node && IsA(node, Var))
{
Var *var = (Var *) node;

if (var->varlevelsup == context->sublevels_up)
{
if (var->varno == context->rt_index)
var->varno = context->new_index;
var->varnullingrels = adjust_relid_set(var->varnullingrels,
context->rt_index,
context->new_index);
if (var->varnosyn == context->rt_index)
var->varnosyn = context->new_index;
}
return false;
}
else
return expression_tree_walker(node,
ChangeVarNodes_walker,
(void *) context);
...

I tried the above fix, no error again. But I got a plan like this:
QUERY PLAN
-------------------------------------------------------------------------------------------
Nested Loop Left Join (cost=0.00..115164616.71 rows=7458350250 width=4)
-> Seq Scan on pg_table_a tom3 (cost=0.00..34.70 rows=2470 width=0)
-> Materialize (cost=0.00..99509.82 rows=3019575 width=0)
-> Nested Loop Left Join (cost=0.00..75564.95 rows=3019575 width=0)
Join Filter: tom2.col_bool
-> Seq Scan on pg_table_a tom2 (cost=0.00..34.70
rows=2445 width=5)
Filter: ((col_bool IS NOT NULL) AND (col_bool IS NOT NULL))
-> Materialize (cost=0.00..47.05 rows=2470 width=0)
-> Seq Scan on pg_table_a tom0
(cost=0.00..34.70 rows=2470 width=0)
(9 rows)

Filter: ((col_bool IS NOT NULL) AND (col_bool IS NOT NULL))
This makes me unhappy.
Your patch gets the same plan.

--
Thanks,
Tender Wang

#11Tender Wang
tndrwang@gmail.com
In reply to: Tender Wang (#10)
Re: BUG #19435: Error: "No relation entry for relid 2" Triggered by Complex Join with Self-Referencing Tables

Tender Wang <tndrwang@gmail.com> 于2026年3月18日周三 09:12写道:

Alexander Korotkov <aekorotkov@gmail.com> 于2026年3月18日周三 01:46写道:

On Tue, Mar 17, 2026 at 3:30 PM Tender Wang <tndrwang@gmail.com> wrote:

Kirill Reshke <reshkekirill@gmail.com> 于2026年3月17日周二 21:24写道:

On Tue, 17 Mar 2026 at 18:20, Tender Wang <tndrwang@gmail.com> wrote:
Yes, it looks like your analysis is valid. Will you share a patch for
updating `clause` ?

Sorry, it's already the middle of the night here, so I'm afraid I
don't have much time to work on this right now.

Thank you for your research. I've written a simple draft patch. It
fixes the reported case, but I doubt it is correct in general. I'll
continue the investigation.

I tried the above fix, no error again. But I got a plan like this:
QUERY PLAN
-------------------------------------------------------------------------------------------
Nested Loop Left Join (cost=0.00..115164616.71 rows=7458350250 width=4)
-> Seq Scan on pg_table_a tom3 (cost=0.00..34.70 rows=2470 width=0)
-> Materialize (cost=0.00..99509.82 rows=3019575 width=0)
-> Nested Loop Left Join (cost=0.00..75564.95 rows=3019575 width=0)
Join Filter: tom2.col_bool
-> Seq Scan on pg_table_a tom2 (cost=0.00..34.70
rows=2445 width=5)
Filter: ((col_bool IS NOT NULL) AND (col_bool IS NOT NULL))
-> Materialize (cost=0.00..47.05 rows=2470 width=0)
-> Seq Scan on pg_table_a tom0
(cost=0.00..34.70 rows=2470 width=0)
(9 rows)

Filter: ((col_bool IS NOT NULL) AND (col_bool IS NOT NULL))
This makes me unhappy.
Your patch gets the same plan.

In replace_relid_callback(), we add NullTest to rinfo, but it is not a
logical equal check by restrict_infos_logically_equal().
I think for baserestrictinfo, we can just use rinfo->clause, no need
to check the equality of RestrictInfo.

I tried this way, the plan looks as follows:
QUERY PLAN
-------------------------------------------------------------------------------------------
Nested Loop Left Join (cost=0.00..115776846.35 rows=7498006100 width=4)
-> Seq Scan on pg_table_a tom3 (cost=0.00..34.70 rows=2470 width=0)
-> Materialize (cost=0.00..100038.47 rows=3035630 width=0)
-> Nested Loop Left Join (cost=0.00..75966.32 rows=3035630 width=0)
Join Filter: tom2.col_bool
-> Seq Scan on pg_table_a tom2 (cost=0.00..34.70
rows=2458 width=5)
Filter: (col_bool IS NOT NULL)
-> Materialize (cost=0.00..47.05 rows=2470 width=0)
-> Seq Scan on pg_table_a tom0
(cost=0.00..34.70 rows=2470 width=0)
(9 rows)

No redundant filter anymore.

Please see the attached patch.

--
Thanks,
Tender Wang

Attachments:

0001-Fix-sje-bug.patchapplication/octet-stream; name=0001-Fix-sje-bug.patchDownload+31-7
#12Kirill Reshke
reshkekirill@gmail.com
In reply to: Tender Wang (#11)
Re: BUG #19435: Error: "No relation entry for relid 2" Triggered by Complex Join with Self-Referencing Tables

On Wed, 18 Mar 2026 at 10:44, Tender Wang <tndrwang@gmail.com> wrote:

Tender Wang <tndrwang@gmail.com> 于2026年3月18日周三 09:12写道:

Alexander Korotkov <aekorotkov@gmail.com> 于2026年3月18日周三 01:46写道:

On Tue, Mar 17, 2026 at 3:30 PM Tender Wang <tndrwang@gmail.com> wrote:

Kirill Reshke <reshkekirill@gmail.com> 于2026年3月17日周二 21:24写道:

On Tue, 17 Mar 2026 at 18:20, Tender Wang <tndrwang@gmail.com> wrote:
Yes, it looks like your analysis is valid. Will you share a patch for
updating `clause` ?

Sorry, it's already the middle of the night here, so I'm afraid I
don't have much time to work on this right now.

Thank you for your research. I've written a simple draft patch. It
fixes the reported case, but I doubt it is correct in general. I'll
continue the investigation.

I tried the above fix, no error again. But I got a plan like this:
QUERY PLAN
-------------------------------------------------------------------------------------------
Nested Loop Left Join (cost=0.00..115164616.71 rows=7458350250 width=4)
-> Seq Scan on pg_table_a tom3 (cost=0.00..34.70 rows=2470 width=0)
-> Materialize (cost=0.00..99509.82 rows=3019575 width=0)
-> Nested Loop Left Join (cost=0.00..75564.95 rows=3019575 width=0)
Join Filter: tom2.col_bool
-> Seq Scan on pg_table_a tom2 (cost=0.00..34.70
rows=2445 width=5)
Filter: ((col_bool IS NOT NULL) AND (col_bool IS NOT NULL))
-> Materialize (cost=0.00..47.05 rows=2470 width=0)
-> Seq Scan on pg_table_a tom0
(cost=0.00..34.70 rows=2470 width=0)
(9 rows)

Filter: ((col_bool IS NOT NULL) AND (col_bool IS NOT NULL))
This makes me unhappy.
Your patch gets the same plan.

In replace_relid_callback(), we add NullTest to rinfo, but it is not a
logical equal check by restrict_infos_logically_equal().
I think for baserestrictinfo, we can just use rinfo->clause, no need
to check the equality of RestrictInfo.

I tried this way, the plan looks as follows:
QUERY PLAN
-------------------------------------------------------------------------------------------
Nested Loop Left Join (cost=0.00..115776846.35 rows=7498006100 width=4)
-> Seq Scan on pg_table_a tom3 (cost=0.00..34.70 rows=2470 width=0)
-> Materialize (cost=0.00..100038.47 rows=3035630 width=0)
-> Nested Loop Left Join (cost=0.00..75966.32 rows=3035630 width=0)
Join Filter: tom2.col_bool
-> Seq Scan on pg_table_a tom2 (cost=0.00..34.70
rows=2458 width=5)
Filter: (col_bool IS NOT NULL)
-> Materialize (cost=0.00..47.05 rows=2470 width=0)
-> Seq Scan on pg_table_a tom0
(cost=0.00..34.70 rows=2470 width=0)
(9 rows)

No redundant filter anymore.

Please see the attached patch.

--
Thanks,
Tender Wang

Hi!
Your patch looks solid. ChangeVarNodesWalkExpression looks like a
generic rewriter utility function, so I was wondering why we never got
complaints about bugs related it. But this functions is reachable only
when SJE optimisation is allowed, so looks like this explains the
issue.

--
Best regards,
Kirill Reshke

#13Alexander Korotkov
aekorotkov@gmail.com
In reply to: Tender Wang (#11)
Re: BUG #19435: Error: "No relation entry for relid 2" Triggered by Complex Join with Self-Referencing Tables

Hi, Tender!

On Wed, Mar 18, 2026 at 7:44 AM Tender Wang <tndrwang@gmail.com> wrote:

Tender Wang <tndrwang@gmail.com> 于2026年3月18日周三 09:12写道:

Alexander Korotkov <aekorotkov@gmail.com> 于2026年3月18日周三 01:46写道:

On Tue, Mar 17, 2026 at 3:30 PM Tender Wang <tndrwang@gmail.com> wrote:

Kirill Reshke <reshkekirill@gmail.com> 于2026年3月17日周二 21:24写道:

On Tue, 17 Mar 2026 at 18:20, Tender Wang <tndrwang@gmail.com> wrote:
Yes, it looks like your analysis is valid. Will you share a patch for
updating `clause` ?

Sorry, it's already the middle of the night here, so I'm afraid I
don't have much time to work on this right now.

Thank you for your research. I've written a simple draft patch. It
fixes the reported case, but I doubt it is correct in general. I'll
continue the investigation.

I tried the above fix, no error again. But I got a plan like this:
QUERY PLAN
-------------------------------------------------------------------------------------------
Nested Loop Left Join (cost=0.00..115164616.71 rows=7458350250 width=4)
-> Seq Scan on pg_table_a tom3 (cost=0.00..34.70 rows=2470 width=0)
-> Materialize (cost=0.00..99509.82 rows=3019575 width=0)
-> Nested Loop Left Join (cost=0.00..75564.95 rows=3019575 width=0)
Join Filter: tom2.col_bool
-> Seq Scan on pg_table_a tom2 (cost=0.00..34.70
rows=2445 width=5)
Filter: ((col_bool IS NOT NULL) AND (col_bool IS NOT NULL))
-> Materialize (cost=0.00..47.05 rows=2470 width=0)
-> Seq Scan on pg_table_a tom0
(cost=0.00..34.70 rows=2470 width=0)
(9 rows)

Filter: ((col_bool IS NOT NULL) AND (col_bool IS NOT NULL))
This makes me unhappy.
Your patch gets the same plan.

In replace_relid_callback(), we add NullTest to rinfo, but it is not a
logical equal check by restrict_infos_logically_equal().
I think for baserestrictinfo, we can just use rinfo->clause, no need
to check the equality of RestrictInfo.

I tried this way, the plan looks as follows:
QUERY PLAN
-------------------------------------------------------------------------------------------
Nested Loop Left Join (cost=0.00..115776846.35 rows=7498006100 width=4)
-> Seq Scan on pg_table_a tom3 (cost=0.00..34.70 rows=2470 width=0)
-> Materialize (cost=0.00..100038.47 rows=3035630 width=0)
-> Nested Loop Left Join (cost=0.00..75966.32 rows=3035630 width=0)
Join Filter: tom2.col_bool
-> Seq Scan on pg_table_a tom2 (cost=0.00..34.70
rows=2458 width=5)
Filter: (col_bool IS NOT NULL)
-> Materialize (cost=0.00..47.05 rows=2470 width=0)
-> Seq Scan on pg_table_a tom0
(cost=0.00..34.70 rows=2470 width=0)
(9 rows)

No redundant filter anymore.

Please see the attached patch.

What about being more generic and call ChangeVarNodes_walker() for the
node in ChangeVarNodesWalkExpression()? It also works with out case
and avoids code duplication.

Changes in restrict_infos_logically_equal() makes me a bit uneasy. I
see, restictinfo's are different by their outer_relids. Why
outer_relids doesn't matter when required_relids is singleton? More
general when do outer_relids matter for add_non_redundant_clauses() if
we're putting restictinfo's into a single list anyway?

------
Regards,
Alexander Korotkov
Supabase

Attachments:

0001-Fix-sje-bug-v2.patchapplication/octet-stream; name=0001-Fix-sje-bug-v2.patchDownload+14-3
#14Tender Wang
tndrwang@gmail.com
In reply to: Alexander Korotkov (#13)
Re: BUG #19435: Error: "No relation entry for relid 2" Triggered by Complex Join with Self-Referencing Tables

Alexander Korotkov <aekorotkov@gmail.com> 于2026年3月18日周三 16:40写道:

What about being more generic and call ChangeVarNodes_walker() for the
node in ChangeVarNodesWalkExpression()? It also works with out case
and avoids code duplication.

Works for me.

Changes in restrict_infos_logically_equal() makes me a bit uneasy. I
see, restictinfo's are different by their outer_relids. Why
outer_relids doesn't matter when required_relids is singleton? More
general when do outer_relids matter for add_non_redundant_clauses() if
we're putting restictinfo's into a single list anyway?

(gdb) call nodeToString(binfo_candidates )
{RESTRICTINFO :clause {NULLTEST :arg {VAR :varno 3 :varattno 2
:vartype 16 :vartypmod -1 :varcollid 0 :varnullingrels (b)
:varlevelsup 0 :varreturningtype 0 :varnosyn 3 :varattnosyn 2
:location -1} :nulltesttype 1 :argisrow false :location -1}
:is_pushed_down true :can_join false :pseudoconstant false :has_clone
false :is_clone false :leakproof false :has_volatile 0 :security_level
0 :num_base_rels 1 :clause_relids (b 3) :required_relids (b 3)
:incompatible_relids (b) :outer_relids (b 5) :left_relids (b)
:right_relids (b) :orclause <> :rinfo_serial 3 :eval_cost.startup -1
:eval_cost.per_tuple 0 :norm_selec -1 :outer_selec -1 :mergeopfamilies
<> :left_em <> :right_em <> :outer_is_left false :hashjoinoperator 0
:left_bucketsize -1 :right_bucketsize -1 :left_mcvfreq -1
:right_mcvfreq -1 :left_hasheqoperator 0 :right_hasheqoperator 0}

{RESTRICTINFO :clause {NULLTEST :arg {VAR :varno 3 :varattno 2
:vartype 16 :vartypmod -1 :varcollid 0 :varnullingrels (b)
:varlevelsup 0 :varreturningtype 0 :varnosyn 3 :varattnosyn 2
:location -1} :nulltesttype 1 :argisrow false :location -1}
:is_pushed_down true :can_join true :pseudoconstant false :has_clone
false :is_clone false :leakproof false :has_volatile 2 :security_level
0 :num_base_rels 1 :clause_relids (b 3) :required_relids (b 3)
:incompatible_relids (b) :outer_relids (b) :left_relids (b 3)
:right_relids (b 3) :orclause <> :rinfo_serial 6 :eval_cost.startup -1
:eval_cost.per_tuple 0 :norm_selec -1 :outer_selec -1 :mergeopfamilies
<> :left_em <> :right_em <> :outer_is_left true :hashjoinoperator 91
:left_bucketsize -1 :right_bucketsize -1 :left_mcvfreq -1
:right_mcvfreq -1 :left_hasheqoperator 91 :right_hasheqoperator 91})"

The first is tom1.col_bool IS NOT NULL, its outer_relids (b 5) is
not empty because it's in the nullable side.
The second is added in replace_relid_callback(). Some fields do not match.

In this case, tom1.col_bool IS NOT NULL becomes a filter clause, not a
join clause, and is safe to only check rinfo->clause.
I think it's better to add is_pushed_down == true check, for example:
...
-       if (bms_membership(a->required_relids) == BMS_SINGLETON)
+       if (bms_membership(a->required_relids) == BMS_SINGLETON &&
+               a->is_pushed_down &&
+               b->is_pushed_down)
...

Any thought?

--
Thanks,
Tender Wang

#15Andrei Lepikhov
lepihov@gmail.com
In reply to: Alexander Korotkov (#13)
Re: BUG #19435: Error: "No relation entry for relid 2" Triggered by Complex Join with Self-Referencing Tables

On 18/3/26 09:40, Alexander Korotkov wrote:

What about being more generic and call ChangeVarNodes_walker() for the
node in ChangeVarNodesWalkExpression()? It also works with out case
and avoids code duplication.

I’ve reached the same conclusion. We lost a possible case when the
RestrictInfo→clause contains a bare Var that isn’t pushed into either
the left or right subtree.
I think we can fix this by replacing the expression walker with
ChangeVarNodes_walker().
What is the reason for the second change? Tender, can you show us how to
reproduce the issue so we can support your update to
restrict_infos_logically_equal? If we include it, we should add a test.

--
regards, Andrei Lepikhov,
pgEdge

Attachments:

fix.difftext/plain; charset=UTF-8; name=fix.diffDownload+1-3
#16Alexander Korotkov
aekorotkov@gmail.com
In reply to: Andrei Lepikhov (#15)
Re: BUG #19435: Error: "No relation entry for relid 2" Triggered by Complex Join with Self-Referencing Tables

On Wed, Mar 18, 2026 at 2:18 PM Andrei Lepikhov <lepihov@gmail.com> wrote:

On 18/3/26 09:40, Alexander Korotkov wrote:

What about being more generic and call ChangeVarNodes_walker() for the
node in ChangeVarNodesWalkExpression()? It also works with out case
and avoids code duplication.

I’ve reached the same conclusion. We lost a possible case when the
RestrictInfo→clause contains a bare Var that isn’t pushed into either
the left or right subtree.
I think we can fix this by replacing the expression walker with
ChangeVarNodes_walker().

I see. This is even better.

What is the reason for the second change? Tender, can you show us how to
reproduce the issue so we can support your update to
restrict_infos_logically_equal? If we include it, we should add a test.

I think Tender already shown this in [1]. The same qual is present
twice in the plan.

Links.
1. /messages/by-id/CAHewXN=7kDJjUcgEm+6qhaKOXuqzvhRqAAKdafNCRgn0yH7BGg@mail.gmail.com

------
Regards,
Alexander Korotkov
Supabase

#17Andrei Lepikhov
lepihov@gmail.com
In reply to: Alexander Korotkov (#16)
Re: BUG #19435: Error: "No relation entry for relid 2" Triggered by Complex Join with Self-Referencing Tables

On 18/3/26 13:21, Alexander Korotkov wrote:

On Wed, Mar 18, 2026 at 2:18 PM Andrei Lepikhov <lepihov@gmail.com> wrote:

What is the reason for the second change? Tender, can you show us how to
reproduce the issue so we can support your update to
restrict_infos_logically_equal? If we include it, we should add a test.

I think Tender already shown this in [1]. The same qual is present
twice in the plan.

Got it. I suggest making this a separate commit to keep the history
clear. Let me share a draft with a test case for the bug fix first.

--
regards, Andrei Lepikhov,
pgEdge

Attachments:

0001-Fix-self-join-removal-to-update-bare-Var-references-.patchtext/plain; charset=UTF-8; name=0001-Fix-self-join-removal-to-update-bare-Var-references-.patchDownload+23-4
#18Alexander Korotkov
aekorotkov@gmail.com
In reply to: Andrei Lepikhov (#17)
Re: BUG #19435: Error: "No relation entry for relid 2" Triggered by Complex Join with Self-Referencing Tables

On Wed, Mar 18, 2026 at 3:31 PM Andrei Lepikhov <lepihov@gmail.com> wrote:

On 18/3/26 13:21, Alexander Korotkov wrote:

On Wed, Mar 18, 2026 at 2:18 PM Andrei Lepikhov <lepihov@gmail.com> wrote:

What is the reason for the second change? Tender, can you show us how to
reproduce the issue so we can support your update to
restrict_infos_logically_equal? If we include it, we should add a test.

I think Tender already shown this in [1]. The same qual is present
twice in the plan.

Got it. I suggest making this a separate commit to keep the history
clear. Let me share a draft with a test case for the bug fix first.

Yes, I was also thinking about splitting this into two distinct
commit. The patch you've attached looks good for me. I'm going to
push and backpatch it if no objections. And let's continue the
investigation on restrict_infos_logically_equal().

------
Regards,
Alexander Korotkov
Supabase

#19Kirill Reshke
reshkekirill@gmail.com
In reply to: Andrei Lepikhov (#17)
Re: BUG #19435: Error: "No relation entry for relid 2" Triggered by Complex Join with Self-Referencing Tables

On Wed, 18 Mar 2026 at 18:31, Andrei Lepikhov <lepihov@gmail.com> wrote:

On 18/3/26 13:21, Alexander Korotkov wrote:

On Wed, Mar 18, 2026 at 2:18 PM Andrei Lepikhov <lepihov@gmail.com> wrote:

What is the reason for the second change? Tender, can you show us how to
reproduce the issue so we can support your update to
restrict_infos_logically_equal? If we include it, we should add a test.

I think Tender already shown this in [1]. The same qual is present
twice in the plan.

Got it. I suggest making this a separate commit to keep the history
clear. Let me share a draft with a test case for the bug fix first.

--
regards, Andrei Lepikhov,
pgEdge

Hi!
Is `cool_bool` a typo of `col_bool` in regression test ?

--
Best regards,
Kirill Reshke

#20Tender Wang
tndrwang@gmail.com
In reply to: Alexander Korotkov (#18)
Re: BUG #19435: Error: "No relation entry for relid 2" Triggered by Complex Join with Self-Referencing Tables

Alexander Korotkov <aekorotkov@gmail.com> 于2026年3月18日周三 22:38写道:

On Wed, Mar 18, 2026 at 3:31 PM Andrei Lepikhov <lepihov@gmail.com> wrote:

On 18/3/26 13:21, Alexander Korotkov wrote:

On Wed, Mar 18, 2026 at 2:18 PM Andrei Lepikhov <lepihov@gmail.com> wrote:

What is the reason for the second change? Tender, can you show us how to
reproduce the issue so we can support your update to
restrict_infos_logically_equal? If we include it, we should add a test.

I think Tender already shown this in [1]. The same qual is present
twice in the plan.

Got it. I suggest making this a separate commit to keep the history
clear. Let me share a draft with a test case for the bug fix first.

Yes, I was also thinking about splitting this into two distinct
commit. The patch you've attached looks good for me. I'm going to
push and backpatch it if no objections. And let's continue the
investigation on restrict_infos_logically_equal().

Agree.
And the patch seems to have forgotten to add "Reported by".

--
Thanks,
Tender Wang

#21Andrei Lepikhov
lepihov@gmail.com
In reply to: Kirill Reshke (#19)
#22Alexander Korotkov
aekorotkov@gmail.com
In reply to: Andrei Lepikhov (#21)
#23Alexander Korotkov
aekorotkov@gmail.com
In reply to: Alexander Korotkov (#22)
#24Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alexander Korotkov (#23)
#25Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#24)
#26Tender Wang
tndrwang@gmail.com
In reply to: Tom Lane (#25)
#27Alexander Korotkov
aekorotkov@gmail.com
In reply to: Tom Lane (#25)
#28Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alexander Korotkov (#27)
#29Andrei Lepikhov
lepihov@gmail.com
In reply to: Alexander Korotkov (#23)
#30Tender Wang
tndrwang@gmail.com
In reply to: Andrei Lepikhov (#29)