BUG #18170: Unexpected error: no relation entry for relid 3

Started by PG Bug reporting formover 2 years ago34 messagesbugs
Jump to latest
#1PG Bug reporting form
noreply@postgresql.org

The following bug has been logged on the website:

Bug reference: 18170
Logged by: Zuming Jiang
Email address: zuming.jiang@inf.ethz.ch
PostgreSQL version: 16.0
Operating system: Ubuntu 20.04
Description:

My fuzzer finds a bug in Postgres, which triggers an unexpected error.

--- Set up database ---
create table t0 (c1 text, c5 text, c6 text);
create table t2 (vkey int4, c17 text, primary key(vkey));
------------------------

The fuzzer generates a test case:

--- Test case ---
select  
  rtrim(ref_4.c6, ref_2.c17) as c_0
from 
  (t0 as ref_0
    inner join ((t0 as ref_1
        full outer join (t2 as ref_2
          right outer join t2 as ref_3
          on (ref_2.vkey = ref_3.vkey))
        on (ref_1.c1 = ref_2.c17))
      left outer join t0 as ref_4
      on (ref_1.c5 = ref_4.c1))
    on (ref_0.c6 = ref_4.c1))
except all
select ref_6.c17 as c_0 from t2 as ref_6;
------------------------
--- Expected behavior ---
The test case should not trigger any error.
--- Actual behavior ---
The test case trigger an error: 

ERROR: no relation entry for relid 3

--- Postgres version ---
Github commit: 26f988212eada9c586223cbbf876c7eb455044d9
Version: PostgreSQL 17devel on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu
9.4.0-1ubuntu1~20.04.2) 9.4.0, 64-bit
--- Platform information ---
Platform: Ubuntu 20.04
Kernel: Linux 5.4.0-147-generic
#2Vik Fearing
vik@postgresfriends.org
In reply to: PG Bug reporting form (#1)
Re: BUG #18170: Unexpected error: no relation entry for relid 3

On 10/26/23 16:01, PG Bug reporting form wrote:

The following bug has been logged on the website:

Bug reference: 18170
Logged by: Zuming Jiang
Email address: zuming.jiang@inf.ethz.ch
PostgreSQL version: 16.0

This should be 17devel

Operating system: Ubuntu 20.04
Description:

My fuzzer finds a bug in Postgres, which triggers an unexpected error.

This bisects to d3d55ce571369dad6e1d582f1655e5a3fbd8594a, Remove useless
self-joins.
--
Vik Fearing

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Vik Fearing (#2)
Re: BUG #18170: Unexpected error: no relation entry for relid 3

Vik Fearing <vik@postgresfriends.org> writes:

On 10/26/23 16:01, PG Bug reporting form wrote:

My fuzzer finds a bug in Postgres, which triggers an unexpected error.

This bisects to d3d55ce571369dad6e1d582f1655e5a3fbd8594a, Remove useless
self-joins.

I wonder if that new code thinks it can remove ref_2 from the query,
even though ref_2 is used in the targetlist. I'm not seeing
control reach remove_leftjoinrel_from_query, though.

Also, while nosing around in this, I tried to pprint(root) at the
point of the error, and got

2023-10-26 12:48:37.852 EDT [1186007] WARNING: could not dump unrecognized node type: 37413808

This happens because the patch changed RelOptInfo.unique_for_rels from
a list of Bitmapsets into a list of UniqueRelInfo, even though it did
not bother to make UniqueRelInfo be a Node type (much less document
the change in globally-visible data structures: pathnodes.h still says
it's a list of Relid sets). This is not acceptable.

I'm getting the distinct impression that this patch wasn't
ready for prime time.

regards, tom lane

#4Alexander Korotkov
aekorotkov@gmail.com
In reply to: Vik Fearing (#2)
Re: BUG #18170: Unexpected error: no relation entry for relid 3

On Thu, Oct 26, 2023 at 6:21 PM Vik Fearing <vik@postgresfriends.org> wrote:

On 10/26/23 16:01, PG Bug reporting form wrote:

The following bug has been logged on the website:

Bug reference: 18170
Logged by: Zuming Jiang
Email address: zuming.jiang@inf.ethz.ch
PostgreSQL version: 16.0

This should be 17devel

Operating system: Ubuntu 20.04
Description:

My fuzzer finds a bug in Postgres, which triggers an unexpected error.

This bisects to d3d55ce571369dad6e1d582f1655e5a3fbd8594a, Remove useless
self-joins.

Thank you for bisecting!
I'm currently digging into this.

------
Regards,
Alexander Korotkov

#5Alexander Korotkov
aekorotkov@gmail.com
In reply to: Tom Lane (#3)
Re: BUG #18170: Unexpected error: no relation entry for relid 3

On Thu, Oct 26, 2023 at 8:12 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Vik Fearing <vik@postgresfriends.org> writes:

On 10/26/23 16:01, PG Bug reporting form wrote:

My fuzzer finds a bug in Postgres, which triggers an unexpected error.

This bisects to d3d55ce571369dad6e1d582f1655e5a3fbd8594a, Remove useless
self-joins.

I wonder if that new code thinks it can remove ref_2 from the query,
even though ref_2 is used in the targetlist. I'm not seeing
control reach remove_leftjoinrel_from_query, though.

Also, while nosing around in this, I tried to pprint(root) at the
point of the error, and got

2023-10-26 12:48:37.852 EDT [1186007] WARNING: could not dump unrecognized node type: 37413808

This happens because the patch changed RelOptInfo.unique_for_rels from
a list of Bitmapsets into a list of UniqueRelInfo, even though it did
not bother to make UniqueRelInfo be a Node type (much less document
the change in globally-visible data structures: pathnodes.h still says
it's a list of Relid sets). This is not acceptable.

I'm getting the distinct impression that this patch wasn't
ready for prime time.

Please, give me a chance to fix this shortly. If that wouldn't be an
easy fix, I will revert.

------
Regards,
Alexander Korotkov

#6Andrei Lepikhov
lepihov@gmail.com
In reply to: Tom Lane (#3)
Re: BUG #18170: Unexpected error: no relation entry for relid 3

On 27/10/2023 00:12, Tom Lane wrote:

Vik Fearing <vik@postgresfriends.org> writes:

On 10/26/23 16:01, PG Bug reporting form wrote:

My fuzzer finds a bug in Postgres, which triggers an unexpected error.

This bisects to d3d55ce571369dad6e1d582f1655e5a3fbd8594a, Remove useless
self-joins.

I wonder if that new code thinks it can remove ref_2 from the query,
even though ref_2 is used in the targetlist. I'm not seeing
control reach remove_leftjoinrel_from_query, though.

As I see, this join can be removed: in the explain you can see that
OUTER JOIN is replaced with the INNER JOIN(ref_2, ref_3) ON a key column.
In my opinion, the origin of the problem is that the parent_root
contains a link to ref_2 in its
simple_rte_array[]->subquery->targetList. I am still looking for a
general solution right now, but it doesn't look too complicated at first
sight.

Also, while nosing around in this, I tried to pprint(root) at the
point of the error, and got

Yeah, it is my blunder. Alexander already fixed that, as I see. The only
question is whether it is worth making the UniqueRelInfo structure as a
node (for debug purposes - I see only one reason) or we should exclude
this field from read/write operations at all.

--
regards,
Andrei Lepikhov
Postgres Professional

#7Alexander Korotkov
aekorotkov@gmail.com
In reply to: Andrei Lepikhov (#6)
Re: BUG #18170: Unexpected error: no relation entry for relid 3

On Fri, Oct 27, 2023 at 9:31 AM Andrei Lepikhov
<a.lepikhov@postgrespro.ru> wrote:

On 27/10/2023 00:12, Tom Lane wrote:

Vik Fearing <vik@postgresfriends.org> writes:

On 10/26/23 16:01, PG Bug reporting form wrote:

My fuzzer finds a bug in Postgres, which triggers an unexpected error.

This bisects to d3d55ce571369dad6e1d582f1655e5a3fbd8594a, Remove useless
self-joins.

I wonder if that new code thinks it can remove ref_2 from the query,
even though ref_2 is used in the targetlist. I'm not seeing
control reach remove_leftjoinrel_from_query, though.

As I see, this join can be removed: in the explain you can see that
OUTER JOIN is replaced with the INNER JOIN(ref_2, ref_3) ON a key column.
In my opinion, the origin of the problem is that the parent_root
contains a link to ref_2 in its
simple_rte_array[]->subquery->targetList. I am still looking for a
general solution right now, but it doesn't look too complicated at first
sight.

Yes, I came to the same conclusion. We process root->parse. But I
didn't get why parent_root->simple_rte_array[]->subquery is not the
same as root->parse. They look the same, but they are distinct
copies. If they were the same pointers, there would be no problem.

Also, while nosing around in this, I tried to pprint(root) at the
point of the error, and got

Yeah, it is my blunder. Alexander already fixed that, as I see. The only
question is whether it is worth making the UniqueRelInfo structure as a
node (for debug purposes - I see only one reason) or we should exclude
this field from read/write operations at all.

I think we just shouldn't break the general rule, that everything
inside the node should be nodes too.

------
Regards,
Alexander Korotkov

#8Andrei Lepikhov
lepihov@gmail.com
In reply to: Alexander Korotkov (#7)
Re: BUG #18170: Unexpected error: no relation entry for relid 3

On 27/10/2023 15:17, Alexander Korotkov wrote:

On Fri, Oct 27, 2023 at 9:31 AM Andrei Lepikhov
<a.lepikhov@postgrespro.ru> wrote:

On 27/10/2023 00:12, Tom Lane wrote:

Vik Fearing <vik@postgresfriends.org> writes:

On 10/26/23 16:01, PG Bug reporting form wrote:

My fuzzer finds a bug in Postgres, which triggers an unexpected error.

This bisects to d3d55ce571369dad6e1d582f1655e5a3fbd8594a, Remove useless
self-joins.

I wonder if that new code thinks it can remove ref_2 from the query,
even though ref_2 is used in the targetlist. I'm not seeing
control reach remove_leftjoinrel_from_query, though.

As I see, this join can be removed: in the explain you can see that
OUTER JOIN is replaced with the INNER JOIN(ref_2, ref_3) ON a key column.
In my opinion, the origin of the problem is that the parent_root
contains a link to ref_2 in its
simple_rte_array[]->subquery->targetList. I am still looking for a
general solution right now, but it doesn't look too complicated at first
sight.

Yes, I came to the same conclusion. We process root->parse. But I
didn't get why parent_root->simple_rte_array[]->subquery is not the
same as root->parse. They look the same, but they are distinct
copies. If they were the same pointers, there would be no problem.

As I see, the copy of the parse tree is induced by the same feature as
usual in the last few months: 2489d76. It introduced
remove_nulling_relids, and it altered our parse tree. Right now, I don't
have an answer: it should be fixed in SJE, or this is a more general
issue just discovered by the SJE.

--
regards,
Andrei Lepikhov
Postgres Professional

#9Andrei Lepikhov
lepihov@gmail.com
In reply to: Andrei Lepikhov (#8)
Re: BUG #18170: Unexpected error: no relation entry for relid 3

On 27/10/2023 16:07, Andrei Lepikhov wrote:

On 27/10/2023 15:17, Alexander Korotkov wrote:

On Fri, Oct 27, 2023 at 9:31 AM Andrei Lepikhov
<a.lepikhov@postgrespro.ru> wrote:

On 27/10/2023 00:12, Tom Lane wrote:

Vik Fearing <vik@postgresfriends.org> writes:

On 10/26/23 16:01, PG Bug reporting form wrote:

My fuzzer finds a bug in Postgres, which triggers an unexpected
error.

This bisects to d3d55ce571369dad6e1d582f1655e5a3fbd8594a, Remove
useless
self-joins.

I wonder if that new code thinks it can remove ref_2 from the query,
even though ref_2 is used in the targetlist.  I'm not seeing
control reach remove_leftjoinrel_from_query, though.

As I see, this join can be removed: in the explain you can see that
OUTER JOIN is replaced with the INNER JOIN(ref_2, ref_3) ON a key
column.
In my opinion, the origin of the problem is that the parent_root
contains a link to ref_2 in its
simple_rte_array[]->subquery->targetList. I am still looking for a
general solution right now, but it doesn't look too complicated at first
sight.

Yes, I came to the same conclusion.  We process root->parse.  But I
didn't get why parent_root->simple_rte_array[]->subquery is not the
same as root->parse.  They look the same, but they are distinct
copies.  If they were the same pointers, there would be no problem.

As I see, the copy of the parse tree is induced by the same feature as
usual in the last few months: 2489d76. It introduced
remove_nulling_relids, and it altered our parse tree. Right now, I don't
have an answer: it should be fixed in SJE, or this is a more general
issue just discovered by the SJE.

So, I can propose two options. First - don't clean only the current root
structure, but also make cleanup of the parent. Although it looks safe,
I am not happy with this approach - it seems too simple: we should have
a genuine reason for such a cleaning because it potentially adds overhead.
The second option is to add a flag for not altering queries in
remove_nulling_relids() - it looks like a mistake when we have two
different query trees in the root and its parent. Also, it reduces
memory usage a bit.
So, if my analysis is correct, it is better to use the second way (see
attachment).

--
regards,
Andrei Lepikhov
Postgres Professional

Attachments:

0001-Don-t-alter-parse-tree-during-the-procedure-of-outer.patchtext/plain; charset=UTF-8; name=0001-Don-t-alter-parse-tree-during-the-procedure-of-outer.patchDownload+21-11
#10Richard Guo
guofenglinux@gmail.com
In reply to: Andrei Lepikhov (#9)
Re: BUG #18170: Unexpected error: no relation entry for relid 3

On Fri, Oct 27, 2023 at 7:00 PM Andrei Lepikhov <a.lepikhov@postgrespro.ru>
wrote:

So, I can propose two options. First - don't clean only the current root
structure, but also make cleanup of the parent. Although it looks safe,
I am not happy with this approach - it seems too simple: we should have
a genuine reason for such a cleaning because it potentially adds overhead.
The second option is to add a flag for not altering queries in
remove_nulling_relids() - it looks like a mistake when we have two
different query trees in the root and its parent. Also, it reduces
memory usage a bit.
So, if my analysis is correct, it is better to use the second way (see
attachment).

Alternatively, can we look at subroot->parse->targetList instead of
subquery->targetList where we call estimate_num_groups on the output of
the subquery?

--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -341,7 +341,7 @@ recurse_set_operations(Node *setOp, PlannerInfo *root,
        *pNumGroups = subpath->rows;
    else
        *pNumGroups = estimate_num_groups(subroot,
-
get_tlist_exprs(subquery->targetList, false),
+
get_tlist_exprs(subroot->parse->targetList, false),
                                          subpath->rows,
                                          NULL,
                                          NULL);

BTW, I'm a little surprised that QTW_DONT_COPY_QUERY doesn't seem to be
used anywhere currently.

Thanks
Richard

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Richard Guo (#10)
Re: BUG #18170: Unexpected error: no relation entry for relid 3

Richard Guo <guofenglinux@gmail.com> writes:

Alternatively, can we look at subroot->parse->targetList instead of
subquery->targetList where we call estimate_num_groups on the output of
the subquery?

Please read the comment just above that.

regards, tom lane

#12Alexander Korotkov
aekorotkov@gmail.com
In reply to: Tom Lane (#11)
Re: BUG #18170: Unexpected error: no relation entry for relid 3

On Fri, Oct 27, 2023 at 5:41 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Richard Guo <guofenglinux@gmail.com> writes:

Alternatively, can we look at subroot->parse->targetList instead of
subquery->targetList where we call estimate_num_groups on the output of
the subquery?

Please read the comment just above that.

Thank you, Tom, This is clear.

Could you please comment on the second option by Andrei [1]? Was the
copying of the parse trees an intention of 2489d76 or just a side
effect? Do you think removing copying of trees is safe?

Links
1. /messages/by-id/2eb87c2e-7e6f-4002-8df3-8fac3aa6a037@postgrespro.ru

------
Regards,
Alexander Korotkov

#13Andrei Lepikhov
lepihov@gmail.com
In reply to: Richard Guo (#10)
Re: BUG #18170: Unexpected error: no relation entry for relid 3

On 27/10/2023 21:10, Richard Guo wrote:

On Fri, Oct 27, 2023 at 7:00 PM Andrei Lepikhov
<a.lepikhov@postgrespro.ru <mailto:a.lepikhov@postgrespro.ru>> wrote:

So, I can propose two options. First - don't clean only the current
root
structure, but also make cleanup of the parent. Although it looks safe,
I am not happy with this approach - it seems too simple: we should have
a genuine reason for such a cleaning because it potentially adds
overhead.
The second option is to add a flag for not altering queries in
remove_nulling_relids() - it looks like a mistake when we have two
different query trees in the root and its parent. Also, it reduces
memory usage a bit.
So, if my analysis is correct, it is better to use the second way (see
attachment).

Alternatively, can we look at subroot->parse->targetList instead of
subquery->targetList where we call estimate_num_groups on the output of
the subquery?

It is a solution. But does it mask the real problem? In my mind, we copy
node trees to use somewhere else or probe a conjecture. Here, we have
two different representations of the same subquery. Keeping aside the
memory consumption issue, is it correct?
Make sense to apply both options: switch the groups estimation to
subroot targetList and keep one version of a subquery.
In attachment - second (combined) version of the change. Here I added
assertions to check identity of root->parse and incoming query tree.

--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -341,7 +341,7 @@ recurse_set_operations(Node *setOp, PlannerInfo *root,
        *pNumGroups = subpath->rows;
    else
        *pNumGroups = estimate_num_groups(subroot,
-                                         
get_tlist_exprs(subquery->targetList, false),
+                                         
get_tlist_exprs(subroot->parse->targetList, false),
                                          subpath->rows,
                                          NULL,
                                          NULL);

BTW, I'm a little surprised that QTW_DONT_COPY_QUERY doesn't seem to be
used anywhere currently.

I too. But we use this flag in the enterprise fork to reduce memory
consumption. It could be proposed for upstream, but looks a bit unsafe.
I guess, some extensions could do the same.

--
regards,
Andrei Lepikhov
Postgres Professional

Attachments:

v2-0001-Don-t-alter-parse-tree-during-the-procedure-of-outer.patchtext/plain; charset=UTF-8; name=v2-0001-Don-t-alter-parse-tree-during-the-procedure-of-outer.patchDownload+44-16
#14Vik Fearing
vik@postgresfriends.org
In reply to: Andrei Lepikhov (#13)
Re: BUG #18170: Unexpected error: no relation entry for relid 3

On 28/10/2023 08:59, Andrei Lepikhov wrote:

BTW, I'm a little surprised that QTW_DONT_COPY_QUERY doesn't seem to be
used anywhere currently.

I too. But we use this flag in the enterprise fork to reduce memory
consumption. It could be proposed for upstream, but looks a bit unsafe.
I guess, some extensions could do the same.

What is "the enterprise fork"? This is a PostgreSQL mailing list.
--
Vik Fearing

#15Alexander Korotkov
aekorotkov@gmail.com
In reply to: Andrei Lepikhov (#13)
Re: BUG #18170: Unexpected error: no relation entry for relid 3

On Sat, Oct 28, 2023 at 9:59 AM Andrei Lepikhov
<a.lepikhov@postgrespro.ru> wrote:

On 27/10/2023 21:10, Richard Guo wrote:

On Fri, Oct 27, 2023 at 7:00 PM Andrei Lepikhov
<a.lepikhov@postgrespro.ru <mailto:a.lepikhov@postgrespro.ru>> wrote:

So, I can propose two options. First - don't clean only the current
root
structure, but also make cleanup of the parent. Although it looks safe,
I am not happy with this approach - it seems too simple: we should have
a genuine reason for such a cleaning because it potentially adds
overhead.
The second option is to add a flag for not altering queries in
remove_nulling_relids() - it looks like a mistake when we have two
different query trees in the root and its parent. Also, it reduces
memory usage a bit.
So, if my analysis is correct, it is better to use the second way (see
attachment).

Alternatively, can we look at subroot->parse->targetList instead of
subquery->targetList where we call estimate_num_groups on the output of
the subquery?

It is a solution. But does it mask the real problem? In my mind, we copy
node trees to use somewhere else or probe a conjecture. Here, we have
two different representations of the same subquery. Keeping aside the
memory consumption issue, is it correct?
Make sense to apply both options: switch the groups estimation to
subroot targetList and keep one version of a subquery.
In attachment - second (combined) version of the change. Here I added
assertions to check identity of root->parse and incoming query tree.

Andrei, did you read the comment just before the groups estimation as
pointed by Tom [1]?

* XXX you don't really want to know about this: we do the estimation
* using the subquery's original targetlist expressions, not the
* subroot->processed_tlist which might seem more appropriate. The
* reason is that if the subquery is itself a setop, it may return a
* processed_tlist containing "varno 0" Vars generated by
* generate_append_tlist, and those would confuse estimate_num_groups
* mightily. We ought to get rid of the "varno 0" hack, but that
* requires a redesign of the parsetree representation of setops, so
* that there can be an RTE corresponding to each setop's output.

As I understand, it requires much more work to correctly switch the
groups estimation to subroot targetList.

+1 for asserts that parse trees are the same.

Links
1. /messages/by-id/1551957.1698417686@sss.pgh.pa.us

------
Regards,
Alexander Korotkov

#16Andrei Lepikhov
lepihov@gmail.com
In reply to: Alexander Korotkov (#15)
Re: BUG #18170: Unexpected error: no relation entry for relid 3

On 28/10/2023 17:10, Alexander Korotkov wrote:

It is a solution. But does it mask the real problem? In my mind, we copy
node trees to use somewhere else or probe a conjecture. Here, we have
two different representations of the same subquery. Keeping aside the
memory consumption issue, is it correct?
Make sense to apply both options: switch the groups estimation to
subroot targetList and keep one version of a subquery.
In attachment - second (combined) version of the change. Here I added
assertions to check identity of root->parse and incoming query tree.

Andrei, did you read the comment just before the groups estimation as
pointed by Tom [1]?

Yes, and I am a bit confused. We use here subroot->parse->targetList.
The processed_tlist, where we can find the "Varno 0" value, is based on
it, but it is different. As I see, forming processed_tlist, we make a
new node and don't change the original targetList. Am I wrong?

* XXX you don't really want to know about this: we do the estimation
* using the subquery's original targetlist expressions, not the
* subroot->processed_tlist which might seem more appropriate. The
* reason is that if the subquery is itself a setop, it may return a
* processed_tlist containing "varno 0" Vars generated by
* generate_append_tlist, and those would confuse estimate_num_groups
* mightily. We ought to get rid of the "varno 0" hack, but that
* requires a redesign of the parsetree representation of setops, so
* that there can be an RTE corresponding to each setop's output.

As I understand, it requires much more work to correctly switch the
groups estimation to subroot targetList.

"Varno 0" is quite an irritating problem, which has beaten me a lot
before, during the development of the GROUP-BY optimization feature and
not only. I'd be glad to redesign this part of the planner. But I didn't
find an easy way to implement that yet.

--
regards,
Andrei Lepikhov
Postgres Professional

#17Alexander Korotkov
aekorotkov@gmail.com
In reply to: Alexander Korotkov (#15)
Re: BUG #18170: Unexpected error: no relation entry for relid 3

On Sat, Oct 28, 2023 at 1:10 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:

On Sat, Oct 28, 2023 at 9:59 AM Andrei Lepikhov
<a.lepikhov@postgrespro.ru> wrote:

On 27/10/2023 21:10, Richard Guo wrote:

On Fri, Oct 27, 2023 at 7:00 PM Andrei Lepikhov
<a.lepikhov@postgrespro.ru <mailto:a.lepikhov@postgrespro.ru>> wrote:

So, I can propose two options. First - don't clean only the current
root
structure, but also make cleanup of the parent. Although it looks safe,
I am not happy with this approach - it seems too simple: we should have
a genuine reason for such a cleaning because it potentially adds
overhead.
The second option is to add a flag for not altering queries in
remove_nulling_relids() - it looks like a mistake when we have two
different query trees in the root and its parent. Also, it reduces
memory usage a bit.
So, if my analysis is correct, it is better to use the second way (see
attachment).

Alternatively, can we look at subroot->parse->targetList instead of
subquery->targetList where we call estimate_num_groups on the output of
the subquery?

It is a solution. But does it mask the real problem? In my mind, we copy
node trees to use somewhere else or probe a conjecture. Here, we have
two different representations of the same subquery. Keeping aside the
memory consumption issue, is it correct?
Make sense to apply both options: switch the groups estimation to
subroot targetList and keep one version of a subquery.
In attachment - second (combined) version of the change. Here I added
assertions to check identity of root->parse and incoming query tree.

Andrei, did you read the comment just before the groups estimation as
pointed by Tom [1]?

* XXX you don't really want to know about this: we do the estimation
* using the subquery's original targetlist expressions, not the
* subroot->processed_tlist which might seem more appropriate. The
* reason is that if the subquery is itself a setop, it may return a
* processed_tlist containing "varno 0" Vars generated by
* generate_append_tlist, and those would confuse estimate_num_groups
* mightily. We ought to get rid of the "varno 0" hack, but that
* requires a redesign of the parsetree representation of setops, so
* that there can be an RTE corresponding to each setop's output.

As I understand, it requires much more work to correctly switch the
groups estimation to subroot targetList.

+1 for asserts that parse trees are the same.

I made some beautification of the patch by Andrei. I also removed the
part which changes the target list for estimate_num_groups(). Any
objections to pushing this?

------
Regards,
Alexander Korotkov

Attachments:

0001-Don-t-alter-parse-tree-during-the-procedure-of-ou-v3.patchapplication/octet-stream; name=0001-Don-t-alter-parse-tree-during-the-procedure-of-ou-v3.patchDownload+48-15
#18Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alexander Korotkov (#17)
Re: BUG #18170: Unexpected error: no relation entry for relid 3

Alexander Korotkov <aekorotkov@gmail.com> writes:

I made some beautification of the patch by Andrei. I also removed the
part which changes the target list for estimate_num_groups(). Any
objections to pushing this?

It seems moderately likely that this will break as much as it fixes.

I've not studied the original patch enough to understand why you need
to be playing strange games with tree mutation rules, but I suspect
that this is band-aiding over some rather fundamentally bad code.

regards, tom lane

#19Richard Guo
guofenglinux@gmail.com
In reply to: Tom Lane (#11)
Re: BUG #18170: Unexpected error: no relation entry for relid 3

On Fri, Oct 27, 2023 at 10:41 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Richard Guo <guofenglinux@gmail.com> writes:

Alternatively, can we look at subroot->parse->targetList instead of
subquery->targetList where we call estimate_num_groups on the output of
the subquery?

Please read the comment just above that.

Hmm, I understand that we cannot look at subroot->processed_tlist here
because of the "varno 0" hack explained in the comment. But what I
proposed is to look at subroot->parse->targetList, which should be fine
because the tlist generated by generate_append_tlist would only be
returned into processed_tlist not the original parse->targetList. For
instance, the query below still works if we look at
subroot->parse->targetList, even though the subquery is itself a setop.

(SELECT 1,2,3 UNION SELECT 4,5,6 ORDER BY 1,2) INTERSECT SELECT 4,5,6;
?column? | ?column? | ?column?
----------+----------+----------
4 | 5 | 6
(1 row)

Thanks
Richard

#20Richard Guo
guofenglinux@gmail.com
In reply to: Tom Lane (#18)
Re: BUG #18170: Unexpected error: no relation entry for relid 3

On Sun, Oct 29, 2023 at 11:56 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Alexander Korotkov <aekorotkov@gmail.com> writes:

I made some beautification of the patch by Andrei. I also removed the
part which changes the target list for estimate_num_groups(). Any
objections to pushing this?

It seems moderately likely that this will break as much as it fixes.

I've not studied the original patch enough to understand why you need
to be playing strange games with tree mutation rules, but I suspect
that this is band-aiding over some rather fundamentally bad code.

I also have some concerns about this patch. It requires that
root->parse remains unchanged during the whole subquery_planner() in
order to work, which is an implicit constraint we did not have before.

Thanks
Richard

#21Andrei Lepikhov
lepihov@gmail.com
In reply to: Tom Lane (#18)
#22Andrei Lepikhov
lepihov@gmail.com
In reply to: Richard Guo (#20)
#23Richard Guo
guofenglinux@gmail.com
In reply to: Andrei Lepikhov (#22)
#24Andrei Lepikhov
lepihov@gmail.com
In reply to: Richard Guo (#23)
#25Richard Guo
guofenglinux@gmail.com
In reply to: Andrei Lepikhov (#24)
#26Richard Guo
guofenglinux@gmail.com
In reply to: Richard Guo (#25)
#27Alexander Korotkov
aekorotkov@gmail.com
In reply to: Richard Guo (#20)
#28Alexander Korotkov
aekorotkov@gmail.com
In reply to: Richard Guo (#25)
#29Andrei Lepikhov
lepihov@gmail.com
In reply to: Alexander Korotkov (#28)
#30Alexander Korotkov
aekorotkov@gmail.com
In reply to: Alexander Korotkov (#12)
#31Alexander Korotkov
aekorotkov@gmail.com
In reply to: Alexander Korotkov (#30)
#32Andrei Lepikhov
lepihov@gmail.com
In reply to: Alexander Korotkov (#31)
#33Richard Guo
guofenglinux@gmail.com
In reply to: Tom Lane (#3)
#34Alexander Korotkov
aekorotkov@gmail.com
In reply to: Richard Guo (#33)