Improving EXPLAIN's display of SubPlan nodes

Started by Tom Laneabout 2 years ago23 messageshackers

tgl@sss.pgh.pa.us

about 2 years ago

EXPLAIN has always been really poor at displaying SubPlan nodes
in expressions: you don't get much more than "(SubPlan N)".
This is mostly because every time I thought about it, I despaired
of trying to represent all the information in a SubPlan accurately.
However, a recent discussion[1]/messages/by-id/149c5c2f-4267-44e3-a177-d1fd24c53f6d@universite-paris-saclay.fr made me realize that we could do
a lot better just by displaying the SubLinkType and the testexpr
(if relevant). So here's a proposed patch. You can see what
it does by examining the regression test changes.

There's plenty of room to bikeshed about exactly how to display
this stuff, and I'm open to suggestions.

BTW, I was somewhat depressed to discover that we have exactly
zero regression coverage of the ROWCOMPARE_SUBLINK code paths;
not only was EXPLAIN output not covered, but the planner and
executor too. So I added some simple tests for that. Otherwise
I think existing coverage is enough for this.

regards, tom lane

[1]: /messages/by-id/149c5c2f-4267-44e3-a177-d1fd24c53f6d@universite-paris-saclay.fr

Aleksander Alekseev

aleksander@timescale.com

about 2 years ago

In reply to: Tom Lane (#1)

Re: Improving EXPLAIN's display of SubPlan nodes

Hi,

EXPLAIN has always been really poor at displaying SubPlan nodes
in expressions: you don't get much more than "(SubPlan N)".
This is mostly because every time I thought about it, I despaired
of trying to represent all the information in a SubPlan accurately.
However, a recent discussion[1] made me realize that we could do
a lot better just by displaying the SubLinkType and the testexpr
(if relevant). So here's a proposed patch. You can see what
it does by examining the regression test changes.

There's plenty of room to bikeshed about exactly how to display
this stuff, and I'm open to suggestions.

BTW, I was somewhat depressed to discover that we have exactly
zero regression coverage of the ROWCOMPARE_SUBLINK code paths;
not only was EXPLAIN output not covered, but the planner and
executor too. So I added some simple tests for that. Otherwise
I think existing coverage is enough for this.

I reviewed the code and tested the patch on MacOS. It looks good to me.

Although something like:

```
+   Filter: (ANY (base_tbl.a = $1) FROM SubPlan 1 (returns $1))
+   SubPlan 1 (returns $1)
```

... arguably doesn't give much more information to the user comparing
to what we have now:

```
- Filter: (SubPlan 1)
- SubPlan 1
```

... I believe this is the right step toward more detailed EXPLAINs,
and perhaps could be useful for debugging and/or educational purposes.
Also the patch improves code coverage.

--
Best regards,
Aleksander Alekseev

Tom Lane

tgl@sss.pgh.pa.us

about 2 years ago

In reply to: Aleksander Alekseev (#2)

Re: Improving EXPLAIN's display of SubPlan nodes

Aleksander Alekseev <aleksander@timescale.com> writes:

Although something like:

```
+   Filter: (ANY (base_tbl.a = $1) FROM SubPlan 1 (returns $1))
+   SubPlan 1 (returns $1)
```

... arguably doesn't give much more information to the user comparing
to what we have now:

```
- Filter: (SubPlan 1)
- SubPlan 1
```

Yeah, I would probably not have thought to do this on my own; but
we had an actual user request for it. I think arguably the main
benefit is to confirm "yes, this is the sub-select you think it is".

The main thing that's still missing compared to what is in the plan
data structure is information about which Param is which. I think
we have the subplan output Params relatively well covered through
the expedient of listing them in the generated plan_name, but it's
still not apparent to me how we could shoehorn subplan input
Params into this (or whether it's worth doing).

regards, tom lane

Chantal Keller

chantal.keller@universite-paris-saclay.fr

about 2 years ago

In reply to: Tom Lane (#3)

Re: Improving EXPLAIN's display of SubPlan nodes

Hi Aleksander and Tom

I do confirm that I requested to get this information, in order to
recover the formula to filter on.

Thanks to both of you
Chantal

Le 22/01/2024 à 18:07, Tom Lane a écrit :

Show quoted text

Aleksander Alekseev <aleksander@timescale.com> writes:

Although something like:
```
+   Filter: (ANY (base_tbl.a = $1) FROM SubPlan 1 (returns $1))
+   SubPlan 1 (returns $1)
```
... arguably doesn't give much more information to the user comparing
to what we have now:

```
- Filter: (SubPlan 1)
- SubPlan 1
```

Yeah, I would probably not have thought to do this on my own; but
we had an actual user request for it. I think arguably the main
benefit is to confirm "yes, this is the sub-select you think it is".

The main thing that's still missing compared to what is in the plan
data structure is information about which Param is which. I think
we have the subplan output Params relatively well covered through
the expedient of listing them in the generated plan_name, but it's
still not apparent to me how we could shoehorn subplan input
Params into this (or whether it's worth doing).

regards, tom lane

Tom Lane

tgl@sss.pgh.pa.us

about 2 years ago

In reply to: Tom Lane (#3)

Re: Improving EXPLAIN's display of SubPlan nodes

I wrote:

The main thing that's still missing compared to what is in the plan
data structure is information about which Param is which. I think
we have the subplan output Params relatively well covered through
the expedient of listing them in the generated plan_name, but it's
still not apparent to me how we could shoehorn subplan input
Params into this (or whether it's worth doing).

Actually ... it looks like it probably isn't worth doing, because
it's already the case that we don't expose input Params as such.
EXPLAIN searches for the referent of an input Param and displays
that (cf find_param_referent()). Just for experimental purposes,
I wrote a follow-on patch to add printout of the parParam and args
list (attached, as .txt so the cfbot doesn't think it's a patch).
This produces results like

explain (verbose, costs off)
select array(select sum(x+y) s
from generate_series(1,3) y group by y order by s)
from generate_series(1,3) x;
QUERY PLAN
-------------------------------------------------------------------
Function Scan on pg_catalog.generate_series x
Output: ARRAY(SubPlan 1 PASSING $0 := x.x)
^^^^^^^^^^^^^^^^^ added by delta patch
Function Call: generate_series(1, 3)
SubPlan 1
-> Sort
Output: (sum((x.x + y.y))), y.y
Sort Key: (sum((x.x + y.y)))
-> HashAggregate
Output: sum((x.x + y.y)), y.y
^^^ actual reference to $0
Group Key: y.y
-> Function Scan on pg_catalog.generate_series y
Output: y.y
Function Call: generate_series(1, 3)
(13 rows)

As you can see, it's not necessary to explain what $0 is because
that name isn't shown anywhere else --- the references to "x.x" in
the subplan are actually uses of $0.

So now I'm thinking that we do have enough detail in the present
proposal, and we just need to think about whether there's some
nicer way to present it than the particular spelling I used here.

One idea I considered briefly is to pull the same trick with
regards to output parameters --- that is, instead of adding all
the "returns $n" annotations to subplans, drill down and print
the subplan's relevant targetlist expression instead of "$n".
On balance I think that might be more confusing not less so,
though. SQL users are used to the idea that a sub-select can
"see" variables from the outer query, but not at all vice-versa.
I think it probably wouldn't be formally ambiguous, because
ruleutils already de-duplicates table aliases across the whole
tree, but it still seems likely to be confusing. Also, people
are already pretty used to seeing $n to represent the outputs
of InitPlans, and I've not heard many complaints suggesting
that we should change that.

regards, tom lane

Tom Lane

tgl@sss.pgh.pa.us

about 2 years ago

In reply to: Tom Lane (#5)

Re: Improving EXPLAIN's display of SubPlan nodes

I wrote:

So now I'm thinking that we do have enough detail in the present
proposal, and we just need to think about whether there's some
nicer way to present it than the particular spelling I used here.

Here's a rebase over 9f1337639 --- no code changes, but this affects
some of the new or changed expected outputs from that commit.

regards, tom lane

Andrey Borodin

amborodin@acm.org

about 2 years ago

In reply to: Tom Lane (#6)

Re: Improving EXPLAIN's display of SubPlan nodes

On 17 Feb 2024, at 00:38, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Here's a rebase over 9f1337639 --- no code changes, but this affects
some of the new or changed expected outputs from that commit.

Aleksander, as long as your was reviewing this previously, I’ve added you to reviewers of this CF entry [0]https://commitfest.postgresql.org/47/4782/. Please, ping me or remove yourself, it it’s actually not a case.

BTW, as long as there’s a new version and some input from Tom, would you be willing to post fleshier review?

Thanks for working on this!

Best regards, Andrey Borodin.

[0]: https://commitfest.postgresql.org/47/4782/

Dean Rasheed

dean.a.rasheed@gmail.com

about 2 years ago

In reply to: Tom Lane (#6)

Re: Improving EXPLAIN's display of SubPlan nodes

On Fri, 16 Feb 2024 at 19:39, Tom Lane <tgl@sss.pgh.pa.us> wrote:

So now I'm thinking that we do have enough detail in the present
proposal, and we just need to think about whether there's some
nicer way to present it than the particular spelling I used here.

One thing that concerns me about making even greater use of "$n" is
the potential for confusion with generic plan parameters. Maybe it's
always possible to work out which is which from context, but still it
looks messy:

drop table if exists foo;
create table foo(id int, x int, y int);

explain (verbose, costs off, generic_plan)
select row($3,$4) = (select x,y from foo where id=y) and
row($1,$2) = (select min(x+y),max(x+y) from generate_series(1,3) x)
from generate_series(1,3) y;

QUERY PLAN
---------------------------------------------------------------------------------------------------------------
Function Scan on pg_catalog.generate_series y
Output: (($3 = $0) AND ($4 = $1) AND (ROWCOMPARE (($1 = $3) AND ($2
= $4)) FROM SubPlan 2 (returns $3,$4)))
Function Call: generate_series(1, 3)
InitPlan 1 (returns $0,$1)
-> Seq Scan on public.foo
Output: foo.x, foo.y
Filter: (foo.id = foo.y)
SubPlan 2 (returns $3,$4)
-> Aggregate
Output: min((x.x + y.y)), max((x.x + y.y))
-> Function Scan on pg_catalog.generate_series x
Output: x.x
Function Call: generate_series(1, 3)

Another odd thing about that is the inconsistency between how the
SubPlan and InitPlan expressions are displayed. I think "ROWCOMPARE"
is really just an internal detail that could be omitted without losing
anything. But the "FROM SubPlan ..." is useful to work out where it's
coming from. Should it also output "FROM InitPlan ..."? I think that
would risk making it harder to read.

Another possibility is to put the SubPlan and InitPlan names inline,
rather than outputting "FROM SubPlan ...". I had a go at hacking that
up and this was the result:

QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
Function Scan on pg_catalog.generate_series y
Output: (($3 = (InitPlan 1).$0) AND ($4 = (InitPlan 1).$1) AND
((($1 = (SubPlan 2).$3) AND ($2 = (SubPlan 2).$4))))
Function Call: generate_series(1, 3)
InitPlan 1 (returns $0,$1)
-> Seq Scan on public.foo
Output: foo.x, foo.y
Filter: (foo.id = foo.y)
SubPlan 2 (returns $3,$4)
-> Aggregate
Output: min((x.x + y.y)), max((x.x + y.y))
-> Function Scan on pg_catalog.generate_series x
Output: x.x
Function Call: generate_series(1, 3)

It's a little more verbose in this case, but in a lot of other cases
it ended up being more compact.

The code is a bit messy, but I think the regression test output
(attached) is clearer and easier to interpret. SubPlans and InitPlans
are displayed consistently, and it's easier to distinguish
SubPlan/InitPlan outputs from external parameters.

There are a few more regression test changes, corresponding to cases
where InitPlans are referenced, such as:

  Seq Scan on document
-   Filter: ((dlevel <= $0) AND f_leak(dtitle))
+   Filter: ((dlevel <= (InitPlan 1).$0) AND f_leak(dtitle))
    InitPlan 1 (returns $0)
      ->  Index Scan using uaccount_pkey on uaccount
            Index Cond: (pguser = CURRENT_USER)

but I think that's useful extra clarification.

Regards,
Dean

Tom Lane

tgl@sss.pgh.pa.us

about 2 years ago

In reply to: Dean Rasheed (#8)

Re: Improving EXPLAIN's display of SubPlan nodes

Dean Rasheed <dean.a.rasheed@gmail.com> writes:

One thing that concerns me about making even greater use of "$n" is
the potential for confusion with generic plan parameters.

True.

Another possibility is to put the SubPlan and InitPlan names inline,
rather than outputting "FROM SubPlan ...". I had a go at hacking that
up and this was the result:

Output: (($3 = (InitPlan 1).$0) AND ($4 = (InitPlan 1).$1) AND
((($1 = (SubPlan 2).$3) AND ($2 = (SubPlan 2).$4))))

Hmm. I guess what bothers me about that is that it could be read to
suggest that the initplan or subplan is evaluated again for each
output parameter. Perhaps it'll be sufficiently clear as long as
we keep the labeling

InitPlan 1 (returns $0,$1)
SubPlan 2 (returns $3,$4)

but I'm not sure. Anybody else have an opinion?

(I didn't read your changes to the code yet --- I think at this
point we can just debate proposed output without worrying about
how to implement it.)

regards, tom lane

#10

Tom Lane

tgl@sss.pgh.pa.us

about 2 years ago

In reply to: Tom Lane (#9)

Re: Improving EXPLAIN's display of SubPlan nodes

I wrote:

Dean Rasheed <dean.a.rasheed@gmail.com> writes:

One thing that concerns me about making even greater use of "$n" is
the potential for confusion with generic plan parameters.

True.

After looking at your draft some more, it occurred to me that we're
not that far from getting rid of showing "$n" entirely in this
context. Instead of (subplan_name).$n, we could write something like
(subplan_name).colN or (subplan_name).columnN or (subplan_name).fN,
depending on your taste for verboseness. "fN" would correspond to the
names we assign to columns of anonymous record types, but it hasn't
got much else to recommend it. In the attached I used "colN";
"columnN" would be my next choice.

You could also imagine trying to use the sub-SELECT's actual output
column names, but I fear that would be ambiguous --- too often it'd
be "?column?" or some other non-unique name.

Hmm. I guess what bothers me about that is that it could be read to
suggest that the initplan or subplan is evaluated again for each
output parameter.

This objection seems like it could be solved through documentation,
so I wrote some.

The attached proof-of-concept is incomplete: it fails to replace some
$n occurrences with subplan references, as is visible in some of the
test cases. I believe your quick hack in get_parameter() is not
correct in detail, but for the moment I didn't bother to debug it.
I'm just presenting this as a POC to see if this is the direction
people would like to go in. If there's not objections, I'll try to
make a bulletproof implementation.

regards, tom lane

#11

Dean Rasheed

dean.a.rasheed@gmail.com

about 2 years ago

In reply to: Tom Lane (#10)

Re: Improving EXPLAIN's display of SubPlan nodes

On Sat, 16 Mar 2024 at 17:25, Tom Lane <tgl@sss.pgh.pa.us> wrote:

After looking at your draft some more, it occurred to me that we're
not that far from getting rid of showing "$n" entirely in this
context. Instead of (subplan_name).$n, we could write something like
(subplan_name).colN or (subplan_name).columnN or (subplan_name).fN,
depending on your taste for verboseness. "fN" would correspond to the
names we assign to columns of anonymous record types, but it hasn't
got much else to recommend it. In the attached I used "colN";
"columnN" would be my next choice.

Using the column number rather than the parameter index looks a lot
neater, especially in output with multiple subplans. Of those choices,
"colN" looks nicest, however...

I think it would be confusing if there were tables whose columns are
named "colN". In that case, a SQL qual like "t1.col2 = t2.col2" might
be output as something like "t1.col2 = (SubPlan 1).col3", since the
subplan's targetlist wouldn't necessarily just output the table
columns in order.

I actually think "$n" is not so bad (especially if we make n the
column number). The fact that it's qualified by the subplan name ought
to be sufficient to avoid it being confused with an external
parameter. Maybe there are other options, but I think it's important
to choose something that's unlikely to be confused with a real column
name.

Whatever name is chosen, I think we should still output "(returns
...)" on the subplan nodes. In a complex query there might be a lot of
output columns, and the expressions might be quite complex, making it
hard to see how many columns the subplan is returning. Besides,
without that, it might not be obvious to everyone what "colN" (or
whatever we settle on) means in places that refer to the subplan.

You could also imagine trying to use the sub-SELECT's actual output
column names, but I fear that would be ambiguous --- too often it'd
be "?column?" or some other non-unique name.

Yeah, I think that's a non-starter, because the output column names
aren't necessarily unique.

The attached proof-of-concept is incomplete: it fails to replace some
$n occurrences with subplan references, as is visible in some of the
test cases. I believe your quick hack in get_parameter() is not
correct in detail, but for the moment I didn't bother to debug it.

Yeah, that's exactly what it was, a quick hack. I just wanted to get
some output to see what it would look like in a few real cases.

Overall, I think this is heading in the right direction. I think we
just need a good way to say "the n'th output column of the subplan",
that can't be confused with anything else in the output.

Regards,
Dean

#12

Tom Lane

tgl@sss.pgh.pa.us

about 2 years ago

In reply to: Dean Rasheed (#11)

Re: Improving EXPLAIN's display of SubPlan nodes

Dean Rasheed <dean.a.rasheed@gmail.com> writes:

On Sat, 16 Mar 2024 at 17:25, Tom Lane <tgl@sss.pgh.pa.us> wrote:

After looking at your draft some more, it occurred to me that we're
not that far from getting rid of showing "$n" entirely in this
context. Instead of (subplan_name).$n, we could write something like
(subplan_name).colN or (subplan_name).columnN or (subplan_name).fN,

I think it would be confusing if there were tables whose columns are
named "colN". In that case, a SQL qual like "t1.col2 = t2.col2" might
be output as something like "t1.col2 = (SubPlan 1).col3", since the
subplan's targetlist wouldn't necessarily just output the table
columns in order.

Perhaps. I think people who are using columns named like that are
already accustomed to having to pay close attention to which table
the column is shown as qualified by. So I'm not sure there'd really
be much problem in practice.

I actually think "$n" is not so bad (especially if we make n the
column number). The fact that it's qualified by the subplan name ought
to be sufficient to avoid it being confused with an external
parameter.

That's an interesting compromise position, but I'm not sure that
it buys much compared to the approach shown in your draft (that
is, continuing to use the real param IDs). The real IDs at least
have the advantage of being unique.

Whatever name is chosen, I think we should still output "(returns
...)" on the subplan nodes.

We should do that if we continue to show real param IDs, but
if we change to using column numbers then I think it's pointless.
Output like

SubPlan 1 (returns $1,$2)
...
SubPlan 2 (returns $1,$2,$3)

seems to me that it'd be more confusing not less so. Does SubPlan 2
return the same values as SubPlan 1 plus more?

Overall, I think this is heading in the right direction. I think we
just need a good way to say "the n'th output column of the subplan",
that can't be confused with anything else in the output.

We could consider notations like "(SubPlan 1 column 2)", which
couldn't be confused with anything else, if only because a name
like that would have to be double-quoted. It's a little more
verbose but not that much. I fear "(SubPlan 1 col 2)" is too
short to be clear.

regards, tom lane

#13

Tom Lane

tgl@sss.pgh.pa.us

about 2 years ago

In reply to: Tom Lane (#12)

Re: Improving EXPLAIN's display of SubPlan nodes

I wrote:

Dean Rasheed <dean.a.rasheed@gmail.com> writes:

Overall, I think this is heading in the right direction. I think we
just need a good way to say "the n'th output column of the subplan",
that can't be confused with anything else in the output.

We could consider notations like "(SubPlan 1 column 2)", which
couldn't be confused with anything else, if only because a name
like that would have to be double-quoted. It's a little more
verbose but not that much. I fear "(SubPlan 1 col 2)" is too
short to be clear.

Here's a cleaned-up version that seems to successfully resolve
PARAM_EXEC references in all cases. I haven't changed the
"(plan_name).colN" notation, but that's an easy fix once we have
consensus on the spelling.

There are two other loose ends bothering me:

1. I see that Gather nodes sometimes print things like

-> Gather (actual rows=N loops=N)
Workers Planned: 2
Params Evaluated: $0, $1
Workers Launched: N

This now sticks out like a sore thumb, because there's no other
reference to $0 or $1 in the EXPLAIN output. We could possibly
adjust the code to print something like

Params Evaluated: (InitPlan 1).col1, (InitPlan 2).col1

but I think that's pretty silly. This looks to me like a code
debugging aid that shouldn't have survived past initial development.
It's of zero use to end users, and it doesn't correspond to anything
we bother to mention in EXPLAIN output in any other case: initplans
just magically get evaluated at the right times. I propose we
nuke the "Params Evaluated" output altogether.

2. MULTIEXPR_SUBLINK subplans now result in EXPLAIN output like

explain (verbose, costs off)
update inhpar i set (f1, f2) = (select i.f1, i.f2 || '-' from int4_tbl limit 1);
...
-> Result
Output: (SubPlan 1).col1, (SubPlan 1).col2, (SubPlan 1), i.tableoid, i.ctid

The undecorated reference to (SubPlan 1) is fairly confusing, since
it doesn't correspond to anything that will actually get output.
I suggest that perhaps instead this should read

Output: (SubPlan 1).col1, (SubPlan 1).col2, IGNORE(SubPlan 1), i.tableoid, i.ctid

Output: (SubPlan 1).col1, (SubPlan 1).col2, RESET(SubPlan 1), i.tableoid, i.ctid

the point of "RESET()" being that what the executor actually does
there is to re-arm the SubPlan to be evaluated again on the next
pass through the targetlist. I'm not greatly in love with either
of those ideas, though. Any thoughts?

regards, tom lane

#14

Dean Rasheed

dean.a.rasheed@gmail.com

about 2 years ago

In reply to: Tom Lane (#13)

Re: Improving EXPLAIN's display of SubPlan nodes

On Sun, 17 Mar 2024 at 19:39, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Here's a cleaned-up version that seems to successfully resolve
PARAM_EXEC references in all cases. I haven't changed the
"(plan_name).colN" notation, but that's an easy fix once we have
consensus on the spelling.

I took a more detailed look at this and the code and doc changes all
look good to me.

There's a comment at the end of find_param_generator() that should
probably say "No generator found", rather than "No referent found".

The get_rule_expr() code could perhaps be simplified a bit, getting
rid of the show_subplan_name variable and moving the recursive calls
to get_rule_expr() to after the switch statement -- if testexpr is
non-NULL, print it, else print the subplan name probably works for all
subplan types.

The "colN" notation has grown on me, especially when you look at
examples like those in partition_prune.out with a mix of Param types.
Not using "$n" for 2 different purposes is good, and I much prefer
this to the original "FROM [HASHED] SubPlan N (returns ...)" notation.

There are two other loose ends bothering me:

1. I see that Gather nodes sometimes print things like

-> Gather (actual rows=N loops=N)
Workers Planned: 2
Params Evaluated: $0, $1
Workers Launched: N

I propose we nuke the "Params Evaluated" output altogether.

2. MULTIEXPR_SUBLINK subplans now result in EXPLAIN output like

-> Result
Output: (SubPlan 1).col1, (SubPlan 1).col2, (SubPlan 1), i.tableoid, i.ctid

The undecorated reference to (SubPlan 1) is fairly confusing, since
it doesn't correspond to anything that will actually get output.
I suggest that perhaps instead this should read

Output: (SubPlan 1).col1, (SubPlan 1).col2, IGNORE(SubPlan 1), i.tableoid, i.ctid

or

Output: (SubPlan 1).col1, (SubPlan 1).col2, RESET(SubPlan 1), i.tableoid, i.ctid

I think "RESET()" or "RESCAN()" or something like that is better than
"INGORE()", because it indicates that it is actually doing something.
I don't really have a better idea. Perhaps not all uppercase though,
since that seems to go against the rest of the EXPLAIN output.

Regards,
Dean

#15

Tom Lane

tgl@sss.pgh.pa.us

about 2 years ago

In reply to: Dean Rasheed (#14)

Re: Improving EXPLAIN's display of SubPlan nodes

Dean Rasheed <dean.a.rasheed@gmail.com> writes:

The get_rule_expr() code could perhaps be simplified a bit, getting
rid of the show_subplan_name variable and moving the recursive calls
to get_rule_expr() to after the switch statement -- if testexpr is
non-NULL, print it, else print the subplan name probably works for all
subplan types.

Oooh, good idea. The symmetry wasn't apparent when we started, but
it's there now, and the code does look nicer this way.

The "colN" notation has grown on me, especially when you look at
examples like those in partition_prune.out with a mix of Param types.

OK, I've left it like that in the attached v5, but I'm still open
to other opinions.

The undecorated reference to (SubPlan 1) is fairly confusing, since
it doesn't correspond to anything that will actually get output.
I suggest that perhaps instead this should read
Output: (SubPlan 1).col1, (SubPlan 1).col2, IGNORE(SubPlan 1), i.tableoid, i.ctid
or
Output: (SubPlan 1).col1, (SubPlan 1).col2, RESET(SubPlan 1), i.tableoid, i.ctid

I think "RESET()" or "RESCAN()" or something like that is better than
"INGORE()", because it indicates that it is actually doing something.
I don't really have a better idea. Perhaps not all uppercase though,
since that seems to go against the rest of the EXPLAIN output.

Hm. I used "rescan(SubPlan)" in the attached, but it still looks
a bit odd to my eye.

I did some more work on the documentation too, to show the difference
between hashed and not-hashed subplans. I feel like we're pretty
close here, with the possible exception of how to show MULTIEXPR.

regards, tom lane

#16

Tom Lane

tgl@sss.pgh.pa.us

about 2 years ago

In reply to: Tom Lane (#15)

Re: Improving EXPLAIN's display of SubPlan nodes

I wrote:

Hm. I used "rescan(SubPlan)" in the attached, but it still looks
a bit odd to my eye.

After thinking a bit more, I understood what was bothering me about
that notation: it looks too much like a call of a user-defined
function named "rescan()". I think we'd be better off with the
all-caps "RESCAN()".

regards, tom lane

#17

Dean Rasheed

dean.a.rasheed@gmail.com

about 2 years ago

In reply to: Tom Lane (#16)

Re: Improving EXPLAIN's display of SubPlan nodes

On Mon, 18 Mar 2024 at 23:19, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Hm. I used "rescan(SubPlan)" in the attached, but it still looks
a bit odd to my eye.

After thinking a bit more, I understood what was bothering me about
that notation: it looks too much like a call of a user-defined
function named "rescan()". I think we'd be better off with the
all-caps "RESCAN()".

Or perhaps move the parentheses, and write "(rescan SubPlan N)" or
"(reset SubPlan N)". Dunno.

Regards,
Dean

#18

Tom Lane

tgl@sss.pgh.pa.us

about 2 years ago

In reply to: Dean Rasheed (#17)

Re: Improving EXPLAIN's display of SubPlan nodes

Dean Rasheed <dean.a.rasheed@gmail.com> writes:

On Mon, 18 Mar 2024 at 23:19, Tom Lane <tgl@sss.pgh.pa.us> wrote:

After thinking a bit more, I understood what was bothering me about
that notation: it looks too much like a call of a user-defined
function named "rescan()". I think we'd be better off with the
all-caps "RESCAN()".

Or perhaps move the parentheses, and write "(rescan SubPlan N)" or
"(reset SubPlan N)". Dunno.

Oh, I like that! It seems rather parallel to the existing "hashed"
annotation. If I had it to do over, I'd likely do the "hashed"
bit differently --- but as the proposal currently stands, we are
not changing "hashed", so we might as well double down on that.

I won't update the patch right now, but "(rescan SubPlan N)"
seems like a winner to me.

regards, tom lane

#19

Tom Lane

tgl@sss.pgh.pa.us

about 2 years ago

In reply to: Tom Lane (#18)

Re: Improving EXPLAIN's display of SubPlan nodes

I wrote:

I won't update the patch right now, but "(rescan SubPlan N)"
seems like a winner to me.

Here's a hopefully-final version that makes that adjustment and
tweaks a couple of comments.

regards, tom lane

#20

Dean Rasheed

dean.a.rasheed@gmail.com

about 2 years ago

In reply to: Tom Lane (#19)

Re: Improving EXPLAIN's display of SubPlan nodes

On Tue, 19 Mar 2024 at 16:42, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Here's a hopefully-final version that makes that adjustment and
tweaks a couple of comments.

This looks very good to me.

One final case that could possibly be improved is this one from aggregates.out:

explain (verbose, costs off)
select array(select sum(x+y) s
from generate_series(1,3) y group by y order by s)
from generate_series(1,3) x;
QUERY PLAN
-------------------------------------------------------------------
Function Scan on pg_catalog.generate_series x
Output: ARRAY(SubPlan 1)
Function Call: generate_series(1, 3)
SubPlan 1
-> Sort
Output: (sum((x.x + y.y))), y.y
Sort Key: (sum((x.x + y.y)))
-> HashAggregate
Output: sum((x.x + y.y)), y.y
Group Key: y.y
-> Function Scan on pg_catalog.generate_series y
Output: y.y
Function Call: generate_series(1, 3)

ARRAY operates on a SELECT with a single targetlist item, but in this
case it looks like the subplan output has 2 columns, which might
confuse people.

I wonder if we should output "ARRAY((SubPlan 1).col1)" to make it
clearer. Since ARRAY_SUBLINK is a special case, which always collects
the first column's values, we could just always output "col1" for
ARRAY.

Regards,
Dean

#21

Tom Lane

tgl@sss.pgh.pa.us

about 2 years ago

In reply to: Dean Rasheed (#20)

#22

Dean Rasheed

dean.a.rasheed@gmail.com

about 2 years ago

In reply to: Tom Lane (#21)

#23