Bogus EXPLAIN results with column aliases for mismatched partitions

Started by Tom Laneover 6 years ago9 messageshackers

tgl@sss.pgh.pa.us

over 6 years ago

I started to fool around with the ruleutils.c rewrite discussed in [1]/messages/by-id/001001d4f44b$2a2cca50$7e865ef0$@lab.ntt.co.jp,
and ran into an independent bug: if you attach column aliases to a
partitioned table, and some of the partitions don't have an exact match of
column attnums, then EXPLAIN uses the wrong aliases for those partitions.
As an example, after modifying partition_prune.sql like this:

diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index a5900e5..41f0b6f 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -202,8 +202,13 @@ CREATE TABLE part (a INT, b INT) PARTITION BY LIST (a);
 CREATE TABLE part_p1 PARTITION OF part FOR VALUES IN (-2,-1,0,1,2);
 CREATE TABLE part_p2 PARTITION OF part DEFAULT PARTITION BY RANGE(a);
 CREATE TABLE part_p2_p1 PARTITION OF part_p2 DEFAULT;
+CREATE TABLE part_rev (b INT, c INT, a INT);
+ALTER TABLE part ATTACH PARTITION part_rev FOR VALUES IN (3);  -- fail
+ALTER TABLE part_rev DROP COLUMN c;
+ALTER TABLE part ATTACH PARTITION part_rev FOR VALUES IN (3);  -- now it's ok
 INSERT INTO part VALUES (-1,-1), (1,1), (2,NULL), (NULL,-2),(NULL,NULL);
 EXPLAIN (COSTS OFF) SELECT tableoid::regclass as part, a, b FROM part WHERE a IS NULL ORDER BY 1, 2, 3;
+EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM part p(x) ORDER BY x;

--
-- some more cases

then the EXPLAIN output produced by HEAD looks like:

QUERY PLAN
-----------------------------------------------
Sort
Output: p.x, p.b
Sort Key: p.x
-> Append
-> Seq Scan on public.part_p1 p
Output: p.x, p.b
-> Seq Scan on public.part_rev p_1
Output: p_1.a, p_1.x
-> Seq Scan on public.part_p2_p1 p_2
Output: p_2.x, p_2.b
(10 rows)

wherein the "x" alias for column "a" has been applied to part_rev.b.
That's wrong and confusing.

The reason this happens is that expand_single_inheritance_child()
just clones the parent RTE's alias node without any thought for
the possibility that the columns don't match one-to-one. It's
an ancient bug that affects traditional inheritance as well as
partitioning.

I experimented with fixing this by making expand_single_inheritance_child
generate a correctly-adjusted child alias node, which seems reasonable
since it takes pains to adjust the rest of the child RTE for the different
column layout. It turns out to be slightly tedious to do that without
causing a lot of regression test diffs: if we add an alias node where
there was none before, that affects ruleutils.c's selection of table
aliases not just column aliases. The "variant-a" patch below mostly
succeeds in avoiding test diffs, but it adds a fair amount of complication
to inherit.c. The "variant-b" patch below uses a simpler way of setting
up the child aliases, which results in a whole lot of test diffs: all
children of a parent named "x" will now show in EXPLAIN with aliases like
"x_1", "x_2", etc. (That happens already if you wrote an explicit table
alias "x", but not if you didn't.) While my initial reaction was that
that was an unacceptable amount of churn, the idea gets more appealing the
more I think about it. It means that tables you did not name in the query
will be shown with aliases that clearly identify their connection to
something you did name. So despite the added churn, I'm kind of attracted
to the variant-b approach. (Note that the discussion in [1]/messages/by-id/001001d4f44b$2a2cca50$7e865ef0$@lab.ntt.co.jp is almost
certainly going to result in some changes to ruleutils.c's alias-selection
behavior anyway, so I don't think staying exactly compatible with v12 is
worth much here.)

On the other hand, if we went with variant-a it might be plausible to
back-patch this fix. But given the fact that this is a mostly cosmetic
problem, and we've not had field complaints, I don't feel a big need
to fix it in the back branches.

Some other loose ends:

* variant-a's diffs in expected/postgres_fdw.out indicate that
postgres_fdw is doing something weird with the table aliases it selects to
print in the "Relations:" output. I think this is an independent bug ---
and it surely *is* a bug, because the aliases printed by HEAD don't agree
with the table aliases used for Vars of those relations. But I haven't
run it to ground yet. (Notice that variant-b fixes those discrepancies in
the opposite direction...)

* To make computing the modified column alias list cheap, I made
make_inh_translation_list() compute a reverse-mapping array showing the
parent column associated with each child column. I'm more than a little
bit tempted to add that array to the AppendRelInfo struct, instead of
passing it back separately and then discarding it. We could use the info
to simplify and speed up the reverse-mapping logic added by commit
553d2ec27, and I bet there will be other use-cases later. But I've not
done that in these patches.

Thoughts, objections, better ideas?

regards, tom lane

[1]: /messages/by-id/001001d4f44b$2a2cca50$7e865ef0$@lab.ntt.co.jp

Tom Lane

tgl@sss.pgh.pa.us

over 6 years ago

In reply to: Tom Lane (#1)

Re: Bogus EXPLAIN results with column aliases for mismatched partitions

I wrote:

* variant-a's diffs in expected/postgres_fdw.out indicate that
postgres_fdw is doing something weird with the table aliases it selects to
print in the "Relations:" output. I think this is an independent bug ---
and it surely *is* a bug, because the aliases printed by HEAD don't agree
with the table aliases used for Vars of those relations. But I haven't
run it to ground yet. (Notice that variant-b fixes those discrepancies in
the opposite direction...)

I checked that, and indeed postgres_fdw is doing something randomly
different from what ruleutils does. In set_rtable_names(), the
first priority is rte->alias->aliasname, and if that's not set
then (for a RELATION RTE) you get the result of get_rel_name().
postgres_fdw is taking rte->eref->aliasname as being the alias,
which is usually the same string, but "usually" doesn't cut it.
So we should make it look at rte->alias instead.

Now, there is another thing that set_rtable_names() is doing that
postgres_fdw doesn't do, which is to unique-ify the chosen alias
names by adding "_NN" if the querytree contains multiple uses of
the same table alias. I don't see any practical way for postgres_fdw
to match that behavior, since it lacks global information about the
query. If we wanted to fix it, what we'd likely need to do is
postpone creation of the relation_name strings until EXPLAIN,
providing some way for postgres_fdw to ask ruleutils.c what alias
it'd assigned to a particular RTE. This seems like it wouldn't be
terribly painful for base relations but it'd be a mess for joins
and aggregates, so I'm not eager to do something like that.
In practice the presence or absence of "_NN" might not be too
confusing --- it's certainly not as bad as the inconsistency being
shown now.

In short then I propose the attached fix for this issue.

regards, tom lane

Tom Lane

tgl@sss.pgh.pa.us

over 6 years ago

In reply to: Tom Lane (#2)

Re: Bogus EXPLAIN results with column aliases for mismatched partitions

I wrote:

Now, there is another thing that set_rtable_names() is doing that
postgres_fdw doesn't do, which is to unique-ify the chosen alias
names by adding "_NN" if the querytree contains multiple uses of
the same table alias. I don't see any practical way for postgres_fdw
to match that behavior, since it lacks global information about the
query. If we wanted to fix it, what we'd likely need to do is
postpone creation of the relation_name strings until EXPLAIN,
providing some way for postgres_fdw to ask ruleutils.c what alias
it'd assigned to a particular RTE.

Hmmm ... so actually, that isn't *quite* as bad as I thought:
ExplainState does already expose that information, so we just need
to rearrange how postgres_fdw does things. Here's a revised proposed
patch, which exposes (and fixes) several additional test cases where
the Relations: string was previously visibly out of sync with what
ruleutils was using for Var names.

BTW, the existing code always schema-qualifies the relation names,
on the rather lame grounds that it's producing the string without
knowing whether EXPLAIN VERBOSE will be specified. In this code,
the verbose flag is available so it would be trivial to make the
output conform to EXPLAIN's normal policy. I didn't change that
here because there'd be a bunch more test output diffs of no
intellectual interest. Should we change it, or leave well enough
alone?

regards, tom lane

Amit Langote

Langote_Amit_f8@lab.ntt.co.jp

over 6 years ago

In reply to: Tom Lane (#1)

Re: Bogus EXPLAIN results with column aliases for mismatched partitions

On Sun, Dec 1, 2019 at 11:40 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

then the EXPLAIN output produced by HEAD looks like:

QUERY PLAN
-----------------------------------------------
Sort
Output: p.x, p.b
Sort Key: p.x
-> Append
-> Seq Scan on public.part_p1 p
Output: p.x, p.b
-> Seq Scan on public.part_rev p_1
Output: p_1.a, p_1.x
-> Seq Scan on public.part_p2_p1 p_2
Output: p_2.x, p_2.b
(10 rows)

wherein the "x" alias for column "a" has been applied to part_rev.b.
That's wrong and confusing.

Agreed.

The reason this happens is that expand_single_inheritance_child()
just clones the parent RTE's alias node without any thought for
the possibility that the columns don't match one-to-one. It's
an ancient bug that affects traditional inheritance as well as
partitioning.

I experimented with fixing this by making expand_single_inheritance_child
generate a correctly-adjusted child alias node, which seems reasonable
since it takes pains to adjust the rest of the child RTE for the different
column layout. It turns out to be slightly tedious to do that without
causing a lot of regression test diffs: if we add an alias node where
there was none before, that affects ruleutils.c's selection of table
aliases not just column aliases. The "variant-a" patch below mostly
succeeds in avoiding test diffs, but it adds a fair amount of complication
to inherit.c. The "variant-b" patch below uses a simpler way of setting
up the child aliases, which results in a whole lot of test diffs: all
children of a parent named "x" will now show in EXPLAIN with aliases like
"x_1", "x_2", etc. (That happens already if you wrote an explicit table
alias "x", but not if you didn't.) While my initial reaction was that
that was an unacceptable amount of churn, the idea gets more appealing the
more I think about it. It means that tables you did not name in the query
will be shown with aliases that clearly identify their connection to
something you did name. So despite the added churn, I'm kind of attracted
to the variant-b approach. (Note that the discussion in [1] is almost
certainly going to result in some changes to ruleutils.c's alias-selection
behavior anyway, so I don't think staying exactly compatible with v12 is
worth much here.)

On the other hand, if we went with variant-a it might be plausible to
back-patch this fix. But given the fact that this is a mostly cosmetic
problem, and we've not had field complaints, I don't feel a big need
to fix it in the back branches.

I tend to agree that "variant b" is more appealing for the reason that
it makes the connection between child RTEs and that of the table named
in the query from which they originate more explicit.

* To make computing the modified column alias list cheap, I made
make_inh_translation_list() compute a reverse-mapping array showing the
parent column associated with each child column. I'm more than a little
bit tempted to add that array to the AppendRelInfo struct, instead of
passing it back separately and then discarding it. We could use the info
to simplify and speed up the reverse-mapping logic added by commit
553d2ec27, and I bet there will be other use-cases later. But I've not
done that in these patches.

Thanks,
Amit

Etsuro Fujita

fujita.etsuro@lab.ntt.co.jp

over 6 years ago

In reply to: Tom Lane (#3)

Re: Bogus EXPLAIN results with column aliases for mismatched partitions

On Mon, Dec 2, 2019 at 6:34 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I wrote:

Now, there is another thing that set_rtable_names() is doing that
postgres_fdw doesn't do, which is to unique-ify the chosen alias
names by adding "_NN" if the querytree contains multiple uses of
the same table alias. I don't see any practical way for postgres_fdw
to match that behavior, since it lacks global information about the
query. If we wanted to fix it, what we'd likely need to do is
postpone creation of the relation_name strings until EXPLAIN,
providing some way for postgres_fdw to ask ruleutils.c what alias
it'd assigned to a particular RTE.

Hmmm ... so actually, that isn't *quite* as bad as I thought:
ExplainState does already expose that information, so we just need
to rearrange how postgres_fdw does things. Here's a revised proposed
patch, which exposes (and fixes) several additional test cases where
the Relations: string was previously visibly out of sync with what
ruleutils was using for Var names.

One thing I noticed is this comment:

    /*
     * Add names of relation handled by the foreign scan when the scan is a
-    * join
+    * join.  The input looks something like "(1) LEFT JOIN (2)", and we must
+    * replace the digit strings with the correct relation names.  We do that
+    * here, not when the plan is created, because we can't know what aliases
+    * ruleutils.c will assign at plan creation time.
     */

I think "names of relation" should be "names of relations", so how
about fixing that as well? Other than that the patch looks good to
me. Thanks for working on this! (Actually, we discussed this before.
See [1]/messages/by-id/c2c7191b-5ca0-b37a-9e9d-4df15ffb554b@lab.ntt.co.jp. I wasn't able to come up with a solution, though.)

BTW, the existing code always schema-qualifies the relation names,
on the rather lame grounds that it's producing the string without
knowing whether EXPLAIN VERBOSE will be specified. In this code,
the verbose flag is available so it would be trivial to make the
output conform to EXPLAIN's normal policy. I didn't change that
here because there'd be a bunch more test output diffs of no
intellectual interest. Should we change it, or leave well enough
alone?

I think it would be better to keep that as-is because otherwise, in
case of a foreign join or aggregate, EXPLAIN without the VERBOSE
option won't show any information about foreign tables involved in
that foreign join or aggregate, which isn't useful for users.

Best regards,
Etsuro Fujita

[1]: /messages/by-id/c2c7191b-5ca0-b37a-9e9d-4df15ffb554b@lab.ntt.co.jp

Tom Lane

tgl@sss.pgh.pa.us

over 6 years ago

In reply to: Etsuro Fujita (#5)

Re: Bogus EXPLAIN results with column aliases for mismatched partitions

Etsuro Fujita <etsuro.fujita@gmail.com> writes:

I think "names of relation" should be "names of relations", so how
about fixing that as well?

Ah, missed that.

On Mon, Dec 2, 2019 at 6:34 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

BTW, the existing code always schema-qualifies the relation names,
on the rather lame grounds that it's producing the string without
knowing whether EXPLAIN VERBOSE will be specified. In this code,
the verbose flag is available so it would be trivial to make the
output conform to EXPLAIN's normal policy. I didn't change that
here because there'd be a bunch more test output diffs of no
intellectual interest. Should we change it, or leave well enough
alone?

I think it would be better to keep that as-is because otherwise, in
case of a foreign join or aggregate, EXPLAIN without the VERBOSE
option won't show any information about foreign tables involved in
that foreign join or aggregate, which isn't useful for users.

No, I'm just talking about dropping the schema-qualification of table
names when !es->verbose, not removing the Relations: output altogether.
That would be more consistent with the rest of EXPLAIN's output.

regards, tom lane

Tom Lane

tgl@sss.pgh.pa.us

over 6 years ago

In reply to: Tom Lane (#6)

Re: Bogus EXPLAIN results with column aliases for mismatched partitions

I wrote:

Etsuro Fujita <etsuro.fujita@gmail.com> writes:

On Mon, Dec 2, 2019 at 6:34 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

BTW, the existing code always schema-qualifies the relation names,
on the rather lame grounds that it's producing the string without
knowing whether EXPLAIN VERBOSE will be specified. In this code,
the verbose flag is available so it would be trivial to make the
output conform to EXPLAIN's normal policy. I didn't change that
here because there'd be a bunch more test output diffs of no
intellectual interest. Should we change it, or leave well enough
alone?

I think it would be better to keep that as-is because otherwise, in
case of a foreign join or aggregate, EXPLAIN without the VERBOSE
option won't show any information about foreign tables involved in
that foreign join or aggregate, which isn't useful for users.

No, I'm just talking about dropping the schema-qualification of table
names when !es->verbose, not removing the Relations: output altogether.
That would be more consistent with the rest of EXPLAIN's output.

Concretely, I'm thinking of the attached (on top of the other patch,
which I just pushed). This agrees exactly with what ExplainTargetRel
does for regular scans.

One could make an argument that we should schema-qualify, regardless
of the "verbose" flag, if the output format isn't EXPLAIN_FORMAT_TEXT.
That would reduce the amount of variability that plan analysis tools
have to cope with. However, ExplainTargetRel itself doesn't provide
the schema in non-verbose mode. Maybe it should, ie we should change
it like

         case T_ModifyTable:
             /* Assert it's on a real relation */
             Assert(rte->rtekind == RTE_RELATION);
             objectname = get_rel_name(rte->relid);
-            if (es->verbose)
+            if (es->verbose || es->format != EXPLAIN_FORMAT_TEXT)
                 namespace = get_namespace_name(get_rel_namespace(rte->relid));
             objecttag = "Relation Name";
             break;

(and likewise for function schema names, a bit further down)?

regards, tom lane

Etsuro Fujita

fujita.etsuro@lab.ntt.co.jp

over 6 years ago

In reply to: Tom Lane (#7)

Re: Bogus EXPLAIN results with column aliases for mismatched partitions

On Tue, Dec 3, 2019 at 6:45 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I wrote:

Etsuro Fujita <etsuro.fujita@gmail.com> writes:

On Mon, Dec 2, 2019 at 6:34 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

BTW, the existing code always schema-qualifies the relation names,
on the rather lame grounds that it's producing the string without
knowing whether EXPLAIN VERBOSE will be specified. In this code,
the verbose flag is available so it would be trivial to make the
output conform to EXPLAIN's normal policy. I didn't change that
here because there'd be a bunch more test output diffs of no
intellectual interest. Should we change it, or leave well enough
alone?

I think it would be better to keep that as-is because otherwise, in
case of a foreign join or aggregate, EXPLAIN without the VERBOSE
option won't show any information about foreign tables involved in
that foreign join or aggregate, which isn't useful for users.

No, I'm just talking about dropping the schema-qualification of table
names when !es->verbose, not removing the Relations: output altogether.
That would be more consistent with the rest of EXPLAIN's output.

Sorry, I misread the meaning.

Concretely, I'm thinking of the attached (on top of the other patch,
which I just pushed). This agrees exactly with what ExplainTargetRel
does for regular scans.

Thanks for the patch! (The patch didn't apply to HEAD cleanly,
though.) I like consistency, so +1 for the change.

One could make an argument that we should schema-qualify, regardless
of the "verbose" flag, if the output format isn't EXPLAIN_FORMAT_TEXT.
That would reduce the amount of variability that plan analysis tools
have to cope with. However, ExplainTargetRel itself doesn't provide
the schema in non-verbose mode. Maybe it should, ie we should change
it like
case T_ModifyTable:
/* Assert it's on a real relation */
Assert(rte->rtekind == RTE_RELATION);
objectname = get_rel_name(rte->relid);
-            if (es->verbose)
+            if (es->verbose || es->format != EXPLAIN_FORMAT_TEXT)
namespace = get_namespace_name(get_rel_namespace(rte->relid));
objecttag = "Relation Name";
break;
(and likewise for function schema names, a bit further down)?

Seems like another issue to me.

Best regards,
Etsuro Fujita

Tom Lane

tgl@sss.pgh.pa.us

over 6 years ago

In reply to: Etsuro Fujita (#8)

Re: Bogus EXPLAIN results with column aliases for mismatched partitions

Etsuro Fujita <etsuro.fujita@gmail.com> writes:

On Tue, Dec 3, 2019 at 6:45 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Concretely, I'm thinking of the attached (on top of the other patch,
which I just pushed). This agrees exactly with what ExplainTargetRel
does for regular scans.

Thanks for the patch! (The patch didn't apply to HEAD cleanly,
though.) I like consistency, so +1 for the change.

Yeah, 55a1954da probably changed the expected output from what that
has. I'll clean it up and push.

One could make an argument that we should schema-qualify, regardless
of the "verbose" flag, if the output format isn't EXPLAIN_FORMAT_TEXT.
That would reduce the amount of variability that plan analysis tools
have to cope with. However, ExplainTargetRel itself doesn't provide
the schema in non-verbose mode. Maybe it should, ie we should change
it like ...

Seems like another issue to me.

Agreed. When/if we change that, we could make this code follow along.

regards, tom lane

Bogus EXPLAIN results with column aliases for mismatched partitions

Attachments:

Attachments:

Attachments:

Attachments: