BUG #18589: pg_get_viewdef returns wrong query

Started by PG Bug reporting formover 1 year ago10 messagesbugs
Jump to latest
#1PG Bug reporting form
noreply@postgresql.org

The following bug has been logged on the website:

Bug reference: 18589
Logged by: Quynh Tran
Email address: tlhquynh@gmail.com
PostgreSQL version: 16.3
Operating system: cloud.google.com/container-optimized-os
Description:

Hi,

In PostgreSQL 16.3, after I created a view by a statement like
"create view kvview as select key as value, value as key from kv order by
value",
I retrieved a different definition from information_schema.views
"SELECT key AS value, value AS key FROM kv ORDER BY key".
I expect ORDER BY column be value.

I tried the same in PostgreSQL 14.12 and 15.7 and got correct equivalent
definitions.
"SELECT keyvalue.key AS value, keyvalue.value AS key FROM keyvalue ORDER BY
keyvalue.key"

So this seems to be a regression.

Below are steps to reproduce in PostgreSQL 16.3:

create table kv (key int primary key, value text);

Query OK, 0 rows affected (5.27 sec)

insert into kv values (1, 'z'), (2, 'z'), (3, 'y');

Query OK, 3 rows affected (0.40 sec)

select key as value, value as key from kv order by value; -- This is what

we want to see.
+-------+-----+
| value | key |
+-------+-----+
| 1 | z |
| 2 | z |
| 3 | y |
+-------+-----+
3 rows in set (4.04 msecs)

create view kvview as select key as value, value as key from kv order by

value; -- Create view with same definition as the query above.
Query OK, 0 rows affected (11.00 sec)

select * from kvview; -- View also has correct result.

+-------+-----+
| value | key |
+-------+-----+
| 1 | z |
| 2 | z |
| 3 | y |
+-------+-----+
3 rows in set (4.05 msecs)

select table_name, view_definition from information_schema.views; -- But

information_schema displays the wrong view definition!
+------------+------------------------------------------------------------+
| table_name | view_definition                                           
|
+------------+------------------------------------------------------------+
| kvview | SELECT key AS value, value AS key FROM kv ORDER BY key |
+------------+------------------------------------------------------------+
1 rows in set (11.67 msecs)

select key as value, value as key from kv order by key; -- The view

definition would give the wrong result if that were what was actually
executed.
+-------+-----+
| value | key |
+-------+-----+
| 3 | y |
| 1 | z |
| 2 | z |
+-------+-----+
3 rows in set (16.76 msecs)

#2Richard Guo
guofenglinux@gmail.com
In reply to: PG Bug reporting form (#1)
Re: BUG #18589: pg_get_viewdef returns wrong query

On Mon, Aug 26, 2024 at 7:22 PM PG Bug reporting form
<noreply@postgresql.org> wrote:

In PostgreSQL 16.3, after I created a view by a statement like
"create view kvview as select key as value, value as key from kv order by
value",
I retrieved a different definition from information_schema.views
"SELECT key AS value, value AS key FROM kv ORDER BY key".
I expect ORDER BY column be value.

I tried the same in PostgreSQL 14.12 and 15.7 and got correct equivalent
definitions.
"SELECT keyvalue.key AS value, keyvalue.value AS key FROM keyvalue ORDER BY
keyvalue.key"

So this seems to be a regression.

This is broken starting from 1b4d280ea. The "new" and "old" entries
in a view's rangetable were removed, which results in the varprefix
flag being set to false in this case, as there is now only one rtable
entry.

Thanks
Richard

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Richard Guo (#2)
Re: BUG #18589: pg_get_viewdef returns wrong query

Richard Guo <guofenglinux@gmail.com> writes:

This is broken starting from 1b4d280ea. The "new" and "old" entries
in a view's rangetable were removed, which results in the varprefix
flag being set to false in this case, as there is now only one rtable
entry.

Yeah. Interesting that that old behavior accidentally hid this
ambiguity. I wonder whether there is a way to provoke mis-deparsing
even before 1b4d280ea.

The simplest fix would be to force varprefix on whenever deparsing
an ORDER BY list, as in the first patch attached. However, that
turns out to cause quite a lot of changes in regression test outputs
(which I didn't bother to include in the patch), which is unsurprising
given that it's reverting some of the 1b4d280ea output changes.
I don't like that too much, first because it seems like kind of a lot
of behavior churn for a stable branch, and second because it's just
plain ugly that the same variable is printed in different ways
depending on where it is in the query.

The second patch attached is what I think we should really do.
What it does is to prefix Vars only in the actually-troublesome
case where there is a conflicting SELECT-list entry, so that you
get the clutter only when doing something that's fairly ill-advised.
To do that, get_variable needs access to the SELECT list. We already
have that available in deparse_context, but the field name is
"windowTList" which seems rather misleading if it's to be used for
other purposes too. So I renamed and relocated that field to make
things less confusing. The other debatable point in the patch is
how to tell get_variable that we're considering an ORDER BY item.
I chose to re-use the special_exprkind field that's already there,
but I'm not totally convinced that that's a better solution than
adding another bool field. The problem is that we also use
get_rule_sortgroupclause for GROUP BY, so that it's temporarily hiding
the EXPR_KIND_GROUP_BY setting. That doesn't break anything as long
as we only change it while deparsing a Var, but it seems kind of
fragile. (On the whole, I don't think special_exprkind is all that
well thought out: aside from having no useful documentation, this
experience shows that it's not nearly as extensible as the author
probably believed. I wonder if we should replace it with
"bool in_group_by" or the like.)

With the second patch, there are no changes in existing regression
test outputs, so I added a new test.

Thoughts?

regards, tom lane

Attachments:

bug-18589-simple-fix.patchtext/x-diff; charset=us-ascii; name=bug-18589-simple-fix.patchDownload+18-6
bug-18589-better-fix.patchtext/x-diff; charset=us-ascii; name=bug-18589-better-fix.patchDownload+87-19
#4Richard Guo
guofenglinux@gmail.com
In reply to: Tom Lane (#3)
Re: BUG #18589: pg_get_viewdef returns wrong query

On Tue, Aug 27, 2024 at 1:11 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

The second patch attached is what I think we should really do.
What it does is to prefix Vars only in the actually-troublesome
case where there is a conflicting SELECT-list entry, so that you
get the clutter only when doing something that's fairly ill-advised.

I also think this is a better way.

To do that, get_variable needs access to the SELECT list. We already
have that available in deparse_context, but the field name is
"windowTList" which seems rather misleading if it's to be used for
other purposes too. So I renamed and relocated that field to make
things less confusing.

+1 to this part.

The other debatable point in the patch is
how to tell get_variable that we're considering an ORDER BY item.
I chose to re-use the special_exprkind field that's already there,
but I'm not totally convinced that that's a better solution than
adding another bool field. The problem is that we also use
get_rule_sortgroupclause for GROUP BY, so that it's temporarily hiding
the EXPR_KIND_GROUP_BY setting.

It seems to me that this might result in different behavior from
previous when deparsing Vars in a GROUP BY list, due to overriding
special_exprkind from EXPR_KIND_GROUP_BY to EXPR_KIND_ORDER_BY.

For example, with this patch:

create table t (x1 int, x2 int);
create view v as select x1 as x2, x2 as x1 from t group by x1, x2;

select pg_get_viewdef('v', true);
pg_get_viewdef
------------------------
SELECT x1 AS x2, +
x2 AS x1 +
FROM t +
GROUP BY t.x1, t.x2;
(1 row)

It seems that the prefixes in the GROUP BY Vars are unnecessary.

(On the whole, I don't think special_exprkind is all that
well thought out: aside from having no useful documentation, this
experience shows that it's not nearly as extensible as the author
probably believed. I wonder if we should replace it with
"bool in_group_by" or the like.)

From my brief 2-minute look at how special_exprkind is used, it seems
that special_exprkind is either EXPR_KIND_GROUP_BY or EXPR_KIND_NONE.
Since it doesn't seem to be extensible as expected, I agree that maybe
a bool field is more straightforward. Then we can have another bool
field for ORDER BY, avoiding the need to override the in-group-by
setting.

Thanks
Richard

#5Richard Guo
guofenglinux@gmail.com
In reply to: Richard Guo (#4)
Re: BUG #18589: pg_get_viewdef returns wrong query

On Tue, Aug 27, 2024 at 4:46 PM Richard Guo <guofenglinux@gmail.com> wrote:

From my brief 2-minute look at how special_exprkind is used, it seems
that special_exprkind is either EXPR_KIND_GROUP_BY or EXPR_KIND_NONE.
Since it doesn't seem to be extensible as expected, I agree that maybe
a bool field is more straightforward. Then we can have another bool
field for ORDER BY, avoiding the need to override the in-group-by
setting.

Alternatively, can we set special_exprkind = EXPR_KIND_ORDER_BY in
get_rule_orderby? I’m looking for a approach that is parallel to how
we set special_exprkind = EXPR_KIND_GROUP_BY in get_basic_select_query.

Thanks
Richard

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Richard Guo (#5)
Re: BUG #18589: pg_get_viewdef returns wrong query

Richard Guo <guofenglinux@gmail.com> writes:

Alternatively, can we set special_exprkind = EXPR_KIND_ORDER_BY in
get_rule_orderby? I’m looking for a approach that is parallel to how
we set special_exprkind = EXPR_KIND_GROUP_BY in get_basic_select_query.

I don't want to do that because it would result in prefixing Vars
within grouping/ordering expressions, which is unnecessary. Only
a bare Var requires this treatment.

regards, tom lane

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Richard Guo (#4)
Re: BUG #18589: pg_get_viewdef returns wrong query

Richard Guo <guofenglinux@gmail.com> writes:

On Tue, Aug 27, 2024 at 1:11 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

The other debatable point in the patch is
how to tell get_variable that we're considering an ORDER BY item.

It seems to me that this might result in different behavior from
previous when deparsing Vars in a GROUP BY list, due to overriding
special_exprkind from EXPR_KIND_GROUP_BY to EXPR_KIND_ORDER_BY.

Yeah, I was uncertain whether to bother with that detail. It's true
that there's no bug in GROUP BY because the parser prefers SQL99
semantics in that case, but maybe we shouldn't have this code assuming
that? And it wasn't easy to do with the special_exprkind approach.
But it is pretty easy with the two-bools approach, so I fixed that in
the attached v2.

regards, tom lane

Attachments:

bug-18589-better-fix-2.patchtext/x-diff; charset=us-ascii; name=bug-18589-better-fix-2.patchDownload+109-40
#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#7)
Re: BUG #18589: pg_get_viewdef returns wrong query

I wrote:

Yeah, I was uncertain whether to bother with that detail. It's true
that there's no bug in GROUP BY because the parser prefers SQL99
semantics in that case, but maybe we shouldn't have this code assuming
that? And it wasn't easy to do with the special_exprkind approach.
But it is pretty easy with the two-bools approach, so I fixed that in
the attached v2.

I was just about to commit v2 when I realized that it's wrong, because
what we need to be checking against is the output column name(s) that
get_target_list() will print, and that's not necessarily tle->resname.
You can in fact make v2 misbehave with ALTER VIEW RENAME COLUMN, as
shown by the added test cases in attached v3.

That complicates matters greatly because it means that we also need
access to the current get_query_def call's resultDesc. To avoid
cluttering things impossibly, I decided that resultDesc ought to be
passed down in the deparse_context not as a separate parameter, and
while doing that it seemed better to do the same with colNamesVisible.
(There is one place that has to save-n-restore colNamesVisible to
make that work, but overall it's less code and clutter.)

It's slightly annoying that there are now three places that know
how get_target_list chooses output column names. I thought about
refactoring to get that down to one place, but couldn't really
find anything that didn't net out as more code and ugliness.

Also, while looking at this I realized that get_select_query_def's
habit of saving-and-restoring windowClause and windowTList is
completely pointless: it has only one caller and that one has
just set up a fresh deparse_context, which won't be used anymore
afterwards. If there were a code path in which that did something,
the lack of save-and-restore logic in get_update_query_def and
friends would likely be a bug. So I just dropped it. I believe
that it's not really necessary to save-n-restore inGroupBy or
varInOrderBy either (rather than just reset them to false).
But those cases aren't quite as clear-cut, and resetting to false
isn't much cheaper than restoring a saved value, so I left those
bits as-is.

regards, tom lane

Attachments:

bug-18589-better-fix-3.patchtext/x-diff; charset=us-ascii; name=bug-18589-better-fix-3.patchDownload+217-100
#9Richard Guo
guofenglinux@gmail.com
In reply to: Tom Lane (#8)
Re: BUG #18589: pg_get_viewdef returns wrong query

On Thu, Aug 29, 2024 at 3:03 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I was just about to commit v2 when I realized that it's wrong, because
what we need to be checking against is the output column name(s) that
get_target_list() will print, and that's not necessarily tle->resname.
You can in fact make v2 misbehave with ALTER VIEW RENAME COLUMN, as
shown by the added test cases in attached v3.

Ah yes, we need to match the column name from the view's tuple
descriptor, which may differ from tle->resname after column RENAME.

+1 to v3.

Thanks
Richard

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Richard Guo (#9)
Re: BUG #18589: pg_get_viewdef returns wrong query

Richard Guo <guofenglinux@gmail.com> writes:

+1 to v3.

Pushed, thanks for looking at it.

regards, tom lane