BUG #17502: View based on window functions returns wrong results when queried

Started by PG Bug reporting formalmost 4 years ago16 messagesbugs
Jump to latest
#1PG Bug reporting form
noreply@postgresql.org

The following bug has been logged on the website:

Bug reference: 17502
Logged by: Daniel Farkaš
Email address: daniel.farkas@datoris.com
PostgreSQL version: 10.10
Operating system: Linux
Description:

Hey,

Please be gentle, I've never been in contact with Postgres developers.
In short, I've created a view, which has rather sketchy window functions,
but it gives me results I need.
When I do select * on it, it gives me what I expect. One of the columns has
five distinct values.
But when I do group by on that column, it gives me only one of the values.
When I drop the view and create materialized view, all is good, I get all
five values.

My guess is that some parts of the inner select are affecting outer, view's
select, which is not something I would expect.
My current Postgres is PostgreSQL 10.10 on x86_64-pc-linux-musl, compiled by
gcc (Alpine 8.3.0) 8.3.0, 64-bit.
If you think this is worth investigating further, I will try composing a
simpler example, and test it in a more recent Postgres version.
Maybe it's a known limitation I'm not aware of.

Let me know what you think.

Daniel

#2Magnus Hagander
magnus@hagander.net
In reply to: PG Bug reporting form (#1)
Re: BUG #17502: View based on window functions returns wrong results when queried

On Sun, May 29, 2022 at 4:20 PM PG Bug reporting form <
noreply@postgresql.org> wrote:

The following bug has been logged on the website:

Bug reference: 17502
Logged by: Daniel Farkaš
Email address: daniel.farkas@datoris.com
PostgreSQL version: 10.10
Operating system: Linux
Description:

Hey,

Please be gentle, I've never been in contact with Postgres developers.
In short, I've created a view, which has rather sketchy window functions,
but it gives me results I need.
When I do select * on it, it gives me what I expect. One of the columns has
five distinct values.
But when I do group by on that column, it gives me only one of the values.
When I drop the view and create materialized view, all is good, I get all
five values.

My guess is that some parts of the inner select are affecting outer, view's
select, which is not something I would expect.
My current Postgres is PostgreSQL 10.10 on x86_64-pc-linux-musl, compiled
by
gcc (Alpine 8.3.0) 8.3.0, 64-bit.
If you think this is worth investigating further, I will try composing a
simpler example, and test it in a more recent Postgres version.
Maybe it's a known limitation I'm not aware of.

Let me know what you think.

Please see if you can reproduce this on a current version of PostgreSQL 10,
which is 10.21. Version 10.10 is lacking more than two and a half years
worth of bugfixes.

If you can then yes, try to put together a simpler example, because it
certainly does not sound like correct behavior.

//Magnus

#3Daniel Farkaš
daniel.farkas@datoris.com
In reply to: Magnus Hagander (#2)
Re: BUG #17502: View based on window functions returns wrong results when queried

Hey,

Thanks for the response.
I was able to replicate in 10.21.

Here is how:

SELECT version();
PostgreSQL 10.21 on x86_64-pc-linux-musl, compiled by gcc (Alpine
11.2.1_git20220219) 11.2.1 20220219, 64-bit

CREATE TABLE analytics_table (dimension_1 VARCHAR, dimension_2 VARCHAR,
metric_1 VARCHAR, metric_2 VARCHAR);

INSERT INTO analytics_table VALUES ('a1', 'b1', 'c1', 'd1'), ('a1', 'b2',
'c2', 'd2');

SELECT * FROM analytics_table;
dimension_1|dimension_2|metric_1|metric_2|
-----------+-----------+--------+--------+
a1 |b1 |c1 |d1 |
a1 |b2 |c2 |d2 |

SELECT
dimension_1,
dimension_2,
row_number() OVER(partition by generate_series(1,2) order by dimension_1,
dimension_2) AS rownum,
row_number() OVER(partition by dimension_1, dimension_2) AS metricnum
FROM analytics_table ORDER BY dimension_1, dimension_2;
dimension_1|dimension_2|rownum|metricnum|
-----------+-----------+------+------+
a1 |b1 | 1| 1|
a1 |b1 | 1| 2|
a1 |b2 | 2| 1|
a1 |b2 | 2| 2|

CREATE VIEW analytics_view AS
SELECT
dimension_1,
dimension_2,
row_number() OVER(partition by generate_series(1,2) order by dimension_1,
dimension_2) AS rownum,
CASE WHEN row_number() OVER(partition by dimension_1, dimension_2) = 1 THEN
'metric_1' ELSE 'metric_2' END AS metric_name,
CASE WHEN row_number() OVER(partition by dimension_1, dimension_2) = 1 THEN
metric_1 ELSE metric_2 END AS metric_value
FROM analytics_table ORDER BY dimension_1, dimension_2;

SELECT * FROM analytics_view;
dimension_1|dimension_2|rownum|metric_name|metric_value|
-----------+-----------+------+-----------+------------+
a1 |b1 | 1|metric_1 |c1 |
a1 |b1 | 1|metric_2 |d1 |
a1 |b2 | 2|metric_1 |c2 |
a1 |b2 | 2|metric_2 |d2 |

SELECT metric_name FROM analytics_view;
metric_name|
-----------+
metric_1 |
metric_1 |
metric_1 |
metric_1 |

CREATE MATERIALIZED VIEW analytics_materialized_view AS
SELECT
dimension_1,
dimension_2,
row_number() OVER(partition by generate_series(1,2) order by dimension_1,
dimension_2) AS rownum,
CASE WHEN row_number() OVER(partition by dimension_1, dimension_2) = 1 THEN
'metric_1' ELSE 'metric_2' END AS metric_name,
CASE WHEN row_number() OVER(partition by dimension_1, dimension_2) = 1
THEN metric_1
ELSE metric_2 END AS metric_value
FROM analytics_table ORDER BY dimension_1, dimension_2;

SELECT * FROM analytics_materialized_view;
dimension_1|dimension_2|rownum|metric_name|metric_value|
-----------+-----------+------+-----------+------------+
a1 |b1 | 1|metric_1 |c1 |
a1 |b1 | 1|metric_2 |d1 |
a1 |b2 | 2|metric_1 |c2 |
a1 |b2 | 2|metric_2 |d2 |

SELECT metric_name FROM analytics_materialized_view;
metric_name|
-----------+
metric_1 |
metric_2 |
metric_1 |
metric_2 |

For some analytics purposes I needed to transform one wide table with
multiple metrics into a metric_name/metric_value pairs stored as separate
rows. That's why I did all this. I guess the reason and the method are not
important, the fact that the view gives different results does look like a
bug.

Cheers,
Daniel Farkas
Datoris

On Sun, May 29, 2022 at 4:22 PM Magnus Hagander <magnus@hagander.net> wrote:

Show quoted text

On Sun, May 29, 2022 at 4:20 PM PG Bug reporting form <
noreply@postgresql.org> wrote:

The following bug has been logged on the website:

Bug reference: 17502
Logged by: Daniel Farkaš
Email address: daniel.farkas@datoris.com
PostgreSQL version: 10.10
Operating system: Linux
Description:

Hey,

Please be gentle, I've never been in contact with Postgres developers.
In short, I've created a view, which has rather sketchy window functions,
but it gives me results I need.
When I do select * on it, it gives me what I expect. One of the columns
has
five distinct values.
But when I do group by on that column, it gives me only one of the values.
When I drop the view and create materialized view, all is good, I get all
five values.

My guess is that some parts of the inner select are affecting outer,
view's
select, which is not something I would expect.
My current Postgres is PostgreSQL 10.10 on x86_64-pc-linux-musl, compiled
by
gcc (Alpine 8.3.0) 8.3.0, 64-bit.
If you think this is worth investigating further, I will try composing a
simpler example, and test it in a more recent Postgres version.
Maybe it's a known limitation I'm not aware of.

Let me know what you think.

Please see if you can reproduce this on a current version of PostgreSQL
10, which is 10.21. Version 10.10 is lacking more than two and a half years
worth of bugfixes.

If you can then yes, try to put together a simpler example, because it
certainly does not sound like correct behavior.

//Magnus

#4David Rowley
dgrowleyml@gmail.com
In reply to: Daniel Farkaš (#3)
Re: BUG #17502: View based on window functions returns wrong results when queried

On Mon, 30 May 2022 at 06:58, Daniel Farkaš <daniel.farkas@datoris.com> wrote:

SELECT metric_name FROM analytics_view;
metric_name|
-----------+
metric_1 |
metric_1 |
metric_1 |
metric_1 |

This is certainly a bug. Thanks for reporting it.

The problem seems to be down to the fact that
remove_unused_subquery_outputs() does not check if the to-be-removed
target entry references WindowClauses which contain set-returning
functions.

We only seem to check if the target entry itself is an SRF, per:

/*
* If it contains a set-returning function, we can't remove it since
* that could change the number of rows returned by the subquery.
*/
if (subquery->hasTargetSRFs &&
expression_returns_set(texpr))
continue;

This ensures queries such as the following don't have SRF columns removed:

postgres=# explain verbose select a from (select
a,generate_series(1,2) as b from t) t;
QUERY PLAN
----------------------------------------------------------------------------
Subquery Scan on t (cost=0.00..131.13 rows=5100 width=4)
Output: t.a
-> ProjectSet (cost=0.00..80.13 rows=5100 width=8)
Output: t_1.a, generate_series(1, 2)
-> Seq Scan on public.t t_1 (cost=0.00..35.50 rows=2550 width=4)
Output: t_1.a
(6 rows)

I'm a little bit uncertain if the correct fix is to have
expression_returns_set() look deeper into WindowFuncs to check if the
WindowClause that the function belongs to has any SRFs in the
PARTITION BY / ORDER BY clause. Unfortunately, doing that means
having to pass the PlannerInfo to expression_returns_set(). I don't
quite see how that could be made to work in the back branches.

The other fix would be to make remove_unused_subquery_outputs() pull
out all WindowFuncs from the texpr and check if any of the
WindowClauses have SRFs in the PARTITION BY / ORDER BY clause.

I'll need to look a bit deeper into the usages of
expression_returns_set() to know which of the fixes is correct. There
might be some other bugs lurking due to expression_returns_set() not
checking WindowClauses for SRFs.

David

#5David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#4)
Re: BUG #17502: View based on window functions returns wrong results when queried

On Mon, 30 May 2022 at 10:30, David Rowley <dgrowleyml@gmail.com> wrote:

I'm a little bit uncertain if the correct fix is to have
expression_returns_set() look deeper into WindowFuncs to check if the
WindowClause that the function belongs to has any SRFs in the
PARTITION BY / ORDER BY clause. Unfortunately, doing that means
having to pass the PlannerInfo to expression_returns_set(). I don't
quite see how that could be made to work in the back branches.

There does seem to be other weird anomalies as a result of
expression_returns_set() not looking for SRFs in the WindowFunc
PARTITION BY / ORDER BY clause.

Looking at the usage of expression_returns_set() in
make_sort_input_target(), I'd have expected the following 2 queries to
act in the same way:

setup:
create table ab (a int, b int);
insert into ab values(1,1),(1,2),(2,1),(2,2);

postgres=# select distinct on (a) a,b, row_number() over (order by
generate_Series(1,2)) from ab order by a,b;
a | b | row_number
---+---+------------
1 | 1 | 1
2 | 1 | 7
(2 rows)

postgres=# select distinct on (a) a,b, generate_Series(1,2) from ab
order by a,b;
a | b | generate_series
---+---+-----------------
1 | 1 | 1
1 | 1 | 2
2 | 1 | 1
2 | 1 | 2
(4 rows)

The fact that those are different is starting to make me think that
expression_returns_set() should be looking into WindowFuncs to see if
the WindowClause has SRFs rather than adding special code to
remove_unused_subquery_outputs() in order to fix the bug being
reported on this thread.

However, I'm not sure the latter of the above two queries result makes
any sense given that our documentation [1]https://www.postgresql.org/docs/current/xfunc-sql.html#XFUNC-SQL-FUNCTIONS-RETURNING-SET claims:

"PostgreSQL's behavior for a set-returning function in a query's
select list is almost exactly the same as if the set-returning
function had been written in a LATERAL FROM-clause item instead. For
example,"

the "almost" is later defined as:

"It would be exactly the same, except that in this specific example,
the planner could choose to put g on the outside of the nested-loop
join, since g has no actual lateral dependency on tab."

which only indicates that the order of the results may differ, not the
actual results themselves.

You can see the LATERAL version only returns 2 rows:

postgres=# select distinct on (a) a, b, g.s from ab, lateral
generate_Series(1,2) g(s) order by a,b;
a | b | s
---+---+---
1 | 1 | 1
2 | 1 | 1
(2 rows)

The following is also pretty strange. Why should adding the SRF column
to the ORDER BY change the number of output rows?

postgres=# select distinct on (a) a,b, generate_Series(1,2) from ab
order by a,b,3 desc;
a | b | generate_series
---+---+-----------------
1 | 1 | 2
2 | 1 | 2
(2 rows)

postgres=# select distinct on (a) a,b, generate_Series(1,2) from ab
order by a,b;
a | b | generate_series
---+---+-----------------
1 | 1 | 1
1 | 1 | 2
2 | 1 | 1
2 | 1 | 2
(4 rows)

The LATERAL version of this will return 2 rows regardless of what the
ORDER BY says:

postgres=# select distinct on (a) a, b, g.s from ab, lateral
generate_Series(1,2) g(s) order by a,b,g.s DESC;
a | b | s
---+---+---
1 | 1 | 2
2 | 1 | 2
(2 rows)

postgres=# select distinct on (a) a, b, g.s from ab, lateral
generate_Series(1,2) g(s) order by a,b;
a | b | s
---+---+---
1 | 1 | 1
2 | 1 | 1
(2 rows)

In addition to what the documentation claims about SRFs in the target
list being the same as LATERAL, maybe the following should give us
some guidance on if the non-lateral SRF queries above should return 2
or 4 rows:

postgres=# select distinct generate_Series(0,1)/2;
?column?
----------
0
(1 row)

In the above, the ProjectSet occurs before the distinctification. That
would translate into the 2-row version of the DISTINCT ON query above
being correct.

I think that roughly translates into changes being required in both
make_sort_input_target() and expression_returns_set().

Does anyone have any other opinions?

David

[1]: https://www.postgresql.org/docs/current/xfunc-sql.html#XFUNC-SQL-FUNCTIONS-RETURNING-SET

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#4)
Re: BUG #17502: View based on window functions returns wrong results when queried

David Rowley <dgrowleyml@gmail.com> writes:

The problem seems to be down to the fact that
remove_unused_subquery_outputs() does not check if the to-be-removed
target entry references WindowClauses which contain set-returning
functions.

I was sort of wondering why we allow SRFs in this context in the
first place. The results don't seem terribly well-defined to me.
In particular, a WindowFunc invocation is not supposed to change the
number of rows in the query result, and yet this one is doing so.

regards, tom lane

#7David Rowley
dgrowleyml@gmail.com
In reply to: Tom Lane (#6)
Re: BUG #17502: View based on window functions returns wrong results when queried

On Mon, 30 May 2022 at 14:19, Tom Lane <tgl@sss.pgh.pa.us> wrote:

David Rowley <dgrowleyml@gmail.com> writes:

The problem seems to be down to the fact that
remove_unused_subquery_outputs() does not check if the to-be-removed
target entry references WindowClauses which contain set-returning
functions.

I was sort of wondering why we allow SRFs in this context in the
first place. The results don't seem terribly well-defined to me.
In particular, a WindowFunc invocation is not supposed to change the
number of rows in the query result, and yet this one is doing so.

That would certainly be an easier fix for the reported problem.

Do you think it would fly to add such a restriction in the backbranches?

David

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#7)
Re: BUG #17502: View based on window functions returns wrong results when queried

David Rowley <dgrowleyml@gmail.com> writes:

On Mon, 30 May 2022 at 14:19, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I was sort of wondering why we allow SRFs in this context in the
first place. The results don't seem terribly well-defined to me.

Do you think it would fly to add such a restriction in the backbranches?

We could leave it alone in the back branches on the grounds that if
you like the results you get, we shouldn't break it in a minor
release.

regards, tom lane

#9David Rowley
dgrowleyml@gmail.com
In reply to: Tom Lane (#8)
Re: BUG #17502: View based on window functions returns wrong results when queried

On Mon, 30 May 2022 at 14:45, Tom Lane <tgl@sss.pgh.pa.us> wrote:

David Rowley <dgrowleyml@gmail.com> writes:

On Mon, 30 May 2022 at 14:19, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I was sort of wondering why we allow SRFs in this context in the
first place. The results don't seem terribly well-defined to me.

Do you think it would fly to add such a restriction in the backbranches?

We could leave it alone in the back branches on the grounds that if
you like the results you get, we shouldn't break it in a minor
release.

I struggle to see how anyone would like their result correctness to
depend on whether remove_unused_subquery_outputs() is able or unable
to remove a column from the subquery.

David

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#9)
Re: BUG #17502: View based on window functions returns wrong results when queried

David Rowley <dgrowleyml@gmail.com> writes:

On Mon, 30 May 2022 at 14:45, Tom Lane <tgl@sss.pgh.pa.us> wrote:

We could leave it alone in the back branches on the grounds that if
you like the results you get, we shouldn't break it in a minor
release.

I struggle to see how anyone would like their result correctness to
depend on whether remove_unused_subquery_outputs() is able or unable
to remove a column from the subquery.

Indeed, it's unlikely that anybody would like these particular
results. But perhaps somebody is running an application that does
this and happens to not trip over any obviously-wrong case; for
example, if one always selects all columns from this view, the
issue is not apparent.

I'm not necessarily against adding the prohibition in the back
branches. However, if this has been wrong since 10.x (if not
further back) then it seems like few people are tripping over
the inconsistency.

regards, tom lane

#11David Rowley
dgrowleyml@gmail.com
In reply to: Tom Lane (#10)
Re: BUG #17502: View based on window functions returns wrong results when queried

On Mon, 30 May 2022 at 15:02, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I'm not necessarily against adding the prohibition in the back
branches. However, if this has been wrong since 10.x (if not
further back) then it seems like few people are tripping over
the inconsistency.

Yeah, perhaps. But we have received a report from 1 person and I think
it seems strange to adopt a policy that we require multiple bug
reports for the same issue before we consider fixing.

FWIW, just to demonstrate what a fix could look like, I've attached a
patch. I'm not planning to commit it if you really think we shouldn't
be fixing it.

The patch also does nothing for the weirdness that I described in the
DISTINCT ON case earlier in this thread.

David

Attachments:

dont_remove_srf_windowfuncs.patchtext/plain; charset=US-ASCII; name=dont_remove_srf_windowfuncs.patchDownload+52-3
#12Richard Guo
guofenglinux@gmail.com
In reply to: David Rowley (#5)
Re: BUG #17502: View based on window functions returns wrong results when queried

On Mon, May 30, 2022 at 9:12 AM David Rowley <dgrowleyml@gmail.com> wrote:

The following is also pretty strange. Why should adding the SRF column
to the ORDER BY change the number of output rows?

postgres=# select distinct on (a) a,b, generate_Series(1,2) from ab
order by a,b,3 desc;
a | b | generate_series
---+---+-----------------
1 | 1 | 2
2 | 1 | 2
(2 rows)

postgres=# select distinct on (a) a,b, generate_Series(1,2) from ab
order by a,b;
a | b | generate_series
---+---+-----------------
1 | 1 | 1
1 | 1 | 2
2 | 1 | 1
2 | 1 | 2
(4 rows)

This is indeed weird. This seems depends on at which plan level we add
ProjectSetPath. For the first query we insert ProjectSetPath at the
scan/join level, because the SRF is included in the ORDER BY, which
makes the SRF appear in the scan/join target. For the second query the
SRF is postponed to after the Sort.

Is this a bug we should fix?

Thanks
Richard

#13Ilya Anfimov
ilan@tzirechnoy.com
In reply to: Tom Lane (#6)
Re: BUG #17502: View based on window functions returns wrong results when queried

On Sun, May 29, 2022 at 10:18:59PM -0400, Tom Lane wrote:

David Rowley <dgrowleyml@gmail.com> writes:

The problem seems to be down to the fact that
remove_unused_subquery_outputs() does not check if the to-be-removed
target entry references WindowClauses which contain set-returning
functions.

I was sort of wondering why we allow SRFs in this context in the
first place. The results don't seem terribly well-defined to me.
In particular, a WindowFunc invocation is not supposed to change the
number of rows in the query result, and yet this one is doing so.

I would like to note, that some SRFs may return only one row in
specific cases, like dblink() with the specific query.
It may used much more often, than real multirow SRFs in PARTITION BY.

While converting it to a LATERAL JOIN is not a lot of work --
breaking it in previous versions may affect a lot of people.

#14Tom Lane
tgl@sss.pgh.pa.us
In reply to: Richard Guo (#12)
Re: BUG #17502: View based on window functions returns wrong results when queried

Richard Guo <guofenglinux@gmail.com> writes:

On Mon, May 30, 2022 at 9:12 AM David Rowley <dgrowleyml@gmail.com> wrote:

The following is also pretty strange. Why should adding the SRF column
to the ORDER BY change the number of output rows?

Is this a bug we should fix?

This bug seems to have slipped off the radar screen, but it's still
a bug. I continue to believe that the best fix is to disallow SRFs
in window definitions, and present the trivial patch to do so.

As discussed, I'm comfortable with leaving this alone in the back
branches.

regards, tom lane

Attachments:

disallow-SRFs-in-window-definitions-altogether.patchtext/x-diff; charset=us-ascii; name=disallow-SRFs-in-window-definitions-altogether.patchDownload+22-11
#15Richard Guo
guofenglinux@gmail.com
In reply to: Tom Lane (#14)
Re: BUG #17502: View based on window functions returns wrong results when queried

On Tue, Jan 31, 2023 at 6:53 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

This bug seems to have slipped off the radar screen, but it's still
a bug. I continue to believe that the best fix is to disallow SRFs
in window definitions, and present the trivial patch to do so.

This patch fixes the original issue reported by Daniel. Another issue
discussed in this thread is the weirdness when there are SRFs in the
targetlist of a query with DISTINCT ON clause, as described by David.

create table ab (a int, b int);
insert into ab values(1,1),(1,2),(2,1),(2,2);

# select distinct on (a) a, b, generate_series(1,2) g from ab order by a, b;
a | b | g
---+---+---
1 | 1 | 1
1 | 1 | 2
2 | 1 | 1
2 | 1 | 2
(4 rows)

# select distinct on (a) a, b, g from ab, lateral generate_series(1,2) as g
order by a, b;
a | b | g
---+---+---
1 | 1 | 1
2 | 1 | 1
(2 rows)

According to the doc these two queries are supposed to be 'almost
exactly the same'. So it's weird to see they give different number of
output rows.

Should we also fix this issue? If so it seems we need some changes
about postponing SRFs in make_sort_input_target().

Thanks
Richard

#16David Rowley
dgrowleyml@gmail.com
In reply to: Tom Lane (#14)
Re: BUG #17502: View based on window functions returns wrong results when queried

On Tue, 31 Jan 2023 at 11:53, Tom Lane <tgl@sss.pgh.pa.us> wrote:

This bug seems to have slipped off the radar screen, but it's still
a bug. I continue to believe that the best fix is to disallow SRFs
in window definitions, and present the trivial patch to do so.

I don't really object to doing this, but I'm not convinced that
someone else won't eventually come along complaining about us changing
this. As far as I'm aware, the behaviour is not wrong today provided
you always use the column so that remove_unused_subquery_outputs()
does not get rid of it.

If we disallow this, then maybe it's worth also giving ourselves a bit
more flexibility to make it work correctly in the future should
someone come along and convince us we shouldn't have disallowed this.
Adjusting expression_returns_set() to pass it the relevant
PlannerInfo for the Node might be a good idea. If we change our minds
in the future then we could easily then modify
expression_returns_set_walker() to have it check if the given
WindowFunc's WindowClause has SRFs so that we don't remove the unused
WindowFunc subquery columns in remove_unused_subquery_outputs(). I'd
have probably gone with this as a master-only fix, but is quite likely
you're thinking of something that I'm not. Certainly, the consistency
of when SRFs are evaluated is pretty terrible so I understand the
desire to disallow the WindowClause SRFs so we have one less thing to
get wrong.

David