BUG #13863: Select from views gives wrong results

Started by wrbover 10 years ago14 messagesbugs
Jump to latest
#1wrb
wrb@autistici.org

The following bug has been logged on the website:

Bug reference: 13863
Logged by: W
Email address: wrb@autistici.org
PostgreSQL version: 9.5.0
Operating system: Mac OS X 10.11.2 (postgres is from homebrew)
Description:

Hello,

I noticed strange behavior while using views with the new grouping sets
functionality.

Steps to reproduce are here:
https://gist.github.com/wrb/a73f45dcd335c8bc166c

On my machine, selecting from view gives zero results while selecting from
materialize view (or using CTE) gives the correct result. There's nothing
unusual in log.

Looking at explain analyze function, it seems that the filter is pushed too
deep and removes rows from the original table and not from the aggreage. It
works correctly when using CTE.

Thanks!

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#2Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: wrb (#1)
Re: BUG #13863: Select from views gives wrong results

On Thu, Jan 14, 2016 at 5:35 AM, <wrb@autistici.org> wrote:

The following bug has been logged on the website:

Bug reference: 13863
Logged by: W
Email address: wrb@autistici.org
PostgreSQL version: 9.5.0
Operating system: Mac OS X 10.11.2 (postgres is from homebrew)
Description:

Hello,

I noticed strange behavior while using views with the new grouping sets
functionality.

Steps to reproduce are here:
https://gist.github.com/wrb/a73f45dcd335c8bc166c

I feel, It is always better to place the simple sql file in the mail itself,
instead of link.

Following are the sql statements that i took from the link.

-- table
create table emp (emp_id serial, department text, position text, sex
text, salary numeric);

-- data
begin;
insert into emp (department, position, sex, salary) values
('marketing', 'junior', 'F', 20000);
insert into emp (department, position, sex, salary) values
('marketing', 'junior', 'F', 25000);
insert into emp (department, position, sex, salary) values
('marketing', 'junior', 'M', 25000);
insert into emp (department, position, sex, salary) values ('it',
'senior', 'M', 125000);
insert into emp (department, position, sex, salary) values ('sales',
'normal', 'M', 50000);
insert into emp (department, position, sex, salary) values ('sales',
'junior', 'F', 30000);
insert into emp (department, position, sex, salary) values ('sales',
'junior', 'F', 30000);
commit;

-- view
create view v_emp_stats as select department, position, sex,
avg(salary) from emp group by cube (department, position, sex);

-- materialized view
create materialized view mv_emp_stats as select department, position,
sex, avg(salary) from emp group by cube (department, position, sex);

-- those two selects should be identical, right?
select * from v_emp_stats where department is not null and position is null;
select * from mv_emp_stats where department is not null and position is null;

--CTE
with emp_data as (
select department, position, sex, avg(salary) from emp group by cube
(department, position, sex)
)
select * from emp_data where department is not null and position is null;

On my machine, selecting from view gives zero results while selecting from
materialize view (or using CTE) gives the correct result. There's nothing
unusual in log.

Looking at explain analyze function, it seems that the filter is pushed too
deep and removes rows from the original table and not from the aggreage. It
works correctly when using CTE.

As per my understanding of the result, this is correct.

-- those two selects should be identical, right?
select * from v_emp_stats where department is not null and position is null;

View's don't have storage, so any query on the view is rewritten to
the target relation.
In the above scenario, the same happened and the where clause is pushed into
the view select query. There is no row that is presented in the emp
table satisfies
the where clause that is provided. Because of this reason no data is selected.

select * from mv_emp_stats where department is not null and position is null;

with emp_data as (
select department, position, sex, avg(salary) from emp group by cube (department, position, sex)
)
select * from emp_data where department is not null and position is null;

But where as with materialized view, it has the storage. The materialized view
is populated with the query result during creation. So adding an where clause
on materialized view satisfies with it and thus it returned the
results. The number
of records in the materialized view are 27 compared to the number of records in
table are 7.

The same with the CTE also, first the inner query is executed and on top of that
result the outer query is executed. Because of this reason, the where clause is
satisfied and the results are returned.

This is just an usage problem.

Regards,
Hari Babu
Fujitsu Australia

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#3wrb
wrb@autistici.org
In reply to: Haribabu Kommi (#2)
Re: BUG #13863: Select from views gives wrong results

On 14. 1. 2016, at 0:50, Haribabu Kommi <kommi.haribabu@gmail.com> wrote:

View's don't have storage, so any query on the view is rewritten to
the target relation.
In the above scenario, the same happened and the where clause is pushed into
the view select query. There is no row that is presented in the emp
table satisfies
the where clause that is provided. Because of this reason no data is selected.

But where as with materialized view, it has the storage. The materialized view
is populated with the query result during creation. So adding an where clause
on materialized view satisfies with it and thus it returned the
results. The number
of records in the materialized view are 27 compared to the number of records in
table are 7.

The same with the CTE also, first the inner query is executed and on top of that
result the outer query is executed. Because of this reason, the where clause is
satisfied and the results are returned.

I understand why it happens - with normal group by, the predicate *can* be pushed down
unless it is on generated column. But with cube, every column is a geenrated column because
of the way cube works.

It works correctly unless cube is involved:

create table test (a int, b int);
create view v_test as select a, count(*) from test group by a;
create view v_test2 as select a, count(*) from test group by cube (a);
insert into test values (1, 1), (2, 2);

this doesn't get pushed down either as CTE or from view:

explain with t as (select a, count(*) from test group by a) select * from t where count = 1;
QUERY PLAN
----------------------------------------------------------------------
CTE Scan on t (cost=45.90..50.40 rows=1 width=12)
Filter: (count = 1)
CTE t
-> HashAggregate (cost=43.90..45.90 rows=200 width=4)
Group Key: test.a
-> Seq Scan on test (cost=0.00..32.60 rows=2260 width=4)
(6 rows)

explain select * from v_test where count = 1;
QUERY PLAN
--------------------------------------------------------------
HashAggregate (cost=49.55..52.05 rows=200 width=4)
Group Key: test.a
Filter: (count(*) = 1)
-> Seq Scan on test (cost=0.00..32.60 rows=2260 width=4)
(4 rows)

This does get pushed down correctly:

explain select * from v_test where a = 1;
QUERY PLAN
------------------------------------------------------------
GroupAggregate (cost=0.00..38.32 rows=1 width=4)
Group Key: test.a
-> Seq Scan on test (cost=0.00..38.25 rows=11 width=4)
Filter: (a = 1)
(4 rows)

This doesn't get pushed down, but it's a known issue with CTEs being optimization boundary in postgres:

explain with t as (select a, count(*) from test group by a) select * from t where a = 1;
QUERY PLAN
----------------------------------------------------------------------
CTE Scan on t (cost=45.90..50.40 rows=1 width=12)
Filter: (a = 1)
CTE t
-> HashAggregate (cost=43.90..45.90 rows=200 width=4)
Group Key: test.a
-> Seq Scan on test (cost=0.00..32.60 rows=2260 width=4)
(6 rows)

But when cube gets involved:

with t as (select a, count(*) from test group by cube (a)) select * from t where a is null;
a | count
---+-------
| 2

CTE still works, because it doesn't get pushed down:

explain with t as (select a, count(*) from test group by cube (a)) select * from t where a is null;
QUERY PLAN
----------------------------------------------------------------------------
CTE Scan on t (cost=177.47..181.49 rows=1 width=12)
Filter: (a IS NULL)
CTE t
-> GroupAggregate (cost=158.51..177.47 rows=201 width=4)
Group Key: test.a
Group Key: ()
-> Sort (cost=158.51..164.16 rows=2260 width=4)
Sort Key: test.a
-> Seq Scan on test (cost=0.00..32.60 rows=2260 width=4)
(9 rows)

but with a view, it gets pushed down to the table, even though it doesn't make any sense for cube,
and gives incorrect results:

select * from v_test2 where a is null;
a | count
---+-------
| 0

explain select * from v_test2 where a is null;
QUERY PLAN
------------------------------------------------------------------
GroupAggregate (cost=32.79..32.89 rows=2 width=4)
Group Key: test.a
Group Key: ()
Filter: (test.a IS NULL)
-> Sort (cost=32.79..32.82 rows=11 width=4)
Sort Key: test.a
-> Seq Scan on test (cost=0.00..32.60 rows=11 width=4)
Filter: (a IS NULL)

No, I'm pretty sure this is incorrect result, because views are supposed to provide logical data independence and running the same query through nested select, CTE and view should give the same result, right? If not, I'm pretty sure this is the first case where it doesn't work like this.

W

This is just an usage problem.

Regards,
Hari Babu
Fujitsu Australia

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#4wrb
wrb@autistici.org
In reply to: wrb (#3)
Re: BUG #13863: Select from views gives wrong results

<html><head></head><body class="ApplePlainTextBody" dir="auto" style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;">Ooops, forgot to mail the list.<br><br>W<br><br><blockquote type="cite">Begin forwarded message:<br><br>From: wrb &lt;wrb@autistici.org&gt;<br>Subject: Re: [BUGS] BUG #13863: Select from views gives wrong results<br>Date: 14. ledna 2016 10:49:27 SEČ<br>To: Haribabu Kommi &lt;kommi.haribabu@gmail.com&gt;<br><br>Followup: <br><br>This query should show better what's wrong with the optimizer:<br><br><br>select a, count(*) &nbsp;from test group by cube (a);<br>a | count <br>---+-------<br>1 | &nbsp;&nbsp;&nbsp;&nbsp;1<br>2 | &nbsp;&nbsp;&nbsp;&nbsp;1<br> &nbsp;| &nbsp;&nbsp;&nbsp;&nbsp;2<br>(3 rows)<br><br>select * from (select a, count(*) &nbsp;from test group by cube (a)) t where a is null;<br>a | count <br>---+-------<br> &nbsp;| &nbsp;&nbsp;&nbsp;&nbsp;0<br><br>explain select * from (select a, count(*) from test group by cube (a)) t where a is null;<br> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;QUERY PLAN &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<br>------------------------------------------------------------------<br>GroupAggregate &nbsp;(cost=32.79..32.89 rows=2 width=4)<br> &nbsp;Group Key: test.a<br> &nbsp;Group Key: ()<br> &nbsp;Filter: (test.a IS NULL)<br> &nbsp;-&gt; &nbsp;Sort &nbsp;(cost=32.79..32.82 rows=11 width=4)<br> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Sort Key: test.a<br> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;-&gt; &nbsp;Seq Scan on test &nbsp;(cost=0.00..32.60 rows=11 width=4)<br> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Filter: (a IS NULL)<br><br><br>The predicate "a is null" in the outer select should refer to results of inner select and should therefore return (null, 2) instead of no rows, right?<br><br>W<br></blockquote><br></body></html>

#5Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: wrb (#3)
Re: BUG #13863: Select from views gives wrong results

On 14 January 2016 at 09:19, wrb <wrb@autistici.org> wrote:

No, I'm pretty sure this is incorrect result, because views are supposed to provide logical data independence and running the same query through nested select, CTE and view should give the same result, right? If not, I'm pretty sure this is the first case where it doesn't work like this.

Yes, there appears to be a real bug here, but actually the underlying
cause is more to do with the way HAVING and GROUPING SETS interact.

When the clause is pushed down into the view it is turned from a WHERE
clause into a HAVING clause, because the view query has grouping. That
part is OK. However, the query planner then decides that since the new
HAVING clause doesn't contain any aggregates or volatile expressions,
it is safe to turn it back into a WHERE clause in the inner query.
That would be OK for normal grouping, because in that case the clause
would evaluate the same for all the rows in a group. However, that
doesn't work for grouping sets, since the output of the grouping sets
aggregation may contains nulls not present before grouping.

Here is a simpler example:

create table foo(a int, b int);
insert into foo values (1,1), (2,2);

select a, count(*) from foo group by cube(a);
a | count
---+-------
1 | 1
2 | 1
| 2
(3 rows)

select a, count(*) from foo group by cube(a) having a is null;
a | count
---+-------
| 0
(1 row)

select a, count(*) from foo group by cube(a) having a is distinct from 1;
a | count
---+-------
2 | 1
| 1
(2 rows)

The results from these last 2 queries are incorrect -- they're not the
same as evaluating the HAVING clause after grouping.

This bug appears to have been introduced by commit
61444bfb809d3a088a270a59f383af3d4cd157b0, which was discussed here:
/messages/by-id/20150726151456.GD12755@awork2.anarazel.de

Probably that change should be undone and the comment expanded to
explain why it isn't safe when the query has grouping sets.

Regards,
Dean

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#6Andres Freund
andres@anarazel.de
In reply to: Dean Rasheed (#5)
Re: BUG #13863: Select from views gives wrong results

Hi,

Thanks for digging!

On 2016-01-16 11:23:31 +0000, Dean Rasheed wrote:

When the clause is pushed down into the view it is turned from a WHERE
clause into a HAVING clause, because the view query has grouping. That
part is OK. However, the query planner then decides that since the new
HAVING clause doesn't contain any aggregates or volatile expressions,
it is safe to turn it back into a WHERE clause in the inner query.
That would be OK for normal grouping, because in that case the clause
would evaluate the same for all the rows in a group. However, that
doesn't work for grouping sets, since the output of the grouping sets
aggregation may contains nulls not present before grouping.

Probably that change should be undone and the comment expanded to
explain why it isn't safe when the query has grouping sets.

Hm, I'm a bit hesitant to potentially noticeably regress plans here,
although it seems unikely that already "relies" on this. Can't we
essentially treat references to columns which can be set to null by due
to a grouping set, as containing an agg clause and only prevent moving
in that case?

Greetings,

Andres Freund

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#7Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Andres Freund (#6)
Re: BUG #13863: Select from views gives wrong results

On 16 January 2016 at 13:12, Andres Freund <andres@anarazel.de> wrote:

Hm, I'm a bit hesitant to potentially noticeably regress plans here,
although it seems unikely that already "relies" on this. Can't we
essentially treat references to columns which can be set to null by due
to a grouping set, as containing an agg clause and only prevent moving
in that case?

I suppose it could be done, but it seems like a fair bit of work. It
would be a bit like check_ungrouped_columns() except that it would
only be safe to move the clause if it only referred to expressions
that appeared in the intersection of all the grouping sets. I think
that would rule out most real-world examples, including just about
anything using rollup or cube, so I'm not sure it would be worth the
effort.

Regards,
Dean

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#8Andrew Gierth
andrew@tao11.riddles.org.uk
In reply to: Dean Rasheed (#7)
Re: BUG #13863: Select from views gives wrong results

"Dean" == Dean Rasheed <dean.a.rasheed@gmail.com> writes:

On 16 January 2016 at 13:12, Andres Freund <andres@anarazel.de> wrote:

Hm, I'm a bit hesitant to potentially noticeably regress plans here,
although it seems unikely that already "relies" on this. Can't we
essentially treat references to columns which can be set to null by
due to a grouping set, as containing an agg clause and only prevent
moving in that case?

Dean> I suppose it could be done, but it seems like a fair bit of
Dean> work. It would be a bit like check_ungrouped_columns() except
Dean> that it would only be safe to move the clause if it only referred
Dean> to expressions that appeared in the intersection of all the
Dean> grouping sets. I think that would rule out most real-world
Dean> examples, including just about anything using rollup or cube, so
Dean> I'm not sure it would be worth the effort.

Obvious question is, can we keep the optimization in the case that the
qual is strict? and if so, should we?

--
Andrew (irc:RhodiumToad)

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#9Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Andrew Gierth (#8)
Re: BUG #13863: Select from views gives wrong results

On 18 January 2016 at 12:54, Andrew Gierth <andrew@tao11.riddles.org.uk> wrote:

Obvious question is, can we keep the optimization in the case that the
qual is strict? and if so, should we?

Perhaps. I don't think it's entirely obvious what it means for a qual
to be strict.

I think that it would have to be the case that for every nullable
expression (every expression appearing in at least one grouping set,
but not appearing in all the grouping sets) the qual would have to be
provably false or null when that expression was null, or the qual
would have to be independent of that expression.

That seems like a pretty tricky piece of code to get right. It might
be more reasonable if you limited yourself to the case where all the
expressions were Vars, then you could use find_nonnullable_vars().

Regards,
Dean

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#10Andrew Gierth
andrew@tao11.riddles.org.uk
In reply to: Dean Rasheed (#9)
Re: BUG #13863: Select from views gives wrong results

I think for now the following patch is best. This preserves the ability
to push degenerate HAVING clauses through an explicit GROUP BY () as
well as an implicit one, but prevents pushdown in all other cases.

Includes comment and regression test.

--
Andrew (irc:RhodiumToad)

Attachments:

havingfix.patchtext/x-patchDownload+81-8
#11Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Andrew Gierth (#10)
Re: BUG #13863: Select from views gives wrong results

On 7 February 2016 at 23:21, Andrew Gierth <andrew@tao11.riddles.org.uk> wrote:

I think for now the following patch is best. This preserves the ability
to push degenerate HAVING clauses through an explicit GROUP BY () as
well as an implicit one, but prevents pushdown in all other cases.

Includes comment and regression test.

Yes, I think that's probably the best solution for now. Longer term I
think it might be possible to do better, along the lines discussed
above, but I haven't had time to explore any of those options.

Note that with this change to the "if" condition, the next "else if"
condition can be simplified back to just:

else if (parse->groupClause)
{
/* move it to WHERE */
...

Andres, are you going to take this one?

Regards,
Dean

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#12Andres Freund
andres@anarazel.de
In reply to: Dean Rasheed (#11)
Re: BUG #13863: Select from views gives wrong results

On 2016-02-08 09:41:10 +0000, Dean Rasheed wrote:

On 7 February 2016 at 23:21, Andrew Gierth <andrew@tao11.riddles.org.uk> wrote:

I think for now the following patch is best. This preserves the ability
to push degenerate HAVING clauses through an explicit GROUP BY () as
well as an implicit one, but prevents pushdown in all other cases.

Includes comment and regression test.

Yes, I think that's probably the best solution for now. Longer term I
think it might be possible to do better, along the lines discussed
above, but I haven't had time to explore any of those options.

Note that with this change to the "if" condition, the next "else if"
condition can be simplified back to just:

else if (parse->groupClause)
{
/* move it to WHERE */
...

FWIW, I find the current phrasing clearer.

Andres, are you going to take this one?

Working on it as we speak. I thought you might appreciate doing your
first commit under a littlebit less pressure.

Andres

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#13Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Andres Freund (#12)
Re: BUG #13863: Select from views gives wrong results

On 8 February 2016 at 09:44, Andres Freund <andres@anarazel.de> wrote:

Note that with this change to the "if" condition, the next "else if"
condition can be simplified back to just:

else if (parse->groupClause)
{
/* move it to WHERE */
...

FWIW, I find the current phrasing clearer.

OK, fair enough.

Andres, are you going to take this one?

Working on it as we speak. I thought you might appreciate doing your
first commit under a littlebit less pressure.

Thanks. I'm a bit swamped with work for my day job right now.

Regards,
Dean

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#14Andres Freund
andres@anarazel.de
In reply to: Andrew Gierth (#10)
Re: BUG #13863: Select from views gives wrong results

On 2016-02-07 23:21:57 +0000, Andrew Gierth wrote:

I think for now the following patch is best. This preserves the ability
to push degenerate HAVING clauses through an explicit GROUP BY () as
well as an implicit one, but prevents pushdown in all other cases.

Includes comment and regression test.

I've pushed this now. I'd greatly appreciate sending/pinging earlier
than the day of wrapping, next time round...

Andres

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs