BUG #14136: select distinct from a materialized view does not preserve result order

Started by Sean Knutsonalmost 10 years ago4 messagesbugs
Jump to latest
#1Sean Knutson
seandknutson@gmail.com

The following bug has been logged on the website:

Bug reference: 14136
Logged by: Sean
Email address: seandknutson@gmail.com
PostgreSQL version: 9.5.0
Operating system: Arch Linux
Description:

It seems that ordering is lost when doing a "select distinct" from a
materialized view.

Say I have a table called "test" that contains

id | name
----+-------
1 | steve
2 | adam
3 | jim
4 | steve
5 | adam
6 | sean

and a materialized view defined as

create materialized view view_test as (select * from test order by name);

If I run

select distinct name from view_test;

I get

name
-------
steve
adam
sean
jim

instead of

name
-------
adam
jim
sean
steve

like I'd expect. If I have a non-materialized view with the same definition,
the query returns

name
-------
adam
jim
sean
steve

as expected.

For simple cases, the workaround is relatively simple: just reapply the same
"order by" in the "select distinct.." query. E.g.

select distinct name from view_test order by name;

However, if the ordering defined in the mat view is complex, it may be
difficult to replicate, or even impossible (say if the order is based on a
column from a table that isn't included in the "select" part of the view
definition).

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Sean Knutson (#1)
Re: BUG #14136: select distinct from a materialized view does not preserve result order

seandknutson@gmail.com writes:

It seems that ordering is lost when doing a "select distinct" from a
materialized view.

SELECT DISTINCT doesn't promise to preserve order in any context,
matview or otherwise. If you want a particular output ordering
you need to say ORDER BY explicitly in the query. Otherwise the
planner is free to do the DISTINCT via hashing, as it evidently
did here. (Actually, it's free to do it by hashing anyhow; but
with ORDER BY it'd then have to re-sort.)

and a materialized view defined as
create materialized view view_test as (select * from test order by name);

AFAIK, a matview is just a table, and even if you define it like that
there's no guarantee that the rows in the matview will be physically
ordered by name. Certainly, if you haven't created an index on the
matview, the planner will assume that it must either sort-and-unique
or hash in order to do the DISTINCT correctly ... and unless the table
is too large for the hashtable to fit in memory, it's likely to think
the hash approach is preferable.

This seems like a documentation shortcoming, though, as the documentation
doesn't really say that a matview won't preserve row order (or at least
I see nothing about it in the CREATE MATERIALIZED VIEW ref page --- maybe
it's mentioned somewhere else?)

regards, tom lane

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#3Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Tom Lane (#2)
Re: BUG #14136: select distinct from a materialized view does not preserve result order

On Fri, May 13, 2016 at 5:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

seandknutson@gmail.com writes:

and a materialized view defined as
create materialized view view_test as (select * from test order by name);

AFAIK, a matview is just a table, and even if you define it like that
there's no guarantee that the rows in the matview will be physically
ordered by name.

This seems like a documentation shortcoming, though, as the documentation
doesn't really say that a matview won't preserve row order (or at least
I see nothing about it in the CREATE MATERIALIZED VIEW ref page --- maybe
it's mentioned somewhere else?)

I agree that it should be documented. I'll do something about that.

Note that even if the table is initially created in order, there is
no guarantee that REFRESH CONCURRENTLY will leave it in order. (In
that regard it's more-or-less like CLUSTER.) And even if you query
it while it is in order, there is no guarantee that this is the
order in which the rows would be returned, as there could be a
synchronous scan or an index-only scan on an index in some other
order.

If I have a non-materialized view with the same definition,
the query returns

name
-------
adam
jim
sean
steve

as expected.

It is bad technique to assume an order from a query that is not
explicitly specified in that query, regardless of whether the data
is being drawn from a table, view, or materialized view. A
DISTINCT or GROUP BY clause should not be assumed to order the
output -- it might sometimes do so, if that is the fastest way to
group things; but there is no requirement that it do so.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#4Sean Knutson
seandknutson@gmail.com
In reply to: Kevin Grittner (#3)
Re: BUG #14136: select distinct from a materialized view does not preserve result order

It is bad technique to assume an order from a query that is not
explicitly specified in that query, regardless of whether the data
is being drawn from a table, view, or materialized view. A
DISTINCT or GROUP BY clause should not be assumed to order the
output -- it might sometimes do so, if that is the fastest way to
group things; but there is no requirement that it do so.

Thanks for the quick replies! Yeah, that makes sense. I think at first I
blindly assumed that it would preserve the order of the result set when
doing a DISTINCT, although it makes sense why it wouldn't (or at least that
you can't count on it). I think the bigger point of confusion for me (and
the reason I felt it worth reporting as a bug) was why the DISTINCT would
behave differently when selecting from a mat view or table vs from a view
or subquery, and it sounds like that's simply a result of which algorithm
the query planner chooses to evaluate the DISTINCT in each situation?

I agree that it should be documented. I'll do something about that.

Thanks, I think that will help a great deal!

All the best,
-Sean

On Fri, May 13, 2016 at 7:33 PM, Kevin Grittner <kgrittn@gmail.com> wrote:

Show quoted text

On Fri, May 13, 2016 at 5:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

seandknutson@gmail.com writes:

and a materialized view defined as
create materialized view view_test as (select * from test order by

name);

AFAIK, a matview is just a table, and even if you define it like that
there's no guarantee that the rows in the matview will be physically
ordered by name.

This seems like a documentation shortcoming, though, as the documentation
doesn't really say that a matview won't preserve row order (or at least
I see nothing about it in the CREATE MATERIALIZED VIEW ref page --- maybe
it's mentioned somewhere else?)

I agree that it should be documented. I'll do something about that.

Note that even if the table is initially created in order, there is
no guarantee that REFRESH CONCURRENTLY will leave it in order. (In
that regard it's more-or-less like CLUSTER.) And even if you query
it while it is in order, there is no guarantee that this is the
order in which the rows would be returned, as there could be a
synchronous scan or an index-only scan on an index in some other
order.

If I have a non-materialized view with the same definition,
the query returns

name
-------
adam
jim
sean
steve

as expected.

It is bad technique to assume an order from a query that is not
explicitly specified in that query, regardless of whether the data
is being drawn from a table, view, or materialized view. A
DISTINCT or GROUP BY clause should not be assumed to order the
output -- it might sometimes do so, if that is the fastest way to
group things; but there is no requirement that it do so.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company