union vs. sort

Started by Karel Zakalmost 22 years ago5 messages
#1Karel Zak
zakkr@zf.jcu.cz

I'm surprise with query plan that PostgreSQL planner prepare for
selects with ORDER BY if all data are from sub-select that is already
sorted.

# explain select data from
(select distinct data from addr)
as x order by x.data;
-------------------------------------------------
Subquery Scan x
-> Unique
-> Sort
Sort Key: data
-> Seq Scan on addr

This is right -- the main of query doesn't use "Sort" for ORDER BY,
because subselect is sorted by "Unique".

And almost same query, but in the subselect is union:

# explain select data from
(select data from addr
union
select data from addr2)
as x order by x.data;
-----------------------------------------
Sort
Sort Key: data
-> Subquery Scan x
-> Unique
-> Sort
Sort Key: data
-> Append
-> Subquery Scan "*SELECT* 1"
-> Seq Scan on addr
-> Subquery Scan "*SELECT* 2"
-> Seq Scan on addr2

I think it's bad, because there is used extra sort for ORDER BY for
already by "Unique" sorted data.

If I add ORDER BY to subselect:

# explain select data from
(select data from addr
union
select data from addr2 order by data)
as x order by x.data;
---------------------------------------------------
Sort
Sort Key: data
-> Subquery Scan x
-> Sort
Sort Key: data
-> Unique
-> Sort
Sort Key: data
-> Append
-> Subquery Scan "*SELECT* 1"
-> Seq Scan on addr
-> Subquery Scan "*SELECT* 2"
-> Seq Scan on addr2

I see two unnecessary sorts for unique and already sorted data.

The core of problem is probbaly UNION, because if I use simple query without
subselect it still sort already sorderd data:

# explain select data from addr
union
select data from addr2
order by data;
-----------------------------------
Sort
Sort Key: data
-> Unique
-> Sort
Sort Key: data
-> Append
-> Subquery Scan "*SELECT* 1"
-> Seq Scan on addr
-> Subquery Scan "*SELECT* 2"
-> Seq Scan on addr2

Or order of data which returns "unique" is for UNION diffrent that data
from DISTINCT? (see first example).

Karel

--
Karel Zak <zakkr@zf.jcu.cz>
http://home.zf.jcu.cz/~zakkr/

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Karel Zak (#1)
Re: union vs. sort

Karel Zak <zakkr@zf.jcu.cz> writes:

I'm surprise with query plan that PostgreSQL planner prepare for
selects with ORDER BY if all data are from sub-select that is already
sorted.

This isn't simply a matter of "omitting the sort". Even if the inputs
are sorted, their concatenation (Append result) isn't sorted: "1 2 3 4"
and "1 3 7 9" are sorted, but "1 2 3 4 1 3 7 9" isn't.

To do what you're thinking about, we'd have to build a variant
implementation of Unique that merges two presorted inputs --- and it
wouldn't work for more than two inputs (at least not without a lot of
pain ... Append is a messy special case in many ways, and we'd have to
duplicate most of that cruft to make an N-input version of Unique).
This is possible, without doubt, but I'm not excited about expending
that much time on it. You haven't shown any evidence that this would be
an important optimization in practice.

regards, tom lane

#3Karel Zak
zakkr@zf.jcu.cz
In reply to: Tom Lane (#2)
Re: union vs. sort

On Tue, Apr 06, 2004 at 10:33:25AM -0400, Tom Lane wrote:

Karel Zak <zakkr@zf.jcu.cz> writes:

I'm surprise with query plan that PostgreSQL planner prepare for
selects with ORDER BY if all data are from sub-select that is already
sorted.

This isn't simply a matter of "omitting the sort". Even if the inputs
are sorted, their concatenation (Append result) isn't sorted: "1 2 3 4"
and "1 3 7 9" are sorted, but "1 2 3 4 1 3 7 9" isn't.

I didn't talk about "Append" result, but about "Unique" result. The
ORDER BY in UNION query works with final concanated data -- that's
right. My question is why a result from this ORDER BY is again sorted:

# explain select data from
(select data from addr
union
select data from addr2 order by data)
as x order by x.data;
-----------------------------------------------
(1) Sort
Sort Key: data
-> Subquery Scan x
(2) -> Sort
Sort Key: data
-> Unique
(3) -> Sort
Sort Key: data
-> Append
-> Subquery Scan "*SELECT* 1"
-> Seq Scan on addr
-> Subquery Scan "*SELECT* 2"
-> Seq Scan on addr2

I see three sorts with same data.

To do what you're thinking about, we'd have to build a variant
implementation of Unique that merges two presorted inputs --- and it
wouldn't work for more than two inputs (at least not without a lot of
pain ... Append is a messy special case in many ways, and we'd have to
duplicate most of that cruft to make an N-input version of Unique).

I think it is not needful touch Append, but it should detect
redundant sorts. Why

"select data from (select data from addr order by data) as x order by x.data"

use only one sort?

This is possible, without doubt, but I'm not excited about expending
that much time on it. You haven't shown any evidence that this would be
an important optimization in practice.

It's nothing important for me. It's from Czech databases mailing list
where some PostgreSQL users was surprised with EXPLAIN result of UNION
and speed of these queries.

Karel

--
Karel Zak <zakkr@zf.jcu.cz>
http://home.zf.jcu.cz/~zakkr/

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Karel Zak (#3)
Re: union vs. sort

Karel Zak <zakkr@zf.jcu.cz> writes:

On Tue, Apr 06, 2004 at 10:33:25AM -0400, Tom Lane wrote:

This isn't simply a matter of "omitting the sort".

I didn't talk about "Append" result, but about "Unique" result. The
ORDER BY in UNION query works with final concanated data -- that's
right. My question is why a result from this ORDER BY is again sorted:

Oh, okay, that's just something that never got done, per this old
comment:

/*
* We set current_pathkeys NIL indicating we do not know sort
* order. This is correct when the top set operation is UNION
* ALL, since the appended-together results are unsorted even if
* the subplans were sorted. For other set operations we could be
* smarter --- room for future improvement!
*/

I've committed changes to do the right thing in CVS tip.

regards, tom lane

#5Karel Zak
zakkr@zf.jcu.cz
In reply to: Tom Lane (#4)
Re: union vs. sort

On Wed, Apr 07, 2004 at 02:20:55PM -0400, Tom Lane wrote:

I've committed changes to do the right thing in CVS tip.

Thanks man!

Karel

--
Karel Zak <zakkr@zf.jcu.cz>
http://home.zf.jcu.cz/~zakkr/