DISTINCT ON without ORDER BY

Started by Martijn van Oosterhoutabout 17 years ago4 messagesgeneral
Jump to latest
#1Martijn van Oosterhout
kleptog@svana.org

Hi,

I was going through the queries of an SQL application and came across
queries like:

SELECT * FROM foo
WHERE id in (SELECT max(id) FROM foo GROUP BY bar);

I thought, here's a case where this could be better written using
DISTINCT ON, since then you avoid the self-join:

SELECT DISTINCT ON (bar) * FROM
ORDER BY bar, id DESC;

However, this was slower because the original query could use a hash
aggregate whereas the new query needed to do a sort. The way DISTINCT
ON is defined it requires an ORDER BY whereas semantically the ordering
on the first attribute is just a by product of the old implementation.

Is there a way to acheive the above result without a sort and without a
self-join?

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Show quoted text

Please line up in a tree and maintain the heap invariant while
boarding. Thank you for flying nlogn airlines.

#2Bruce Momjian
bruce@momjian.us
In reply to: Martijn van Oosterhout (#1)
Re: DISTINCT ON without ORDER BY

Martijn van Oosterhout <kleptog@svana.org> writes:

SELECT * FROM foo
WHERE id in (SELECT max(id) FROM foo GROUP BY bar);

Is there a way to acheive the above result without a sort and without a
self-join?

Something like

SELECT bar, (magic_agg_func(foo)).* FROM foo GROUP BY bar

where you define an aggregate function magic_agg_func to remember the whole
record for the largest value of id. Something like:

postgres=# create function magic_transition(a,a) returns a as 'select case when $1.aid > $2.aid then $1 else $2 end' language sql;
postgres=# create aggregate magic (a) (sfunc = magic_transition, stype = a);

Not sure it'll be faster though.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's On-Demand Production Tuning

#3Jasen Betts
jasen@xnet.co.nz
In reply to: Martijn van Oosterhout (#1)
Re: DISTINCT ON without ORDER BY

On 2009-04-19, Martijn van Oosterhout <kleptog@svana.org> wrote:

Hi,

I was going through the queries of an SQL application and came across
queries like:

SELECT * FROM foo
WHERE id in (SELECT max(id) FROM foo GROUP BY bar);

I thought, here's a case where this could be better written using
DISTINCT ON, since then you avoid the self-join:

SELECT DISTINCT ON (bar) * FROM
ORDER BY bar, id DESC;

However, this was slower because the original query could use a hash
aggregate whereas the new query needed to do a sort. The way DISTINCT
ON is defined it requires an ORDER BY whereas semantically the ordering
on the first attribute is just a by product of the old implementation.

Is there a way to acheive the above result without a sort and without a
self-join?

anyway you could possibly write an agregate function that returns a
copy of the row with the highest id?

#4Martijn van Oosterhout
kleptog@svana.org
In reply to: Jasen Betts (#3)
Re: DISTINCT ON without ORDER BY

On Tue, Apr 21, 2009 at 12:11:26PM +0000, Jasen Betts wrote:

Is there a way to acheive the above result without a sort and without a
self-join?

anyway you could possibly write an agregate function that returns a
copy of the row with the highest id?

Put that way it sounds like something for a window function. Not sure
if they can use a HashAggrgate though.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Show quoted text

Please line up in a tree and maintain the heap invariant while
boarding. Thank you for flying nlogn airlines.