DISTINCT ON without ORDER BY
Hi,
I was going through the queries of an SQL application and came across
queries like:
SELECT * FROM foo
WHERE id in (SELECT max(id) FROM foo GROUP BY bar);
I thought, here's a case where this could be better written using
DISTINCT ON, since then you avoid the self-join:
SELECT DISTINCT ON (bar) * FROM
ORDER BY bar, id DESC;
However, this was slower because the original query could use a hash
aggregate whereas the new query needed to do a sort. The way DISTINCT
ON is defined it requires an ORDER BY whereas semantically the ordering
on the first attribute is just a by product of the old implementation.
Is there a way to acheive the above result without a sort and without a
self-join?
Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
Show quoted text
Please line up in a tree and maintain the heap invariant while
boarding. Thank you for flying nlogn airlines.
Martijn van Oosterhout <kleptog@svana.org> writes:
SELECT * FROM foo
WHERE id in (SELECT max(id) FROM foo GROUP BY bar);Is there a way to acheive the above result without a sort and without a
self-join?
Something like
SELECT bar, (magic_agg_func(foo)).* FROM foo GROUP BY bar
where you define an aggregate function magic_agg_func to remember the whole
record for the largest value of id. Something like:
postgres=# create function magic_transition(a,a) returns a as 'select case when $1.aid > $2.aid then $1 else $2 end' language sql;
postgres=# create aggregate magic (a) (sfunc = magic_transition, stype = a);
Not sure it'll be faster though.
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's On-Demand Production Tuning
On 2009-04-19, Martijn van Oosterhout <kleptog@svana.org> wrote:
Hi,
I was going through the queries of an SQL application and came across
queries like:SELECT * FROM foo
WHERE id in (SELECT max(id) FROM foo GROUP BY bar);I thought, here's a case where this could be better written using
DISTINCT ON, since then you avoid the self-join:SELECT DISTINCT ON (bar) * FROM
ORDER BY bar, id DESC;However, this was slower because the original query could use a hash
aggregate whereas the new query needed to do a sort. The way DISTINCT
ON is defined it requires an ORDER BY whereas semantically the ordering
on the first attribute is just a by product of the old implementation.Is there a way to acheive the above result without a sort and without a
self-join?
anyway you could possibly write an agregate function that returns a
copy of the row with the highest id?
On Tue, Apr 21, 2009 at 12:11:26PM +0000, Jasen Betts wrote:
Is there a way to acheive the above result without a sort and without a
self-join?anyway you could possibly write an agregate function that returns a
copy of the row with the highest id?
Put that way it sounds like something for a window function. Not sure
if they can use a HashAggrgate though.
Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
Show quoted text
Please line up in a tree and maintain the heap invariant while
boarding. Thank you for flying nlogn airlines.