question (or feature-request): over ( partition by ... order by LIMIT N)

Started by A. Kretschmeralmost 16 years ago4 messages
#1A. Kretschmer
andreas.kretschmer@schollglas.com

Hello @all,

I know, i can do:

select * from (select ... row_number() over (...) ...) foo where row_number < N

to limit the rows per group, but the inner select has to retrieve the
whole set of records and in the outer select most of them discarded.

Why isn't there an over ( ... LIMIT N) ?

Other (better) solution?

Regards, Andreas
--
Andreas Kretschmer
Kontakt: Heynitz: 035242/47150, D1: 0160/7141639 (mehr: -> Header)
GnuPG: 0x31720C99, 1006 CCB4 A326 1D42 6431 2EB0 389D 1DC2 3172 0C99

#2David Fetter
david@fetter.org
In reply to: A. Kretschmer (#1)
Re: [GENERAL] question (or feature-request): over ( partition by ... order by LIMIT N)

On Wed, Mar 24, 2010 at 06:31:59PM +0100, A. Kretschmer wrote:

Hello @all,

I know, i can do:

select * from (select ... row_number() over (...) ...) foo where
row_number < N

to limit the rows per group, but the inner select has to retrieve
the whole set of records and in the outer select most of them
discarded.

That sounds like the optimizer's falling down on the job. Would this
be difficult to fix?

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

#3Robert Haas
robertmhaas@gmail.com
In reply to: David Fetter (#2)
Re: Re: [GENERAL] question (or feature-request): over ( partition by ... order by LIMIT N)

On Thu, Mar 25, 2010 at 5:17 PM, David Fetter <david@fetter.org> wrote:

On Wed, Mar 24, 2010 at 06:31:59PM +0100, A. Kretschmer wrote:

Hello @all,

I know, i can do:

select * from (select ... row_number() over (...) ...) foo where
row_number < N

to limit the rows per group, but the inner select has to retrieve
the whole set of records and in the outer select most of them
discarded.

That sounds like the optimizer's falling down on the job.  Would this
be difficult to fix?

I may not be the best person to offer an opinion on this topic, but it
sounds tricky to me. I think it would need some kind of extremely
specific special-case logic. The planner would have to recognize
row_number() < n, row_number() <= n, and row_number = n as special
cases indicating that n-1, n, and n records respectively should be
expected to be fetched from the partition. And you might also worry
about n > row_number(), and n >= row_number().

It might be worth doing because I suspect that is actually going to be
a fairly common type of query, but some thought needs to be given to
how to do it without resorting to abject kludgery.

...Robert

#4Hitoshi Harada
umi.tanuki@gmail.com
In reply to: David Fetter (#2)
Re: Re: [GENERAL] question (or feature-request): over ( partition by ... order by LIMIT N)

2010/3/26 David Fetter <david@fetter.org>:

On Wed, Mar 24, 2010 at 06:31:59PM +0100, A. Kretschmer wrote:

Hello @all,

I know, i can do:

select * from (select ... row_number() over (...) ...) foo where
row_number < N

to limit the rows per group, but the inner select has to retrieve
the whole set of records and in the outer select most of them
discarded.

That sounds like the optimizer's falling down on the job.  Would this
be difficult to fix?

I believe this isn't the task of window functions. In fact, "over( ...
LIMIT n)" or optimizer hack will fail on multiple window definitions.

To take top N items of each group (I agree this is quite common job),
I'd suggest syntax that is done by extending DISTINCT ON.

SELECT DISTINCT n ON(key1, key2) ...

where "n" means top "n" items on each "key1, key2" group. The current
DISTINCT ON() syntax is equivalent to DISTINCT 1 ON() in this way.
That'll be fairly easy to implement and you aren't be bothered by this
like multiple window definitions. The cons of this is that it can be
applied to only row_number logic. You may want to use rank,
dense_rank, etc. sometimes.

Regards,

--
Hitoshi Harada