Keeping top N records of a group

Started by Alex Magnumalmost 10 years ago2 messagesgeneral
Jump to latest
#1Alex Magnum
magnum11200@gmail.com

Hi,
i want to archive data and am looking for a query to replace a rather slow
function i am currently using.

The idea is that for every sym and doc_key I want to keep the records of
the top 2 sources. Eg. in below table I want to archive
sym 1022 of source 3000 but keep the 4 records from sources 4045 and 4081.

Any help on this would be appreciated.

source | sym | doc_id | doc_key
-----------+--------------+---------+--------------
3000 | 1022 | 551008 | 23834363 <-- Archive
3000 | 1022 | 551008 | 23834363 <-- Archive
4045 | 1022 | 3699290 | 23834363
4045 | 1022 | 3699290 | 23834363
4081 | 1022 | 3811196 | 23834363
4081 | 1022 | 3811196 | 23834363
3000 | 1028 | 550997 | 23834363
3468 | 1085 | 1740526 | 23834363
3000 | 1149 | 551017 | 23834363
4045 | 1149 | 3699291 | 23834363
3000 | 12 | 551015 | 23834363
3000 | 12 | 551015 | 23834363
3951 | 12 | 3147700 | 23834363
3951 | 12 | 3147700 | 23834363
3000 | 13 | 551000 | 23834363
4045 | 13 | 3699283 | 23834363
3000 | 1327 | 551010 | 23834363
3971 | 1327 | 3394469 | 23834363
3000 | 15 | 551001 | 23834363
4045 | 15 | 3699284 | 23834363
3000 | 16 | 551002 | 23834363
4045 | 16 | 3699285 | 23834363
3000 | 18 | 551013 | 23834363
4045 | 18 | 3699286 | 23834363
3000 | 257 | 551005 | 23834363
3951 | 257 | 3147701 | 23834363
3000 | 2795 | 551011 | 23834363 <-- Archive
3459 | 2795 | 1710571 | 23834363 <-- Archive
3905 | 2795 | 2994791 | 23834363
4045 | 2795 | 3699292 | 23834363
3060 | 2913 | 856199 | 23834363
3000 | 2954 | 551012 | 23834363 <-- Archive
3971 | 2954 | 3394470 | 23834363
4212 | 2954 | 4650870 | 23834363
3183 | 3427 | 1055492 | 23834363 <-- Archive
3971 | 3427 | 3394471 | 23834363
4248 | 3427 | 4763105 | 23834363
3188 | 594 | 1062642 | 23834363 <-- Archive
3188 | 594 | 1062642 | 23834363 <-- Archive
3191 | 594 | 1067501 | 23834363 <-- Archive
3191 | 594 | 1067501 | 23834363 <-- Archive
3192 | 594 | 1068391 | 23834363 <-- Archive
3192 | 594 | 1068391 | 23834363 <-- Archive
3199 | 594 | 1096070 | 23834363 <-- Archive
3199 | 594 | 1096070 | 23834363 <-- Archive
3303 | 594 | 1305467 | 23834363
3303 | 594 | 1305467 | 23834363
4117 | 594 | 4000987 | 23834363

Thanks a lot for any ideas.
Alex

#2Andreas Kretschmer
akretschmer@spamfence.net
In reply to: Alex Magnum (#1)
Re: Keeping top N records of a group

Alex Magnum <magnum11200@gmail.com> wrote:

Hi,
i want to archive data and am looking for a query to replace a rather slow
function i am currently using.

The idea is that for every sym and doc_key I want to keep the records of the
top 2 sources. Eg. in below table I want to archive�

sounds like a case for window-functions here (maybe row_number() or
rank()), please read that for the start:

http://stackoverflow.com/questions/7613785/postgresql-top-n-entries-per-item-in-same-table

Regards, Andreas Kretschmer
--
Andreas Kretschmer
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general