Efficiently obtaining (any) one record per group.

Started by Allan Kamauover 15 years ago3 messagesgeneral
Jump to latest
#1Allan Kamau
kamauallan@gmail.com

Hi all,

I have a large table that contains redundancies as per one field.

I am looking for a way to identify (or extract) a non redundant set of
rows ( _any_ one record per group) from this table and for each record
of this "distinct" set of rows, I would like to capture it's other
fields.
Below is a simplified example. In this example I would like to base
groups on the value of field "reading".

CREATE TABLE foo
(id INTEGER
,reading INTEGER
,entry_date TIMESTAMP
,source TEXT
,primary key(id)
);

INSERT INTO foo(1,55,'2010-04-01 06:31:13','A');
INSERT INTO foo(2,55,'2010-04-01 06:31:20','X');
INSERT INTO foo(3,45,'2010-04-01 06:38:02','P');
INSERT INTO foo(6,55,'2010-04-01 06:21:44','B');
INSERT INTO foo(4,34,'2010-04-01 06:51:24','K');
INSERT INTO foo(8,61,'2010-04-01 06:22:03','A');
INSERT INTO foo(9,34,'2010-04-01 06:48:07','C');

Desired output (any record selected based on "reading" field).

id,reading,entry_date,source
1,55,'2010-04-01 06:31:13','A'
3,45,'2010-04-01 06:38:02','P'
9,34,'2010-04-01 06:48:07','C'
8,61,'2010-04-01 06:22:03','A'

I am worried that using min() and group by to generate a relation
which is then used in a join may be slow.

SELECT
a.*
FROM
foo a
JOIN
(
SELECT
min(a.id)AS id_min
FROM
foo a
GROUP BY
a.reading
)b
ON
a.id=b.id_min
;

How is the performance of rank() (window function) in general?

Allan.

#2Scott Marlowe
scott.marlowe@gmail.com
In reply to: Allan Kamau (#1)
Re: Efficiently obtaining (any) one record per group.

On Sat, Jul 24, 2010 at 12:56 AM, Allan Kamau <kamauallan@gmail.com> wrote:

Hi all,

I have a large table that contains redundancies as per one field.

I am looking for a way to identify (or extract) a non redundant set of
rows ( _any_ one record per group) from this table and for each record
of this "distinct" set of rows, I would like to capture it's other
fields.

Would "Select distinct on ... " work for you?

http://www.postgresql.org/docs/8.4/static/sql-select.html

search the page for "distinct on".

#3Allan Kamau
kamauallan@gmail.com
In reply to: Scott Marlowe (#2)
Re: Efficiently obtaining (any) one record per group.

On Sat, Jul 24, 2010 at 10:38 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote:

On Sat, Jul 24, 2010 at 12:56 AM, Allan Kamau <kamauallan@gmail.com> wrote:

Hi all,

I have a large table that contains redundancies as per one field.

I am looking for a way to identify (or extract) a non redundant set of
rows ( _any_ one record per group) from this table and for each record
of this "distinct" set of rows, I would like to capture it's other
fields.

Would "Select distinct on ... " work for you?

http://www.postgresql.org/docs/8.4/static/sql-select.html

search the page for "distinct on".

Thank you Scott, you have made my day, this is exactly what I was looking for.

Allan.