Finding (and deleting) dupes in relation table
I have a table that relates id's of two other tables:
table1id, table2id
Dupes have found their way into it (create unique
index across both fields fails). Is there a quick and
easy way to find and delete the dupes (there are tens
of thousands of records)?
Thanks,
CSN
__________________________________
Yahoo! Mail - PC Magazine Editors' Choice 2005
http://mail.yahoo.com
Nevermind, figured it out:
select distinct on (table1id, table2id) * into temp
from table3;
delete from table3;
insert into table3 select * from temp;
--- CSN <cool_screen_name90001@yahoo.com> wrote:
I have a table that relates id's of two other
tables:table1id, table2id
Dupes have found their way into it (create unique
index across both fields fails). Is there a quick
and
easy way to find and delete the dupes (there are
tens
of thousands of records)?Thanks,
CSN__________________________________
Yahoo! Mail - PC Magazine Editors' Choice 2005
http://mail.yahoo.com
__________________________________
Yahoo! Mail - PC Magazine Editors' Choice 2005
http://mail.yahoo.com
Import Notes
Resolved by subject fallback
Hello!
CSN [cool_screen_name90001@yahoo.com] wrote:
I have a table that relates id's of two other tables:
table1id, table2id
Dupes have found their way into it (create unique index
across both fields fails). Is there a quick and easy way to
find and delete the dupes (there are tens of thousands of records)?Thanks,
CSN
If your table was created WITH OIDS you could identify the duplicates
thus:
select a.table1id
, a.table12d
, max(a.oid) as maxoid
, count(a.oid) as coid
from schema.mytable a,
schema.mytable b
where a.table1id = b.table1id
and a.table2id=b.table2id
and a.oid <> b.oid
group by a.table1id, a.table2id
order by a.table1id;
If you wish to delete surplus rows, you might do the following:
delete from schema.mytable where oid in (
select maxoid from (
select a.table1id, a.table12d, max(a.oid) as
maxoid, count(a.oid) as coid
from schema.mytable a,
schema.mytable b
where a.table1id = b.table1id
and a.table2id=b.table2id
and a.oid <> b.oid
group by a.table1id, a.table2id
order by a.table1id ) as foo
where coid >1 );
This will delete the oldest tuple of a duplicate set of rows; if there
are more than two tuples in a set, you'll want to execute this a couple
of times until there's no duplicate left, as the delete will only reduce
a set by one tuple at a time. I'd also recommend to apply a PRIMARY KEY
constraint afterwards instead of just a unique index - this will prevent
NULL-entries as well as creating the desired unique index - and I think
it's good practice to have a primary key on about every table there is,
except when it's just a junk data table like a logging table where
content is regularly evaluated and discarded.
Kind regards
Markus
Import Notes
Resolved by subject fallback