Finding (and deleting) dupes in relation table

Started by CSNover 20 years ago3 messagesgeneral
Jump to latest
#1CSN
cool_screen_name90001@yahoo.com

I have a table that relates id's of two other tables:

table1id, table2id

Dupes have found their way into it (create unique
index across both fields fails). Is there a quick and
easy way to find and delete the dupes (there are tens
of thousands of records)?

Thanks,
CSN

__________________________________
Yahoo! Mail - PC Magazine Editors' Choice 2005
http://mail.yahoo.com

#2CSN
cool_screen_name90001@yahoo.com
In reply to: CSN (#1)
Re: Finding (and deleting) dupes in relation table

Nevermind, figured it out:

select distinct on (table1id, table2id) * into temp
from table3;
delete from table3;
insert into table3 select * from temp;

--- CSN <cool_screen_name90001@yahoo.com> wrote:

I have a table that relates id's of two other
tables:

table1id, table2id

Dupes have found their way into it (create unique
index across both fields fails). Is there a quick
and
easy way to find and delete the dupes (there are
tens
of thousands of records)?

Thanks,
CSN

__________________________________
Yahoo! Mail - PC Magazine Editors' Choice 2005
http://mail.yahoo.com

__________________________________
Yahoo! Mail - PC Magazine Editors' Choice 2005
http://mail.yahoo.com

#3Markus Wollny
Markus.Wollny@computec.de
In reply to: CSN (#2)
Re: Finding (and deleting) dupes in relation table

Hello!

CSN [cool_screen_name90001@yahoo.com] wrote:

I have a table that relates id's of two other tables:

table1id, table2id

Dupes have found their way into it (create unique index
across both fields fails). Is there a quick and easy way to
find and delete the dupes (there are tens of thousands of records)?

Thanks,
CSN

If your table was created WITH OIDS you could identify the duplicates
thus:

select a.table1id
, a.table12d
, max(a.oid) as maxoid
, count(a.oid) as coid
from schema.mytable a,
schema.mytable b
where a.table1id = b.table1id
and a.table2id=b.table2id
and a.oid <> b.oid
group by a.table1id, a.table2id
order by a.table1id;

If you wish to delete surplus rows, you might do the following:

delete from schema.mytable where oid in (
select maxoid from (
select a.table1id, a.table12d, max(a.oid) as
maxoid, count(a.oid) as coid
from schema.mytable a,
schema.mytable b
where a.table1id = b.table1id
and a.table2id=b.table2id
and a.oid <> b.oid
group by a.table1id, a.table2id
order by a.table1id ) as foo
where coid >1 );

This will delete the oldest tuple of a duplicate set of rows; if there
are more than two tuples in a set, you'll want to execute this a couple
of times until there's no duplicate left, as the delete will only reduce
a set by one tuple at a time. I'd also recommend to apply a PRIMARY KEY
constraint afterwards instead of just a unique index - this will prevent
NULL-entries as well as creating the desired unique index - and I think
it's good practice to have a primary key on about every table there is,
except when it's just a junk data table like a logging table where
content is regularly evaluated and discarded.

Kind regards

Markus