How to idenity duplicate rows
Can anybody assist with this problem I have a table that has some duplicated
rows of data, I want to place a unique constraint on the columns userid
and procno to eliminate this problem in the future but how do I identify and
get rid of the existing duplication.
Thanks in advance
Regards
David Inglis
David Inglis wrote:
Can anybody assist with this problem I have a table that has some
duplicated rows of data, I want to place a unique constraint on the
columns userid and procno to eliminate this problem in the future but
how do I identify and get rid of the existing duplication.
To find them, something like
SELECT a, b, c FROM table GROUP BY a, b, c HAVING count(*) > 1;
comes to mind, where you have to list all columns of the table in place
of a, b, c.
As for deleting all but one row in a duplicated group, you're going to
have to get at them by the oid or ctid columns perhaps.
The other idea is to run CREATE TABLE newtable AS SELECT DISTINCT * FROM
oldtable;.
--
Peter Eisentraut
http://developer.postgresql.org/~petere/
Peter Eisentraut wrote:
David Inglis wrote:
Can anybody assist with this problem I have a table that has some
duplicated rows of data, I want to place a unique constraint on the
columns userid and procno to eliminate this problem in the future but
how do I identify and get rid of the existing duplication.To find them, something like
SELECT a, b, c FROM table GROUP BY a, b, c HAVING count(*) > 1;
comes to mind, where you have to list all columns of the table in place
of a, b, c.As for deleting all but one row in a duplicated group, you're going to
have to get at them by the oid or ctid columns perhaps.The other idea is to run CREATE TABLE newtable AS SELECT DISTINCT * FROM
oldtable;.
This doesn't bring over to the new table any foreign key relationships
or triggers.
Another approach (if you don't have OID's) is to create uniqueness by
appending a column to the table, populating it with sequential integers.
Then you proceed as otherwise suggested above by using aggregation to
identify the duplicated rows.
As for deleting all but one row in a duplicated group, you're going to
have to get at them by the oid or ctid columns perhaps.The other idea is to run CREATE TABLE newtable AS SELECT DISTINCT * FROM
oldtable;.
I believe getting oid and/or ctid is not possible since it would not
display/get duplicate records
in a "HAVING CLAUSE" since their oid/ctid wouldn't be the same.
And creating a newtable and use SELECT DISTINCT isn't an acceptable idea to
the audit team when you have millions of records in a table in production
server when you only need to remove let say 70 records from that table.
I tried another approach where I queried and insert the duplicate record (35
records) into
a new/temporary table. Then I created a stored procedure something like
this:
--START
DECLARE
foo table;
BEGIN
FOR foo IN
SELECT * FROM newtable
LOOP
DELETE FROM oldtable
where oldtable.field1 = foo.field1
and oldtable.field2 = foo.field2
and oldtable.field3 = foo.field3
and oldtable.field4 = foo.field4
...';
END LOOP;
END;
-- END
Problem with this approach I got 35 duplicate records (count = 2)
from the new table and delete only 20 records from the oldtable.
If I am not mistaken it should delete 70 records.
I wonder