How to get rid of dups...

Started by Jeremy Cowgar over 23 years ago · 2 messages · general

Jeremy Cowgar

develop@cowgar.com

over 23 years ago

I need to get rid of all rows that have dups in the columns
tpa,pun,grn,claim ... i.e.

1--- 001 001 001 00-000001 John Doe
2--- 001 001 001 00-000001 Jane Doe
3--- 001 002 001 00-000001 John Doe

1 and 2 would be dups, 1 and 3 are diff records, 2 and 3 are diff
records.

I tried this as a test:

select count(claimid), tpa, pun, grn, claim FROM claim_import GROUP BY
tpa, pun, grn, claim HAVING count(claimid) > 1;
26 rows returned.

then

select distinct on (tpa,pun,grn,claim) count(claimid), tpa, pun, grn,
claim FROM claim_import GROUP BY tpa, pun, grn, claim HAVING
count(claimid) > 1;

and still had 26 rows returned. I'm not sure how that can happen.

Anyway, then I tried

CREATE UNIQUE INDEX tmpidx ON claim (tpa,pun,grn,claim);

that of course failed, stating dups existed.

Any help would be appreciated.

Thanks,

Jeremy

Kevin Brannen

kevinb@nurseamerica.net

over 23 years ago

In reply to: Jeremy Cowgar (#1)

Re: How to get rid of dups...

Jeremy Cowgar wrote:

> I need to get rid of all rows that have dups in the columns
> tpa,pun,grn,claim ... i.e.
>
> 1--- 001 001 001 00-000001 John Doe
> 2--- 001 001 001 00-000001 Jane Doe
> 3--- 001 002 001 00-000001 John Doe
>
> 1 and 2 would be dups, 1 and 3 are diff records, 2 and 3 are diff
> records.
>
> I tried this as a test:
>
> select count(claimid), tpa, pun, grn, claim FROM claim_import GROUP BY
> tpa, pun, grn, claim HAVING count(claimid) > 1;
> 26 rows returned.
>
> then
>
> select distinct on (tpa,pun,grn,claim) count(claimid), tpa, pun, grn,
> claim FROM claim_import GROUP BY tpa, pun, grn, claim HAVING
> count(claimid) > 1;

It's not obvious to me what your key is (all four columns: tpa, pun,
grn, claim?), but this is a place where self-joins are useful. Assuming
a table like:

create table stuff (
    id int,    -- primary table key
    value int, -- unique data key
    ...);

You should be able to find the dups with something like:

select b.id
from stuff a, stuff b
where a.value = b.value
and a.id < b.id;
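
As a rough sketch of how that self-join might look against the table from the question, here it is run on the three sample rows. It assumes claimid is a unique row id and that the four columns tpa, pun, grn, claim together form the key. The name column and the text column types are made up for illustration. It uses SQLite through Python's sqlite3 module only so it runs standalone; the SELECT itself is plain SQL.

```python
import sqlite3

# In-memory stand-in for claim_import; column types and "name" are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE claim_import (claimid INTEGER PRIMARY KEY,"
    " tpa TEXT, pun TEXT, grn TEXT, claim TEXT, name TEXT)"
)
conn.executemany(
    "INSERT INTO claim_import VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "001", "001", "001", "00-000001", "John Doe"),
        (2, "001", "001", "001", "00-000001", "Jane Doe"),
        (3, "001", "002", "001", "00-000001", "John Doe"),
    ],
)

# Self-join on all four key columns. The a.claimid < b.claimid condition
# reports only the later copies, so the lowest claimid in each group is kept.
# DISTINCT collapses repeats when a group has three or more rows.
FIND_DUPS = """
    SELECT DISTINCT b.claimid
    FROM claim_import a, claim_import b
    WHERE a.tpa = b.tpa
      AND a.pun = b.pun
      AND a.grn = b.grn
      AND a.claim = b.claim
      AND a.claimid < b.claimid
    ORDER BY b.claimid
"""
dup_ids = [row[0] for row in conn.execute(FIND_DUPS)]
print(dup_ids)  # [2]
```

Two things to keep in mind: rows with a NULL in any of the four columns never match under "=", so they will not be reported, and this keeps one row per group rather than removing every row that has a duplicate.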

Given that, then use it to get:

delete from stuff
where id in (select b.id from stuff a, stuff b where ...);

Be careful and experiment with the select until you're 110% sure you
like what you see. :-) Adapt this approach to your real table and you
should be set.
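
One possible way to do the "experiment first" part is to run the delete inside a transaction, re-run the GROUP BY / HAVING check from the question, and commit only if no duplicate groups remain. The sketch below does that on the three sample rows. As assumptions: claimid is a unique row id, the name column and column types are invented, and the unique index is created on claim_import (the original attempt named a table called claim). SQLite via Python's sqlite3 is used only to make it runnable on its own.

```python
import sqlite3

# Delete every row that has a matching row with a lower claimid.
DELETE_DUPS = """
    DELETE FROM claim_import
    WHERE claimid IN (
        SELECT b.claimid
        FROM claim_import a, claim_import b
        WHERE a.tpa = b.tpa AND a.pun = b.pun
          AND a.grn = b.grn AND a.claim = b.claim
          AND a.claimid < b.claimid
    )
"""
# Same check as in the question, wrapped to count the duplicate groups.
COUNT_DUP_GROUPS = """
    SELECT count(*) FROM (
        SELECT tpa, pun, grn, claim
        FROM claim_import
        GROUP BY tpa, pun, grn, claim
        HAVING count(claimid) > 1
    ) AS g
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE claim_import (claimid INTEGER PRIMARY KEY,"
    " tpa TEXT, pun TEXT, grn TEXT, claim TEXT, name TEXT)"
)
conn.executemany(
    "INSERT INTO claim_import VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "001", "001", "001", "00-000001", "John Doe"),
        (2, "001", "001", "001", "00-000001", "Jane Doe"),
        (3, "001", "002", "001", "00-000001", "John Doe"),
    ],
)
conn.commit()

groups_before = conn.execute(COUNT_DUP_GROUPS).fetchone()[0]
deleted = conn.execute(DELETE_DUPS).rowcount  # runs in an open transaction
groups_after = conn.execute(COUNT_DUP_GROUPS).fetchone()[0]
remaining = [r[0] for r in conn.execute(
    "SELECT claimid FROM claim_import ORDER BY claimid")]

if groups_after == 0:
    conn.commit()
    # With the duplicates gone, the unique index should now build.
    conn.execute(
        "CREATE UNIQUE INDEX tmpidx ON claim_import (tpa, pun, grn, claim)")
else:
    conn.rollback()  # something unexpected; leave the data untouched

print(groups_before, deleted, groups_after, remaining)  # 1 1 0 [1, 3]
```

Which copy survives (here the lowest claimid) is a choice, not something the question specifies; flip the comparison to keep the highest instead.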

HTH,
Kevin
