Select duplicated values

Started by jeffreyover 14 years ago5 messagesgeneral
Jump to latest
#1jeffrey
johjeffrey@hotmail.com

Lets say that the primary key column is A. I am trying to select all
the rows with duplicated values in columns B, C, and D.

I am not too experienced in SQL syntax, and I've used the following:
select A from table_name where B+C+D in (select B+C+D from table_name
group by B+C+D having count(*)>1 )

I'm looking for a better way, since I am just adding the three columns
together right now.

Jeffrey

#2Edson Carlos Ericksson Richter
richter@simkorp.com.br
In reply to: jeffrey (#1)
Re: Select duplicated values

I think you should not "add columns", but concatenate them.

Instead

select A from table_name where B+C+D in (select B+C+D from table_name
group by B+C+D having count(*)>1 )

use "B || '/' || C || '/' || D"

select A from table_name where B || '/' || C || '/' || D in (select B || '/' || C || '/' || D from table_name
group by 1 having count(*)>1 )

You may need to apply some conversion (cast) depending of data type and
output format.

Regards,

*Edson Carlos Ericksson Richter*
/SimKorp Informática Ltda/
Fone: (51) 3366-7964
Celular: (51)9318-9766/(51) 8585-0796
Embedded Image

Em 21-11-2011 20:23, jeffrey escreveu:

Show quoted text

Lets say that the primary key column is A. I am trying to select all
the rows with duplicated values in columns B, C, and D.

I am not too experienced in SQL syntax, and I've used the following:
select A from table_name where B+C+D in (select B+C+D from table_name
group by B+C+D having count(*)>1 )

I'm looking for a better way, since I am just adding the three columns
together right now.

Jeffrey

Attachments:

ffjhaadd.pngimage/png; name=ffjhaadd.pngDownload
#3Pete Yunker
peter.yunker@homejunction.com
In reply to: jeffrey (#1)
Re: Select duplicated values

Another option is to perform a self-join on columns B, C, and D (filtering out the 'same' record where a=a) instead of using the sub-select. This may yield better performance depending on the size of the table. Also, I don't believe the concatenation / sub-select will work if all of B, C, and D are all NULL.

SELECT x.a
FROM my_table x
INNER JOIN my_table y
ON ((x.b = y.b) or (x.b IS NULL AND y.b IS NULL))
AND ((x.c = y.c) or (x.c IS NULL AND y.c IS NULL))
AND ((x.d = y.d) or (x.d IS NULL AND y.d IS NULL))
AND x.a <> y.a

Another alternative to handling the NULL values is to COALESCE them to a value that would never exist in columns B, C, or D. I don't know the datatypes you are using, so I'll just use 'junk' for now.

SELECT x.a
FROM my_table x
INNER JOIN my_table y
ON COALESCE(x.b,'junk') = COALESCE(y.b,'junk')
AND COALESCE(x.c,'junk') = COALESCE(y.c,'junk')
AND COALESCE(x.d,'junk') = COALESCE(y.d,'junk')
AND x.a <> y.a

---
Pete Yunker
Vice President of Data Products
Home Junction, Inc.

On Nov 21, 2011, at 5:23 PM, jeffrey wrote:

Show quoted text

Lets say that the primary key column is A. I am trying to select all
the rows with duplicated values in columns B, C, and D.

I am not too experienced in SQL syntax, and I've used the following:
select A from table_name where B+C+D in (select B+C+D from table_name
group by B+C+D having count(*)>1 )

I'm looking for a better way, since I am just adding the three columns
together right now.

Jeffrey

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#4David G. Johnston
david.g.johnston@gmail.com
In reply to: jeffrey (#1)
Re: Select duplicated values

On Nov 21, 2011, at 17:23, jeffrey <johjeffrey@hotmail.com> wrote:

Lets say that the primary key column is A. I am trying to select all
the rows with duplicated values in columns B, C, and D.

I am not too experienced in SQL syntax, and I've used the following:
select A from table_name where B+C+D in (select B+C+D from table_name
group by B+C+D having count(*)>1 )

I'm looking for a better way, since I am just adding the three columns
together right now.

Jeffrey

Do you really want these to evaluate to equal (B, C, D)?

(1, 0, 1) = (0, 1, 1); they both sum to 2 but both B and C are unequal

There is no need to do anything other than list each field individually in both the SELECT and the GROUP BY.

SELECT B, C, D
FROM table
GROUP BY B, C, D
HAVING count(*) > 1;

Whatever possessed you to consider that you had to add them to get what you need? (assuming you don't want my examples to be considered equal) Besides the fact it will give you WRONG RESULTS any decent set of GROUP BY examples will show you that you can list/use multiple fields in a grouping query.

Now, if you truly want the two samples above to evaluate to equal then you do need to do some form of consolidation (like the adding in your example). That, however, would be very unusual.

David J.

#5David G. Johnston
david.g.johnston@gmail.com
In reply to: David G. Johnston (#4)
Re: Select duplicated values

On Nov 21, 2011, at 21:11, David Johnston <polobo@yahoo.com> wrote:

On Nov 21, 2011, at 17:23, jeffrey <johjeffrey@hotmail.com> wrote:

Lets say that the primary key column is A. I am trying to select all
the rows with duplicated values in columns B, C, and D.

I am not too experienced in SQL syntax, and I've used the following:
select A from table_name where B+C+D in (select B+C+D from table_name
group by B+C+D having count(*)>1 )

I'm looking for a better way, since I am just adding the three columns
together right now.

Jeffrey

Do you really want these to evaluate to equal (B, C, D)?

(1, 0, 1) = (0, 1, 1); they both sum to 2 but both B and C are unequal

There is no need to do anything other than list each field individually in both the SELECT and the GROUP BY.

SELECT B, C, D
FROM table
GROUP BY B, C, D
HAVING count(*) > 1;

Whatever possessed you to consider that you had to add them to get what you need? (assuming you don't want my examples to be considered equal) Besides the fact it will give you WRONG RESULTS any decent set of GROUP BY examples will show you that you can list/use multiple fields in a grouping query.

Now, if you truly want the two samples above to evaluate to equal then you do need to do some form of consolidation (like the adding in your example). That, however, would be very unusual.

David J.

So, in a sub-select you would do:

... WHERE ROW(B, C, D) IN (SELECT B, C, D FROM ... GROUP BY B, C, D)

Note that the word ROW is optional ( but not the parentheses )

David J.