problem with distinct not distincting...

Started by John Beynonover 13 years ago8 messagesgeneral
Jump to latest
#1John Beynon
john@kyan.com

I have a pretty basic query;

select distinct on (name) name, length(name) from
drugs
where customer_id IS NOT NULL
order by name;

which I'd expect to only return me a single drug name if there are
duplicates, yet I get

name | length
==========
Roaccutane | 10
Roaccutane | 10

table encoding is UTF8...

I'm scratching my head!

Thanks,

John.

#2David G. Johnston
david.g.johnston@gmail.com
In reply to: John Beynon (#1)
Re: problem with distinct not distincting...

-----Original Message-----
From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-
owner@postgresql.org] On Behalf Of John Beynon
Sent: Wednesday, October 17, 2012 6:48 AM
To: pgsql-general@postgresql.org
Subject: [GENERAL] problem with distinct not distincting...

I have a pretty basic query;

select distinct on (name) name, length(name) from drugs where
customer_id IS NOT NULL order by name;

which I'd expect to only return me a single drug name if there are

duplicates,

yet I get

name | length
==========
Roaccutane | 10
Roaccutane | 10

table encoding is UTF8...

I'm scratching my head!

Thanks,

John.

So, the following returns one record as expected on 9.0.3:

SELECT DISTINCT ON (f) f, length(l)
FROM (VALUES ('David','Johnston'),('David','Smith')) x (f, l)
ORDER BY f

Try:

SELECT name, count(*) FROM drugs where customer_id IS NOT NULL GROUP BY name

To see whether the GROUP BY logic considers the names identical.

Using "name" as a column name also sometimes has issues so maybe try giving
it an alias:

SELECT ... FROM (SELECT name AS customer_name FROM drugs WHERE ....) AS
drug_aliased ...

DISTINCT ON has uses but I try to avoid it myself. In this specific case
the "ON" is redundant since a simple DISTINCT will give you the same
results.

You also need to provide the PostgreSQL version and possibly server
platform.

David J.

#3Susan Cassidy
scassidy@edgewave.com
In reply to: John Beynon (#1)
Re: problem with distinct not distincting...

Are you sure that one of those entries doesn't have a trailing space?

Susan

-----Original Message-----
From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of John Beynon
Sent: Wednesday, October 17, 2012 3:48 AM
To: pgsql-general@postgresql.org
Subject: [GENERAL] problem with distinct not distincting...

I have a pretty basic query;

select distinct on (name) name, length(name) from drugs where customer_id IS NOT NULL order by name;

which I'd expect to only return me a single drug name if there are duplicates, yet I get

name | length
==========
Roaccutane | 10
Roaccutane | 10

table encoding is UTF8...

I'm scratching my head!

Thanks,

John.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#4John Beynon
john@kyan.com
In reply to: David G. Johnston (#2)
Re: problem with distinct not distincting...

I just managed to solve the problem infact.

The trailing 'e' character on the name was different for one row. All
my tools, (pgadmin and the source data in openoffice) showed the same
'e' character but psql showed it as different character...

Thanks for all taking the time to read / answer. It stumped me for a while!

John.

On Wed, Oct 17, 2012 at 4:01 PM, David Johnston <polobo@yahoo.com> wrote:

-----Original Message-----
From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-
owner@postgresql.org] On Behalf Of John Beynon
Sent: Wednesday, October 17, 2012 6:48 AM
To: pgsql-general@postgresql.org
Subject: [GENERAL] problem with distinct not distincting...

I have a pretty basic query;

select distinct on (name) name, length(name) from drugs where
customer_id IS NOT NULL order by name;

which I'd expect to only return me a single drug name if there are

duplicates,

yet I get

name | length
==========
Roaccutane | 10
Roaccutane | 10

table encoding is UTF8...

I'm scratching my head!

Thanks,

John.

So, the following returns one record as expected on 9.0.3:

SELECT DISTINCT ON (f) f, length(l)
FROM (VALUES ('David','Johnston'),('David','Smith')) x (f, l)
ORDER BY f

Try:

SELECT name, count(*) FROM drugs where customer_id IS NOT NULL GROUP BY name

To see whether the GROUP BY logic considers the names identical.

Using "name" as a column name also sometimes has issues so maybe try giving
it an alias:

SELECT ... FROM (SELECT name AS customer_name FROM drugs WHERE ....) AS
drug_aliased ...

DISTINCT ON has uses but I try to avoid it myself. In this specific case
the "ON" is redundant since a simple DISTINCT will give you the same
results.

You also need to provide the PostgreSQL version and possibly server
platform.

David J.

--
John Beynon
Kyanmedia Ltd.
Direct line: 01483 405210
http://kyan.com

Registered in England. Company number: 4575679
Registered Office: 171 High Street, Guildford GU1 3AJ

#5Susan Cassidy
scassidy@edgewave.com
In reply to: Susan Cassidy (#3)
Re: problem with distinct not distincting...

I see you have the length included.

Perhaps, UTF8 characters in one and ascii in the other?

Susan

-----Original Message-----
From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Susan Cassidy
Sent: Wednesday, October 17, 2012 8:27 AM
To: John Beynon; pgsql-general@postgresql.org
Subject: Re: [GENERAL] problem with distinct not distincting...

Are you sure that one of those entries doesn't have a trailing space?

Susan

-----Original Message-----
From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of John Beynon
Sent: Wednesday, October 17, 2012 3:48 AM
To: pgsql-general@postgresql.org
Subject: [GENERAL] problem with distinct not distincting...

I have a pretty basic query;

select distinct on (name) name, length(name) from drugs where customer_id IS NOT NULL order by name;

which I'd expect to only return me a single drug name if there are duplicates, yet I get

name | length
==========
Roaccutane | 10
Roaccutane | 10

table encoding is UTF8...

I'm scratching my head!

Thanks,

John.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: John Beynon (#1)
Re: problem with distinct not distincting...

John Beynon <john@kyan.com> writes:

I have a pretty basic query;
select distinct on (name) name, length(name) from
drugs
where customer_id IS NOT NULL
order by name;

which I'd expect to only return me a single drug name if there are
duplicates, yet I get

name | length
==========
Roaccutane | 10
Roaccutane | 10

table encoding is UTF8...

I'm scratching my head!

Yeah, me too. What PG version is this exactly? What does EXPLAIN
show for the query? Can you extract a self-contained test case?

regards, tom lane

#7Merlin Moncure
mmoncure@gmail.com
In reply to: John Beynon (#1)
Re: problem with distinct not distincting...

On Wed, Oct 17, 2012 at 5:48 AM, John Beynon <john@kyan.com> wrote:

I have a pretty basic query;

select distinct on (name) name, length(name) from
drugs
where customer_id IS NOT NULL
order by name;

which I'd expect to only return me a single drug name if there are
duplicates, yet I get

name | length
==========
Roaccutane | 10
Roaccutane | 10

table encoding is UTF8...

I'm scratching my head!

as a sanity check, try:

select distinct on (name) name, length(name), md5(name) from
drugs
where customer_id IS NOT NULL
order by name;

merlin

#8Chris Angelico
rosuav@gmail.com
In reply to: John Beynon (#4)
Re: problem with distinct not distincting...

On Thu, Oct 18, 2012 at 2:32 AM, John Beynon <john@kyan.com> wrote:

I just managed to solve the problem infact.

The trailing 'e' character on the name was different for one row. All
my tools, (pgadmin and the source data in openoffice) showed the same
'e' character but psql showed it as different character...

Thanks for all taking the time to read / answer. It stumped me for a while!

Cool! That's where the sanity check Merlin suggested comes in really
handy. You can manually key in what you think it's showing, and
compare with the two rows:

sikorsky@sikorsky:~$ psql -c "select md5('Roaccutane')"
md5
----------------------------------
70f818454267c719ed612f50fe563f64
(1 row)

Anything that shows a different hash is clearly different... somehow.

ChrisA