distinct not working in a multiple join

Started by David Rio Deirosabout 20 years ago5 messagesgeneral
Jump to latest
#1David Rio Deiros
driodeiros@gmail.com

Hi there,

I have some issues with the query attached at the end of this email.
If I run that query I got this output ( I have removed some of the
fields) despite the distinct clause:

QC Q&A | www.xxx.com | 44281
QC Q&A | www.xxx.com | 44281
WhyMAX? | | 44285

But, and here is the weird thing, if I remove g.group_id from the
selection list then I got what I expected:

QC Q&A | www.xxx.com | 44281
WhyMAX? | | 44285
toto | rufus | 44286

Can someone explain me what is going on here? What am I missing?

Thanks for you help in advance,

David

SELECT
distinct ar.title,
ar.raw_data,
ar.upload_dt,
ar.artifact_id,
g.group_id,
acl.read
FROM
artifact_acl acl,
artifacts a,
artifact_revisions ar,
revisions_to_types rt,
artifact_types at,
groups g
WHERE
a.expire_dt > NOW() and
acl.artifact_id = a.artifact_id and
a.published_revision = ar.revision_id and
ar.revision_id = rt.revision_id and
rt.type_id = at.type_id and
acl.group_id = g.group_id and
a.suppress = false and
at.is_resource = true and
(
acl.group_id = (
select
group_id
from
groups
where
group_nm = 'ting'
)
OR
acl.group_id = (
select
group_id
from
groups
where
group_nm = 'tors'
)
) and
acl.group_id IN (4,17,54,2,1,123) and
acl.read = true and
((g.back_or_front = 'front') or
(g.group_nm = 'PR Admin'))
GROUP BY
ar.title
ORDER BY
ar.upload_dt DESC
LIMIT 3;

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rio Deiros (#1)
Re: distinct not working in a multiple join

David Rio Deiros <driodeiros@gmail.com> writes:

I have some issues with the query attached at the end of this email.
If I run that query I got this output ( I have removed some of the
fields) despite the distinct clause:

QC Q&A | www.xxx.com | 44281
QC Q&A | www.xxx.com | 44281
WhyMAX? | | 44285

Since you removed some fields, no one can tell if this output
is wrong or not.

SELECT
distinct ar.title,
ar.raw_data,
ar.upload_dt,
ar.artifact_id,
g.group_id,
acl.read
FROM

The way you formatted that makes me wonder if you think that the
DISTINCT applies only to the first column. It does not, it applies
to all the columns together --- that is, it only removes rows that
are identical in all columns to some other row. So if there were
some rows that were identical except for group_id, you'd get the
behavior you described.

BTW, I'm not sure I believe this is actually the same query you
ran. The presence of the "GROUP BY ar.title" clause should have
provoked errors about ungrouped columns. If this is an exact
copy of what you did, what Postgres version is this?

regards, tom lane

#3David Rio Deiros
driodeiros@gmail.com
In reply to: Tom Lane (#2)
Re: distinct not working in a multiple join

On Thu, Feb 09, 2006 at 07:20:19PM -0500, Tom Lane wrote:

David Rio Deiros <driodeiros@gmail.com> writes:

I have some issues with the query attached at the end of this email.
If I run that query I got this output ( I have removed some of the
fields) despite the distinct clause:

QC Q&A | www.xxx.com | 44281
QC Q&A | www.xxx.com | 44281
WhyMAX? | | 44285

Since you removed some fields, no one can tell if this output
is wrong or not.

Tom,

Thanks for the answer and apologizes for the lack of information.
I think you found the problem already (see bellow). In anycase, just
to clarify:

This is the information about the machine/OS/Psql version:

1. Psql 8.0.4 - Linux 2.6.13

Now, This is the first query I tried:

SELECT
distinct ar.title,
ar.raw_data,
ar.upload_dt,
ar.artifact_id,
g.group_id,
acl.read
FROM
artifact_acl acl,
artifacts a,
artifact_revisions ar,
revisions_to_types rt,
artifact_types at,
groups g
WHERE
a.expire_dt > NOW() and
acl.artifact_id = a.artifact_id and
a.published_revision = ar.revision_id and
ar.revision_id = rt.revision_id and
rt.type_id = at.type_id and
acl.group_id = g.group_id and
a.suppress = false and
at.is_resource = true and
(
acl.group_id = (
select
group_id
from
groups
where
group_nm = 'Marketing'
)
OR
acl.group_id = (
select
group_id
from
groups
where
group_nm = 'Communicators'
)
) and
acl.group_id IN (4,17,54,2,1,123) and
acl.read = true and
((g.back_or_front = 'front') or
(g.group_nm = 'PR Admin'))
ORDER BY
ar.upload_dt DESC
LIMIT 3;

and here you have the output:

-[ RECORD 1 ]---------------------------
title | QC Q&A
raw_data | www.com
upload_dt | 2006-02-09 11:15:04.724525
artifact_id | 44281
group_id | 2
read | t
-[ RECORD 2 ]---------------------------
title | QC Q&A
raw_data | www.com
upload_dt | 2006-02-09 11:15:04.724525
artifact_id | 44281
group_id | 54
read | t
-[ RECORD 3 ]---------------------------
title | WhyMAX?
raw_data |
upload_dt | 2006-02-09 09:25:27.717663
artifact_id | 44061
group_id | 2
read | t

Now, I tried this query, which is the first one but removing the g.group_id
from the select clause. This is the output:

-[ RECORD 1 ]-----------------------------
title | QC Q&A
raw_data | www.com
upload_dt | 2006-02-09 11:15:04.724525
artifact_id | 44281
read | t
-[ RECORD 2 ]----------------------------
title | WhyMAX?
raw_data |
upload_dt | 2006-02-09 09:25:27.717663
artifact_id | 44061
read | t
-[ RECORD 3 ]---------------------------
title | Business Assets-test
raw_data | corpcomm.com/
upload_dt | 2006-02-08 15:58:06.81578
artifact_id | 44280
read | t

Which is the desired output.

SELECT
distinct ar.title,
ar.raw_data,
ar.upload_dt,
ar.artifact_id,
g.group_id,
acl.read
FROM

The way you formatted that makes me wonder if you think that the
DISTINCT applies only to the first column. It does not, it applies
to all the columns together --- that is, it only removes rows that
are identical in all columns to some other row. So if there were
some rows that were identical except for group_id, you'd get the
behavior you described.

Yes, that was the reason, I thought that distinct was only applying
to the first column.

BTW, I'm not sure I believe this is actually the same query you
ran. The presence of the "GROUP BY ar.title" clause should have
provoked errors about ungrouped columns. If this is an exact
copy of what you did, what Postgres version is this?

It wasn't the same query. Forget about the first email, in this email
you have the queries I launched (sorry again about that).

Now I have to redefine my query because I want to get the second
output but keeping the group_id. Ideas and suggestions are welcome.

Thanks again your help,

David

#4Stephen Frost
sfrost@snowman.net
In reply to: David Rio Deiros (#3)
Re: distinct not working in a multiple join

* David Rio Deiros (driodeiros@gmail.com) wrote:

Now I have to redefine my query because I want to get the second
output but keeping the group_id. Ideas and suggestions are welcome.

You might want to look at 'distinct on'.

Stephen

#5David Rio Deiros
driodeiros@gmail.com
In reply to: Stephen Frost (#4)
Re: distinct not working in a multiple join

On Thu, Feb 09, 2006 at 09:51:13PM -0500, Stephen Frost wrote:

* David Rio Deiros (driodeiros@gmail.com) wrote:

Now I have to redefine my query because I want to get the second
output but keeping the group_id. Ideas and suggestions are welcome.

You might want to look at 'distinct on'.

I am reading about it right now. It sounds that it will make it.

Thanks guys for your help,

David