bug in JOIN or COUNT or ... ?

Started by The Hermit Hackerover 24 years ago7 messages

The Hermit Hacker

scrappy@hub.org

over 24 years ago

Sorry, worst Subject I've ever come up with, but this is one of those "I
haven't got a clue how to describe" emails ...

Simple query:

SELECT distinct s.gid, s.created, i.title
FROM status s LEFT JOIN images i ON (s.gid = i.gid AND i.active), personal_data pd, relationship_wanted rw
WHERE s.active AND s.status != 0
AND s.gid = 17111
AND (s.gid = pd.gid AND pd.gender = 0)
AND (s.gid = rw.gid AND rw.gender = 0 );

Produces:

gid | created | title
-------+------------------------+--------
17111 | 2000-10-19 15:20:46-04 | image1
17111 | 2000-10-19 15:20:46-04 | image2
17111 | 2000-10-19 15:20:46-04 | image3
(3 rows)

Great, what I expect ...

But:

SELECT distinct s.gid, s.created, count(i.title) AS images
FROM status s LEFT JOIN images i ON (s.gid = i.gid AND i.active), personal_data pd, relationship_wanted rw
WHERE s.active AND s.status != 0
AND s.gid = 17111
AND (s.gid = pd.gid AND pd.gender = 0)
AND (s.gid = rw.gid AND rw.gender = 0 )
GROUP BY s.gid, s.created;

Produces:

/tmp/psql.edit.70.62491: 7 lines, 353 characters.
gid | created | images
-------+------------------------+--------
17111 | 2000-10-19 15:20:46-04 | 15
(1 row)

So why is it counting 12 more images then are actually found/exist:

testdb=# select title from images where gid = 17111;
title
--------
image1
image3
image2
(3 rows)

Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org

tgl@sss.pgh.pa.us

over 24 years ago

In reply to: The Hermit Hacker (#1)

Re: bug in JOIN or COUNT or ... ?

The Hermit Hacker <scrappy@hub.org> writes:

So why is it counting 12 more images then are actually found/exist:

Hm. Could we see the EXPLAIN output for both of those?

regards, tom lane

The Hermit Hacker

scrappy@hub.org

over 24 years ago

In reply to: Tom Lane (#2)

Re: bug in JOIN or COUNT or ... ?

On Sat, 12 May 2001, Tom Lane wrote:

The Hermit Hacker <scrappy@hub.org> writes:

So why is it counting 12 more images then are actually found/exist:

Hm. Could we see the EXPLAIN output for both of those?

without count:

NOTICE: QUERY PLAN:

Unique (cost=8.66..8.67 rows=1 width=37)
-> Sort (cost=8.66..8.66 rows=1 width=37)
-> Nested Loop (cost=0.00..8.65 rows=1 width=37)
-> Nested Loop (cost=0.00..6.52 rows=1 width=33)
-> Nested Loop (cost=0.00..4.26 rows=1 width=29)
-> Index Scan using status_gid on status s (cost=0.00..2.23 rows=1 width=12)
-> Index Scan using images_gid on images i (cost=0.00..2.02 rows=1 width=17)
-> Index Scan using personal_data_gid on personal_data pd (cost=0.00..2.25 rows=1 width=4)
-> Index Scan using relationship_wanted_gid on relationship_wanted rw (cost=0.00..2.11 rows=1 width=4)

EXPLAIN

with count:

NOTICE: QUERY PLAN:

Unique (cost=8.68..8.69 rows=1 width=37)
-> Sort (cost=8.68..8.68 rows=1 width=37)
-> Aggregate (cost=8.66..8.67 rows=1 width=37)
-> Group (cost=8.66..8.67 rows=1 width=37)
-> Sort (cost=8.66..8.66 rows=1 width=37)
-> Nested Loop (cost=0.00..8.65 rows=1 width=37)
-> Nested Loop (cost=0.00..6.52 rows=1 width=33)
-> Nested Loop (cost=0.00..4.26 rows=1 width=29)
-> Index Scan using status_gid on status s (cost=0.00..2.23 rows=1 width=12)
-> Index Scan using images_gid on images i (cost=0.00..2.02 rows=1 width=17)
-> Index Scan using personal_data_gid on personal_data pd (cost=0.00..2.25 rows=1 width=4)
-> Index Scan using relationship_wanted_gid on relationship_wanted rw (cost=0.00..2.11 rows=1 width=4)

EXPLAIN

regards, tom lane

Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org

tgl@sss.pgh.pa.us

over 24 years ago

In reply to: The Hermit Hacker (#3)

Re: bug in JOIN or COUNT or ... ?

Ah, I see it: your join against relationship_wanted isn't unique.

globalmatch=# select count(*) from personal_data pd
globalmatch-# where pd.gid = 17111 AND pd.gender = 0;
count
-------
1
(1 row)

globalmatch=# select count(*) from relationship_wanted rw
globalmatch-# where rw.gid = 17111 AND rw.gender = 0;
count
-------
5
(1 row)

globalmatch=#

So that inflates the number of rows coming out of the join by 5.

regards, tom lane

Import Notes

Reply to msg id not found: Pine.BSF.4.33.0105122225230.629-100000@mobile.hub.orgReference msg id not found: Pine.BSF.4.33.0105122225230.629-100000@mobile.hub.org | Resolved by subject fallback

The Hermit Hacker

scrappy@hub.org

over 24 years ago

In reply to: Tom Lane (#4)

Re: bug in JOIN or COUNT or ... ?

On Sat, 12 May 2001, Tom Lane wrote:

Ah, I see it: your join against relationship_wanted isn't unique.

globalmatch=# select count(*) from personal_data pd
globalmatch-# where pd.gid = 17111 AND pd.gender = 0;
count
-------
1
(1 row)

globalmatch=# select count(*) from relationship_wanted rw
globalmatch-# where rw.gid = 17111 AND rw.gender = 0;
count
-------
5
(1 row)

globalmatch=#

So that inflates the number of rows coming out of the join by 5.

Okay, then I'm lost ... why wouldn't that show up without the COUNT()? I
doubt doubt your analysis, I just want to understand why ...

tgl@sss.pgh.pa.us

over 24 years ago

In reply to: The Hermit Hacker (#5)

Re: bug in JOIN or COUNT or ... ?

The Hermit Hacker <scrappy@hub.org> writes:

So that inflates the number of rows coming out of the join by 5.

Okay, then I'm lost ... why wouldn't that show up without the COUNT()? I
doubt doubt your analysis, I just want to understand why ...

You had DISTINCT on your query, which hid the duplicated rows from you.
But that happens *after* aggregate processing, so it doesn't hide the
dups from COUNT().

regards, tom lane

The Hermit Hacker

scrappy@hub.org

over 24 years ago

In reply to: Tom Lane (#6)

Re: bug in JOIN or COUNT or ... ?

On Sat, 12 May 2001, Tom Lane wrote:

The Hermit Hacker <scrappy@hub.org> writes:

So that inflates the number of rows coming out of the join by 5.

Okay, then I'm lost ... why wouldn't that show up without the COUNT()? I
doubt doubt your analysis, I just want to understand why ...

You had DISTINCT on your query, which hid the duplicated rows from you.
But that happens *after* aggregate processing, so it doesn't hide the
dups from COUNT().

Ahhhh, okay, that makes sense ... thanks for taking the time to check it
for me ... and explaining what I was missing ...