subquery/alias question

Started by Madison Kellyover 18 years ago12 messagesgeneral
Jump to latest
#1Madison Kelly
linux@alteeve.com

Hi all,

I've read 7.2.1.3 (as short as it is) in the PgSQL docs, but don't
see what I am doing wrong... Maybe you can help?

I've got a query;

SELECT
d.dom_id,
d.dom_name,
(SELECT COUNT(*) FROM users u WHERE u.usr_dom_id=d.dom_id)
AS
usr_count
FROM
domains d
ORDER BY d.dom_name ASC;

Where 'usr_count' returns the number of entries in 'users' that point
to a given entry in 'domains'. Pretty straight forward so far. The
trouble is:

SELECT
d.dom_id,
d.dom_name,
(SELECT COUNT(*) FROM users u WHERE u.usr_dom_id=d.dom_id)
AS
usr_count
FROM
domains d
WHERE
usr_count > 0
ORDER BY d.dom_name ASC;

Causes the error:

ERROR: column "usr_count" does not exist

It works if I use:

SELECT
d.dom_id,
d.dom_name,
(SELECT COUNT(*) FROM users u WHERE u.usr_dom_id=d.dom_id)
AS
usr_count
FROM
domains d
WHERE
(SELECT COUNT(*) FROM users u WHERE u.usr_dom_id=d.dom_id) > 0
ORDER BY d.dom_name ASC;

This seems terribly inefficient (and ugly), and I can't see why the
results from 'usr_count' can't be counted... I can use 'usr_count' to
sort the results...

Thanks all!

Madi

#2Michael Glaesemann
grzm@seespotcode.net
In reply to: Madison Kelly (#1)
Re: subquery/alias question

On Sep 25, 2007, at 16:59 , Madison Kelly wrote:

SELECT
d.dom_id,
d.dom_name,
(SELECT COUNT(*) FROM users u WHERE u.usr_dom_id=d.dom_id)
AS
usr_count
FROM
domains d
WHERE
(SELECT COUNT(*) FROM users u WHERE u.usr_dom_id=d.dom_id) > 0
ORDER BY d.dom_name ASC;

Why not just use a join? Something like this would work, I should think:

select dom_id,
dom_name,
usr_count
from domains
natural join (select usr_dom_id as dom_id,
count(usr_dom_id) as usr_count
from users) u
where usr_count > 0
order by dom_name;

Michael Glaesemann
grzm seespotcode net

#3Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Michael Glaesemann (#2)
Re: subquery/alias question

Michael Glaesemann wrote:

On Sep 25, 2007, at 16:59 , Madison Kelly wrote:

SELECT
d.dom_id,
d.dom_name,
(SELECT COUNT(*) FROM users u WHERE u.usr_dom_id=d.dom_id)
AS
usr_count
FROM
domains d
WHERE
(SELECT COUNT(*) FROM users u WHERE u.usr_dom_id=d.dom_id) > 0
ORDER BY d.dom_name ASC;

Why not just use a join? Something like this would work, I should think:

select dom_id,
dom_name,
usr_count
from domains
natural join (select usr_dom_id as dom_id,
count(usr_dom_id) as usr_count
from users) u
where usr_count > 0
order by dom_name;

Maybe the usr_count should be tested in a HAVING clause instead of
WHERE? And put the count(*) in the result list instead of a subselect.
That feels more natural to me anyway.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#4Michael Glaesemann
grzm@seespotcode.net
In reply to: Alvaro Herrera (#3)
Re: subquery/alias question

On Sep 25, 2007, at 17:30 , Alvaro Herrera wrote:

Michael Glaesemann wrote:

select dom_id,
dom_name,
usr_count
from domains
natural join (select usr_dom_id as dom_id,
count(usr_dom_id) as usr_count
from users) u
where usr_count > 0
order by dom_name;

Maybe the usr_count should be tested in a HAVING clause instead of
WHERE? And put the count(*) in the result list instead of a
subselect.
That feels more natural to me anyway.

I believe you'd have to write it like

select dom_id, dom_name, count(usr_dom_id) as usr_count
from domains
join users on (usr_dom_id = dom_id)
having count(usr_dom_id) > 0
order by dom_name;

I don't know how the performance would compare. I think the backend
is smart enough to know it doesn't need to perform two seq scans to
calculate count(usr_dom_id), but I wasn't sure.

Madison, how do the two queries compare with explain analyze?

Michael Glaesemann
grzm seespotcode net

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Michael Glaesemann (#4)
Re: subquery/alias question

Michael Glaesemann <grzm@seespotcode.net> writes:

I believe you'd have to write it like

select dom_id, dom_name, count(usr_dom_id) as usr_count
from domains
join users on (usr_dom_id = dom_id)
having count(usr_dom_id) > 0
order by dom_name;

I don't know how the performance would compare. I think the backend
is smart enough to know it doesn't need to perform two seq scans to
calculate count(usr_dom_id), but I wasn't sure.

It has been smart enough for a few years now --- don't recall when
exactly, but nodeAgg.c quoth

* Perform lookups of aggregate function info, and initialize the
* unchanging fields of the per-agg data. We also detect duplicate
* aggregates (for example, "SELECT sum(x) ... HAVING sum(x) > 0"). When
* duplicates are detected, we only make an AggStatePerAgg struct for the
* first one. The clones are simply pointed at the same result entry by
* giving them duplicate aggno values.

... which in English means we just do the calculation once ...

regards, tom lane

#6Michael Glaesemann
grzm@seespotcode.net
In reply to: Tom Lane (#5)
Re: subquery/alias question

On Sep 25, 2007, at 21:44 , Tom Lane wrote:

... which in English means we just do the calculation once ...

As always, thanks, Tom, for the explanation (and Alvaro, who probably
already knew this :))

Michael Glaesemann
grzm seespotcode net

#7Madison Kelly
linux@alteeve.com
In reply to: Michael Glaesemann (#4)
Re: subquery/alias question

Michael Glaesemann wrote:

On Sep 25, 2007, at 17:30 , Alvaro Herrera wrote:

Michael Glaesemann wrote:

select dom_id,
dom_name,
usr_count
from domains
natural join (select usr_dom_id as dom_id,
count(usr_dom_id) as usr_count
from users) u
where usr_count > 0
order by dom_name;

Maybe the usr_count should be tested in a HAVING clause instead of
WHERE? And put the count(*) in the result list instead of a subselect.
That feels more natural to me anyway.

I believe you'd have to write it like

select dom_id, dom_name, count(usr_dom_id) as usr_count
from domains
join users on (usr_dom_id = dom_id)
having count(usr_dom_id) > 0
order by dom_name;

I don't know how the performance would compare. I think the backend is
smart enough to know it doesn't need to perform two seq scans to
calculate count(usr_dom_id), but I wasn't sure.

Madison, how do the two queries compare with explain analyze?

Thanks for your reply!

Unfortunately, in both cases I get the error:

nmc=> SELECT dom_id, dom_name, COUNT(usr_dom_id) AS usr_count FROM
domains JOIN users ON (usr_dom_id=dom_id) HAVING COUNT (usr_dom_id) > 0
ORDER BY dom_name;
ERROR: syntax error at or near "COUNT" at character 25
LINE 1: SELECT dom_id, dom_name COUNT(usr_dom_id) AS usr_count FROM ...

I've been struggling with some deadlines, so for now I'm using just:

SELECT d.dom_id, d.dom_name FROM domains d WHERE (SELECT COUNT(*) FROM
users u WHERE u.usr_dom_id=d.dom_id) > 0 ORDER BY d.dom_name ASC;

Which gives me just the domains with at least one user under them,
but not the count. This is not ideal, and I will have to come back to it
next week. In the meantime, any idea what the GROUP BY error is? If not,
I'll read through the docs on 'GROUP'ing once I get this deadline out of
the way.

Thank you all for your help! I am sure I will have more question(s)
next week as soon as I can get back to this.

Madi

#8Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Madison Kelly (#7)
Re: subquery/alias question

Madison Kelly wrote:

Thanks for your reply!

Unfortunately, in both cases I get the error:

nmc=> SELECT dom_id, dom_name, COUNT(usr_dom_id) AS usr_count FROM domains
JOIN users ON (usr_dom_id=dom_id) HAVING COUNT (usr_dom_id) > 0 ORDER BY
dom_name;
ERROR: syntax error at or near "COUNT" at character 25
LINE 1: SELECT dom_id, dom_name COUNT(usr_dom_id) AS usr_count FROM ...

Try to avoid missing the comma before the COUNT (and do not cheat when
cut'n pasting ...)

Also it seems you will need a GROUP BY clause:
GROUP BY dom_id, dom_name
(placed just before the HAVING clause).

--
Alvaro Herrera http://www.amazon.com/gp/registry/CTMLCN8V17R4
"Some men are heterosexual, and some are bisexual, and some
men don't think about sex at all... they become lawyers" (Woody Allen)

#9Bruce Momjian
bruce@momjian.us
In reply to: Madison Kelly (#7)
Re: subquery/alias question

"Madison Kelly" <linux@alteeve.com> writes:

SELECT d.dom_id, d.dom_name FROM domains d WHERE (SELECT COUNT(*) FROM users u
WHERE u.usr_dom_id=d.dom_id) > 0 ORDER BY d.dom_name ASC;

Which gives me just the domains with at least one user under them, but not
the count. This is not ideal, and I will have to come back to it next week. In
the meantime, any idea what the GROUP BY error is? If not, I'll read through
the docs on 'GROUP'ing once I get this deadline out of the way.

I think you just want simply:

SELECT dom_id, dom_name, count(*)
FROM users
JOIN domains ON (usr_dom_id=dom_id)
GROUP BY dom_id, dom_nmae
ORDER BY dom_name

You don't actually need the HAVING (though it wouldn't do any harm either)
since only domains which match a user will come out of the join anyways.

You can also write it using a subquery instead of a join

SELECT *
FROM (
SELECT dom_id, dom_name,
(SELECT count(*) FROM users WHERE user_dom_id = dom_id) as nusers
FROM domains
) as subq
WHERE nusers > 0
ORDER BY dom_name

But that will perform worse in many cases.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

#10Madison Kelly
linux@alteeve.com
In reply to: Alvaro Herrera (#8)
Solved! Was: (subquery/alias question)

Alvaro Herrera wrote:

Madison Kelly wrote:

Thanks for your reply!

Unfortunately, in both cases I get the error:

nmc=> SELECT dom_id, dom_name, COUNT(usr_dom_id) AS usr_count FROM domains
JOIN users ON (usr_dom_id=dom_id) HAVING COUNT (usr_dom_id) > 0 ORDER BY
dom_name;
ERROR: syntax error at or near "COUNT" at character 25
LINE 1: SELECT dom_id, dom_name COUNT(usr_dom_id) AS usr_count FROM ...

Try to avoid missing the comma before the COUNT (and do not cheat when
cut'n pasting ...)

Also it seems you will need a GROUP BY clause:
GROUP BY dom_id, dom_name
(placed just before the HAVING clause).

Bingo!

Now to answer the performance questions (using my actual queries,
unedited so they are a little longer):

-=-=-=-=-=-
nmc=> EXPLAIN ANALYZE SELECT dom_id, dom_name, dom_note,
COUNT(usr_dom_id) AS usr_count FROM domains JOIN users ON
(usr_dom_id=dom_id) GROUP BY dom_id, dom_name, dom_note HAVING COUNT
(usr_dom_id) > 0 ORDER BY dom_name;
QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------
Sort (cost=10.70..10.78 rows=31 width=72) (actual time=2.107..2.133
rows=17 loops=1)
Sort Key: domains.dom_name
-> HashAggregate (cost=9.39..9.93 rows=31 width=72) (actual
time=1.899..1.956 rows=17 loops=1)
Filter: (count(usr_dom_id) > 0)
-> Hash Join (cost=7.20..9.00 rows=31 width=72) (actual
time=0.942..1.411 rows=96 loops=1)
Hash Cond: ("outer".dom_id = "inner".usr_dom_id)
-> Seq Scan on domains (cost=0.00..1.31 rows=31
width=68) (actual time=0.227..0.321 rows=31 loops=1)
-> Hash (cost=6.96..6.96 rows=96 width=4) (actual
time=0.673..0.673 rows=96 loops=1)
-> Seq Scan on users (cost=0.00..6.96 rows=96
width=4) (actual time=0.010..0.371 rows=96 loops=1)
Total runtime: 2.454 ms
(10 rows)
-=-=-=-=-=-

Versus:

-=-=-=-=-=-
nmc=> EXPLAIN ANALYZE SELECT d.dom_id, d.dom_name, d.dom_note, (SELECT
COUNT(*) FROM users u WHERE u.usr_dom_id=d.dom_id) AS usr_count FROM
domains d WHERE (SELECT COUNT(*) FROM users u WHERE
u.usr_dom_id=d.dom_id) > 0 ORDER BY d.dom_name ASC;
QUERY PLAN

----------------------------------------------------------------------------------------------------------------------
Sort (cost=297.37..297.39 rows=10 width=68) (actual
time=10.171..10.196 rows=17 loops=1)
Sort Key: dom_name
-> Seq Scan on domains d (cost=0.00..297.20 rows=10 width=68)
(actual time=0.508..10.013 rows=17 loops=1)
Filter: ((subplan) > 0)
SubPlan
-> Aggregate (cost=7.21..7.21 rows=1 width=0) (actual
time=0.203..0.204 rows=1 loops=31)
-> Seq Scan on users u (cost=0.00..7.20 rows=1
width=0) (actual time=0.127..0.189 rows=3 loops=31)
Filter: (usr_dom_id = $0)
-> Aggregate (cost=7.21..7.21 rows=1 width=0) (actual
time=0.184..0.186 rows=1 loops=17)
-> Seq Scan on users u (cost=0.00..7.20 rows=1
width=0) (actual time=0.058..0.164 rows=6 loops=17)
Filter: (usr_dom_id = $0)
Total runtime: 10.593 ms
(12 rows)
-=-=-=-=-=-

So using the JOIN you all helped me with, the query returns in 2.454
ms compared to my early query of 10.593 ms!

I have not yet looked into any indexing either. I am waiting until
the program is done and then will go back and review queries to look for
bottlenecks.

Thanks to all of you!!

Madi

#11Michael Glaesemann
grzm@seespotcode.net
In reply to: Madison Kelly (#7)
Re: subquery/alias question

On Sep 26, 2007, at 7:41 , Madison Kelly wrote:

Unfortunately, in both cases I get the error:

Um, the two cases could not be giving the same error as they don't
both contain the syntax that the error is complaining about: the
first case uses count in a subquery so it couldn't throw this exact
error.

nmc=> SELECT dom_id, dom_name, COUNT(usr_dom_id) AS usr_count FROM
domains JOIN users ON (usr_dom_id=dom_id) HAVING COUNT (usr_dom_id)

0 ORDER BY dom_name;

ERROR: syntax error at or near "COUNT" at character 25
LINE 1: SELECT dom_id, dom_name COUNT(usr_dom_id) AS usr_count
FROM ...

The error message doesn't match the query you've provided. Note that
in the line marked LINE 1, there's no comma after dom_name, which I
assume is what the server is complaining about. However, the query
you show *does* have this comma. Something isn't right. Is this an
exact copy and paste from psql?

I've been struggling with some deadlines, so for now I'm using just:

SELECT d.dom_id, d.dom_name FROM domains d WHERE (SELECT COUNT(*)
FROM users u WHERE u.usr_dom_id=d.dom_id) > 0 ORDER BY d.dom_name ASC;

Which gives me just the domains with at least one user under
them, but not the count. This is not ideal, and I will have to come
back to it next week. In the meantime, any idea what the GROUP BY
error is?

Ah. You haven't actually shown us a GROUP BY error. A GROUP BY clause
is needed when you've got columns that aren't included in the
aggregate (COUNT in this case), e.g.,

select dom_id,
dom_name,
usr_count
from domains
natural join (select usr_dom_id as dom_id,
count(usr_dom_id) as usr_count
from users
group by dom_id) u
where usr_count > 0
order by dom_name;

select dom_id, dom_name, count(usr_dom_id) as usr_count
from domains
join users on (usr_dom_id = dom_id)
group by dom_id, dom_name
having count(usr_dom_id) > 0
order by dom_name;

Michael Glaesemann
grzm seespotcode net

#12Madison Kelly
linux@alteeve.com
In reply to: Bruce Momjian (#9)
Re: subquery/alias question

Gregory Stark wrote:

"Madison Kelly" <linux@alteeve.com> writes:

SELECT d.dom_id, d.dom_name FROM domains d WHERE (SELECT COUNT(*) FROM users u
WHERE u.usr_dom_id=d.dom_id) > 0 ORDER BY d.dom_name ASC;

Which gives me just the domains with at least one user under them, but not
the count. This is not ideal, and I will have to come back to it next week. In
the meantime, any idea what the GROUP BY error is? If not, I'll read through
the docs on 'GROUP'ing once I get this deadline out of the way.

I think you just want simply:

SELECT dom_id, dom_name, count(*)
FROM users
JOIN domains ON (usr_dom_id=dom_id)
GROUP BY dom_id, dom_nmae
ORDER BY dom_name

You don't actually need the HAVING (though it wouldn't do any harm either)
since only domains which match a user will come out of the join anyways.

You can also write it using a subquery instead of a join

SELECT *
FROM (
SELECT dom_id, dom_name,
(SELECT count(*) FROM users WHERE user_dom_id = dom_id) as nusers
FROM domains
) as subq
WHERE nusers > 0
ORDER BY dom_name

But that will perform worse in many cases.

You are right, the 'HAVING' clause does seem to be redundant. I removed
it and ran several 'EXPLAIN ANALYZE's on it with and without the
'HAVING' clause and found no perceivable difference. I removed the
'HAVING' clause anyway, since I like to keep queries as minimal as possible.

Thank you!

Madi