SQL Help: Multiple LEFT OUTER JOINs

Started by Bill Moseleyover 20 years ago8 messagesgeneral
Jump to latest
#1Bill Moseley
moseley@hank.org

I need to generate a table of teachers, and the count of classes they
taught in the past and are scheduled to teach in the future.

id | last_name | total future_class_count | past_class_count
-----+--------------+----------+--------------------+-----------------
3 | Smith | 12 | 3 | 9
8 | Jones | 0 | 0 | 0

table person
id
last_name

table class
id
class_time

table role
id
role_name -- for limiting to a type of teacher

-- link tables

table person_role
person references person
role references role

-- This table ties a person to a class, thus making them an instructor

table instructors
person references person
class references class

I can easily get instructors and the total count of their classes:

SELECT person.id AS id, last_name, count(instructors.class)

FROM person LEFT OUTER JOIN instructors ON (person.id = instructors.person),
person_role

WHERE person_role.person = person.id
AND person_role.role = 3 -- limit to this type of teacher

GROUP BY id, last_name;

Here's where I'm missing something. Trying to do an outer join on
to bring in the class row with its class_time column:

SELECT person.id AS id, last_name,
count(instructors.class) as total,
sum (CASE WHEN class_time > now() THEN 1 ELSE 0 END) as future_class_count,
sum (CASE WHEN class_time <= now() THEN 1 ELSE 0 END) as past_class_count

FROM (person LEFT OUTER JOIN instructors ON (person.id = instructors.person)) t
LEFT OUTER JOIN class on ( t.class = class.id ),
person_role

WHERE person_role.person = person.id
AND person_role.role = 3

GROUP BY person.id, last_name;

--
Bill Moseley
moseley@hank.org

#2Bruno Wolff III
bruno@wolff.to
In reply to: Bill Moseley (#1)
Re: SQL Help: Multiple LEFT OUTER JOINs

On Mon, Nov 21, 2005 at 05:40:10 -0800,
Bill Moseley <moseley@hank.org> wrote:

Here's where I'm missing something. Trying to do an outer join on
to bring in the class row with its class_time column:

You don't say exactly why you are having a problem with this, but I think you
would be better off doing an inner join between instructors and class and
then do an outer join of that result to person.

Show quoted text

SELECT person.id AS id, last_name,
count(instructors.class) as total,
sum (CASE WHEN class_time > now() THEN 1 ELSE 0 END) as future_class_count,
sum (CASE WHEN class_time <= now() THEN 1 ELSE 0 END) as past_class_count

FROM (person LEFT OUTER JOIN instructors ON (person.id = instructors.person)) t
LEFT OUTER JOIN class on ( t.class = class.id ),
person_role

WHERE person_role.person = person.id
AND person_role.role = 3

GROUP BY person.id, last_name;

--
Bill Moseley
moseley@hank.org

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

#3Bill Moseley
moseley@hank.org
In reply to: Bruno Wolff III (#2)
Re: SQL Help: Multiple LEFT OUTER JOINs

On Mon, Nov 21, 2005 at 11:45:34AM -0600, Bruno Wolff III wrote:

On Mon, Nov 21, 2005 at 05:40:10 -0800,
Bill Moseley <moseley@hank.org> wrote:

Here's where I'm missing something. Trying to do an outer join on
to bring in the class row with its class_time column:

You don't say exactly why you are having a problem with this, but I think you
would be better off doing an inner join between instructors and class and
then do an outer join of that result to person.

Sorry, I thought I was so far off it might be obvious. I suspect I'm
making the query harder than it really is.

This query just eats CPU and doesn't seem to finish, but I didn't let
it run more than a minute (which is forever as far as I'm concerned).
The tables are not that big (10,000 people, 1500 classes)

SELECT person.id AS id, last_name,
count(instructors.class) as total,
sum (CASE WHEN class_time > now() THEN 1 ELSE 0 END) as future_class_count,
sum (CASE WHEN class_time <= now() THEN 1 ELSE 0 END) as past_class_count

FROM (person LEFT OUTER JOIN instructors ON (person.id = instructors.person)) t
LEFT OUTER JOIN class on ( t.class = class.id ),
person_role

WHERE person_role.person = person.id
AND person_role.role = 3

GROUP BY person.id, last_name;

Well, I'm stabbing in the dark now. You mean like:

SELECT person.id AS id, first_name, last_name,
count(instructors.class) as total_classes,
sum (CASE WHEN class.id IS NULL THEN 0 ELSE 1 END) as total_class_count, -- which is better?
sum (CASE WHEN class_time > now() THEN 1 ELSE 0 END) as future_class_count,
sum (CASE WHEN class_time <= now() THEN 1 ELSE 0 END) as past_class_count

FROM (class INNER JOIN instructors ON ( class.id = instructors.class )) t
LEFT OUTER JOIN person ON ( person.id = t.person ),
person_role

WHERE person_role.person = person.id
AND person_role.role = 3

GROUP BY person.id, first_name, last_name;

Still eats CPU.

GroupAggregate (cost=1750458.67..1890662.91 rows=10212 width=39)
-> Sort (cost=1750458.67..1767958.67 rows=7000000 width=39)
Sort Key: person.id, person.first_name, person.last_name
-> Nested Loop (cost=111.27..140276.35 rows=7000000 width=39)
-> Nested Loop (cost=91.27..256.35 rows=7000 width=35)
-> Hash Join (cost=71.27..96.35 rows=7 width=31)
Hash Cond: ("outer".id = "inner"."class")
-> Seq Scan on "class" (cost=0.00..20.00 rows=1000 width=12)
-> Hash (cost=71.25..71.25 rows=7 width=27)
-> Nested Loop (cost=3.20..71.25 rows=7 width=27)
-> Hash Join (cost=3.20..30.77 rows=7 width=12)
Hash Cond: ("outer".person = "inner".person)
-> Seq Scan on instructors (cost=0.00..20.00 rows=1000 width=8)
-> Hash (cost=3.01..3.01 rows=75 width=4)
-> Index Scan using person_role_role_index on person_role (cost=0.00..3.01 rows=75 width=4)
Index Cond: (role = 3)
-> Index Scan using person_pkey on person (cost=0.00..5.77 rows=1 width=23)
Index Cond: ("outer".person = person.id)
-> Materialize (cost=20.00..30.00 rows=1000 width=4)
-> Seq Scan on instructors (cost=0.00..20.00 rows=1000 width=4)
-> Materialize (cost=20.00..30.00 rows=1000 width=4)
-> Seq Scan on "class" (cost=0.00..20.00 rows=1000 width=4)
(22 rows)

--
Bill Moseley
moseley@hank.org

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

--
Bill Moseley
moseley@hank.org

#4John McCawley
nospam@hardgeus.com
In reply to: Bill Moseley (#3)
Re: SQL Help: Multiple LEFT OUTER JOINs

It looks to me like your problem is that weird area where you alias your
inner join as "t" and thenn inner join based on this alias. You're
getting a cartesian product somewhere, as evidenced by the
"rows=7000000" in your explain.
I already deleted the old mail with your table structure, but try
changing that FROM section to:

class INNER JOIN instructors ON class.id = instructors.class
LEFT OUTER JOIN person ON person.id = (whatevertable).person

Bill Moseley wrote:

Show quoted text

Well, I'm stabbing in the dark now. You mean like:

SELECT person.id AS id, first_name, last_name,
count(instructors.class) as total_classes,
sum (CASE WHEN class.id IS NULL THEN 0 ELSE 1 END) as
total_class_count, -- which is better?
sum (CASE WHEN class_time > now() THEN 1 ELSE 0 END) as
future_class_count,
sum (CASE WHEN class_time <= now() THEN 1 ELSE 0 END)
as past_class_count

FROM (class INNER JOIN instructors ON ( class.id =
instructors.class )) t
LEFT OUTER JOIN person ON ( person.id = t.person ),
person_role
WHERE person_role.person = person.id
AND person_role.role = 3

GROUP BY person.id, first_name, last_name;

Still eats CPU.

GroupAggregate (cost=1750458.67..1890662.91 rows=10212 width=39)
-> Sort (cost=1750458.67..1767958.67 rows=7000000 width=39)
Sort Key: person.id, person.first_name, person.last_name
-> Nested Loop (cost=111.27..140276.35 rows=7000000 width=39)
-> Nested Loop (cost=91.27..256.35 rows=7000 width=35)
-> Hash Join (cost=71.27..96.35 rows=7 width=31)
Hash Cond: ("outer".id = "inner"."class")
-> Seq Scan on "class" (cost=0.00..20.00
rows=1000 width=12)
-> Hash (cost=71.25..71.25 rows=7 width=27)
-> Nested Loop (cost=3.20..71.25
rows=7 width=27)
-> Hash Join (cost=3.20..30.77
rows=7 width=12)
Hash Cond: ("outer".person
= "inner".person)
-> Seq Scan on
instructors (cost=0.00..20.00 rows=1000 width=8)
-> Hash (cost=3.01..3.01
rows=75 width=4)
-> Index Scan using
person_role_role_index on person_role (cost=0.00..3.01 rows=75 width=4)
Index Cond:
(role = 3)
-> Index Scan using person_pkey
on person (cost=0.00..5.77 rows=1 width=23)
Index Cond:
("outer".person = person.id)
-> Materialize (cost=20.00..30.00 rows=1000
width=4)
-> Seq Scan on instructors
(cost=0.00..20.00 rows=1000 width=4)
-> Materialize (cost=20.00..30.00 rows=1000 width=4)
-> Seq Scan on "class" (cost=0.00..20.00
rows=1000 width=4)
(22 rows)

#5John McCawley
nospam@hardgeus.com
In reply to: John McCawley (#4)
Re: SQL Help: Multiple LEFT OUTER JOINs

I just noticed, also goofy is your ", person_role" in your from with no
criteria. I would generally put the "person_role.person = person.id" as
an INNER JOIN, and then only have the "person_role.role=3" in the
where. It doesn't look like that's the specific problem, but I
generally find that kind of mixed syntax muddles a query.

John McCawley wrote:

Show quoted text

Well, I'm stabbing in the dark now. You mean like:

SELECT person.id AS id, first_name, last_name,
count(instructors.class) as total_classes,
sum (CASE WHEN class.id IS NULL THEN 0 ELSE 1 END) as
total_class_count, -- which is better?
sum (CASE WHEN class_time > now() THEN 1 ELSE 0 END)
as future_class_count,
sum (CASE WHEN class_time <= now() THEN 1 ELSE 0 END)
as past_class_count

FROM (class INNER JOIN instructors ON ( class.id =
instructors.class )) t
LEFT OUTER JOIN person ON ( person.id = t.person ),
person_role
WHERE person_role.person = person.id
AND person_role.role = 3

GROUP BY person.id, first_name, last_name;

Still eats CPU.

GroupAggregate (cost=1750458.67..1890662.91 rows=10212 width=39)
-> Sort (cost=1750458.67..1767958.67 rows=7000000 width=39)
Sort Key: person.id, person.first_name, person.last_name
-> Nested Loop (cost=111.27..140276.35 rows=7000000 width=39)
-> Nested Loop (cost=91.27..256.35 rows=7000 width=35)
-> Hash Join (cost=71.27..96.35 rows=7 width=31)
Hash Cond: ("outer".id = "inner"."class")
-> Seq Scan on "class" (cost=0.00..20.00
rows=1000 width=12)
-> Hash (cost=71.25..71.25 rows=7 width=27)
-> Nested Loop (cost=3.20..71.25
rows=7 width=27)
-> Hash Join
(cost=3.20..30.77 rows=7 width=12)
Hash Cond:
("outer".person = "inner".person)
-> Seq Scan on
instructors (cost=0.00..20.00 rows=1000 width=8)
-> Hash
(cost=3.01..3.01 rows=75 width=4)
-> Index Scan
using person_role_role_index on person_role (cost=0.00..3.01 rows=75
width=4)
Index Cond:
(role = 3)
-> Index Scan using
person_pkey on person (cost=0.00..5.77 rows=1 width=23)
Index Cond:
("outer".person = person.id)
-> Materialize (cost=20.00..30.00 rows=1000
width=4)
-> Seq Scan on instructors
(cost=0.00..20.00 rows=1000 width=4)
-> Materialize (cost=20.00..30.00 rows=1000 width=4)
-> Seq Scan on "class" (cost=0.00..20.00
rows=1000 width=4)
(22 rows)

#6Bill Moseley
moseley@hank.org
In reply to: John McCawley (#5)
Re: SQL Help: Multiple LEFT OUTER JOINs

On Mon, Nov 21, 2005 at 03:25:56PM -0600, John McCawley wrote:

I just noticed, also goofy is your ", person_role" in your from with no
criteria. I would generally put the "person_role.person = person.id" as
an INNER JOIN, and then only have the "person_role.role=3" in the
where. It doesn't look like that's the specific problem, but I
generally find that kind of mixed syntax muddles a query.

I need to read more about the FROM clause, as I can't seem to get what
you are suggesting.

The now working query (thanks to you!) is:

SELECT person.id AS id, first_name, last_name,
count(instructors.class) as total_classes,
-- sum (CASE WHEN class.id IS NULL THEN 0 ELSE 1 END) as total_class_count,
sum (CASE WHEN class_time > now() THEN 1 ELSE 0 END) as future_class_count,
sum (CASE WHEN class_time <= now() THEN 1 ELSE 0 END) as past_class_count

FROM class INNER JOIN instructors ON class.id = instructors.class
LEFT OUTER JOIN person ON person.id = instructors.person,
person_role

WHERE person_role.person = person.id
AND person_role.role = 2

GROUP BY person.id, first_name, last_name

ORDER BY future_class_count;

Not sure how to construct that. Not this, as it returns odd counts

SELECT person.id AS id,
count(instructors.class) as total_classes,
sum (CASE WHEN class.id IS NULL THEN 0 ELSE 1 END) as total_class_count, -- which is better?
sum (CASE WHEN class_time > now() THEN 1 ELSE 0 END) as future_class_count,
sum (CASE WHEN class_time <= now() THEN 1 ELSE 0 END) as past_class_count

FROM class INNER JOIN instructors ON class.id = instructors.class
LEFT OUTER JOIN person ON person.id = instructors.person
INNER JOIN person_role ON person_role.person = person.id

WHERE person_role.role = 2

GROUP BY person.id, first_name, last_name

ORDER BY total_classes;

id | total_classes | total_class_count | future_class_count | past_class_count
-----+---------------+-------------------+--------------------+------------------
90 | 1 | 1 | 0 | 1
98 | 1 | 1 | 0 | 1
92 | 1 | 1 | 0 | 1
123 | 1 | 1 | 0 | 1
122 | 1 | 1 | 0 | 1
121 | 2 | 2 | 0 | 2
66 | 2 | 2 | 0 | 2
74 | 2 | 2 | 0 | 2
56 | 2 | 2 | 0 | 2
85 | 2 | 2 | 0 | 2
119 | 2 | 2 | 0 | 2
41 | 2 | 2 | 0 | 2
33 | 2 | 2 | 0 | 2
65 | 2 | 2 | 0 | 2
105 | 3 | 3 | 0 | 3
83 | 3 | 3 | 0 | 3
102 | 3 | 3 | 0 | 3
32 | 4 | 4 | 0 | 4
71 | 4 | 4 | 0 | 4
70 | 4 | 4 | 0 | 4
14 | 4 | 4 | 0 | 4
29 | 4 | 4 | 0 | 4
77 | 4 | 4 | 0 | 4
86 | 4 | 4 | 0 | 4
50 | 4 | 4 | 0 | 4
107 | 4 | 4 | 0 | 4
8 | 4 | 4 | 0 | 4
114 | 4 | 4 | 0 | 4
42 | 4 | 4 | 0 | 4
82 | 4 | 4 | 0 | 4
28 | 4 | 4 | 0 | 4
17 | 4 | 4 | 0 | 4
52 | 4 | 4 | 0 | 4
9 | 4 | 4 | 0 | 4
31 | 4 | 4 | 0 | 4

--
Bill Moseley
moseley@hank.org

#7Bill Moseley
moseley@hank.org
In reply to: Bill Moseley (#6)
Re: SQL Help: Multiple LEFT OUTER JOINs

On Mon, Nov 21, 2005 at 02:06:35PM -0800, Bill Moseley wrote:

The now working query (thanks to you!) is:

No that doesn't work. It's dropping the people that have never
been assigned a class to teach (i.e. don't have a row in the
"instructors" link table).

FROM class INNER JOIN instructors ON class.id = instructors.class
LEFT OUTER JOIN person ON person.id = instructors.person,
person_role

I really seem to need the multiple left outer join. This works:

SELECT person.id AS id, last_name,
person_role.role AS role,
count(instructors.class),
sum (CASE WHEN class_time > now() THEN 1 ELSE 0 END) as future_class_count,
sum (CASE WHEN class_time <= now() THEN 1 ELSE 0 END) as past_class_count

FROM person LEFT OUTER JOIN instructors ON (person.id = instructors.person)
LEFT OUTER JOIN class ON (instructors.class = class.id),
person_role

WHERE person_role.person = person.id
-- AND person_role.role = 2

GROUP BY person.id, last_name, person_role.role;

I'm not clear how to move that "person_role.person = person.id" into
the FROM statement. Does it matter?

--
Bill Moseley
moseley@hank.org

#8John McCawley
nospam@hardgeus.com
In reply to: Bill Moseley (#7)
Re: SQL Help: Multiple LEFT OUTER JOINs

I'm not clear how to move that "person_role.person = person.id" into

the FROM statement. Does it matter?

This should work:

FROM person
INNER JOIN person_role ON person.id = person_role.person
LEFT OUTER JOIN instructors ON (person.id = instructors.person)
LEFT OUTER JOIN class ON (instructors.class = class.id), person_role
GROUP BY person.id, last_name, person_role.role;

The reason *I* think it matters is that I like to keep my "constant"
join clauses in the from, and my variable criteria in the where. i.e.
in your query, you always want the role that joins to the person...there
are no cases (in this query) where you want other roles, therefore I
wouldn't put it in the where. I leave my where clause free for criteria
such as "where person.id = $variable" etc. I will occasionally put
variables higher in my joins, but only if EXPLAIN ANALYZE tells me it's
faster (which rarely happens for me), but I NEVER wait until my where to
clarify the fundamental criteria whereby a table in the FROM is joined.
It's just my preference for the sake of clarity.

In other words I would:

SELECT person.last_name, person_role.role FROM
person_role INNER JOIN person ON person_role.person = person.id
WHERE person.id = $variable

rather than:

SELECT person.last_name, person_role.role
FROM
person_role, person
WHERE
person_role.person = person.id AND
person.id = $variable

Show quoted text