Tricky join question

Started by Tim Tassonisover 19 years ago8 messagesgeneral
Jump to latest
#1Tim Tassonis
timtas@cubic.ch

Hi all

I have a join problem that seems to be too difficult for me to solve:

I have:

table person
id integer,
name varchar(32)

data:

1,"Jack"
2,"Jill"
3,"Bob"

table course
id integer,
name varchar(32)

data:

1,"SQL Beginner"
2,"SQL Advanced"

table person_course
person_id number,
course_id number

data:

(currently empty)

Now, I would like to know for every person the courses they have taken.

In mysql, the following statement:

SELECT c.id, c.name, pc.person_id
FROM person as p
left outer join person_course as pc on p.id = pc.person_id
right outer join course as c on pc.course_id = c.id
where p.id = 2 order by 1;

will get me the following result:

+---+-----------------------------------------+----------+
| id| name                                    | person_id|
+---+-----------------------------------------+----------+
| 1 | SQL Beginner                            |          |
| 2 | SQL Advanced                            |          |
+---+-----------------------------------------+----------+

Can I get Postgres to give me the same result somehow? The above
statement will return nothing at all under postgres.

Bye
Tim

#2A. Kretschmer
andreas.kretschmer@schollglas.com
In reply to: Tim Tassonis (#1)
Re: Tricky join question

am Fri, dem 22.12.2006, um 12:12:06 +0100 mailte Tim Tassonis folgendes:

Hi all

I have a join problem that seems to be too difficult for me to solve:

I have:

table person
id integer,
name varchar(32)

data:

1,"Jack"
2,"Jill"
3,"Bob"

table course
id integer,
name varchar(32)

data:

1,"SQL Beginner"
2,"SQL Advanced"

table person_course
person_id number,
course_id number

data:

First, you should use referential integrity:

test=# create table person(id int primary key, name text);
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "person_pkey" for table "person"
CREATE TABLE
test=# create table course(id int primary key, name text);
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "course_pkey" for table "course"
CREATE TABLE
test=# create table person_course(person_id int references person, course_id int references course );
CREATE TABLE

(currently empty)

Okay, i insert some data:

test=# insert into person_course values (1,1);
INSERT 0 1
test=# insert into person_course values (3,1);
INSERT 0 1
test=# insert into person_course values (3,2);
INSERT 0 1

Now, I would like to know for every person the courses they have taken.

Similar to your result:

test=# select b.id, b.name, c.id from person_course a left join course b on a.course_id=b.id left join person c on a.person_id=c.id;
id | name | id
----+--------------+----
1 | SQL Beginner | 1
1 | SQL Beginner | 3
2 | SQL Advanced | 3
(3 rows)

In my opinion better:

test=# select c.id, c.name, b.name from person_course a left join course b on a.course_id=b.id left join person c on a.person_id=c.id;
id | name | name
----+------+--------------
1 | Jack | SQL Beginner
3 | Bob | SQL Beginner
3 | Bob | SQL Advanced
(3 rows)

Please, read more about referential integrity.

Andreas
--
Andreas Kretschmer
Kontakt: Heynitz: 035242/47215, D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID: 0x3FFF606C, privat 0x7F4584DA http://wwwkeys.de.pgp.net

#3Tim Tassonis
timtas@cubic.ch
In reply to: A. Kretschmer (#2)
Re: Tricky join question

Hi Andreas

First, you should use referential integrity:

I do, that is not the point. It was a simplified data model. Of course I
have primary keys and stuff, but they don't affect join behaviour at all.

test=# create table person(id int primary key, name text);
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "person_pkey" for table "person"
CREATE TABLE
test=# create table course(id int primary key, name text);
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "course_pkey" for table "course"
CREATE TABLE
test=# create table person_course(person_id int references person, course_id int references course );
CREATE TABLE

Well, you shouldn't, thats not my problem.

Now, I would like to know for every person the courses they have taken.

Similar to your result:

test=# select b.id, b.name, c.id from person_course a left join course b on a.course_id=b.id left join person c on a.person_id=c.id;
id | name | id
----+--------------+----
1 | SQL Beginner | 1
1 | SQL Beginner | 3
2 | SQL Advanced | 3
(3 rows)

This is absolutely not what I want. I want a row for every person and
every course, regardless whether the person has taken the course or not.
If the person has not taken the course, I want a null value in the
person id column:

SELECT c.id, c.name, pc.person_id
FROM person as p
left outer join person_course as pc on p.id = pc.person_id
right outer join course as c on pc.course_id = c.id
where p.id = 2 order by 1;

+---+-----------------------------------------+----------+
| id| name                                    | person_id|
+---+-----------------------------------------+----------+
| 1 | SQL Beginner                            |          |
| 2 | SQL Advanced                            |          |
+---+-----------------------------------------+----------+

Note here that I restrict my select to the person with the ID 2. Since
this person has not taken any course, the person_id is null. If I leave
the restriction on the person, I get person times courses rows, the
person_id only filled when a person has actually taken a course.

With the rows you added person_course and without restrictin to a
specific person, the result of your query should be:

+---+-----------------------------------------+------------+
| id| name                                    | person_id  |
+---+-----------------------------------------+------------+
| 1 | SQL Beginner                            |  1         |
| 1 | SQL Beginner                            |            |
| 1 | SQL Beginner                            |  3         |
| 2 | SQL Advanced                            |            |
| 2 | SQL Advanced                            |            |
| 2 | SQL Advanced                            |  3         |
+---+-----------------------------------------+------------+

In mysql, you get this with the following clause:

SELECT c.id, c.name, pc.person_id
FROM person as p
left outer join person_course as pc on p.id = pc.person_id
right outer join course as c on pc.course_id = c.id
order by 1;

Please, read more about referential integrity.

Thanks, but I already have read a lot about it 14 years ago.

Bye
Tim

#4Martijn van Oosterhout
kleptog@svana.org
In reply to: Tim Tassonis (#3)
Re: Tricky join question

Hi,

Without restriction you're getting:

On Fri, Dec 22, 2006 at 02:55:56PM +0100, Tim Tassonis wrote:

+---+-----------------------------------------+------------+
| id| name                                    | person_id  |
+---+-----------------------------------------+------------+
| 1 | SQL Beginner                            |  1         |
| 1 | SQL Beginner                            |            |
| 1 | SQL Beginner                            |  3         |
| 2 | SQL Advanced                            |            |
| 2 | SQL Advanced                            |            |
| 2 | SQL Advanced                            |  3         |
+---+-----------------------------------------+------------+

There are no rows in the table with person_id=2, so PostgreSQL is
returning the correct result (no rows). Seems like a bug in MySQL.

In mysql, you get this with the following clause:

SELECT c.id, c.name, pc.person_id
FROM person as p
left outer join person_course as pc on p.id = pc.person_id
right outer join course as c on pc.course_id = c.id
order by 1;

I think what you want is to apply to restriction on person earlier,
maybe:

SELECT c.id, c.name, pc.person_id
FROM person as p
left outer join person_course as pc on (p.id = pc.person_id and p.id = 2)
right outer join course as c on pc.course_id = c.id
order by 1;

Hope this helps,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Show quoted text

From each according to his ability. To each according to his ability to litigate.

#5brian
brian@zijn-digital.com
In reply to: Tim Tassonis (#3)
Re: Tricky join question

Tim Tassonis wrote:

Hi Andreas

First, you should use referential integrity:

I do, that is not the point. It was a simplified data model. Of course I
have primary keys and stuff, but they don't affect join behaviour at all.

test=# create table person(id int primary key, name text);
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index
"person_pkey" for table "person"
CREATE TABLE
test=# create table course(id int primary key, name text);
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index
"course_pkey" for table "course"
CREATE TABLE
test=# create table person_course(person_id int references person,
course_id int references course );
CREATE TABLE

Well, you shouldn't, thats not my problem.

Now, I would like to know for every person the courses they have taken.

Similar to your result:

test=# select b.id, b.name, c.id from person_course a left join course
b on a.course_id=b.id left join person c on a.person_id=c.id;
id | name | id
----+--------------+----
1 | SQL Beginner | 1
1 | SQL Beginner | 3
2 | SQL Advanced | 3
(3 rows)

This is absolutely not what I want. I want a row for every person and
every course, regardless whether the person has taken the course or not.
If the person has not taken the course, I want a null value in the
person id column:

SELECT c.id, c.name, pc.person_id
FROM person as p
left outer join person_course as pc on p.id = pc.person_id
right outer join course as c on pc.course_id = c.id
where p.id = 2 order by 1;

+---+-----------------------------------------+----------+
| id| name                                    | person_id|
+---+-----------------------------------------+----------+
| 1 | SQL Beginner                            |          |
| 2 | SQL Advanced                            |          |
+---+-----------------------------------------+----------+

Note here that I restrict my select to the person with the ID 2. Since
this person has not taken any course, the person_id is null. If I leave
the restriction on the person, I get person times courses rows, the
person_id only filled when a person has actually taken a course.

With the rows you added person_course and without restrictin to a
specific person, the result of your query should be:

+---+-----------------------------------------+------------+
| id| name                                    | person_id  |
+---+-----------------------------------------+------------+
| 1 | SQL Beginner                            |  1         |
| 1 | SQL Beginner                            |            |
| 1 | SQL Beginner                            |  3         |
| 2 | SQL Advanced                            |            |
| 2 | SQL Advanced                            |            |
| 2 | SQL Advanced                            |  3         |
+---+-----------------------------------------+------------+

In mysql, you get this with the following clause:

SELECT c.id, c.name, pc.person_id
FROM person as p
left outer join person_course as pc on p.id = pc.person_id
right outer join course as c on pc.course_id = c.id
order by 1;

Please, read more about referential integrity.

Thanks, but I already have read a lot about it 14 years ago.

Bye
Tim

In all your long years of experience, perhaps you haven't come across this?

http://catb.org/~esr/faqs/smart-questions.html

If you're going to ask a question here the least you could do is meet us
half-way.

b

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tim Tassonis (#1)
Re: Tricky join question

Tim Tassonis <timtas@cubic.ch> writes:

In mysql, the following statement:

SELECT c.id, c.name, pc.person_id
FROM person as p
left outer join person_course as pc on p.id = pc.person_id
right outer join course as c on pc.course_id = c.id
where p.id = 2 order by 1;

will get me the following result:

+---+-----------------------------------------+----------+
| id| name                                    | person_id|
+---+-----------------------------------------+----------+
| 1 | SQL Beginner                            |          |
| 2 | SQL Advanced                            |          |
+---+-----------------------------------------+----------+

Really? It would be unbelievably broken if so, but a quick experiment
with mysql 5.0.27 says they return an empty set same as us.

You *would* get that answer without the WHERE clause, but neither of
those rows meet the WHERE. Look at the complete join output:

regression=# SELECT *
FROM person as p
left outer join person_course as pc on p.id = pc.person_id
right outer join course as c on pc.course_id = c.id
;
id | name | person_id | course_id | id | name
----+------+-----------+-----------+----+--------------
| | | | 1 | SQL Beginner
| | | | 2 | SQL Advanced
(2 rows)

The person-left-join-person_course join produces rows, but none of them
can match course during the right join, so they don't get through.

I think what you want might be a full join for the second step:

regression=# SELECT *
FROM person as p
left outer join person_course as pc on p.id = pc.person_id
full outer join course as c on pc.course_id = c.id
;
id | name | person_id | course_id | id | name
----+------+-----------+-----------+----+--------------
1 | Jack | | | |
2 | Jill | | | |
3 | Bob | | | |
| | | | 1 | SQL Beginner
| | | | 2 | SQL Advanced
(5 rows)

regression=# SELECT *
FROM person as p
left outer join person_course as pc on p.id = pc.person_id
full outer join course as c on pc.course_id = c.id
where p.id = 2 order by 1;
id | name | person_id | course_id | id | name
----+------+-----------+-----------+----+------
2 | Jill | | | |
(1 row)

regression=# insert into person_course values(2,2);
INSERT 0 1
regression=# SELECT *
FROM person as p
left outer join person_course as pc on p.id = pc.person_id
full outer join course as c on pc.course_id = c.id
where p.id = 2 order by 1;
id | name | person_id | course_id | id | name
----+------+-----------+-----------+----+--------------
2 | Jill | 2 | 2 | 2 | SQL Advanced
(1 row)

BTW, I tried to duplicate this in mysql and was surprised to find that
5.0.27 doesn't seem to support full join at all :-(

regards, tom lane

#7Andreas Kretschmer
akretschmer@spamfence.net
In reply to: Tim Tassonis (#3)
Re: Tricky join question

Tim Tassonis <timtas@cubic.ch> schrieb:

This is absolutely not what I want. I want a row for every person and every
course, regardless whether the person has taken the course or not. If the
person has not taken the course, I want a null value in the person id
column:

test=# select c.id, c.name, pc.person_id
from course c
cross join person p
left outer join person_course pc on (p.id,c.id)=(pc.person_id,pc.course_id);
id | name | person_id
----+--------------+-----------
1 | SQL Beginner | 1
1 | SQL Beginner |
1 | SQL Beginner | 3
2 | SQL Advanced |
2 | SQL Advanced |
2 | SQL Advanced | 3
(6 rows)

But i think, this is a little bit stupid, because row 4 and 5 are the
same.

Perhaps this would be better:

test=# select c.id, c.name, p.id as person, pc.person_id
from course c
cross join person p
left outer join person_course pc on (p.id,c.id)=(pc.person_id,pc.course_id);
id | name | person | person_id
----+--------------+--------+-----------
1 | SQL Beginner | 1 | 1
1 | SQL Beginner | 2 |
1 | SQL Beginner | 3 | 3
2 | SQL Advanced | 1 |
2 | SQL Advanced | 2 |
2 | SQL Advanced | 3 | 3
(6 rows)

Andreas
--
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect. (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly." (unknow)
Kaufbach, Saxony, Germany, Europe. N 51.05082�, E 13.56889�

#8Tim Tassonis
timtas@cubic.ch
In reply to: Martijn van Oosterhout (#4)
Re: Tricky join question

Thanks to you all for your replies. I was able to solve my problem after
some more reading in the manual:

select c.id, c.name, pc.person_id
from person as p
cross join course as c
left outer join person_course as pc
on (p.id = pc.person_id and c.id = pc.course_id)
where p.id = 2;

A few remarks to the answers:

I seem to have been misunderstood in a way that people understood that I
implied that mysql is right and postgres is wrong. This was in no way my
opinion. I just reported what results I got under mysql. I very much
prefer Postgres over mysql and never questioned postgres' correctness.

To brian:

Please, read more about referential integrity.

Thanks, but I already have read a lot about it 14 years ago.

Bye
Tim

In all your long years of experience, perhaps you haven't come across this?

http://catb.org/~esr/faqs/smart-questions.html

If you're going to ask a question here the least you could do is meet us half-way.

I think I asked quite politely, did not blame anybody else and just
described my problem. I agree, I felt a bit insulted when being told to
read about referential integrity, because that clearly had nothing to do
with my question and I do know about it. I'm coming from Oracle and not
from mysql.

And thanks Andreas, I just saw your latest post which has the same
solution as I got in it.

Bye
Tim