Urgent - SQL left join bug?

Started by Kaijiang Chenalmost 9 years ago7 messagesbugs
Jump to latest
#1Kaijiang Chen
chenkaijiang@gmail.com

Hi, I'm running PostgreSQL 9.4.10 on CentOS 6.5. It looks like that I found
a bug with left join. It is very URGENT since it is running in the
production servers.

### Conditions: ###

I have 2 tables:

TABLE 1: (2171209 records)
\d prescription_herbs
Table "public.prescription_herbs"
Column | Type |
Modifiers
---------------------+--------------------------------+-----------------------------------------------------------------
id | integer | not null default
nextval('prescription_herbs_id_seq'::regclass)
prescription_id | integer | not null
herb_id | integer | not null
weight | integer | not null
created_at | timestamp(0) without time zone | not null
updated_at | timestamp(0) without time zone | not null
deleted_at | timestamp(0) without time zone |
price | numeric(10,5) |
special_manufacture | character varying(255) |
cost | numeric(10,5) |
pharmacy_id | integer |
Indexes:
"prescription_herbs_pkey" PRIMARY KEY, btree (id)
"prescription_herbs_hid" btree (herb_id)
"prescription_herbs_prid" btree (prescription_id)

TABLE 2: (4406 records)

\d pharmacy_herbs
Table "public.pharmacy_herbs"
Column | Type |
Modifiers
-------------+--------------------------------+-------------------------------------------------------------
id | integer | not null default
nextval('pharmacy_herbs_id_seq'::regclass)
pharmacy_id | integer |
herb_id | integer |
cost | numeric(10,5) |
price | numeric(10,5) |
no | character varying(255) |
deleted_at | timestamp(0) without time zone |
created_at | timestamp(0) without time zone | not null
updated_at | timestamp(0) without time zone | not null
name | character varying(255) |
Indexes:
"pharmacy_herbs_pkey" PRIMARY KEY, btree (id)
"pharmacy_herbs_herb_id" btree (herb_id)

### BUG: ###

I ran a SQL:
select pha.id,ph.herb_id,pha.name,ph.weight
from prescription_herbs as ph *left join* pharmacy_herbs as pha on
ph.herb_id=pha.herb_id
where ph.prescription_id=116285 and ph.deleted_at is null and
pha.deleted_at is null and pha.pharmacy_id=22 order by ph.herb_id;

Expected:

It should have 10 rows because the SQL "select * from prescription_herbs as
ph where ph.prescription_id=116285 and ph.deleted_at is null" returned 10
rows and I'm using *LEFT JOIN *in the above SQL.

Actual Result:

It returned only 9 rows and the result is the same as JOIN (not LEFT JOIN).

### More info: ###

I explain the SQL:
explain select pha.id,ph.herb_id,pha.name,ph.weight
from prescription_herbs as ph left join pharmacy_herbs as pha on
ph.herb_id=pha.herb_id
where ph.prescription_id=116285 and ph.deleted_at is null and
pha.deleted_at is null and pha.pharmacy_id=22 order by ph.herb_id;

I got:

Sort (cost=131.73..131.76 rows=10 width=24)
Sort Key: ph.herb_id
-> * Hash Join* (cost=122.02..131.56 rows=10 width=24)
Hash Cond: (ph.herb_id = pha.herb_id)
-> Index Scan using prescription_herbs_prid on prescription_herbs
ph (cost=0.43..9.68 rows=23 width=8)
Index Cond: (prescription_id = 116285)
Filter: (deleted_at IS NULL)
-> Hash (cost=113.08..113.08 rows=681 width=20)
-> Seq Scan on pharmacy_herbs pha (cost=0.00..113.08
rows=681 width=20)
Filter: ((deleted_at IS NULL) AND (pharmacy_id = 22))

I think the above* "Hash Join" SHOULD BE "Hash Left Join"*, right?

I tried to explain another SQL:
explain select * from doctors d left join prescriptions p on d.id
=p.doctor_id;

I got:
Hash Right Join (cost=2159.33..31453.58 rows=130330 width=2936)
Hash Cond: (p.doctor_id = d.id)
-> Seq Scan on prescriptions p (cost=0.00..9273.30 rows=130330
width=495)
-> Hash (cost=576.37..576.37 rows=5037 width=2441)
-> Seq Scan on doctors d (cost=0.00..576.37 rows=5037 width=2441)

The "Hash Right Join" is the correct node.

Any help is very appreciated! WAITING...

Thanks,
Kaijiang

#2Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Kaijiang Chen (#1)
Re: Urgent - SQL left join bug?

On 06/21/2017 11:06 AM, Kaijiang Chen wrote:

I ran a SQL:
select pha.id,ph.herb_id,pha.name,ph.weight
from prescription_herbs as ph *left join* pharmacy_herbs as pha on
ph.herb_id=pha.herb_id
where ph.prescription_id=116285 and ph.deleted_at is null and
pha.deleted_at is null and pha.pharmacy_id=22 order by ph.herb_id;

Expected:

It should have 10 rows because the SQL "select * from prescription_herbs as
ph where ph.prescription_id=116285 and ph.deleted_at is null" returned 10
rows and I'm using *LEFT JOIN *in the above SQL.

Actual Result:

It returned only 9 rows and the result is the same as JOIN (not LEFT JOIN).

Note that you have the condition "pha.pharmacy_id=22" in the WHERE part
of the query. That filters out rows with no matching pharmacy_herbs
rows, because pha.pharmacy_id is NULL for non-matching rows.

Put the "pha.pharmacy_id=22" condition in the ON join qual part instead.
And for readability and consistency, I'd suggest putting the
"pha.deleted_at is null" qual in the ON clause too, although that won't
affect the result:

select pha.id,ph.herb_id,pha.name,ph.weight
from prescription_herbs as ph
left join pharmacy_herbs as pha on ph.herb_id=pha.herb_id and
pha.deleted_at is null and pha.pharmacy_id=22
where ph.prescription_id=116285 and ph.deleted_at is null
order by ph.herb_id;

- Heikki

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#3Pantelis Theodosiou
ypercube@gmail.com
In reply to: Kaijiang Chen (#1)
Re: Urgent - SQL left join bug?

On Wed, Jun 21, 2017 at 9:06 AM, Kaijiang Chen <chenkaijiang@gmail.com>
wrote:

Hi, I'm running PostgreSQL 9.4.10 on CentOS 6.5. It looks like that I
found a bug with left join. It is very URGENT since it is running in the
production servers.

### Conditions: ###

I have 2 tables:

TABLE 1: (2171209 records)
\d prescription_herbs
Table
"public.prescription_herbs"
Column | Type |
Modifiers
---------------------+--------------------------------+-----
------------------------------------------------------------
id | integer | not null default
nextval('prescription_herbs_id_seq'::regclass)
prescription_id | integer | not null
herb_id | integer | not null
weight | integer | not null
created_at | timestamp(0) without time zone | not null
updated_at | timestamp(0) without time zone | not null
deleted_at | timestamp(0) without time zone |
price | numeric(10,5) |
special_manufacture | character varying(255) |
cost | numeric(10,5) |
pharmacy_id | integer |
Indexes:
"prescription_herbs_pkey" PRIMARY KEY, btree (id)
"prescription_herbs_hid" btree (herb_id)
"prescription_herbs_prid" btree (prescription_id)

TABLE 2: (4406 records)

\d pharmacy_herbs
Table "public.pharmacy_herbs"
Column | Type |
Modifiers
-------------+--------------------------------+-------------
------------------------------------------------
id | integer | not null default
nextval('pharmacy_herbs_id_seq'::regclass)
pharmacy_id | integer |
herb_id | integer |
cost | numeric(10,5) |
price | numeric(10,5) |
no | character varying(255) |
deleted_at | timestamp(0) without time zone |
created_at | timestamp(0) without time zone | not null
updated_at | timestamp(0) without time zone | not null
name | character varying(255) |
Indexes:
"pharmacy_herbs_pkey" PRIMARY KEY, btree (id)
"pharmacy_herbs_herb_id" btree (herb_id)

### BUG: ###

I ran a SQL:
select pha.id,ph.herb_id,pha.name,ph.weight
from prescription_herbs as ph *left join* pharmacy_herbs as pha on
ph.herb_id=pha.herb_id
where ph.prescription_id=116285 and ph.deleted_at is null and
pha.deleted_at is null and pha.pharmacy_id=22 order by ph.herb_id;

Expected:

It should have 10 rows because the SQL "select * from prescription_herbs
as ph where ph.prescription_id=116285 and ph.deleted_at is null" returned
10 rows and I'm using *LEFT JOIN *in the above SQL.

Actual Result:

It returned only 9 rows and the result is the same as JOIN (not LEFT JOIN).

### More info: ###

I explain the SQL:
explain select pha.id,ph.herb_id,pha.name,ph.weight
from prescription_herbs as ph left join pharmacy_herbs as pha on
ph.herb_id=pha.herb_id
where ph.prescription_id=116285 and ph.deleted_at is null and
pha.deleted_at is null and pha.pharmacy_id=22 order by ph.herb_id;

I got:

Sort (cost=131.73..131.76 rows=10 width=24)
Sort Key: ph.herb_id
-> * Hash Join* (cost=122.02..131.56 rows=10 width=24)
Hash Cond: (ph.herb_id = pha.herb_id)
-> Index Scan using prescription_herbs_prid on
prescription_herbs ph (cost=0.43..9.68 rows=23 width=8)
Index Cond: (prescription_id = 116285)
Filter: (deleted_at IS NULL)
-> Hash (cost=113.08..113.08 rows=681 width=20)
-> Seq Scan on pharmacy_herbs pha (cost=0.00..113.08
rows=681 width=20)
Filter: ((deleted_at IS NULL) AND (pharmacy_id = 22))

I think the above* "Hash Join" SHOULD BE "Hash Left Join"*, right?

I tried to explain another SQL:
explain select * from doctors d left join prescriptions p on d.id
=p.doctor_id;

I got:
Hash Right Join (cost=2159.33..31453.58 rows=130330 width=2936)
Hash Cond: (p.doctor_id = d.id)
-> Seq Scan on prescriptions p (cost=0.00..9273.30 rows=130330
width=495)
-> Hash (cost=576.37..576.37 rows=5037 width=2441)
-> Seq Scan on doctors d (cost=0.00..576.37 rows=5037
width=2441)

The "Hash Right Join" is the correct node.

Any help is very appreciated! WAITING...

Thanks,
Kaijiang

This is not a bug. The conditions of the "right" table pha - basically
(pha.pharmacy_id=22) and secondary ((pha.deleted_at is null) )- that you
have put in the WHERE clause make the query act as if it was an INNER join.

Move them to the ON and you'll get your 10 rows.

select ...
from prescription_herbs as ph *left join* pharmacy_herbs as pha
on ph.herb_id=pha.herb_id and pha.deleted_at is null and
pha.pharmacy_id=22
where ph.prescription_id=116285 and ph.deleted_at is null
order by ...

The (pha.deleted_at is null) part may give different results depending on
where it is placed (ON vs WHERE) but my guess - since you want all the 10
rows of the left table - is that it should be in ON, too

Pantelis

#4Kaijiang Chen
chenkaijiang@gmail.com
In reply to: Pantelis Theodosiou (#3)
Re: Urgent - SQL left join bug?

Hi, Pantelis and Heikki, thank you very much for such a quick response!

I got it. I was so silly.....

On Wed, Jun 21, 2017 at 4:23 PM, Pantelis Theodosiou <ypercube@gmail.com>
wrote:

Show quoted text

On Wed, Jun 21, 2017 at 9:06 AM, Kaijiang Chen <chenkaijiang@gmail.com>
wrote:

Hi, I'm running PostgreSQL 9.4.10 on CentOS 6.5. It looks like that I
found a bug with left join. It is very URGENT since it is running in the
production servers.

### Conditions: ###

I have 2 tables:

TABLE 1: (2171209 records)
\d prescription_herbs
Table
"public.prescription_herbs"
Column | Type |
Modifiers
---------------------+--------------------------------+-----
------------------------------------------------------------
id | integer | not null default
nextval('prescription_herbs_id_seq'::regclass)
prescription_id | integer | not null
herb_id | integer | not null
weight | integer | not null
created_at | timestamp(0) without time zone | not null
updated_at | timestamp(0) without time zone | not null
deleted_at | timestamp(0) without time zone |
price | numeric(10,5) |
special_manufacture | character varying(255) |
cost | numeric(10,5) |
pharmacy_id | integer |
Indexes:
"prescription_herbs_pkey" PRIMARY KEY, btree (id)
"prescription_herbs_hid" btree (herb_id)
"prescription_herbs_prid" btree (prescription_id)

TABLE 2: (4406 records)

\d pharmacy_herbs
Table "public.pharmacy_herbs"
Column | Type |
Modifiers
-------------+--------------------------------+-------------
------------------------------------------------
id | integer | not null default
nextval('pharmacy_herbs_id_seq'::regclass)
pharmacy_id | integer |
herb_id | integer |
cost | numeric(10,5) |
price | numeric(10,5) |
no | character varying(255) |
deleted_at | timestamp(0) without time zone |
created_at | timestamp(0) without time zone | not null
updated_at | timestamp(0) without time zone | not null
name | character varying(255) |
Indexes:
"pharmacy_herbs_pkey" PRIMARY KEY, btree (id)
"pharmacy_herbs_herb_id" btree (herb_id)

### BUG: ###

I ran a SQL:
select pha.id,ph.herb_id,pha.name,ph.weight
from prescription_herbs as ph *left join* pharmacy_herbs as pha on
ph.herb_id=pha.herb_id
where ph.prescription_id=116285 and ph.deleted_at is null and
pha.deleted_at is null and pha.pharmacy_id=22 order by ph.herb_id;

Expected:

It should have 10 rows because the SQL "select * from prescription_herbs
as ph where ph.prescription_id=116285 and ph.deleted_at is null" returned
10 rows and I'm using *LEFT JOIN *in the above SQL.

Actual Result:

It returned only 9 rows and the result is the same as JOIN (not LEFT
JOIN).

### More info: ###

I explain the SQL:
explain select pha.id,ph.herb_id,pha.name,ph.weight
from prescription_herbs as ph left join pharmacy_herbs as pha on
ph.herb_id=pha.herb_id
where ph.prescription_id=116285 and ph.deleted_at is null and
pha.deleted_at is null and pha.pharmacy_id=22 order by ph.herb_id;

I got:

Sort (cost=131.73..131.76 rows=10 width=24)
Sort Key: ph.herb_id
-> * Hash Join* (cost=122.02..131.56 rows=10 width=24)
Hash Cond: (ph.herb_id = pha.herb_id)
-> Index Scan using prescription_herbs_prid on
prescription_herbs ph (cost=0.43..9.68 rows=23 width=8)
Index Cond: (prescription_id = 116285)
Filter: (deleted_at IS NULL)
-> Hash (cost=113.08..113.08 rows=681 width=20)
-> Seq Scan on pharmacy_herbs pha (cost=0.00..113.08
rows=681 width=20)
Filter: ((deleted_at IS NULL) AND (pharmacy_id = 22))

I think the above* "Hash Join" SHOULD BE "Hash Left Join"*, right?

I tried to explain another SQL:
explain select * from doctors d left join prescriptions p on d.id
=p.doctor_id;

I got:
Hash Right Join (cost=2159.33..31453.58 rows=130330 width=2936)
Hash Cond: (p.doctor_id = d.id)
-> Seq Scan on prescriptions p (cost=0.00..9273.30 rows=130330
width=495)
-> Hash (cost=576.37..576.37 rows=5037 width=2441)
-> Seq Scan on doctors d (cost=0.00..576.37 rows=5037
width=2441)

The "Hash Right Join" is the correct node.

Any help is very appreciated! WAITING...

Thanks,
Kaijiang

This is not a bug. The conditions of the "right" table pha - basically
(pha.pharmacy_id=22) and secondary ((pha.deleted_at is null) )- that you
have put in the WHERE clause make the query act as if it was an INNER join.

Move them to the ON and you'll get your 10 rows.

select ...
from prescription_herbs as ph *left join* pharmacy_herbs as pha
on ph.herb_id=pha.herb_id and pha.deleted_at is null and
pha.pharmacy_id=22
where ph.prescription_id=116285 and ph.deleted_at is null
order by ...

The (pha.deleted_at is null) part may give different results depending on
where it is placed (ON vs WHERE) but my guess - since you want all the 10
rows of the left table - is that it should be in ON, too

Pantelis

#5Kaijiang Chen
chenkaijiang@gmail.com
In reply to: Kaijiang Chen (#4)
Re: Urgent - SQL left join bug?

The correct SQL should be:

select pha.id,ph.herb_id,pha.name,ph.weight
from prescription_herbs as ph left join pharmacy_herbs as pha on
ph.herb_id=pha.herb_id *and pha.pharmacy_id=22*
where ph.prescription_id=116285 and ph.deleted_at is null and
pha.deleted_at is null order by ph.herb_id;

On Wed, Jun 21, 2017 at 4:26 PM, Kaijiang Chen <chenkaijiang@gmail.com>
wrote:

Show quoted text

Hi, Pantelis and Heikki, thank you very much for such a quick response!

I got it. I was so silly.....

On Wed, Jun 21, 2017 at 4:23 PM, Pantelis Theodosiou <ypercube@gmail.com>
wrote:

On Wed, Jun 21, 2017 at 9:06 AM, Kaijiang Chen <chenkaijiang@gmail.com>
wrote:

Hi, I'm running PostgreSQL 9.4.10 on CentOS 6.5. It looks like that I
found a bug with left join. It is very URGENT since it is running in the
production servers.

### Conditions: ###

I have 2 tables:

TABLE 1: (2171209 records)
\d prescription_herbs
Table
"public.prescription_herbs"
Column | Type |
Modifiers
---------------------+--------------------------------+-----
------------------------------------------------------------
id | integer | not null default
nextval('prescription_herbs_id_seq'::regclass)
prescription_id | integer | not null
herb_id | integer | not null
weight | integer | not null
created_at | timestamp(0) without time zone | not null
updated_at | timestamp(0) without time zone | not null
deleted_at | timestamp(0) without time zone |
price | numeric(10,5) |
special_manufacture | character varying(255) |
cost | numeric(10,5) |
pharmacy_id | integer |
Indexes:
"prescription_herbs_pkey" PRIMARY KEY, btree (id)
"prescription_herbs_hid" btree (herb_id)
"prescription_herbs_prid" btree (prescription_id)

TABLE 2: (4406 records)

\d pharmacy_herbs
Table "public.pharmacy_herbs"
Column | Type |
Modifiers
-------------+--------------------------------+-------------
------------------------------------------------
id | integer | not null default
nextval('pharmacy_herbs_id_seq'::regclass)
pharmacy_id | integer |
herb_id | integer |
cost | numeric(10,5) |
price | numeric(10,5) |
no | character varying(255) |
deleted_at | timestamp(0) without time zone |
created_at | timestamp(0) without time zone | not null
updated_at | timestamp(0) without time zone | not null
name | character varying(255) |
Indexes:
"pharmacy_herbs_pkey" PRIMARY KEY, btree (id)
"pharmacy_herbs_herb_id" btree (herb_id)

### BUG: ###

I ran a SQL:
select pha.id,ph.herb_id,pha.name,ph.weight
from prescription_herbs as ph *left join* pharmacy_herbs as pha on
ph.herb_id=pha.herb_id
where ph.prescription_id=116285 and ph.deleted_at is null and
pha.deleted_at is null and pha.pharmacy_id=22 order by ph.herb_id;

Expected:

It should have 10 rows because the SQL "select * from prescription_herbs
as ph where ph.prescription_id=116285 and ph.deleted_at is null" returned
10 rows and I'm using *LEFT JOIN *in the above SQL.

Actual Result:

It returned only 9 rows and the result is the same as JOIN (not LEFT
JOIN).

### More info: ###

I explain the SQL:
explain select pha.id,ph.herb_id,pha.name,ph.weight
from prescription_herbs as ph left join pharmacy_herbs as pha on
ph.herb_id=pha.herb_id
where ph.prescription_id=116285 and ph.deleted_at is null and
pha.deleted_at is null and pha.pharmacy_id=22 order by ph.herb_id;

I got:

Sort (cost=131.73..131.76 rows=10 width=24)
Sort Key: ph.herb_id
-> * Hash Join* (cost=122.02..131.56 rows=10 width=24)
Hash Cond: (ph.herb_id = pha.herb_id)
-> Index Scan using prescription_herbs_prid on
prescription_herbs ph (cost=0.43..9.68 rows=23 width=8)
Index Cond: (prescription_id = 116285)
Filter: (deleted_at IS NULL)
-> Hash (cost=113.08..113.08 rows=681 width=20)
-> Seq Scan on pharmacy_herbs pha (cost=0.00..113.08
rows=681 width=20)
Filter: ((deleted_at IS NULL) AND (pharmacy_id =
22))

I think the above* "Hash Join" SHOULD BE "Hash Left Join"*, right?

I tried to explain another SQL:
explain select * from doctors d left join prescriptions p on d.id
=p.doctor_id;

I got:
Hash Right Join (cost=2159.33..31453.58 rows=130330 width=2936)
Hash Cond: (p.doctor_id = d.id)
-> Seq Scan on prescriptions p (cost=0.00..9273.30 rows=130330
width=495)
-> Hash (cost=576.37..576.37 rows=5037 width=2441)
-> Seq Scan on doctors d (cost=0.00..576.37 rows=5037
width=2441)

The "Hash Right Join" is the correct node.

Any help is very appreciated! WAITING...

Thanks,
Kaijiang

This is not a bug. The conditions of the "right" table pha - basically
(pha.pharmacy_id=22) and secondary ((pha.deleted_at is null) )- that you
have put in the WHERE clause make the query act as if it was an INNER join.

Move them to the ON and you'll get your 10 rows.

select ...
from prescription_herbs as ph *left join* pharmacy_herbs as pha
on ph.herb_id=pha.herb_id and pha.deleted_at is null and
pha.pharmacy_id=22
where ph.prescription_id=116285 and ph.deleted_at is null
order by ...

The (pha.deleted_at is null) part may give different results depending on
where it is placed (ON vs WHERE) but my guess - since you want all the 10
rows of the left table - is that it should be in ON, too

Pantelis

#6Kaijiang Chen
chenkaijiang@gmail.com
In reply to: Kaijiang Chen (#5)
Re: Urgent - SQL left join bug?

Thank you PG guys! I fix the SQL and everything works so well!

On Wed, Jun 21, 2017 at 4:28 PM, Kaijiang Chen <chenkaijiang@gmail.com>
wrote:

Show quoted text

The correct SQL should be:

select pha.id,ph.herb_id,pha.name,ph.weight
from prescription_herbs as ph left join pharmacy_herbs as pha on
ph.herb_id=pha.herb_id *and pha.pharmacy_id=22*
where ph.prescription_id=116285 and ph.deleted_at is null and
pha.deleted_at is null order by ph.herb_id;

On Wed, Jun 21, 2017 at 4:26 PM, Kaijiang Chen <chenkaijiang@gmail.com>
wrote:

Hi, Pantelis and Heikki, thank you very much for such a quick response!

I got it. I was so silly.....

On Wed, Jun 21, 2017 at 4:23 PM, Pantelis Theodosiou <ypercube@gmail.com>
wrote:

On Wed, Jun 21, 2017 at 9:06 AM, Kaijiang Chen <chenkaijiang@gmail.com>
wrote:

Hi, I'm running PostgreSQL 9.4.10 on CentOS 6.5. It looks like that I
found a bug with left join. It is very URGENT since it is running in the
production servers.

### Conditions: ###

I have 2 tables:

TABLE 1: (2171209 records)
\d prescription_herbs
Table
"public.prescription_herbs"
Column | Type |
Modifiers
---------------------+--------------------------------+-----
------------------------------------------------------------
id | integer | not null
default nextval('prescription_herbs_id_seq'::regclass)
prescription_id | integer | not null
herb_id | integer | not null
weight | integer | not null
created_at | timestamp(0) without time zone | not null
updated_at | timestamp(0) without time zone | not null
deleted_at | timestamp(0) without time zone |
price | numeric(10,5) |
special_manufacture | character varying(255) |
cost | numeric(10,5) |
pharmacy_id | integer |
Indexes:
"prescription_herbs_pkey" PRIMARY KEY, btree (id)
"prescription_herbs_hid" btree (herb_id)
"prescription_herbs_prid" btree (prescription_id)

TABLE 2: (4406 records)

\d pharmacy_herbs
Table "public.pharmacy_herbs"
Column | Type |
Modifiers
-------------+--------------------------------+-------------
------------------------------------------------
id | integer | not null default
nextval('pharmacy_herbs_id_seq'::regclass)
pharmacy_id | integer |
herb_id | integer |
cost | numeric(10,5) |
price | numeric(10,5) |
no | character varying(255) |
deleted_at | timestamp(0) without time zone |
created_at | timestamp(0) without time zone | not null
updated_at | timestamp(0) without time zone | not null
name | character varying(255) |
Indexes:
"pharmacy_herbs_pkey" PRIMARY KEY, btree (id)
"pharmacy_herbs_herb_id" btree (herb_id)

### BUG: ###

I ran a SQL:
select pha.id,ph.herb_id,pha.name,ph.weight
from prescription_herbs as ph *left join* pharmacy_herbs as pha on
ph.herb_id=pha.herb_id
where ph.prescription_id=116285 and ph.deleted_at is null and
pha.deleted_at is null and pha.pharmacy_id=22 order by ph.herb_id;

Expected:

It should have 10 rows because the SQL "select * from
prescription_herbs as ph where ph.prescription_id=116285 and ph.deleted_at
is null" returned 10 rows and I'm using *LEFT JOIN *in the above SQL.

Actual Result:

It returned only 9 rows and the result is the same as JOIN (not LEFT
JOIN).

### More info: ###

I explain the SQL:
explain select pha.id,ph.herb_id,pha.name,ph.weight
from prescription_herbs as ph left join pharmacy_herbs as pha on
ph.herb_id=pha.herb_id
where ph.prescription_id=116285 and ph.deleted_at is null and
pha.deleted_at is null and pha.pharmacy_id=22 order by ph.herb_id;

I got:

Sort (cost=131.73..131.76 rows=10 width=24)
Sort Key: ph.herb_id
-> * Hash Join* (cost=122.02..131.56 rows=10 width=24)
Hash Cond: (ph.herb_id = pha.herb_id)
-> Index Scan using prescription_herbs_prid on
prescription_herbs ph (cost=0.43..9.68 rows=23 width=8)
Index Cond: (prescription_id = 116285)
Filter: (deleted_at IS NULL)
-> Hash (cost=113.08..113.08 rows=681 width=20)
-> Seq Scan on pharmacy_herbs pha (cost=0.00..113.08
rows=681 width=20)
Filter: ((deleted_at IS NULL) AND (pharmacy_id =
22))

I think the above* "Hash Join" SHOULD BE "Hash Left Join"*, right?

I tried to explain another SQL:
explain select * from doctors d left join prescriptions p on d.id
=p.doctor_id;

I got:
Hash Right Join (cost=2159.33..31453.58 rows=130330 width=2936)
Hash Cond: (p.doctor_id = d.id)
-> Seq Scan on prescriptions p (cost=0.00..9273.30 rows=130330
width=495)
-> Hash (cost=576.37..576.37 rows=5037 width=2441)
-> Seq Scan on doctors d (cost=0.00..576.37 rows=5037
width=2441)

The "Hash Right Join" is the correct node.

Any help is very appreciated! WAITING...

Thanks,
Kaijiang

This is not a bug. The conditions of the "right" table pha - basically
(pha.pharmacy_id=22) and secondary ((pha.deleted_at is null) )- that you
have put in the WHERE clause make the query act as if it was an INNER join.

Move them to the ON and you'll get your 10 rows.

select ...
from prescription_herbs as ph *left join* pharmacy_herbs as pha
on ph.herb_id=pha.herb_id and pha.deleted_at is null and
pha.pharmacy_id=22
where ph.prescription_id=116285 and ph.deleted_at is null
order by ...

The (pha.deleted_at is null) part may give different results depending
on where it is placed (ON vs WHERE) but my guess - since you want all the
10 rows of the left table - is that it should be in ON, too

Pantelis

#7Kaijiang Chen
chenkaijiang@gmail.com
In reply to: Kaijiang Chen (#6)
Re: Urgent - SQL left join bug?

The pg planner / optimizer is awesome! It optimizes the 1st SQL's left join
into inner join (and their are equivalent in that SQL) and made me come to
the silly conclusion that it is a "bug"...

On Wed, Jun 21, 2017 at 4:30 PM, Kaijiang Chen <chenkaijiang@gmail.com>
wrote:

Show quoted text

Thank you PG guys! I fix the SQL and everything works so well!

On Wed, Jun 21, 2017 at 4:28 PM, Kaijiang Chen <chenkaijiang@gmail.com>
wrote:

The correct SQL should be:

select pha.id,ph.herb_id,pha.name,ph.weight
from prescription_herbs as ph left join pharmacy_herbs as pha on
ph.herb_id=pha.herb_id *and pha.pharmacy_id=22*
where ph.prescription_id=116285 and ph.deleted_at is null and
pha.deleted_at is null order by ph.herb_id;

On Wed, Jun 21, 2017 at 4:26 PM, Kaijiang Chen <chenkaijiang@gmail.com>
wrote:

Hi, Pantelis and Heikki, thank you very much for such a quick response!

I got it. I was so silly.....

On Wed, Jun 21, 2017 at 4:23 PM, Pantelis Theodosiou <ypercube@gmail.com

wrote:

On Wed, Jun 21, 2017 at 9:06 AM, Kaijiang Chen <chenkaijiang@gmail.com>
wrote:

Hi, I'm running PostgreSQL 9.4.10 on CentOS 6.5. It looks like that I
found a bug with left join. It is very URGENT since it is running in the
production servers.

### Conditions: ###

I have 2 tables:

TABLE 1: (2171209 records)
\d prescription_herbs
Table
"public.prescription_herbs"
Column | Type |
Modifiers
---------------------+--------------------------------+-----
------------------------------------------------------------
id | integer | not null
default nextval('prescription_herbs_id_seq'::regclass)
prescription_id | integer | not null
herb_id | integer | not null
weight | integer | not null
created_at | timestamp(0) without time zone | not null
updated_at | timestamp(0) without time zone | not null
deleted_at | timestamp(0) without time zone |
price | numeric(10,5) |
special_manufacture | character varying(255) |
cost | numeric(10,5) |
pharmacy_id | integer |
Indexes:
"prescription_herbs_pkey" PRIMARY KEY, btree (id)
"prescription_herbs_hid" btree (herb_id)
"prescription_herbs_prid" btree (prescription_id)

TABLE 2: (4406 records)

\d pharmacy_herbs
Table "public.pharmacy_herbs"
Column | Type |
Modifiers
-------------+--------------------------------+-------------
------------------------------------------------
id | integer | not null default
nextval('pharmacy_herbs_id_seq'::regclass)
pharmacy_id | integer |
herb_id | integer |
cost | numeric(10,5) |
price | numeric(10,5) |
no | character varying(255) |
deleted_at | timestamp(0) without time zone |
created_at | timestamp(0) without time zone | not null
updated_at | timestamp(0) without time zone | not null
name | character varying(255) |
Indexes:
"pharmacy_herbs_pkey" PRIMARY KEY, btree (id)
"pharmacy_herbs_herb_id" btree (herb_id)

### BUG: ###

I ran a SQL:
select pha.id,ph.herb_id,pha.name,ph.weight
from prescription_herbs as ph *left join* pharmacy_herbs as pha on
ph.herb_id=pha.herb_id
where ph.prescription_id=116285 and ph.deleted_at is null and
pha.deleted_at is null and pha.pharmacy_id=22 order by ph.herb_id;

Expected:

It should have 10 rows because the SQL "select * from
prescription_herbs as ph where ph.prescription_id=116285 and ph.deleted_at
is null" returned 10 rows and I'm using *LEFT JOIN *in the above SQL.

Actual Result:

It returned only 9 rows and the result is the same as JOIN (not LEFT
JOIN).

### More info: ###

I explain the SQL:
explain select pha.id,ph.herb_id,pha.name,ph.weight
from prescription_herbs as ph left join pharmacy_herbs as pha on
ph.herb_id=pha.herb_id
where ph.prescription_id=116285 and ph.deleted_at is null and
pha.deleted_at is null and pha.pharmacy_id=22 order by ph.herb_id;

I got:

Sort (cost=131.73..131.76 rows=10 width=24)
Sort Key: ph.herb_id
-> * Hash Join* (cost=122.02..131.56 rows=10 width=24)
Hash Cond: (ph.herb_id = pha.herb_id)
-> Index Scan using prescription_herbs_prid on
prescription_herbs ph (cost=0.43..9.68 rows=23 width=8)
Index Cond: (prescription_id = 116285)
Filter: (deleted_at IS NULL)
-> Hash (cost=113.08..113.08 rows=681 width=20)
-> Seq Scan on pharmacy_herbs pha (cost=0.00..113.08
rows=681 width=20)
Filter: ((deleted_at IS NULL) AND (pharmacy_id =
22))

I think the above* "Hash Join" SHOULD BE "Hash Left Join"*, right?

I tried to explain another SQL:
explain select * from doctors d left join prescriptions p on d.id
=p.doctor_id;

I got:
Hash Right Join (cost=2159.33..31453.58 rows=130330 width=2936)
Hash Cond: (p.doctor_id = d.id)
-> Seq Scan on prescriptions p (cost=0.00..9273.30 rows=130330
width=495)
-> Hash (cost=576.37..576.37 rows=5037 width=2441)
-> Seq Scan on doctors d (cost=0.00..576.37 rows=5037
width=2441)

The "Hash Right Join" is the correct node.

Any help is very appreciated! WAITING...

Thanks,
Kaijiang

This is not a bug. The conditions of the "right" table pha - basically
(pha.pharmacy_id=22) and secondary ((pha.deleted_at is null) )- that you
have put in the WHERE clause make the query act as if it was an INNER join.

Move them to the ON and you'll get your 10 rows.

select ...
from prescription_herbs as ph *left join* pharmacy_herbs as pha
on ph.herb_id=pha.herb_id and pha.deleted_at is null and
pha.pharmacy_id=22
where ph.prescription_id=116285 and ph.deleted_at is null
order by ...

The (pha.deleted_at is null) part may give different results depending
on where it is placed (ON vs WHERE) but my guess - since you want all the
10 rows of the left table - is that it should be in ON, too

Pantelis