I don't understand something...

Started by Alexander Pyhalovover 14 years ago5 messagesgeneral

alp@rsu.ru

over 14 years ago

Hello.
I was asked a simple question. We have table employees:
\d employees
Table "public.employees"
Column | Type |
Modifiers
----------------+-----------------------------+-----------------------------------------------------------------
employee_id | integer | not null default
nextval('employees_employee_id_seq'::regclass)
first_name | character varying(20) |
last_name | character varying(25) | not null
email | character varying(25) | not null
phone_number | character varying(20) |
hire_date | timestamp without time zone | not null
job_id | character varying(10) | not null
salary | numeric(8,2) |
commission_pct | numeric(2,2) |
manager_id | integer |
department_id | integer |
Indexes:
"employees_pkey" PRIMARY KEY, btree (employee_id)
"emp_email_uk" UNIQUE, btree (email)
"emp_department_ix" btree (department_id)
"emp_job_ix" btree (job_id)
"emp_manager_ix" btree (manager_id)
"emp_name_ix" btree (last_name, first_name)
Check constraints:
"emp_salary_min" CHECK (salary > 0::numeric)
Foreign-key constraints:
"employees_department_id_fkey" FOREIGN KEY (department_id)
REFERENCES departments(department_id)
"employees_job_id_fkey" FOREIGN KEY (job_id) REFERENCES jobs(job_id)
"employees_manager_id_fkey" FOREIGN KEY (manager_id) REFERENCES
employees(employee_id)
Referenced by:
TABLE "departments" CONSTRAINT "dept_mgr_fk" FOREIGN KEY
(manager_id) REFERENCES employees(employee_id)
TABLE "employees" CONSTRAINT "employees_manager_id_fkey" FOREIGN
KEY (manager_id) REFERENCES employees(employee_id)
TABLE "job_history" CONSTRAINT "job_history_employee_id_fkey"
FOREIGN KEY (employee_id) REFERENCES employees(employee_id)

Now we want to select count of all employees who doesn't have any
subordinates (query 1):
SELECT count(employee_id) from employees o where not exists (select 1
from employees where manager_id=o.employee_id);
count
-------
89
(1 row)

We can select count of all managers (query 2):
SELECT count(employee_id) from employees where employee_id in (select
manager_id from employees);
count
-------
18
(1 row)

But if we reformulate the first query in the same way, answer is
different (query 3):
SELECT count(employee_id) from employees where employee_id not in
(select manager_id from employees) (query 3);
count
-------
0
(1 row)

I don't understand why queries 1 and 3 give different results. They
seems to be the same... Could someone explain the difference?

--
Best regards,
Alexander Pyhalov,
system administrator of Computer Center of Southern Federal University

Achilleas Mantzios

achill@matrix.gatewaynet.com

over 14 years ago

In reply to: Alexander Pyhalov (#1)

Re: I don't understand something...

Alexander, that's a classic one,
rewrite your last query as :

SELECT count(employee_id) from employees where employee_id not in
(select manager_id from employees WHERE manager_id IS NOT NULL);

NULLS semantics are sometimes not so obvious.

Στις Monday 03 October 2011 09:33:12 ο/η Alexander Pyhalov έγραψε:

Hello.
I was asked a simple question. We have table employees:
\d employees
Table "public.employees"
Column | Type |
Modifiers
----------------+-----------------------------+-----------------------------------------------------------------
employee_id | integer | not null default
nextval('employees_employee_id_seq'::regclass)
first_name | character varying(20) |
last_name | character varying(25) | not null
email | character varying(25) | not null
phone_number | character varying(20) |
hire_date | timestamp without time zone | not null
job_id | character varying(10) | not null
salary | numeric(8,2) |
commission_pct | numeric(2,2) |
manager_id | integer |
department_id | integer |
Indexes:
"employees_pkey" PRIMARY KEY, btree (employee_id)
"emp_email_uk" UNIQUE, btree (email)
"emp_department_ix" btree (department_id)
"emp_job_ix" btree (job_id)
"emp_manager_ix" btree (manager_id)
"emp_name_ix" btree (last_name, first_name)
Check constraints:
"emp_salary_min" CHECK (salary > 0::numeric)
Foreign-key constraints:
"employees_department_id_fkey" FOREIGN KEY (department_id)
REFERENCES departments(department_id)
"employees_job_id_fkey" FOREIGN KEY (job_id) REFERENCES jobs(job_id)
"employees_manager_id_fkey" FOREIGN KEY (manager_id) REFERENCES
employees(employee_id)
Referenced by:
TABLE "departments" CONSTRAINT "dept_mgr_fk" FOREIGN KEY
(manager_id) REFERENCES employees(employee_id)
TABLE "employees" CONSTRAINT "employees_manager_id_fkey" FOREIGN
KEY (manager_id) REFERENCES employees(employee_id)
TABLE "job_history" CONSTRAINT "job_history_employee_id_fkey"
FOREIGN KEY (employee_id) REFERENCES employees(employee_id)

Now we want to select count of all employees who doesn't have any
subordinates (query 1):
SELECT count(employee_id) from employees o where not exists (select 1
from employees where manager_id=o.employee_id);
count
-------
89
(1 row)

We can select count of all managers (query 2):
SELECT count(employee_id) from employees where employee_id in (select
manager_id from employees);
count
-------
18
(1 row)

But if we reformulate the first query in the same way, answer is
different (query 3):
SELECT count(employee_id) from employees where employee_id not in
(select manager_id from employees) (query 3);
count
-------
0
(1 row)

I don't understand why queries 1 and 3 give different results. They
seems to be the same... Could someone explain the difference?

--
Best regards,
Alexander Pyhalov,
system administrator of Computer Center of Southern Federal University

--
Achilleas Mantzios

Alexander Pyhalov

alp@rsu.ru

over 14 years ago

In reply to: Achilleas Mantzios (#2)

Re: I don't understand something...

On 10/03/2011 11:19, Achilleas Mantzios wrote:

Alexander, that's a classic one,
rewrite your last query as :

SELECT count(employee_id) from employees where employee_id not in
(select manager_id from employees WHERE manager_id IS NOT NULL);

NULLS semantics are sometimes not so obvious.

Thanks.
It's confusing, that IN check works as expected, but NOT IN works this
way...

--
Best regards,
Alexander Pyhalov,
system administrator of Computer Center of Southern Federal University

hubert depesz lubaczewski

depesz@depesz.com

over 14 years ago

In reply to: Alexander Pyhalov (#3)

Re: I don't understand something...

On Mon, Oct 03, 2011 at 11:48:45AM +0400, Alexander Pyhalov wrote:

On 10/03/2011 11:19, Achilleas Mantzios wrote:

Alexander, that's a classic one,
rewrite your last query as :

SELECT count(employee_id) from employees where employee_id not in
(select manager_id from employees WHERE manager_id IS NOT NULL);

NULLS semantics are sometimes not so obvious.

Thanks.
It's confusing, that IN check works as expected, but NOT IN works
this way...

If I might plug:
http://www.depesz.com/index.php/2008/08/13/nulls-vs-not-in/

Best regards,

depesz

--
The best thing about modern society is how easy it is to avoid contact with it.
http://depesz.com/

Alban Hertroys

haramrae@gmail.com

over 14 years ago

In reply to: Alexander Pyhalov (#1)

Re: I don't understand something...

On 3 October 2011 08:33, Alexander Pyhalov <alp@rsu.ru> wrote:

Now we want to select count of all employees who doesn't have any
subordinates (query 1):
SELECT count(employee_id) from employees o where not exists (select 1 from
employees where manager_id=o.employee_id);
count
-------
89
(1 row)

We can select count of all managers (query 2):
SELECT count(employee_id) from employees where employee_id in (select
manager_id from employees);
count
-------
18
(1 row)

But if we reformulate the first query in the same way, answer is different
(query 3):
SELECT count(employee_id) from employees where employee_id not in (select
manager_id from employees) (query 3);
count
-------
0
(1 row)

I don't understand why queries 1 and 3 give different results. They seems to
be the same... Could someone explain the difference?

That's because NOT IN returns NULL if there are any NULLs in the list.
As the WHERE-clause requires something to evaluate to either true or
false (NULL won't do), you (correctly) get false if someone is a
manager, but also if _anyone_ is NOT a manager.

That's an artefact of how 3-valued logic is implemented in the SQL standard.

--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.