BUG #7509: x NOT IN (select x from z) extremely slow in compare to select x from y except select x from z;

Started by Stefan de Koninkover 13 years ago2 messagesbugs
Jump to latest
#1Stefan de Konink
stefan@konink.de

The following bug has been logged on the website:

Bug reference: 7509
Logged by: Stefan de Konink
Email address: stefan@konink.de
PostgreSQL version: 9.1.5
Operating system: Linux
Description:

The following is relatively fast:

bag-2012-aug=# explain select count(*) from (select kvk from kvk_normal
except select kvk from bag_kvk) as x;
QUERY PLAN

----------------------------------------------------------------------------------------------------------
Aggregate (cost=1110465.88..1110465.89 rows=1 width=0)
-> Subquery Scan on x (cost=1042163.45..1102413.23 rows=3221060
width=0)
-> SetOp Except (cost=1042163.45..1070202.63 rows=3221060
width=8)
-> Sort (cost=1042163.45..1056183.04 rows=5607836 width=8)
Sort Key: "*SELECT* 1".kvk
-> Append (cost=0.00..183539.72 rows=5607836
width=8)
-> Subquery Scan on "*SELECT* 1"
(cost=0.00..122902.20 rows=3221060 width=8)
-> Seq Scan on kvk_normal
(cost=0.00..90691.60 rows=3221060 width=8)
-> Subquery Scan on "*SELECT* 2"
(cost=0.00..60637.52 rows=2386776 width=8)
-> Seq Scan on bag_kvk
(cost=0.00..36769.76 rows=2386776 width=8)

The 'normal' case basically doesn't finish:

bag-2012-aug=# explain select count(*) from (select kvk_normal.kvk from
kvk_normal where kvk_normal.kvk not in (select bag_kvk.kvk from bag_kvk)) as
x;
QUERY PLAN

-------------------------------------------------------------------------------------
Aggregate (cost=103065293697.97..103065293697.98 rows=1 width=0)
-> Seq Scan on kvk_normal (cost=0.00..103065289671.65 rows=1610530
width=0)
Filter: (NOT (SubPlan 1))
SubPlan 1
-> Materialize (cost=0.00..58027.64 rows=2386776 width=8)
-> Seq Scan on bag_kvk (cost=0.00..36769.76 rows=2386776
width=8)
(6 rows)

Table size is 3.2mil rows in adres, and 2.3mil rows in bag_kvk.

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Stefan de Konink (#1)
Re: BUG #7509: x NOT IN (select x from z) extremely slow in compare to select x from y except select x from z;

stefan@konink.de writes:

The following is relatively fast:
bag-2012-aug=# explain select count(*) from (select kvk from kvk_normal
except select kvk from bag_kvk) as x;

The 'normal' case basically doesn't finish:
bag-2012-aug=# explain select count(*) from (select kvk_normal.kvk from
kvk_normal where kvk_normal.kvk not in (select bag_kvk.kvk from bag_kvk)) as
x;

NOT IN is difficult to optimize, as well as hard to use, because of its
rather bizarre behavior for nulls. You might consider using NOT EXISTS
instead.

regards, tom lane