"not in" clause too slow?

Started by Ottavio Campanaover 18 years ago7 messagesgeneral

Ottavio Campana

ottavio@campana.vi.it

over 18 years ago

mytable has 1857 rows, copy_mytable is a copy of mytable and I want to
know which new rows have been entered. I used the where id not in, and
the query works. My problem is that if I run the same command on another
table with 378415 rows, it is terribly slow. I ran explain analyze on
the first table, just to see how is works, and I have two questions:

1) from explain analyze output, does the planner scan mytable and for
each element runs a sec sqn on copy_mytable?

2) how can I speed it up? by using indexes? or by changing the query?

db=# EXPLAIN ANALYZE select * from mytable where id not in (select id
from copy_mytable);
QUERY PLAN

----------------------------------------------------------------------------------------------------------------------
Seq Scan on mytable (cost=53.21..148.34 rows=925 width=96) (actual
time=9.813..9.813 rows=0 loops=1)
Filter: (NOT (hashed subplan))
SubPlan
-> Seq Scan on copy_mytable (cost=0.00..48.57 rows=1857 width=4)
(actual time=0.031..3.132 rows=1857 loops=1)
Total runtime: 10.291 ms

a.hertroys@magproductions.nl

over 18 years ago

In reply to: Ottavio Campana (#1)

Re: "not in" clause too slow?

Ottavio Campana wrote:

2) how can I speed it up? by using indexes? or by changing the query?

Do you have indices on mytable.id and copy_mytable.id?
Does using NOT EXISTS get you any better results?

db=# EXPLAIN ANALYZE select * from mytable where id not in (select id
from copy_mytable);
QUERY PLAN

----------------------------------------------------------------------------------------------------------------------
Seq Scan on mytable (cost=53.21..148.34 rows=925 width=96) (actual
time=9.813..9.813 rows=0 loops=1)
Filter: (NOT (hashed subplan))
SubPlan
-> Seq Scan on copy_mytable (cost=0.00..48.57 rows=1857 width=4)
(actual time=0.031..3.132 rows=1857 loops=1)
Total runtime: 10.291 ms

--
Alban Hertroys
alban@magproductions.nl

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
7500 AK Enschede

// Integrate Your World //

Ottavio Campana

ottavio@campana.vi.it

over 18 years ago

In reply to: Alban Hertroys (#2)

Re: "not in" clause too slow?

Alban Hertroys ha scritto:

Ottavio Campana wrote:

2) how can I speed it up? by using indexes? or by changing the query?

Do you have indices on mytable.id and copy_mytable.id?
Does using NOT EXISTS get you any better results?

mytable.id is primary key.

I create copy_mytable with
create table copy_mytable as (select * from mytable);

the planer behavior does not change no matter if I create and index on
copy_mytable.id or not.

Show quoted text

db=# EXPLAIN ANALYZE select * from mytable where id not in (select id
from copy_mytable);
QUERY PLAN

----------------------------------------------------------------------------------------------------------------------
Seq Scan on mytable (cost=53.21..148.34 rows=925 width=96) (actual
time=9.813..9.813 rows=0 loops=1)
Filter: (NOT (hashed subplan))
SubPlan
-> Seq Scan on copy_mytable (cost=0.00..48.57 rows=1857 width=4)
(actual time=0.031..3.132 rows=1857 loops=1)
Total runtime: 10.291 ms

Rodrigo De León

rdeleonp@gmail.com

over 18 years ago

In reply to: Ottavio Campana (#1)

Re: "not in" clause too slow?

On 9/21/07, Ottavio Campana <ottavio@campana.vi.it> wrote:

My problem is that if I run the same command on another
table with 378415 rows, it is terribly slow.

How much is "terribly slow"?

Did you VACUUM ANALYZE?

Anyways, try this:

SELECT * FROM MYTABLE T1
LEFT JOIN COPY_MYTABLE T2
ON T1.ID = T2.ID
WHERE T2.ID IS NULL

panther-d@freemail.hu

over 18 years ago

In reply to: Ottavio Campana (#3)

Re: "not in" clause too slow?

Ottavio Campana <ottavio@campana.vi.it> írta:

mytable.id is primary key.

I create copy_mytable with
create table copy_mytable as (select * from mytable);

the planer behavior does not change no matter if I create and index
on copy_mytable.id or not.

Try this, with an index on copy_mytable.id:

SELECT orig.* FROM mytable AS orig WHERE NOT EXISTS (
SELECT 1 FROM copy_mytable AS copy WHERE copy.id = orig.id
);

Regards,
Denes Daniel

Olvasd az [origo]-t a mobilodon: mini magazinok a Mobizin-en
___________________________________________________
www.t-mobile.hu/mobizin

Ottavio Campana

ottavio@campana.vi.it

over 18 years ago

In reply to: Alban Hertroys (#2)

Re: "not in" clause too slow?

Alban Hertroys ha scritto:

Ottavio Campana wrote:

2) how can I speed it up? by using indexes? or by changing the query?

Do you have indices on mytable.id and copy_mytable.id?
Does using NOT EXISTS get you any better results?

Eventually I had to select not all the table fields but only the primary
key and successively loop on the table again. Thus I've been able to
exploit the indexes.

Martijn van Oosterhout

kleptog@svana.org

over 18 years ago

In reply to: Ottavio Campana (#1)

Re: "not in" clause too slow?

On Fri, Sep 21, 2007 at 12:09:50PM +0200, Ottavio Campana wrote:

2) how can I speed it up? by using indexes? or by changing the query?

Note that NOT IN cannot be optimised in the same way as NOT EXISTS due
to the different ways they handle NULL. In particular if the subquery
of the NOT IN produces a NULL *anywhere* it will always return FALSE,
hence it often needs to scan the entire subquery even when an index
might be better. You might know this cannot happen, but postgres can't
always tell. NOT EXISTS doesn't have this problem.

Blame the SQL standard if you like.
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Show quoted text

From each according to his ability. To each according to his ability to litigate.