Damn slow query

Started by Magnus Naeslund(f)over 23 years ago7 messages

Hello, i've got this query that's really slow...
Figure this:

testdb=> select now() ; select gid from bs where gid not in ( select x
from z2test ); select now();
now
-------------------------------
2002-10-09 22:37:21.234627+02
(1 row)

gid
----------
<lotsa rows>
(524 rows)
now
-------------------------------
2002-10-09 23:20:53.227844+02
(1 row)

That's 45 minutes i don't wanna spend in there...
I got indexes:

testdb=> \d bs_gid_idx
Index "bs_gid_idx"
Column | Type
--------+-----------------------
gid | character varying(16)
online | smallint
btree

testdb=> \d z2test_x_idx;
Index "z2test_x_idx"
Column | Type
--------+-----------------------
x | character varying(16)
btree

Rowcounts are:

testdb=> select count(*) from bs ; select count(*) from z2test ;
count
-------
25376
(1 row)

count
-------
19329
(1 row)

The bs table have many other columns besides the gid one, the z2test
table only has the x column.

How can i speed this query up?
It never scans by the indexes.
I know it's a lot of iterations anyway i do it, but this is too damn
slow.

I can't profile anything at this box, because it's in production state,
but if you really want me to, i'll do it tomorrow on another box.

Magnus

--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Programmer/Networker [|] Magnus Naeslund
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

#2Stephan Szabo
sszabo@megazone23.bigpanda.com
In reply to: Magnus Naeslund(f) (#1)
Re: Damn slow query

On Wed, 9 Oct 2002, Magnus Naeslund(f) wrote:

Hello, i've got this query that's really slow...
Figure this:

testdb=> select now() ; select gid from bs where gid not in ( select x
from z2test ); select now();

Per FAQ suggestion, try something like
select gid from bs where not exists (select * from z2test where
z2test.x=bs.gid);
to see if it is faster.

#3Joe Conway
mail@joeconway.com
In reply to: Magnus Naeslund(f) (#1)
Re: Damn slow query

Magnus Naeslund(f) wrote:

Hello, i've got this query that's really slow...
Figure this:

testdb=> select now() ; select gid from bs where gid not in ( select x
from z2test ); select now();

"IN (subselect)" is notoriously slow (in fact it is an FAQ). Can you rewrite
this as:

select b.gid from bs b where not exists (select 1 from z2test z where z.x =
b.gid);

or possibly:

select b.gid from bs b left join z2test z on z.x = b.gid where z.x IS NULL;

HTH,

Joe

In reply to: Magnus Naeslund(f) (#1)
Re: Damn slow query

Joe Conway <mail@joeconway.com> wrote:

"IN (subselect)" is notoriously slow (in fact it is an FAQ). Can you
rewrite this as:

...

Stephan Szabo <sszabo@megazone23.bigpanda.com> wrote:

Per FAQ suggestion, try something like

...

Thanks alot, below are the results on your suggestions, quite an
dramatic differance (but this is another box, faster, and running 7.3b2
so the 45 minutes doesn't hold here, but it took more than 10 minutes
before i stopped the original query).

Is this an todo item, or should every user figure this out (yeah i know
i should have read the FAQ when it went so totally bad).
The NOT IN it seems quite natural here, but then again, i don't think as
the db as you do :)

mag=> \timing
Timing is on.
mag=> explain analyze select count(gid) from bs where not exists (
select * from z2test where z2test.x=bs.gid );
Aggregate (cost=129182.18..129182.18 rows=1 width=9) (actual
time=590.90..590.90 rows=1 loops=1)
-> Seq Scan on bs (cost=0.00..129150.46 rows=12688 width=9) (actual
time=42.57..590.46 rows=524 loops=1)
Filter: (NOT (subplan))
SubPlan
-> Index Scan using z2temp_x_idx on z2test (cost=0.00..5.07
rows=1 width=9) (actual time=0.02..0.02 rows=1 loops=25376)
Index Cond: (x = $0)
Total runtime: 591.01 msec

Time: 592.25 ms

mag=> EXPLAIN analyze select count(b.gid) from bs b left join z2test z
on z.x = b.gid where z.x IS NULL;
Aggregate (cost=1703.65..1703.65 rows=1 width=18) (actual
time=370.31..370.31 rows=1 loops=1)
-> Hash Join (cost=346.61..1640.21 rows=25376 width=18) (actual
time=75.45..369.91 rows=524 loops=1)
Hash Cond: ("outer".gid = "inner".x)
Filter: ("inner".x IS NULL)
-> Seq Scan on bs b (cost=0.00..595.76 rows=25376 width=9)
(actual time=0.01..34.20 rows=25376 loops=1)
-> Hash (cost=298.29..298.29 rows=19329 width=9) (actual
time=43.82..43.82 rows=0 loops=1)
-> Seq Scan on z2test z (cost=0.00..298.29 rows=19329
width=9) (actual time=0.02..22.69 rows=19329 loops=1)
Total runtime: 370.42 msec

Time: 371.90 ms
mag=>

Magnus

#5Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Magnus Naeslund(f) (#4)
Re: Damn slow query

Magnus Naeslund(f) wrote:

Joe Conway <mail@joeconway.com> wrote:

"IN (subselect)" is notoriously slow (in fact it is an FAQ). Can you
rewrite this as:

...

Stephan Szabo <sszabo@megazone23.bigpanda.com> wrote:

Per FAQ suggestion, try something like

...

Thanks alot, below are the results on your suggestions, quite an
dramatic differance (but this is another box, faster, and running 7.3b2
so the 45 minutes doesn't hold here, but it took more than 10 minutes
before i stopped the original query).

Is this an todo item, or should every user figure this out (yeah i know
i should have read the FAQ when it went so totally bad).
The NOT IN it seems quite natural here, but then again, i don't think as
the db as you do :)

We already have a TODO item:

* Allow Subplans to use efficient joins(hash, merge) with upper variable

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
In reply to: Bruce Momjian (#5)
Re: Damn slow query

Bruce Momjian <pgman@candle.pha.pa.us> wrote:

We already have a TODO item:

* Allow Subplans to use efficient joins(hash, merge) with upper
variable

Cool.
One thing to note here is that the JOIN query that Joe suggested is both
faster than the subselect thing (no suprise) but also don't care if
z2test has an index on it or not.
The subselect query started taking huge amount of time again if i
dropped the z2test_x_idx ...

So if the todo could somehow figure out that that subselect should be an
JOIN instead of an NOT EXISTS query, that would be great, because the
index on z2test isn't that super-obvious (i think, because i know the
data is tiny).

Magnus

--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Programmer/Networker [|] Magnus Naeslund
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

#7Joe Conway
mail@joeconway.com
In reply to: Bruce Momjian (#5)
Re: Damn slow query

Magnus Naeslund(f) wrote:

One thing to note here is that the JOIN query that Joe suggested is both
faster than the subselect thing (no suprise) but also don't care if
z2test has an index on it or not.

It's worth noting though that JOIN is not always the fastest method. I've
found situations where NOT EXISTS was significantly faster than the LEFT JOIN
method (although both are usually orders of magnatude faster than NOT IN).

Joe