Instances where enable_seqscan = false is good

Started by Ow Mun Hengover 18 years ago7 messagesgeneral
Jump to latest
#1Ow Mun Heng
Ow.Mun.Heng@wdc.com

Same query, executed twice, once using seqscan enabled and the other
with it disabled. Difference is nearly night and day.

How can I persuade PG to use the index w/o resorting to setting seqscan
= false

(actually, I don't know what are the pro or cons - I read posts from the
archives far back as 2005 but that was dealing with inconsistencies in
the text string eg: warwa(s/z/etc..) which caused it to pick seq scans.)

PPl in IRC suggested setting default_statistics = 100 but I didn't find
that useful nor helpful. Also, tables has been vacuum and analysed.

Using Seq-scans
---------------

QUERY PLAN
--------------------------------------------------------------------------------------------------
Limit (cost=4430.53..50173.70 rows=1000 width=47) (actual
time=21832.092..43771.536 rows=228 loops=1)
-> Hash Join (cost=4430.53..260866.77 rows=5606 width=47) (actual
time=21832.088..43770.927 rows=228 loops=1)
Hash Cond: ((trz.number)::text = (drv.number)::text)
-> Seq Scan on zone trz (cost=0.00..233254.27 rows=6148222
width=39) (actual time=22.807..31891.591 rows=6181910 loops=1)
Filter: ((zn_2 <> 0::numeric) AND (zn_2 IS NOT NULL))
-> Hash (cost=4414.39..4414.39 rows=1291 width=24) (actual
time=171.911..171.911 rows=12591 loops=1)
-> Bitmap Heap Scan on drv (cost=30.44..4414.39
rows=1291 width=24) (actual time=62.980..142.594 rows=12591 loops=1)
Recheck Cond: ((code)::text = 'NRN15'::text)
-> Bitmap Index Scan on idx_drv (cost=0.00..30.11
rows=1291 width=0) (actual time=62.199..62.199 rows=12649 loops=1)
Index Cond: ((code)::text = 'NRN15'::text)
Total runtime: 43772.045 ms
(11 rows)

set enable_seqscan = false;

QUERY PLAN
-----------------------------------------------------------------------------------------------------------
--
Limit (cost=0.00..69314.54 rows=1000 width=47) (actual
time=122.920..553.538 rows=228 loops=1)
-> Nested Loop (cost=0.00..388646.63 rows=5607 width=47) (actual
time=122.915..552.956 rows=228 loops=1)
-> Index Scan using idx_drv on drv (cost=0.00..5077.64
rows=1293 width=24) (actual time=38.164..110.933 rows=12591 loops=1)
Index Cond: ((code)::text = 'NRN15'::text)
-> Index Scan using idx_trz_sn on zone trz (cost=0.00..295.10
rows=120 width=39) (actual time=0.021..0.021 rows=0 loops=12591)
Index Cond: ((drv.number)::text = (trz.number)::text)
Filter: ((zn_2 <> 0::numeric) AND (zn_2 IS NOT NULL))
Total runtime: 553.964 ms
(8 rows)

Attachments:

pg_seq_scan.txttext/plain; charset=UTF-8; name=pg_seq_scan.txtDownload
#2Bruce Momjian
bruce@momjian.us
In reply to: Ow Mun Heng (#1)
Re: Instances where enable_seqscan = false is good

"Ow Mun Heng" <Ow.Mun.Heng@wdc.com> writes:

Same query, executed twice, once using seqscan enabled and the other
with it disabled. Difference is nearly night and day.

How can I persuade PG to use the index w/o resorting to setting seqscan
= false

The usual knob to fiddle with is random_page_cost. If your database fits
mostly in memory you may want to turn it down from the default of 4 to
something closer to 1. Perhaps 2 or even 1.5 or so. But don't do it based on a
single query under testing conditions, use a wide variety of queries under
production conditions.

QUERY PLAN
--------------------------------------------------------------------------------------------------
Limit (cost=4430.53..50173.70 rows=1000 width=47) (actual time=21832.092..43771.536 rows=228 loops=1)
-> Hash Join (cost=4430.53..260866.77 rows=5606 width=47) (actual time=21832.088..43770.927 rows=228 loops=1)

The difference between the predicted and actual rows is suspicious. let's look
lower down to see where it comes from.

Hash Cond: ((trz.number)::text = (drv.number)::text)
-> Seq Scan on zone trz (cost=0.00..233254.27 rows=6148222 width=39) (actual time=22.807..31891.591 rows=6181910 loops=1)
Filter: ((zn_2 <> 0::numeric) AND (zn_2 IS NOT NULL))

This part looks ok 615k versus 618k is pretty good.

-> Hash (cost=4414.39..4414.39 rows=1291 width=24) (actual time=171.911..171.911 rows=12591 loops=1)

Ah, this is off by an order of magnitude, that's bad.

-> Bitmap Heap Scan on drv (cost=30.44..4414.39 rows=1291 width=24) (actual time=62.980..142.594 rows=12591 loops=1)
Recheck Cond: ((code)::text = 'NRN15'::text)
-> Bitmap Index Scan on idx_drv (cost=0.00..30.11 rows=1291 width=0) (actual time=62.199..62.199 rows=12649 loops=1)
Index Cond: ((code)::text = 'NRN15'::text)

So you might want to increase the statistics target for the "code" column.

Incidentally the way this is written makes me wonder what data type "code" is
defined as.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

#3Ow Mun Heng
Ow.Mun.Heng@wdc.com
In reply to: Bruce Momjian (#2)
Re: Instances where enable_seqscan = false is good

On Mon, 2007-09-03 at 11:31 +0100, Gregory Stark wrote:

"Ow Mun Heng" <Ow.Mun.Heng@wdc.com> writes:

How can I persuade PG to use the index w/o resorting to setting seqscan
= false

The usual knob to fiddle with is random_page_cost. If your database fits
mostly in memory you may want to turn it down from the default of 4 to
something closer to 1.

I tried down to 0.4 before it resorted to using the index. The DB
shouldn't fit into memory (I think) that table alone has ~8million rows
at ~1.5G size

-> Hash (cost=4414.39..4414.39 rows=1291 width=24) (actual time=171.911..171.911 rows=12591 loops=1)

Ah, this is off by an order of magnitude, that's bad.

having said so, still don't understand why..

-> Bitmap Heap Scan on drv (cost=30.44..4414.39 rows=1291 width=24) (actual time=62.980..142.594 rows=12591 loops=1)
Recheck Cond: ((code)::text = 'NRN15'::text)
-> Bitmap Index Scan on idx_drv (cost=0.00..30.11 rows=1291 width=0) (actual time=62.199..62.199 rows=12649 loops=1)
Index Cond: ((code)::text = 'NRN15'::text)

So you might want to increase the statistics target for the "code" column.

Err.. how come? (newbie) it's scanning the index there. What's bad is
that it's using Seq_scans on the "zone" table.

Incidentally the way this is written makes me wonder what data type "code" is
defined as.

code is defined as varchar(5) data type. I'm changing all of the normal
char(n) to varchar(n) columns..

BTW, thanks for helping. Not using seq scans does really make a huge
difference as you can clearly see from the timing.

Total runtime: 43772.045 ms
Total runtime: 553.964 ms

#4Ow Mun Heng
Ow.Mun.Heng@wdc.com
In reply to: Ow Mun Heng (#3)
Re: Instances where enable_seqscan = false is good

On Tue, 2007-09-04 at 10:06 +0800, Ow Mun Heng wrote:

On Mon, 2007-09-03 at 11:31 +0100, Gregory Stark wrote:

"Ow Mun Heng" <Ow.Mun.Heng@wdc.com> writes:

-> Bitmap Heap Scan on drv (cost=30.44..4414.39 rows=1291 width=24) (actual time=62.980..142.594 rows=12591 loops=1)
Recheck Cond: ((code)::text = 'NRN15'::text)
-> Bitmap Index Scan on idx_drv (cost=0.00..30.11 rows=1291 width=0) (actual time=62.199..62.199 rows=12649 loops=1)
Index Cond: ((code)::text = 'NRN15'::text)

So you might want to increase the statistics target for the "code" column.

Err.. how come? (newbie) it's scanning the index there. What's bad is
that it's using Seq_scans on the "zone" table.

Code is now increased to Stat level of 100

----------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=25113.04..30733.57 rows=1000 width=47) (actual time=19666.832..39961.032 rows=228 loops=1)
-> Hash Join (cost=25113.04..324620.08 rows=53288 width=47) (actual time=19666.826..39960.437 rows=228 loops=1)
Hash Cond: ((trz.number)::text = (drv.number)::text)
-> Seq Scan on zone trz (cost=0.00..234363.75 rows=6394431 width=39) (actual time=17.635..29164.929 rows=6222984 loops=1)
Filter: ((zn_2 <> 0::numeric) AND (zn_2 IS NOT NULL))
-> Hash (cost=24965.72..24965.72 rows=11785 width=24) (actual time=215.851..215.851 rows=12591 loops=1)
-> Bitmap Heap Scan on drv (cost=243.76..24965.72 rows=11785 width=24) (actual time=50.910..188.894 rows=12591 loops=1)
Recheck Cond: ((code)::text = 'NRN15'::text)
-> Bitmap Index Scan on idx_drv (cost=0.00..240.82 rows=11785 width=0) (actual time=49.180..49.180 rows=12591 loops=1)
Index Cond: ((code)::text = 'NRN15'::text)
Total runtime: 39961.703 ms

Does seem to be slightly better (from 43772ms)

trz.number stat level increased to 100 & code to 100

QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=25113.04..30733.57 rows=1000 width=47) (actual time=22152.398..42430.820 rows=228 loops=1)
-> Hash Join (cost=25113.04..324620.08 rows=53288 width=47) (actual time=22152.392..42430.212 rows=228 loops=1)
Hash Cond: ((trz.number)::text = (drv.number)::text)
-> Seq Scan on zone trz (cost=0.00..234363.75 rows=6394431 width=39) (actual time=11.840..28808.222 rows=6222984 loops=1)
Filter: ((zn_2 <> 0::numeric) AND (zn_2 IS NOT NULL))
-> Hash (cost=24965.72..24965.72 rows=11785 width=24) (actual time=2646.652..2646.652 rows=12591 loops=1)
-> Bitmap Heap Scan on drv (cost=243.76..24965.72 rows=11785 width=24) (actual time=50.628..2600.132 rows=12591 loops=1)
Recheck Cond: ((code)::text = 'NRN15'::text)
-> Bitmap Index Scan on idx_drvl (cost=0.00..240.82 rows=11785 width=0) (actual time=38.436..38.436 rows=12591 loops=1)
Index Cond: ((code)::text = 'NRN15'::text)
Total runtime: 42431.358 ms

hmm..not much difference..

What else can be done?? Many Thanks..

#5Bruce Momjian
bruce@momjian.us
In reply to: Ow Mun Heng (#3)
Re: Instances where enable_seqscan = false is good

"Ow Mun Heng" <Ow.Mun.Heng@wdc.com> writes:

On Mon, 2007-09-03 at 11:31 +0100, Gregory Stark wrote:

"Ow Mun Heng" <Ow.Mun.Heng@wdc.com> writes:

How can I persuade PG to use the index w/o resorting to setting seqscan
= false

The usual knob to fiddle with is random_page_cost. If your database fits
mostly in memory you may want to turn it down from the default of 4 to
something closer to 1.

I tried down to 0.4 before it resorted to using the index. The DB
shouldn't fit into memory (I think) that table alone has ~8million rows
at ~1.5G size

Values under 1 are nonsensical. Basically being as low as 1 means you're
telling the database that a random access i/o takes the same amount of time as
a sequential i/o. (Actually we have sequential_page_cost now so I guess
instead of "1" I should say "the same as sequential_page_cost" but I'm
assuming you haven't modified sequential_page_cost from the default of 1 have
you?)

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

#6Ow Mun Heng
Ow.Mun.Heng@wdc.com
In reply to: Bruce Momjian (#5)
Re: Instances where enable_seqscan = false is good

On Tue, 2007-09-04 at 05:15 +0100, Gregory Stark wrote:

"Ow Mun Heng" <Ow.Mun.Heng@wdc.com> writes:

On Mon, 2007-09-03 at 11:31 +0100, Gregory Stark wrote:

"Ow Mun Heng" <Ow.Mun.Heng@wdc.com> writes:

How can I persuade PG to use the index w/o resorting to setting seqscan
= false

The usual knob to fiddle with is random_page_cost. If your database fits
mostly in memory you may want to turn it down from the default of 4 to
something closer to 1.

I tried down to 0.4 before it resorted to using the index. The DB
shouldn't fit into memory (I think) that table alone has ~8million rows
at ~1.5G size

Values under 1 are nonsensical.

exactly, might as well use enable_seqscan=false. So it's still default
at 4

Basically being as low as 1 means you're
telling the database that a random access i/o takes the same amount of time as
a sequential i/o. (Actually we have sequential_page_cost now so I guess
instead of "1" I should say "the same as sequential_page_cost" but I'm
assuming you haven't modified sequential_page_cost from the default of 1 have
you?)

Have not changed anything in that area. Question is.. Do I need to? or
should I try out something just to see how it is?
(any) Recommendations would be good.

#7Bruce Momjian
bruce@momjian.us
In reply to: Ow Mun Heng (#6)
Re: Instances where enable_seqscan = false is good

"Ow Mun Heng" <Ow.Mun.Heng@wdc.com> writes:

Have not changed anything in that area. Question is.. Do I need to? or
should I try out something just to see how it is?
(any) Recommendations would be good.

Sorry, I don't have all the original plans. Can you post the explain analyze
with and without enable_seqscan now that the stats are giving good
predictions?

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com