Understanding sequential versus index scans.

Started by Robert Jamesover 16 years ago10 messagesgeneral
Jump to latest
#1Robert James
srobertjames@gmail.com

Hi. I notice that when I do a WHERE x, Postgres uses an index, and when I
do WHERE y, it does so as well, but when I do WHERE x OR y, it doesn't. Why
is this so? And how can I shut this off?
select * from dict
where
word in (select substr('moon', 0, generate_series(3,length('moon')))) --
this is my X above
OR word like 'moon%' -- this is my Y above

Seq Scan on dict (cost=0.02..2775.66 rows=30422 width=24) (actual
time=16.635..28.580 rows=8 loops=1)
Filter: ((hashed subplan) OR ((word)::text ~~ 'moon%'::text))
SubPlan
-> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.014..0.019 rows=2
loops=1)
Total runtime: 28.658 ms
(Using just X or Y alone uses the index, and completes in 0.150 ms)
Is this a bug?

#2Robert James
srobertjames@gmail.com
In reply to: Robert James (#1)
Re: Understanding sequential versus index scans.

PS Running "PostgreSQL 8.2.1 on i686-pc-mingw32, compiled by GCC gcc.exe
(GCC) 3.4.2 (mingw-special)"

On Sun, Jul 19, 2009 at 6:58 PM, Robert James <srobertjames@gmail.com>wrote:

Show quoted text

Hi. I notice that when I do a WHERE x, Postgres uses an index, and when I
do WHERE y, it does so as well, but when I do WHERE x OR y, it doesn't. Why
is this so? And how can I shut this off?
select * from dict
where
word in (select substr('moon', 0, generate_series(3,length('moon')))) --
this is my X above
OR word like 'moon%' -- this is my Y above

Seq Scan on dict (cost=0.02..2775.66 rows=30422 width=24) (actual
time=16.635..28.580 rows=8 loops=1)
Filter: ((hashed subplan) OR ((word)::text ~~ 'moon%'::text))
SubPlan
-> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.014..0.019
rows=2 loops=1)
Total runtime: 28.658 ms
(Using just X or Y alone uses the index, and completes in 0.150 ms)
Is this a bug?

#3Bruce Momjian
bruce@momjian.us
In reply to: Robert James (#2)
Re: Understanding sequential versus index scans.

On Sun, Jul 19, 2009 at 11:59 PM, Robert James<srobertjames@gmail.com> wrote:

PS Running "PostgreSQL 8.2.1 on i686-pc-mingw32, compiled by GCC gcc.exe
(GCC) 3.4.2 (mingw-special)"

On Sun, Jul 19, 2009 at 6:58 PM, Robert James <srobertjames@gmail.com>
wrote:

Hi.  I notice that when I do a WHERE x, Postgres uses an index, and when I
do WHERE y, it does so as well, but when I do WHERE x OR y, it doesn't.  Why
is this so? And how can I shut this off?
select * from dict
where
 word in (select substr('moon', 0, generate_series(3,length('moon')))) --
this is my X above
 OR word like 'moon%' -- this is my Y above
Seq Scan on dict (cost=0.02..2775.66 rows=30422 width=24) (actual
time=16.635..28.580 rows=8 loops=1)
 Filter: ((hashed subplan) OR ((word)::text ~~ 'moon%'::text))
 SubPlan
 -> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.014..0.019
rows=2 loops=1)
Total runtime: 28.658 ms
(Using just X or Y alone uses the index, and completes in 0.150 ms)
Is this a bug?

Well there are known bugs in 8.2.1 -- that's why the current 8.2
release is 8.2.13.

The next step here is to set enable_seqscan=off and run explain
analyze again. You may have to adjust some costs to sync the estimated
cost with actual run-time.

--
greg
http://mit.edu/~gsstark/resume.pdf

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert James (#1)
Re: Understanding sequential versus index scans.

Robert James <srobertjames@gmail.com> writes:

Hi. I notice that when I do a WHERE x, Postgres uses an index, and when I
do WHERE y, it does so as well, but when I do WHERE x OR y, it
doesn't.

It can use indexes for OR conditions, but not for arbitrary OR
conditions...

select * from dict
where
word in (select substr('moon', 0, generate_series(3,length('moon')))) --
this is my X above
OR word like 'moon%' -- this is my Y above

... and that one is pretty arbitrary. You might have some luck with
using a UNION instead, viz

select * from dict where X
union all
select * from dict where Y

regards, tom lane

#5John R Pierce
pierce@hogranch.com
In reply to: Robert James (#1)
Re: Understanding sequential versus index scans.

Robert James wrote:

Hi. I notice that when I do a WHERE x, Postgres uses an index, and
when I do WHERE y, it does so as well, but when I do WHERE x OR y, it
doesn't. Why is this so? And how can I shut this off?

maybe its because you have no index on (X OR Y) ? or maybe because the
analyzer thinks that X or Y includes enough rows that a sequential scan
is more effective ?

#6Robert James
srobertjames@gmail.com
In reply to: Tom Lane (#4)
Re: Understanding sequential versus index scans.

UNION was better, but still 5 times as slow as either query done
individually.
set enable_seqscan=off didn't help at all - it was totally ignored
Is there anything else I can do?

On Sun, Jul 19, 2009 at 7:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Show quoted text

Robert James <srobertjames@gmail.com> writes:

Hi. I notice that when I do a WHERE x, Postgres uses an index, and when

I

do WHERE y, it does so as well, but when I do WHERE x OR y, it
doesn't.

It can use indexes for OR conditions, but not for arbitrary OR
conditions...

select * from dict
where
word in (select substr('moon', 0, generate_series(3,length('moon')))) --
this is my X above
OR word like 'moon%' -- this is my Y above

... and that one is pretty arbitrary. You might have some luck with
using a UNION instead, viz

select * from dict where X
union all
select * from dict where Y

regards, tom lane

#7Robert James
srobertjames@gmail.com
In reply to: Robert James (#6)
Re: Understanding sequential versus index scans.

Is there anyway to tell Postgres "Run these two queries, and union their
results, but don't change the plan as to a UNION - just run them
separately"?
Something seems funny to me that running a UNION should be twice as slow as
running the two queries one after the other.

On Sun, Jul 19, 2009 at 8:10 PM, Robert James <srobertjames@gmail.com>wrote:

Show quoted text

UNION was better, but still 5 times as slow as either query done
individually.
set enable_seqscan=off didn't help at all - it was totally ignored
Is there anything else I can do?

On Sun, Jul 19, 2009 at 7:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert James <srobertjames@gmail.com> writes:

Hi. I notice that when I do a WHERE x, Postgres uses an index, and when

I

do WHERE y, it does so as well, but when I do WHERE x OR y, it
doesn't.

It can use indexes for OR conditions, but not for arbitrary OR
conditions...

select * from dict
where
word in (select substr('moon', 0, generate_series(3,length('moon'))))

--

this is my X above
OR word like 'moon%' -- this is my Y above

... and that one is pretty arbitrary. You might have some luck with
using a UNION instead, viz

select * from dict where X
union all
select * from dict where Y

regards, tom lane

#8Scott Marlowe
scott.marlowe@gmail.com
In reply to: Robert James (#6)
Re: Understanding sequential versus index scans.

On Sun, Jul 19, 2009 at 6:10 PM, Robert James<srobertjames@gmail.com> wrote:

UNION was better, but still 5 times as slow as either query done
individually.
set enable_seqscan=off didn't help at all - it was totally ignored
Is there anything else I can do?

Did you try union, or union all?

#9Robert James
srobertjames@gmail.com
In reply to: Scott Marlowe (#8)
Re: Understanding sequential versus index scans.

Yes, I had done UNION. UNION ALL achives the expected plan and speed! Thank
you!
BTW, this is interesting, because there are only about 5 or 6 rows max
returned from both queries - but I guess the planner expects more and hence
changes the plan to remove duplicates.

On Sun, Jul 19, 2009 at 9:05 PM, Scott Marlowe <scott.marlowe@gmail.com>wrote:

Show quoted text

On Sun, Jul 19, 2009 at 6:10 PM, Robert James<srobertjames@gmail.com>
wrote:

UNION was better, but still 5 times as slow as either query done
individually.
set enable_seqscan=off didn't help at all - it was totally ignored
Is there anything else I can do?

Did you try union, or union all?

#10Bruce Momjian
bruce@momjian.us
In reply to: Robert James (#9)
Re: Understanding sequential versus index scans.

On Mon, Jul 20, 2009 at 2:22 PM, Robert James<srobertjames@gmail.com> wrote:

BTW, this is interesting, because there are only about 5 or 6 rows max
returned from both queries - but I guess the planner expects more and hence
changes the plan to remove duplicates.

If you sent the plans for the various attempts we might be able to
explain what's going on.

--
greg
http://mit.edu/~gsstark/resume.pdf